This notebook has been adapted from https://github.com/callysto/basketball-and-data-science/blob/main/content/01-introduction.ipynb with permission.

(Open in 
[Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=BADS/01-Intro/01-03-sorting-data.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/BADS/01-Intro/01-03-sorting-data.ipynb)) 

# Let’s Get Our Data

In [1]:
import pandas as pd

# URL of the CSV file containing data for Pascal Siakam
url = 'https://raw.githubusercontent.com/callysto/basketball-and-data-science/main/content/data/nba-players/Pascal_Siakam.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(url)

# Display the DataFrame
display(df)

|Unnamed: 0|Season|Age|Tm|Lg|Pos|G|GS|MP|FG|FGA|...|FT%|ORB|DRB|TRB|AST|STL|BLK|TOV|PF|PTS|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|2016-17|22.0|TOR|NBA|PF|55|38|15.6|1.9|3.7|...|0.688|1.2|2.2|3.4|0.3|0.5|0.8|0.6|2.0|4.2|
|1|2017-18|23.0|TOR|NBA|PF|81|5|20.7|3.1|6.1|...|0.621|1.0|3.5|4.5|2.0|0.8|0.5|0.8|2.0|7.3|
|2|2018-19|24.0|TOR|NBA|PF|80|79|31.9|6.5|11.8|...|0.785|1.6|5.3|6.9|3.1|0.9|0.7|1.9|3.0|16.9|
|3|2019-20|25.0|TOR|NBA|PF|60|60|35.2|8.3|18.4|...|0.792|1.1|6.3|7.3|3.5|1.0|0.9|2.5|2.8|22.9|
|4|2020-21|26.0|TOR|NBA|PF|56|56|35.8|7.8|17.2|...|0.827|1.7|5.5|7.2|4.5|1.1|0.7|2.3|3.1|21.4|
|5|2021-22|27.0|TOR|NBA|PF|68|68|37.9|8.8|17.8|...|0.749|1.9|6.6|8.5|5.3|1.3|0.6|2.7|3.3|22.8|
|6|2022-23|28.0|TOR|NBA|PF|71|71|37.4|8.9|18.5|...|0.774|1.8|6.0|7.8|5.8|0.9|0.5|2.4|3.2|24.2|
|7|Career|||NBA||471|377|30.6|6.5|13.2|...|0.774|1.5|5.1|6.5|3.5|0.9|0.7|1.9|2.8|17.0|


As a reminder, here are our columns:

|Column|Meaning|
|-|-|
|Age|Player's age on February 1 of the season|
|Lg|League|
|Pos|Position|
|G|Games|
|GS|Games Started|
|MP|Minutes Played Per Game|
|FG|Field Goals Per Game|
|FGA|Field Goal Attempts Per Game|
|FG%|Field Goal Percentage|
|3P|3-Point Field Goals Per Game|
|3PA|3-Point Field Goal Attempts Per Game|
|3P%|3-Point Field Goal Percentage|
|2P|2-Point Field Goals Per Game|
|2PA|2-Point Field Goal Attempts Per Game|
|2P%|2-Point Field Goal Percentage|
|eFG%|Effective Field Goal Percentage*|
|FT|Free Throws Per Game|
|FTA|Free Throw Attempts Per Game|
|FT%|Free Throw Percentage|
|ORB|Offensive Rebounds Per Game|
|DRB|Defensive Rebounds Per Game|
|TRB|Total Rebounds Per Game|
|AST|Assists Per Game|
|STL|Steals Per Game|
|BLK|Blocks Per Game|
|TOV|Turnovers Per Game|
|PF|Personal Fouls Per Game|
|PTS|Points Per Game|

<span style="font-size:10px">*This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal.</span>
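The footnote's adjustment is usually written as eFG% = (FG + 0.5 × 3P) / FGA: each made 3-pointer earns an extra half credit, because it is worth 1.5 times as many points as a 2-pointer. Here is a minimal sketch on made-up numbers (not Siakam's real stats), assuming the column names match the table above.

```python
import pandas as pd

# Hypothetical per-game numbers for two made-up seasons
stats = pd.DataFrame({
    'FG': [5.0, 8.0],     # field goals made per game
    '3P': [1.0, 2.0],     # 3-point field goals made per game
    'FGA': [10.0, 16.0],  # field goal attempts per game
})

# Plain field goal percentage treats every made shot the same
stats['FG%'] = stats['FG'] / stats['FGA']

# Effective field goal percentage gives each made 3-pointer an extra half credit
stats['eFG%'] = (stats['FG'] + 0.5 * stats['3P']) / stats['FGA']

print(stats)
```

On the Siakam DataFrame, the same expression built from `df['FG']`, `df['3P']`, and `df['FGA']` should land close to the `eFG%` column, but the per-game values are rounded, so expect small differences.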

# Dropping a Row by Its Index

Notice that we still have that last row, "Career"? Let's drop it.

Look closely at the output above and you'll see that its index is 7. 

In [None]:
display(df.drop(7))

What happens if we change the "7" to another number? Try it! (Then change it back to 7!) A number that isn't in the index, such as 10, raises a `KeyError`.

Let's display the DataFrame again to see whether the change is permanent.

In [None]:
display(df)

Nope, the Career row is still there. That's because `drop()` returns a new DataFrame with the row removed and leaves `df` itself unchanged.

To make the change permanent, we can pass the `inplace=True` argument to `drop()`.
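One way to confirm that `drop()` leaves the original alone is to compare row counts before and after. This sketch uses a hypothetical three-row stand-in for `df`, with `PTS` values copied from the table above.

```python
import pandas as pd

# A small stand-in for df: two seasons plus a Career row
toy = pd.DataFrame({'Season': ['2021-22', '2022-23', 'Career'],
                    'PTS': [22.8, 24.2, 17.0]})

# drop() builds and returns a new DataFrame without the row labelled 2
dropped = toy.drop(2)

print(len(toy))      # 3: the original is unchanged
print(len(dropped))  # 2: only the returned copy is missing the row
```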

In [None]:
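# inplace=True changes df itself instead of returning a new DataFrame.
# Running this cell a second time raises a KeyError, because row 7 is already gone.
df.drop(7, inplace=True)

display(df)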

That's better!
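Dropping by index number works here because we looked up that the Career row is 7. If you would rather not depend on the index, a boolean mask can remove rows by their `Season` value instead, and assigning the result back to the same name is a common alternative to `inplace=True`. A minimal sketch on a hypothetical stand-in for `df`:

```python
import pandas as pd

# A small stand-in for df (PTS values copied from the table above)
toy = pd.DataFrame({'Season': ['2021-22', '2022-23', 'Career'],
                    'PTS': [22.8, 24.2, 17.0]})

# True for every row whose Season is not 'Career'
is_season = toy['Season'] != 'Career'

# Keep only those rows and assign the result back, instead of using inplace=True
toy = toy[is_season]

print(toy)
```

On the freshly loaded Siakam data, the equivalent line would be `df = df[df['Season'] != 'Career']`.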

# Sorting

What if we want to sort the data by personal fouls?

In [None]:
display(df.sort_values('PF'))

What if we want the rows in descending order instead of ascending? Simply add `ascending=False` to the arguments of `sort_values()`. (The default is `ascending=True`.)

In [None]:
df.sort_values('PF', ascending=False)
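A descending sort is often followed by `.head(n)`, which keeps the first `n` rows, to pick out the top values. A small sketch using the `Season` and `PF` values from the table above (Career row left out):

```python
import pandas as pd

# Season and PF (personal fouls per game) copied from the table above
fouls = pd.DataFrame({
    'Season': ['2016-17', '2017-18', '2018-19', '2019-20',
               '2020-21', '2021-22', '2022-23'],
    'PF': [2.0, 2.0, 3.0, 2.8, 3.1, 3.3, 3.2],
})

# Largest PF first
by_pf_desc = fouls.sort_values('PF', ascending=False)

# The three seasons with the most personal fouls per game
top_3 = by_pf_desc.head(3)
print(top_3)
```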

Let's sort on two columns: first by blocks per game (`BLK`), then by steals per game (`STL`) to break ties. Notice that the column names now go in a list (**[ ]**).

In [None]:
df.sort_values(['BLK', 'STL'])
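`ascending` can also take a list with one `True`/`False` per column, so each column gets its own direction; the second column only matters where the first one ties. In this sketch (`BLK` and `STL` copied from the table above), the seasons go from fewest to most blocks, and seasons with the same `BLK` put the higher `STL` first.

```python
import pandas as pd

# BLK and STL per game, copied from the Siakam table above (Career row removed)
stats = pd.DataFrame({
    'Season': ['2016-17', '2017-18', '2018-19', '2019-20',
               '2020-21', '2021-22', '2022-23'],
    'BLK': [0.8, 0.5, 0.7, 0.9, 0.7, 0.6, 0.5],
    'STL': [0.5, 0.8, 0.9, 1.0, 1.1, 1.3, 0.9],
})

# Sort by BLK from low to high; where BLK ties, put the higher STL first
result = stats.sort_values(['BLK', 'STL'], ascending=[True, False])

print(result)
```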

Let's keep only a few columns, save them in a new DataFrame called `df_2`, and work with that from now on.

In [None]:
df_2 = df[['G', 'GS', 'MP', 'FG', 'FGA']]

display(df_2)
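Because `G` through `FGA` sit next to each other in this data, the same columns can also be selected with a label slice through `.loc`. Unlike ordinary Python slicing, a label slice includes both endpoints. A sketch on a one-row stand-in holding the 2016-17 values from the table above:

```python
import pandas as pd

# One-row stand-in for df using the 2016-17 values shown above
toy = pd.DataFrame([['2016-17', 22.0, 55, 38, 15.6, 1.9, 3.7]],
                   columns=['Season', 'Age', 'G', 'GS', 'MP', 'FG', 'FGA'])

# Option 1: list the columns you want (what the cell above does)
picked = toy[['G', 'GS', 'MP', 'FG', 'FGA']]

# Option 2: slice by label; 'G' and 'FGA' are both included
sliced = toy.loc[:, 'G':'FGA']

print(sliced)
```

The list form is safer when the columns you want are scattered; the slice is handy for a long run of neighbouring columns.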

# Exercise

Modify the program below so that it displays only the columns 'Season', 'FG%', '2P%', and '3P%', sorted by 'FG%'.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/callysto/basketball-and-data-science/main/content/data/nba-players/Pascal_Siakam.csv'

df = pd.read_csv(url)

display(df)

# Extra Challenge

Produce this graph using the code stub below.

![raptors-2023-top-5-points.png](attachment:raptors-2023-top-5-points.png)

In [None]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/raptors-2023.csv'

# put the rest of the code here!

Next Lesson: [Bar Graphs](../02-visualize/02-01-bar-graphs.ipynb) ([GitHub link](https://github.com/pbeens/Data-Analysis/blob/f74aee1f8912a8a1e80ec13c277203f62bebadc2/BADS/02-visualize/02-01-bar-graphs.ipynb))