# Sorting Our Data

This notebook has been adapted, with permission, from https://github.com/callysto/basketball-and-data-science/blob/main/content/01-introduction.ipynb

(Open in 
[Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Dunkers&branch=main&subPath=Demos/01-03-sorting-data.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Dunkers/blob/main/Demos/01-03-sorting-data.ipynb)) 

## Let’s Get Our Data

In [None]:
import pandas as pd

# URL of the CSV file containing data for Pascal Siakam
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(url)

# Display the DataFrame
display(df)

You can view the raw CSV file [here](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv).

As a reminder, here are our columns:

| Field Name | Definition | Field Name | Definition |
|---|---|---|---|
| AST | The total number of assists a player has made. | FTM | The total number of free throws the player has made. |
| BLK | The total number of opponent shots a player has deflected or prevented. | GP | The number of games in which the player has appeared. |
| DREB | The total number of rebounds a player has grabbed on the defensive end. | GS | The number of games in which the player was in the starting lineup. |
| FG_PCT | The percentage of field goal attempts that are successful. | MIN | The total number of minutes the player has played. |
| FG2_PCT | The percentage of two-point field goal attempts that are successful. | OREB | The total number of rebounds a player has grabbed on the offensive end. |
| FG2A | The total number of two-point field goal attempts by the player. | PF | The total number of personal fouls committed by the player. |
| FG2M | The total number of two-point field goals a player has made. | PLAYER_AGE | The age of the player. |
| FG3_PCT | The percentage of three-point field goal attempts that are successful. | PTS | The total number of points a player has scored. |
| FG3A | The total number of three-point field goal attempts by the player. | REB | The total number of rebounds (offensive + defensive) a player has collected. |
| FG3M | The total number of three-point field goals a player has made. | SEASON_ID | The identifier for the basketball season. |
| FGA | The total number of field goal attempts by the player. | STL | The total number of times a player has successfully taken the ball away from an opponent. |
| FGM | The total number of field goals a player has made. | TEAM_ABBREVIATION | The abbreviated name of the team. |
| FT_PCT | The percentage of free throw attempts that are successful. | TEAM_ID | A unique identifier for the team. |
| FTA | The total number of free throw attempts by the player. | TOV | The total number of times a player loses possession of the ball. | 

## Dropping a Line by Index Number

Notice that we still have that last line, the one with "TOT" for the team abbreviation? Let's drop it.

If you look closely, you'll see that it's index #9.

Here is the code to delete (or "drop") the line with that index number:

In [None]:
display(df.drop(9))

What happens if we change the "9" to another number? Try it!

Let's display the dataframe again, to see if the change is permanent.

In [None]:
display(df)

It's not. The "TOT" line is still there.

Let's tell the program we want to make this permanent by assigning the result back into the dataframe.

In [None]:
df = df.drop(9)

display(df)

That's better!
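
To see why the reassignment matters, here is a minimal sketch on a tiny made-up table. The `toy` dataframe and its values are invented for illustration and are not taken from the Siakam data. It shows that `drop()` hands back a new dataframe and leaves the original untouched until the result is assigned back.

```python
import pandas as pd

# Made-up data for illustration only; not from the Siakam CSV
toy = pd.DataFrame({
    'TEAM_ABBREVIATION': ['TOR', 'TOR', 'TOT'],
    'STL': [10, 20, 30],
})

# drop() returns a new dataframe without the row labelled 2
dropped = toy.drop(2)

print(len(toy))      # the original still has all of its rows
print(len(dropped))  # the new dataframe has one row fewer

# Assigning the result back makes the change stick
toy = toy.drop(2)
print(list(toy.index))
```

The number passed to `drop()` is an index label rather than a position, so the remaining rows keep their original labels after the drop.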

## Sorting

What if we want to sort the data by steals (STL)?

In [None]:
display(df.sort_values('STL'))

What if we want them in descending order instead of ascending? Simply add `ascending=False` to the arguments in `sort_values()`. (The default, if you don't tell it otherwise, is `ascending=True`.)

In [None]:
df.sort_values('STL', ascending=False)
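
The two sort directions can be compared side by side on a small made-up table. The `toy` dataframe and its values below are invented for illustration. As with `drop()`, `sort_values()` returns a new dataframe, so the original order is unchanged unless the result is assigned back.

```python
import pandas as pd

# Made-up data for illustration only; not from the Siakam CSV
toy = pd.DataFrame({
    'SEASON_ID': ['A', 'B', 'C'],
    'STL': [25, 10, 40],
})

ascending = toy.sort_values('STL')                    # smallest first (the default)
descending = toy.sort_values('STL', ascending=False)  # largest first

print(ascending['STL'].tolist())
print(descending['STL'].tolist())
print(toy['STL'].tolist())  # the original dataframe keeps its order
```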

Let's reduce the columns we're looking at and save it in a new dataframe named `df_2`. 

If we only need these columns, we can use `df_2` in place of `df` for the rest of the notebook.

In [None]:
df_2 = df[['SEASON_ID', 'TEAM_ABBREVIATION', 'GP', 'GS', 'BLK', 'STL', 'MIN', 'FGM', 'FGA']]
display(df_2)

We'll work with `df_2` from now on.
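
The double square brackets used to build `df_2` can look odd at first. Here is a minimal sketch on a made-up table (the `toy` and `toy_2` names and values are invented for illustration): the inner brackets are an ordinary Python list of column names, and the outer brackets ask the dataframe for those columns.

```python
import pandas as pd

# Made-up data for illustration only; not from the Siakam CSV
toy = pd.DataFrame({
    'SEASON_ID': ['A', 'B'],
    'GP': [55, 81],
    'STL': [25, 40],
})

# The inner list names the columns to keep, in the order we want them
toy_2 = toy[['STL', 'SEASON_ID']]

print(list(toy_2.columns))
print(list(toy.columns))  # the original keeps all of its columns
```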

Let's sort on two columns now: first by blocks (BLK) and then by steals (STL). Rows are ordered by the first column, and the second column is only used to break ties. Notice that we must put the column names in a list (`[ ]`).

In [None]:
df_2.sort_values(['BLK', 'STL'])

Can you confirm that it sorted correctly? 
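
One way to check is to try the same sort on a small made-up table where the ties are easy to spot. The `toy` dataframe and its values below are invented for illustration. The sketch also shows that `ascending` accepts a list with one entry per sort column, in case each column needs its own direction.

```python
import pandas as pd

# Made-up data with deliberate ties in BLK; not from the Siakam CSV
toy = pd.DataFrame({
    'SEASON_ID': ['A', 'B', 'C', 'D'],
    'BLK': [30, 10, 30, 10],
    'STL': [50, 70, 20, 40],
})

# Sort by BLK first; rows with equal BLK are then ordered by STL
result = toy.sort_values(['BLK', 'STL'])
print(result)

# One direction per column: BLK descending, STL ascending
mixed = toy.sort_values(['BLK', 'STL'], ascending=[False, True])
print(mixed)
```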

## Exercise

Modify the program below to display only the columns 'SEASON_ID', 'FG_PCT', 'FG2_PCT', and 'FG3_PCT', sorted by 'FG_PCT'.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

df = pd.read_csv(url)

# Add your code here

display(df)

## Extra Challenge

Produce this graph using the code stub below. You can view the raw data [here](https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/raptors-2023.csv).

![raptors-2023-top-5-points.png](https://raw.githubusercontent.com/pbeens/Data-Analysis/Images/raptors-2023-top-5-points.png)



In [None]:
import pandas as pd
import plotly.express as px

url = r'https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/raptors-2023.csv'

# put the rest of the code here!

---
Next Lesson: [Adding New Columns](03-03-new-columns.ipynb) ([GitHub link](https://github.com/pbeens/Data-Dunkers/blob/main/Demos/03-03-new-columns.ipynb))