# Filtering Our Data (Hoops Activity)

Open in [Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Dunkers&branch=main&subPath=Demos/hoops_filtering_data.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Dunkers/blob/main/Demos/hoops_filtering_data.ipynb).


# Lesson Objectives

By the end of this lesson, students will be able to:
- Use Pandas to load and display data from an external CSV file.
- Filter DataFrame rows based on specific conditions to exclude irrelevant data, such as seasons beyond a certain year.
- Reduce the complexity of data by filtering out unnecessary columns, focusing on relevant statistics.
- Apply multiple conditions to filter data, such as filtering for specific performance metrics like free throw percentage and game starts.
- Combine filters using logical operators (`&` for "and", `|` for "or") to refine data analysis further.
- Execute and interpret basic comparison operations in Python for data filtering.

## Let’s Get Our Data

In [1]:
import pandas as pd

# URL of the CSV file containing data for Pascal Siakam
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(url)

# Display the DataFrame
display(df)

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
9,1627783,2023-24,0,0,TOT,29.0,63,63,2140.0,528,...,414,292,49,18,117,149,1359,454,775,0.586
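
The `...` in the output above means the display has hidden some of the columns. As a minimal sketch of how to check the size and the full column list of a DataFrame, the example below uses a small hand-copied sample of the rows above rather than the full file (the names `sample_csv` and `sample` are illustrative and not part of the notebook):

```python
import io
import pandas as pd

# Hand-copied sample of a few rows and columns from the table above (illustrative only)
sample_csv = """SEASON_ID,TEAM_ABBREVIATION,GP,GS,PTS
2016-17,TOR,55,38,229
2017-18,TOR,81,5,589
2023-24,TOT,63,63,1359
"""
sample = pd.read_csv(io.StringIO(sample_csv))

# shape is (number of rows, number of columns)
print(sample.shape)

# columns holds every column name, including any the display would hide behind "..."
print(list(sample.columns))

# dtypes shows how pandas stored each column (text versus numbers)
print(sample.dtypes)
```

The same three attributes can be used on the full `df` loaded from the URL.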


## Only Include Seasons with the Raptors

There are a few things you will notice from the data above. The first is that it includes rows from the 2023-24 season, when Pascal Siakam was traded to the Indiana Pacers, as well as a `TOT` row that combines his totals for both teams that season. To eliminate these rows, we keep only the seasons up to and including 2022-23.

These are the symbols we use for comparison operations in Python, along with the `&` and `|` symbols that pandas uses to combine conditions:

|Symbol|Meaning|
|-|-|
|>|greater than|
|<|less than|
|==|is equal to|
|!=|not equal to|
|>=|greater than or equal to|
|<=|less than or equal to|
|&|and|
|\||or|
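
As a sketch of how these symbols behave on a DataFrame column: a comparison returns one `True` or `False` value per row, and `&` or `|` combines two such results. The sample values are copied from the table above; the names `sample`, `many_starts`, `high_scoring`, `both`, and `either` are illustrative. Each comparison is wrapped in its own parentheses because `&` and `|` are evaluated before the comparison operators.

```python
import pandas as pd

# A few values copied from the first four seasons (illustrative sample)
sample = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18', '2018-19', '2019-20'],
    'GS': [38, 5, 79, 60],
    'PTS': [229, 589, 1354, 1371],
})

# Each comparison produces one True/False value per row
many_starts = sample['GS'] > 30
high_scoring = sample['PTS'] > 500

# & keeps rows where BOTH conditions are True
both = sample[(sample['GS'] > 30) & (sample['PTS'] > 500)]

# | keeps rows where AT LEAST ONE condition is True
either = sample[(sample['GS'] > 30) | (sample['PTS'] > 500)]

print(both)
print(either)
```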

In [2]:
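# Keep only seasons up to and including 2022-23.
# SEASON_ID is text, so this compares strings; it works because every season uses the same YYYY-YY format.
# The name season_filter avoids hiding Python's built-in filter() function.
season_filter = df['SEASON_ID'] <= '2022-23'
df = df[season_filter]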

display(df)

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
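
`SEASON_ID` holds text such as `'2022-23'`, so the `<=` comparison above compares strings character by character rather than as numbers. This gives the right answer here because every season uses the same `YYYY-YY` layout. As a sketch of an alternative under that same layout assumption, the example below converts the first four characters to an integer start year and compares numbers instead (`START_YEAR` is a hypothetical helper column, not part of the original data, and `sample` is a hand-copied subset):

```python
import pandas as pd

# Hand-copied subset of the seasons and teams shown earlier (illustrative only)
sample = pd.DataFrame({
    'SEASON_ID': ['2021-22', '2022-23', '2023-24', '2023-24'],
    'TEAM_ABBREVIATION': ['TOR', 'TOR', 'TOR', 'IND'],
})

# String comparison, as in the cell above
by_text = sample[sample['SEASON_ID'] <= '2022-23']

# Numeric comparison: take the first four characters and convert them to an integer
sample['START_YEAR'] = sample['SEASON_ID'].str[:4].astype(int)
by_year = sample[sample['START_YEAR'] <= 2022]

print(by_text)
print(by_year)
```

Both approaches keep the same rows for this data. The numeric version would be the safer choice if the season labels were ever written in mixed formats.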
