# What can we do with Pandas?

In this example, we'll be working with data from the 2023-24 Boston Celtics.

In [None]:
# Before we analyze anything, we need to import pandas
import pandas as pd

### Loading Data from a CSV File

We can load data into Pandas from a CSV (comma-separated values) file. This file holds the Celtics roster.

In [None]:
celtics = pd.read_csv('boston_celtics_2023_2024.csv')
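`read_csv` accepts a file-like object as well as a path, so you can try it without the real file. The sketch below uses hypothetical column names (`player`, `position`, `college`) and made-up rows, not the actual roster file:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for boston_celtics_2023_2024.csv
csv_text = """player,position,college
Player A,PG,Duke
Player B,C,
Player C,SF,UCLA
"""

# read_csv turns the header line into column names and each later line into a row;
# the empty college field for Player B becomes NaN
roster = pd.read_csv(io.StringIO(csv_text))
print(roster)
```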

### Selecting Data (Previewing)

Let's examine the first 10 rows of our data.

In [None]:
celtics.head(10)

### Inspecting the Structure of the DataFrame

`info()` summarizes the DataFrame: the number of rows, each column's name, how many non-null values it holds, and its data type.

In [None]:
celtics.info()
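`info()` only prints its summary. If you need those pieces as values, `shape`, `dtypes`, and `notna().sum()` return them directly. A small sketch on a hypothetical toy roster (not the real data):

```python
import pandas as pd

# Hypothetical toy roster, not the real Celtics data
roster = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "jersey": [0, 7, 42],
    "college": ["Duke", None, "UCLA"],
})

roster.info()                 # prints rows, columns, non-null counts, dtypes
print(roster.shape)           # (number of rows, number of columns)
print(roster.dtypes)          # data type of each column
print(roster.notna().sum())   # non-null count per column, as a Series
```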

### Selecting Data by Column

Which colleges did the players attend?

In [None]:
celtics.college

Let's inspect the data types. The whole table is a `DataFrame`, while a single column is a `Series`.

In [None]:
print(type(celtics))
print(type(celtics.college))
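`celtics.college` is attribute access; `celtics['college']` returns the same `Series`. Brackets are the safer habit, because attribute access only works for column names that are valid Python identifiers and don't clash with DataFrame methods (a column named `count`, for example, would be shadowed by the `count()` method). A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical toy data; "birth date" deliberately contains a space
roster = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "college": ["Duke", "UCLA"],
    "birth date": ["June 1", "May 2"],
})

by_attr = roster.college      # attribute access
by_key = roster["college"]    # bracket access, the same Series
print(type(roster), type(by_key))

# roster.birth date is a syntax error; brackets handle any column name
dates = roster["birth date"]
```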

### Selecting Multiple Columns

A list of colleges on its own isn't very useful. Let's add the player names too.

In [None]:
celtics[['player','college']]

Let's check the data type again. (HINT: It's different when selecting multiple columns!) 

In [None]:
type(celtics[['player','college']])
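The rule behind the hint: a single label inside the brackets returns a `Series`, and a *list* of labels, even a list with one label, returns a `DataFrame` with the columns in the order you listed them. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical toy data
roster = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "college": ["Duke", "UCLA"],
    "position": ["PG", "C"],
})

one_col = roster["college"]                # one label -> Series
one_col_frame = roster[["college"]]        # list with one label -> DataFrame
two_cols = roster[["player", "college"]]   # list of labels -> DataFrame, in that order
```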

### Selecting Rows

`iloc` selects rows by integer position, counting from 0. Let's select Jaylen Brown, who is at position 4.

In [None]:
celtics.iloc[4]

We can also select a range of rows using Python's **slice** notation. The start is inclusive but the stop is *non-inclusive*, so `2:7` returns the rows at positions 2 through 6.

In [None]:
celtics.iloc[2:7]
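What you pass to `iloc` also decides the shape of the result: a single integer gives one row as a `Series`, a list gives a `DataFrame`, and a slice gives a `DataFrame` without the stop position. A sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical toy data with ten rows
roster = pd.DataFrame({"player": [f"Player {i}" for i in range(10)]})

row = roster.iloc[4]        # one integer -> the row as a Series
rows = roster.iloc[[4]]     # a list -> a one-row DataFrame
block = roster.iloc[2:7]    # slice -> positions 2, 3, 4, 5, 6 (7 is excluded)
print(block)
```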

### Selecting Rows by Logic

Who are our fives (centers)?

In [None]:
celtics[celtics.position == 'C']
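The comparison inside the brackets is evaluated first and produces a boolean `Series` (a mask) with one `True`/`False` per row; indexing with that mask keeps the `True` rows. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical toy data
roster = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "position": ["C", "PG", "C"],
})

mask = roster.position == "C"   # boolean Series, one value per row
print(mask)
centers = roster[mask]          # keeps only the rows where mask is True
```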

Who has a birthday coming up? 

In [None]:
celtics[celtics.birth_date.str.contains('June', na=False)]
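If any `birth_date` is missing, `str.contains` may return `NaN` for that row (depending on the pandas version and column dtype), and selecting with a mask that contains `NaN` raises an error. Passing `na=False` counts missing values as non-matches. A sketch assuming the dates are stored as text such as "June 5, 1998" (hypothetical values):

```python
import pandas as pd

# Hypothetical toy data; Player B has no birth date
roster = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "birth_date": ["June 5, 1998", None, "March 3, 1996"],
})

# na=False turns a missing birth_date into False instead of NaN
has_june = roster.birth_date.str.contains("June", na=False)
june_birthdays = roster[has_june]
```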

Who plays guard?

In [None]:
celtics[(celtics.position == 'PG') | (celtics.position =='SG')]
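`|` is an element-wise OR between two boolean `Series`, and each comparison needs its own parentheses because `|` binds more tightly than `==`. For "equals any of these values", `isin` gives the same result in one call. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical toy data
roster = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "position": ["PG", "C", "SG", "SF"],
})

# Two comparisons combined with element-wise OR
guards_or = roster[(roster.position == "PG") | (roster.position == "SG")]
# The same membership test with isin
guards_isin = roster[roster.position.isin(["PG", "SG"])]
```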

Which players weren't born in the US but attended college?

In [None]:
celtics[(celtics.country_code != 'us') & (celtics.college.notna())]
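Both sides of `&` should be boolean `Series`. "Attended college" therefore becomes "the `college` value is not missing", which `notna()` expresses; a bare column of strings is not a valid condition. A sketch with hypothetical country codes and colleges:

```python
import pandas as pd

# Hypothetical toy data
roster = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "country_code": ["us", "fr", "ca"],
    "college": ["Duke", None, "Gonzaga"],
})

# Naming each condition avoids the parentheses that an inline & needs
# (& binds more tightly than !=)
born_abroad = roster.country_code != "us"
went_to_college = roster.college.notna()   # True where college is present
international_college = roster[born_abroad & went_to_college]
```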

(HINT: The query above looks for rows where the `college` cell is defined. Let's see what happens when we look for undefined cells, which pandas stores as `NaN`.)

In [None]:
# A quick way to hunt down NaNs: show every row with at least one missing value
celtics[celtics.isnull().any(axis=1)]
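`isnull()` returns a same-shaped table of `True`/`False`, and `any(axis=1)` collapses each row to "does it contain any missing value?". Summing instead counts missing values per column, and calling `isnull()` on one column narrows the search. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical toy data with two missing values
roster = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "college": ["Duke", None, "UCLA"],
    "jersey": [0, 7, None],
})

print(roster.isnull().sum())                          # missing values per column
rows_with_nan = roster[roster.isnull().any(axis=1)]   # rows with at least one NaN
no_college = roster[roster.college.isnull()]          # missing in one column only
```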

Who went to college in California? 

In [None]:
celtics[celtics.college.isin(['California','UCLA','USC'])]

### Setting Indices

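(This uses pandas' `loc`, which selects rows by index *label*, rather than `iloc`, which selects by integer position. Our index is the default 0, 1, 2, ..., so here the two happen to agree.)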

Let's set the starting lineup!  

In [None]:
starting_lineup = celtics.loc[[2,3,4,5,8]]
starting_lineup
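Once you take a subset, labels and positions stop matching: the subset keeps its original labels, so `loc[4]` and `iloc[4]` point to different rows. A sketch on a hypothetical ten-row frame using the same labels as the lineup above:

```python
import pandas as pd

# Hypothetical toy data with ten rows labeled 0-9
roster = pd.DataFrame({"player": [f"Player {i}" for i in range(10)]})

# The subset keeps its original labels: 2, 3, 4, 5, 8
subset = roster.loc[[2, 3, 4, 5, 8]]

by_label = subset.loc[4]      # the row whose label is 4
by_position = subset.iloc[4]  # the fifth row of subset, whose label is 8
```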

We're going to use this starting lineup a lot. Wouldn't it be nice to renumber its index from 0 instead of keeping the original row labels (2, 3, 4, 5, 8)?

In [None]:
new_starting_lineup = starting_lineup.reset_index()
new_starting_lineup

Hmm. That works, but `reset_index()` returns a new DataFrame and leaves the original alone, so now I have an extra data frame. `starting_lineup` didn't update...

In [None]:
starting_lineup
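By default `reset_index()` returns a new DataFrame numbered from 0 and moves the old labels into a column named `index`; the frame you called it on is unchanged. A sketch on hypothetical data with the same labels as the lineup:

```python
import pandas as pd

# Hypothetical toy data with ten rows labeled 0-9
roster = pd.DataFrame({"player": [f"Player {i}" for i in range(10)]})
lineup = roster.loc[[2, 3, 4, 5, 8]]

renumbered = lineup.reset_index()   # a new DataFrame indexed 0-4
print(renumbered.columns.tolist())  # the old labels now live in an 'index' column
print(lineup.index.tolist())        # lineup itself still has labels 2, 3, 4, 5, 8
```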

Let's fix that. 

In [None]:
starting_lineup.reset_index(inplace=True)
starting_lineup

That's much better, but `reset_index()` moved the old labels into a new column called `index`, and I don't need it. Get outta here.

In [None]:
starting_lineup.reset_index(drop=True, inplace=True)
starting_lineup

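HA! Tricked you. That doesn't work: the `index` column is still there. The first `reset_index()` already turned the old labels into an ordinary column, so `drop=True` on the second call only throws away the current 0-4 index.

The fix is to pass `drop=True` the *first* time you reset the index. You don't need **inplace=True** for that; the usual idiom is to reassign the result to the same variable...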

In [None]:
starting_lineup = celtics.loc[[2,3,4,5,8]]
starting_lineup = starting_lineup.reset_index(drop=True)
starting_lineup
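To see why the order matters, the sketch below reproduces both versions on hypothetical data. If the two-step version has already happened, dropping the leftover column with `drop(columns="index")` gets you to the same result:

```python
import pandas as pd

# Hypothetical toy data with ten rows labeled 0-9
roster = pd.DataFrame({"player": [f"Player {i}" for i in range(10)]})
lineup = roster.loc[[2, 3, 4, 5, 8]]

# Two steps: the first reset creates an 'index' column,
# so drop=True on the second call only discards the new 0-4 index
two_step = lineup.reset_index().reset_index(drop=True)

# Removing the leftover column explicitly
cleaned = two_step.drop(columns="index")

# One step: drop=True on the first reset never creates the column
one_step = lineup.reset_index(drop=True)
```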