# A Quick Guide To Pandas

This notebook is a quick overview of key features of the Pandas library. I find myself repeatedly searching Stack Overflow for the same commands, so this guide consolidates the most commonly used ones for quick access. For this example I am using the "NCHS - Leading Causes of Death: United States" dataset from data.gov.

### Importing Pandas into your project/notebook

Bring Pandas into your project with the import statement below.

In [42]:
import pandas as pd

### Loading CSV Files

CSV files can be loaded into a dataframe with the pd.read_csv function. Instead of a local file path, you can also pass a URL directly.

In [43]:
df = pd.read_csv("https://data.cdc.gov/api/views/bi63-dtpu/rows.csv?accessType=DOWNLOAD")
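Because the CDC URL may change or be unreachable offline, here is a small sketch that feeds read_csv an in-memory CSV through io.StringIO instead. The rows are copied from the df.head() output shown in this notebook (the "113 Cause Name" column is left out because its values are truncated there), and the name sample is just an illustrative choice.

```python
import io

import pandas as pd

# A few rows copied from this notebook's df.head() output.
# "113 Cause Name" is omitted because its values are truncated in that output.
csv_text = """Year,Cause Name,State,Deaths,Age-adjusted Death Rate
1999,Unintentional Injuries,Alabama,2313,52.2
1999,Unintentional Injuries,Alaska,294,55.9
1999,Unintentional Injuries,Arizona,2214,44.8
"""

# read_csv accepts a file path, a URL, or any file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)  # (3, 5)
```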

### Exploring Data

The .head() method returns the first few rows of the dataframe (5 by default), which is a quick way to check what was loaded.

In [44]:
df.head()

| | Year | 113 Cause Name | Cause Name | State | Deaths | Age-adjusted Death Rate |
|---|---|---|---|---|---|---|
| 0 | 1999 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Alabama | 2313.0 | 52.2 |
| 1 | 1999 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Alaska | 294.0 | 55.9 |
| 2 | 1999 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Arizona | 2214.0 | 44.8 |
| 3 | 1999 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Arkansas | 1287.0 | 47.6 |
| 4 | 1999 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | California | 9198.0 | 28.7 |
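A sketch of a few related viewing tools on a small stand-in dataframe (the name sample is hypothetical; its values come from the head() output above): head(n) takes an optional row count, tail() shows the last rows, and shape reports (rows, columns).

```python
import pandas as pd

# Small stand-in for df, using values from the df.head() output above
sample = pd.DataFrame({
    "Year": [1999, 1999, 1999, 1999, 1999],
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "Deaths": [2313.0, 294.0, 2214.0, 1287.0, 9198.0],
})

print(sample.head(2))  # first 2 rows instead of the default 5
print(sample.tail(2))  # last 2 rows
print(sample.shape)    # (5, 3)
```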


Calling list() on a dataframe returns its column names.

In [45]:
list(df)

['Year',
 '113 Cause Name',
 'Cause Name',
 'State',
 'Deaths',
 'Age-adjusted Death Rate']
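list(df) works because iterating over a dataframe yields its column labels. A minimal sketch comparing it with df.columns.tolist(), which gives the same result; the one-row frame named sample is a hypothetical stand-in.

```python
import pandas as pd

sample = pd.DataFrame({"Year": [1999], "State": ["Alabama"], "Deaths": [2313.0]})

# Iterating over a DataFrame yields its column labels
names_via_list = list(sample)
# df.columns is an Index object; .tolist() converts it to a plain list
names_via_columns = sample.columns.tolist()

print(names_via_list == names_via_columns)  # True
```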

### Filtering Data

You can filter a dataframe with df.loc[df['column_name'] == some_value].

This is similar to a SQL statement such as SELECT * FROM table WHERE column = some_value.

The other comparison operators (>, <, >=, <=, !=) work the same way.

In [46]:
df.loc[df['State'] == 'Connecticut'].head()

| | Year | 113 Cause Name | Cause Name | State | Deaths | Age-adjusted Death Rate |
|---|---|---|---|---|---|---|
| 6 | 1999 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Connecticut | 1034.0 | 29.3 |
| 59 | 2000 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Connecticut | 1175.0 | 32.8 |
| 112 | 2001 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Connecticut | 1056.0 | 29.5 |
| 165 | 2002 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Connecticut | 1182.0 | 32.5 |
| 218 | 2003 | Accidents (unintentional injuries) (V01-X59,Y8... | Unintentional Injuries | Connecticut | 1112.0 | 30.2 |
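A short sketch of other filter forms on a hypothetical stand-in dataframe (sample, built from rows in the outputs above): a greater-than comparison, two conditions combined with & (each wrapped in parentheses), and isin(), which behaves like SQL's WHERE column IN (...).

```python
import pandas as pd

# Stand-in rows taken from the outputs above
sample = pd.DataFrame({
    "Year": [1999, 1999, 1999, 2000, 2001],
    "State": ["Alabama", "Alaska", "Connecticut", "Connecticut", "Connecticut"],
    "Deaths": [2313.0, 294.0, 1034.0, 1175.0, 1056.0],
})

# Other comparison operators work the same way: >, <, >=, <=, !=
many_deaths = sample.loc[sample["Deaths"] > 1000]

# Combine conditions with & (and) or | (or); parentheses around each are required
ct_after_1999 = sample.loc[(sample["State"] == "Connecticut") & (sample["Year"] > 1999)]

# isin() matches against a list of values
two_states = sample.loc[sample["State"].isin(["Alabama", "Alaska"])]
```

Python's and/or keywords do not work element-wise on a Series, which is why & and | are used instead.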


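You can select the data for a specific column with df['COLUMN NAME'] or via dot notation df.COLUMNNAME. Dot notation only works when the column name is a valid Python identifier (for example, no spaces) and does not clash with an existing DataFrame attribute, so a column like 'Cause Name' needs brackets.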

In [47]:
df['Cause Name'].head()

0    Unintentional Injuries
1    Unintentional Injuries
2    Unintentional Injuries
3    Unintentional Injuries
4    Unintentional Injuries
Name: Cause Name, dtype: object

In [48]:
df.Year.head()

0    1999
1    1999
2    1999
3    1999
4    1999
Name: Year, dtype: int64

Note that selecting a single column returns a pandas Series object.

In [49]:
type(df['Cause Name'])

pandas.core.series.Series
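Passing a list of column names instead of a single name returns a DataFrame rather than a Series. A minimal sketch on a hypothetical stand-in frame named sample, also showing that a name containing a space must use brackets.

```python
import pandas as pd

sample = pd.DataFrame({
    "Cause Name": ["Unintentional Injuries", "Unintentional Injuries"],
    "State": ["Alabama", "Alaska"],
    "Deaths": [2313.0, 294.0],
})

one_col = sample["State"]               # single label -> Series
two_cols = sample[["State", "Deaths"]]  # list of labels -> DataFrame

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame

# "Cause Name" contains a space, so dot notation cannot reach it; use brackets
causes = sample["Cause Name"]
```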

### Iterating through a dataframe

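You can iterate through a dataframe with the .iterrows() method, which yields an (index, row) pair for each record, where row is a Series. This is helpful for keeping track of values or updating them; for example, you may have two datasets and want to update records in one based on values in the other. Note that row may be a copy rather than a view, so assigning to it is not guaranteed to change the dataframe; write updates back with df.at[index, 'column_name'] instead. On large dataframes, vectorized operations are usually much faster than looping with iterrows().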

In [50]:
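for index, row in df.iterrows():
    # index is the row label; row is a Series holding that record's values
    print(index, row['State'], row['Deaths'])
    break  # stop after the first record for this demo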
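A minimal sketch of the two-dataset case, assuming a hypothetical second table (abbrevs) that maps state names to postal abbreviations; the names sample, abbrevs, and lookup are illustrative. The loop writes each result back through .at, and the last line shows a vectorized Series.map equivalent.

```python
import pandas as pd

# Stand-in for df, using rows from the outputs above
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Connecticut"],
    "Deaths": [2313.0, 294.0, 1034.0],
})

# Hypothetical second dataset: postal abbreviations keyed by state name
abbrevs = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Connecticut"],
    "Abbrev": ["AL", "AK", "CT"],
})
lookup = dict(zip(abbrevs["State"], abbrevs["Abbrev"]))

# Assigning to row[...] would not reliably update sample,
# so write each value back through sample.at[index, column]
sample["Abbrev"] = None
for index, row in sample.iterrows():
    sample.at[index, "Abbrev"] = lookup.get(row["State"])

# Vectorized equivalent without a Python-level loop
sample["Abbrev_vec"] = sample["State"].map(lookup)
```

The map version is usually the better choice on large frames; for combining whole tables, DataFrame.merge is another common option.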