# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by data frames in the R programming language. You can think of a DataFrame as a collection of Series objects that share the same index. Let's use pandas to explore this topic!
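To make the "Series sharing an index" idea concrete, here is a minimal sketch that builds a small DataFrame from two Series. The values and the index labels a, b, c are made up for illustration and are not taken from the dataset used below.

```python
import pandas as pd

# Two hypothetical Series that share the same index labels
ages = pd.Series([20, 18, 27], index=["a", "b", "c"], name="driver_age")
races = pd.Series(["White", "White", "Black"], index=["a", "b", "c"], name="driver_race")

# Placing them side by side yields a DataFrame aligned on the shared index
toy = pd.concat([ages, races], axis=1)

print(toy)
print(type(toy["driver_age"]))  # each column is still a Series
```

Each Series becomes one column, and the Series names become the column names.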

In [2]:
import pandas as pd
import numpy as np

In [3]:
from numpy.random import randn
np.random.seed(101)

In [104]:
ri = pd.read_csv("RI_clean.csv")

In [126]:
ri.head()

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,False,Citation,False,16-30 Min,False,False,Zone X4
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,False,Citation,False,30+ Min,True,False,Zone X1


## Selection and Indexing

Let's look at the various ways to grab data from a DataFrame.

In [None]:
# Pass a column name
ri['driver_age_raw']

In [None]:
# Pass a list of column names
ri[['driver_age_raw','driver_age']]

**DataFrame Columns are just Series**

In [123]:
type(ri['driver_age_raw'])

pandas.core.series.Series

**Creating a new column:**

In [None]:
# police_department is numeric, so convert it to str before concatenating
ri['zone'] = ri['district'] + ' ' + ri['police_department'].astype(str)

In [None]:
ri.columns
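New columns are computed element by element from existing ones. The sketch below uses a made-up two-row frame and assumes, as the head() output suggests, that police_department is stored as integers. A string column and an integer column cannot be added directly, so the integer column is converted first. The department_hundreds column is a hypothetical example of plain arithmetic.

```python
import pandas as pd

toy = pd.DataFrame({
    "district": ["Zone K1", "Zone X4"],
    "police_department": [600, 500],
})

# Convert the integer column to strings before concatenating with text
toy["zone"] = toy["district"] + " / " + toy["police_department"].astype(str)

# Numeric columns can be combined directly with arithmetic
toy["department_hundreds"] = toy["police_department"] // 100

print(toy.columns.tolist())
print(toy)
```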

**Removing Columns**

In [None]:
ri.drop('zone', axis=1)

In [None]:
# drop() returns a new DataFrame; ri itself is unchanged unless inplace=True
ri.head()

In [None]:
ri.drop('zone', axis=1, inplace=True)

In [None]:
ri.head()
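The difference between the two drop calls can be checked on a small made-up frame. This sketch also shows reassignment with the columns keyword, a common alternative to inplace=True.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "zone": ["x", "y"]})

# drop() returns a new DataFrame and leaves the original untouched
dropped = toy.drop("zone", axis=1)
still_there = "zone" in toy.columns
print(still_there, "zone" in dropped.columns)

# Reassigning the result has the same effect as inplace=True
toy = toy.drop(columns=["zone"])
print(toy.columns.tolist())
```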

**Selecting Rows**

Select a row by its integer position with iloc (use loc to select by index label instead):

In [None]:
ri.iloc[2]
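Position and label only coincide while the index is an unbroken 0, 1, 2, ... range. The sketch below uses a made-up frame whose index skips the label 2, as can happen after rows are removed, to show how iloc (position) and loc (label) differ.

```python
import pandas as pd

toy = pd.DataFrame(
    {"violation": ["Speeding", "Speeding", "Equipment", "Other"]},
    index=[0, 1, 3, 4],  # label 2 is missing
)

by_position = toy.iloc[2]  # third row, whatever its label is
by_label = toy.loc[3]      # row whose index label is 3

print(by_position["violation"], by_label["violation"])
```

Here both lookups return the same row, while toy.loc[2] would raise a KeyError because no row carries that label.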

### Conditional Selection

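An important feature of pandas is conditional selection using bracket notation, very similar to NumPy. A comparison on a column produces a boolean Series, and passing that Series inside the brackets keeps only the rows where it is True: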

In [140]:
ri.columns

Index(['date_and_time', 'police_department', 'driver_gender', 'driver_age_raw',
       'driver_age', 'driver_race', 'violation', 'search_conducted',
       'contraband_found', 'stop_outcome', 'is_arrested', 'stop_duration',
       'out_of_state', 'drugs_related_stop', 'district'],
      dtype='object')

In [131]:
over_21 = ri['driver_age'] > 21

In [None]:
ri[over_21].head()

In [None]:
ri[over_21]['violation']

In [None]:
ri[over_21][['violation','search_conducted']]
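As a sketch on made-up data, the boolean mask can be inspected directly, and the row filter and column selection can be combined in a single .loc call instead of two chained brackets. The column names mirror the dataset; the values are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "driver_age": [20.0, 18.0, 27.0, 32.0],
    "violation": ["Speeding", "Speeding", "Equipment", "Other"],
    "search_conducted": [False, False, False, True],
})

# The comparison yields one True/False value per row
mask = toy["driver_age"] > 21
print(mask.tolist())

# Filter rows and pick columns in one step
subset = toy.loc[mask, ["violation", "search_conducted"]]
print(subset)
```

Using .loc with both a row mask and a column list avoids building an intermediate DataFrame.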

To combine two conditions, use | (or) and & (and), wrapping each condition in parentheses:

In [None]:
# Use & for element-wise AND; Python's `and` does not work on a Series
white_over_21 = ri[(ri['driver_age'] > 21) & (ri['driver_race'] == 'White')]
white_over_21.head()

In [None]:
# Use | for element-wise OR; Python's `or` does not work on a Series
citation_or_arrest = ri[(ri['stop_outcome'] == 'Citation') | (ri['is_arrested'] == True)]
citation_or_arrest
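The parentheses matter because & and | bind more tightly than comparisons such as > and == in Python, so an unparenthesized expression is grouped differently from what was intended. One way to sidestep this, sketched here on made-up data, is to name each condition first and then combine the names.

```python
import pandas as pd

toy = pd.DataFrame({
    "driver_age": [20.0, 27.0, 32.0, 19.0],
    "driver_race": ["White", "Black", "White", "White"],
    "is_arrested": [False, True, False, False],
})

# Name each condition so no parentheses are needed when combining
older = toy["driver_age"] > 21
white = toy["driver_race"] == "White"

both = toy[older & white]                 # rows meeting both conditions
either = toy[older | toy["is_arrested"]]  # rows meeting at least one

print(both)
print(either)
```

A column that already holds booleans, such as is_arrested in this toy frame, can be used as a mask directly.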