# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the data frames of the R programming language. We can think of a DataFrame as a collection of Series objects put together so that they share the same index. Let's use pandas to explore this topic!
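To make the "Series sharing one index" idea concrete, here is a minimal sketch. The toy values and the index labels `a`, `b`, `c` are made up for illustration and are not taken from the dataset used below.

```python
import pandas as pd

# Hypothetical toy data: two Series that share the same index labels.
ages = pd.Series([20.0, 18.0, 27.0], index=["a", "b", "c"], name="driver_age")
violations = pd.Series(
    ["Speeding", "Speeding", "Equipment"], index=["a", "b", "c"], name="violation"
)

# Each Series becomes one column; the shared index becomes the row labels.
toy = pd.DataFrame({"driver_age": ages, "violation": violations})

print(toy)
print(toy.index.equals(ages.index))  # the DataFrame reuses the Series index
```

Pulling a column back out with `toy["driver_age"]` returns a Series again, which is the pattern the rest of this notebook relies on.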

In [1]:
import pandas as pd
import numpy as np

<br>

## <span style="color:blue">1. Loading Data and Initial Exploration</span>

#### `.read_csv()`: Load `csv` data into `pandas`.

```python
ri = pd.read_csv("RI_clean.csv")
```

In [3]:
ri = pd.read_csv("RI_clean.csv")
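If you do not have `RI_clean.csv` at hand, the same call can be tried on an in-memory file. This is only a sketch: the three rows below are a hypothetical stand-in, and `io.StringIO` is used so the example needs no file on disk.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for a file on disk.
csv_text = """date_and_time,driver_gender,driver_age
2005-01-02 01:55:00,M,20
2005-01-02 20:30:00,M,18
2005-01-04 11:30:00,,
"""

# read_csv accepts a path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)
print(toy.isna().sum())  # empty fields are read as NaN
print(toy.dtypes)        # the missing age forces driver_age to float64
```

The last line hints at why a column of whole-number ages can show up as `float64`: a missing value cannot be stored in a plain integer column by default.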

## Data frame attributes 

#### `.shape`: How many rows? How many columns?

`(rows, columns)`

```python
ri.shape
```

In [4]:
ri.shape

(509681, 16)

#### `.dtypes`: List the `dtype` of each column.

```python
ri.dtypes
```

In [5]:
ri.dtypes

date_and_time          object
police_department      object
driver_gender          object
driver_age_raw        float64
driver_age            float64
driver_race            object
violation              object
search_conducted       object
search_type            object
contraband_found         bool
stop_outcome           object
is_arrested            object
stop_duration          object
out_of_state           object
drugs_related_stop       bool
district               object
dtype: object

In [6]:
ri.dtypes.index

Index(['date_and_time', 'police_department', 'driver_gender', 'driver_age_raw',
       'driver_age', 'driver_race', 'violation', 'search_conducted',
       'search_type', 'contraband_found', 'stop_outcome', 'is_arrested',
       'stop_duration', 'out_of_state', 'drugs_related_stop', 'district'],
      dtype='object')

#### `.columns`: List the column names.

In [7]:
ri.columns

Index(['date_and_time', 'police_department', 'driver_gender', 'driver_age_raw',
       'driver_age', 'driver_race', 'violation', 'search_conducted',
       'search_type', 'contraband_found', 'stop_outcome', 'is_arrested',
       'stop_duration', 'out_of_state', 'drugs_related_stop', 'district'],
      dtype='object')
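The two outputs above match because `.dtypes` is itself a Series whose index is the column names. A small sketch on a hypothetical two-column frame shows the relationship:

```python
import pandas as pd

# Hypothetical toy frame with one float column and one bool column.
toy = pd.DataFrame({"driver_age": [20.0, 18.0], "contraband_found": [False, True]})

col_types = toy.dtypes

print(type(col_types))                        # a pandas Series
print(col_types.index.equals(toy.columns))    # same labels as the columns
print(col_types["contraband_found"])          # look up one column's dtype by name
```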

In [13]:
# for element in ri.columns:
#     print(element)
for i, element in enumerate(ri.columns):
    print(i, element)

0 date_and_time
1 police_department
2 driver_gender
3 driver_age_raw
4 driver_age
5 driver_race
6 violation
7 search_conducted
8 search_type
9 contraband_found
10 stop_outcome
11 is_arrested
12 stop_duration
13 out_of_state
14 drugs_related_stop
15 district


## Data frame methods

#### `.head()`: Take a "peek" into any `pandas` `dataframe`.

```python
ri.head()
```

In [14]:
ri.head()

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4
2,2005-01-04 11:30:00,0,,,,,,False,,False,,,,,False,Zone X1
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4


#### `.tail()`: Take a "peek" at the last rows of any `pandas` `dataframe`.

In [16]:
ri.tail()

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
509676,,,,,,,,,,False,,,,,False,Zone NA
509677,,,,,,,,,,False,,,,,False,Zone NA
509678,,,,,,,,,,False,,,,,False,Zone NA
509679,,,,,,,,,,False,,,,,False,Zone NA
509680,,,,,,,,,,False,,,,,False,Zone NA
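Both `.head()` and `.tail()` return five rows by default and accept an optional row count. A quick sketch on a hypothetical ten-row frame (the column name `stop_id` is invented for this example):

```python
import pandas as pd

# Hypothetical frame with ten rows labelled 0..9.
toy = pd.DataFrame({"stop_id": range(10)})

first_three = toy.head(3)  # rows at positions 0, 1, 2
last_two = toy.tail(2)     # rows at positions 8, 9

print(first_three)
print(last_two)
```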


#### `.describe()`: Summary statistics of entire `pandas` `dataframe`.

By default, `.describe()` only summarizes `series` that hold numeric values.

```python
ri.describe()
```

In [17]:
ri.describe()

Unnamed: 0,driver_age_raw,driver_age
count,480632.0,478986.0
mean,1970.509997,33.982027
std,108.187159,12.702864
min,0.0,15.0
25%,1967.0,23.0
50%,1980.0,31.0
75%,1987.0,43.0
max,8801.0,99.0


`.describe(include=['object'])` summarizes the `object` (typically string) columns instead of the numeric ones.

In [19]:
ri.describe(include=['object'])

Unnamed: 0,date_and_time,police_department,driver_gender,driver_race,violation,search_conducted,search_type,stop_outcome,is_arrested,stop_duration,out_of_state,district
count,509671,509671,480584,480608,480608,509671,17762,480608,480608,480608,479800,509681
unique,433005,73,2,5,6,2,25,6,2,5,2,7
top,2013-01-22 22:10:00,500,M,White,Speeding,False,Incident to Arrest,Citation,False,0-15 Min,False,Zone X4
freq,99,113927,349446,344734,268744,491909,6998,428388,464005,386665,321276,135349
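The effect of `include` is easier to see on a small frame. The sketch below uses hypothetical values and forces the string column to `object` dtype explicitly, which is an assumption made so the example behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical toy frame: one numeric, one object, and one bool column.
toy = pd.DataFrame({
    "driver_age": [20.0, 18.0, None, 19.0],
    "driver_race": pd.Series(["White", "White", None, "Black"], dtype="object"),
    "contraband_found": [False, True, False, False],
})

numeric_summary = toy.describe()                   # numeric columns only
object_summary = toy.describe(include=["object"])  # object columns only
everything = toy.describe(include="all")           # every column, NaN where a statistic does not apply

print(numeric_summary.columns.tolist())
print(object_summary.columns.tolist())
print(everything.columns.tolist())
```

Note that `count` excludes missing values, which is why the counts in the outputs above differ from column to column.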


<hr>
<br>
<br>

## <span style="color:blue"> 2. 'Naive' Selection and Indexing </span>

Let's learn the various ways to grab data from a DataFrame.

### Selecting Columns

#### Grab a *single* column.

```python
# Pass a column name
ri['driver_age_raw'].head()
```

In [25]:
# Pass a column name
ri['driver_age_raw'].head()
# ri[4]  # raises KeyError: [] looks up column labels, not column positions

0    1985.0
1    1987.0
2       NaN
3    1986.0
4    1978.0
Name: driver_age_raw, dtype: float64

In [26]:
ri['contraband_found'].tail()

509676    False
509677    False
509678    False
509679    False
509680    False
Name: contraband_found, dtype: bool
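Square brackets look up columns by label, so a bare number only works if a column is actually labelled with that number. If you do need a column by position, a sketch like the following works; the toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with two string-labelled columns.
toy = pd.DataFrame({"driver_gender": ["M", "F"], "driver_age": [20.0, 34.0]})

by_label = toy["driver_age"]   # select by column name
by_position = toy.iloc[:, 1]   # select the column at position 1
label_at_1 = toy.columns[1]    # translate a position into a name

try:
    toy[1]                     # there is no column labelled 1
except KeyError:
    print("KeyError: no column is labelled 1")

print(label_at_1)
print(by_label.equals(by_position))
```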

#### Grab *multiple* columns.

```python
# Pass a list of column names
ri[['driver_age_raw','driver_age']].head()
```

In [30]:
# Pass a list of column names
ri[['driver_age_raw','driver_age']].head() # it will make a dataframe
ri[['driver_age_raw','driver_age']].describe() # call methods on dataframe objects that are returned 

Unnamed: 0,driver_age_raw,driver_age
count,480632.0,478986.0
mean,1970.509997,33.982027
std,108.187159,12.702864
min,0.0,15.0
25%,1967.0,23.0
50%,1980.0,31.0
75%,1987.0,43.0
max,8801.0,99.0


#### All `DataFrame` Columns Are `pandas` `series` Objects.

```python
type(ri['driver_age_raw'])
```

In [31]:
type(ri['driver_age_raw'])

pandas.core.series.Series

In [32]:
type(ri[['driver_age_raw','driver_age']])

pandas.core.frame.DataFrame

<br>

### Creating Columns:

#### Remember the pattern? `print` the `shape` of a `dataframe` before AND after you try to alter anything in pandas.

```python
ri.shape
```

In [33]:
ri.shape

(509681, 16)

#### Make the change. `pandas` `series` behave like `numpy` `arrays`.

Note: we can overwrite an existing column or create a new one. `pandas` aligns the assigned `series` on the index, so the indices of the new column must match the index of the `dataframe`; labels with no match are filled with `NaN`.
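A short sketch of what index alignment means on assignment. The frame and the `outcome` Series are hypothetical; the Series deliberately has no entry for label 1.

```python
import pandas as pd

# Hypothetical toy frame with index labels 0, 1, 2.
toy = pd.DataFrame({"violation": ["Speeding", "Equipment", "Other"]}, index=[0, 1, 2])

# A Series that only covers labels 0 and 2.
outcome = pd.Series(["Citation", "Arrest Driver"], index=[0, 2])

# Values are matched by index label, not by position.
toy["stop_outcome"] = outcome

print(toy)  # the row labelled 1 has no match, so it holds NaN
```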

```python
# assign this expression to a column such as ri['description'] to create a new column!
ri['violation'] + " " + ri['stop_outcome']
```

In [35]:
# assign this expression to a column such as ri['description'] to create a new column!
ri['violation'] + " " + ri['stop_outcome'] # evaluates to a new series; nothing is stored yet

0                      Speeding Citation
1                      Speeding Citation
2                                    NaN
3                     Equipment Citation
4                     Equipment Citation
5                         Other Citation
6                      Speeding Citation
7              Moving violation Citation
8                      Speeding Citation
9                      Speeding Citation
10          Registration/plates Citation
11                     Speeding Citation
12             Moving violation Citation
13                     Speeding Citation
14                   Other Arrest Driver
15                     Speeding Citation
16                     Speeding Citation
17                                   NaN
18                     Speeding Citation
19                     Speeding Citation
20                     Speeding Citation
21                    Equipment Citation
22                    Equipment Citation
23                     Speeding Citation
24              

#### Always verify that your changes took place!

```python
ri.shape
```

In [36]:
ri.shape

(509681, 16)

In [37]:
# assign to ri['description'] to create the new column!
ri['description'] = ri['violation'] + " " + ri['stop_outcome'] # the series is stored as a new column

In [39]:
ri.shape

(509681, 17)

In [40]:
ri

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district,description
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1,Speeding Citation
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4,Speeding Citation
2,2005-01-04 11:30:00,0,,,,,,False,,False,,,,,False,Zone X1,
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4,Equipment Citation
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4,Equipment Citation
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1,Other Citation
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3,Speeding Citation
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1,Moving violation Citation
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3,Speeding Citation
9,2005-01-24 20:32:00,600,M,1987.0,18.0,White,Speeding,True,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1,Speeding Citation


Overwrite the `description` `series` in the `dataframe`:

In [54]:
# assigning to an existing column name overwrites that column
ri['description'] = ri['violation'] + "--" + ri['stop_outcome']
# column names are case-sensitive, so 'Description' is a separate, new column
ri['Description'] = ri['violation'] + "********" + ri['stop_outcome']

In [43]:
ri.head()

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district,description
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1,Speeding--Citation
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4,Speeding--Citation
2,2005-01-04 11:30:00,0,,,,,,False,,False,,,,,False,Zone X1,
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4,Equipment--Citation
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4,Equipment--Citation


<br>

### Removing Columns

#### Remember the pattern? `print` the `shape` of a `dataframe` before AND after you try to alter anything in pandas.

```python
ri.shape
```

In [45]:
ri.shape

(509681, 17)

#### Make the change.

```python
ri.drop('description', axis=1)
```

`.drop()` is used to drop rows or columns.

`axis=0` means rows.

`axis=1` means columns.

#### This doesn't happen in place; the result must be reassigned.

In [46]:
ri.drop('description',axis=1)

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4
2,2005-01-04 11:30:00,0,,,,,,False,,False,,,,,False,Zone X1
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
9,2005-01-24 20:32:00,600,M,1987.0,18.0,White,Speeding,True,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1


Drop more than one column by passing a list of names:

In [49]:
ri.drop(['description','Description'],axis=1)

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4
2,2005-01-04 11:30:00,0,,,,,,False,,False,,,,,False,Zone X1
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
9,2005-01-24 20:32:00,600,M,1987.0,18.0,White,Speeding,True,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1


In [57]:
ri = ri.drop(['description','Description'],axis=1)

#### Always verify that your changes took place!

```python
# Not 'inplace' unless dataframe is overwritten!
ri.shape
```
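The "not in place" behaviour of `.drop()` can be checked on a small frame. The values below are hypothetical; the `columns=` keyword shown is an alternative spelling of `axis=1`.

```python
import pandas as pd

# Hypothetical one-row frame with an extra column to remove.
toy = pd.DataFrame({
    "violation": ["Speeding"],
    "stop_outcome": ["Citation"],
    "description": ["Speeding Citation"],
})

dropped = toy.drop("description", axis=1)  # returns a new frame
print(toy.shape, dropped.shape)            # the original is unchanged

same = toy.drop(columns=["description"])   # equivalent to axis=1

# Reassign to keep the change.
toy = toy.drop(columns=["description"])
print(toy.shape)
```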

<br>
<br>

### Selecting Rows

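Select rows by integer position (0-based, with the end of a slice excluded) using `.iloc[]`. In this `dataframe` the positions happen to coincide with the default index labels.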

#### `.iloc[]`

```python
ri.iloc[3:10]
```

In [58]:
ri.iloc[3:10]

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
9,2005-01-24 20:32:00,600,M,1987.0,18.0,White,Speeding,True,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1
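Position and index label only coincide while the index is the default 0, 1, 2, ... sequence. The sketch below contrasts `.iloc[]` with the label-based `.loc[]`, which this notebook has not introduced; it is shown here only for comparison, on hypothetical data.

```python
import pandas as pd

# Hypothetical toy frame with the default index 0..4.
toy = pd.DataFrame({"driver_age": [20.0, 18.0, 27.0, 32.0, 40.0]})

# After filtering, the surviving rows keep their original labels 2, 3, 4.
older = toy[toy["driver_age"] > 21]

by_position = older.iloc[0]  # first row of the filtered frame
by_label = older.loc[2]      # row whose index label is 2

slice_by_position = toy.iloc[1:3]  # positions 1 and 2 (end excluded)
slice_by_label = toy.loc[1:3]      # labels 1, 2 and 3 (end included)

print(by_position.name)
print(len(slice_by_position), len(slice_by_label))
```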


#### All `DataFrame` Rows Are *Also* `pandas` `series` Objects.

```python
type(ri.iloc[3])
```

In [59]:
type(ri.iloc[3])

pandas.core.series.Series

#### Prove this fact to yourself!

```python
ri.iloc[3]
```

In [64]:
print(ri.head(0))
print(ri.head())

Empty DataFrame
Columns: [date_and_time, police_department, driver_gender, driver_age_raw, driver_age, driver_race, violation, search_conducted, search_type, contraband_found, stop_outcome, is_arrested, stop_duration, out_of_state, drugs_related_stop, district]
Index: []
         date_and_time police_department driver_gender  driver_age_raw  \
0  2005-01-02 01:55:00               600             M          1985.0   
1  2005-01-02 20:30:00               500             M          1987.0   
2  2005-01-04 11:30:00                 0           NaN             NaN   
3  2005-01-04 12:55:00               500             M          1986.0   
4  2005-01-06 01:30:00               500             M          1978.0   

   driver_age driver_race  violation search_conducted search_type  \
0        20.0       White   Speeding            False         NaN   
1        18.0       White   Speeding            False         NaN   
2         NaN         NaN        NaN            False         NaN   
3      

In [66]:
ri['stop_outcome'].head()

0    Citation
1    Citation
2         NaN
3    Citation
4    Citation
Name: stop_outcome, dtype: object

In [68]:
ri.iloc[3] # a single row comes back as a pandas series whose index is the column names (think of a transposed row); the row's label, 3, becomes the series name

date_and_time         2005-01-04 12:55:00
police_department                     500
driver_gender                           M
driver_age_raw                       1986
driver_age                             19
driver_race                         White
violation                       Equipment
search_conducted                    False
search_type                           NaN
contraband_found                    False
stop_outcome                     Citation
is_arrested                         False
stop_duration                    0-15 Min
out_of_state                        False
drugs_related_stop                  False
district                          Zone X4
Name: 3, dtype: object
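The same behaviour can be verified on a small frame: the index of a row Series is the list of column names, and its `name` is the row's label. The values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with mixed column types.
toy = pd.DataFrame({
    "driver_gender": ["M", "F"],
    "driver_age": [19.0, 34.0],
    "violation": ["Equipment", "Speeding"],
})

row = toy.iloc[1]

print(row.index.tolist())  # the column names
print(row.name)            # the row's index label
print(row["violation"])    # look up a value by column name
```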

In [73]:
ri.head().T

Unnamed: 0,0,1,2,3,4
date_and_time,2005-01-02 01:55:00,2005-01-02 20:30:00,2005-01-04 11:30:00,2005-01-04 12:55:00,2005-01-06 01:30:00
police_department,600,500,0,500,500
driver_gender,M,M,,M,M
driver_age_raw,1985,1987,,1986,1978
driver_age,20,18,,19,27
driver_race,White,White,,White,Black
violation,Speeding,Speeding,,Equipment,Equipment
search_conducted,False,False,False,False,False
search_type,,,,,
contraband_found,False,False,False,False,False


<hr>
<br>
<br>

## <span style="color:blue"> 3. Conditional Selection </span>

An important feature of `pandas` is conditional selection using bracket notation, very similar to `numpy`:
```python
dataframe[some_condition] 
```

#### The condition needs to be a boolean `series` with one `True`/`False` value per row.

In [79]:
ri['driver_gender'] == 'M' # returns pandas series # typically do conditions with columns

0          True
1          True
2         False
3          True
4          True
5          True
6          True
7          True
8          True
9          True
10         True
11         True
12         True
13         True
14         True
15        False
16         True
17        False
18        False
19         True
20         True
21         True
22         True
23         True
24         True
25        False
26         True
27         True
28         True
29         True
          ...  
509651     True
509652    False
509653     True
509654     True
509655    False
509656    False
509657    False
509658     True
509659    False
509660    False
509661    False
509662     True
509663     True
509664     True
509665    False
509666     True
509667     True
509668     True
509669     True
509670     True
509671    False
509672    False
509673    False
509674    False
509675    False
509676    False
509677    False
509678    False
509679    False
509680    False
Name: driver_gender, Len

In [80]:
ri['driver_gender'].unique()

array(['M', nan, 'F'], dtype=object)

In [81]:
is_male = ri['driver_gender'] == 'M'
ri[is_male]

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
9,2005-01-24 20:32:00,600,M,1987.0,18.0,White,Speeding,True,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1
10,2005-02-09 03:05:00,500,M,1976.0,29.0,White,Registration/plates,True,"Probable Cause,Protective Frisk",False,Citation,False,0-15 Min,False,False,Zone X4


Rows where the condition is `False` are dropped, so their index labels are missing from the result; check for index 2 and 15, for example.

In [None]:
not_male = ri[~is_male] # ~ inverts the boolean mask; this also keeps rows where driver_gender is NaN, so it is not "female only"
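Because the column contains `nan` as well as `'M'` and `'F'`, inverting a mask with `~` is not the same as testing for the other value. A minimal sketch on hypothetical data (the column is forced to `object` dtype, as in the dataset above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value.
toy = pd.DataFrame(
    {"driver_gender": pd.Series(["M", np.nan, "F", "M"], dtype="object")}
)

is_male = toy["driver_gender"] == "M"    # NaN == "M" evaluates to False
not_male = toy[~is_male]                 # keeps the NaN row and the "F" row
is_female = toy["driver_gender"] == "F"  # only the "F" row

print(is_male.tolist())
print(len(not_male), int(is_female.sum()))
```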

#### Quick reminder: this is what our `dataframe` looks like and its `shape`

```python
print("Rows:", ri.shape[0])
print("Columns:", ri.shape[1])
ri.head()
```

In [82]:
print("Rows:", ri.shape[0])
print("Columns:", ri.shape[1])
ri.head()

Rows: 509681
Columns: 16


Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,600,M,1985.0,20.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,500,M,1987.0,18.0,White,Speeding,False,,False,Citation,False,16-30 Min,False,False,Zone X4
2,2005-01-04 11:30:00,0,,,,,,False,,False,,,,,False,Zone X1
3,2005-01-04 12:55:00,500,M,1986.0,19.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4


#### Construct a `boolean` `series` to index the `dataframe`.

```python
over_21 = ri['driver_age'] > 21
print(over_21)
```

In [86]:
over_21 = ri['driver_age'] > 21 #boolean check for every value
print(over_21)
ri[over_21]

0         False
1         False
2         False
3         False
4          True
5          True
6          True
7          True
8          True
9         False
10         True
11         True
12        False
13         True
14        False
15         True
16         True
17        False
18        False
19         True
20        False
21         True
22         True
23        False
24        False
25         True
26         True
27         True
28         True
29         True
          ...  
509651    False
509652     True
509653     True
509654     True
509655    False
509656    False
509657    False
509658     True
509659     True
509660     True
509661     True
509662    False
509663     True
509664    False
509665     True
509666     True
509667     True
509668     True
509669     True
509670     True
509671    False
509672    False
509673    False
509674    False
509675    False
509676    False
509677    False
509678    False
509679    False
509680    False
Name: driver_age, Length

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
10,2005-02-09 03:05:00,500,M,1976.0,29.0,White,Registration/plates,True,"Probable Cause,Protective Frisk",False,Citation,False,0-15 Min,False,False,Zone X4
11,2005-02-11 01:20:00,300,M,1978.0,27.0,Black,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
13,2005-02-17 04:15:00,500,M,1952.0,53.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone X4
15,2005-02-24 01:20:00,200,F,1983.0,22.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X3
16,2005-02-24 05:50:00,200,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone X3


#### Index original `dataframe` with `boolean` `series`.

```python
ri[over_21].head()
```

In [88]:
ri[over_21].head() # note indices 0-3 are missing 

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
4,2005-01-06 01:30:00,500,M,1978.0,27.0,Black,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone X4
5,2005-01-12 08:05:00,0,M,1973.0,32.0,Black,Other,False,,False,Citation,False,30+ Min,True,False,Zone X1
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-18 17:13:00,0,M,1967.0,38.0,Hispanic,Moving violation,False,,False,Citation,False,16-30 Min,True,False,Zone X1
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3


#### Grab a column from this new `dataframe`

```python
ri[over_21]['violation']
```

In [89]:
ri[over_21]['violation']

4                   Equipment
5                       Other
6                    Speeding
7            Moving violation
8                    Speeding
10        Registration/plates
11                   Speeding
13                   Speeding
15                   Speeding
16                   Speeding
19                   Speeding
21                  Equipment
22                  Equipment
25                   Speeding
26                   Speeding
27                   Speeding
28                   Speeding
29                   Speeding
30                   Speeding
32           Moving violation
33        Registration/plates
36           Moving violation
37                   Speeding
38                   Speeding
39                   Speeding
40                   Speeding
41                   Speeding
42                   Speeding
43                   Speeding
44                   Speeding
                 ...         
509631       Moving violation
509632               Speeding
509633    
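Filtering rows and then picking a column takes two indexing steps. For reading data that is fine; a one-step alternative is `.loc[rows, column]`, which is generally the safer form if you later want to assign to the selection. This is a sketch on hypothetical data, not something the original notebook uses.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({
    "driver_age": [20.0, 27.0, 32.0],
    "violation": ["Speeding", "Equipment", "Other"],
})

over_21 = toy["driver_age"] > 21

chained = toy[over_21]["violation"]          # two steps: rows, then column
single_step = toy.loc[over_21, "violation"]  # one step: rows and column together

print(single_step.tolist())
print(chained.equals(single_step))
```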

<br>
<br>

### Multiple Conditionals

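For two (or more) conditions, use the `|` and `&` operators. Make sure to wrap each condition in `()` parentheses, because `|` and `&` bind more tightly than comparisons such as `==` and `>`:

##### Note: `&` in `pandas` is the element-wise equivalent of python `and`; plain `and` raises a `ValueError` on a `series`.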

```python
# Save to variable white_over_21 when satisfied.
ri[( ri['driver_race'] == "White" ) & ( ri["driver_age"] > 21 )]
```
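To see why `&` is needed rather than `and`, here is a small sketch on hypothetical data. The `try`/`except` is only there to show the error without stopping the example.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({
    "driver_race": ["White", "Black", "White"],
    "driver_age": [40.0, 27.0, 19.0],
})

is_white = toy["driver_race"] == "White"
over_21 = toy["driver_age"] > 21

# & combines the two masks row by row.
both = toy[is_white & over_21]
print(both)

# Plain `and` asks for a single True/False for the whole Series and fails.
try:
    toy[is_white and over_21]
except ValueError as err:
    print("and fails with", type(err).__name__)
```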

In [92]:
# do incrementally
is_white = ri['driver_race'] == "White"
ri[(is_white) & (over_21)]

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
6,2005-01-18 08:15:00,300,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
8,2005-01-23 23:15:00,300,M,1972.0,33.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
10,2005-02-09 03:05:00,500,M,1976.0,29.0,White,Registration/plates,True,"Probable Cause,Protective Frisk",False,Citation,False,0-15 Min,False,False,Zone X4
13,2005-02-17 04:15:00,500,M,1952.0,53.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone X4
15,2005-02-24 01:20:00,200,F,1983.0,22.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X3
16,2005-02-24 05:50:00,200,M,1965.0,40.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone X3
19,2005-03-16 12:05:00,0,M,1980.0,25.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone X1
21,2005-03-19 14:15:00,900,M,1981.0,24.0,White,Equipment,False,,False,Citation,False,0-15 Min,False,False,Zone K2
25,2005-03-29 23:20:00,300,F,1971.0,34.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
26,2005-03-30 22:45:00,300,M,1969.0,36.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3


```python
# Save to variable female_over_21 when satisfied.
ri[( ri['driver_gender'] == "F" ) & ( ri["driver_age"] > 21 )]
```

In [93]:
# Save to variable female_over_21 when satisfied.
ri[( ri['driver_gender'] == "F" ) & ( ri["driver_age"] > 21 )]

Unnamed: 0,date_and_time,police_department,driver_gender,driver_age_raw,driver_age,driver_race,violation,search_conducted,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
15,2005-02-24 01:20:00,200,F,1983.0,22.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X3
25,2005-03-29 23:20:00,300,F,1971.0,34.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3
36,2005-06-18 16:30:00,500,F,1964.0,41.0,White,Moving violation,False,,False,Arrest Driver,True,30+ Min,False,False,Zone X4
37,2005-07-06 11:22:00,0,F,1973.0,32.0,White,Speeding,False,,False,Citation,False,0-15 Min,False,False,Zone X1
43,2005-07-13 16:45:00,500,F,1978.0,27.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X4
44,2005-07-13 19:00:00,500,F,1966.0,39.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X4
45,2005-07-14 09:30:00,500,F,1969.0,36.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X4
47,2005-07-14 11:20:00,500,F,1981.0,24.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X4
48,2005-07-14 14:55:00,500,F,1980.0,25.0,Black,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone X4
57,2005-07-18 20:30:00,300,F,1978.0,27.0,White,Speeding,False,,False,Citation,False,0-15 Min,True,False,Zone K3


#### Let's take a look at these new `dataframes`

```python
white_over_21.describe()
```

In [94]:
white_over_21.describe()

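NameError: name 'white_over_21' is not defined

This `NameError` is raised because the filtered result above was only displayed, never saved. Assign it first, for example `white_over_21 = ri[is_white & over_21]`, and then call `.describe()`.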

```python
female_over_21.describe()
```

<br>
<br>

##### Note: `|` in `pandas` is the element-wise equivalent of python `or`

```python
# Save to variable speeding_or_arrest when satisfied.
ri[(ri['violation'] == 'Speeding') | (ri['is_arrested'] == True)]
```

In [96]:
# Save to variable speeding_or_arrest when satisfied.
speeding_or_arrest = ri[(ri['violation'] == 'Speeding') | (ri['is_arrested'] == True)]
print(len(speeding_or_arrest))
print(len(ri[(ri['violation'] == 'Speeding') & (ri['is_arrested'] == True)]))


281693
3654
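Printing both lengths is a useful sanity check: the "or" count should equal the two individual counts added together minus the "and" count. The sketch below verifies that relationship on a hypothetical four-row frame.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({
    "violation": ["Speeding", "Speeding", "Other", "Equipment"],
    "is_arrested": [False, True, True, False],
})

is_speeding = toy["violation"] == "Speeding"
is_arrested = toy["is_arrested"] == True

either = toy[is_speeding | is_arrested]  # at least one condition holds
both = toy[is_speeding & is_arrested]    # both conditions hold

# |A or B| = |A| + |B| - |A and B|
expected = int(is_speeding.sum()) + int(is_arrested.sum()) - len(both)
print(len(either), len(both), expected)
```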


#### Let's take a look at the new `dataframe`

```python
speeding_or_arrest.describe()
```

In [99]:
print(speeding_or_arrest.describe())
print()
print(speeding_or_arrest.describe(include=['object']))


       driver_age_raw     driver_age
count   281693.000000  281125.000000
mean      1973.363921      33.436151
std         77.380923      12.689631
min          0.000000      15.000000
25%       1968.000000      23.000000
50%       1980.000000      30.000000
75%       1986.000000      42.000000
max       8801.000000      97.000000

              date_and_time  police_department driver_gender driver_race  \
count                281693             281693        281685      281693   
unique               254706                 57             2           5   
top     2015-01-10 09:11:00                300             M       White   
freq                     44              53556        192689      220076   

       violation search_conducted         search_type stop_outcome  \
count     281693           281693               10082       281693   
unique         6                2                  25            6   
top     Speeding            False  Incident to Arrest     Citation   
freq 