# Reviewing the Pandas Janitor Closet
### _Cleaning Tools_

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('data/tidied_threaded_data_pull.csv') # get the file 
df.shape # get its dimensions and inspect in data viewer

(2718, 8)

# Accessor review
### `iloc`
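Select the rows at positions 1 and 3 and the columns at positions 2 and 5. Positions are zero-based, so these are the second and fourth rows and the third and sixth columns.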
```python
df.iloc[[1,3], [2,5]]
```
### `loc`
Select the rows labelled 1 and 3 and the columns named `Title` and `URL`
```python
df.loc[[1,3], ['Title','URL']]
```
**Note**: DataFrame indexes do not have to be sequential numbers. `loc` selects by index label, while `iloc` selects by integer position.
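
The difference only becomes visible when the index labels are not 0, 1, 2, and so on. Here is a minimal sketch on a made-up frame (`toy` and its values are hypothetical, not taken from the CSV above) where labels and positions disagree:

```python
import pandas as pd

# Hypothetical toy frame; the labels 10, 20, 30 are deliberately not 0, 1, 2.
toy = pd.DataFrame(
    {"Title": ["a", "b", "c"], "URL": ["u1", "u2", "u3"]},
    index=[10, 20, 30],
)

# loc: rows whose index labels are 10 and 30, columns by name.
by_label = toy.loc[[10, 30], ["Title", "URL"]]

# iloc: first and third rows, first and second columns, all by position.
by_position = toy.iloc[[0, 2], [0, 1]]

print(by_label.equals(by_position))
```

Both selections return the same two rows here, but `toy.loc[[0, 2]]` would raise a `KeyError` because no row carries the label 0 or 2.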

In [5]:
df.iloc[[1,3], [2,5]] # "Get Those 2 Rows At those Two Columns"

Unnamed: 0,Title,URL
1,TAuch_FracSand-Mine-USSilica-Ottawa_IL_LightHa...,https://www.flickr.com/photos/fractracker/4972...
3,TAuch_FracSand-Mine-Unimin-NorthUtica_IL_Light...,https://www.flickr.com/photos/fractracker/4972...


In [6]:
df.loc[[1,3], ['Title','URL']] # same same but different 😆

Unnamed: 0,Title,URL
1,TAuch_FracSand-Mine-USSilica-Ottawa_IL_LightHa...,https://www.flickr.com/photos/fractracker/4972...
3,TAuch_FracSand-Mine-Unimin-NorthUtica_IL_Light...,https://www.flickr.com/photos/fractracker/4972...


In [7]:
#### `loc` slices by label and includes the end label
df.loc[1:3, ['Title','URL']]

Unnamed: 0,Title,URL
1,TAuch_FracSand-Mine-USSilica-Ottawa_IL_LightHa...,https://www.flickr.com/photos/fractracker/4972...
2,TAuch_FracSand-Mine-USSilica-Ottawa_IL_LightHa...,https://www.flickr.com/photos/fractracker/4972...
3,TAuch_FracSand-Mine-Unimin-NorthUtica_IL_Light...,https://www.flickr.com/photos/fractracker/4972...


In [8]:
#### `iloc` slices by position and excludes the end position
df.iloc[1:4, [2,5]]

Unnamed: 0,Title,URL
1,TAuch_FracSand-Mine-USSilica-Ottawa_IL_LightHa...,https://www.flickr.com/photos/fractracker/4972...
2,TAuch_FracSand-Mine-USSilica-Ottawa_IL_LightHa...,https://www.flickr.com/photos/fractracker/4972...
3,TAuch_FracSand-Mine-Unimin-NorthUtica_IL_Light...,https://www.flickr.com/photos/fractracker/4972...
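
The two slices above return the same three rows only because the end points differ (`1:3` versus `1:4`). A small sketch on a hypothetical five-row frame (`toy` is an illustrative name) shows what happens when the same slice is passed to both accessors:

```python
import pandas as pd

# Hypothetical frame with the default index 0..4.
toy = pd.DataFrame({"Title": list("abcde")})

label_slice = toy.loc[1:3]      # labels 1, 2, 3: the end label is included
position_slice = toy.iloc[1:3]  # positions 1, 2: the end position is excluded

print(len(label_slice), len(position_slice))
```

With the default index, `toy.iloc[1:4]` is the positional equivalent of `toy.loc[1:3]`.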


# Series Boolean Comparison

In [9]:
all(pd.Series([True, True])) # if everything in all() is True then return True. 

True

In [10]:
all(pd.Series([False, False])) # Note: all() returns False as soon as any element is False.

False

In [18]:
pd.Series([False, False]).all()

np.False_
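
The cells above only cover the all-True and all-False cases. The mixed case is where `all` and `any` diverge, so here is a short sketch using a hypothetical `mixed` series. As the output above suggests, the `Series` methods return a NumPy boolean rather than a plain Python `bool`, so the sketch wraps them in `bool()` before printing:

```python
import pandas as pd

# Hypothetical series with one True and one False value.
mixed = pd.Series([True, False])

builtin_all = all(mixed)         # Python built-in, iterates over the values
method_all = bool(mixed.all())   # pandas method, converted to a plain bool
method_any = bool(mixed.any())   # True if at least one value is True

print(builtin_all, method_all, method_any)
```

For large series the `.all()` and `.any()` methods are usually the better choice, since they run vectorized instead of iterating element by element in Python.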

# Releasing Heavy Dataframe from memory

In [16]:
# Demonstrate one way to delete a dataframe, which is useful once a large dataframe is no longer needed.
# In this case, `test` is very small though.
test = df[df.loc[:,'Title'].map(lambda x: x.split('_')[0] == 'TAuch')]
test.shape

(2613, 8)

In [17]:
del test
import gc # garbage collection
gc.collect() # Runs a full collection and returns the number of unreachable objects it found

1103
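
Note that `del` removes a name, not the object itself: the memory can only be reclaimed once no other reference to the dataframe remains. The sketch below illustrates this on a hypothetical stand-in frame (`big`, `alias`, and the row count are made up for the example) and uses `memory_usage(deep=True)` to check the size beforehand:

```python
import gc
import pandas as pd

# Hypothetical stand-in for a large dataframe.
big = pd.DataFrame({"Title": ["TAuch_example"] * 100_000})

# Total bytes used; deep=True also counts the Python string objects.
n_bytes = big.memory_usage(deep=True).sum()
print(n_bytes)

alias = big                    # a second reference to the same object
del big                        # removes the name only; `alias` keeps the data alive
shape_after_del = alias.shape  # still accessible through the other name
print(shape_after_del)

del alias                      # last reference gone, the frame can now be reclaimed
collected = gc.collect()       # additionally sweeps up any reference cycles
```

In a notebook, output history such as `Out[...]` and `_` can also hold references to a displayed dataframe, which is one reason memory may not drop right after `del`.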