# Pandas Basics Part 3 — Workbook

In this workbook, we're going to explore the basics of the Python library Pandas.

## Import Pandas

To use the Pandas library, we first need to `import` it.

In [5]:
import pandas as pd

## Change Display Settings

By default, Pandas will display 60 rows and 20 columns. I often change [Pandas' default display settings](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html) to show more rows or columns.

In [6]:
pd.options.display.max_rows = 200
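
If we only want a different display setting for one part of a notebook, one option is `pd.option_context()`, which applies a setting temporarily. The sketch below is illustrative; the values `200` and `10` are arbitrary choices.

```python
import pandas as pd

# Set the option globally, as in the cell above
pd.set_option('display.max_rows', 200)

# Temporarily override it; the previous value returns when the block ends
with pd.option_context('display.max_rows', 10):
    rows_inside = pd.get_option('display.max_rows')

rows_after = pd.get_option('display.max_rows')

print(rows_inside)  # 10
print(rows_after)   # 200
```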

## Get Data

To read in a CSV file, we will use the function `pd.read_csv()` and pass it the path to our file.

In [7]:
pd.read_csv('Bellevue_Almshouse_Dataset.csv')

Unnamed: 0,date_in,first_name,last_name,full_name,age,gender,disease,profession,children,sent_to,sender1,sender2
0,1847-04-17,Mary,Gallagher,Mary Gallagher,28.0,f,recent emigrant,married,Child Alana 10 days,Hospital,superintendent,hd. gibbens
1,1847-04-08,John,Sanin (?),John Sanin (?),19.0,m,recent emigrant,laborer,Catherine 2 mo,,george w. anderson,edward witherell
2,1847-04-17,Anthony,Clark,Anthony Clark,60.0,m,recent emigrant,laborer,Charles Riley afed 10 days,Hospital,george w. anderson,edward witherell
3,1847-04-08,Lawrence,Feeney,Lawrence Feeney,32.0,m,recent emigrant,laborer,Child,,george w. anderson,james donnelly
4,1847-04-13,Henry,Joyce,Henry Joyce,21.0,m,recent emigrant,,Child 1 mo,,george w. anderson,edward witherell
...,...,...,...,...,...,...,...,...,...,...,...,...
9593,1846-05-23,Joseph,Aton,Joseph Aton,69.0,m,,shoemaker,,,[blank],
9594,1847-06-17,Mary,Smith,Mary Smith,47.0,f,,,,Hospital Ward 38,[blank],
9595,1847-06-22,Francis,Riley,Francis Riley,29.0,m,lame,superintendent,,,[blank],
9596,1847-07-02,Martin,Dunn,Martin Dunn,4.0,m,,,,,[blank],


In [8]:
type(pd.read_csv('Bellevue_Almshouse_Dataset.csv'))

pandas.core.frame.DataFrame

This creates a Pandas [DataFrame object](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe), one of the two main data structures in Pandas. A DataFrame looks and acts a lot like a spreadsheet, but it has special powers and functions that we will discuss below and in the next few lessons.

| Pandas objects | Explanation                       |
|----------------|-----------------------------------|
| `DataFrame`    | Like a spreadsheet, 2-dimensional |
| `Series`       | Like a column, 1-dimensional      |
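
To make the difference between the two objects concrete, here is a minimal sketch using a small made-up table. The names `toy_df` and `age_series` and all of the values are hypothetical and do not come from the Bellevue dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Bellevue dataset
toy_df = pd.DataFrame({
    'first_name': ['Ann', 'Ben', 'Cara'],
    'age': [30, 41, 25],
})

# Selecting one column with single brackets returns a Series
age_series = toy_df['age']

print(type(toy_df).__name__)      # DataFrame
print(type(age_series).__name__)  # Series
print(toy_df.shape)               # (3, 2): rows and columns
print(age_series.shape)           # (3,): rows only
```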

We assign the DataFrame to a variable called `bellevue_df`. A common convention is to name DataFrame variables `df`, but we want to be a bit more specific.

In [40]:
bellevue_df = pd.read_csv('Bellevue_Almshouse_Dataset.csv')

## Is NA?

In [41]:
bellevue_df['profession'].isna()

0       False
1       False
2       False
3       False
4        True
        ...  
9593    False
9594     True
9595    False
9596     True
9597     True
Name: profession, Length: 9598, dtype: bool

In [42]:
bellevue_df['profession'].isna().value_counts()

False    8579
True     1019
Name: profession, dtype: int64

In [43]:
bellevue_df['profession'].notna()

0        True
1        True
2        True
3        True
4       False
        ...  
9593     True
9594    False
9595     True
9596    False
9597    False
Name: profession, Length: 9598, dtype: bool
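
Because `.isna()` returns `True`/`False` values, we can also add them up or average them to summarize how much data is missing. The sketch below uses a hypothetical four-item Series (`toy`) rather than the real `profession` column.

```python
import pandas as pd

# Hypothetical toy Series with two missing values
toy = pd.Series(['laborer', None, 'teacher', None])

missing = toy.isna()

# True counts as 1 and False as 0, so sum() counts the missing values
n_missing = int(missing.sum())

# mean() gives the share of values that are missing
share_missing = float(missing.mean())

# notna() is the exact opposite of isna()
opposite = bool((toy.notna() == ~missing).all())

print(n_missing)      # 2
print(share_missing)  # 0.5
print(opposite)       # True
```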

## Fill NA

In [44]:
bellevue_df['profession'].fillna('No profession')

0              married
1              laborer
2              laborer
3              laborer
4        No profession
             ...      
9593         shoemaker
9594     No profession
9595    superintendent
9596     No profession
9597     No profession
Name: profession, Length: 9598, dtype: object

In [46]:
bellevue_df['profession'] = bellevue_df['profession'].fillna('No profession')
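
The two cells above differ in one important way: `.fillna()` returns a new Series and leaves the original untouched, so the change only sticks once we assign the result back to the column. A minimal sketch with a hypothetical three-row DataFrame (`toy_df`):

```python
import pandas as pd

# Hypothetical toy data with one missing profession
toy_df = pd.DataFrame({'profession': ['laborer', None, 'teacher']})

# fillna() returns a new Series; toy_df itself is not changed yet
filled = toy_df['profession'].fillna('No profession')
missing_before = int(toy_df['profession'].isna().sum())

# Assigning the result back to the column makes the change permanent
toy_df['profession'] = filled
missing_after = int(toy_df['profession'].isna().sum())

print(missing_before)  # 1
print(missing_after)   # 0
```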

## String Methods

In [47]:
bellevue_df['profession'].str.title()

0              Married
1              Laborer
2              Laborer
3              Laborer
4        No Profession
             ...      
9593         Shoemaker
9594     No Profession
9595    Superintendent
9596     No Profession
9597     No Profession
Name: profession, Length: 9598, dtype: object

In [48]:
bellevue_df['profession'].str.contains('magic')

0       False
1       False
2       False
3       False
4       False
        ...  
9593    False
9594    False
9595    False
9596    False
9597    False
Name: profession, Length: 9598, dtype: bool

In [62]:
profession_filter = bellevue_df['profession'].str.contains('teach')
bellevue_df[profession_filter]

Unnamed: 0,date_in,first_name,last_name,full_name,age,gender,disease,profession,children,sent_to,sender1,sender2,senders
2196,1847-03-12,Michael,Rush,Michael Rush,40.0,m,recent emigrant,teacher,,Bellevue Garret,george w. anderson,peter c. johnston,george w. anderson and peter c. johnston
2693,1846-03-11,Thomas,Brady,Thomas Brady,45.0,m,,teacher,,,commissioners of emigration,agent,commissioners of emigration and agent
3774,1846-07-06,Henry,Dunlap,Henry Dunlap,66.0,m,,teacher,,,george w. anderson,,
4284,1846-09-03,John B.,Murray,John B. Murray,45.0,m,,teacher,,,george w. anderson,,
4287,1846-09-03,Alexander,Alcock,Alexander Alcock,46.0,m,,teacher,,,george w. anderson,,
4612,1846-10-15,John,Dillon,John Dillon,32.0,m,,teacher,,,george w. anderson,,
5225,1847-03-01,George F.,Robins,George F. Robins,57.0,m,destitution,teacher,,Bellevue Garret,george w. anderson,edward witherell,george w. anderson and edward witherell
5615,1847-05-08,William,Smith,William Smith,27.0,m,destitution,school teacher,,Randall's Island,george w. anderson,edward witherell,george w. anderson and edward witherell
6254,1847-08-05,Patrick,McGowen,Patrick McGowen,24.0,m,sickness,teacher,,Hospital,william w. lyons,,
8305,1847-05-27,William,Smith,William Smith,29.0,m,destitution,teacher,,Blackwell's Island,moses g. leonard,edward witherell,moses g. leonard and edward witherell
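
The filter above works because we already replaced the missing professions. If a column still has missing values, or mixes upper and lower case, the `na` and `case` parameters of `.str.contains()` can help. This is a sketch on hypothetical toy data, not on the Bellevue dataset.

```python
import pandas as pd

# Hypothetical toy data with mixed case and one missing value
toy_df = pd.DataFrame({
    'profession': ['Teacher', 'school teacher', 'laborer', None],
})

# case=False ignores capitalization; na=False treats missing values as non-matches
mask = toy_df['profession'].str.contains('teach', case=False, na=False)

# Use the True/False mask to keep only the matching rows
teachers = toy_df[mask]

print(mask.tolist())  # [True, True, False, False]
print(len(teachers))  # 2
```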


## Rename Column

In [50]:
bellevue_df.rename(columns={'date_in': 'date_arrived'})

Unnamed: 0,date_arrived,first_name,last_name,full_name,age,gender,disease,profession,children,sent_to,sender1,sender2
0,1847-04-17,Mary,Gallagher,Mary Gallagher,28.0,f,recent emigrant,married,Child Alana 10 days,Hospital,superintendent,hd. gibbens
1,1847-04-08,John,Sanin (?),John Sanin (?),19.0,m,recent emigrant,laborer,Catherine 2 mo,,george w. anderson,edward witherell
2,1847-04-17,Anthony,Clark,Anthony Clark,60.0,m,recent emigrant,laborer,Charles Riley afed 10 days,Hospital,george w. anderson,edward witherell
3,1847-04-08,Lawrence,Feeney,Lawrence Feeney,32.0,m,recent emigrant,laborer,Child,,george w. anderson,james donnelly
4,1847-04-13,Henry,Joyce,Henry Joyce,21.0,m,recent emigrant,No profession,Child 1 mo,,george w. anderson,edward witherell
...,...,...,...,...,...,...,...,...,...,...,...,...
9593,1846-05-23,Joseph,Aton,Joseph Aton,69.0,m,,shoemaker,,,[blank],
9594,1847-06-17,Mary,Smith,Mary Smith,47.0,f,,No profession,,Hospital Ward 38,[blank],
9595,1847-06-22,Francis,Riley,Francis Riley,29.0,m,lame,superintendent,,,[blank],
9596,1847-07-02,Martin,Dunn,Martin Dunn,4.0,m,,No profession,,,[blank],
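
Like `.fillna()`, `.rename()` returns a new DataFrame rather than changing the original, so `bellevue_df` still has a `date_in` column unless we assign the result to a variable. A minimal sketch with a hypothetical one-row DataFrame (`toy_df`):

```python
import pandas as pd

# Hypothetical toy data
toy_df = pd.DataFrame({'date_in': ['1847-04-17'], 'age': [28]})

# rename() returns a new DataFrame with the new column name
renamed = toy_df.rename(columns={'date_in': 'date_arrived'})

print(list(toy_df.columns))   # ['date_in', 'age']
print(list(renamed.columns))  # ['date_arrived', 'age']
```

To keep the new name, we could assign the result back to the same variable, for example `toy_df = toy_df.rename(columns={'date_in': 'date_arrived'})`.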


## Add Column

In [51]:
bellevue_df['senders'] = bellevue_df['sender1'] + ' and ' + bellevue_df['sender2']

In [52]:
bellevue_df

Unnamed: 0,date_in,first_name,last_name,full_name,age,gender,disease,profession,children,sent_to,sender1,sender2,senders
0,1847-04-17,Mary,Gallagher,Mary Gallagher,28.0,f,recent emigrant,married,Child Alana 10 days,Hospital,superintendent,hd. gibbens,superintendent and hd. gibbens
1,1847-04-08,John,Sanin (?),John Sanin (?),19.0,m,recent emigrant,laborer,Catherine 2 mo,,george w. anderson,edward witherell,george w. anderson and edward witherell
2,1847-04-17,Anthony,Clark,Anthony Clark,60.0,m,recent emigrant,laborer,Charles Riley afed 10 days,Hospital,george w. anderson,edward witherell,george w. anderson and edward witherell
3,1847-04-08,Lawrence,Feeney,Lawrence Feeney,32.0,m,recent emigrant,laborer,Child,,george w. anderson,james donnelly,george w. anderson and james donnelly
4,1847-04-13,Henry,Joyce,Henry Joyce,21.0,m,recent emigrant,No profession,Child 1 mo,,george w. anderson,edward witherell,george w. anderson and edward witherell
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9593,1846-05-23,Joseph,Aton,Joseph Aton,69.0,m,,shoemaker,,,[blank],,
9594,1847-06-17,Mary,Smith,Mary Smith,47.0,f,,No profession,,Hospital Ward 38,[blank],,
9595,1847-06-22,Francis,Riley,Francis Riley,29.0,m,lame,superintendent,,,[blank],,
9596,1847-07-02,Martin,Dunn,Martin Dunn,4.0,m,,No profession,,,[blank],,
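
Notice that `senders` is empty in the rows where `sender2` is missing: adding a string to a missing value produces another missing value. One possible workaround, sketched here on hypothetical toy data, is to fall back to `sender1` alone in those rows. This is just one approach and not necessarily the right choice for every analysis.

```python
import pandas as pd

# Hypothetical toy data: the second row has no second sender
toy_df = pd.DataFrame({
    'sender1': ['george w. anderson', 'william w. lyons'],
    'sender2': ['edward witherell', None],
})

# String + missing value gives a missing value
combined = toy_df['sender1'] + ' and ' + toy_df['sender2']
print(combined.isna().tolist())  # [False, True]

# Fill the gaps with sender1 so single-sender rows are not lost
toy_df['senders'] = combined.fillna(toy_df['sender1'])
print(toy_df['senders'].tolist())
# ['george w. anderson and edward witherell', 'william w. lyons']
```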
