# Pandas Basics Part 1 — Workbook

In this workbook, we're going to explore the basics of the Python library Pandas.

## Import Pandas

To use the Pandas library, we first need to `import` it.

In [1]:
import pandas as pd

## Change Display Settings

By default, Pandas will display 60 rows and 20 columns. I often change [Pandas' default display settings](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html) to show more rows or columns.

In [2]:
pd.options.display.max_rows = 200
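The same setting can also be changed through function calls. The sketch below is one possible way to do it, not the only one: it uses `pd.set_option()` for a session-wide change and `pd.option_context()` for a temporary one. The value `50` for columns is an arbitrary choice for illustration.

```python
import pandas as pd

# Session-wide change, equivalent to pd.options.display.max_rows = 200
pd.set_option("display.max_rows", 200)

# Remember the current column limit so we can compare afterwards
before = pd.get_option("display.max_columns")

# Temporary change: the setting only applies inside the "with" block
with pd.option_context("display.max_columns", 50):
    inside = pd.get_option("display.max_columns")

# Outside the block, the column limit is back to what it was
after = pd.get_option("display.max_columns")

print(pd.get_option("display.max_rows"), inside, after == before)
```

A temporary change is handy when only one cell needs a wider or longer display.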

## Get Data

To read in a CSV file, we will use the function `pd.read_csv()` and pass it the path to our file as a string.

In [None]:
pd.read_csv('Bellevue_Almshouse_Dataset.csv')

In [None]:
type(pd.read_csv('Bellevue_Almshouse_Dataset.csv'))

This creates a Pandas [DataFrame object](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe), one of the two main data structures in Pandas. A DataFrame looks and acts a lot like a spreadsheet, but it has special powers and functions that we will discuss below and in the next few lessons.

| Pandas object | Explanation                       |
|---------------|-----------------------------------|
| `DataFrame`   | Like a spreadsheet, 2-dimensional |
| `Series`      | Like a column, 1-dimensional      |

We assign the DataFrame to a variable called `bellevue_df` so that we can reuse it without reading the file again. It is a common convention to name DataFrame variables `df`, but we want to be a bit more specific.

In [2]:
bellevue_df = pd.read_csv('Bellevue_Almshouse_Dataset.csv')
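If the CSV file is not at hand, the same steps can be tried on a small in-memory example. This is a minimal sketch: the rows and the name `sample_df` are invented for illustration and are not taken from the Bellevue dataset. `io.StringIO` lets `pd.read_csv()` treat a string as if it were a file.

```python
import io
import pandas as pd

# Made-up sample data standing in for a CSV file on disk
csv_text = """first_name,age,profession
Mary,28,married
John,35,laborer
Anne,,spinster
"""

# read_csv accepts a file path or a file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))   # the object is a DataFrame
print(sample_df.shape)   # (number of rows, number of columns)
print(sample_df.head())  # first rows; the empty age cell shows up as NaN
```

Note that the blank `age` value is read as a missing value (`NaN`) rather than as zero.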

## Begin to Examine Patterns

### Select Columns as Series Objects `[]`

To select a column from the DataFrame, we type the name of the DataFrame followed by square brackets containing the column name in quotation marks.

In [None]:
bellevue_df['age']

Technically, a single column in a DataFrame is a [*Series* object](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dsintro).

In [None]:
type(bellevue_df['age'])
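To see the difference between a Series and a DataFrame more concretely, here is a small sketch on invented data (the values and the name `sample_df` are hypothetical). Single brackets with one column name return a Series. Passing a list of names, which looks like double brackets, returns a DataFrame.

```python
import pandas as pd

# Made-up data for illustration only
sample_df = pd.DataFrame({
    "age": [28, 35, 19],
    "profession": ["married", "laborer", "spinster"],
})

age_series = sample_df["age"]    # one column name -> Series
age_frame = sample_df[["age"]]   # a list of column names -> DataFrame

print(type(age_series))
print(type(age_frame))
```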

## Pandas Methods

| Pandas method | Explanation                         |
|----------|-------------------------------------|
| `.sum()`      | Sum of values                       |
| `.mean()`     | Mean of values                      |
| `.median()`   | Median of values         |
| `.min()`      | Minimum                             |
| `.max()`      | Maximum                             |
| `.mode()`     | Mode                                |
| `.std()`      | Sample standard deviation (unbiased, divides by N-1) |
| `.count()`    | Total number of non-missing values  |
| `.value_counts()` | Frequency of unique values |
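Each of these methods is called by adding it to the end of a Series. The sketch below tries a few of them on a short, invented list of ages (not the Bellevue data) so that the pattern is clear before we apply it to the questions that follow. One value is deliberately missing to show that these methods skip missing values by default.

```python
import pandas as pd

# Made-up ages; None becomes a missing value (NaN)
ages = pd.Series([28, 35, 19, 35, None], name="age")

print(ages.mean())          # 29.25, the average of the four non-missing values
print(ages.max())           # largest value
print(ages.min())           # smallest value
print(ages.count())         # number of non-missing values
print(ages.value_counts())  # how often each value appears, most frequent first
```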

### ❓  How old (on average) were the people admitted to the Bellevue Almshouse?

In [None]:
bellevue_df['age']

### ❓  How old was the oldest person admitted to Bellevue?

In [None]:
bellevue_df['age']

### ❓  How young was the youngest person?

In [None]:
bellevue_df['age']

### ❓ What were the most common professions among these Irish immigrants?

To count the values in a column, we can use the `.value_counts()` method.

What patterns do you notice in this list? What seems strange to you? What can we learn about the people in the dataset *and* the people who created the dataset?

In [None]:
bellevue_df['profession'].value_counts()

### ❓ What are the most common diseases?

In [None]:
bellevue_df['disease'].value_counts()

### ❓  Where were most people sent?

In [None]:
bellevue_df['sent_to'].value_counts()

## Examine Subsets

### ❓  Why were people being sent to Hospital Ward 38?

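To explore this question, we can filter rows with a condition. Comparing a column to a value with `==` produces a Series of `True` and `False` values, one per row. Placing that Series inside square brackets keeps only the rows where the condition is `True`.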

In [None]:
bellevue_df['sent_to'] == 'Hospital Ward 38'

In [None]:
sent_filter = bellevue_df['sent_to'] == 'Hospital Ward 38'

In [None]:
bellevue_df[sent_filter]
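Once we have a filtered DataFrame, we can apply the same column methods to it as before. The following is a minimal sketch on invented rows (the values and the names `sample_df`, `is_hospital`, and `hospital_df` are hypothetical, not from the dataset) showing how a filter and `.value_counts()` can be combined.

```python
import pandas as pd

# Made-up rows for illustration only
sample_df = pd.DataFrame({
    "disease": ["fever", "recent emigrant", "fever", "sickness"],
    "sent_to": ["Hospital", "Garret", "Hospital", "Garret"],
})

# Step 1: build a True/False Series, one value per row
is_hospital = sample_df["sent_to"] == "Hospital"

# Step 2: keep only the rows where the condition is True
hospital_df = sample_df[is_hospital]

# Step 3: count values in a column of the filtered subset
print(hospital_df["disease"].value_counts())
```

The same three steps should work on `bellevue_df` with `sent_filter`, using whichever column seems most likely to explain why people were sent there.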

## ❓  What data is missing? What data do you wish we had?