# Quick Reference
---

## Lesson 1 Recap

### Importing `pandas` Library
```
import pandas as pd
```
- Libraries only need to be imported once in a notebook
- It's good practice to consolidate all your `import` commands together near the start of your notebook

### Reading a CSV File

To read a CSV file and store it as a DataFrame variable:
```
df = pd.read_csv('some_cool_data.csv')
```

### Quick and Easy Summaries of a DataFrame

Number of rows and columns (rows first, columns second): 
```
df.shape
```

Names and data types of each column: 
```
df.dtypes
```
Just the names of each column:
```
df.columns
```
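
As a minimal sketch of these three attributes, the example below builds a small DataFrame by hand instead of reading a CSV file. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data, made up for this example
df = pd.DataFrame({
    'City': ['Toronto', 'Ottawa', 'Montreal'],
    'Temperature (C)': [21.5, 19.0, 23.2],
})

# shape is a tuple: (number of rows, number of columns)
n_rows, n_cols = df.shape

# columns is an Index object; list() turns it into a plain Python list
column_names = list(df.columns)

print(n_rows, n_cols)
print(column_names)
print(df.dtypes)
```

Note that `shape`, `dtypes`, and `columns` are attributes, not methods, so they are written without parentheses.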

#### Rows at a Glance

- First 5 rows:
```
df.head()
```
- Last 5 rows:
```
df.tail()
```
- A random sample (1 row):
```
df.sample()
```
- The number of rows can be specified as an input to any of the above methods (e.g. `df.tail(7)` returns the last 7 rows)
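
The sketch below shows the row-count argument in use on a hypothetical 10-row DataFrame. The `random_state` argument is an optional extra, assumed here only so the random sample is repeatable.

```python
import pandas as pd

# Hypothetical DataFrame with a single column holding 0 through 9
df = pd.DataFrame({'X': range(10)})

first_three = df.head(3)                    # first 3 rows
last_two = df.tail(2)                       # last 2 rows
random_four = df.sample(4, random_state=0)  # 4 random rows, repeatable

print(first_three)
print(last_two)
print(random_four)
```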

#### Summary Statistics

Full set of summary statistics (min, max, mean, standard deviation, etc.) for each column of a DataFrame:
```
df.describe()
```

Mean value of each column:
```
df.mean()
```

And similarly for other summary statistics: `df.min()`, `df.max()`, `df.median()`, `df.std()`

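To include only numeric columns, pass the optional keyword argument `numeric_only=True`. It works for `min`, `max`, and the other summary statistics above. In pandas 2.0 and later, `mean`, `median`, and `std` raise an error on a DataFrame with text columns unless it is set: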
```
df.max(numeric_only=True)
```
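
To make the summary methods concrete, here is a small sketch on a hypothetical DataFrame that mixes one text column with two numeric columns. The data values are invented.

```python
import pandas as pd

# Hypothetical data: one text column, two numeric columns
df = pd.DataFrame({
    'City': ['Toronto', 'Ottawa', 'Montreal'],
    'Temperature (C)': [20.0, 18.0, 25.0],
    'Humidity (%)': [60, 70, 50],
})

# describe() summarizes only the numeric columns of a mixed DataFrame by default
summary = df.describe()

# numeric_only=True skips the text column 'City'
col_means = df.mean(numeric_only=True)
col_max = df.max(numeric_only=True)

print(summary)
print(col_means)
print(col_max)
```

Each of `col_means` and `col_max` is a Series indexed by column name, so a single value can be looked up with, for example, `col_means['Temperature (C)']`.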

---

## Lesson 2 Recap

### DataFrames and Series

DataFrames and Series are both data types that belong to the `pandas` library.

- **DataFrame:** 2-dimensional array, like a table in a spreadsheet
  - The rows are axis 0
  - The columns are axis 1
- **Series:** 1-dimensional array, like a single column or row in a spreadsheet

### Working with DataFrame Columns

Each column of a DataFrame is a Series.
```
series_X = df['X']
```

Performing basic calculations:
```
double_X = 2 * df['X']
```

Adding a new column to a DataFrame: 
```
df['Double X'] = 2 * df['X']
```
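
Putting these column operations together, a minimal sketch on a hypothetical one-column DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with one numeric column
df = pd.DataFrame({'X': [1, 2, 3]})

series_X = df['X']            # selecting one column gives a Series
double_X = 2 * df['X']        # arithmetic is applied to every element
df['Double X'] = double_X     # assigning to a new name adds a column

print(type(series_X))
print(df)
```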

### Saving to CSV

Saving a DataFrame to a CSV file: 
```
df.to_csv('cool_output.csv', index=False)
```
- To include the DataFrame's index as a column in the CSV file, omit the `index=False` keyword argument.
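
The sketch below illustrates the effect of `index=False` by saving a hypothetical DataFrame both ways and reading each file back. The temporary directory and the second file name are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data
df = pd.DataFrame({'X': [1, 2, 3], 'Y': ['a', 'b', 'c']})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Without the index: the file holds only columns X and Y
    path = os.path.join(tmp_dir, 'cool_output.csv')
    df.to_csv(path, index=False)
    df_reloaded = pd.read_csv(path)

    # With the index: the file gains an extra, unnamed first column
    path_idx = os.path.join(tmp_dir, 'cool_output_with_index.csv')
    df.to_csv(path_idx)
    df_with_index = pd.read_csv(path_idx)

print(list(df_reloaded.columns))
print(list(df_with_index.columns))
```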


### Simple Graphs

To display `pandas` / `matplotlib` graphs inline in your notebook, you need to run the following magic command:
```
%matplotlib inline
```
- This command only needs to be run once in a notebook
- It's good practice to run this command at the same time as your `import` commands, near the start of your notebook

#### Creating Line Plots

For a Series:
```
series.plot()
```
For a single column of a DataFrame:
```
df['Column A'].plot()
```

For all columns in a DataFrame, with each column as a line on the same plot:
```
df.plot()
```

For all columns in a DataFrame, with a separate subplot for each column:
```
df.plot(subplots=True)
```

To adjust the size of a graph, pass the `figsize` keyword argument to the `plot` method, where `figsize` is a tuple of (width, height) in inches. For example, to create a figure with subplots that is 6 inches wide by 8 inches tall:
```
df.plot(subplots=True, figsize=(6, 8))
```
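
A minimal sketch of the `subplots` and `figsize` options on hypothetical data. It is written as a standalone script, so it selects the non-interactive `Agg` backend; in a notebook, `%matplotlib inline` takes that role and the two `matplotlib` lines can be dropped.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: running outside a notebook, with no display

import pandas as pd

# Hypothetical data with two numeric columns
df = pd.DataFrame({
    'Column A': [1, 3, 2, 4],
    'Column B': [4, 2, 3, 1],
})

# With subplots=True, plot() returns one Axes object per column
axes = df.plot(subplots=True, figsize=(6, 8))

# All subplots share one figure; its size is reported in inches
fig = axes[0].get_figure()
print(len(axes))
print(fig.get_size_inches())
```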
---

## Lesson 3 Recap

### Counting Unique Values

Unique values in a Series: 
```
series.unique()
```

Number of unique values in a Series:
```
series.nunique()
```
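Alternatively, `len(series.unique())` gives the same count when the Series has no missing values. The two can differ otherwise: `unique` includes `NaN` as a value, while `nunique` ignores it by default.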

Counts of each unique value in a Series:
- Excluding missing values:
```
series.value_counts()
```
- Including missing values:
```
series.value_counts(dropna=False)
```
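
The sketch below runs these methods on a hypothetical Series containing one missing value, which shows how missing data affects each count.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with three distinct labels and one missing value
series = pd.Series(['rain', 'snow', 'rain', np.nan, 'fog', 'rain'])

n_unique = series.nunique()                # missing value is not counted
n_unique_with_nan = len(series.unique())   # missing value is counted

counts = series.value_counts()                       # excludes the missing value
counts_with_nan = series.value_counts(dropna=False)  # includes it

print(n_unique, n_unique_with_nan)
print(counts)
print(counts_with_nan)
```

`value_counts` sorts its result from most to least frequent, so `'rain'` appears first here.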

### Bar Charts

Plot a horizontal bar chart of a Series: 
```
series.plot(kind='barh')
```
For a vertical bar chart, use `kind='bar'`.

### Text Processing

Apply string methods to a text Series through the `series.str` accessor:
```
series_lower = series.str.lower()
```
Apply multiple methods with method chaining:
```
series_lower_stripped = series.str.lower().str.strip()
```
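
A minimal sketch of method chaining on a hypothetical Series of inconsistently formatted labels:

```python
import pandas as pd

# Hypothetical text data with mixed case and stray spaces
series = pd.Series(['  Snow ', 'RAIN', ' Fog'])

# Each step needs its own .str, because each step returns a new Series
series_clean = series.str.lower().str.strip()

print(series_clean.tolist())
```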
  
  
### Missing Data

Locate missing values in a Series or DataFrame:
```
data.isnull()
```

Calculate the total number of missing values in a Series, or in each column of a DataFrame: 
```
data.isnull().sum()
```
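
As a sketch, the example below applies both steps to a hypothetical DataFrame with gaps in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one gap in column A, two gaps in column B
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': ['x', None, None],
})

# Same shape as df, with True wherever a value is missing
missing_mask = df.isnull()

# True counts as 1 when summed, giving missing values per column
missing_per_column = df.isnull().sum()

print(missing_mask)
print(missing_per_column)
```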
---

## Lesson 4 Recap

### Sorting

Sort a DataFrame based on the values in the column `'Column B'`:
```
df.sort_values('Column B')
```
To sort in descending order, use the keyword argument `ascending=False`.
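
A short sketch of both sort directions on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    'Column A': ['x', 'y', 'z'],
    'Column B': [2, 3, 1],
})

df_ascending = df.sort_values('Column B')
df_descending = df.sort_values('Column B', ascending=False)

print(df_ascending)
print(df_descending)
```

`sort_values` returns a new, sorted DataFrame and leaves `df` unchanged, so the result needs to be stored in a variable to be used later.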


### Aggregation

For basic aggregation operations, use the `groupby` method chained with an aggregation method (e.g., `mean`, `sum`, `max`, `min`, `count`).

For example, to find the mean values for data grouped by `'Column B'`:
```
df.groupby('Column B').mean()
```
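
To illustrate, the sketch below groups a hypothetical DataFrame by a text column and aggregates the remaining numeric column in two ways.

```python
import pandas as pd

# Hypothetical data: 'Column B' holds the group labels
df = pd.DataFrame({
    'Column A': [10.0, 20.0, 30.0, 40.0],
    'Column B': ['x', 'y', 'x', 'y'],
})

group_means = df.groupby('Column B').mean()    # mean of each group
group_counts = df.groupby('Column B').count()  # non-missing values per group

print(group_means)
print(group_counts)
```

In the result, the group labels from `'Column B'` become the index, with one row per group.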

### Subsets

#### Selecting Columns

To select a subset of columns from a DataFrame: 
```
df_sub = df[['Column C', 'Column A', 'Column B']]
```
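
A minimal sketch on a hypothetical four-column DataFrame. The inner square brackets form a list of column names, and the result follows the order of that list.

```python
import pandas as pd

# Hypothetical data with four columns
df = pd.DataFrame({
    'Column A': [1, 2],
    'Column B': [3, 4],
    'Column C': [5, 6],
    'Column D': [7, 8],
})

# Keep three of the four columns, in the order listed
df_sub = df[['Column C', 'Column A', 'Column B']]

print(list(df_sub.columns))
```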

#### Selecting Rows with a Filter

To select a subset of rows with a filter:
  - Create a filter (Boolean Series)
  - Fill any missing values in the filter using the `fillna` method (if necessary)
  - Use the filter to extract the desired rows from the DataFrame

Filter Example 1: string method `contains` with text data
```
snowing = weather_all['Conditions'].str.contains('Snow')
snowing = snowing.fillna(False)
weather_snowing = weather_all[snowing]
```

Filter Example 2: comparison operator with numerical data
```
temp_warm = weather_all['Temperature (C)'] > 20
temp_warm = temp_warm.fillna(False)
weather_warm = weather_all[temp_warm]
```
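
The two filter examples can be tried end to end with the sketch below, which uses a small, invented stand-in for `weather_all`. For the text filter it passes `na=False` to `contains`, which is one alternative to a separate `fillna` step; the numeric filter keeps the `fillna` step shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for weather_all, with one gap in each column
weather_all = pd.DataFrame({
    'Conditions': ['Snow', 'Clear', None, 'Snow Showers'],
    'Temperature (C)': [-2.0, 24.5, 21.0, np.nan],
})

# Text filter: na=False treats a missing value as "does not contain"
snowing = weather_all['Conditions'].str.contains('Snow', na=False)
weather_snowing = weather_all[snowing]

# Numeric filter: comparing a missing value already gives False,
# so fillna changes nothing here but is harmless
temp_warm = weather_all['Temperature (C)'] > 20
temp_warm = temp_warm.fillna(False)
weather_warm = weather_all[temp_warm]

print(weather_snowing)
print(weather_warm)
```

The filtered DataFrames keep the original index labels, which makes it easy to trace each row back to `weather_all`.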