<a id="recap"></a>
# Lesson 1 Recap

### Importing the `pandas` Library
```
import pandas as pd
```

### DataFrames and Series

Data in `pandas` is organized into DataFrames and Series.

- **DataFrame:** 2-dimensional array, like a table in a spreadsheet
  - The rows are axis 0
  - The columns are axis 1
- **Series:** 1-dimensional array, like a single column or row in a spreadsheet
  - Each individual column or row of a DataFrame is represented as a Series

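As a minimal sketch of the relationship between the two, the example below builds a small DataFrame from made-up values (the column names and data are assumptions for illustration) and pulls out one column and one row. Selecting a row by position with `iloc` is not covered in this recap and is used here only to show that a row is also a Series.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
df = pd.DataFrame({
    "X": [1.0, 2.0, 3.0],
    "Category": ["a", "b", "a"],
})

column = df["X"]   # a single column is a Series
row = df.iloc[0]   # a single row (selected by position) is also a Series

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(type(row).__name__)     # Series

# Axis 0 runs down the rows, axis 1 runs across the columns
print(df.shape)               # (3, 2)
```
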
### Reading a CSV File

To read a CSV file and store it as a DataFrame variable:
```
df = pd.read_csv('some_cool_data.csv')
```

Missing data in a DataFrame or Series is represented as `NaN` ("not a number").

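To see how a missing value shows up, the sketch below reads CSV text from an in-memory string instead of a file on disk, so it runs without any data file. The CSV contents are invented for illustration, and `isna()` is one way (not mentioned above) to count the missing entries.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk;
# the second data row has an empty value for X
csv_text = "X,Category\n1.5,a\n,b\n3.0,a\n"

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)                    # the empty X value is displayed as NaN
print(df["X"].isna().sum())  # number of missing values in column X
```
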
### Quick and Easy Summaries of a DataFrame

|||
---|----
**Useful Attributes** |
Number of rows and columns (rows first, columns second) | `df.shape` 
Names and data types of each column |  `df.dtypes` 
Just the names of each column | `df.columns` 
**Rows at a Glance** |
First `n` rows (default 5) |`df.head(n)`
Last `n` rows (default 5) | `df.tail(n)`
A random sampling of `n` rows (default 1) | `df.sample(n)`

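The attributes and methods in the table can be tried on any DataFrame. The sketch below uses a small made-up DataFrame (the names and values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "X": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Category": ["a", "b", "a", "c", "b", "a"],
})

print(df.shape)          # (6, 2): rows first, columns second
print(df.dtypes)         # one data type per column
print(list(df.columns))  # ['X', 'Category']

print(df.head(2))        # first 2 rows
print(df.tail())         # last 5 rows (the default)
print(df.sample(3))      # 3 random rows; the selection differs on each run
```

Note that `shape`, `dtypes`, and `columns` are attributes, so they are written without parentheses, while `head`, `tail`, and `sample` are methods.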

#### Summary Statistics

Full set of summary statistics (min, max, mean, standard deviation, etc.) for each numerical column of a DataFrame:
```
df.describe()
```

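Mean value of each numerical column (in recent versions of `pandas`, `numeric_only=True` is needed to skip non-numerical columns):
```
df.mean(numeric_only=True)
```

The same pattern applies to other summary statistics: `df.min()`, `df.max()`, `df.median()`, `df.std()`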

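As a quick illustration, the sketch below computes these statistics on a made-up DataFrame with two numerical columns (the data are assumptions for illustration). `describe()` returns a DataFrame with one row per statistic, while `mean()` and the other single-statistic methods return a Series indexed by column name.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "X": [1.0, 2.0, 3.0, 4.0],
    "Y": [10.0, 20.0, 30.0, 40.0],
})

summary = df.describe()  # count, mean, std, min, quartiles, max
means = df.mean()        # Series indexed by column name

print(summary)
print(means)        # X: 2.5, Y: 25.0
print(df.median())
print(df.std())     # sample standard deviation (divides by n - 1)
```
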
### Working with DataFrame Columns

Each column of a DataFrame is a Series.
```
series_X = df['X']
```

Most of the DataFrame methods listed above also work on a Series, for example:
- `df['X'].head()`
- `df['X'].max()`

Arithmetic on a Series is applied element by element, and assigning the result to a new column name adds that column to the DataFrame:
```
df['Double X'] = 2 * df['X']
```
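A runnable version of the steps above, using a made-up single-column DataFrame (the values and the extra `'X Squared'` column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"X": [1, 2, 3]})

series_X = df["X"]      # select one column as a Series
print(series_X.max())   # 3

# Each calculation is applied to every element of the Series
df["Double X"] = 2 * df["X"]
df["X Squared"] = df["X"] ** 2  # hypothetical extra column

print(df)
```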

### Categorical Data

For a column `df['Category']` of categorical data, some useful summary methods are:

|||
---|---
Unique values | `df['Category'].unique()`
Number of unique values | `df['Category'].nunique()`
Counts of each unique value | `df['Category'].value_counts()`

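*Note: `unique()` is available only on a Series (not a DataFrame). `nunique()` also works on a DataFrame, returning one count per column, and recent versions of `pandas` provide a DataFrame `value_counts()` that counts unique rows rather than unique values in a single column.*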

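The sketch below applies the three methods to a made-up categorical column (the values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"Category": ["a", "b", "a", "c", "a", "b"]})

print(df["Category"].unique())    # distinct values, in order of first appearance
print(df["Category"].nunique())   # 3

# value_counts returns a Series sorted from most to least frequent
counts = df["Category"].value_counts()
print(counts)                     # a: 3, b: 2, c: 1
```
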
### Plots

To display `pandas` / `matplotlib` graphs inline in your notebook, you need to run the following magic command:
```
%matplotlib inline
```
- This command only needs to be run once in a notebook
- It's good practice to run this command at the same time as your `import` commands, near the start of your notebook

Create quick and easy plots of Series and DataFrames with `plot`:
- There are two equivalent ways to specify the kind of plot. For example, to create a histogram of `series_X` with 20 bins:
  - `series_X.plot(kind='hist', bins=20)`, or
  - `series_X.plot.hist(bins=20)`
- The default kind of plot is a line plot, for example:
  - `df['A'].plot()` creates a line plot of column `'A'` of `df`
- To adjust the size of a plot, use the `figsize` keyword argument, for example:
  - `df['A'].plot.hist(bins=20, figsize=(8, 4))`
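
The sketch below shows the two histogram syntaxes side by side on made-up data (the values are assumptions for illustration). It is written to run as a plain script, so it selects the non-interactive `Agg` backend and closes the first figure before drawing the second; in a notebook with `%matplotlib inline`, those two lines are not needed because each cell displays its own figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration
series_X = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 6], name="X")

# Both calls draw the same histogram; each returns a matplotlib Axes object
ax1 = series_X.plot(kind="hist", bins=20, figsize=(8, 4))
plt.close("all")  # start a fresh figure for the second plot
ax2 = series_X.plot.hist(bins=20, figsize=(8, 4))

print(len(ax2.patches))               # one bar per bin
print(ax2.figure.get_size_inches())   # width 8, height 4 (in inches)
```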