# Visualization basics with Matplotlib, Pandas and Seaborn

In this lab you will be working with the data you've selected for your mini-capstone. I expect that by now everyone can load their data into working memory, e.g., as a Pandas DataFrame.

This lab will have code to perform an exploratory data analysis using the most common visualization tools available in Matplotlib, Pandas and Seaborn.

### Import Libraries

In [1]:
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

### Load a Dataset

I expect you to be able to write the code necessary to load your own dataset. Code that loads data is often accompanied by some basic wrangling, such as dropping columns, changing datatypes, or renaming columns. Include that code here as well.

Lastly, datasets are very commonly stored as CSV or Excel files, which can be loaded into working memory as a Pandas DataFrame. Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python programmers and data scientists. Below are two lines of code that load an Excel spreadsheet and display its first five rows.

Example:
```python
df = pd.read_excel('filepath/file.xlsx')
df.head()
```
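
The wrangling steps mentioned above (dropping columns, renaming columns, changing datatypes) might look like the sketch below. The CSV text and the column names (`ID`, `Sq Ft`, `price`, `notes`) are invented for illustration, and `io.StringIO` stands in for a real file path so the cell runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV contents; with a real file you would call
# pd.read_csv('your_file.csv') instead.
raw = io.StringIO(
    "ID,Sq Ft,price,notes\n"
    "1,850,120000,ok\n"
    "2,1200,185000,\n"
    "3,990,143500,check\n"
)

df = pd.read_csv(raw)

# Drop a column that is not needed for the analysis.
df = df.drop(columns=['notes'])

# Rename columns to consistent snake_case names.
df = df.rename(columns={'ID': 'id', 'Sq Ft': 'sq_ft'})

# Change a datatype: store price as a float instead of an integer.
df['price'] = df['price'].astype(float)

df.head()
```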

In [1]:
### Write the code to load your dataset here.

### Basic Summary Statistics

Basic summary statistics can be produced with the Pandas `.describe()` method, which returns a Pandas DataFrame. In addition, individual statistics can be calculated on specific columns using stand-alone methods, such as `.mean()`.

Example:
```python
df.describe()
```
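
As a small sketch of both ideas, the cell below describes a subset of features and then computes a single statistic for one column. The DataFrame and its column names (`height`, `width`, `label`) are made up for illustration.

```python
import pandas as pd

# Hypothetical data; substitute your own DataFrame.
df = pd.DataFrame({
    'height': [1.0, 2.0, 3.0, 4.0],
    'width': [10.0, 20.0, 30.0, 40.0],
    'label': ['a', 'b', 'a', 'b'],
})

# Summary statistics for a subset of features.
summary = df[['height', 'width']].describe()

# A single statistic for one column.
mean_height = df['height'].mean()

summary
```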

In [8]:
### Apply the describe function to your dataset, or to a subset of features, to produce summary statistics. 

### Pandas Visualization Toolkit

Pandas has plenty of useful [visualization tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) available out of the box.

#### Area Plots

Use the Pandas `.plot()` function to create an [area plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.area.html). Try changing the `figsize` and `stacked` arguments from their defaults. What do you notice about the range of your data?

Example:
```python
df.plot(kind='area', stacked=False, figsize=[10,7])
```
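
To see why the range of the data matters, here is a minimal sketch with two invented columns (`small` and `large`) on very different scales. It draws the same data unstacked and stacked; in both cases the column with the larger range is likely to dominate the plot.

```python
import pandas as pd

# Hypothetical data with columns on very different scales.
df = pd.DataFrame({
    'small': [1, 2, 3, 4, 5],
    'large': [100, 250, 175, 300, 220],
})

# Unstacked: the areas overlap, so they are drawn semi-transparent.
ax_unstacked = df.plot(kind='area', stacked=False, figsize=[10, 7])

# Stacked (the default): each column is drawn on top of the previous one.
ax_stacked = df.plot(kind='area', stacked=True, figsize=[6, 4])
```

If one column swamps the others, plotting a subset of columns with similar ranges may give a more readable figure.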

In [10]:
### Write code for an area plot.

#### Scatter Plots

Use Pandas `.plot()` function to create a [scatter plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.scatter.html) from your Pandas Dataframe. Do you see any patterns in the two features in your plot?

Example:
```python
# x and y take column names, not the columns themselves.
df.plot(kind='scatter', x='x_values', y='y_values')
```

In [5]:
### Create a scatter plot.

Try changing the *color* of data points based on a numeric column in your dataset.

Example:
```python
# c can be the name of a numeric column; pandas adds a colorbar.
df.plot(kind='scatter', x='x_values', y='y_values', c='COLOR_VALUES')
```

In [6]:
### Re-create the scatter plot by changing the color of the data points.

Lastly, try changing the *size* of the data points based on a numeric column in your dataset. If you don't have a column that makes sense for this, that's OK. Use one anyway.

Example:
```python
# x and y take column names; s accepts an array of marker sizes.
df.plot(kind='scatter', x='x_values', y='y_values', s=df['SIZE_VALUES'])
```
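
The color and size options can also be combined in one call. The sketch below uses an invented DataFrame with the same placeholder column names as the examples above. Multiplying the size column by 100 is an assumption made so that small raw values produce visible markers, since marker sizes are given as areas in points squared.

```python
import pandas as pd

# Hypothetical data using the placeholder column names from above.
df = pd.DataFrame({
    'x_values': [1, 2, 3, 4, 5],
    'y_values': [2.1, 3.9, 6.2, 8.1, 9.8],
    'COLOR_VALUES': [10, 20, 30, 40, 50],
    'SIZE_VALUES': [0.5, 1.0, 1.5, 2.0, 2.5],
})

# Scale the raw values up so the markers are large enough to see.
sizes = df['SIZE_VALUES'] * 100

ax = df.plot(
    kind='scatter',
    x='x_values',
    y='y_values',
    c='COLOR_VALUES',     # color each point by a numeric column
    colormap='viridis',
    s=sizes,              # size each point by another numeric column
)
```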

In [7]:
### Re-create the scatter plot by changing the size of the data points.

#### Box plots

Use the Pandas `.plot()` function to create a [box plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.box.html) of the numeric columns in your Pandas DataFrame.

Example:
```python
df.plot(kind='box')
```

In [16]:
### Create a boxplot.

#### Histograms
Use the Pandas `.plot()` function to create a [histogram](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.hist.html) of only one column in your Pandas DataFrame. The `alpha` argument changes the transparency of the colors used in the graph.

Example:
```python
df['feature'].plot(kind='hist', alpha=0.5)
```

In [1]:
### Create a histogram.

Now try using the `.hist()` function instead of `.plot()`. In this case, Pandas will plot *all* numeric columns in your dataset as histograms. If your dataset has more than 12 numeric columns, I recommend that you subset them or skip this cell entirely.

Example:
```python
df.hist()
```
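
If you do need to subset, one possible approach is sketched below: list the numeric columns with `select_dtypes`, pick the ones you want, and call `.hist()` on that selection. The DataFrame and its column names are invented, and taking the first two numeric columns is only a placeholder for a more deliberate choice.

```python
import pandas as pd

# Hypothetical data: three numeric columns and one text column.
df = pd.DataFrame({
    'a': [1, 2, 2, 3, 3, 3],
    'b': [0.5, 1.5, 1.5, 2.5, 3.5, 4.5],
    'c': [10, 10, 20, 20, 30, 30],
    'name': list('uvwxyz'),
})

# List the numeric columns, then choose which ones to plot.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
subset = numeric_cols[:2]

# One histogram per selected column; returns an array of Axes.
axes = df[subset].hist(bins=5, figsize=[8, 3])
```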

In [21]:
### Re-create the histogram.

#### Histogram with Kernel Density Estimation (KDE)

Example:
```python
df['wall_area'].plot(kind='kde')
```
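
The example above draws only the density curve. To overlay it on a histogram, one option is to draw both on the same axes, as in this sketch. The data and the `feature` column name are invented. Setting `density=True` is needed so the histogram and the KDE share the same vertical scale, and the `kde` plot kind assumes SciPy is installed.

```python
import pandas as pd

# Hypothetical single-column data.
df = pd.DataFrame(
    {'feature': [1.0, 1.5, 2.0, 2.2, 2.5, 3.0, 3.1, 3.8, 4.5, 6.0]}
)

# Histogram normalized to a density so it is comparable to the KDE.
ax = df['feature'].plot(kind='hist', density=True, alpha=0.5)

# Draw the KDE curve on the same axes (requires SciPy).
df['feature'].plot(kind='kde', ax=ax)
```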

In [11]:
### Recreate histogram with a KDE.

### Seaborn Visualization Toolkit

[Seaborn](https://seaborn.pydata.org/) is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. You may also want to browse the example gallery to get a sense for what you can do with seaborn and then check out the tutorial and API reference to find out how.

Seaborn offers many of the same plot types as Pandas, like [boxplots](https://seaborn.pydata.org/generated/seaborn.boxplot.html) and [histograms](https://seaborn.pydata.org/generated/seaborn.histplot.html), but also comes with some novel tools.

#### Countplot

This visualization works best with two categorical features. The [countplot](https://seaborn.pydata.org/generated/seaborn.countplot.html) function produces a bar plot based on the frequencies of values in one feature, split by the classes of another feature. The `x` or `y` argument names the column whose values are counted, and the `hue` argument specifies a column to group the frequencies by, such as a binary variable. Use the `.countplot()` function to create a countplot in Seaborn.

Example of counting one feature grouped by another:
```python
sns.countplot(data=df, y='feature1', hue='target')
```
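
To check what the bars represent, it can help to compute the same frequencies as a table. The sketch below uses an invented DataFrame with the placeholder column names from the example. `pd.crosstab` produces the counts, and the countplot draws those counts as bars.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: one categorical feature and a binary target.
df = pd.DataFrame({
    'feature1': ['red', 'red', 'blue', 'blue', 'blue', 'green'],
    'target': [0, 1, 0, 0, 1, 1],
})

# The table of frequencies behind the plot.
counts = pd.crosstab(df['feature1'], df['target'])

# Passing the feature to y draws horizontal bars, grouped by target.
ax = sns.countplot(data=df, y='feature1', hue='target')
```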

In [51]:
### Generate a countplot below.

#### Boxplot

Seaborn allows you to create [boxplots](https://seaborn.pydata.org/generated/seaborn.boxplot.html) like Pandas, with plenty of easily accessible customizations. Use the `.boxplot()` function to create a boxplot in Seaborn. Passing a numeric column to `y` draws the boxplot vertically, and passing it to `x` draws it horizontally. You can combine the two to create multiple boxplots by passing a numeric feature to `y` and a categorical feature to `x`.

Example:
```python
sns.boxplot(data=df, y='numeric', x='category')
```
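
The three variations described above can be compared side by side. This is a sketch on invented data using the placeholder column names `numeric` and `category`. The three-panel layout is an illustrative choice, not a requirement.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one numeric column and one categorical column.
df = pd.DataFrame({
    'category': ['a'] * 5 + ['b'] * 5,
    'numeric': [1, 2, 3, 4, 5, 2, 4, 6, 8, 30],
})

fig, (ax_v, ax_h, ax_g) = plt.subplots(1, 3, figsize=[12, 4])

# Numeric column on y: a single vertical boxplot.
sns.boxplot(data=df, y='numeric', ax=ax_v)

# Numeric column on x: a single horizontal boxplot.
sns.boxplot(data=df, x='numeric', ax=ax_h)

# Numeric on y, categorical on x: one boxplot per category.
sns.boxplot(data=df, x='category', y='numeric', ax=ax_g)
```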

In [50]:
### Create a boxplot below.

#### Violin Plot

[Violin plots](https://seaborn.pydata.org/generated/seaborn.violinplot.html) are often a useful alternative to box-and-whisker plots because they combine the information from a traditional boxplot with a kernel density estimate. The `.violinplot()` function takes the arguments `x`, `y`, and `hue`. `y` is the numeric column you want to plot, `x` is likely another predictor, and `hue` would represent the target.

Example:
```python
# split=True needs a hue column with exactly two levels.
sns.violinplot(data=df, x='feature2', y='feature1', hue='target', split=True, scale='count')
```
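
Here is a runnable sketch of a split violin plot on invented data, using the placeholder column names from the example. It leaves out the `scale` argument because, as far as I know, recent Seaborn releases rename it to `density_norm`. Relying on the default keeps the cell working across versions.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: a categorical predictor, a numeric feature,
# and a two-level target to split each violin on.
df = pd.DataFrame({
    'feature2': ['a'] * 8 + ['b'] * 8,
    'target': ['no', 'yes'] * 8,
    'feature1': [1.0, 2.0, 1.5, 2.5, 2.0, 3.0, 2.5, 3.5,
                 3.0, 5.0, 3.5, 5.5, 4.0, 6.0, 4.5, 6.5],
})

# Each violin shows one target level on each of its two halves.
ax = sns.violinplot(data=df, x='feature2', y='feature1',
                    hue='target', split=True)
```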

In [56]:
### Write the violin plot below.