## <font color="coral">Intro to Matplotlib/Pyplot/Seaborn</font>

#### <font color="coral">In this notebook we'll be looking at the Matplotlib Pyplot and Seaborn libraries for data visualization. This notebook will briefly cover:</font>
    
- Plotting with Matplotlib
- Changing figure aesthetics with 
    
#### <font color="coral"> Notebook usage</font>

This notebook is intended to be worked through top to bottom. Feel free to change what you'd like and experiment with any ideas you have. To advance through the notebook, click on a cell and either hit the "▶️" button or press "Shift-Enter/Return". Try running the cell below!
    

In [None]:
# Jupyter magic line needed to render nicely in notebook
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

### <font color="coral"> Matplotlib Intuition</font>

It's no exaggeration to say that ["Matplotlib is probably the single most used Python package for 2D-graphics"](https://github.com/rougier/matplotlib-tutorial). Importantly, it provides ways to create both quick-and-dirty and publication-quality figures.

Specifically we will use the Pyplot submodule which provides a MATLAB-like interface for plotting. Create your first plot by running the cell below.

In [None]:
z = [1, 2, 3, 4, 5]
z_square = [1, 4, 9, 16, 25]
z_square_inv = [25, 16, 9, 4, 1]
z_log = [0.0, 0.69, 1.1, 1.39, 1.61]
z_10s = [1, 10, 100, 1000, 10000]

In [None]:
# If we provide a single list to plot, matplotlib assumes it is a sequence 
# of y values, and automatically generates the x values for us
plt.plot(z)

# Display the current plot
plt.show()

In [None]:
# We can plot x against y if we provide two lists to plot
plt.plot(z, z_square)
plt.show()

In [None]:
# Calling plotting functions multiple times will layer plots
plt.plot(z)
plt.plot(z_log)
plt.show()

In [None]:
# We can change the style of curve or add markers
...
plt.show()
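
One possible way to fill in a cell like the one above, as a minimal sketch rather than the intended solution: Pyplot accepts a short format string (for example `'o--'` for circle markers joined by a dashed line) or the equivalent keyword arguments. The variable names `dashed` and `dotted` are only for illustration.

```python
import matplotlib.pyplot as plt

z = [1, 2, 3, 4, 5]
z_square = [1, 4, 9, 16, 25]

# Format string shorthand: 'o' = circle markers, '--' = dashed line
(dashed,) = plt.plot(z, z_square, 'o--')

# Similar styling spelled out with keyword arguments:
# dotted line, square markers, thicker stroke
(dotted,) = plt.plot(z, z, linestyle=':', marker='s', linewidth=2)
```

In a notebook, finish the cell with `plt.show()` as in the other cells.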

In [None]:
# We can set the color using a hexadecimal color code or a named color
...
plt.show()
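
As a hedged illustration of the comment above, the `color` argument accepts either a hex string or one of Matplotlib's named colors. The teal hex code here is an arbitrary choice, and `to_hex` is used only to show which hex code a named color resolves to.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

z = [1, 2, 3, 4, 5]
z_square = [1, 4, 9, 16, 25]
z_square_inv = [25, 16, 9, 4, 1]

# Color given as a hexadecimal code (an arbitrary teal)
(line_hex,) = plt.plot(z, z_square, color='#2a9d8f')

# Color given by name
(line_named,) = plt.plot(z, z_square_inv, color='coral')

# Look up the hex code behind the named color
coral_hex = to_hex('coral')
print(coral_hex)
```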

In [None]:
# We can create subplots as part of the same figure
# plt.subplots returns both the figure and an array of all the individual axes
...

# Plots on the left subplot
...

# Plots on the right subplot
...

# Displays the entire figure
fig.show()
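
A minimal sketch of how the subplot cell could be completed, assuming a side-by-side layout: `plt.subplots(ncols=2)` returns the figure and a one-dimensional array of two axes, and each axes object has its own `plot` method.

```python
import matplotlib.pyplot as plt

z = [1, 2, 3, 4, 5]
z_square = [1, 4, 9, 16, 25]
z_log = [0.0, 0.69, 1.1, 1.39, 1.61]

# One row, two columns: axes is an array holding two Axes objects
fig, axes = plt.subplots(ncols=2)

# Plot on the left subplot
axes[0].plot(z, z_square)

# Plot on the right subplot
axes[1].plot(z, z_log)
```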

In [None]:
# We can label curves and move the legend
fig, axes = plt.subplots(ncols=2)

...

fig.show()
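
One way the legend cell might look, offered as a sketch: each curve gets a `label`, and `legend()` collects those labels. The `loc` argument moves the legend; the label text and the chosen location are illustrative assumptions.

```python
import matplotlib.pyplot as plt

z = [1, 2, 3, 4, 5]
z_square = [1, 4, 9, 16, 25]
z_log = [0.0, 0.69, 1.1, 1.39, 1.61]

fig, axes = plt.subplots(ncols=2)

# Left: labelled curves, legend placed automatically
axes[0].plot(z, z_square, label='z squared')
axes[0].plot(z, z_log, label='log(z)')
axes[0].legend()

# Right: same curves, legend moved to the lower right corner
axes[1].plot(z, z_square, label='z squared')
axes[1].plot(z, z_log, label='log(z)')
axes[1].legend(loc='lower right')

left_labels = [t.get_text() for t in axes[0].get_legend().get_texts()]
```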

In [None]:
# We can change the scale or plot a semilog graph directly
fig, axes = plt.subplots(nrows=3, ncols=2)

# First row, semilogx
...

# Second row, semilogy
...

# Third row, loglog
...

# Pads space between subplots
...
fig.show()
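
A sketch of the two equivalent routes hinted at in the cell above, under the assumption that the left column changes the scale after plotting and the right column uses the dedicated helper: `set_xscale`/`set_yscale` versus `semilogx`, `semilogy`, and `loglog`. `fig.tight_layout()` is one way to pad the space between subplots.

```python
import matplotlib.pyplot as plt

z = [1, 2, 3, 4, 5]
z_10s = [1, 10, 100, 1000, 10000]

fig, axes = plt.subplots(nrows=3, ncols=2)

# First row: logarithmic x-axis
axes[0, 0].plot(z_10s, z)
axes[0, 0].set_xscale('log')
axes[0, 1].semilogx(z_10s, z)

# Second row: logarithmic y-axis
axes[1, 0].plot(z, z_10s)
axes[1, 0].set_yscale('log')
axes[1, 1].semilogy(z, z_10s)

# Third row: both axes logarithmic
axes[2, 0].plot(z_10s, z_10s)
axes[2, 0].set_xscale('log')
axes[2, 0].set_yscale('log')
axes[2, 1].loglog(z_10s, z_10s)

# Pad the space between subplots so tick labels do not overlap
fig.tight_layout()
```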

In [None]:
# We can customize most aesthetics of the figure
fig, ax = plt.subplots()
ax.plot(z, z_log, color='coral')

# Hide top and right borders
...

# Annotate the graph
...

fig.show()
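
A possible completion of the aesthetics cell, as a sketch only: spines are the borders of the plot area and can be hidden individually, and `annotate` places text with an optional arrow pointing at a data coordinate. The annotation text and its position are illustrative choices.

```python
import matplotlib.pyplot as plt

z = [1, 2, 3, 4, 5]
z_log = [0.0, 0.69, 1.1, 1.39, 1.61]

fig, ax = plt.subplots()
ax.plot(z, z_log, color='coral')

# Hide top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Annotate the last point: xy is the point, xytext is where the text sits
note = ax.annotate(
    'last point',
    xy=(5, 1.61),
    xytext=(3.2, 1.5),
    arrowprops=dict(arrowstyle='->'),
)

ax.set_xlabel('z')
ax.set_ylabel('log(z)')
```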

There are lots of other types of useful plots! Check out the [matplotlib gallery](https://matplotlib.org/stable/gallery/index.html) for **a ton** of detailed examples.

- `bar()`: [Make a bar plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html#matplotlib.pyplot.bar).
- `boxplot()`: [Make a box and whisker plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html#matplotlib.pyplot.boxplot).
- `pie()`: [Plot a pie chart](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pie.html#matplotlib.pyplot.pie).
- `plot()`: [Plot y versus x as lines and/or markers](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot).
- `scatter()`: [A scatter plot of y vs x](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter).

In the above examples we used lists for simplicity, but Matplotlib converts its inputs to NumPy arrays internally. This means we can also pass NumPy arrays or columns of a Pandas `DataFrame` directly to the plotting functions.
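
To make this concrete, here is a small sketch that plots the same curve once from NumPy arrays and once from `DataFrame` columns. The column names `z` and `z_log` are chosen for this example only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 50 evenly spaced points between 1 and 5
x = np.linspace(1, 5, 50)

fig, ax = plt.subplots()

# NumPy arrays can be passed directly
(from_arrays,) = ax.plot(x, np.log(x))

# So can columns of a DataFrame
df = pd.DataFrame({'z': x, 'z_log': np.log(x)})
(from_columns,) = ax.plot(df['z'], df['z_log'], linestyle='--')
```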

#### <font color="coral"> Aside on seaborn </font>

Seaborn is a Python data visualization library built on top of matplotlib. It's often used for its R/ggplot-esque aesthetics.

The functions in Seaborn are tailored towards statistics and play much nicer with Pandas data frames. A full API reference can be found [here](https://seaborn.pydata.org/api.html).

In [None]:
import seaborn as sns

# Configure seaborn as the default for plotting.
sns.set()

# Jupyter magic line needed to render nicely in notebook
%matplotlib inline

fig, ax = plt.subplots()
ax.plot(z, z_square)
fig.show()

In [None]:
sns.set_style('white')

fig, ax = plt.subplots()
ax.plot(z, z_square)
ax.plot(z, z_square_inv)
ax.plot(z, z_log)
fig.show()

## <font color="coral"> Exploratory Data Analysis with Pandas and Matplotlib </font>

Visualization is a major tool for exploratory data analysis. We can use visualizations both to help us answer questions about our data and uncover new questions to answer.

#### <font color="coral"> Our practice data </font>

The data we will be using is volcano eruption and event data, provided publicly by [RforDataScience](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-12/).

Let's look at the data now!

After loading our data, we can remind ourselves of the names and data types of each column using the `.info()` method.

In [None]:
...

In [None]:
...

#### <font color="coral">Investigating distribution of volcano elevation by region</font>

Suppose we want to understand how the distribution of volcano elevation changes between geographic regions. This is the perfect use case for a box plot!

First, let's start off by creating a single box plot for the entire dataset.

In [None]:
# Instead of a list, we can pass in a column from a Pandas DataFrame to Pyplot directly!
...

# We set axis labels with plt.ylabel and plt.xlabel
...

# We set the title with plt.title
...

plt.show()
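
The pattern described in the cell above can be sketched on a tiny stand-in table. Everything about `toy` is an assumption made for illustration: the values are made up, and the column name `elevation` and the unit in the axis label may not match the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data, not the real volcano dataset
toy = pd.DataFrame({'elevation': [1200, -350, 2500, 3100, -80, 900]})

# A DataFrame column can be passed to Pyplot directly
plt.boxplot(toy['elevation'])

# Axis label and title
plt.ylabel('Elevation (m)')
plt.title('Elevation of toy volcanoes')
```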

We can use `.describe()` from Pandas to quickly get common statistics.

In [None]:
...

So what's going on with the negative elevation volcanoes? Filter the data frame to only include those with negative elevation and inspect the first few rows using `.head()` from Pandas.

In [None]:
...
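
Filtering with a boolean mask might look like the following sketch. The `toy` table and its column names are made up for illustration and are not the real dataset.

```python
import pandas as pd

# Made-up stand-in data, not the real volcano dataset
toy = pd.DataFrame({
    'name': ['V1', 'V2', 'V3', 'V4', 'V5', 'V6'],
    'elevation': [1200, -350, 2500, 3100, -80, 900],
})

# Boolean mask: True for rows with negative elevation
below_sea_level = toy['elevation'] < 0

# Keep only those rows and look at the first few
submarine = toy[below_sea_level]
print(submarine.head())
```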

Our dataset includes submarine volcanoes! This is a great example of how creating quick visualizations can help uncover otherwise hard-to-spot details in our dataset by pointing us in the right direction.

Now let's move on to creating a separate box plot for each region. Before plotting, we should check how many unique regions there are.

In [None]:
...

14 regions might be too many to fit on a single plot. Let's filter our data to only include volcanoes in Alaska, Canada and the Western US, and Mexico and Central America.

In [None]:
...
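
A sketch of counting the unique regions and keeping a subset of them, using `.nunique()` and `.isin()`. The `toy` table, the `region` column name, and the region labels are placeholders; the labels in the real dataset will differ.

```python
import pandas as pd

# Made-up stand-in data with placeholder region labels
toy = pd.DataFrame({
    'region': ['Region A', 'Region A', 'Region B',
               'Region B', 'Region C', 'Region C'],
    'elevation': [1200, -350, 2500, 3100, -80, 900],
})

# How many distinct regions are there?
n_regions = toy['region'].nunique()

# Keep only rows whose region is in the list of interest
regions_of_interest = ['Region A', 'Region B']
subset = toy[toy['region'].isin(regions_of_interest)]
```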

Without visualizing, we can use Pandas to estimate the distribution of elevation by region. `.describe()` works on data frames grouped by one or more columns too!

In [None]:
...
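
Grouped summary statistics can be sketched as below: `groupby` splits the rows by region, and `.describe()` then summarizes the selected column within each group. The data and column names are made up for illustration.

```python
import pandas as pd

# Made-up stand-in data with placeholder region labels
toy = pd.DataFrame({
    'region': ['Region A', 'Region A', 'Region B',
               'Region B', 'Region C', 'Region C'],
    'elevation': [1200, -350, 2500, 3100, -80, 900],
})

# One row of statistics (count, mean, std, min, quartiles, max) per region
stats = toy.groupby('region')['elevation'].describe()
print(stats)
```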

In [None]:
# Create a grouped boxplot using seaborn
fig, ax = plt.subplots()
...

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Rotate x-axis labels by 15 degrees
ax.set_xticklabels(ax.get_xticklabels(), rotation=15, ha="right")
fig.show()
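
One way a grouped box plot can be drawn with Seaborn, shown on made-up data: `sns.boxplot` takes the data frame plus the column names for each axis, and `ax=` draws onto an existing Matplotlib axes. The column names are assumptions, and `tick_params` is used here as an alternative way to rotate the tick labels.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data with placeholder region labels
toy = pd.DataFrame({
    'region': ['Region A', 'Region A', 'Region A',
               'Region B', 'Region B', 'Region B'],
    'elevation': [1200, -350, 700, 2500, 3100, 1900],
})

fig, ax = plt.subplots()

# One box per region, drawn on the axes we created
sns.boxplot(data=toy, x='region', y='elevation', ax=ax)

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Rotate x-axis tick labels by 15 degrees
ax.tick_params(axis='x', labelrotation=15)
```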

#### <font color="coral"> Investigating correlation between latitude and longitude </font>

Suppose we want to understand the relationship between the latitude and longitude of volcanoes and whether they are correlated.

First let's plot a scatterplot of the latitude and longitude of each volcano colored by its region.

In [None]:
fig, ax = plt.subplots()
...

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

fig.show()
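
A sketch of a scatter plot colored by a categorical column, using Seaborn's `hue` argument. The coordinates and region labels below are made up, and the column names are assumptions about the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data with placeholder region labels
toy = pd.DataFrame({
    'region': ['Region A', 'Region A', 'Region B', 'Region B'],
    'latitude': [61.0, 58.0, 46.0, 40.0],
    'longitude': [-150.0, -140.0, -122.0, -118.0],
})

fig, ax = plt.subplots()

# hue assigns one color per region and adds a legend
sns.scatterplot(data=toy, x='longitude', y='latitude', hue='region', ax=ax)

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
```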

In [None]:
# Let's select only the Canadian and Western American volcanoes
# and fit a regression line.
ca_us_volcanoes = ...

fig, ax = plt.subplots()
...

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

fig.show()
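
Fitting and drawing a regression line could be sketched with `sns.regplot`, which overlays a linear fit on a scatter plot. The coordinates are made up; `ci=None` turns off the confidence band to keep the example small, and Pandas' `.corr()` gives the Pearson correlation as a numeric companion to the plot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in coordinates, not the real volcano dataset
toy = pd.DataFrame({
    'latitude': [61.0, 58.0, 54.0, 46.0, 40.0],
    'longitude': [-150.0, -140.0, -130.0, -122.0, -118.0],
})

fig, ax = plt.subplots()

# Scatter plot with a fitted straight line, no confidence band
sns.regplot(data=toy, x='longitude', y='latitude', ci=None, ax=ax)

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Pearson correlation between the two columns
r = toy['longitude'].corr(toy['latitude'])
print(r)
```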

#### <font color="coral"> Now it's your turn! </font>

What's one question that you have about the data we've loaded here? Are there any types of plots that you want to learn how to use?

Use the space below to explore the data and answer your question.