# Introduction to Matplotlib and Seaborn



## Introduction to Matplotlib

Matplotlib is a data visualization package for making 2D plots. It is built on numpy arrays and uses compiled extension code to provide good performance even for large arrays.

The package is fairly old, with version 0.1 released in 2003. The main advantage of this package is that it allows for flexible visualization in Python across versions and operating systems.

It is important to note the naming: the library is called Matplotlib, whereas the module we import for plotting is matplotlib.pyplot.

Anatomy of a figure:

![alt text](https://pbs.twimg.com/media/Cr5jxB-UkAAmvBn?format=jpg&name=medium)
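
The main pieces named in the diagram can also be created and labelled from code. The cell below is a minimal sketch, assuming the object-oriented interface (plt.subplots); the data is made up purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (not part of the lesson's datasets)
x = np.linspace(0, 2 * np.pi, 50)

# The Figure is the whole canvas; an Axes is one plotting area inside it
fig, ax = plt.subplots()

# plot() returns a list of line artists drawn on the Axes
(line,) = ax.plot(x, np.sin(x), label="sin(x)")

ax.set_title("Title")          # text above the Axes
ax.set_xlabel("x axis label")  # label under the horizontal axis
ax.set_ylabel("y axis label")  # label beside the vertical axis
ax.grid(True)                  # grid lines at the major ticks
ax.legend()                    # legend built from the line labels

plt.show()
```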


### Basic plots in Matplotlib:

- Line Plot
- Scatter plot
- Pie charts
- Histograms
- Bar charts
- Paths
- Three dimensional plotting
- Images
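
Histograms and bar charts appear in the list above but are not covered in detail later in this lesson, so here is a short sketch of both. The values, category names, and counts are invented for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration only
rng = np.random.default_rng(0)
values = rng.normal(size=200)   # 200 samples from a standard normal
categories = ["A", "B", "C"]    # hypothetical category names
counts = [5, 9, 3]              # hypothetical counts per category

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: distribution of one numeric variable, split into 20 bins
ax_hist.hist(values, bins=20)
ax_hist.set_title("Histogram")

# Bar chart: one bar per category
ax_bar.bar(categories, counts)
ax_bar.set_title("Bar chart")

plt.show()
```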

### Getting Started

Since we are using Jupyter notebook, we start our notebook with this line:

In [None]:
%matplotlib inline

This ensures that our visualizations will render in the notebook itself.

We now load the matplotlib library:

In [None]:
import matplotlib.pyplot as plt

### Basic Plots



#### Line Plot
Line plots can be linear or non-linear depending on the kind of data we have. A simple representation has been shown below:

![alt text](https://docs.oracle.com/cd/E57185_01/CBREG/images/graphics/linearvsnonlinear.gif)

We can generate two numpy arrays and then plot them as a line plot.


In [None]:
import numpy as np

In [None]:
x1 = np.arange(0,10,1)
x2 = np.linspace(0,5,10)

In [None]:
np.linspace(0,5,11)
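
The two constructors used above behave differently: np.arange takes a step size and excludes the stop value, while np.linspace takes a number of points and includes the endpoint. A small sketch comparing them (the variable names a, b, and c are only for this example):

```python
import numpy as np

a = np.arange(0, 10, 1)     # step of 1; the stop value 10 is excluded
b = np.linspace(0, 5, 10)   # 10 evenly spaced points; the endpoint 5 is included
c = np.linspace(0, 5, 11)   # 11 points give a round spacing of 0.5

print(len(a), a[-1])        # number of points and last value
print(len(b), b[1] - b[0])  # number of points and spacing
print(len(c), c[1] - c[0])
```

Both x1 and x2 have 10 elements, which is why they can be plotted against each other.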

In [None]:
plt.plot(x1, x1, label='linear')
plt.plot(x1, x2**2, label='quadratic')

plt.title("Linear vs Non Linear Plots")
plt.legend()
plt.show()

#### Making Plots Bigger or Smaller

In [None]:
plt.subplots(figsize=(3,10))

plt.plot(x1, x1, label='linear')
plt.plot(x1, x2**2, label='quadratic')

plt.title("Linear vs Non Linear Plots")
plt.legend()
plt.show()

#### Making Subplots



In [None]:
fig, [ax1, ax2] = plt.subplots(1,2)
ax1.plot(x1,x2)
ax1.set_title('LINEAR PLOT')   
ax1.set_xlabel('x label')     
ax1.set_ylabel('y label')   

ax2.plot(np.cos(x2))
ax2.set_title('CURVE PLOT')
ax2.set_xlabel('x label')
ax2.set_ylabel('y label')

plt.show()


In [None]:
fig, [ax1, ax2, ax3] = plt.subplots(1,3)
ax1.plot(x1,x2)
ax1.set_title('LINEAR PLOT')   
ax1.set_xlabel('x label')     
ax1.set_ylabel('y label')   

ax2.plot(np.cos(x2))
ax2.set_title('CURVE PLOT')
ax2.set_xlabel('x label')
ax2.set_ylabel('y label')

ax3.plot(x1**3)
ax3.set_title('CURVE PLOT')
ax3.set_xlabel('x label')
ax3.set_ylabel('y label')

plt.show()


In [None]:
fig, [ax1, ax2, ax3, ax4] = plt.subplots(1,4)
ax1.plot(x1,x2)
ax1.set_title('LINEAR PLOT')   
ax1.set_xlabel('x label')     
ax1.set_ylabel('y label')   

ax2.plot(np.cos(x2))
ax2.set_title('CURVE PLOT')
ax2.set_xlabel('x label')
ax2.set_ylabel('y label')

ax3.plot(x1**3)
ax3.set_title('CURVE PLOT')
ax3.set_xlabel('x label')
ax3.set_ylabel('y label')

ax4.plot(np.arange(0,10,1)/3)
ax4.set_title('LINEAR PLOT')
ax4.set_xlabel('x label')
ax4.set_ylabel('y label')

plt.show()
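
As the number of panels grows, repeating the same calls for every axes gets long. One possible alternative, sketched below, is to arrange the panels in a grid and loop over them. The 2x2 layout and the panel titles are choices made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

x1 = np.arange(0, 10, 1)
x2 = np.linspace(0, 5, 10)

# (title, y values) pairs mirroring the four panels above
panels = [
    ("x2", x2),
    ("cos(x2)", np.cos(x2)),
    ("x1**3", x1**3),
    ("x1 / 3", x1 / 3),
]

# plt.subplots(2, 2) returns a 2x2 numpy array of Axes
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# axes.flat iterates over the grid row by row
for ax, (title, y) in zip(axes.flat, panels):
    ax.plot(y)
    ax.set_title(title)
    ax.set_xlabel('x label')
    ax.set_ylabel('y label')

fig.tight_layout()  # add spacing so labels do not overlap
plt.show()
```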


#### Making Subplots Bigger or Smaller

In [None]:
fig, [ax1, ax2] = plt.subplots(1,2, figsize = (10,4))
ax1.plot(x1,x2)
ax1.set_title('LINEAR PLOT')   
ax1.set_xlabel('x label')     
ax1.set_ylabel('y label')   

ax2.plot(np.cos(x2))
ax2.set_title('CURVE PLOT')
ax2.set_xlabel('x label')
ax2.set_ylabel('y label')

plt.show()


#### Scatter Plots

Scatter plots are very useful for seeing the relationship between two variables, for example whether and how strongly they are correlated.

Let's consider an example where we are given a data-set with price of houses in a neighborhood. Some of the variables on which the price could depend include area and number of years since it was built.

Positive correlation: When one variable increases with the increase in the other variable, and vice-versa

Negative correlation: When one variable decreases with the increase in the other variable, and vice-versa
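
To make the two definitions concrete, the sketch below simulates the house-price example with synthetic numbers. The coefficients, ranges, and noise level are invented assumptions, not real housing data. The correlation coefficient from np.corrcoef is positive in the first case and negative in the second.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only
rng = np.random.default_rng(42)
area = rng.uniform(50, 250, size=100)  # hypothetical area
age = rng.uniform(0, 50, size=100)     # hypothetical years since built

# Price rises with area and falls with age, plus random noise
price_vs_area = 2.0 * area + rng.normal(0, 10, size=100)
price_vs_age = 300 - 3.0 * age + rng.normal(0, 10, size=100)

# Pearson correlation coefficient between each pair of variables
r_pos = np.corrcoef(area, price_vs_area)[0, 1]
r_neg = np.corrcoef(age, price_vs_age)[0, 1]
print(r_pos, r_neg)

fig, (ax_pos, ax_neg) = plt.subplots(1, 2, figsize=(10, 4))
ax_pos.scatter(area, price_vs_area)
ax_pos.set_title('Positive correlation')
ax_neg.scatter(age, price_vs_age)
ax_neg.set_title('Negative correlation')
plt.show()
```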

In [None]:
x = np.random.randn(10)
y = np.random.randn(10)

In [None]:
plt.scatter(x,y)
plt.xlabel('X axis')
plt.ylabel('Y axis')

plt.show()

#### Pie Charts

A pie chart is a circular statistical graphic divided into slices, where the size of each slice illustrates a numerical proportion. It is an important business tool for visually comparing the percentage shares of the different components under investigation.

In [None]:
labels = ['Vegetarians', 'Non-Vegetarians', 'Vegans']
sizes = [40, 10, 15]
explode = (0, 0.4, 0)

In [None]:
fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
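
The sizes above sum to 65 rather than 100. By default, pie scales the values by their total, so the percentages printed by autopct are each value's share of that sum. A quick check of the arithmetic:

```python
sizes = [40, 10, 15]

total = sum(sizes)  # 65
# Share of each slice, rounded to one decimal like the '%1.1f%%' format
percentages = [round(100 * s / total, 1) for s in sizes]

print(percentages)  # [61.5, 15.4, 23.1]
```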

## Introduction to Seaborn

Seaborn is a data visualization library that was created to complement matplotlib and it is closely integrated with pandas data structures. Some of the advantages of this library are:

- Reduces the amount of code needed to create matplotlib visualizations
- Works directly with pandas DataFrames, which is a big advantage over plain matplotlib
- Provides specialized support for using categorical variables
- Has functions to examine relationships between multiple variables
- Provides convenient views onto the overall structure of complex datasets

### Getting Started

Let's proceed with importing our libraries

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
#sns.set()

### Basic Plots

#### Line Plots

We can create a Pandas DataFrame from a numpy array like the ones we created earlier.

In [None]:
data = pd.DataFrame()
data['X'] = np.arange(0,10,1)
data['Y'] = data['X']**2
data['Z'] = data['Y']**2

In [None]:
data

Now that we have our DataFrame, we can plot the data using seaborn.

In [None]:
sns.lineplot(x="X", y="Y", data=data)

#### Scatter Plots
Here we use the seaborn lmplot function and set the option to fit the regression line to false.

In [None]:
sns.lmplot(x="X", y="Y", data=data, fit_reg=False)

In [None]:
#sns.lmplot(data.X, data.Y, data= data, fit_reg=False)

With fit_reg=False we get a plain scatter plot. lmplot can also enhance the scatter plot with a linear regression model: leaving fit_reg at its default value of True draws the fitted line.
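
A minimal sketch of the regression overlay, rebuilding the same X and Y columns so the cell runs on its own. The variable names reg_data and grid are only used in this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same X and Y columns as in the DataFrame above
reg_data = pd.DataFrame({"X": np.arange(0, 10, 1)})
reg_data["Y"] = reg_data["X"] ** 2

# fit_reg defaults to True, so a fitted line and confidence band are drawn
grid = sns.lmplot(x="X", y="Y", data=reg_data)
plt.show()
```

Because Y is quadratic in X, the straight line fits the points only roughly, which is easy to see on the plot.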

### Using Pandas

Matplotlib has been integrated into Pandas, so we can create matplotlib visualizations directly from Pandas objects without using the matplotlib or seaborn syntax. However, we still need to import matplotlib and set %matplotlib inline.
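
As a rough illustration of plotting straight from Pandas, the sketch below uses DataFrame.plot on a small DataFrame like the one built earlier. The name df and the plot title are choices made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": np.arange(0, 10, 1)})
df["Y"] = df["X"] ** 2

# DataFrame.plot returns the matplotlib Axes it drew on
ax = df.plot(x="X", y="Y", kind="line", title="Y = X squared")

# The kind argument selects the plot type, here a scatter plot
df.plot(x="X", y="Y", kind="scatter")

plt.show()
```

Since the return value is a regular matplotlib Axes, methods such as set_xlabel can still be used to customize the result.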

#### Working on Dataframes Using Seaborn
Let's start by reading the 'vehicles.csv' file that we downloaded earlier.

In [None]:
data = pd.read_csv('vehicles.csv') 
data.head()

In [None]:
sns.scatterplot(x="CO2 Emission Grams/Mile", y="Highway MPG", data=data)

In [None]:
sns.scatterplot(x="CO2 Emission Grams/Mile", y="Highway MPG", hue="Drivetrain", data=data)

In [None]:
sns.scatterplot(x="CO2 Emission Grams/Mile", y="Highway MPG", hue="Drivetrain", size = "Year", data=data)

In [None]:
plt.subplots(figsize=(15,5))  # figsize is (width, height) in inches
sns.scatterplot(x="CO2 Emission Grams/Mile", y="Highway MPG", hue="Drivetrain", size = "Year", data=data)

Seaborn provides a host of functions that let you present the data in a more meaningful and visually appealing way.

##### relplot: FacetGrid
Seaborn 0.9 introduced this function, which combines lineplot() and scatterplot() on top of a FacetGrid.

The FacetGrid class helps in visualizing the relationship between multiple variables on the same figure by introducing extra dimensions through parameters such as col (for column), hue, style, and size. For example, a plot using relplot() is shown below:

In [None]:
sns.relplot(x='CO2 Emission Grams/Mile', y='Highway MPG', data=data)

In [None]:
sns.relplot(x='CO2 Emission Grams/Mile', y='Highway MPG', hue="Fuel Type", data=data)

In [None]:
sns.relplot(x='CO2 Emission Grams/Mile', y='Highway MPG', hue="Fuel Type", size="Fuel Type", data=data)
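
The cells above use hue and size. The col parameter mentioned earlier splits the data into side-by-side panels instead. The sketch below uses a small made-up DataFrame so that it runs without vehicles.csv; the column names and values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not taken from vehicles.csv
toy = pd.DataFrame({
    "emissions": [300, 350, 400, 450, 320, 380, 420, 480],
    "mpg": [30, 26, 22, 19, 29, 24, 21, 17],
    "fuel": ["Regular"] * 4 + ["Premium"] * 4,
})

# col="fuel" creates one panel per fuel value
grid = sns.relplot(x="emissions", y="mpg", col="fuel", data=toy)
plt.show()
```

relplot returns the FacetGrid it created, and grid.axes holds one Axes per panel.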

##### Catplot

catplot() creates specialized categorical plots. It is used to show the relationship between a numerical variable and one or more categorical variables.

In [None]:
sns.catplot(x="Highway MPG", y="Drivetrain", data=data)

In [None]:
sns.catplot(x="Highway MPG", y="Drivetrain", hue="Fuel Type", data=data)

In [None]:
sns.catplot(x="Highway MPG", y="Drivetrain", hue="Fuel Type", data=data, height=20)
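
catplot draws a strip plot by default, and its kind parameter switches to other categorical plot types such as box plots. A short sketch on made-up data follows; the column names and values are hypothetical and stand in for the vehicles dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not taken from vehicles.csv
toy_cat = pd.DataFrame({
    "mpg": [30, 26, 22, 19, 29, 24, 21, 17, 33, 27],
    "drivetrain": ["FWD", "FWD", "RWD", "RWD", "FWD",
                   "RWD", "AWD", "AWD", "FWD", "AWD"],
})

# kind="box" summarizes each category with a box plot
box_grid = sns.catplot(x="mpg", y="drivetrain", kind="box", data=toy_cat)
plt.show()
```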

## Summary

In this lesson we learned how to create visualizations using three different tools: Matplotlib, Seaborn, and Pandas. All three use matplotlib under the hood. However, Seaborn and Pandas build on the original library and enable us to plot DataFrames directly.