# Plotting Multiple Data Series 

## Introduction 

A single data series often tells only part of the story. In this lesson we will plot several series in one chart so that we can compare them directly, using multiple line plots, side-by-side bar plots, and scatter matrices.

## MULTIPLE LINE PLOTS
Recalling our vehicles dataset, we might want to compare the relationship between city MPG, highway MPG and CO2 emissions.

To do this, we can use the .plot method of a pandas DataFrame. Its x argument names the column for the x axis, and its y argument takes one column name or a list of column names for the y axis. We will put CO2 emissions on the x axis and the MPG variables on the y axis.

To get a meaningful visualization, we should sort our DataFrame by these variables first. pandas does not sort the rows before plotting; it simply draws a line between consecutive rows in the order in which they are stored. On unsorted data the line jumps back and forth across the chart, which makes it very hard to read.
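The effect of row order can be seen on a very small example. The sketch below uses a made-up five-row DataFrame (the column names co2 and city_mpg and all values are invented for illustration, not taken from the vehicles dataset) and plots it once as stored and once after sorting by the x column.

```python
import pandas as pd

# Toy data, invented for illustration and deliberately out of order on x.
toy = pd.DataFrame({
    "co2": [400, 250, 550, 300, 480],
    "city_mpg": [22, 35, 16, 30, 18],
})

# Plotting as-is connects the rows in stored order, so the line doubles back.
ax_unsorted = toy.plot(x="co2", y="city_mpg", title="Unsorted rows")

# Sorting by the x column first makes the line run from left to right.
toy_sorted = toy.sort_values(by="co2")
ax_sorted = toy_sorted.plot(x="co2", y="city_mpg", title="Sorted by x")

# The x coordinates of each drawn line, in drawing order.
unsorted_x = list(ax_unsorted.get_lines()[0].get_xdata())
sorted_x = list(ax_sorted.get_lines()[0].get_xdata())
```

Here sort_values returns a new DataFrame; passing inplace=True, as the lesson does, modifies the existing one instead.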

In the code examples below, we use the vehicles dataset.

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
vehicles = pd.read_csv('vehicles.csv')

In [None]:
vehicles.head()

In [None]:
vehicles.sort_values(by=["CO2 Emission Grams/Mile", "City MPG", "Highway MPG"], inplace=True)

In [None]:
vehicles.plot(x="CO2 Emission Grams/Mile", y=["City MPG"])

In [None]:
vehicles.plot(x="CO2 Emission Grams/Mile", y=["Highway MPG"])

In [None]:
vehicles.plot(x="CO2 Emission Grams/Mile", y=["City MPG", "Highway MPG"])

## MULTIPLE BAR PLOTS
When plotting categorical data, it is often useful to place two or more groups side by side so that we can compare them. There are a few ways of creating such a plot.

### Side By Side Bar Plots
If we include multiple columns in our bar plot, they will show up side by side in different colors.

In the example below we aggregate both highway and city MPG by drivetrain. Because a bar plot draws one bar per group for each column, we first reduce each group to a single value per column, here the mean.
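To make the shape of the aggregated table concrete, here is a minimal sketch on invented data. The column names mirror the vehicles dataset, but the rows and values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the vehicles data; all values are invented.
toy = pd.DataFrame({
    "Drivetrain": ["FWD", "FWD", "RWD", "RWD", "AWD", "AWD"],
    "City MPG": [28, 30, 18, 20, 21, 23],
    "Highway MPG": [36, 38, 26, 28, 27, 29],
})

# One row per drivetrain, one column per measure.
toy_mean = toy.groupby("Drivetrain")[["City MPG", "Highway MPG"]].mean()
print(toy_mean)

# Each column becomes its own colored set of bars, grouped by index label.
ax = toy_mean.plot.bar()
ax.set_ylabel("Mean MPG")
```

The index of the aggregated table supplies the group labels on the x axis, and each remaining column supplies one bar per group.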

In [None]:
vehicles[["Highway MPG", "City MPG"]].agg("mean")

In [None]:
vehicles[["Highway MPG", "City MPG", "Drivetrain"]].groupby(["Drivetrain"]).agg("mean")

In [None]:
vehicles_mean = vehicles[["Highway MPG", "City MPG", "Drivetrain"]].groupby(["Drivetrain"]).agg("mean")
vehicles_mean.plot.bar()

In [None]:
vehicles[["Highway MPG", "City MPG", "Drivetrain"]].groupby(["Drivetrain"]).agg("mean").plot.bar() 

### Side By Side Horizontal Bar Plots
We can use the .barh function to produce horizontal bars.

In [None]:
vehicles_mean.plot.barh()

## SCATTER MATRICES
A scatter matrix is a useful tool, particularly in exploratory data analysis, because it lets us look at the pairwise relationships between multiple variables at the same time. Typically we look for linear relationships between pairs of variables, since this information can help us later when modeling the data. We can also detect nonlinear relationships, such as a logarithmic or exponential relationship between two variables. In that case, we can apply a transformation to one or both variables to produce a linear relationship.
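As a small illustration of linearizing a relationship, the sketch below builds synthetic, noise-free exponential data (not from the vehicles dataset; the constants 2.0 and 0.5 are arbitrary) and compares the Pearson correlation before and after taking the logarithm of y.

```python
import numpy as np

# Synthetic exponential data, chosen only for illustration.
x = np.linspace(1, 10, 50)
y = 2.0 * np.exp(0.5 * x)

# Pearson correlation measures how close the pairs fall to a straight line.
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log(y))[0, 1]

print(f"correlation of x with y:      {r_raw:.3f}")
print(f"correlation of x with log(y): {r_log:.3f}")
```

Because log(y) equals log(2.0) + 0.5 * x for this data, the transformed pairs lie on a straight line, while the raw pairs do not. Real data will contain noise, so the improvement is usually less clean.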

We will be using the scatter_matrix function from pandas.plotting. This function creates a scatter plot for every pair of numeric columns in our data.

By default, the scatter matrix displays a histogram of each variable along the diagonal. We can show a kernel density estimate along the diagonal instead by passing diagonal="kde".
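A minimal sketch of the density option follows, on synthetic data with hypothetical column names a, b and c. It assumes SciPy is installed, since pandas relies on it to compute the density curves.

```python
import numpy as np
import pandas as pd

# Synthetic numeric data, seeded so the figure is reproducible.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
})
# A third column that is correlated with "a".
toy["c"] = toy["a"] + 0.5 * rng.normal(size=100)

# diagonal="hist" is the default; "kde" draws a density curve instead.
axes = pd.plotting.scatter_matrix(toy, diagonal="kde", figsize=(6, 6))
```

The function returns a square array of Axes with one row and one column per numeric variable, which can be used to adjust individual panels afterwards.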

In [None]:
pd.plotting.scatter_matrix(vehicles)  # optionally pass figsize=(width, height) for a larger grid
plt.show()

This visualization may seem a bit cluttered, but it tells us quite a bit about our data. The main takeaway is that combined MPG, city MPG and highway MPG are linearly related to one another. By contrast, the MPG variables have a nonlinear relationship with CO2 emissions and with fuel cost per year. Those pairs of variables could benefit from a transformation that makes the relationships linear.
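One transformation that might suit the MPG and CO2 pair is the reciprocal, on the assumption that emissions per mile scale with fuel burned per mile, that is, with 1 / MPG. The sketch below checks the idea on synthetic data only; the constant K, the column names and the values are made up, so the real dataset would need to be inspected before relying on it.

```python
import pandas as pd

# Synthetic data: CO2 is generated as K / MPG, with K a made-up constant.
K = 9000.0
toy = pd.DataFrame({"mpg": [10.0, 15.0, 20.0, 30.0, 45.0, 60.0]})
toy["co2"] = K / toy["mpg"]

# Transform MPG into gallons per mile, which is proportional to CO2 here.
toy["gallons_per_mile"] = 1.0 / toy["mpg"]

r_raw = toy["mpg"].corr(toy["co2"])
r_transformed = toy["gallons_per_mile"].corr(toy["co2"])
print(f"mpg vs co2:              {r_raw:.3f}")
print(f"gallons_per_mile vs co2: {r_transformed:.3f}")
```

Plotting the transformed column against CO2 in a scatter matrix would then show a straight line for this toy data instead of a curve.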



In [None]:
import seaborn as sns

Seaborn's pairplot function draws a similar grid of pairwise scatter plots, and its hue parameter colors the points by the values of a chosen column.

In [None]:
sns.pairplot(vehicles)

In [None]:
sns.pairplot(vehicles, hue = "Cylinders") 

## Summary 

In this lesson we learned how to plot multiple pieces of information in one chart. We plotted two line graphs in one chart, as well as two sets of bars in one bar chart. Additionally, we created a scatter matrix showing all pairwise combinations of the numeric variables in a dataset. These charts can be very useful; however, we sometimes need to apply appropriate data transformations to make them easier to interpret. The code snippets presented in this lesson can serve as a template for creating your own dataset visualizations in your projects.