# Plotting

Up until now, all we have done is load, manipulate, change, and save our data. Examining the data this way is useful, but it also helps to look at it in plots and graphs to get a better understanding of what it shows.

Many of the plotting functions that pandas provides are thin wrappers around Python's matplotlib library. The following graphs can be drawn directly from DataFrame objects, and all except scatter and hexbin plots can also be drawn from Series objects:
* Histograms
* Density Plots
* Scatterplots
* Hexbin Plots
* Boxplots
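Because these methods are wrappers, the single-plot calls used in this chapter return an ordinary matplotlib `Axes` object, so matplotlib's own methods can be used to finish the plot. Here is a minimal sketch, assuming the non-interactive `Agg` backend so it also runs outside Jupyter. The ten ages are the Age column of the students dataframe, and the label and title text are only illustrations:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

ages = pd.Series([18, 18, 19, 23, 22, 18, 28, 20, 21, 24], name="Age")

# pandas builds the histogram with matplotlib and hands back the Axes it drew on
ax = ages.plot.hist()
print(isinstance(ax, Axes))  # True

# Because ax is a plain matplotlib Axes, matplotlib methods work on it directly
ax.set_xlabel("Age (years)")
ax.set_title("Student ages")
plt.close(ax.figure)
```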

The following examples build different graphs and plots from the students dataframe created in the last chapter. The first code cell recreates it so that the examples can be run on their own.

## Histogram

```python
# This is an example of a histogram based on the Age column of the students dataframe
%matplotlib notebook
# (Jupyter magic: show interactive figures inside the notebook)
import pandas
import matplotlib.pyplot as plt

students = pandas.DataFrame({
    'Age' : [18,18,19,23,22,18,28,20,21,24],
    'Standing' : ['Freshman', 'Freshman', 'Sophomore', 'Senior', 'Senior', 'Freshman', 'Junior', 'Sophomore', 'Junior', 'Senior'],
    'Major' : ['CITE','COSC','POSC','MATH','CITE','PHYS','COSC','CITE','ANTH','SOCI'],
    'Grade' : [18, 19, 91, 96, 78, 82, 90, 79, 89, 85]},
    index=['Student 1','Student 2','Student 3','Student 4','Student 5','Student 6','Student 7','Student 8','Student 9','Student 10'],
    columns=['Age', 'Standing', 'Major', 'Grade']
)

students_age = students['Age']

students_age.hist()
plt.show()
```

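Each bar in the histogram counts how many values fall into one bin. `Series.hist()` uses 10 equal-width bins by default, so for ages from 18 to 28 every bin is one year wide. The sketch below, assuming the same ten ages and the `Agg` backend, recomputes the counts with `numpy.histogram` and compares them with the heights of the bars matplotlib drew:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ages = pd.Series([18, 18, 19, 23, 22, 18, 28, 20, 21, 24], name="Age")

# Counts and bin edges for 10 equal-width bins spanning min(ages)..max(ages)
counts, edges = np.histogram(ages, bins=10)
print(edges)   # [18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28.]
print(counts)  # [3 1 1 1 1 1 1 0 0 1]

# Draw the pandas histogram and read the bar heights back
ax = ages.hist(bins=10)
heights = [patch.get_height() for patch in ax.patches]
print(heights == counts.tolist())  # True
plt.close(ax.figure)
```

The three 18-year-olds make the first bar the tallest, and the 25 to 27 bins are empty, which is why the plot shows a gap before the single 28.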

Instead of calling `hist()` on its own, you can first create the figure and axes yourself with matplotlib's `plt.subplots()` function and then tell pandas to draw into them.

`plt.subplots()` returns a tuple of two objects. The first is the Figure, the container for the whole plot. Figure objects let us change figure-level attributes, such as the overall size, and save the figure to an image file, but we won't do much with them here. The second is the Axes object, the area the graph is actually drawn on. The type of graph is chosen by the pandas method we call; passing the Axes through the `ax` argument tells pandas where to draw it.

For example, to draw the same Age histogram using `plt.subplots()`:


```python
fig, ax = plt.subplots()
ax = students['Age'].plot.hist(ax=ax)  # draw a histogram of the Age column onto ax
plt.show()
```

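Since `plt.subplots()` hands back both objects, you can label the plot through the Axes and save it through the Figure. A minimal sketch, assuming the `Agg` backend and a temporary directory; `age_histogram.png` is a hypothetical file name chosen for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend

import matplotlib.pyplot as plt
import pandas as pd

ages = pd.Series([18, 18, 19, 23, 22, 18, 28, 20, 21, 24], name="Age")

fig, ax = plt.subplots()   # fig: the whole figure, ax: the plotting area inside it
ages.plot.hist(ax=ax)      # pandas draws onto the Axes we pass in

# Axes-level settings: title and axis label
ax.set_title("Distribution of student ages")
ax.set_xlabel("Age (years)")

# Figure-level settings: overall size and saving to an image file
fig.set_size_inches(6, 4)
out_path = os.path.join(tempfile.mkdtemp(), "age_histogram.png")  # hypothetical file name
fig.savefig(out_path, dpi=100)
plt.close(fig)

print(os.path.getsize(out_path) > 0)  # True: the PNG was written
```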

You can also plot several columns of a dataframe at once. If the values in those columns overlap, it becomes hard to tell where one bar stops and the next begins. The `alpha` argument of `plot.hist()` sets the transparency of the bars, from 0 (fully transparent) to 1 (opaque), so that overlapping bars remain visible.

```python
fig, ax = plt.subplots()
ax = students[['Age', 'Grade']].plot.hist(alpha=0.5, ax=ax)
plt.show()
```

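The bars overlap because, as far as we can tell from the pandas implementation, `DataFrame.plot.hist()` computes one set of bin edges from all plotted columns together, so Age and Grade are counted in the same bins. The sketch below mimics that binning with NumPy, assuming the default of 10 bins. It shows that the only bin where both columns have a bar is the first one, which is where the transparency matters:

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Age':   [18, 18, 19, 23, 22, 18, 28, 20, 21, 24],
    'Grade': [18, 19, 91, 96, 78, 82, 90, 79, 89, 85],
})

# One set of 10 equal-width edges spanning every value in both columns
edges = np.histogram_bin_edges(students.to_numpy().ravel(), bins=10)

# Count each column separately against the shared edges
counts = {col: np.histogram(students[col], bins=edges)[0].tolist()
          for col in students.columns}

print(edges.round(1))
for col, c in counts.items():
    print(col, c)

# Bins where both columns have a bar are the ones that overlap on the plot
overlap = [i for i, (a, g) in enumerate(zip(counts['Age'], counts['Grade'])) if a and g]
print(overlap)  # [0]
```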

## Other Plots

Pandas DataFrame objects can draw more than histograms. They also provide:
* Density Plots
* Scatter Plots
* Hexbin Plots
* Box Plots


```python
# Here is an example of a density plot
# KDE stands for kernel density estimation; pandas uses SciPy to compute it,
# so SciPy must be installed

fig, ax = plt.subplots()
ax = students['Grade'].plot.kde(ax=ax)
plt.show()
```

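A kernel density estimate places a small bell curve (a Gaussian kernel) on every data point and averages them, giving a smooth curve with a total area of 1. pandas hands this calculation to SciPy's `scipy.stats.gaussian_kde`, which by default picks the kernel width with Scott's rule: for one-dimensional data, `h = s * n ** (-1/5)`, where `s` is the sample standard deviation and `n` the number of points. The sketch below is a hand-written version of that calculation; the function name `manual_kde` is ours, not part of any library. It is checked against SciPy on the Grade values:

```python
import numpy as np
from scipy.stats import gaussian_kde

grades = np.array([18, 19, 91, 96, 78, 82, 90, 79, 89, 85], dtype=float)

def manual_kde(data, points):
    """Gaussian KDE with a Scott's-rule bandwidth (a sketch for 1-D data only)."""
    n = len(data)
    h = data.std(ddof=1) * n ** (-1 / 5)           # Scott's rule bandwidth
    # One Gaussian bump per data point, evaluated at every point in `points`
    z = (points[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)                      # average of the n bumps

xs = np.linspace(-100, 220, 641)                   # wide grid so the tails are covered
ours = manual_kde(grades, xs)
scipy_values = gaussian_kde(grades)(xs)

print(np.allclose(ours, scipy_values))             # True
area = (ours * (xs[1] - xs[0])).sum()              # simple Riemann sum
print(area)                                        # very close to 1: it is a density
```

Note that the curve extends below 18 and above 96 even though no grade lies there; that spread comes from the kernel width, not from the data.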

```python
# Here is an example of a scatter plot
# The x and y arguments name the columns plotted on the x and y axes

fig, ax = plt.subplots()
ax = students.plot.scatter(x='Age', y='Grade', ax=ax)
plt.show()
```

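In `plot.scatter()`, every row becomes one point at (x value, y value). A small check of that mapping, assuming the `Agg` backend: it reads the plotted coordinates back from the collection matplotlib created (`ax.collections[0]`) and compares them with the two columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Age':   [18, 18, 19, 23, 22, 18, 28, 20, 21, 24],
    'Grade': [18, 19, 91, 96, 78, 82, 90, 79, 89, 85],
})

fig, ax = plt.subplots()
students.plot.scatter(x='Age', y='Grade', ax=ax)

# The scatter points live in a PathCollection; its offsets are the (x, y) pairs
points = ax.collections[0].get_offsets()
print(np.array_equal(points, students[['Age', 'Grade']].to_numpy()))  # True

# pandas also labels the axes with the column names
print(ax.get_xlabel(), ax.get_ylabel())  # Age Grade
plt.close(fig)
```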

```python
# Here is an example of a hexbin plot
# Hexbin plots group the data points into hexagonal bins
# The number of points that fall in a bin determines its color intensity
# The gridsize argument (next example) controls how large the bins are

fig, ax = plt.subplots()
ax = students.plot.hexbin(x='Age', y='Grade', ax=ax)
plt.show()
```


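The color of each hexagon encodes how many rows fall inside it. One way to see this, assuming the `Agg` backend: matplotlib stores the per-hexagon counts as the color array of the collection it draws (read here with `get_array()`), and those counts add up to the number of rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Age':   [18, 18, 19, 23, 22, 18, 28, 20, 21, 24],
    'Grade': [18, 19, 91, 96, 78, 82, 90, 79, 89, 85],
})

fig, ax = plt.subplots()
students.plot.hexbin(x='Age', y='Grade', ax=ax)

# One count per hexagon; with the default mincnt, empty hexagons are kept with 0
counts = np.asarray(ax.collections[0].get_array())
print(counts.sum())   # 10.0: every row lands in exactly one hexagon
print(counts.max())   # the busiest hexagon, which gets the strongest color
plt.close(fig)
```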

```python
# To make the bins larger and easier to read, set the gridsize argument,
# which sets the number of hexagons across the x axis

fig, ax = plt.subplots()
ax = students.plot.hexbin(x='Age', y='Grade', gridsize=15, ax=ax)
plt.show()
```

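`gridsize` is the number of hexagons across the x direction (matplotlib's default is 100); the number of rows is derived from it so that the hexagons stay roughly regular. A smaller value means fewer, larger hexagons. A sketch comparing the default with `gridsize=15` on the same data, assuming the `Agg` backend; `hexagon_counts` is a hypothetical helper written for this example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Age':   [18, 18, 19, 23, 22, 18, 28, 20, 21, 24],
    'Grade': [18, 19, 91, 96, 78, 82, 90, 79, 89, 85],
})

def hexagon_counts(gridsize):
    """Hypothetical helper: draw a hexbin plot and return the count per hexagon."""
    fig, ax = plt.subplots()
    students.plot.hexbin(x='Age', y='Grade', gridsize=gridsize, ax=ax)
    counts = np.asarray(ax.collections[0].get_array())
    plt.close(fig)
    return counts

fine = hexagon_counts(100)    # matplotlib's default gridsize
coarse = hexagon_counts(15)   # the value used above

print(len(fine), len(coarse))     # far fewer hexagons with gridsize=15
print(fine.sum(), coarse.sum())   # both 10.0: same rows, grouped differently
```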

```python
# Here is an example of a box plot
# Like the other plotting methods, box() draws onto the axes passed in with ax
# Only the numeric columns (Age and Grade) are plotted; the text columns are ignored

fig, ax = plt.subplots()
ax = students.plot.box(ax=ax)
plt.show()
```
plt.show()

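Each box spans the first to the third quartile, with a line at the median. By default the whiskers extend no more than 1.5 times the interquartile range (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that range, and anything beyond is drawn as a separate point. The sketch below recomputes those numbers with pandas for the numeric columns only, since those are the ones `plot.box()` draws. The helper `box_summary` is hypothetical, written for this example:

```python
import pandas as pd

students = pd.DataFrame({
    'Age':   [18, 18, 19, 23, 22, 18, 28, 20, 21, 24],
    'Major': ['CITE', 'COSC', 'POSC', 'MATH', 'CITE', 'PHYS', 'COSC', 'CITE', 'ANTH', 'SOCI'],
    'Grade': [18, 19, 91, 96, 78, 82, 90, 79, 89, 85],
})

def box_summary(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and outliers for one column."""
    q1, med, q3 = values.quantile([0.25, 0.5, 0.75])   # linear interpolation
    iqr = q3 - q1
    low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_limit) & (values <= high_limit)]
    return {
        'q1': q1, 'median': med, 'q3': q3,
        'whisker_low': inside.min(), 'whisker_high': inside.max(),
        'outliers': sorted(values[(values < low_limit) | (values > high_limit)].tolist()),
    }

# Keep only numeric columns, as plot.box() does; 'Major' is dropped here
numeric = students.select_dtypes(include='number')
summaries = {col: box_summary(numeric[col]) for col in numeric.columns}

for col, summary in summaries.items():
    print(col, summary)
```

For Grade, the box runs from 78.25 to 89.75, so the lower whisker limit is 61. The two low scores of 18 and 19 fall below it and appear as separate outlier points instead of stretching the whisker.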