In [None]:
# Let's plot the mean and median side by side in a negatively skewed distribution.
# Sadly, arrays don't have a nice median method, so we have to use a numpy function to compute it.
import numpy
import matplotlib.pyplot as plt

# Plot the histogram
plt.hist(test_scores_negative)
# Compute the median
median = numpy.median(test_scores_negative)

# Plot the median in blue (the color argument of "b" means blue)
plt.axvline(median, color="b")

# Plot the mean in red
plt.axvline(test_scores_negative.mean(), color="r")

# See how the median is further to the right than the mean?
# It's less sensitive to outliers, and isn't pulled to the left.
plt.show()
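
To see the effect numerically, here is a minimal sketch with a small made-up array of scores (the values are hypothetical, not the test_scores_negative data used above). Two very low scores drag the mean down, while the median stays with the bulk of the values.

```python
import numpy

# Hypothetical, negatively skewed scores: most are high, two are very low
scores = numpy.array([20, 60, 85, 88, 90, 92, 95, 97, 98, 99])

mean = scores.mean()
median = numpy.median(scores)

print(mean)    # 82.4
print(median)  # 91.0
```

The median sits to the right of the mean, which matches what the blue and red lines show in the plot.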

In [None]:
# The cleaned up data has been loaded into the new_titanic_survival variable
import matplotlib.pyplot as plt
import numpy
plt.hist(new_titanic_survival["age"])
plt.axvline(numpy.median(new_titanic_survival["age"]), color="b")
plt.axvline(new_titanic_survival["age"].mean(), color="r")
plt.show()

### Instructions
Create a Matplotlib subplot grid with the following properties:

* 4 rows by 1 column,
* figsize of 4 (width) by 8 (height),
* each Axes instance should have an x-value range of 0.0 to 5.0.

Generate the following histograms:

* First plot (topmost): Histogram of normalized Rotten Tomatoes scores by users.
* Second plot: Histogram of normalized Metacritic scores by users.
* Third plot: Histogram of Fandango scores by users.
* Fourth plot (bottommost): Histogram of IMDB scores by users.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
movie_reviews = pd.read_csv("fandango_score_comparison.csv")

# The instructions ask for a figsize of 4 (width) by 8 (height)
fig = plt.figure(figsize=(4,8))
ax1 = fig.add_subplot(4,1,1)
ax2 = fig.add_subplot(4,1,2)
ax3 = fig.add_subplot(4,1,3)
ax4 = fig.add_subplot(4,1,4)

ax1.set_xlim(0,5.0)
ax2.set_xlim(0,5.0)
ax3.set_xlim(0,5.0)
ax4.set_xlim(0,5.0)

movie_reviews["RT_user_norm"].hist(ax=ax1)
movie_reviews["Metacritic_user_nom"].hist(ax=ax2)
movie_reviews["Fandango_Ratingvalue"].hist(ax=ax3)
movie_reviews["IMDB_norm"].hist(ax=ax4)
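
The same layout can also be built with a loop. This is only a sketch under a few assumptions: the random samples are a hypothetical stand-in for the four rating columns, and pyplot.subplots() is used to create the figure and all four Axes objects in one call instead of calling add_subplot four times.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the four normalized rating columns
rng = np.random.default_rng(0)
samples = [rng.uniform(0, 5, size=100) for _ in range(4)]

# plt.subplots builds the figure and all four Axes in one call
fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(4, 8))

for ax, values in zip(axes, samples):
    ax.hist(values)
    ax.set_xlim(0, 5.0)  # same x-range on every subplot

# In a notebook, plt.show() would render the figure here
```

Looping keeps the x-range setting in one place, so the four subplots cannot drift apart if the range changes later.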

### Multiple plot
When we were working with a single plot, pyplot was storing and updating the state of that single plot. We could tweak the plot just using the functions in the pyplot module. When we want to work with multiple plots, however, we need to be more explicit about which plot we're making changes to. This means we need to understand the matplotlib classes that pyplot uses internally to maintain state so we can interact with them directly. Let's first start by understanding what pyplot was automatically storing under the hood when we create a single plot:

* a container for all plots was created (returned as a Figure object)
* a container for the plot was positioned on a grid (the plot returned as an Axes object)
* visual symbols were added to the plot (using the Axes methods)

A figure acts as a container for all of our plots and has methods for customizing the appearance and behavior for the plots within that container. Some examples include changing the overall width and height of the plotting area and the spacing between plots.

We can manually create a figure by calling pyplot.figure():


fig = plt.figure()

Instead of only calling the pyplot function, we assigned its return value to a variable (fig). After a figure is created, an axes for a single plot containing no data is created within the context of the figure. When rendered without data, the plot will resemble the empty plot from the previous mission. The Axes object acts as its own container for the various components of the plot, such as:

* values on the x-axis and y-axis
* ticks on the x-axis and y-axis
* all visual symbols, such as:
    * markers
    * lines
    * gridlines

While plots are represented using instances of the Axes class, they're also often referred to as subplots in matplotlib. To add a new subplot to an existing figure, use Figure.add_subplot. This will return a new Axes object, which needs to be assigned to a variable:


axes_obj = fig.add_subplot(nrows, ncols, plot_number)
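
As a rough sketch of how plot_number maps onto the grid, the following places two subplots in opposite corners of a 2 by 2 grid and compares their positions. The figsize value and the variable names are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))  # width, height in inches

# plot_number counts left to right, top to bottom, starting at 1
top_left = fig.add_subplot(2, 2, 1)
bottom_right = fig.add_subplot(2, 2, 4)

# get_position() reports each Axes' bounding box in figure coordinates,
# where (0, 0) is the bottom-left corner of the figure
tl = top_left.get_position()
br = bottom_right.get_position()

print(tl.x0 < br.x0)  # True: top_left sits further left
print(tl.y0 > br.y0)  # True: top_left sits higher up
print(len(fig.axes))  # 2
```

Positions 2 and 3 of the grid stay empty because no subplot was added there.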


If we want the figure to contain 2 plots, one above the other, we need to write:

ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)

This will create a grid, 2 rows by 1 column, of plots. Once we're done adding subplots to the figure, we display everything using plt.show(). Let's create a figure, add subplots to it, and display it.

In [None]:
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)
plt.show()

### Automated plot 

In [None]:

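# unrate is assumed to be loaded already, with a datetime DATE column and a VALUE column
fig = plt.figure(figsize=(12,12))

# One subplot per year: rows 0-11 are the first year, rows 12-23 the second, and so on
for i in range(5):
    ax = fig.add_subplot(5,1,i+1)
    start_index = i*12
    end_index = (i+1)*12
    subset = unrate[start_index:end_index]
    ax.plot(subset['DATE'], subset['VALUE'])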

plt.show()
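
Since unrate is not defined in this notebook excerpt, here is a self-contained sketch of the same loop. The DataFrame below is a hypothetical stand-in with made-up values; only the column names DATE and VALUE are taken from the cell above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for unrate: 60 monthly rows (5 years) of made-up values
unrate = pd.DataFrame({
    "DATE": pd.date_range("1948-01-01", periods=60, freq="MS"),
    "VALUE": [3.0 + (i % 12) * 0.1 for i in range(60)],
})

fig = plt.figure(figsize=(12, 12))
row_counts = []
for i in range(5):
    ax = fig.add_subplot(5, 1, i + 1)
    subset = unrate[i * 12:(i + 1) * 12]  # the 12 rows for year i
    row_counts.append(len(subset))
    ax.plot(subset["DATE"], subset["VALUE"])

print(row_counts)  # [12, 12, 12, 12, 12]
```

Each pass through the loop slices exactly one year of monthly data and draws it on its own subplot.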

### Same plot multiple line

In [None]:
unrate['MONTH'] = unrate['DATE'].dt.month
fig = plt.figure(figsize=(6,3))

plt.plot(unrate[0:12]['MONTH'], unrate[0:12]['VALUE'], c='red')
plt.plot(unrate[12:24]['MONTH'], unrate[12:24]['VALUE'], c='blue')

plt.show()
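
The two plot calls above follow the same pattern, so they can be written as a loop over colors. This is a sketch with a hypothetical 24-row stand-in for unrate, and it draws on an Axes object rather than through pyplot.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for unrate: 24 monthly rows (2 years) of made-up values
unrate = pd.DataFrame({
    "DATE": pd.date_range("1948-01-01", periods=24, freq="MS"),
    "VALUE": [3.0 + (i % 12) * 0.1 + (i // 12) for i in range(24)],
})
unrate["MONTH"] = unrate["DATE"].dt.month

fig, ax = plt.subplots(figsize=(6, 3))
colors = ["red", "blue"]
for i, color in enumerate(colors):
    subset = unrate[i * 12:(i + 1) * 12]  # one year per line
    ax.plot(subset["MONTH"], subset["VALUE"], c=color)

print(len(ax.lines))  # 2
```

Because both lines use MONTH on the x-axis, the two years overlap on the same 1 to 12 range and can be compared directly.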

### Bar chart

Creating Bars
When we generated line charts, we passed in the data to pyplot.plot() and matplotlib took care of the rest. Because the markers and lines in a line chart correspond directly with x-axis and y-axis coordinates, all matplotlib needed was the data we wanted plotted. To create a useful bar plot, however, we need to specify the positions of the bars, the widths of the bars, and the positions of the axis labels. Here's a diagram that shows the various values we need to specify:

Matplotlib Barplot Positioning

We'll focus on positioning the bars on the x-axis in this step and on positioning the x-axis labels in the next step. We can generate a vertical bar plot using either pyplot.bar() or Axes.bar(). We'll use Axes.bar() because it makes the bar plot easier to customize. We can use pyplot.subplots() to first generate a single subplot and return both the Figure and Axes object. This is a shortcut for the technique we used in the previous mission:


fig, ax = plt.subplots()


The Axes.bar() method has 2 required parameters, left and height. We use the left parameter to specify the x coordinates of the left sides of the bars. We use the height parameter to specify the height of each bar. Both of these parameters accept a list-like object. (Recent matplotlib releases name the first parameter x and center each bar on it by default; passing align='edge' keeps the left-edge behavior described here.)


# Positions of the left sides of the bars. [0.75, 1.75, 2.75, 3.75, 4.75]
from numpy import arange
bar_positions = arange(5) + 0.75

# Heights of the bars. In our case, the average ratings for the first movie in the dataset.
# .loc replaces the .ix indexer, which newer pandas versions no longer provide.
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
bar_heights = norm_reviews.loc[0, num_cols].values

ax.bar(bar_positions, bar_heights)

We can also use the width parameter to specify the width of each bar. This is an optional parameter and the width of each bar is set to 0.8 by default. The following code sets the width parameter to 1.5:


ax.bar(bar_positions, bar_heights, 1.5)
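
To check where the bars actually land, we can inspect the rectangles that Axes.bar() returns. This is a minimal sketch: the heights are hypothetical ratings, and align="edge" is passed explicitly on the assumption that a recent matplotlib version is installed, where bars would otherwise be centered on their positions.

```python
import matplotlib.pyplot as plt
from numpy import arange

bar_positions = arange(5) + 0.75
bar_heights = [4.3, 3.55, 3.9, 4.5, 5.0]  # hypothetical ratings

fig, ax = plt.subplots()
# align="edge" treats each position as the left side of its bar
bars = ax.bar(bar_positions, bar_heights, 0.5, align="edge")

left_edges = [float(bar.get_x()) for bar in bars]
print(left_edges)                   # [0.75, 1.75, 2.75, 3.75, 4.75]
print(float(bars[0].get_width()))   # 0.5
```

With a width of 0.5, the first bar spans 0.75 to 1.25 and is therefore centered on 1, which is where its tick label will go.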

In [None]:
import matplotlib.pyplot as plt
from numpy import arange
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']

# .loc replaces the .ix indexer, which newer pandas versions no longer provide
bar_heights = norm_reviews.loc[0, num_cols].values
bar_positions = arange(5) + 0.75
fig, ax = plt.subplots()
ax.bar(bar_positions, bar_heights, 0.5)
plt.show()

### Aligning Axis Ticks And Labels for bar

By default, matplotlib sets the x-axis tick labels to the integer values the bars spanned on the x-axis (from 0 to 6). We only need tick labels on the x-axis where the bars are positioned. We can use Axes.set_xticks() to change the positions of the ticks to [1, 2, 3, 4, 5]:


tick_positions = range(1,6)
ax.set_xticks(tick_positions)

Then, we can use Axes.set_xticklabels() to specify the tick labels:


num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
ax.set_xticklabels(num_cols)

If you look at the documentation for the method, you'll notice that we can specify the orientation for the labels using the rotation parameter:


ax.set_xticklabels(num_cols, rotation=90)

Rotating the labels by 90 degrees keeps them readable. In addition to modifying the x-axis tick positions and labels, let's also set the x-axis label, y-axis label, and the plot title.
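
Here is a small sketch of those calls on a toy bar plot. The labels and heights are hypothetical, and the axis labels and title are set through the Axes methods set_xlabel(), set_ylabel(), and set_title(), which are the Axes counterparts of the pyplot functions.

```python
import matplotlib.pyplot as plt

names = ["A", "B", "C"]  # hypothetical labels
fig, ax = plt.subplots()
ax.bar([1, 2, 3], [2.0, 3.5, 1.5], 0.5)

ax.set_xticks([1, 2, 3])
# set_xticklabels returns the Text objects it created
labels = ax.set_xticklabels(names, rotation=90)
ax.set_xlabel("Source")
ax.set_ylabel("Rating")
ax.set_title("Example")

print([t.get_text() for t in labels])  # ['A', 'B', 'C']
print(ax.get_xlabel())                 # Source
```

Setting the tick positions before the tick labels matters, because the labels are assigned to the ticks in order.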

In [None]:
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
# Bar heights: the ratings in the first row, one per column
bar_heights = norm_reviews.loc[0, num_cols].values
# Positions of the bars on the x-axis
bar_positions = arange(5) + 0.75
# Tick positions, one per bar
tick_positions = range(1,6)

fig, ax = plt.subplots()
ax.bar(bar_positions, bar_heights, 0.5)
ax.set_xticks(tick_positions)
ax.set_xticklabels(num_cols, rotation=90)
plt.xlabel("Rating Source")
plt.ylabel("Average Rating")
plt.title('Average User Rating For Avengers: Age of Ultron (2015)')
plt.show()

#### horizontal bar

In [None]:
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
# Bar widths (lengths): the ratings in the first row, one per column
bar_widths = norm_reviews.loc[0, num_cols].values
# Positions of the bars on the y-axis
bar_positions = arange(5) + 0.75
# Tick positions, one per bar
tick_positions = range(1,6)

fig, ax = plt.subplots()
ax.barh(bar_positions, bar_widths, 0.5)
# With horizontal bars, the rating sources sit on the y-axis
ax.set_yticks(tick_positions)
ax.set_yticklabels(num_cols)
plt.ylabel("Rating Source")
plt.xlabel("Average Rating")
plt.title('Average User Rating For Avengers: Age of Ultron (2015)')
plt.show()

#### Histogram In Matplotlib
We can generate a histogram using Axes.hist(). This method has only 1 required parameter, an iterable object containing the values we want a histogram for. By default, matplotlib will:

* calculate the minimum and maximum value from the sequence of values we passed in
* create 10 bins of equal length that span the range from the minimum to the maximum value
* group unique values into the bins
* sum up the associated unique values
* generate a bar for the frequency sum for each bin

The default behavior of Axes.hist() is problematic when we want to compare the distributions of multiple columns using the same binning strategy. This is because the bins for each column would depend on that column's own minimum and maximum values rather than on a shared range. We can use the range parameter to specify the range we want matplotlib to use as a tuple, such as range=(0, 5).
<img src="histogram_binning.png">


Histograms help us visualize continuous values using bins while bar plots help us visualize discrete values. The locations of the bars on the x-axis matter in a histogram but they don't in a simple bar plot. 
Lastly, bar plots also have gaps between the bars, to emphasize that the values are discrete.
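
To see why a shared range matters when comparing histograms, here is a sketch using numpy.histogram, which takes the same bins and range arguments as Axes.hist(). The two arrays are hypothetical ratings chosen so that their minimum and maximum values differ.

```python
import numpy as np

a = np.array([2.7, 3.1, 3.5, 4.0, 4.8])
b = np.array([0.5, 1.5, 2.5, 3.5, 4.5])

# Default: bin edges come from each array's own min and max
edges_a = np.histogram(a, bins=10)[1]
edges_b = np.histogram(b, bins=10)[1]
print(edges_a[0], edges_a[-1])  # 2.7 4.8
print(edges_b[0], edges_b[-1])  # 0.5 4.5

# Shared range: both arrays get identical bin edges
shared_a = np.histogram(a, bins=10, range=(0, 5))[1]
shared_b = np.histogram(b, bins=10, range=(0, 5))[1]
print(np.array_equal(shared_a, shared_b))  # True
```

With identical edges, a bar at the same x position means the same interval of ratings in both histograms.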

In [None]:
fig = plt.figure(figsize=(5,20))
ax1 = fig.add_subplot(4,1,1)
ax2 = fig.add_subplot(4,1,2)
ax3 = fig.add_subplot(4,1,3)
ax4 = fig.add_subplot(4,1,4)

ax1.hist(norm_reviews['Fandango_Ratingvalue'], 20, range=(0, 5))
ax1.set_title('Distribution of Fandango Ratings')
ax1.set_ylim(0, 50)

### Box Plot
A box plot consists of box-and-whisker diagrams, which represent the different quartiles in a visual way. Here's a box plot of the values in the RT_user_norm column:
<img src="boxplot_intro.png">


The two regions contained within the box in the middle make up the interquartile range, or IQR. The IQR is used to measure dispersion of the values. The ratio of the length of the box to the whiskers around the box helps us understand how values in the distribution are spread out.

We can generate a boxplot using Axes.boxplot().


ax.boxplot(norm_reviews['RT_user_norm'])

Matplotlib will sort the values, calculate the quartiles that divide the values into four equal regions, and generate the box and whisker diagram.
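
As a rough illustration of the quartiles behind the box, here is a sketch that computes them with numpy.percentile on a small hypothetical array. It uses numpy's default interpolation, so the exact values matplotlib draws for other data may differ slightly.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data

# The 25th, 50th, and 75th percentiles split the values into four regions
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

print(q1, median, q3)  # 3.0 5.0 7.0
print(iqr)             # 4.0
```

The box spans q1 to q3, and the line inside it marks the median.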

In [None]:
fig,ax = plt.subplots()
ax.boxplot(norm_reviews['RT_user_norm'])

ax.set_xticklabels(['Rotten Tomatoes'])
ax.set_ylim(0, 5)

#### Multiple Box Plots
From the box plot we generated using Rotten Tomatoes ratings, we can conclude that:

* the bottom 25% of user ratings range from around 1 to 2.5
* the top 25% of user ratings range from around 4 to 4.6

To compare the lower and upper ranges with those for the other columns, we need to generate multiple box-and-whisker diagrams in the same box plot. When selecting multiple columns to pass in to Axes.boxplot(), we need to use the values accessor to return a multi-dimensional numpy array:

In [None]:
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
fig, ax = plt.subplots()
ax.boxplot(norm_reviews[num_cols].values)
ax.set_xticklabels(num_cols, rotation=90)
ax.set_ylim(0, 5)
plt.show()
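
To confirm that a two-dimensional array produces one box per column, here is a sketch with hypothetical random data standing in for the rating columns. The dictionary returned by Axes.boxplot() holds the drawn artists, so counting its boxes tells us how many diagrams were generated.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 50 rows, 3 columns of values between 0 and 5
rng = np.random.default_rng(0)
data = rng.uniform(0, 5, size=(50, 3))

fig, ax = plt.subplots()
result = ax.boxplot(data)  # one box-and-whisker diagram per column

print(len(result["boxes"]))    # 3
print(len(result["medians"]))  # 3
```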