## Agenda
>- **Interfaces of Matplotlib**<BR>
>- **Deep dive into the pyplot interface**<BR>
>- **Configuring Matplotlib (improving the aesthetics of the plots)**<BR>
>- **Advanced use of Matplotlib**<BR>
        .Interactive plots
        .Maps
        .Statistical Analysis (We will revisit the Stats again!)
        .Matplotlib (if we have time, I will compare and contrast it with Bokeh!)

### The greatest value of a picture is when it forces us to notice what we never expected to see
                                                                                -- John Tukey

In [None]:
!pip install matplotlib

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

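# Render inline figures at retina (2x) resolution. This replaces
# IPython.display.set_matplotlib_formats, which is deprecated in recent IPython.
%config InlineBackend.figure_format = 'retina'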

## The Census Income Data Set

Now that we have our notebook properly set up to work with matplotlib, it's time to pull in some data and get our hands dirty.

The University of California at Irvine maintains a repository of over 300 real-world data sets for use in testing machine learning algorithms. This repository is a fantastic resource: it lets us play around with a relatively small, real-world data set while skipping the cumbersome pre-processing steps you'd normally have to perform before getting a chance to explore the data.

In this quick example of matplotlib in action, we'll be using the ["Census Income"][1] data set, also known as the ["Adult"][2] data set. **The stated purpose of this data set is to predict whether or not an individual makes more than \$50k a year based on data gathered during the 1994 Census.** While we won't be using this data set to test a predictive algorithm, we can still make use of the data to find some interesting insights.

[1]: http://archive.ics.uci.edu/ml/datasets/Census+Income
[2]: http://archive.ics.uci.edu/ml/datasets/Adult



In [None]:
import csv
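# Python 3 moved urllib2's urlopen into urllib.request; the alias keeps
# the urllib2.urlopen call in the download cell working unchanged.
import urllib.request as urllib2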
from collections import namedtuple

In [None]:
# Create a namedtuple constructor for each record in the Census data
fields = ('age', 
          'workclass', 
          'fnlwgt', 
          'education', 
          'education_num', 
          'marital_status', 
          'occupation', 
          'relationship', 
          'race', 
          'sex', 
          'capital_gain', 
          'capital_loss', 
          'hours_per_week', 
          'native_country', 
          'target')
CensusRecord = namedtuple('CensusRecord', fields)
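
As a minimal sketch of what the namedtuple buys us, the snippet below builds one record by hand and reads it back by field name. The values are an illustrative row in the data set's format, not something loaded from the file, and the field names are passed as a single space-separated string purely for brevity.

```python
from collections import namedtuple

# Same field names as the CensusRecord defined above, given as one string.
CensusRecord = namedtuple(
    'CensusRecord',
    'age workclass fnlwgt education education_num marital_status occupation '
    'relationship race sex capital_gain capital_loss hours_per_week '
    'native_country target')

# Illustrative values only, laid out in the same order as the fields.
record = CensusRecord(39, 'State-gov', 77516.0, 'Bachelors', 13,
                      'Never-married', 'Adm-clerical', 'Not-in-family',
                      'White', 'Male', 2174, 0, 40, 'United-States', '<=50K')

print(record.age)                      # access by name ...
print(record[0])                       # ... or by position, like a plain tuple
print(record._asdict()['education'])   # convert to a dict when that is handier

# namedtuples are immutable; _replace returns a modified copy instead.
older = record._replace(age=record.age + 1)
```

Reading `record.education_num` is much harder to get wrong than remembering that education number lives at index 4.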

In [None]:
# Download and read in the data from the UCI Machine Learning Repository
response = urllib2.urlopen('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data')
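# read() returns bytes on Python 3, so decode to text before handing it to csv.reader
adult_data_csv = response.read().decode('utf-8').strip()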
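
Re-running the notebook downloads the file every time. One way to avoid that, sketched here under the assumption that a local copy is acceptable, is a small caching helper. The name `fetch_text` and the cache file name in the usage comment are hypothetical, not part of the original notebook.

```python
import os
import urllib.request


def fetch_text(url, cache_path):
    """Return the text at `url`, reusing `cache_path` if it already exists."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            return f.read()
    # First run: download, decode the bytes to text, and keep a local copy.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode('utf-8')
    with open(cache_path, 'w', encoding='utf-8') as f:
        f.write(text)
    return text


# Possible usage with the URL from the cell above (cache name is made up):
# adult_data_csv = fetch_text(
#     'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
#     'adult.data').strip()
```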

In [None]:
# Convert each record into a format that's easier to work with (i.e.,
# wrap each record in the namedtuple that we created).
data = []
for row in csv.reader(adult_data_csv.splitlines()):
    data.append(CensusRecord(
        age              = int(row[0]),
        workclass        = row[1].strip(),
        fnlwgt           = float(row[2].strip()),
        education        = row[3].strip(),
        education_num    = int(row[4]),
        marital_status   = row[5].strip(),
        occupation       = row[6].strip(),
        relationship     = row[7].strip(),
        race             = row[8].strip(),
        sex              = row[9].strip(),
        capital_gain     = int(row[10]),
        capital_loss     = int(row[11]),
        hours_per_week   = int(row[12]),
        native_country   = row[13].strip(),
        target           = row[14].strip()))
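
The parsing step can also be sketched with `csv.reader`'s `skipinitialspace` option, which drops the space that follows each comma. The two sample lines are illustrative rows in the file's format, `parse_row` and `NUMERIC` are hypothetical names, and mapping `?` (the data set's missing-value marker) to `None` is an assumption about how one might want to treat missing values, not something the original notebook does.

```python
import csv

# Column index -> converter for the numeric columns; all others stay strings.
NUMERIC = {0: int, 2: float, 4: int, 10: int, 11: int, 12: int}


def parse_row(row):
    """Convert one raw CSV row into typed values, with '?' mapped to None."""
    values = []
    for i, cell in enumerate(row):
        cell = cell.strip()
        if cell == '?':
            values.append(None)
        elif i in NUMERIC:
            values.append(NUMERIC[i](cell))
        else:
            values.append(cell)
    return values


sample = [
    '39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, '
    'Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K',
    '54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, '
    'Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K',
]
rows = [parse_row(r) for r in csv.reader(sample, skipinitialspace=True)]

print(rows[0][7], '|', rows[0][8])   # relationship and race are separate columns
print(rows[1][1])                    # a missing workclass
```

Each parsed list could then be wrapped with `CensusRecord(*values)` to get the same records as the loop above.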

## Exploring the Data Set


A histogram is a fantastic visualization for getting a feel for the distribution of your data set, but it should only be used with **continuous data**. For discrete data, use a bar chart instead.
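
To make the distinction concrete, here is a minimal sketch that draws both chart types side by side. The `ages` and `levels` values are made up for illustration and are not taken from the census file.

```python
import matplotlib.pyplot as plt

# Made-up sample values, for illustration only.
ages = [19, 23, 25, 31, 31, 38, 42, 47, 53, 66]         # continuous
levels = {'HS-grad': 4, 'Bachelors': 3, 'Masters': 1}   # discrete categories

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: matplotlib groups the raw values into 5 equal-width bins
# and returns the count per bin along with the bin edges.
bin_counts, bin_edges, _ = ax_hist.hist(ages, bins=5)
ax_hist.set_xlabel('age')

# Bar chart: we supply one bar per category, with counts we computed ourselves.
positions = list(range(len(levels)))
ax_bar.bar(positions, list(levels.values()))
ax_bar.set_xticks(positions)
ax_bar.set_xticklabels(list(levels.keys()))
ax_bar.set_xlabel('education')
```

In the histogram the bin boundaries are a choice made by the plotting code, while in the bar chart every bar corresponds to exactly one category.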

In [None]:
from collections import Counter
import numpy as np

In [None]:
# Calculate the frequency count for each education level.
# Counter takes any iterable and counts how often each value occurs in it.
freqs = Counter(r.education_num for r in data)

In [None]:
freqs
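
If `Counter` is new to you, this small sketch shows the behaviour the cells above rely on. The `education_nums` list is a made-up sample, not real census values.

```python
from collections import Counter

education_nums = [9, 13, 9, 10, 13, 9]  # made-up sample values

freqs = Counter(education_nums)

print(freqs[9])               # count for one key
print(freqs[16])              # missing keys count as 0 instead of raising KeyError
print(freqs.most_common(2))   # the most frequent (value, count) pairs
print(sorted(freqs.items()))  # (value, count) pairs sorted by value, handy for plotting
```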

In [None]:
# Draw the bar chart/histogram
plt.figure(figsize=(10, 4))
# list() turns the dict views into plain sequences, which is safe on Python 3
plt.bar(list(freqs.keys()), list(freqs.values()), width=1);

In [None]:
# Calculate the frequency count for each education level. Our 
# keys in the Counter object will be a tuple of the form 
# (number, name) so it will be possible to sort the keys.
freqs = Counter((r.education_num, r.education) for r in data)

In [None]:
# Create a list of names sorted by the education level number
names = [name for _, name in sorted(freqs.keys())]

In [None]:
# Create a list of counts in the same order as the names
counts = [freqs[key] for key in sorted(freqs.keys())]
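
The two list comprehensions above stay aligned because both iterate over the same sorted keys. A possible alternative, sketched here with a made-up sample, sorts the `(key, count)` items once and splits them afterwards.

```python
from collections import Counter

# Made-up sample of (education_num, education) pairs.
pairs = [(9, 'HS-grad'), (13, 'Bachelors'), (9, 'HS-grad'),
         (10, 'Some-college'), (13, 'Bachelors'), (9, 'HS-grad')]
freqs = Counter(pairs)

# Tuples compare element by element, so sorting the items orders them
# by education number first.
ordered = sorted(freqs.items())

names = [name for (_, name), _ in ordered]
counts = [count for _, count in ordered]

print(names)
print(counts)
```

Sorting once removes any chance of the names and counts drifting out of order if one of the two expressions is edited later.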

In [None]:
# An array containing the x coordinates of the left sides of the bars in the chart
left = range(len(names))

In [None]:
# Draw the bar chart/histogram
plt.figure(figsize=(10, 4))

# Plot the counts at the given x. align='edge' places each bar's left
# side at x (Matplotlib 2.0+ would otherwise center the bars on x).
plt.bar(left, counts, width=1, align='edge')

# Change the x-axis ticks to the education level name. The labels are
# not rotated or centered yet, so they sit at the left edge of each
# bar and may overlap.
plt.xticks(left, names);

In [None]:
import numpy as np

# Calculate the frequency count for each education level. Our 
# keys in the Counter object will be a tuple of the form 
# (number, name) so it will be possible to sort the keys.
freqs = Counter((r.education_num, r.education) for r in data)
# Create a list of names sorted by the education level number
names = [name for _, name in sorted(freqs.keys())]
# Create a list of counts in the same order as the names
counts = [freqs[key] for key in sorted(freqs.keys())]
# An array containing the x coordinates of the left sides of the bars in the chart
left = np.arange(len(names))

# Draw the bar chart/histogram
plt.figure(figsize=(10, 4))
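# Plot the counts at the given x. align='edge' places each bar's left
# side at x (Matplotlib 2.0+ would otherwise center the bars on x).
plt.bar(left, counts, width=1, align='edge')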
# Change the x-axis ticks to the education level name. To make it
# easier to read, we rotate the labels 90 degrees and move them to
# the center of the bar by adding 0.5 to each value in the left 
# array (we chose 0.5 because we set the width of each bar to 1 earlier).
plt.xticks(left + 0.5, names, rotation=90);
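
The same kind of chart can also be drawn through Matplotlib's object-oriented interface, where you call methods on an explicit `Axes` object instead of on `plt`. The sketch below uses made-up names and counts rather than the real frequencies, and it relies on bars being centered on their x positions by default in Matplotlib 2.0 and later, so the ticks need no 0.5 offset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up sample, not the real census counts.
names = ['HS-grad', 'Some-college', 'Bachelors']
counts = [3, 1, 2]
x = np.arange(len(names))

fig, ax = plt.subplots(figsize=(10, 4))

# With the default align='center', each bar is centered on its x value ...
bars = ax.bar(x, counts, width=1)

# ... so ticks placed at x already sit under the middle of each bar.
ax.set_xticks(x)
ax.set_xticklabels(names, rotation=90)
ax.set_ylabel('count')
```

Holding on to `fig` and `ax` makes it easier to adjust one chart among several, which the pyplot interface's implicit "current figure" can make awkward.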