# Basic Plotting with matplotlib

#### Alberto Cairo: The Functional Art
#### Edward R. Tufte: The Visual Display of Quantitative Information

### Alberto Cairo's Visualization Wheel for Design

#### Complex & Deeper [Generally used by scientists and engineers]

    Abstraction
    Functionality
    Density
    Multidimensionality
    Originality
    Novelty

#### Intelligible & Shallower [Generally used by artists and journalists]

    Figuration
    Decoration
    Lightness
    Unidimensionality
    Familiarity
    Redundancy

#### Graphical heuristics
Graphical heuristics are not a procedure or a science to be followed; they are conventions of good practice to keep in mind while plotting data.

Graphical heuristics are broadly classified into two subcategories:
    
    Data Ink Ratio
    Chart Junk

The data-ink ratio is the share of a plot's ink that actually presents data. The aim is a high data-ink ratio, which is achieved by removing the elements that add no value or information to the plot.
Ex: background colour, borders, grids, colours applied to the observations, legends etc.
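
As a minimal sketch of raising the data-ink ratio, the cell below strips a few non-data elements from a small bar chart. The data values are made up for illustration, and the Agg backend is selected only so the sketch also runs without a display; inside the notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [3, 5, 2], color="gray")  # illustrative data

# remove ink that carries no data: the top and right borders and the grid
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(False)

# drop the tick marks but keep the tick labels
ax.tick_params(axis="both", length=0)
```

Which elements count as unnecessary depends on the plot; the left and bottom spines are kept here because they still anchor the bars.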


Matplotlib is a powerful open-source toolkit for representing and visualizing data. It was created by John Hunter.

To enable web based rendering we make use of 

    %matplotlib notebook

There are many ways to render matplotlib output. Since we are working in Jupyter, a web-based notebook, we use the IPython magic %matplotlib notebook.
Remember that in the Jupyter Notebook, IPython magics are just helper functions that set up the environment, in this case so that web-based rendering is enabled.

### Matplotlib Architecture

#### Backend

    Deals with the rendering of plots to the screen or to files
    In Jupyter notebooks the inline backend is commonly used; this notebook uses the interactive nbAgg backend, enabled by %matplotlib notebook
    There are also hard copy backends, which support rendering to graphics formats such as scalable vector graphics (SVG) or PNG
    
#### Artist layer

    Contains containers such as Figure, Subplot and Axes
    Contains primitives such as Line2D and Rectangle, and collections such as PathCollection
    
#### Scripting layer
    
    Simplifies the access to the artist and backend layers i.e. pyplot
    The pyplot scripting layer is a procedural method for building a visualization
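
To make the backend layer a little more concrete, here is a small sketch that asks a canvas which hard copy formats it can write. It uses the Agg canvas as an example; the exact list printed depends on the installed matplotlib version.

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# build a figure and attach it to the Agg backend, without going through pyplot
fig = Figure()
canvas = FigureCanvasAgg(fig)

# mapping of file extension -> human readable description
filetypes = canvas.get_supported_filetypes()
for ext in sorted(filetypes):
    print(ext, "->", filetypes[ext])
```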

In [1]:
%matplotlib notebook

In [2]:
import matplotlib as mpl
mpl.get_backend()

'nbAgg'

In [3]:
import matplotlib.pyplot as plt
plt.plot?

In [4]:
plt.plot(3,2)

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x8604a90>]

In [5]:
plt.plot(3, 2, '*')

[<matplotlib.lines.Line2D at 0x8ba99f0>]

Here plt.plot takes *args, i.e. a variable number of arguments, given as pairs of x and y values.

The optional third argument is a format string that controls how the data point is drawn; the '*' above draws a star marker.
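
As a sketch of the *args behaviour, several x, y, format-string groups can be passed in one call, and each group comes back as its own Line2D. The data values are illustrative, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# two (x, y, fmt) groups in a single call
lines = ax.plot([1, 2, 3], [1, 2, 3], "*",      # star markers, no connecting line
                [1, 2, 3], [1, 4, 9], "--r")    # dashed red line, no markers

print(len(lines))                 # one Line2D per group
print(lines[0].get_marker())      # marker taken from the first format string
print(lines[1].get_linestyle())   # line style taken from the second format string
```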

The plot is interactive because of the %matplotlib notebook magic. Other backends, such as inline, are enabled with the %matplotlib inline magic instead. The inline backend is not interactive: each plotting cell renders a new static plot in the notebook.

The pyplot scripting layer manages a lot of objects for us. It keeps track of the latest figure, its subplots and the axes objects, so we do not need to interact with the artist layer directly; the pyplot module does all the bookkeeping needed for plotting.
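
The bookkeeping pyplot does can be checked directly. In this sketch we create a figure and axes ourselves and confirm that plt.gcf() and plt.gca() hand back those same objects, so a later plt.plot call lands on them. The Agg backend line is an assumption so the example runs outside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

# pyplot remembers the most recent figure and axes
same_fig = plt.gcf() is fig
same_ax = plt.gca() is ax
print(same_fig, same_ax)

# plt.plot draws on the current axes, so the new line shows up on ax
plt.plot(3, 2, "o")
n_lines = len(ax.lines)
print(n_lines)
```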

In [6]:
plt.plot(2.9,1.77, '.')

[<matplotlib.lines.Line2D at 0x8ba9a30>]

Let's see how to make a plot without using the scripting layer.

In [7]:
# First let's set the backend without using mpl.use() from the scripting layer
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# create a new figure
fig = Figure()

# associate fig with the backend
canvas = FigureCanvasAgg(fig)

# add a subplot to the fig
ax = fig.add_subplot(111)

# plot the point (3,2)
ax.plot(3, 2, '.')

# save the figure to test.png
# you can see this figure in your Jupyter workspace afterwards by going to
# https://hub.coursera-notebooks.org/
canvas.print_png('test.png')

In [9]:
# create a new figure
plt.figure()

# plot the point (3,2) using the circle marker
plt.plot(3, 2, '-o')

# get the current axes
ax = plt.gca()

# Set axis properties [xmin, xmax, ymin, ymax]
ax.axis([0,6,0,10])

<IPython.core.display.Javascript object>

[0, 6, 0, 10]

In [11]:
# create a new figure
plt.figure()

# plot the point (1.5, 1.5) using the circle marker
plt.plot(1.5, 1.5, 'o')
# plot the point (2, 2) using the point marker
plt.plot(2, 2, '.')
# plot the point (2.5, 2.5) using the star marker
plt.plot(2.5, 2.5, '*')
k = plt.gca()
k.axis([0,5,0,5])

<IPython.core.display.Javascript object>

[0, 5, 0, 5]

In [12]:
# get current axes
ax = plt.gca()
ax.axis([0,6,0,6])
# get all the child objects the axes contains
ax.get_children()

[<matplotlib.lines.Line2D at 0x73a2df0>,
 <matplotlib.lines.Line2D at 0x73a2eb0>,
 <matplotlib.lines.Line2D at 0x73a8830>,
 <matplotlib.spines.Spine at 0x737e110>,
 <matplotlib.spines.Spine at 0x737ae90>,
 <matplotlib.spines.Spine at 0x737afd0>,
 <matplotlib.spines.Spine at 0x737e230>,
 <matplotlib.axis.XAxis at 0x737e330>,
 <matplotlib.axis.YAxis at 0x73838d0>,
 <matplotlib.text.Text at 0x7390c30>,
 <matplotlib.text.Text at 0x7390c70>,
 <matplotlib.text.Text at 0x7390cb0>,
 <matplotlib.patches.Rectangle at 0x7390cd0>]

### Scatterplots

In [13]:
import numpy as np

x = np.array([1,2,3,4,5,6,7,8])
y = x

plt.figure()
plt.scatter(x, y) # similar to plt.plot(x, y, '.'), but the underlying child objects in the axes are not Line2D

<IPython.core.display.Javascript object>

<matplotlib.collections.PathCollection at 0x73dae50>

In [14]:
import numpy as np

x = np.array([1,2,3,4,5,6,7,8])
y = x

# create a list of colors for each point to have
# ['green', 'green', 'green', 'green', 'green', 'green', 'green', 'red']
colors = ['green']*(len(x)-1)
colors.append('red')

plt.figure()

# plot the point with size 100 and chosen colors
plt.scatter(x, y, s=100, c=colors)

<IPython.core.display.Javascript object>

<matplotlib.collections.PathCollection at 0xa6606b0>

In [17]:
# convert the two lists into a list of pairwise tuples
zip_generator = zip([1,2,3,4,5], [6,7,8,9,10])

print(list(zip_generator))
# the above prints:
# [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

zip_generator = zip([1,2,3,4,5], [6,7,8,9,10])
# The single star * unpacks a collection into positional arguments,
# so zip(*pairs) regroups the pairs into the two original sequences
print(list(zip(*zip_generator)))
# the above prints:
# [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]

[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
[(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]


In [18]:
# use zip to convert 5 tuples with 2 elements each to 2 tuples with 5 elements each
print(list(zip((1, 6), (2, 7), (3, 8), (4, 9), (5, 10))))
# the above prints:
# [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]


zip_generator = zip([1,2,3,4,5], [6,7,8,9,10])
# let's turn the data back into 2 lists
x, y = zip(*zip_generator) # This is like calling zip((1, 6), (2, 7), (3, 8), (4, 9), (5, 10))
print(x)
print(y)
# the above prints:
# (1, 2, 3, 4, 5)
# (6, 7, 8, 9, 10)

[(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]
(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
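
The cells above rebuild zip_generator before each use. A short sketch of why: in Python 3, zip returns a one-shot iterator, so it is empty after the first pass. The variable names here are illustrative.

```python
pairs = zip([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])

first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # nothing left to consume

print(first_pass)
print(second_pass)

# unzip from the saved list instead of the exhausted iterator
x, y = zip(*first_pass)
x, y = list(x), list(y)     # convert the tuples back into lists
print(x)
print(y)
```

Storing the pairs in a list once is usually simpler than recreating the zip object each time.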


In [19]:
plt.figure()
# plot a data series 'Tall students' in red using the first two elements of x and y
plt.scatter(x[:2], y[:2], s=100, c='red', label='Tall students')
# plot a second data series 'Short students' in blue using the last three elements of x and y 
plt.scatter(x[2:], y[2:], s=100, c='blue', label='Short students')

<IPython.core.display.Javascript object>

<matplotlib.collections.PathCollection at 0xa904bb0>

In [20]:
# add a label to the x axis
plt.xlabel('The number of times the child kicked a ball')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between ball kicking and grades')

<matplotlib.text.Text at 0xa68d7f0>

In [21]:
# add a legend (uses the labels from plt.scatter)
plt.legend()

<matplotlib.legend.Legend at 0x73fbef0>

In [22]:
# add the legend to loc=4 (the lower right hand corner), also gets rid of the frame and adds a title
plt.legend(loc=4, frameon=False, title='Legend')

<matplotlib.legend.Legend at 0xa90bc70>

In [23]:
# get children from current axes (the legend is the second to last item in this list)
plt.gca().get_children()

[<matplotlib.collections.PathCollection at 0xa904490>,
 <matplotlib.collections.PathCollection at 0xa904bb0>,
 <matplotlib.spines.Spine at 0xa677b90>,
 <matplotlib.spines.Spine at 0xa677950>,
 <matplotlib.spines.Spine at 0xa677a70>,
 <matplotlib.spines.Spine at 0xa677cb0>,
 <matplotlib.axis.XAxis at 0xa677db0>,
 <matplotlib.axis.YAxis at 0xa6825b0>,
 <matplotlib.text.Text at 0xa68d7f0>,
 <matplotlib.text.Text at 0xa68d650>,
 <matplotlib.text.Text at 0xa68d4f0>,
 <matplotlib.legend.Legend at 0xa90bc70>,
 <matplotlib.patches.Rectangle at 0xa68d510>]

In [24]:
# get the legend from the current axes
legend = plt.gca().get_children()[-2]
legend

<matplotlib.legend.Legend at 0xa90bc70>

In [25]:
# you can use get_children to navigate through the child artists
legend.get_children()[0].get_children()[1].get_children()[0].get_children()

[<matplotlib.offsetbox.HPacker at 0xa68ac70>,
 <matplotlib.offsetbox.HPacker at 0xa68ac90>]

In [26]:
# import the artist class from matplotlib
from matplotlib.artist import Artist

def rec_gc(art, depth=0):
    if isinstance(art, Artist):
        # indent by the current depth for pretty printing
        print("  " * depth + str(art))
        for child in art.get_children():
            # recurse into each child with a larger indent
            rec_gc(child, depth+2)

# Call this function on the legend artist to see what the legend is made up of
rec_gc(plt.legend())

Legend
    <matplotlib.offsetbox.VPacker object at 0x0A66D3F0>
        <matplotlib.offsetbox.TextArea object at 0x0A673E70>
            Text(0,0,'None')
        <matplotlib.offsetbox.HPacker object at 0x0A682910>
            <matplotlib.offsetbox.VPacker object at 0x0A682090>
                <matplotlib.offsetbox.HPacker object at 0x0A673FD0>
                    <matplotlib.offsetbox.DrawingArea object at 0x0A682750>
                        <matplotlib.collections.PathCollection object at 0x0A682EF0>
                    <matplotlib.offsetbox.TextArea object at 0x0A682310>
                        Text(0,0,'Tall students')
                <matplotlib.offsetbox.HPacker object at 0x0A673E90>
                    <matplotlib.offsetbox.DrawingArea object at 0x0A673970>
                        <matplotlib.collections.PathCollection object at 0x0A673890>
                    <matplotlib.offsetbox.TextArea object at 0x0A673950>
                        Text(0,0,'Short students')
    FancyBboxPatch

### Line Plots

In [48]:
import numpy as np

linear_data = np.array([1,2,3,4,5,6,7,8])
exponential_data = linear_data**2

plt.figure()
# plot the linear data and the exponential data
plt.plot(linear_data, '-o',  exponential_data, '-o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0xb1e9f10>,
 <matplotlib.lines.Line2D at 0xb1ee030>]

In [49]:
# plot another series with a dashed red line
plt.plot([22,44,55], '--r')

[<matplotlib.lines.Line2D at 0xb207a90>]

In [50]:
plt.xlabel('Some data')
plt.ylabel('Some other data')
plt.title('A title')
# add a legend with legend entries (because we didn't have labels when we plotted the data series)
plt.legend(['Baseline', 'Competition', 'Us'])

<matplotlib.legend.Legend at 0xb2076f0>

In [51]:
# fill the area between the linear data and exponential data
plt.gca().fill_between(range(len(linear_data)), 
                       linear_data, exponential_data, 
                       facecolor='blue', 
                       alpha=0.25)

<matplotlib.collections.PolyCollection at 0xb2132b0>

In [52]:
plt.figure()

observation_dates = np.arange('2017-01-01', '2017-01-09', dtype='datetime64[D]')

plt.plot(observation_dates, linear_data, '-o',  observation_dates, exponential_data, '-o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x71e2b70>,
 <matplotlib.lines.Line2D at 0x71d39b0>]

In [54]:
import pandas as pd

plt.figure()
observation_dates = np.arange('2017-01-01', '2017-01-09', dtype='datetime64[D]')
observation_dates = list(map(pd.to_datetime, observation_dates)) # trying to plot a map will result in an error
plt.plot(observation_dates, linear_data, '-o',  observation_dates, exponential_data, '-o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0xcb34dd0>,
 <matplotlib.lines.Line2D at 0xcb49810>]

In [55]:
x = plt.gca().xaxis

# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)

In [56]:
# adjust the subplot so the text doesn't run off the image
plt.subplots_adjust(bottom=0.25)

In [57]:
ax = plt.gca()
ax.set_xlabel('Date')
ax.set_ylabel('Units')
ax.set_title('Exponential vs. Linear performance')

<matplotlib.text.Text at 0xcb36610>

In [58]:
# you can add mathematical expressions in any text element
ax.set_title("Exponential ($x^2$) vs. Linear ($x$) performance")

<matplotlib.text.Text at 0xcb36610>

In [65]:
x = np.array([1,2,3,4,5])
y = np.array([6,7,8,3,10])
plt.figure()
plt.plot(x,y,'*')
myplot = plt.gca()
myplot.axis([0,6,0,11])

<IPython.core.display.Javascript object>

[0, 6, 0, 11]

In [66]:
x = np.array([1,2,3,4,5])
y = np.array([6,7,8,3,10])
plt.figure()
plt.plot(x,y,'-*')
myplot = plt.gca()
myplot.axis([0,6,0,11])

<IPython.core.display.Javascript object>

[0, 6, 0, 11]

The difference between a marker-only plot and a line plot is the third argument, the format string: if it starts with -, lines are drawn between the points, provided there is more than one value for x and y.
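
The effect of the leading - can be seen on the Line2D objects themselves. This sketch reuses the same x and y values as the two cells above and compares the line style matplotlib stores for each format string; the Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 3, 10])

fig, ax = plt.subplots()
(markers_only,) = ax.plot(x, y, "*")    # markers only
(with_line,) = ax.plot(x, y, "-*")      # markers joined by a solid line

# the string 'None' means no connecting line is drawn
print(markers_only.get_linestyle())
print(with_line.get_linestyle())
```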

### Bar Charts

In [89]:
plt.figure()
xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width = 0.3)

<IPython.core.display.Javascript object>

<Container object of 8 artists>

In [90]:
new_xvals = []

# plot another set of bars, adjusting the new xvals to make up for the first set of bars plotted
for item in xvals:
    new_xvals.append(item+0.3)

plt.bar(new_xvals, exponential_data, width = 0.3 ,color='pink')

<Container object of 8 artists>

In [91]:
mydata = [3,2,6,2,6,8,9,3]
another = []
for i in xvals:
    another.append(i+0.6)
    
plt.bar(another, mydata, width = 0.3, color = 'yellow')

<Container object of 8 artists>

In [93]:
plt.figure()
xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width = 0.3)

new_xvals = []

# plot another set of bars, adjusting the new xvals to make up for the first set of bars plotted
for item in xvals:
    new_xvals.append(item+0.3)

plt.bar(new_xvals, exponential_data, width = 0.3 ,color='pink')

from random import randint
linear_err = [randint(0,15) for x in range(len(linear_data))] 

# This will plot a new set of bars with errorbars using the list of random error values
plt.bar(xvals, linear_data, width = 0.3, yerr=linear_err)

<IPython.core.display.Javascript object>

<Container object of 8 artists>

In [94]:
# stacked bar charts are also possible
plt.figure()
xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width = 0.3, color='b')
plt.bar(xvals, exponential_data, width = 0.3, bottom=linear_data, color='r')

<IPython.core.display.Javascript object>

<Container object of 8 artists>

In [95]:
# or use barh for horizontal bar charts
plt.figure()
xvals = range(len(linear_data))
plt.barh(xvals, linear_data, height = 0.3, color='b')
plt.barh(xvals, exponential_data, height = 0.3, left=linear_data, color='r')

<IPython.core.display.Javascript object>

<Container object of 8 artists>

###  Subplots

In [96]:
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

plt.subplot?

Subplot takes three arguments

    1st is the number of rows
    2nd is the number of columns
    3rd is the index of the current axes
    So the grid is divided accordingly, i.e. if the command is 
    plt.subplot(2,2,1), then 4 plots in total can be displayed, 2 in the 1st row and 2 in the 2nd row, with the current axes pointing to the 1st plot
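
A small sketch of the 2 x 2 case: the loop below fills the grid one index at a time, and after each call the newly selected subplot is the current axes. The titles are only labels for illustration, and the Agg backend line is an assumption so the example runs outside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt

fig = plt.figure()
axes = []
for index in range(1, 5):
    # 2 rows, 2 columns; index counts left to right, top to bottom, starting at 1
    ax = plt.subplot(2, 2, index)
    ax.set_title("subplot(2, 2, %d)" % index)
    axes.append(ax)

# the last subplot selected is the current axes
current_is_last = plt.gca() is axes[-1]
print(len(fig.axes), current_is_last)
```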

In [101]:
plt.figure()
# subplot with 1 row, 2 columns, and current axis is 1st subplot axes
plt.subplot(1, 2, 1)

linear_data = np.array([1,2,3,4,5,6,7,8])

plt.plot(linear_data, '-o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0xf112530>]

In [102]:
exponential_data = linear_data**2 

# subplot with 1 row, 2 columns, and current axis is 2nd subplot axes
plt.subplot(1, 2, 2)
plt.plot(exponential_data, '-o')

[<matplotlib.lines.Line2D at 0xf0f3b90>]

To modify a previously created plot, make it the current plot again with plt.subplot(nrows, ncolumns, plot number)
    
    plt.subplot(1,2,1) in the above case to make any changes to the 1st plot
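
As a sketch of this, calling plt.subplot with the same arguments a second time is expected to hand back the existing axes rather than create a new one, so the next plt.plot lands on the first plot. The variable names are illustrative, and the Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt

fig = plt.figure()
left = plt.subplot(1, 2, 1)
right = plt.subplot(1, 2, 2)

# select the first subplot again; no new axes should be created
again = plt.subplot(1, 2, 1)
reused = again is left

# this line is drawn on the first subplot, not the second
plt.plot([1, 4, 9], "-x")
lines_left = len(left.lines)
lines_right = len(right.lines)
print(reused, lines_left, lines_right)
```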

In [104]:
# plot exponential data on 1st subplot axes
plt.subplot(1, 2, 1)
plt.plot(exponential_data, '-x')

[<matplotlib.lines.Line2D at 0xf06bdd0>]

With the code in the cell above we added the exponential data points to the plot that already held the linear data. Note that the y axis of the 1st plot was rescaled to accommodate the exponential data points, so a plot's axes are liable to change whenever other data is added. Where changing the axis range is not desired, the subplots can share the x axis, the y axis, or both.

In [107]:
plt.figure()
ax1 = plt.subplot(1, 2, 1)
plt.plot(linear_data, '-o')
# pass sharey=ax1 to ensure the two subplots share the same y axis
ax2 = plt.subplot(1, 2, 2, sharey=ax1)
plt.plot(exponential_data, '-x')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0xf2fd250>]
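
To confirm what sharey does, this sketch repeats the two subplots and compares their y limits afterwards. It redefines linear_data and exponential_data so it is self-contained, and the Agg backend line is an assumption so it runs outside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
exponential_data = linear_data**2

fig = plt.figure()
ax1 = plt.subplot(1, 2, 1)
ax1.plot(linear_data, "-o")

# the second subplot shares its y axis with the first
ax2 = plt.subplot(1, 2, 2, sharey=ax1)
ax2.plot(exponential_data, "-x")

# both axes now report the same y limits, wide enough for the larger series
same_limits = ax1.get_ylim() == ax2.get_ylim()
print(same_limits, ax1.get_ylim())
```

The shared axis makes the two panels directly comparable, at the cost of compressing the linear series into the bottom of its panel.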

In [125]:
plt.figure()
# the right hand side is equivalent shorthand syntax
plt.subplot(1,2,1) == plt.subplot(121)

<IPython.core.display.Javascript object>

True

The right-hand side is an equivalent shorthand syntax for subplot; it only works when the number of rows, the number of columns and the plot number are each a single digit.

In [132]:
# create a 3x3 grid of subplots
fig, ((ax1,ax2,ax3), (ax4,ax5,ax6), (ax7,ax8,ax9)) = plt.subplots(3, 3, sharex=True, sharey=True)
# plot the linear_data on the 5th subplot axes 
ax5.plot(linear_data, '-')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x1505bfb0>]

In [133]:
# set inside tick labels to visible
for ax in plt.gcf().get_axes():
    for label in ax.get_xticklabels() + ax.get_yticklabels():
        label.set_visible(True)

In [134]:
# necessary on some systems to update the plot
plt.gcf().canvas.draw()

In [135]:
#plt.figure()
plt.subplots(3, 3, sharex=True, sharey=True)

<IPython.core.display.Javascript object>

(<matplotlib.figure.Figure at 0x15097750>,
 array([[<matplotlib.axes._subplots.AxesSubplot object at 0x14F72B50>,
         <matplotlib.axes._subplots.AxesSubplot object at 0x14FB8250>,
         <matplotlib.axes._subplots.AxesSubplot object at 0x14FD6570>],
        [<matplotlib.axes._subplots.AxesSubplot object at 0x158C7770>,
         <matplotlib.axes._subplots.AxesSubplot object at 0x158F3630>,
         <matplotlib.axes._subplots.AxesSubplot object at 0x15914690>],
        [<matplotlib.axes._subplots.AxesSubplot object at 0x1593E430>,
         <matplotlib.axes._subplots.AxesSubplot object at 0x15944AD0>,
         <matplotlib.axes._subplots.AxesSubplot object at 0x15990190>]], dtype=object))

In [136]:
#plt.figure()
plt.subplots(3, 3, sharex=True, sharey=True)
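ax6 = plt.subplot(3,3,6)  # make the 6th subplot the current axes
x = [1,2,3]
y = [4,5,6]
plt.plot(x,y, '-o')  # draws on the current axes; ax6 is not rebound to the returned lines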
b = plt.gca()
b.axis([0,4,0,7])

<IPython.core.display.Javascript object>

[0, 4, 0, 7]