# Plotting with Matplotlib

In [1]:
%matplotlib notebook

In [2]:
import matplotlib as mpl
mpl.get_backend()

'nbAgg'

In [3]:
import matplotlib.pyplot as plt
plt.plot?

The signature of **plt.plot** is: `plt.plot(*args, scalex=True, scaley=True, data=None, **kwargs)`

`*args` means you can pass any number of positional (unnamed) arguments.

`**kwargs` means you can pass any number of keyword (named) arguments.

This makes the _plot_ function very flexible, but also ambiguous when it comes to interpreting the arguments. 

The positional arguments are interpreted as (x, y) pairs, each optionally followed by a format string such as `'.'` or `'o'`.
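This is a minimal sketch of how the positional arguments are grouped. It assumes the non-interactive `Agg` backend so it runs outside a notebook; drop the `matplotlib.use` line in a notebook that already ran `%matplotlib notebook`. Two `(x, y, fmt)` groups produce two lines, and a keyword argument such as `linewidth` applies to every line created. The data values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Positional args are consumed as (x, y, fmt) groups: two groups -> two Line2D objects.
# The keyword argument linewidth is forwarded (via **kwargs) to every line created.
lines = ax.plot([1, 2, 3], [4, 5, 6], "r--",
                [1, 2, 3], [6, 5, 4], "b.",
                linewidth=2)
```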

In [4]:
plt.plot(3,2,'.')

[<matplotlib.lines.Line2D at 0x2666160c308>]

Pyplot is a scripting interface: it keeps track of the current _**figure, subplots, and axes objects.**_

`pyplot.plot()` checks whether a figure already exists to draw on; if not, it creates one.

`pyplot.gca()` gets the current axes; `pyplot.gcf()` gets the current figure.
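The following sketch shows pyplot's bookkeeping. It assumes a headless `Agg` backend and closes any open figures first so the figure numbers are predictable. `plt.get_fignums()` lists the figures pyplot is currently tracking.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

plt.close("all")          # start from a clean state with no open figures
plt.plot(3, 2, "o")       # no figure exists yet, so pyplot creates one (plus axes)
first_fig = plt.gcf()     # the figure pyplot just created
first_ax = plt.gca()      # its current axes, which now holds the Line2D

plt.figure()              # a new figure becomes the "current" one
second_fig = plt.gcf()
open_figures = plt.get_fignums()  # numbers of all figures pyplot is tracking
```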

In [12]:
plt.figure(figsize=[5, 3])
plt.plot(3,2,'o')
ax = plt.gca()
ax.axis([0, 6, 0, 10])

[0, 6, 0, 10]

**axis()** sets the x and y limits: _axis([xmin, xmax, ymin, ymax])_
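As a sketch of the same idea with the object-oriented methods, assuming a headless `Agg` backend: `axis()` sets both ranges in one call, while `set_xlim()` / `set_ylim()` set one axis at a time. `get_xlim()` / `get_ylim()` read the limits back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(3, 2, "o")
ax.axis([0, 6, 0, 10])    # [xmin, xmax, ymin, ymax] in a single call
limits_from_axis = (ax.get_xlim(), ax.get_ylim())

# The same limits, set one axis at a time
ax.set_xlim(0, 6)
ax.set_ylim(0, 10)
limits_from_setters = (ax.get_xlim(), ax.get_ylim())
```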

In [8]:
ax=plt.gca()
ax.get_children()

[<matplotlib.lines.Line2D at 0x27b09b2c408>,
 <matplotlib.spines.Spine at 0x27b093ef748>,
 <matplotlib.spines.Spine at 0x27b093ef9c8>,
 <matplotlib.spines.Spine at 0x27b093efe88>,
 <matplotlib.spines.Spine at 0x27b093efc08>,
 <matplotlib.axis.XAxis at 0x27b093ef7c8>,
 <matplotlib.axis.YAxis at 0x27b092a40c8>,
 Text(0.5, 1.0, ''),
 Text(0.0, 1.0, ''),
 Text(1.0, 1.0, ''),
 <matplotlib.patches.Rectangle at 0x27b09b19b88>]

The single `Line2D` object is the data point we plotted. 

The four `Spine` objects are renderings of the borders of the frame. 

The three `Text` objects are the (empty) center, left and right titles.

The `Rectangle` is the background of the axes.

In short, **plot()** generates a series of lines (`Line2D` artists) that get rendered against an axes object. The pyplot module has other useful functions in the scripting layer, such as **scatter()**.
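To summarize the children listing above, this sketch tallies the artists attached to an axes by class name. It assumes a headless `Agg` backend; the `Counter` summary is just a convenient way to inspect the list, not something Matplotlib provides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
from collections import Counter

fig, ax = plt.subplots()
ax.plot(3, 2, "o")

# Count the artists attached to the axes by their class name
kinds = Counter(type(child).__name__ for child in ax.get_children())
print(kinds)
```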

# Scatterplots

A scatterplot is a two-dimensional plot similar to the line plot we've seen.
**scatter()** takes the x values as its first argument and the y values as its second. 

In [13]:
import numpy as np 
x = np.array([1,2,3,4,5,6,7])
y = x

plt.figure()
plt.scatter(x,x)

<matplotlib.collections.PathCollection at 0x266639a7b08>

**scatter()** doesn't represent items as a series of objects in which every point carries its own _x, y, name, and color_. 

Instead, per-point properties are passed as sequences aligned with the data. For example, we can pass a list of colors to highlight certain points. 

In [14]:
colors=['green']*(len(x)-1)
colors.append('red')

plt.figure()
plt.scatter(x, x, s=100, c=colors)

<matplotlib.collections.PathCollection at 0x26663d0f188>
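Sizes work the same way as colors: one value per point. This sketch assumes a headless `Agg` backend, and the `np.linspace` sizes are an arbitrary illustration. It passes both per-point colors and per-point sizes, then reads them back from the returned `PathCollection`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7])
colors = ["green"] * (len(x) - 1) + ["red"]   # highlight the last point
sizes = np.linspace(20, 200, len(x))          # hypothetical: marker area grows with x

fig, ax = plt.subplots()
points = ax.scatter(x, x, s=sizes, c=colors)
facecolors = points.get_facecolors()  # RGBA array, one row per point
```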

### Zip function and list unpacking
The built-in **zip()** function takes a number of iterables and creates tuples, matching their elements by index.
**zip()** returns a lazy iterator, so to see the results we can use **list()**.

In [15]:
zip_generator = zip([1,2,3,4,5],[6,7,8,9,10])
list(zip_generator)

[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

It's common to store data in **tuples**, so it's important to know how to convert data to and from tuples. 

We can use argument unpacking (`*`) with **_zip()_** to split the tuples back into separate sequences (as the output shows, they come back as tuples). 

In [16]:
zip_generator = zip([1,2,3,4,5],[6,7,8,9,10])
x,y= zip(*zip_generator)
print(x)
print(y)

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
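Two details are easy to miss. A `zip` object can only be consumed once, and "unzipping" with `zip(*pairs)` yields tuples. This plain-Python sketch shows both and converts the tuples to lists explicitly.

```python
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

pairs = zip(a, b)           # lazy iterator of (a_i, b_i) tuples
first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # empty: the iterator is already exhausted

# zip(*first_pass) is zip((1, 6), (2, 7), ...), which regroups by position
xs, ys = zip(*first_pass)
xs, ys = list(xs), list(ys)  # convert the resulting tuples to lists
```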


### Scatterplot: Axis, Legends.

We'll slice the two sequences and plot each slice as a separate data series.

In [17]:
plt.figure()
plt.scatter(x[:2], y[:2], s=100, c='red', label='Tall Students')
plt.scatter(x[2:], y[2:], s=100, c='blue', label='Short Students')

<matplotlib.collections.PathCollection at 0x2666406da48>

**Axes** generally have labels, and **charts** have titles as well. **Legends** are artists too and can contain children.

In [18]:
plt.xlabel('The number of ball kicks')
plt.ylabel('The grade')
plt.title('Relationship between ball kicking and grades')

Text(0.5, 1, 'Relationship between ball kicking and grades')

In [19]:
plt.legend()

<matplotlib.legend.Legend at 0x266643d02c8>

In [20]:
# Move the legend to the lower-right corner (loc=4), hide its frame and add a title
plt.legend(loc=4, frameon=False, title='Legend')

<matplotlib.legend.Legend at 0x266639bfbc8>

In [21]:
plt.gca().get_children()

[<matplotlib.collections.PathCollection at 0x2666406d208>,
 <matplotlib.collections.PathCollection at 0x2666406da48>,
 <matplotlib.spines.Spine at 0x2666403c808>,
 <matplotlib.spines.Spine at 0x2666403c9c8>,
 <matplotlib.spines.Spine at 0x2666403cb88>,
 <matplotlib.spines.Spine at 0x2666403cd48>,
 <matplotlib.axis.XAxis at 0x2666403c788>,
 <matplotlib.axis.YAxis at 0x266640414c8>,
 Text(0.5, 1, 'Relationship between ball kicking and grades'),
 Text(0.0, 1, ''),
 Text(1.0, 1, ''),
 <matplotlib.legend.Legend at 0x266639bfbc8>,
 <matplotlib.patches.Rectangle at 0x2666404f448>]

In [22]:
legend = plt.gca().get_children()[-2]  # the legend sits just before the background Rectangle

In [23]:
legend.get_children()[0].get_children()

[<matplotlib.offsetbox.TextArea at 0x2666402d088>,
 <matplotlib.offsetbox.HPacker at 0x2666402d708>]
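Indexing `get_children()[-2]` relies on the legend being the second-to-last child. This sketch shows a less position-dependent route, assuming a headless `Agg` backend and reusing the two slices from above. `Axes.get_legend()` returns the legend directly, and `Legend.get_texts()` / `Legend.get_title()` expose its text artists.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2], [6, 7], s=100, c="red", label="Tall Students")
ax.scatter([3, 4, 5], [8, 9, 10], s=100, c="blue", label="Short Students")
ax.legend(loc="lower right", frameon=False, title="Legend")

legend = ax.get_legend()  # no need to know its position among the children
labels = [text.get_text() for text in legend.get_texts()]
title = legend.get_title().get_text()
```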

# Line Plots
Line plots are created using _**plot()**_. It plots one or more _series_ of _data points_ and connects the points of each series with a line. 

Below, we'll plot the data points of two series: _linear_data_ and _quadratic_data_. 

In [24]:
import numpy as np 
import matplotlib.pyplot as plt

linear_data= np.array([1,2,3,4,5,6,7,8])
quadratic_data= linear_data**2

plt.figure()
plt.plot(linear_data, '-o', quadratic_data, '-o')

[<matplotlib.lines.Line2D at 0x266643fb448>,
 <matplotlib.lines.Line2D at 0x26664403dc8>]

We only gave the y values to _**plot()**_. When no x values are passed, it uses the indices of the series (0, 1, 2, ...) as the x values. <br> Unlike **_scatter()_**, which needs both x and y, we don't have to supply the x coordinates of the data points. 
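The implicit x values can be confirmed by reading them back from the returned `Line2D`. This sketch assumes a headless `Agg` backend. Passing `np.arange(len(y))` explicitly produces the same x data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
fig, ax = plt.subplots()
(line,) = ax.plot(linear_data, "-o")   # y values only
implicit_x = line.get_xdata()          # x values Matplotlib filled in

# Passing the indices explicitly draws the same line
(explicit_line,) = ax.plot(np.arange(len(linear_data)), linear_data, "-o")
```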

To plot a dashed red line, we can use the format string `'--r'`:

In [25]:
plt.plot([21,44,55],'--r')

[<matplotlib.lines.Line2D at 0x266647366c8>]

In [26]:
plt.xlabel('Some data')
plt.ylabel('Some Other data')
plt.title('A title')
plt.legend(['Europe','Africa','Americas'])

<matplotlib.legend.Legend at 0x266643ea308>

To shade the area between the linear and quadratic data, **fill_between()** takes the same x **range** as the data points, the **lower bounds** and **upper bounds**, and optionally the fill **color** (`facecolor`) and **transparency** (`alpha`). 

In [27]:
plt.gca().fill_between(range(len(linear_data)), 
                       linear_data, quadratic_data, 
                       facecolor='blue', 
                       alpha=0.25)

<matplotlib.collections.PolyCollection at 0x26664732d48>
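`fill_between()` can also shade only part of the range. This sketch uses its `where` parameter and assumes a headless `Agg` backend. Only the region where the quadratic series lies strictly above the linear one is filled, and `interpolate=True` closes the band at the crossing point.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PolyCollection

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
x = np.arange(len(linear_data))   # the same x values plot() used implicitly

fig, ax = plt.subplots()
ax.plot(x, linear_data, "-o", x, quadratic_data, "-o")
# Fill only where quadratic > linear (at x=0 both equal 1, so it is excluded)
band = ax.fill_between(x, linear_data, quadratic_data,
                       where=quadratic_data > linear_data,
                       interpolate=True, facecolor="blue", alpha=0.25)
```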

**np.arange()** is used here to generate a range of dates as NumPy `datetime64` values. <br> Older Matplotlib versions did not handle these well, so pandas' **to_datetime()** was commonly used to convert them into the datetime objects Matplotlib expected; recent Matplotlib versions plot `datetime64` values directly.

In [28]:
plt.figure()

obs_dates = np.arange('2020-01-01', '2020-01-09', dtype='datetime64[D]')

plt.plot(obs_dates, linear_data, '-o',
        obs_dates, quadratic_data, '-o')

[<matplotlib.lines.Line2D at 0x2666473ec88>,
 <matplotlib.lines.Line2D at 0x26664781548>]

The tick labels show the dates and are hard to read horizontally, so they need to be rotated.

In [29]:
x = plt.gca().xaxis

for item in x.get_ticklabels():
    item.set_rotation(45)

In [30]:
x.get_children()

[Text(0.5, -32.10831969736306, ''),
 Text(1, -30.574754937737733, ''),
 <matplotlib.axis.XTick at 0x26664759fc8>,
 <matplotlib.axis.XTick at 0x26664732fc8>,
 <matplotlib.axis.XTick at 0x266647813c8>,
 <matplotlib.axis.XTick at 0x266618a5a88>,
 <matplotlib.axis.XTick at 0x266618c0588>,
 <matplotlib.axis.XTick at 0x266618c0c88>,
 <matplotlib.axis.XTick at 0x266618b48c8>,
 <matplotlib.axis.XTick at 0x266618c0608>]

In [31]:
plt.subplots_adjust(bottom=0.25)
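The loop above only rotates the tick labels that exist at that moment. This sketch shows two built-in alternatives, assuming a headless `Agg` backend. `Axes.tick_params(labelrotation=...)` stores the rotation on the axis so ticks created later are rotated too. `Figure.autofmt_xdate()` rotates and right-aligns date labels and enlarges the bottom margin in one call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

obs_dates = np.arange("2020-01-01", "2020-01-09", dtype="datetime64[D]")
linear_data = np.arange(1, 9)

fig, ax = plt.subplots()
ax.plot(obs_dates, linear_data, "-o")

# Option 1: rotation kept on the axis, so ticks created later are rotated as well
ax.tick_params(axis="x", labelrotation=45)

# Option 2: rotate and right-align date labels, and set the bottom margin
fig.autofmt_xdate(rotation=45, bottom=0.25)

fig.canvas.draw()  # render once so the final tick labels exist
```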

Matplotlib supports a subset of TeX math markup (mathtext), so we can write math expressions between `$` signs in titles and labels. 

In [32]:
ax = plt.gca()
ax.set_xlabel("Date")
ax.set_ylabel("Units")
ax.set_title("Quadratic vs. Linear performance")

Text(0.5, 1, 'Quadratic vs. Linear performance')

In [33]:
ax.set_title('Quadratic ($x^2$) vs. Linear ($x$) performance')

Text(0.5, 1, 'Quadratic ($x^2$) vs. Linear ($x$) performance')
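Mathtext commands that start with a backslash, such as `\Delta`, are easiest to write in raw strings. This sketch assumes a headless `Agg` backend. The markup is parsed when the figure is drawn, so forcing a draw is a quick way to surface invalid expressions. The axis label text is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_title("Quadratic ($x^2$) vs. Linear ($x$) performance")
ax.set_xlabel(r"$\Delta t$ (days)")  # raw string keeps the backslash for mathtext
fig.canvas.draw()  # mathtext is parsed at draw time; bad markup would raise here
```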

# Bar Charts
For bar charts, **bar()** takes the **x positions** and **the heights of the bars**.

In [37]:
plt.figure()
xvals= range(len(linear_data))
plt.bar(xvals, linear_data, width = 0.3)

<BarContainer object of 8 artists>

In [39]:
# Shift each x position right by one bar width so the bars sit side by side
new_xvals = [item + 0.3 for item in xvals]
plt.bar(new_xvals, quadratic_data, width=0.3, color='red')

<BarContainer object of 8 artists>

Calculating these offsets by hand is inconvenient when there are many bar series, and adding value labels to the bars requires iterating over them manually. <br> 
Plotting several series of data in groups across time is not one of pyplot's bar chart strengths. 
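One way to reduce this bookkeeping is sketched below, assuming Matplotlib 3.4 or later (for `Axes.bar_label`) and a headless `Agg` backend. NumPy arithmetic shifts each series by a multiple of the bar width, and `bar_label` writes each bar's height above it. The `series` dictionary is a hypothetical container for the two data sets.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
series = {"linear": linear_data, "quadratic": quadratic_data}  # hypothetical grouping

width = 0.3
xvals = np.arange(len(linear_data))
fig, ax = plt.subplots()
containers, value_labels = [], []
for i, (name, values) in enumerate(series.items()):
    # Shift series i right by i bar widths (vectorized, no manual loop over points)
    bars = ax.bar(xvals + i * width, values, width=width, label=name)
    value_labels.extend(ax.bar_label(bars))  # one annotation per bar (Matplotlib >= 3.4)
    containers.append(bars)

ax.set_xticks(xvals + width / 2)  # center each tick between the two bars of a group
ax.set_xticklabels([str(v) for v in xvals])
ax.legend()
```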

We can plot error bars with the **yerr** parameter of **plt.bar()**. We treat our linear data points as mean values, generate random error values, and plot them. 

In [44]:
from random import randint 
linear_err = [randint(0, 15) for _ in range(len(linear_data))]  # randint includes both ends

In [48]:
plt.figure(figsize=(5.1,3.1))
plt.bar(xvals, linear_data , width = 0.3, yerr=linear_err)

<BarContainer object of 8 artists>
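This variant assumes a headless `Agg` backend. It draws the random errors from a seeded NumPy generator so the plot is reproducible, and it adds small caps to the error bars with `capsize`. The seed value is arbitrary. The returned `BarContainer` keeps a reference to the error-bar artists in its `errorbar` attribute.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
xvals = np.arange(len(linear_data))

# Seeded generator for reproducible "errors"; the upper bound is exclusive,
# so integers(0, 16) draws from 0..15 like random.randint(0, 15)
rng = np.random.default_rng(seed=0)
linear_err = rng.integers(0, 16, size=len(linear_data))

fig, ax = plt.subplots(figsize=(5.1, 3.1))
bars = ax.bar(xvals, linear_data, width=0.3, yerr=linear_err, capsize=4)
```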

### Stacked bars
Stacked bars show cumulative values while keeping each series distinguishable. <br>
This is done with two calls to **bar()**: the **bottom parameter** of the second call is set to **the first series of data**. 

In [51]:
plt.figure(figsize=(5.1,3.1))
xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width=0.3, color='b')
plt.bar(xvals, quadratic_data, width=0.3, color='r', bottom=linear_data)

<BarContainer object of 8 artists>
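The same idea generalizes to more than two series by keeping a running total as the `bottom` of the next layer. This sketch assumes a headless `Agg` backend, and the third series (`3 * linear_data`) is hypothetical, added only to show the accumulation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
series = [linear_data, quadratic_data, 3 * linear_data]  # third series is hypothetical
xvals = np.arange(len(linear_data))

fig, ax = plt.subplots(figsize=(5.1, 3.1))
bottom = np.zeros(len(linear_data))
containers = []
for values, color in zip(series, ["b", "r", "g"]):
    containers.append(ax.bar(xvals, values, width=0.3, color=color, bottom=bottom))
    bottom = bottom + values  # new array: running total becomes the next layer's base

last_bar = containers[-1][-1]
top_of_last_stack = last_bar.get_y() + last_bar.get_height()
```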

A horizontal stacked bar chart can be drawn with **plt.barh()**; the _width_ and _bottom_ parameters become _height_ and _left_.

In [52]:
plt.figure(figsize=(5.1,3.1))
xvals = range(len(linear_data))
plt.barh(xvals, linear_data, height=0.3, color='b')
plt.barh(xvals, quadratic_data, height=0.3, color='r', left=linear_data)

<BarContainer object of 8 artists>
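In **barh()** the data values become the rectangle widths, and `left` shifts the second series to start where the first one ends. This sketch assumes a headless `Agg` backend and checks that geometry by reading the rectangles back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
yvals = np.arange(len(linear_data))  # bar positions along the vertical axis

fig, ax = plt.subplots(figsize=(5.1, 3.1))
first = ax.barh(yvals, linear_data, height=0.3, color="b")
second = ax.barh(yvals, quadratic_data, height=0.3, color="r", left=linear_data)

# Right edge of the last stacked bar = linear + quadratic value
right_edge = second[-1].get_x() + second[-1].get_width()
```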