# matplotlib:  the chart plotting library

<U>Notes if you are using Jupyter Notebook</U>:  to call <B>exit()</B> from a notebook, please use <B>sys.exit()</B> (requires <B>import sys</B>); if a strange error occurs, it may be because Jupyter retains variables from all executed cells.  To reset the notebook's variables, click 'Restart Kernel' (the circular arrow) -- this will not undo any text changes.  

## Documentation and Study

Matplotlib documentation can be found here:  <A HREF="http://matplotlib.org/">http://matplotlib.org/</A>

A very good rundown of features is in the <A HREF="http://opencarts.org/sachlaptrinh/pdf/28232.pdf">Python for Data Analysis 2nd Edition PDF, Chapter 9</A>

A clear tutorial on the central plotting module <B>pyplot</B> (part of which was used for this presentation) can be found here:  <A HREF="https://matplotlib.org/users/pyplot_tutorial.html">https://matplotlib.org/users/pyplot_tutorial.html</A>

## importing matplotlib and seaborn; using Jupyter notebook

###  Get started with charting tools.  

In [None]:
# visualize in Jupyter notebook
%matplotlib inline

import matplotlib.pyplot as plt   # access matplotlib
import seaborn as sns             # access seaborn

sns.set()                         # apply seaborn default styles and scaling

<B>matplotlib</B> is the standard Python charting library, tightly integrated with pandas.

<B>seaborn</B> is a library built on top of matplotlib, providing support for statistical analysis and offering attractive styles.  Calling <B>sns.set()</B> applies seaborn's default styles to matplotlib charts; in current seaborn versions, <B>import seaborn</B> by itself does not change them.

<B>%matplotlib inline</B> is a Jupyter notebook "magic" command that tells the notebook to display chart plots directly below the cell; outside a notebook, we would call <B>plt.show()</B> or save our charts to an image file.

## Plotting from matplotlib with plt

###  <B>plt</B> can be used to control the "current" chart (or <I>subplot</I>)

This approach uses the "default" <I><B>subplot</B></I> and <I><B>figure</B></I> objects.

A <I><B>subplot</B></I> is a chart:  line, bar, scatter, etc.

A <I><B>figure</B></I> is an overall image:  it may contain multiple subplots.  

<B>SPECIAL NOTE</B> in many online examples, "subplots" (actual line or bar charts) are referred to as "axes" and subplot objects are named <B>ax</B>.  This is to be distinguished from chart axes (x and y).  Examples in this notebook follow this convention in order to condition your thinking to existing examples online.  <B>Remember, <I>ax</I> usually refers to a <I>subplot</I></B>.  
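
To make the terminology concrete, here is a minimal sketch showing that <B>plt</B> keeps a "current" figure and a "current" subplot behind the scenes, which can be retrieved with <B>plt.gcf()</B> and <B>plt.gca()</B>.  The <B>plt.close('all')</B> call is only there so the sketch starts from a clean state.

```python
import matplotlib.pyplot as plt

plt.close('all')            # start from a clean state (no leftover figures)

plt.plot([1, 2, 3])         # draws on the current subplot, creating one if needed

fig = plt.gcf()             # "get current figure"
ax = plt.gca()              # "get current axes", i.e. the current subplot

print(ax in fig.axes)       # True: the subplot belongs to the figure
print(len(ax.lines))        # 1: the line plotted above
```

Any later <B>plt.title()</B>, <B>plt.xlabel()</B>, etc. call acts on this same subplot.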

In [None]:
import matplotlib.pyplot as plt

line_1_data = [1, 2, 4, 6,  8, 10, 12,  14,  16]
line_2_data = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# plot lines to current subplot
plt.plot(line_1_data, label='constant')
plt.plot(line_2_data, 'r--', label='geometric')  # red, dashed


# TITLE
plt.title('A Very Special Relationship')

# AXIS LABELS
plt.xlabel('time')
plt.ylabel('level')

# TICKS
plt.xticks(ticks=[0, 2, 4, 6, 8])
plt.yticks(ticks=[0, 50, 100, 150, 200, 250])

# GRIDLINES
plt.grid(True)

# LEGEND
plt.legend();

plt.savefig('chart.png')    # save figure to image file

<br>


In [None]:
plt.clf()                             # clear the figure
plt.plot([1, 5, 3, 7, 5, 9], 'g-')    # green, solid - see line styles

plt.savefig('chart.png')     # save figure to image file
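
The format strings used above (<B>'r--'</B>, <B>'g-'</B>) combine an optional color code, an optional marker code and an optional line style code.  The sketch below adds a marker and then reads the resulting settings back from the line object that <B>plt.plot()</B> returns; the data is the same list used in the cell above.

```python
import matplotlib.pyplot as plt

# 'g' = green, 'o' = circle markers, '--' = dashed line
(line,) = plt.plot([1, 5, 3, 7, 5, 9], 'go--')

print(line.get_color())      # g
print(line.get_marker())     # o
print(line.get_linestyle())  # --
```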

## Plotting from matplotlib with figure and subplot objects

### This approach allows for multiple subplots (charts) within a figure (chart image).   

In [None]:
import matplotlib.pyplot as plt

fig = plt.figure()         # "figure" object representing overall chart image
ax = fig.add_subplot()     # "subplot" object representing a chart

ax.plot([1, 2, 3, 4, 5], label='constant')   # plot a line on the subplot
ax.plot([1, 4, 6, 8, 10], label='doubling')  # plot a line on the subplot

# TITLE (see 'text' for more options)
ax.set_title('Awesome Chart!!')

# AXIS LABELS
ax.set_xlabel('time')
ax.set_ylabel('level')

# TICK VALUES
ax.set_xticks(ticks=[0, 1, 2, 3, 4])
ax.set_yticks(ticks=[0, 2, 4, 6, 8, 10])

# TICK LABELS
ax.set_xticklabels(['12:00', '12:30', '1:00', '1:30', '2:00'])

# GRIDLINES
ax.grid(True)

# LEGEND (uses the label= values passed to ax.plot() above)
ax.legend();

## Saving multiple plots within a figure

### We may prefer to work with the <B>figure</B> object in order to create multiple plots within a figure.

In [None]:
import matplotlib.pyplot as plt

fig = plt.figure()              # 'figure' representing overall chart
ax1 = fig.add_subplot(2, 1, 1)  # grid of 2 rows x 1 column; this subplot is position 1 (top)
ax2 = fig.add_subplot(2, 1, 2)  # same 2 x 1 grid; this subplot is position 2 (bottom)

line_1_data =  [1, 2, 3,  4,  5,  6,  7,  8,  9]
doubled_data = [1, 2, 6,  8, 10, 12, 14, 16, 18]
squared_data = [1, 2, 9, 16, 25, 36, 49, 64, 81]

ax1.plot(line_1_data, label='constant')   # plot a line on subplot 1
ax1.plot(doubled_data, label='doubling')  # same

ax2.plot(line_1_data, label='constant')   # plot a line on subplot 2
ax2.plot(squared_data, label='squaring')  # same

# AXIS LABELS
ax1.set_xlabel('time')
ax1.set_ylabel('level')

ax2.set_xlabel('time')
ax2.set_ylabel('level')

# TICKS
ax1.set_xticks(ticks=[0, 1, 2, 3, 4, 5, 6, 7, 8])
ax1.set_yticks(ticks=[0, 5, 10, 15, 20])

ax2.set_xticks(ticks=[0, 1, 2, 3, 4, 5, 6, 7, 8])
ax2.set_yticks(ticks=[0, 20, 40, 60, 80, 100])


# LEGEND
ax1.legend()
ax2.legend();
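
As an alternative to calling <B>add_subplot()</B> once per chart, <B>plt.subplots()</B> creates the figure and a whole grid of subplots in one call.  The sketch below is one possible way to build the same 2-row, 1-column layout; the variable names and data are illustrative only.

```python
import matplotlib.pyplot as plt

# One call creates the figure and a 2-row, 1-column grid of subplots.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)

ax_top.plot([1, 2, 3, 4, 5], label='constant')
ax_bottom.plot([1, 4, 9, 16, 25], 'r--', label='squaring')

ax_top.legend()
ax_bottom.legend()
ax_bottom.set_xlabel('time')   # shared x axis, so label only the bottom chart

fig.tight_layout()             # reduce overlap between the two subplots

# The first subplot sits above the second (higher y position in the figure).
print(ax_top.get_position().y0 > ax_bottom.get_position().y0)   # True
```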

## plotting from pandas DataFrame

###  pandas' tight integration with matplotlib allows for easy chart plotting from pandas DataFrames and Series.  

In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame({ 'constant': [1, 2, 3,  4,  5,  6,  7,  8,  9],
                    'doubled':  [1, 2, 6,  8, 10, 12, 14, 16, 18] },
                    index=[0, 1, 2, 3, 4, 5, 6, 7, 8])

df.plot();          # default - 'line'
#df.plot.line()     # same; also df.plot.bar(), df.plot.barh(), etc.

print(df)
                    #    constant  doubled
                    # 0         1        1
                    # 1         2        2
                    # 2         3        6
                    # 3         4        8
                    # 4         5       10
                    # 5         6       12
                    # 6         7       14
                    # 7         8       16
                    # 8         9       18


plt.savefig('chart.tiff')    # save "current" figure to image file

In [None]:
df = pd.DataFrame({ 'constant': [1, 2, 3,  4,  5,  6,  7,  8,  9],
                    'squared':  [1, 2, 9, 16, 25, 36, 49, 64, 81] },
                    index=[0, 1, 2, 3, 4, 5, 6, 7, 8])

ax = df.plot(title='Awesome!',
             xticks=[1, 3, 5, 7, 9],        # default is index
#             use_index=True,
             yticks=[20, 40, 60, 80, 100],
             fontsize=15,                   # TICK font
             rot=45,                        # TICK rotation
             xlim=(0, 10),                  # X AXIS SCALE limits
             ylim=(0, 120),                 # Y AXIS SCALE limits
             xlabel='Joe',
             ylabel='Pete',
             legend=True,                   # LEGEND:    True is default
             grid=True,                     # GRIDLINES: default follows the current style
             style=['r-', 'b--']            # LINE STYLES list (can also be dict)
     );

# USING the SUBPLOT OBJECT here, to set y tick labels
ax.set_yticklabels(['a', 'b', 'c', 'd', 'e']);

<H4>Retrieving the 'subplot' object from a DataFrame plot</H4>

We may need to change styles on a DataFrame plot after <B>.plot()</B> has been called.  Since <B>.plot()</B> returns the subplot object, all we need to do is keep it and change its attributes; the figure is available from the subplot through <B>.get_figure()</B>.

In [None]:
ax = df.plot.line()
fig = ax.get_figure()
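
A minimal sketch of restyling after the fact, assuming a small made-up DataFrame: the subplot returned by <B>.plot.line()</B> is used to set a title and a label and to change the style of one line.

```python
import pandas as pd

df = pd.DataFrame({'constant': [1, 2, 3, 4, 5],
                   'squared':  [1, 4, 9, 16, 25]})

ax = df.plot.line()              # returns the subplot (Axes) object
fig = ax.get_figure()            # the figure that contains it

# Restyle after the fact through the subplot object.
ax.set_title('Restyled after plotting')
ax.set_ylabel('level')
ax.lines[1].set_linestyle('--')  # second column ('squared') becomes dashed
ax.legend()                      # rebuild the legend so it shows the new style

print(ax.get_title())            # Restyled after plotting
print(len(ax.lines))             # 2
```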

## Setting tick intervals

###  If matplotlib's default intervals are not suitable, some manual intervention is required.

Matplotlib attempts to set the most reasonable tick intervals, but its choices do not always suit the data.  We may wish to set them directly; in that case we calculate the range and spacing ourselves and set them manually.

In [None]:
df = pd.DataFrame({ 'constant': [1, 2, 3,  4,  5,  6,  7,  8,  9],
                    'squared':  [1, 2, 9, 16, 25, 36, 49, 64, 81] },
                    index=[0, 1, 2, 3, 4, 5, 6, 7, 8])

ax = df.plot(title='Awesome!',
     yticks=range(20, 101, 20),     # [20, 40, 60, 80, 100]
#     xticks=range(11),             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
     xticks=np.linspace(0, 10, 20, endpoint=False));  # [0, 0.5, 1 ... 9.5]

We can always set the specific numbers we wish to see and they will be duly applied:

In [None]:
yticks=range(20, 101, 20)      # [20, 40, 60, 80, 100]

Sometimes however we may want to set a range, with or without a step:

In [None]:
# xticks=range(0, 11)             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
xticks=range(0, 11, 2)            # [0, 2, 4, 6, 8, 10]

To choose a fractional interval in a range, use <B>np.linspace()</B>

In [None]:
np.linspace(0, 10, 20, endpoint=False)  # [0, 0.5, 1, 1.5 ... 9.5]

np.linspace(1, 10, 19, endpoint=True)   # [1, 1.5, 2, 2.5 ... 10]

The above <B>linspace()</B> arguments read "20 evenly spaced data points from 0 up to but not including 10 (so the last point is 9.5)" and "19 evenly spaced data points from 1 to 10 (including 10)".
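
As a quick check of those readings, the sketch below prints the ends of both arrays and compares the first one with <B>np.arange()</B>, which takes the step directly instead of the number of points.  The variable names are illustrative only.

```python
import numpy as np

half_steps = np.linspace(0, 10, 20, endpoint=False)   # 20 points: 0, 0.5, ..., 9.5
with_end   = np.linspace(1, 10, 19, endpoint=True)    # 19 points: 1, 1.5, ..., 10

print(half_steps[:4], half_steps[-1])   # first four values, then 9.5
print(with_end[:4], with_end[-1])       # first four values, then 10.0

# np.arange takes the step directly instead of the number of points.
same_ticks = np.arange(0, 10, 0.5)
print(np.allclose(half_steps, same_ticks))   # True
```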

The range may also be based on the available data:

In [None]:
yticks=range(0, len(df))                        # [0 .. number of rows - 1]
xticks=range(min(df.index), max(df.index) + 1)  # from lowest to highest value in the index
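
When the data range does not fall on round numbers, we can round the limits outward to the nearest multiple of the step.  The helper below, <B>ticks_covering()</B>, is a hypothetical name introduced for this sketch and is not part of matplotlib or pandas; it assumes a positive integer step.

```python
import math

def ticks_covering(values, step):
    """Return evenly spaced integer ticks that cover min(values)..max(values)."""
    low = math.floor(min(values) / step) * step    # round the minimum down to a multiple of step
    high = math.ceil(max(values) / step) * step    # round the maximum up to a multiple of step
    return list(range(low, high + step, step))

squared = [1, 2, 9, 16, 25, 36, 49, 64, 81]
print(ticks_covering(squared, 20))          # [0, 20, 40, 60, 80, 100]
print(ticks_covering(range(0, 9), 2))       # [0, 2, 4, 6, 8]
```

The result can be passed as <B>yticks=</B> or <B>xticks=</B> to <B>df.plot()</B>, or to <B>ax.set_yticks()</B>.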

This example shows how an index set from a datetime64 column intelligently sets x axis tick labels (not always guaranteed):

In [None]:
df = pd.read_csv('data/weather_newyork.csv')
df['EST'] = pd.to_datetime(df['EST'])      # parse text dates into datetime64 values
df = df.set_index('EST')                   # now a DatetimeIndex
ax = df['Mean TemperatureF'].plot(rot=30,
                                  title='Mean Temp NYC 2016')
ax.set_ylabel('Temp (F)')
ax.set_xlabel(None);
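
The cell above depends on a local CSV file.  For readers without that file, here is a self-contained sketch of the same steps that uses made-up temperatures (a smooth synthetic curve, not real weather data) in place of the file contents.

```python
import numpy as np
import pandas as pd

# Made-up daily temperatures standing in for the CSV file.
dates = pd.date_range('2016-01-01', periods=366, freq='D')    # 2016 is a leap year
temps = 55 - 25 * np.cos(2 * np.pi * np.arange(366) / 366)    # coldest in January

df = pd.DataFrame({'EST': dates.strftime('%Y-%m-%d'),         # dates as text, as read from a CSV
                   'Mean TemperatureF': temps})

df['EST'] = pd.to_datetime(df['EST'])      # text -> datetime64
df = df.set_index('EST')                   # now a DatetimeIndex

ax = df['Mean TemperatureF'].plot(rot=30, title='Mean Temp (synthetic data)')
ax.set_ylabel('Temp (F)')
ax.set_xlabel('')

print(type(df.index).__name__)             # DatetimeIndex
```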

## Text Styling

Text styling keyword arguments can be passed to any method that generates text, such as <B>set_title()</B>, <B>set_xlabel()</B> and <B>set_ylabel()</B>.

In [None]:
import pandas as pd

df = pd.DataFrame({ 'a': [1, 2, 3, 4, 5],
                    'b': [1, 3, 6, 8, 10] })

ax = df.plot()

ax.set_title('Hey!!', size=40, backgroundcolor='grey')

ax.set_ylabel('Getting warmer...', rotation=45, color='red')

# size=10|smaller|x-large   # font size
# position=(x,y)
# rotation=degrees|vertical|horizontal
# backgroundcolor=name|hex
# color=name|hex
# family=serif|sans-serif|cursive|fantasy|monospace
# style=normal|italic|oblique
# weight=normal|bold|heavy|light|ultrabold|ultralight
# ha=center|left|right
# va=top|bottom|center|baseline|center_baseline
# visible=True|False
# linespacing=float
# fontname='Courier'
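
Since the same keywords work across text methods, one convenient pattern (a sketch, not the only approach) is to keep shared styling in a dict and unpack it into each call.  The dict name <B>label_style</B> is illustrative.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 3, 6, 8, 10])

# Keep shared styling in one dict and unpack it into each text method.
label_style = {'color': 'red', 'style': 'italic', 'size': 12}

title = ax.set_title('Hey!!', size=40, weight='bold')   # returns a Text object
ax.set_xlabel('time', **label_style)
ax.set_ylabel('level', **label_style)

print(title.get_size())               # 40.0
print(title.get_weight())             # bold
print(ax.xaxis.label.get_color())     # red
print(ax.yaxis.label.get_style())     # italic
```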

More information can be found here:<br><br>
<A HREF="https://matplotlib.org/3.1.1/tutorials/text/text_props.html">text properties tutorial</A><br><br>
<A HREF="https://matplotlib.org/3.1.1/api/text_api.html#matplotlib.text.Text">matplotlib Text() object</A>

##  Saving the figure, setting the figure size

###  <B>savefig()</B> on the <B>plt</B> module or <B>Figure</B> objects saves the figure to an image file.  

The <B>dpi=</B> argument sets the resolution in 'dots per inch'; the pixel dimensions of the saved image are the figure size in inches multiplied by the dpi.

The extension of the filename determines what type of image is saved -- see options below.


In [None]:
# when plotting through plt or DataFrame
plt.savefig('mychart.png', dpi=400)

# when plotting through a Figure object (or a Figure retrieved from a DataFrame plot):
fig.savefig('thischart.jpg', dpi=400);
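
To set the figure size itself, we can pass <B>figsize=</B> (width and height in inches) when creating the figure.  The sketch below saves to an in-memory buffer rather than a file, then reads the pixel dimensions from the PNG header to confirm that they equal inches times dpi.

```python
import io
import struct
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))        # width, height in inches
ax = fig.add_subplot()
ax.plot([1, 5, 3, 7, 5, 9])

buffer = io.BytesIO()                   # in-memory file, so nothing is written to disk
fig.savefig(buffer, format='png', dpi=200)

# A PNG stores its pixel width and height as two big-endian integers at bytes 16-23.
width, height = struct.unpack('>II', buffer.getvalue()[16:24])
print(width, height)                    # 1200 800  (6 in x 200 dpi, 4 in x 200 dpi)
```

An existing figure can be resized with <B>fig.set_size_inches(width, height)</B> before saving.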

In [None]:
import json
print(json.dumps(fig.canvas.get_supported_filetypes(), indent=4))

In [None]:
help(plt.savefig)

### bar chart

In [None]:
df = pd.Series({ 'Toronto': 5.4,
                 'Montreal': 3.5,
                 'Vancouver': 2.2,
                 'Calgary': 1.2    },
                name='population')
ax = df.plot(kind='bar', rot=0);
ax.set_ylabel('pop. (MM)');

### clustered bar

In [None]:
a = np.array([[4, 8, 5, 7, 6],
              [2, 3, 4, 2, 6],
              [4, 7, 4, 7, 8],
              [2, 6, 4, 8, 6],
              [2, 4, 3, 3, 2]])
df = pd.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'], index=[2, 4, 6, 8, 10])
print(df)
ax = df.plot.bar();

###  horizontal bar ('barh') chart

In [None]:
df = pd.DataFrame({'lab':['A', 'B', 'C'], 'val':[10, 30, 20]})
ax = df.plot.barh(x='lab', y='val')

#### plot a DataFrame to horizontal bar

In [None]:
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
index = ['snail', 'pig', 'elephant',
         'rabbit', 'giraffe', 'coyote', 'horse']
df = pd.DataFrame({'speed': speed,
                   'lifespan': lifespan}, index=index)
ax = df.plot.barh()

###  box plot:  distribution of values in quartiles.  

In [None]:
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10,4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])
ax = df.boxplot(column=['Col1', 'Col2', 'Col3'])
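
To see what the box represents, the sketch below computes the quartiles directly for a small made-up Series.  In a box plot the box spans the first to the third quartile with a line at the median; by default the whiskers reach the furthest data points within 1.5 times the interquartile range of the box edges.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                  # interquartile range: the height of the box

print(q1, median, q3)          # 3.0 5.0 7.0
print(iqr)                     # 4.0
```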

###  histogram

When we roll a die 6000 times, we expect to get each value around 1000 times. But when we roll two dice and sum the results, the distribution is going to be quite different. A histogram illustrates those distributions.

In [None]:
df = pd.DataFrame(np.random.randint(1, 7, 6000),
                  columns=['one'])
df['two'] = df['one'] + np.random.randint(1, 7, 6000)
ax = df.plot.hist(bins=12, alpha=0.5)
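
The shape of the two-dice histogram can be worked out exactly by counting the 36 equally likely outcomes.  This sketch enumerates them and prints, for each sum, the number of ways to roll it and the count we would expect in 6000 rolls; the random sample above should land near these values.

```python
from collections import Counter
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    expected = 6000 * sums[total] / 36          # expected count in 6000 rolls
    print(total, sums[total], round(expected))
```

A sum of 7 can be rolled six ways and a sum of 2 only one way, which is why the histogram peaks in the middle.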

###  pie plot

In [None]:
df = pd.DataFrame({'mass': [0.330, 4.87 , 5.97],
                   'radius': [2439.7, 6051.8, 6378.1]},
                   index=['Mercury', 'Venus', 'Earth'])
ax = df.plot.pie(y='mass', figsize=(5, 5))

###  scatter plot

In [None]:
df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
ax = df.plot.scatter(x='length',
                     y='width',
                     c='DarkBlue')
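
The <B>species</B> column is not used in the plot above.  As a sketch of one way to use it, <B>c=</B> can name a column, and <B>colormap=</B> then maps that column's values to colors (pandas also adds a colorbar).

```python
import pandas as pd

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])

ax = df.plot.scatter(x='length',
                     y='width',
                     c='species',          # color each point by this column's value
                     colormap='viridis')   # map the values to colors
```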

## documentation for further study