# Lecture 5 Extra

## matplotlib
- why matplotlib
- the functional approach
- the object-oriented approach

#### Dataset used
- stocks-sp500
---

### Why matplotlib?

`matplotlib` is Python's most widely used charting library. It is a massive library that offers so much it can easily become overwhelming. Creating a basic chart is fairly simple, but even a little customization can require a deep dive into the API.

One reason we cover matplotlib here is that many other libraries build on it (pandas' plotting methods use matplotlib by default), so plotting charts directly from pandas DataFrames is easier once we understand matplotlib's basic mechanics. There are other popular charting packages, such as `seaborn` (itself built on matplotlib) or `Plotly`, but we think that a real Pythonista should be able to work with matplotlib objects.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

A good summary of the hows and whys of matplotlib can be found here: [https://heartbeat.comet.ml/introduction-to-matplotlib-data-visualization-in-python-d9143287ae39](https://heartbeat.comet.ml/introduction-to-matplotlib-data-visualization-in-python-d9143287ae39).

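There are two ways of creating a matplotlib plot: the functional approach, which calls functions from `matplotlib.pyplot` directly, and the object-oriented approach, which calls methods on `Figure` and `Axes` objects.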

### The Functional Approach

In [None]:
x = range(0, 10)
y = [i ** 2 for i in x]

plt.plot(x,y)
plt.title('x-squared')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
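Under the hood, the functional (pyplot) approach is state-based: `plt` keeps track of a "current" figure and a "current" axes, and calls such as `plt.plot` or `plt.title` act on them. A minimal sketch that makes this hidden state visible with `plt.gcf()` and `plt.gca()` (get current figure / get current axes):

```python
import matplotlib.pyplot as plt

x = range(0, 10)
y = [i ** 2 for i in x]

plt.figure()                 # creates a new figure and makes it current
plt.plot(x, y)               # pyplot creates an Axes on demand and draws on it
plt.title('x-squared')       # ...and this call modifies that same Axes

fig = plt.gcf()              # the current Figure
ax = plt.gca()               # the current Axes
print(ax.get_title())        # x-squared
print(len(fig.axes))         # 1
```

Every functional call is really a method call on these two objects, which is exactly what the object-oriented approach makes explicit.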

In [None]:
plt.subplot(1, 2, 1)  # nrows, ncols, index: panels are numbered from 1, starting top left and increasing to the right
plt.plot(x, y, 'r--')  # 'r' = red, '--' = dashed line
plt.title('x-squared')
plt.subplot(1, 2, 2)
plt.plot(y, x, 'g*-')  # 'g' = green, '*' = star markers, '-' = solid line
plt.title('square root x');  # the trailing semicolon suppresses the printed return value (a Text object); the inline backend still displays the figure at the end of the cell
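Format strings such as `'r--'` and `'g*-'` are shorthand for a color, a marker and a line style. A minimal sketch showing that the shorthand and the explicit keyword arguments produce the same line (the variable names `short` and `explicit` are just for illustration):

```python
import matplotlib.pyplot as plt
from matplotlib.colors import same_color

x = range(0, 10)
y = [i ** 2 for i in x]

fig, ax = plt.subplots()
# shorthand: green, star markers, solid line
short, = ax.plot(x, y, 'g*-')
# the same style spelled out with keyword arguments
explicit, = ax.plot(x, y, color='g', marker='*', linestyle='-')

print(short.get_marker(), short.get_linestyle())             # * -
print(same_color(short.get_color(), explicit.get_color()))   # True
```

The keyword form is longer but easier to read, and it is the only option for properties the shorthand cannot express (e.g. `linewidth`).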

Matplotlib color options can be found here: [https://matplotlib.org/stable/gallery/color/named_colors.html](https://matplotlib.org/stable/gallery/color/named_colors.html)
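Color names such as `'royalblue'` and single-letter codes such as `'r'` or `'k'` are converted to RGB values internally. The functions `to_hex` and `to_rgb` in `matplotlib.colors` let you check what a name maps to:

```python
from matplotlib import colors as mcolors

print(mcolors.to_hex('royalblue'))   # #4169e1
print(mcolors.to_rgb('r'))           # (1.0, 0.0, 0.0)
print(mcolors.to_rgb('k'))           # (0.0, 0.0, 0.0)
```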

### The Object-oriented Approach

A plot has two key [components](https://files.realpython.com/media/fig_map.bc8c7cabd823.png): the `Figure` and the `Axes`.

The `Figure` is the top-level container that acts as the window or page on which everything is drawn. It can contain multiple independent `Axes`, a suptitle (a centered title for the whole figure, set with `fig.suptitle`), a legend, a color bar, etc.

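The `Axes` is the area on which we plot our data, together with any labels and ticks associated with it. Each `Axes` has an x-axis and a y-axis, accessible as `ax.xaxis` and `ax.yaxis` (note the difference between an *Axes*, the plotting area, and an *Axis*, a single axis of it).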
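A minimal sketch of this hierarchy: a figure keeps a list of its Axes, every Axes knows the figure it belongs to, and each Axes carries its own `xaxis` and `yaxis` objects. The titles are placeholders:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 4))
ax_left = fig.add_subplot(1, 2, 1)     # 1 row, 2 columns, first panel
ax_right = fig.add_subplot(1, 2, 2)    # second panel
ax_left.set_title('left panel')
ax_right.set_title('right panel')
sup = fig.suptitle('A figure-level title')   # centered above all Axes

print(len(fig.axes))               # 2
print(ax_left.figure is fig)       # True
print(sup.get_text())              # A figure-level title
print(type(ax_left.xaxis).__name__, type(ax_left.yaxis).__name__)  # XAxis YAxis
```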

In [None]:
x = range(0, 10)
y = [i ** 2 for i in x]

fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # left, bottom, width, height (range 0 to 1)

axes.plot(x, y, 'r')

axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('Simple x-squared');
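The list passed to `add_axes` is `[left, bottom, width, height]`, measured as fractions of the figure's width and height. A small sketch that reads the rectangle back with `get_position()` and converts it into inches; the 10×6 inch figure size is just an example:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 6))                 # width, height in inches
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])         # left, bottom, width, height

left, bottom, width, height = axes.get_position().bounds
fig_w, fig_h = fig.get_size_inches()

# round away tiny floating-point noise before printing
print(tuple(round(float(v), 3) for v in (left, bottom, width, height)))  # (0.1, 0.1, 0.8, 0.8)
print(round(float(width * fig_w), 2), round(float(height * fig_h), 2))   # 8.0 4.8
```

Because the rectangle is relative, the same `[0.1, 0.1, 0.8, 0.8]` keeps a 10% margin on the left and bottom whatever `figsize` is.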

We can also draw a plot within a plot (an inset).

In [None]:
fig = plt.figure()

axes1 = fig.add_axes([0, 0, 0.8, 0.8]) # main axes
axes2 = fig.add_axes([0.1, 0.4, 0.4, 0.3]) # inset axes: left and bottom of the lower-left corner, width, height

# main figure
axes1.plot(x, y, 'r')
axes1.set_xlabel('x')
axes1.set_ylabel('y')
axes1.set_title('x-squared')

# inset
axes2.plot(y, x, 'g')
axes2.set_xlabel('y')
axes2.set_ylabel('x')
axes2.set_title('square root x');
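As an alternative to placing a second Axes with `fig.add_axes`, matplotlib also offers `Axes.inset_axes`, which positions the inset relative to the parent Axes (fractions of the parent's width and height) rather than the figure. A minimal sketch reusing the x-squared data; the inset rectangle values are arbitrary:

```python
import matplotlib.pyplot as plt

x = range(0, 10)
y = [i ** 2 for i in x]

fig, ax = plt.subplots()
ax.plot(x, y, 'r')
ax.set_title('x-squared')

# [left, bottom, width, height] in fractions of the parent Axes, not the figure
inset = ax.inset_axes([0.1, 0.5, 0.35, 0.35])
inset.plot(y, x, 'g')
inset.set_title('square root x', fontsize=8)
```

Since the inset is tied to its parent, it stays in the same relative spot if the main Axes is moved or resized.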

Let's do some stock market price discovery!

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('https://osf.io/4pgrf/download')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df['ref.date'] = pd.to_datetime(df['ref.date'])

In [None]:
df.ticker.unique()

In [None]:
# 2 ways to convert a Pandas column to list
date = list(df[df.ticker == 'MSFT']['ref.date'])
price = df[df.ticker == 'MSFT']['price.close'].tolist()

In [None]:
fig = plt.figure(figsize=(10,6))
ax = fig.add_axes([0,0,1,1])
ax.set_title('MSFT Daily Closing Prices')
ax.plot(date, price)
plt.show();
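As mentioned at the start, pandas plots are matplotlib plots: `DataFrame.plot` draws onto a matplotlib Axes and returns it, so all the object-oriented calls used here still apply. A minimal sketch with a small made-up price table (the dates and values are hypothetical, not taken from the stocks-sp500 dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data standing in for one ticker's rows
prices = pd.DataFrame({
    'ref.date': pd.date_range('2020-01-01', periods=5, freq='D'),
    'price.close': [160.6, 158.6, 159.0, 157.6, 160.1],
})

fig, ax = plt.subplots(figsize=(10, 6))
# pandas draws on the Axes we pass in and hands it back
returned = prices.plot(x='ref.date', y='price.close', ax=ax, label='MSFT')
returned.set_title('Daily Closing Prices (pandas + matplotlib)')

print(returned is ax)   # True
```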

Adding more chart elements:
- y-axis limits
- legends

In [None]:
fig = plt.figure(figsize=(10, 6))
ax = fig.add_axes([0, 0, 1, 1])
ax.set_title('MSFT Daily Closing Prices')
ax.plot(date, price, label='MSFT')
ax.legend(loc='upper left')  # legend entries come from the label= arguments
ax.set_ylim(0, 180)          # fix the y-axis range (bottom, top)
plt.show()

- average line

In [None]:
date = df[df.ticker == 'MSFT']['ref.date'].tolist()
price = df[df.ticker == 'MSFT']['price.close'].tolist()
meanprice = np.mean(price)

In [None]:
fig = plt.figure(figsize=(10, 6))
ax = fig.add_axes([0, 0, 1, 1])
ax.set_title('MSFT Daily Closing Prices')
ax.plot(date, price, label='MSFT')
# horizontal line at the mean price, from the first to the last date
ax.hlines(y=meanprice, xmin=date[0], xmax=date[-1], linestyle='--', label='avg')
ax.legend(loc='upper left')
plt.show()
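`hlines` needs explicit `xmin`/`xmax` values in data units (here the first and last dates). If the line should simply span the whole plotting area, `Axes.axhline` is a simpler alternative: its `xmin`/`xmax` are fractions of the Axes width and default to the full width. A sketch with hypothetical prices:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical closing prices, for illustration only
date = pd.date_range('2020-01-01', periods=5, freq='D')
price = [160.6, 158.6, 159.0, 157.6, 160.1]
meanprice = np.mean(price)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(date, price, label='MSFT')
# spans the full width of the Axes, independent of the x data
avg_line = ax.axhline(y=meanprice, linestyle='--', color='gray', label='avg')
ax.legend(loc='upper left')

print(avg_line.get_ydata())   # the mean, repeated at both ends of the line
```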

- log scale 

In [None]:
fig = plt.figure(figsize=(10, 6))
ax = fig.add_axes([0, 0, 1, 1])
ax.set_title('MSFT Daily Closing Prices')
ax.plot(date, price, label='MSFT')
ax.hlines(y=meanprice, xmin=date[0], xmax=date[-1], linestyle='--', label='avg')
ax.set_yscale('log')  # logarithmic y-axis
ax.legend(loc='upper left')
plt.show()
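Why a log scale for prices? On a logarithmic axis equal vertical distances correspond to equal *ratios*, so a move from 10 to 20 takes up the same space as a move from 100 to 200 (both are a doubling). A quick check of that arithmetic, plus how to confirm which scale an Axes uses:

```python
import math
import matplotlib.pyplot as plt

# distance on a log axis is proportional to the difference of logarithms
small_move = math.log10(20) - math.log10(10)
large_move = math.log10(200) - math.log10(100)
print(math.isclose(small_move, large_move))   # True: both equal log10(2)

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 100, 1000])
ax.set_yscale('log')
print(ax.get_yscale())                        # log
```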

- plotting two time series on different scales → use a secondary axis

In [None]:
date = df[df.ticker == 'MSFT']['ref.date'].tolist()
price_msft = df[df.ticker == 'MSFT']['price.close'].tolist()
price_aapl = df[df.ticker == 'AAPL']['price.close'].tolist()


fig, ax1 = plt.subplots(figsize = (10,6))

ax1.plot(date, price_msft, color = 'k')
ax1.xaxis_date()
ax1.set_ylabel("MSFT", color = 'k')
ax2 = ax1.twinx()
ax2.plot(date, price_aapl, color = "royalblue")
ax2.set_ylabel("AAPL", color = "royalblue")
ax1.set_title('Microsoft and Apple, past twenty years');
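With `twinx` each Axes keeps its own lines, so calling `ax1.legend()` alone would only list the MSFT line. A minimal sketch (with hypothetical prices on very different scales) that colors each y-axis's tick labels to match its line and gathers both lines into a single legend:

```python
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data, for illustration only
date = pd.date_range('2020-01-01', periods=5, freq='D')
price_msft = [10, 12, 11, 13, 14]
price_aapl = [1000, 1100, 1050, 1200, 1250]

fig, ax1 = plt.subplots(figsize=(10, 6))
line1, = ax1.plot(date, price_msft, color='k', label='MSFT')
ax1.set_ylabel('MSFT', color='k')
ax1.tick_params(axis='y', labelcolor='k')

ax2 = ax1.twinx()                      # second y-axis sharing the same x-axis
line2, = ax2.plot(date, price_aapl, color='royalblue', label='AAPL')
ax2.set_ylabel('AAPL', color='royalblue')
ax2.tick_params(axis='y', labelcolor='royalblue')

# one legend for lines that live on two different Axes
lines = [line1, line2]
ax1.legend(lines, [line.get_label() for line in lines], loc='upper left')
ax1.set_title('Microsoft and Apple (hypothetical data)')

print([t.get_text() for t in ax1.get_legend().get_texts()])   # ['MSFT', 'AAPL']
```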

- histogram of daily price changes

In [None]:
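df_msft = df[df.ticker == 'MSFT'].copy()  # an explicit copy avoids pandas' SettingWithCopyWarning
df_msft['pct_chg'] = df_msft['price.close'].pct_change(periods=1)  # relative change vs. the previous day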

In [None]:
df_msft.head()
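`pct_change` computes (p_t − p_{t−1}) / p_{t−1}, the relative change from the previous row. The first row has no predecessor, which is why the first `pct_chg` value of `df_msft` is `NaN`. A tiny worked example with made-up prices:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])     # hypothetical closing prices
changes = prices.pct_change(periods=1)

# by hand: 110 / 100 - 1 = 0.10 and 99 / 110 - 1 = -0.10
print(changes.isna().iloc[0])                          # True: no previous price
print(np.allclose(changes.iloc[1:], [0.10, -0.10]))    # True
```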

In [None]:
fig = plt.figure(figsize=(10, 6))
ax = fig.add_axes([0, 0, 1, 1])
ax.set_title('MSFT Daily Price Changes')
ax.hist(df_msft.pct_chg, bins=50)  # 50 equal-width bins between the smallest and largest change
plt.show()
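`ax.hist` also returns what it computed: the count in each bin, the bin edges, and the drawn bar patches. With `bins=50` there are 50 counts and 51 edges. A sketch on synthetic, normally distributed "daily changes" (not the MSFT data):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.02, size=1000)   # synthetic daily changes

fig, ax = plt.subplots(figsize=(10, 6))
counts, edges, patches = ax.hist(returns, bins=50)

print(len(counts), len(edges))    # 50 51
print(int(counts.sum()))          # 1000: every value falls into some bin
```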

- spacing between the bars + horizontal grids

In [None]:
fig = plt.figure(figsize=(10, 6))
ax = fig.add_axes([0, 0, 1, 1])
ax.set_title('MSFT Daily Price Changes')
ax.hist(df_msft.pct_chg, bins=50, rwidth=0.9)  # each bar takes 90% of its bin's width
ax.grid(axis='y', linestyle='--', linewidth=1)  # horizontal grid lines only
plt.show()
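`rwidth` sets each bar's width as a fraction of its bin width, so `rwidth=0.9` leaves a 10% gap. With matplotlib's default settings the grid is drawn above patches such as histogram bars; `ax.set_axisbelow(True)` moves it behind them. A sketch on synthetic data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.02, size=1000)   # synthetic daily changes

fig, ax = plt.subplots(figsize=(10, 6))
counts, edges, patches = ax.hist(returns, bins=50, rwidth=0.9)
ax.grid(axis='y', linestyle='--', linewidth=1)
ax.set_axisbelow(True)          # draw grid lines behind the bars

bin_width = edges[1] - edges[0]
bar_width = patches[0].get_width()
print(np.isclose(bar_width, 0.9 * bin_width))   # True
```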

- chart within a chart

In [None]:
fig = plt.figure(figsize = (10,6))

axes1 = fig.add_axes([0, 0, 1, 1]) # main axes
axes2 = fig.add_axes([0.25, 0.55, 0.45, 0.4]) # inset axes

# main figure
axes1.plot(date, price_msft)
axes1.set_xlabel('date')
axes1.set_ylabel('MSFT (log scale)')
axes1.set_yscale('log')
axes1.set_title('Microsoft Stock Price')

# inset
axes2.plot(date, price_aapl, color = 'black')
axes2.set_xlabel('date', fontsize = 8)
axes2.set_ylabel('AAPL (log)', fontsize = 8)
axes2.set_yscale('log')
axes2.grid(axis = 'y', linestyle='--', linewidth=1)
axes2.set_title('Apple Stock Price', fontsize = 8);

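Note that both log-scaled price axes label their ticks in scientific notation (powers of ten), which is matplotlib's default for log axes. Converting the labels into an easy-to-read format is a bit tricky, but not impossible: we replace the axis's tick formatter. In the version below the MSFT axis is also switched back to a linear scale.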

In [None]:
import matplotlib as mpl

In [None]:
fig = plt.figure(figsize = (10,6))

axes1 = fig.add_axes([0, 0, 1, 1]) # main axes
axes2 = fig.add_axes([0.25, 0.55, 0.45, 0.4]) # inset axes

# main figure
axes1.plot(date, price_msft)
axes1.set_xlabel('date')
axes1.set_ylabel('MSFT')
axes1.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}')) # plain numbers with thousands separators and no decimals
axes1.set_title('Microsoft Stock Price')

# inset
axes2.plot(date, price_aapl, color = 'black')
axes2.set_xlabel('date', fontsize = 8)
axes2.set_ylabel('AAPL (log)', fontsize = 8)
axes2.set_yscale('log')
axes2.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
axes2.grid(axis = 'y', linestyle='--', linewidth=1)
axes2.set_title('Apple Stock Price', fontsize = 8);
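`StrMethodFormatter` formats each tick value with a `str.format` template in which the value is named `x`; `'{x:,.0f}'` means thousands separators and no decimals. For anything a template cannot express, `FuncFormatter` accepts an ordinary function of `(x, pos)`. A minimal sketch that calls both formatters directly on a sample value and then applies one to a log axis (the dollar-sign format is just an example):

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

plain = StrMethodFormatter('{x:,.0f}')
dollars = FuncFormatter(lambda x, pos: f'${x:,.0f}')   # pos = tick index, unused here

print(plain(1234567.0))      # 1,234,567
print(dollars(1234567.0))    # $1,234,567

fig, ax = plt.subplots()
ax.plot([1, 10, 100, 1000], [1, 10, 100, 1000])
ax.set_yscale('log')
ax.yaxis.set_major_formatter(dollars)   # replaces the default power-of-ten labels
```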