# Week 3: Plotting and Visualization

In [7]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [4]:
%matplotlib notebook

### Figures and Subplots

Plots in matplotlib reside within a Figure object. You can create a new figure with plt.figure:


In [None]:
fig = plt.figure()

To the figure __fig__ that we just created we are going to add a __subplot__.

In [None]:
ax1 = fig.add_subplot(2, 2, 1)

This means that the figure should be 2 × 2 (so up to four plots in
total), and we’re selecting the first of four subplots (numbered from
1). Next, create two more subplots in the same grid:
 

In [None]:
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

When you issue a plotting command like plt.plot([1.5, 3.5, -2, 1.6]), matplotlib
draws on the last figure and subplot used (creating one if necessary),
thus hiding the figure and subplot creation.

In [None]:
plt.plot([1.5,3.5,-2,1.6])

We can also plot a fancier line:

In [None]:
plt.plot(np.random.randn(50).cumsum(), 'k--')

The 'k--' is a style option instructing matplotlib to plot a black
dashed line. The objects returned by fig.add_subplot here are AxesSubplot objects;
you can plot directly on the other empty subplots by calling each one’s
instance methods:


In [None]:
_ = ax1.hist(np.random.randn(100), bins=20, color='r', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
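
The steps so far can be collected into one self-contained sketch. The seeded generator `rng` and the `Agg` backend are assumptions added here so the code gives repeatable output and also runs outside a notebook; they are not part of the cells above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)  # top-left cell of a 2 x 2 grid
ax2 = fig.add_subplot(2, 2, 2)  # top-right cell
ax3 = fig.add_subplot(2, 2, 3)  # bottom-left cell

# Each Axes object carries its own plotting methods.
ax1.hist(rng.standard_normal(100), bins=20, color="r", alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * rng.standard_normal(30))
ax3.plot(rng.standard_normal(50).cumsum(), "k--")

n_axes = len(fig.axes)  # number of subplots actually created
```

Only three of the four grid positions are filled; the fourth stays empty until another add_subplot call claims it.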

In [None]:
plt.close('all')

In [None]:
fig, axes_1 = plt.subplots(2, 3)
axes_1

In [None]:
axes_1[0, 1].plot([0.2,0.4,0.5,0.7])
axes_1[0,2].hist(np.random.randn(100), bins=20, color='r', alpha=0.3)
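
As a rough illustration of what plt.subplots returns, the sketch below inspects the axes array and loops over it. The panel titles are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3)
axes_shape = axes.shape  # a 2-D NumPy array of Axes objects

# Index it like any 2-D array: [row, column].
axes[0, 1].plot([0.2, 0.4, 0.5, 0.7])

# .flat iterates over every subplot regardless of the grid shape.
for i, ax in enumerate(axes.flat):
    ax.set_title("panel %d" % i)  # hypothetical titles, for illustration only

n_panels = len(fig.axes)
```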

#### Adjusting the spacing around subplots

In [None]:
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)
plt.subplots_adjust(wspace=0.1, hspace=0.1)
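
A small sketch of how the spacing parameters behave, assuming you want to read them back afterwards: wspace and hspace are fractions of the average subplot width and height, and the current values are stored on fig.subplotpars. Setting both to 0 (a choice made here for illustration) removes the gaps entirely.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax in axes.flat:
    ax.hist(rng.standard_normal(500), bins=50, color="k", alpha=0.5)

# Shrink the space between subplots all the way to zero.
fig.subplots_adjust(wspace=0, hspace=0)
spacing = (fig.subplotpars.wspace, fig.subplotpars.hspace)

# With sharex/sharey, every panel reports the same axis limits.
shared_limits = axes[0, 0].get_xlim() == axes[1, 1].get_xlim()
```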

### Colors, Markers, and Line Styles

Matplotlib’s main plot function
accepts arrays of x and y coordinates and optionally a
string abbreviation indicating color and line style. For example, to
plot x versus y with green dashes, you would execute: 


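``` python
ax.plot(x, y, 'g--')
```

This way of specifying both color and line style in a string is provided as a convenience. The same plot could also have been expressed more explicitly as:

``` python
ax.plot(x, y, linestyle='--', color='g')
```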

You can see the full set of line styles by looking at the docstring for plot (use plt.plot? in IPython or Jupyter).
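
To check that the string abbreviation and the keyword form describe the same style, a minimal sketch can compare the resulting Line2D objects. The x and y arrays here are placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)  # placeholder data
y = x ** 2

fig, ax = plt.subplots()
(short,) = ax.plot(x, y, "g--")                             # format-string form
(explicit,) = ax.plot(x, y + 5, linestyle="--", color="g")  # keyword form

same_style = (short.get_linestyle() == explicit.get_linestyle()
              and short.get_color() == explicit.get_color())
```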
 

In [None]:
plt.figure()

In [None]:
from numpy.random import randn
plt.plot(randn(30).cumsum(), 'ko--')

In [None]:
plt.plot(randn(30).cumsum(), color='k', linestyle='dashed', marker='o')

For line plots, you will notice that subsequent points are
linearly interpolated by default. This can be altered with the drawstyle option:



In [None]:
plt.close('all')

In [None]:
data = np.random.randn(30).cumsum()
plt.plot(data, 'k--', label='Default')
plt.plot(data, 'k-', drawstyle='steps-post', label='steps-post')
plt.legend(loc='best')

Notice that we also created a legend and added it to the plot; plt.legend builds it from the label arguments passed to plot.
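
The same comparison can be written against an explicit Axes object, which makes it easy to inspect what was drawn. This is only a sketch; the seeded generator is an assumption for reproducibility.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility
data = rng.standard_normal(30).cumsum()

fig, ax = plt.subplots()
(default_line,) = ax.plot(data, "k--", label="Default")
(step_line,) = ax.plot(data, "k-", drawstyle="steps-post", label="steps-post")
legend = ax.legend(loc="best")

# The legend entries come from the label arguments, in plotting order.
legend_labels = [t.get_text() for t in legend.get_texts()]
```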

### Ticks, Labels, and Legends

For most kinds of plot decorations, there are two main ways to do things: using
the procedural pyplot interface (i.e., matplotlib.pyplot) and the more
object-oriented native matplotlib API.

The pyplot interface, designed for interactive use, consists of functions like xlim and xticks. These control the plot range and the tick locations and labels, respectively. They can be used in two ways:
- Called with no arguments, the function returns the current parameter value
    (e.g., plt.xlim() returns the current x-axis plotting range)
- Called with parameters, the function sets the parameter value (e.g., plt.xlim([0, 10]) sets the x-axis range to 0 to 10)

All such functions act on the active or most recently created AxesSubplot.
Each of them corresponds to two methods on the subplot object itself; in the case of
xlim these are ax.get_xlim and ax.set_xlim. I prefer to use the subplot
instance methods myself in the interest of being explicit (and
especially when working with multiple subplots).
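
A short sketch of the getter/setter pairing described above, using placeholder data. It shows plt.xlim and the ax.get_xlim / ax.set_xlim methods reading and writing the same underlying state.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(100))  # placeholder data

before = plt.xlim()         # no arguments: returns the current range
plt.xlim([0, 10])           # with arguments: sets the range on the active subplot
after_pyplot = ax.get_xlim()

ax.set_xlim(20, 30)         # the explicit, object-oriented equivalent
after_method = plt.xlim()
```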


#### Setting the title, axis labels, ticks, and ticklabels

In [None]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())

In [None]:
ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                            rotation=30, fontsize='small')

In [None]:
ax.set_title('My first matplotlib plot')
ax.set_xlabel('Stages')

# Equivalent shorthand: ax.set applies several properties in one call
props = {
    'title': 'My first matplotlib plot',
    'xlabel': 'Stages'
}
ax.set(**props)
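
Modifying the y-axis follows the same process, substituting y for x. The sketch below assumes made-up tick positions and labels purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility

fig, ax = plt.subplots()
ax.plot(rng.standard_normal(1000).cumsum())

# Hypothetical tick positions and labels for the y-axis.
ax.set_yticks([-40, -20, 0, 20, 40])
ax.set_yticklabels(["very low", "low", "zero", "high", "very high"],
                   fontsize="small")

# ax.set accepts the same properties as keyword arguments.
ax.set(ylabel="Level", title="Custom y-axis ticks")

y_ticks = list(ax.get_yticks())
```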

#### Adding legends

In [None]:
from numpy.random import randn
fig = plt.figure(); ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(1000).cumsum(), 'k', label='one')
ax.plot(randn(1000).cumsum(), 'k--', label='two')
ax.plot(randn(1000).cumsum(), 'k.', label='three')

In [None]:
ax.legend(loc='best')
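
For a version of the legend example that can be checked programmatically, the sketch below keeps the returned Legend object and reads its entries back. The seeded generator is an assumption for reproducibility.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility

fig, ax = plt.subplots()
ax.plot(rng.standard_normal(1000).cumsum(), "k", label="one")
ax.plot(rng.standard_normal(1000).cumsum(), "k--", label="two")
ax.plot(rng.standard_normal(1000).cumsum(), "k.", label="three")

# legend() collects every labeled artist on the subplot.
legend = ax.legend(loc="best")
entries = [t.get_text() for t in legend.get_texts()]
```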

### Annotations and Drawing on a Subplot

In addition to the standard plot types, you may wish to draw your own plot
annotations, which could consist of text, arrows, or other
shapes. You can add annotations and text using the text,
arrow, and annotate functions. text draws text at given coordinates (x, y) on the plot with optional custom
styling:

 

``` python
ax.text(x, y, 'Hello world!',
        family='monospace', fontsize=10)
```
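
The snippet above leaves ax, x, and y undefined. A minimal runnable sketch, with a placeholder line and an arbitrary position chosen for illustration, might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

# Coordinates are in data units by default.
txt = ax.text(1, 1, "Hello world!", family="monospace", fontsize=10)

position = txt.get_position()
n_texts = len(ax.texts)  # text objects attached to this subplot
```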

Annotations can draw both text and arrows arranged appropriately.
As an example, let’s plot the closing S&P 500 index price since 2007
(obtained from Yahoo! Finance) and annotate it with some of the
important dates from the 2008–2009 financial crisis. You can most easily
reproduce this code example in a single cell in a Jupyter notebook. 



In [None]:
from datetime import datetime

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

data = pd.read_csv('examples/spx.csv', index_col=0, parse_dates=True)
spx = data['SPX']

spx.plot(ax=ax, style='k-')

crisis_data = [
    (datetime(2007, 10, 11), 'Peak of bull market'),
    (datetime(2008, 3, 12), 'Bear Stearns Fails'),
    (datetime(2008, 9, 15), 'Lehman Bankruptcy')
]

for date, label in crisis_data:
    ax.annotate(label, xy=(date, spx.asof(date) + 75),
                xytext=(date, spx.asof(date) + 225),
                arrowprops=dict(facecolor='red', headwidth=4, width=2,
                                headlength=4),
                horizontalalignment='left', verticalalignment='top')

# Zoom in on 2007-2010
ax.set_xlim(['1/1/2007', '1/1/2011'])
ax.set_ylim([600, 1800])

ax.set_title('Important dates in the 2008-2009 financial crisis')

There are a couple of important points to highlight in this plot:
the ax.annotate method can draw labels at the
indicated x and y coordinates. We use the set_xlim and
set_ylim methods to manually set the start and end
boundaries for the plot rather than using matplotlib’s default. Lastly,
ax.set_title adds a main title to the plot. See the online matplotlib gallery for many more annotation
examples to learn from.

Drawing shapes requires some more care. matplotlib has objects
that represent many common shapes, referred to as patches. Some of these, like
Rectangle and Circle, are found in matplotlib.pyplot, but the full set is located in
matplotlib.patches. To add a shape to a plot, create the patch object and pass it to
the subplot’s ax.add_patch method:


``` python
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

rect = plt.Rectangle((0.2, 0.75), 0.4, 0.15, color='k', alpha=0.3)
circ = plt.Circle((0.7, 0.2), 0.15, color='b', alpha=0.3)
pgon = plt.Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]],
                   color='g', alpha=0.5)

ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(pgon)

```

In [None]:
fig = plt.figure(figsize=(12, 6))
ax = fig.add_subplot(1, 1, 1)
rect = plt.Rectangle((0.2, 0.75), 0.4, 0.15, color='k', alpha=0.3)
circ = plt.Circle((0.7, 0.2), 0.15, color='b', alpha=0.3)
pgon = plt.Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]], color='g', alpha=0.5)


In [None]:
ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(pgon)
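
The S&P 500 annotation example earlier in this section depends on examples/spx.csv. As a stand-in when that file is not at hand, the sketch below applies the same ax.annotate pattern to a synthetic, linearly declining series. The series values and the event labels are invented for illustration and carry no real market meaning.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime

# Synthetic daily series standing in for the index data.
dates = pd.date_range("2008-01-01", periods=200, freq="D")
series = pd.Series(np.linspace(1500, 900, 200), index=dates)

fig, ax = plt.subplots()
ax.plot(series.index, series.values, "k-")

# Hypothetical events to mark on the curve.
events = [
    (datetime(2008, 3, 12), "Event A"),
    (datetime(2008, 6, 2), "Event B"),
]

for date, label in events:
    y = series.asof(date)  # last available value at or before the date
    ax.annotate(label, xy=(date, y + 20),
                xytext=(date, y + 120),
                arrowprops=dict(facecolor="red", headwidth=4, width=2,
                                headlength=4),
                horizontalalignment="left", verticalalignment="top")

n_annotations = len(ax.texts)
```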

### Saving Plots to File

In [None]:
plt.savefig('figpath.svg')

In [None]:
plt.savefig('figpath.png', dpi=400, bbox_inches='tight')
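
The file type is inferred from the file extension. The sketch below writes both formats into a temporary directory (an assumption made so the example cleans up after itself) and confirms that the files were produced.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 0])  # placeholder data

with tempfile.TemporaryDirectory() as tmpdir:
    svg_path = os.path.join(tmpdir, "figpath.svg")
    png_path = os.path.join(tmpdir, "figpath.png")

    fig.savefig(svg_path)                                # vector output
    fig.savefig(png_path, dpi=400, bbox_inches="tight")  # high-resolution raster, trimmed whitespace

    svg_size = os.path.getsize(svg_path)
    png_size = os.path.getsize(png_path)
    with open(png_path, "rb") as fh:
        png_header = fh.read(8)  # PNG files begin with a fixed 8-byte signature
```

Calling savefig on the Figure object, rather than plt.savefig, avoids any ambiguity about which figure is currently active.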

## Plotting with pandas and seaborn

matplotlib can be a fairly low-level tool. You assemble a plot from its base
components: the data display (i.e., the type of plot: line, bar, box,
scatter, contour, etc.), legend, title, tick labels, and other
annotations. In pandas we may have multiple columns of data, along with row and
column labels. pandas itself has built-in methods that simplify creating
visualizations from DataFrame and Series objects. Another library is
seaborn, a statistical graphics library created by Michael Waskom. Seaborn
simplifies creating many common visualization types.
 

### Line Plots

In [5]:
plt.close('all')

In [8]:
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
s.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x117638e80>

In [9]:
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=['A', 'B', 'C', 'D'],
                  index=np.arange(0, 100, 10))
df.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x11819f0b8>
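
Both plot methods accept an ax argument, so Series and DataFrame plots can be placed on subplots you created yourself. A rough sketch, with a seeded generator assumed for reproducibility:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility

df = pd.DataFrame(rng.standard_normal((10, 4)).cumsum(axis=0),
                  columns=["A", "B", "C", "D"],
                  index=np.arange(0, 100, 10))

fig, axes = plt.subplots(1, 2)
df["A"].plot(ax=axes[0])  # a Series draws a single line
df.plot(ax=axes[1])       # a DataFrame draws one line per column, with a legend

legend_labels = [t.get_text() for t in axes[1].get_legend().get_texts()]
```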

### Bar Plots

In [10]:
fig, axes = plt.subplots(2, 1)
data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))
data.plot.bar(ax=axes[0], color='k', alpha=0.7)
data.plot.barh(ax=axes[1], color='k', alpha=0.7)

<matplotlib.axes._subplots.AxesSubplot at 0x118a02b00>

In [11]:
np.random.seed(12348)

In [12]:
df = pd.DataFrame(np.random.rand(6, 4),
                  index=['one', 'two', 'three', 'four', 'five', 'six'],
                  columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df
df.plot.bar()

<matplotlib.axes._subplots.AxesSubplot at 0x118a79390>

In [13]:
plt.figure()

In [14]:
df.plot.barh(stacked=True, alpha=0.5)

<matplotlib.axes._subplots.AxesSubplot at 0x106699c50>

In [15]:
plt.close('all')

Let’s have a look at an example dataset about restaurant tipping, and suppose
we wanted to make a stacked bar plot showing the percentage of data
points for each party size on each day. I load the data using read_csv and make
a cross-tabulation by day and party size: 

 

In [18]:
tips = pd.read_csv('examples/tips.csv')
tips.head()

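| | total_bill | tip | smoker | day | time | size |
|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | No | Sun | Dinner | 4 |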


In [21]:

party_counts = pd.crosstab(tips['day'], tips['size'])
party_counts


| day / size | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Fri | 1 | 16 | 1 | 1 | 0 | 0 |
| Sat | 2 | 53 | 18 | 13 | 1 | 0 |
| Sun | 0 | 39 | 15 | 18 | 3 | 1 |
| Thur | 1 | 48 | 4 | 5 | 1 | 3 |


In [23]:
# Not many 1- and 6-person parties
party_counts = party_counts.loc[:, 2:5]
party_counts

| day / size | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Fri | 16 | 1 | 1 | 0 |
| Sat | 53 | 18 | 13 | 1 |
| Sun | 39 | 15 | 18 | 3 |
| Thur | 48 | 4 | 5 | 1 |


In [24]:
# Normalize to sum to 1
party_pcts = party_counts.div(party_counts.sum(axis=1), axis=0)
party_pcts


| day / size | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Fri | 0.888889 | 0.055556 | 0.055556 | 0.000000 |
| Sat | 0.623529 | 0.211765 | 0.152941 | 0.011765 |
| Sun | 0.520000 | 0.200000 | 0.240000 | 0.040000 |
| Thur | 0.827586 | 0.068966 | 0.086207 | 0.017241 |


In [25]:
party_pcts.plot.bar(stacked=True)

<matplotlib.axes._subplots.AxesSubplot at 0x11a222cc0>
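
The cross-tabulate, then normalize, then plot sequence can be tried without examples/tips.csv. The sketch below uses a tiny hand-written table (hypothetical values, not the real tips data) so that the row-wise normalization is easy to verify by eye.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips data: one row per party.
tips_small = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat", "Sun"],
    "size": [2, 3, 2, 2, 4, 2],
})

# Count parties by day (rows) and party size (columns).
counts = pd.crosstab(tips_small["day"], tips_small["size"])

# Divide each row by its own total so every day sums to 1.
pcts = counts.div(counts.sum(axis=1), axis=0)

ax = pcts.plot.bar(stacked=True)
row_totals = pcts.sum(axis=1)
```

Passing axis=0 to div aligns the row totals with the row index, which is what makes the division row-wise rather than column-wise.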

In [28]:
plt.close('all')

## Seaborn

In [26]:
import seaborn as sns
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
tips.head()


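| | total_bill | tip | smoker | day | time | size | tip_pct |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | No | Sun | Dinner | 2 | 0.063204 |
| 1 | 10.34 | 1.66 | No | Sun | Dinner | 3 | 0.191244 |
| 2 | 21.01 | 3.50 | No | Sun | Dinner | 3 | 0.199886 |
| 3 | 23.68 | 3.31 | No | Sun | Dinner | 2 | 0.162494 |
| 4 | 24.59 | 3.61 | No | Sun | Dinner | 4 | 0.172069 |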


In [31]:
sns.barplot(x='tip_pct', y='day', data=tips, orient='h')

<matplotlib.axes._subplots.AxesSubplot at 0x1a1ce83eb8>

In [32]:
plt.close('all')

In [33]:
sns.barplot(x='tip_pct', y='day', hue='time', data=tips, orient='h')

<matplotlib.axes._subplots.AxesSubplot at 0x1a1cf34048>

In [34]:
plt.close('all')

In [35]:
sns.set(style="whitegrid")

In [36]:
plt.figure()

In [37]:
comp1 = np.random.normal(0, 1, size=200)
comp2 = np.random.normal(10, 2, size=200)
values = pd.Series(np.concatenate([comp1, comp2]))
sns.histplot(values, bins=100, color='k', kde=True, stat='density')

<matplotlib.axes._subplots.AxesSubplot at 0x1a1d5ff9e8>

### Scatter or Point Plots

In [38]:
macro = pd.read_csv('examples/macrodata.csv')
data = macro[['cpi', 'm1', 'tbilrate', 'unemp']]
trans_data = np.log(data).diff().dropna()
trans_data[-5:]

| | cpi | m1 | tbilrate | unemp |
|---|---|---|---|---|
| 198 | -0.007904 | 0.045361 | -0.396881 | 0.105361 |
| 199 | -0.021979 | 0.066753 | -2.277267 | 0.139762 |
| 200 | 0.002340 | 0.010286 | 0.606136 | 0.160343 |
| 201 | 0.008419 | 0.037461 | -0.200671 | 0.127339 |
| 202 | 0.008894 | 0.012202 | -0.405465 | 0.042560 |
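
The transformation np.log(data).diff().dropna() computes log differences, that is, log(x_t) - log(x_{t-1}) = log(x_t / x_{t-1}), which approximates the period-over-period growth rate. A tiny sketch with made-up numbers (not the macrodata values) makes this concrete:

```python
import numpy as np
import pandas as pd

# Hypothetical series: cpi grows 10% each period, m1 is flat and then grows 10%.
data = pd.DataFrame({
    "cpi": [100.0, 110.0, 121.0],
    "m1": [200.0, 200.0, 220.0],
})

# diff() leaves NaN in the first row, which dropna() removes.
trans = np.log(data).diff().dropna()

growth = np.log(1.1)  # log difference for a 10% increase
```

Three input rows yield two rows of log differences because the first row has no predecessor.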


In [39]:
plt.figure()

In [40]:
sns.regplot(x='m1', y='unemp', data=trans_data)
plt.title('Changes in log %s versus log %s' % ('m1', 'unemp'))

Text(0.5, 1.0, 'Changes in log m1 versus log unemp')

In [41]:
sns.pairplot(trans_data, diag_kind='kde', plot_kws={'alpha': 0.2})

<seaborn.axisgrid.PairGrid at 0x1a1e513240>

### Facet Grids and Categorical Data

In [42]:
sns.catplot(x='day', y='tip_pct', hue='time', col='smoker',
            kind='bar', data=tips[tips.tip_pct < 1])



<seaborn.axisgrid.FacetGrid at 0x1a1f1334e0>

In [43]:
sns.catplot(x='day', y='tip_pct', row='time',
            col='smoker',
            kind='bar', data=tips[tips.tip_pct < 1])



<seaborn.axisgrid.FacetGrid at 0x1a209ec2b0>

In [44]:
sns.catplot(x='tip_pct', y='day', kind='box',
            data=tips[tips.tip_pct < 0.5])



<seaborn.axisgrid.FacetGrid at 0x1a1e07c860>
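
To see how the facet arguments shape the grid without the tips file, here is a sketch on synthetic data. The DataFrame below is invented for illustration, and the sketch assumes a seaborn version that provides catplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: lets the sketch run without a display)
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # seeded generator, used only for reproducibility

# Hypothetical data with the same column names as the tips example.
tips_demo = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"] * 6,
    "time": (["Lunch"] * 4 + ["Dinner"] * 4) * 3,
    "smoker": ["No"] * 12 + ["Yes"] * 12,
    "tip_pct": rng.uniform(0.05, 0.3, size=24),
})

# col splits the data into one panel per smoker value; hue groups bars within a panel.
grid = sns.catplot(x="day", y="tip_pct", hue="time", col="smoker",
                   kind="bar", data=tips_demo)

grid_shape = grid.axes.shape  # (rows, columns) of the facet grid
```

Swapping hue='time' for row='time' would add a second row of panels instead of grouping bars inside each panel.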