### Pandas [Visualization](http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html)
### Coursera [Applied Plotting, Charting and Data Representation in Python](https://www.coursera.org/learn/python-plotting)

Alberto Cairo (tools for thinking about design)

Visualization Wheel Dimensions:

- Abstraction - Figuration
- Functionality - Decoration
- Density - Lightness
- Multidimensional - Unidimensional
- Originality - Familiarity
- Novelty - Redundancy

Edward Tufte (data-ink ratio):

By increasing the data-ink ratio (the share of a graphic's ink that presents data rather than decoration), one can make a graphic not only simpler and more readable but also increase the amount of information the viewer sees.

Please explore Dark Horse's portfolio and blog for interesting and interactive visuals [here](http://www.darkhorseanalytics.com/)

**Matplotlib Architecture**
- Backend layer
    - Deals with the rendering of plots to screen or files
    - In Jupyter notebooks we use the inline backend
- Artist layer
    - Contains containers such as Figure, Subplot, and Axes
    - Contains primitives such as a Line2D and Rectangle and collections such as PathCollection
- Scripting layer
    - Simplifies access to the Artist and Backend layers
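
The three layers can be seen without the scripting layer at all. The following is a minimal sketch, assuming only that Matplotlib is installed: it builds a `Figure` by hand, attaches an Agg canvas from the backend layer, and checks that the containers and primitives are all `Artist` objects. The `layers` dictionary is a hypothetical name used only for illustration.

```python
from matplotlib.artist import Artist
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure
from matplotlib.lines import Line2D

fig = Figure()                           # Artist layer: top-level container
canvas = FigureCanvasAgg(fig)            # Backend layer: renders the figure to pixels
ax = fig.add_subplot(111)                # Artist layer: Axes container
(line,) = ax.plot([1, 2, 3], [3, 4, 5])  # Artist layer: Line2D primitive

# every object that ends up on the canvas is an Artist
layers = {
    "figure_is_artist": isinstance(fig, Artist),
    "axes_is_artist": isinstance(ax, Artist),
    "line_is_primitive": isinstance(line, Line2D),
    "line_is_child_of_axes": line in ax.get_children(),
}
print(layers)
```

`matplotlib.pyplot` (the scripting layer) creates the same objects behind the scenes and keeps track of the "current" figure and axes.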
    
[Ten simple rules for better figures](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833)

### Scatter plots, line graphs and bar charts

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline

In [None]:
mpl.get_backend()

In [None]:
# plt.plot?
plt.plot([3,2, 1], [3,4,5])

In [18]:
plt.plot(3,2,'.')

[<matplotlib.lines.Line2D at 0x7ea9160>]

In [4]:
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

fig = Figure()
canvas = FigureCanvasAgg(fig)

ax = fig.add_subplot(111)
ax.plot(3,2,'.')
canvas.print_png('test.png')

In [5]:
%%html
<img src='test.png' />

In [6]:
# create a new figure
plt.figure()

# plot the point (3,2) using the circle marker
plt.plot(3,2,'o')

# get the current axes
ax = plt.gca()

#set axis properties [xmin, xmax, ymin, ymax]
ax.axis([0,6,0,10])

[0, 6, 0, 10]

In [25]:
# create a new figure
plt.figure()

# plot the point (1.5, 1.5) using the circle marker
plt.plot(1.5, 1.5, 'o')
# plot the point (2, 2) using the circle marker
plt.plot(2, 2, 'o')
# plot the point (2.5, 2.5) using the circle marker
plt.plot(2.5, 2.5, 'o')

[<matplotlib.lines.Line2D at 0x9c2a828>]

In [7]:
# get current axes
ax = plt.gca()

# get all the child objects that axes contains
ax.get_children()

[<matplotlib.lines.Line2D at 0x7ceb518>,
 <matplotlib.spines.Spine at 0x7bc4978>,
 <matplotlib.spines.Spine at 0x7bc46d8>,
 <matplotlib.spines.Spine at 0x7bc4828>,
 <matplotlib.spines.Spine at 0x7bae240>,
 <matplotlib.axis.XAxis at 0x7bc4a90>,
 <matplotlib.axis.YAxis at 0x7bd65c0>,
 <matplotlib.text.Text at 0x7c8db70>,
 <matplotlib.text.Text at 0x7c8da20>,
 <matplotlib.text.Text at 0x7c8d940>,
 <matplotlib.patches.Rectangle at 0x7c8dc18>]

In [8]:
# get current figure
plt.gcf()

In [34]:
import numpy as np
x = np.array([1,2,3,4,5,6,7])
y = x

colors = ['green'] * (len(x)- 1)
colors.append('red')
plt.figure()
plt.scatter(x,y, c = colors, s =90)

<matplotlib.collections.PathCollection at 0xa6ceef0>

### zip 

In [9]:
zip_list = list(zip([1,2,3,4,5], [6,7,8,9,10]))  # in Python 3, zip is lazy; list() materializes the pairs
zip_list

[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

In [10]:
x, y = zip(*zip_list)  # the star unpacks the pairs, which "unzips" them
print(x)
print(y)

(1, 2, 3, 4, 5)
(6, 7, 8, 9, 10)
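
A small sketch of the zip and unzip round trip in plain Python 3. The names `pairs`, `lazy`, `first_pass` and `second_pass` are illustrative only. It also shows why the pairs are wrapped in `list()`: a bare `zip` object is an iterator and is exhausted after one pass.

```python
# zip pairs up elements; list() stores the pairs so they can be reused
pairs = list(zip([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))

# zip(*pairs) passes each pair as a separate argument, which transposes them back
x, y = zip(*pairs)

# a bare zip object can only be consumed once
lazy = zip([1, 2], [3, 4])
first_pass = list(lazy)
second_pass = list(lazy)  # already exhausted, so this is empty

print(x, y, first_pass, second_pass)
```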


In [11]:
plt.figure()
plt.scatter(x[:3], y[:3], c='blue', s=100, label='short students')
plt.scatter(x[3:], y[3:], c='red', s=100, label='tall students')
plt.xlabel("x label")
plt.ylabel("y label")
plt.title("Plot title")
plt.legend(loc = 4, frameon=False)

<matplotlib.legend.Legend at 0x7ea1cf8>

In [17]:
import numpy as np
linear_data = np.array(range(1,10))
exp_data = linear_data ** 2

plt.figure()
plt.plot(linear_data, '-o', exp_data, '-o')
plt.plot([20, 30, 45, 60], '--r')  # dash

[<matplotlib.lines.Line2D at 0x9dcbc18>]

In [18]:
plt.xlabel('Timer')
plt.ylabel('Y label')
plt.title('My plot')
plt.legend(['Baseline', 'exp curve', 'random line'], loc=9)

<matplotlib.legend.Legend at 0xa250c88>

### fill_between

In [19]:
plt.gca().fill_between(range(len(linear_data)), linear_data, exp_data, facecolor = 'blue', alpha = 0.25)

<matplotlib.collections.PolyCollection at 0x9dadba8>

In [20]:
import pandas as pd
plt.figure()
observation_dates = np.arange('2017-01-01', '2017-01-10', dtype = 'datetime64[D]')
observation_dates = list(map(pd.to_datetime, observation_dates))
plt.plot(observation_dates, linear_data, '-o', observation_dates, exp_data, '-o')

[<matplotlib.lines.Line2D at 0xa27f5f8>,
 <matplotlib.lines.Line2D at 0xa3fbbe0>]

In [21]:
x = plt.gca().xaxis
for item in x.get_ticklabels():
    item.set_rotation(45)

In [22]:
plt.subplots_adjust(bottom=0.25)
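
The loop over tick labels can also be expressed through `tick_params`. This is a sketch rather than the course's method: it assumes a reasonably recent Matplotlib that supports the `labelrotation` keyword and plots NumPy `datetime64` values directly. The `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a standalone run; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

dates = np.arange('2017-01-01', '2017-01-10', dtype='datetime64[D]')
linear_data = np.arange(1, 10)

fig, ax = plt.subplots()
ax.plot(dates, linear_data, '-o')

# rotate every x tick label in one call instead of looping over them
ax.tick_params(axis='x', labelrotation=45)

# leave room at the bottom so the rotated labels are not clipped
fig.subplots_adjust(bottom=0.25)

fig.canvas.draw()
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```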

In [23]:
ax = plt.gca()
ax.set_xlabel('Date')
ax.set_ylabel('Units')
ax.set_title('Exponential vs. Linear performance')

<matplotlib.text.Text at 0xa36af60>

In [24]:
ax.set_title("Exponential ($x^2$) vs. Linear ($x$) performance")  # text between $ signs is rendered as LaTeX math

<matplotlib.text.Text at 0xa36af60>

In [25]:
plt.figure()

xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width=0.3)

<Container object of 9 artists>

In [114]:
new_xvals = []

for item in xvals:
    new_xvals.append(item+0.3)
    
plt.bar(new_xvals, exp_data, width=0.3, color= 'red')    

<Container object of 9 artists>
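
The loop that shifts each x position by the bar width can be replaced by NumPy arithmetic. A minimal sketch, assuming the same data as above; `pos`, `left` and `right` are illustrative names, and centring the tick between the two bars is an added design choice, not part of the original cell.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a standalone run; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.arange(1, 10)
exp_data = linear_data ** 2
width = 0.3
pos = np.arange(len(linear_data))

fig, ax = plt.subplots()
left = ax.bar(pos, linear_data, width=width, label='linear')
# adding a scalar to an array shifts every position at once
right = ax.bar(pos + width, exp_data, width=width, color='red', label='squared')

# put each tick midway between the two bars of a group
ax.set_xticks(pos + width / 2)
ax.set_xticklabels(pos)
ax.legend()
```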

In [26]:
from random import randint
linear_err = [randint(0,3) for x in range(len(linear_data))]

plt.bar(xvals, linear_data, width=0.3, yerr = linear_err)

<Container object of 9 artists>

In [27]:
# stacked bar chart

plt.figure()
plt.bar(xvals, linear_data, width=0.3, color = 'b')
plt.bar(xvals, exp_data, width=0.3, bottom=linear_data, color= 'r')

<Container object of 9 artists>

In [153]:
plt.figure()
    
languages = ['Python', 'SQL', 'Java','C++', 'JavaScript']
pos = np.arange(len(languages))
popularity= [56, 39, 34, 34, 29]

bars = plt.bar(pos, popularity, align = 'center', color='lightslategrey')
plt.xticks(pos, languages)
plt.ylabel('% Popularity')
plt.title('Top 5 languages for Math & Data \nby % popularity on Stack Overflow', alpha=0.8)

# make one bar, the Python bar, a contrasting color
bars[0].set_color('#1F77B4')

# remove all the ticks (both axes), and tick labels on the Y axis
# (use booleans: the strings 'off'/'on' are not accepted by current Matplotlib)
plt.tick_params(top=False, bottom=False, left=False, right=False, labelleft=False, labelbottom=True)

# remove the frame of the chart
for spine in plt.gca().spines.values():
    spine.set_visible(False)
    
# directly label each bar with its y value
for bar in bars:
    height = bar.get_height()
    plt.gca().text(bar.get_x() + bar.get_width()/2, height - 5, str(int(height)) + '%', ha='center', color='w', fontsize=11)
plt.show()

### Chart fundamentals - Subplots, interaction and animation

In [28]:
# plt.subplot?
plt.figure()

linear_data=np.array([1,2,3,4,5,6,7,8])
exp_data = linear_data ** 2

ax1 = plt.subplot(1,2,1)  # equivalent to plt.subplot(121); note the singular `subplot`
plt.plot(linear_data, '-o')
plt.plot(exp_data, '--x')

ax2 = plt.subplot(1,2,2, sharey = ax1)
plt.plot(exp_data, '-o')

[<matplotlib.lines.Line2D at 0xc6751d0>]

In [173]:
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3, sharex=True, sharey=True)  # note the plural `subplots`
ax5.plot(linear_data, '-')

[<matplotlib.lines.Line2D at 0x20f107f0>]

In [29]:
for ax in plt.gcf().get_axes():   # gcf not gca()
    for label in ax.get_xticklabels() + ax.get_yticklabels():
        label.set_visible(True)

In [30]:
# redraw the figure so the label changes become visible
plt.gcf().canvas.draw()
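
Instead of unpacking nine names, the grid returned by `plt.subplots` can be kept as an array and indexed. A sketch under the assumption that the installed Matplotlib hides inner tick labels on shared axes through tick parameters, so `tick_params` can switch them back on; `axs` is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a standalone run; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.arange(1, 9)

# plt.subplots returns the figure and a 2-D NumPy array of Axes
fig, axs = plt.subplots(3, 3, sharex=True, sharey=True)
axs[1, 1].plot(linear_data, '-')  # centre panel, the one called ax5 above

# sharing hides the inner tick labels; switch them back on for every panel
for ax in axs.flat:
    ax.tick_params(which='both', labelbottom=True, labelleft=True)

fig.canvas.draw()
```

Because the axes are shared, the limits set by the one plotted line apply to all nine panels.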

### histogram

In [31]:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,sharex=True)

axs = [ax1, ax2, ax3, ax4]

for n in range(0, len(axs)):
    sample_size = 10 ** (n + 1)
    sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
    axs[n].hist(sample, bins=25)
    axs[n].set_title('n={}'.format(sample_size))

In [34]:
plt.figure()

Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
plt.scatter(X,Y)

<matplotlib.collections.PathCollection at 0xdeae8d0>

In [35]:
import matplotlib.gridspec as gridspec

plt.figure()
gspec = gridspec.GridSpec(3,3)

top_histogram = plt.subplot(gspec[0, 1:])
side_histogram = plt.subplot(gspec[1:, 0])
lower_right = plt.subplot(gspec[1:, 1:])

lower_right.scatter(X,Y)
top_histogram.hist(X, bins=100) 
s = side_histogram.hist(Y, bins=100, orientation='horizontal')

In [36]:
top_histogram.clear()
top_histogram.hist(X, bins=100, density=True)  # `normed` was removed from Matplotlib; `density` replaces it

side_histogram.clear()
side_histogram.hist(Y, bins=100, density=True, orientation='horizontal')
side_histogram.invert_xaxis()
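
What normalization does to the bar heights can be checked with `np.histogram`, which accepts the same `bins` and `density` arguments. A small sketch with a seeded generator (the seed and the names `density` and `area` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed, only for reproducibility
Y = rng.normal(loc=0.0, scale=1.0, size=10000)

# with density=True the heights are scaled so the bars integrate to 1
density, edges = np.histogram(Y, bins=100, density=True)

# area = sum of (height * bin width)
area = float((density * np.diff(edges)).sum())
print(area)
```

The heights are therefore probability densities, not counts, which is why the two marginal histograms become comparable regardless of sample size.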

In [37]:
for ax in [top_histogram, lower_right]:
    ax.set_xlim(0,1)

for ax in [side_histogram, lower_right]:    
    ax.set_ylim(-5,5)

### box plot

In [38]:
normal_sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
random_sample = np.random.random(size=10000)
gamma_sample = np.random.gamma(2, size=10000)

df = pd.DataFrame({'normal':normal_sample,
                   'random':random_sample,
                   'gamma': gamma_sample})
df.head()

| | gamma | normal | random |
|---|---|---|---|
| 0 | 2.612296 | 0.824819 | 0.446249 |
| 1 | 2.153705 | -0.81847 | 0.570763 |
| 2 | 1.687982 | -0.16449 | 0.793403 |
| 3 | 1.655245 | -1.141228 | 0.192938 |
| 4 | 0.988635 | -0.531064 | 0.098891 |


In [39]:
df.describe()

| | gamma | normal | random |
|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 |
| mean | 1.997128 | 0.005658 | 0.499635 |
| std | 1.409181 | 1.003663 | 0.290125 |
| min | 0.017337 | -3.540068 | 0.000156 |
| 25% | 0.97003 | -0.680111 | 0.250516 |
| 50% | 1.690006 | 0.00794 | 0.495247 |
| 75% | 2.675606 | 0.67831 | 0.752445 |
| max | 11.546946 | 4.169064 | 0.999885 |


In [40]:
plt.figure()
_ = plt.boxplot(df['normal'], whis=(0, 100))  # whiskers span the full range; replaces the removed whis='range'

In [41]:
plt.clf()
_ = plt.boxplot([df['normal'], df['random'], df['gamma']], whis=(0, 100))

In [42]:
plt.figure()
_ = plt.hist(df['gamma'], bins=50)

In [44]:
import mpl_toolkits.axes_grid1.inset_locator as mpl_il

plt.figure()

plt.boxplot([df['normal'], df['random'], df['gamma']], whis=(0, 100))
ax2 = mpl_il.inset_axes(plt.gca(), width='60%', height='40%', loc=2)
ax2.hist(df['gamma'], bins=100)
ax2.margins(x=0.5)

In [45]:
# move the y axis to the right
ax2.yaxis.tick_right()

In [46]:
plt.figure()

plt.boxplot([df['normal'], df['random'], df['gamma']])  # default whiskers (1.5 * IQR); points beyond them are drawn as outliers

{'boxes': [<matplotlib.lines.Line2D at 0x10b6f3c8>,
  <matplotlib.lines.Line2D at 0x10d07198>,
  <matplotlib.lines.Line2D at 0x10d22080>],
 'caps': [<matplotlib.lines.Line2D at 0x10ced668>,
  <matplotlib.lines.Line2D at 0x10c30da0>,
  <matplotlib.lines.Line2D at 0x10c22cc0>,
  <matplotlib.lines.Line2D at 0xe14dc50>,
  <matplotlib.lines.Line2D at 0x10b36b38>,
  <matplotlib.lines.Line2D at 0x10a7fe80>],
 'fliers': [<matplotlib.lines.Line2D at 0x10d070b8>,
  <matplotlib.lines.Line2D at 0x109f5128>,
  <matplotlib.lines.Line2D at 0xe16bf98>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0x10c30cf8>,
  <matplotlib.lines.Line2D at 0xe14dda0>,
  <matplotlib.lines.Line2D at 0xe16b630>],
 'whiskers': [<matplotlib.lines.Line2D at 0x10b6f898>,
  <matplotlib.lines.Line2D at 0x10cedc18>,
  <matplotlib.lines.Line2D at 0x10a76278>,
  <matplotlib.lines.Line2D at 0x10c3b518>,
  <matplotlib.lines.Line2D at 0x10d229e8>,
  <matplotlib.lines.Line2D at 0x10d227b8>]}
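
The `fliers` in the default box plot follow the usual 1.5 * IQR rule. The sketch below reproduces that rule with NumPy on a tiny hand-made array (the data is invented for illustration, not taken from the notebook), so the numbers can be checked by hand.

```python
import numpy as np

# toy data: nine ordinary values and one extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles
iqr = q3 - q1                           # interquartile range

# fences: anything outside them counts as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(q1, q3, lower, upper, outliers)
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5 and only the value 100 falls outside. In the plot itself, each whisker is drawn to the most extreme data point that is still inside its fence, not to the fence.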

### heatmap -- two-dimensional histogram

In [47]:
plt.figure()

Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
_= plt.hist2d(X, Y, bins=25)

In [48]:
plt.colorbar()

<matplotlib.colorbar.Colorbar at 0x11da9a20>
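
The colors in the heatmap are simply counts per cell. A sketch using `np.histogram2d`, which bins the data on a grid in the same way, with a seeded generator (the seed is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed, only for reproducibility
X = rng.random(size=10000)
Y = rng.normal(loc=0.0, scale=1.0, size=10000)

# counts[i, j] is the number of points in x-bin i and y-bin j
counts, xedges, yedges = np.histogram2d(X, Y, bins=25)

print(counts.shape, counts.sum(), counts.max())
```

With the default range every point lands in exactly one cell, so the counts add up to the sample size; the colorbar maps these counts to colors.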

### Animation

In [49]:
import matplotlib.animation as animation

n = 100
x = np.random.randn(n)

In [50]:
def update(curr):
    # stop the animation once every sample has been drawn
    if curr == n:
        a.event_source.stop()
    
    plt.cla()
    bins = np.arange(-4, 4, 0.5)
    
    plt.hist(x[:curr], bins= bins)
    
    plt.axis([-4, 4, 0, 30])
    plt.gca().set_title('Sampling the Normal Distribution')
    plt.gca().set_ylabel('Frequency')
    plt.gca().set_xlabel('Value')
    plt.annotate('n = {}'.format(curr), [3,27])        

In [51]:
fig = plt.figure()

a = animation.FuncAnimation(fig, update, interval=100)
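
An alternative to stopping the event source by hand is to tell `FuncAnimation` which frames to produce. The following is a sketch, not the course's version: it passes `frames` and `repeat=False`, draws on an explicit `ax` instead of the pyplot state machine, and uses `np.linspace` so the bins cover the whole interval from -4 to 4. The seed and the name `anim` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a standalone run; omit inside a notebook
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np

n = 100
rng = np.random.default_rng(0)   # assumption: arbitrary seed
x = rng.standard_normal(n)
bins = np.linspace(-4, 4, 17)    # 16 bins of width 0.5 covering [-4, 4]

fig, ax = plt.subplots()

def update(curr):
    """Redraw the histogram using the first `curr` samples."""
    ax.cla()
    ax.hist(x[:curr], bins=bins)
    ax.axis([-4, 4, 0, 30])
    ax.set_title('Sampling the Normal Distribution')
    ax.set_ylabel('Frequency')
    ax.set_xlabel('Value')
    ax.annotate('n = {}'.format(curr), (3, 27))

# frames 1..n are passed to update in turn; repeat=False ends the animation after the last one
anim = animation.FuncAnimation(fig, update, frames=range(1, n + 1), interval=100, repeat=False)
```

Keep a reference to the animation object (here `anim`), otherwise it can be garbage collected before it runs. In a notebook, an interactive backend such as `%matplotlib notebook` is needed to see the frames.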

### interactivity

In [52]:
plt.figure()
data = np.random.rand(10)
plt.plot(data)

def onclick(event):
    
    plt.cla()
    plt.plot(data)
    
    plt.gca().set_title('Event at pixels {}, {} {} and data {}, {}'.format(event.x, event.y, '\n', event.xdata, event.ydata))
    
plt.gcf().canvas.mpl_connect('button_press_event', onclick)    

7
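
The integer printed above is the connection id returned by `mpl_connect`; passing it to `mpl_disconnect` removes the callback. The sketch below exercises this without a mouse by building a synthetic `MouseEvent` and pushing it through the canvas callback registry. Treat it as an illustration under the assumption that the installed Matplotlib exposes `canvas.callbacks.process`; the names `clicks` and `cid` are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a standalone run; omit inside a notebook
import matplotlib.pyplot as plt
from matplotlib.backend_bases import MouseEvent

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

clicks = []

def onclick(event):
    # record the pixel coordinates of each click
    clicks.append((event.x, event.y))

# mpl_connect returns an integer id that identifies this connection
cid = fig.canvas.mpl_connect('button_press_event', onclick)

# simulate a left click at pixel (100, 100)
event = MouseEvent('button_press_event', fig.canvas, 100, 100, button=1)
fig.canvas.callbacks.process('button_press_event', event)

# after disconnecting, the same event no longer reaches onclick
fig.canvas.mpl_disconnect(cid)
fig.canvas.callbacks.process('button_press_event', event)

print(cid, clicks)
```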

In [54]:
from random import shuffle

origins = ['China', 'Brazil', 'India', 'USA', 'Canada', 'UK', 'Germany', 'Iraq', 'Chile', 'Mexico']

shuffle(origins)

df = pd.DataFrame({'height': np.random.rand(10),
                   'weight': np.random.rand(10),
                   'origin': origins})

df

| | height | origin | weight |
|---|---|---|---|
| 0 | 0.320793 | Canada | 0.618317 |
| 1 | 0.431868 | Iraq | 0.873834 |
| 2 | 0.643368 | China | 0.850183 |
| 3 | 0.046257 | India | 0.760481 |
| 4 | 0.176661 | Brazil | 0.363868 |
| 5 | 0.227281 | Chile | 0.938099 |
| 6 | 0.316859 | USA | 0.70499 |
| 7 | 0.578367 | Mexico | 0.415152 |
| 8 | 0.395352 | UK | 0.725679 |
| 9 | 0.112769 | Germany | 0.297451 |


In [55]:
plt.figure()
plt.scatter(df['height'], df['weight'], picker=5)

plt.gca().set_ylabel('Weight')
plt.gca().set_xlabel('Height')

<matplotlib.text.Text at 0x12d61860>

In [56]:
def onpick(event):
    origin = df.iloc[event.ind[0]]['origin']
    plt.gca().set_title('Selected item came from {}'.format(origin))

plt.gcf().canvas.mpl_connect('pick_event', onpick)

7

### Plotting with pandas

In [57]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

In [58]:
plt.style.available

[u'seaborn-darkgrid',
 u'seaborn-notebook',
 u'classic',
 u'seaborn-ticks',
 u'grayscale',
 u'bmh',
 u'seaborn-talk',
 u'dark_background',
 u'ggplot',
 u'fivethirtyeight',
 u'seaborn-colorblind',
 u'seaborn-deep',
 u'seaborn-whitegrid',
 u'seaborn-bright',
 u'seaborn-poster',
 u'seaborn-muted',
 u'seaborn-paper',
 u'seaborn-white',
 u'seaborn-pastel',
 u'seaborn-dark',
 u'seaborn-dark-palette']

In [59]:
plt.style.use('seaborn-colorblind')

In [60]:
np.random.seed(123)

# each column is a random walk: the cumulative sum of standard-normal steps
df = pd.DataFrame({'A': np.random.randn(365).cumsum(0),
                   'B': np.random.randn(365).cumsum(0) + 20,
                   'C': np.random.randn(365).cumsum(0) - 20},
                  index=pd.date_range('1/1/2017', periods=365))
df.head()

| | A | B | C |
|---|---|---|---|
| 2017-01-01 | -1.085631 | 20.059291 | -20.230904 |
| 2017-01-02 | -0.088285 | 21.803332 | -16.659325 |
| 2017-01-03 | 0.194693 | 20.835588 | -17.055481 |
| 2017-01-04 | -1.311601 | 21.255156 | -17.093802 |
| 2017-01-05 | -1.890202 | 21.462083 | -19.518638 |
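
The columns above are built with `cumsum`, which turns a sequence of steps into a running total. A tiny sketch with hand-picked steps (invented values, chosen so the result is easy to verify):

```python
import numpy as np

steps = np.array([1.0, -2.0, 3.0, 0.5])

# each element is the sum of all steps up to and including that position
walk = steps.cumsum()
print(walk)  # [ 1.  -1.   2.   2.5]

# differencing the walk recovers the original steps (after the first)
recovered = np.diff(walk)
print(recovered)
```

With random normal steps instead of fixed ones, the same operation yields the wandering series plotted in the next cells.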


In [61]:
df.plot();

In [62]:
df.plot('A', 'B', kind='scatter');

In [63]:
df.plot.scatter('A','C', c='B', s = df['B'], colormap = 'viridis')

<matplotlib.axes._subplots.AxesSubplot at 0x134bc9b0>

In [64]:
ax = df.plot.scatter('A','C', c='B', s = df['B'], colormap = 'viridis')
ax.set_aspect('equal')

In [65]:
df.plot.box()

<matplotlib.axes._subplots.AxesSubplot at 0x149c1358>

In [66]:
df.plot.hist(alpha=0.7)

<matplotlib.axes._subplots.AxesSubplot at 0x14bdc128>

In [67]:
df.plot.kde();

### pandas.plotting (formerly pandas.tools.plotting)

In [68]:
iris = pd.read_csv(r'https://raw.githubusercontent.com/plotly/datasets/master/iris.csv')

In [69]:
iris.head()

| | SepalLength | SepalWidth | PetalLength | PetalWidth | Name |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [70]:
pd.plotting.scatter_matrix(iris)  # pd.tools.plotting was removed from pandas; the functions now live in pd.plotting

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000000016C0E438>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000014F0FEB8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000016D5C080>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000016E12E10>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000016DDA278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000170122E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001710FE48>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001717F438>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000001727ADA0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000017357AC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001742D470>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000017528CF8>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00

In [71]:
plt.figure()
pd.plotting.parallel_coordinates(iris, 'Name')

<matplotlib.axes._subplots.AxesSubplot at 0x18470780>

In [72]:
import seaborn as sns

In [73]:
np.random.seed(1234)

v1  = pd.Series(np.random.normal(0,10, 1000), name = 'v1')
v2 = pd.Series(2*v1 + np.random.normal(60, 15, 1000), name = 'v2')
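
Because `v2` is built as `2*v1` plus independent noise, the two series are strongly correlated, which is what the joint plots further down display. A sketch that checks this numerically; it uses `np.random.default_rng` instead of the legacy `np.random.seed` shown above, so the actual draws differ from the notebook's, and the names `r` and `expected` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1234)  # assumption: different generator than the notebook cell
v1 = rng.normal(0, 10, 1000)
v2 = 2 * v1 + rng.normal(60, 15, 1000)

# sample Pearson correlation between the two series
r = np.corrcoef(v1, v2)[0, 1]

# theoretical value: cov = 2 * 10**2 = 200, sd(v1) = 10, sd(v2) = sqrt(4*100 + 15**2) = 25
expected = 200 / (10 * 25)

print(round(r, 3), expected)
```

The theoretical correlation is 0.8; with 1000 samples the estimate should land close to it.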

In [74]:
plt.figure()
plt.hist(v1, alpha = 0.7, bins = np.arange(-50, 150, 5), label='v1')
plt.hist(v2, alpha = 0.7, bins = np.arange(-50, 150, 5), label='v2')
plt.legend()

<matplotlib.legend.Legend at 0x192fbbe0>

In [75]:
plt.figure()
plt.hist([v1, v2], histtype='barstacked', density=True)  # `density` replaces the removed `normed` argument
v3 = np.concatenate((v1,v2))
sns.kdeplot(v3);

In [76]:
plt.figure()
sns.distplot(v3, hist_kws={'color':'Teal'}, kde_kws={'color':'Navy'});

In [77]:
sns.jointplot(x=v1, y=v2, alpha=0.4);

In [78]:
grid = sns.jointplot(x=v1, y=v2, alpha=0.4)
grid.ax_joint.set_aspect('equal')

In [79]:
sns.jointplot(x=v1, y=v2, kind='hex')

<seaborn.axisgrid.JointGrid at 0x1b0b25f8>

In [80]:
sns.set_style('white')
sns.jointplot(x=v1, y=v2, kind='kde', space=0)

<seaborn.axisgrid.JointGrid at 0x1b921828>

In [81]:
sns.pairplot(iris, hue = 'Name', diag_kind = 'kde')

<seaborn.axisgrid.PairGrid at 0x1bff17b8>

In [82]:
plt.figure(figsize=(12,8))
plt.subplot(121)
sns.swarmplot(x='Name', y='PetalLength', data=iris);

plt.subplot(122)
sns.violinplot(x='Name', y='PetalLength', data=iris);
