<h1><font color = #fc7cc9> Ch. 9 Plotting and Visualisation
    <br>pg. 253 - 286</h1>

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
%matplotlib notebook 

<h2> <font color = #39abed> 9.1 A Brief matplotlib API Primer
    </h2>

In [3]:
data = np.arange(10)
data

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
plt.plot(data)

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x23468574100>]

<h3> <font color = #39abed>Figures and Subplots
    </h3>
<p>

In [5]:
# Create a new, empty figure (nothing is drawn until a subplot is added):
fig = plt.figure()

<IPython.core.display.Javascript object>

In [6]:
# Next, add one or more subplots to the figure before plotting anything
ax1 = fig.add_subplot(2, 2, 1) # This changes the figure created above

# (2, 2, 1) means: lay the figure out as a 2 x 2 grid (so up to 4 plots total)
# and select the first of those subplots

In [7]:
# Adding more subplots
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

# Typically, these would all be run in the same cell, the above blocks too

In [8]:
# A bare plt.plot call draws on the most recently created subplot
# of the most recently used figure (here, ax3)
plt.plot(np.random.randn(50).cumsum(), 'k--')

# 'k--' is a style string: 'k' for black, '--' for a dashed line.


[<matplotlib.lines.Line2D at 0x234685b3580>]

In [9]:
# Making a graph with some data
_ = ax1.hist(np.random.randn(100), bins = 20, color = 'k', alpha = 0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

<matplotlib.collections.PathCollection at 0x2346861b670>
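Since the steps above would normally share one cell, here is a minimal sketch that does exactly that. The seeded generator `rng` is an assumption added only for repeatability; the plotting calls mirror the cells above, but each one goes through an explicit axes object rather than through `plt.plot`.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # assumption: seeded only so the sketch is repeatable

# Build the figure and all of its subplots in a single cell
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

# Plot through the axes objects, so each call goes to a known subplot
ax1.hist(rng.standard_normal(100), bins = 20, color = 'k', alpha = 0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * rng.standard_normal(30))
ax3.plot(rng.standard_normal(50).cumsum(), 'k--')
```

Calling methods on `ax1`, `ax2` and `ax3` avoids having to remember which subplot is currently "active".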

<blockquote> Creating a figure with a grid of subplots is a very common task, so matplotlib includes a convenience method, <b>plt.subplots</b>, that creates a new figure and returns a NumPy array containing the created subplot objects: 

In [10]:
fig, axes = plt.subplots(2, 3) # make 6 graphs, 2 rows by 3
axes

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000023468669CD0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002346868CBB0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000234686B8130>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00000234686F0430>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002346871D850>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002346874AD30>]],
      dtype=object)

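The axes array can be indexed like a two-dimensional array, e.g. <code>axes[0, 1]</code> for the subplot in the first row, second column.
You can also make the subplots share the same x- or y-axis with the <code>sharex</code> and <code>sharey</code> arguments.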

<br>
<b>See page 258, table 9.1 for pyplot.subplots options

In [None]:
# To change the spacing around the subplots, you can use subplots_adjust.
# wspace and hspace control the horizontal and vertical spacing between subplots.

# See page 258 for an example
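As a rough sketch of how `sharex`, `sharey` and `subplots_adjust` fit together, the following builds a 2 x 2 grid of histograms and removes the gaps between them. The random data and the seed are placeholders, not values from the book.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # assumption: placeholder data, seeded for repeatability

# All four subplots share one x-axis range and one y-axis range
fig, axes = plt.subplots(2, 2, sharex = True, sharey = True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(rng.standard_normal(500), bins = 50, color = 'k', alpha = 0.5)

# Shrink the spacing between subplots to zero
fig.subplots_adjust(wspace = 0, hspace = 0)
```

With zero spacing the tick labels of neighbouring subplots can overlap; matplotlib does not check for this, so the tick locations may need adjusting by hand.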

<h3> <font color = #39abed>Colours, Markers, and Line Styles
    </h3>
<p> The main <code>plot</code> function accepts arrays of x and y coordinates and, optionally, a string abbreviation for the colour and line style. E.g., for a green dashed line:<br>
    <code> ax.plot(x, y, 'g--')</code>
<p> Alternatively, you can spell the same style out explicitly: <br>
    <code>ax.plot(x, y, linestyle = '--', color = 'g')</code> <br>
    <p> You can also use hex codes for colours. To see the full docstring for plot, use <code>plt.plot?</code> in IPython or Jupyter.<br>
 <br>
    <p> Line plots can also have <i>markers</i> to highlight the actual data points. Try below:

In [17]:
from numpy.random import randn

In [18]:
# NOTE: with %matplotlib notebook, a new plot is drawn into the most recent
# figure, so it can show up under an earlier cell. Calling plt.figure() first
# starts a fresh figure (restarting the notebook also works).
plt.plot(randn(3).cumsum(), 'ko--')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x2346acb6f70>]

In [19]:
# The above could also have been written more explicitly as:
plt.plot(randn(30).cumsum(), color = 'k', linestyle = 'dashed', marker = 'o')
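To check that the short style string and the explicit keywords describe the same line, a small sketch can draw both and compare the resulting line properties. The variable names and the seed are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

walk = np.random.default_rng(1).standard_normal(30).cumsum()  # placeholder data

fig, ax = plt.subplots()

# ax.plot returns a list of Line2D objects; unpack the single line from each call
(short_form,) = ax.plot(walk, 'ko--')
(explicit_form,) = ax.plot(walk, color = 'k', linestyle = 'dashed', marker = 'o')

# Both lines report the same style settings
print(short_form.get_linestyle(), explicit_form.get_linestyle())
print(short_form.get_marker(), explicit_form.get_marker())
```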

<h3> <font color = #39abed>Ticks, Labels, and Legends
    </h3>
<p>There are 2 main ways to do plot decorations: 1) with the procedural pyplot interface (i.e., <code>matplotlib.pyplot</code>), and 2) with the more object-oriented native matplotlib API. <br>
    <p> For <code>pyplot</code>, you would use functions like xlim, xticks, and xticklabels. E.g., <code>plt.xlim([0, 10])</code> sets the x-axis range to be between 0 and 10; called with no arguments, <code>plt.xlim()</code> returns the current range.<br>
    <p> You can also use subplot methods with AxesSubplot, such as: <code>ax.get_xlim</code> or <code>ax.set_xlim</code>.
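A short sketch of the object-oriented route, assuming a throwaway line plot as the data: `get_xlim` reads the current range and `set_xlim` changes it.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.arange(20))

default_limits = ax.get_xlim()   # the range matplotlib chose automatically
ax.set_xlim([0, 10])             # same effect as plt.xlim([0, 10]) on the current axes
new_limits = ax.get_xlim()

print(default_limits, new_limits)
```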

 #### Setting the title, axis labels, ticks, and ticklabels 

In [3]:
# Create a new, empty figure (the canvas for the plot)
fig = plt.figure()

<IPython.core.display.Javascript object>

In [4]:
ax = fig.add_subplot(1, 1, 1) # a 1 x 1 grid, i.e. a single set of axes

In [5]:
ax.plot(np.random.randn(1000).cumsum())

[<matplotlib.lines.Line2D at 0x1d40713d730>]

Now we can change the x-axis ticks. <code>set_xticks</code> tells matplotlib where to place the ticks along the data range; by default these locations are also used as the labels.<br>
<code>set_xticklabels</code> lets you set any other values as the labels.

In [6]:
ticks = ax.set_xticks([0, 250, 500, 750, 1000]) # Watch how fig gets updated after this is executed

In [7]:
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                           rotation = 30, fontsize = 'small')

# The rotation option rotates the x tick labels by 30 degrees.

##### Now, how to set titles!

In [8]:
ax.set_title('My first matplotlib plot!')

Text(0.5, 1.0, 'My first matplotlib plot!')

In [9]:
ax.set_xlabel('Stages')

Text(0.5, 27.470839501604587, 'Stages')

Modifying the y-axis is the same process, with <code>y</code> in place of <code>x</code> in the method names. The title and x-label above could also have been set in one call with <code>set</code>:<br>
<code>props = {
    'title': 'My first matplotlib plot!',
    'xlabel': 'Stages'
}
ax.set(**props)</code>
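For the y-axis, a minimal sketch might look as follows. The tick positions, tick labels and axis label are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # assumption: placeholder data

fig, ax = plt.subplots()
ax.plot(rng.standard_normal(1000).cumsum())

# Same pattern as for the x-axis, with 'y' in the method names
ax.set_yticks([-20, 0, 20])
labels = ax.set_yticklabels(['low', 'zero', 'high'], fontsize = 'small')
ax.set_ylabel('Position')
```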

#### Adding legends
There are a few ways to add legends. The easiest is to pass the <code>label</code> argument when adding each piece of the plot.<br>
See below for an example, where we 1) draw the canvas, 2) create each line with its own label and style string (e.g. 'k' for solid black), and 3) add the legend.

In [10]:
from numpy.random import randn

In [11]:
fig = plt.figure(); ax = fig.add_subplot(1, 1, 1)

#Start with a blank 'graphing canvas'

<IPython.core.display.Javascript object>

In [12]:
ax.plot(randn(1000).cumsum(), 'k', label = 'one')

[<matplotlib.lines.Line2D at 0x1d4071edb50>]

In [13]:
ax.plot(randn(1000).cumsum(), 'k--', label = 'two')

[<matplotlib.lines.Line2D at 0x1d4071e5430>]

In [14]:
ax.plot(randn(1000).cumsum(), 'k.', label = 'three')

[<matplotlib.lines.Line2D at 0x1d407df4460>]

In [15]:
# Here is where you add the actual legend. 
ax.legend(loc = 'best') 

# You can also do plt.legend()

<matplotlib.legend.Legend at 0x1d4071daa90>

There are several other choices for the legend location. You can look at the docstring (with <code>ax.legend?</code>) for more info.

In [16]:
ax.legend?

If you are not sure where to put the legend, 'best' will choose the most out-of-the way spot. If you want to exclude one of the lines from legend, just pass: <code>label = '_nolegend_'</code>.
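The `_nolegend_` trick can be sketched like this, with three placeholder random walks of which the middle one is hidden from the legend:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # assumption: placeholder data

fig, ax = plt.subplots()
ax.plot(rng.standard_normal(1000).cumsum(), 'k', label = 'one')
ax.plot(rng.standard_normal(1000).cumsum(), 'k--', label = '_nolegend_')  # excluded
ax.plot(rng.standard_normal(1000).cumsum(), 'k.', label = 'three')

legend = ax.legend(loc = 'best')
legend_entries = [text.get_text() for text in legend.get_texts()]
print(legend_entries)
```

Only the lines with ordinary labels end up in `legend_entries`; the line is still drawn on the plot.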

<h3> <font color = #39abed>Annotations and Drawing on a Subplot
    </h3>
<p>This includes adding things like text, arrows, or other shapes to your plot.<br>
<code>text</code> draws text at the provided coordinates on the plot, with optional custom styling:
<br><code>ax.text(x, y, 'Hello World!',
    family = 'monospace', fontsize = 10)</code><br>
<p> See pages 265-266 for an example using Yahoo stock data. 
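Without the stock data at hand, a small self-contained sketch can still show the two most common calls: `ax.text` for plain text and `ax.annotate` for text with an arrow. The curve, the coordinates and the label wording are all invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.arange(10)
ax.plot(x, x ** 2, 'k-')

# Plain text at data coordinates (1, 60)
label = ax.text(1, 60, 'Hello World!', family = 'monospace', fontsize = 10)

# Text at xytext with an arrow pointing to the data point xy
arrow = ax.annotate('last point', xy = (9, 81), xytext = (5, 70),
                    arrowprops = dict(facecolor = 'black', headwidth = 4,
                                      width = 2, headlength = 4))
```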

<h3> <font color = #39abed>Saving Plots to File
    </h3>
<p> You can save the active figure to a file by using <code>plt.savefig</code>; the file type is inferred from the extension, so <code>plt.savefig('figpath.svg')</code> writes an SVG.
For example, to get a PNG with minimal whitespace at 400 DPI:<br>
    <code>plt.savefig('figpath.png', dpi = 400, bbox_inches = 'tight')</code><br>
Here, 'dpi' controls the dots-per-inch resolution, and 'bbox_inches' can trim the whitespace around the actual figure.<br><br>
    <p><b> See table 9.2 on page 268 for more </b><code>.savefig</code> <b> options.
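`savefig` does not have to write to disk. As a hedged sketch, the same options can be used to write into an in-memory buffer, which is handy when the image is going to be served or embedded rather than stored; the plotted data here is a placeholder.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.arange(10))

# Write the PNG bytes into a buffer instead of a file on disk.
# With a file-like object there is no extension, so the format is given explicitly.
buffer = io.BytesIO()
fig.savefig(buffer, format = 'png', dpi = 400, bbox_inches = 'tight')
plot_data = buffer.getvalue()
```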

<h3> <font color = #39abed>matplotlib Configuration
    </h3>
<p>With matplotlib you can customise pretty much all of the defaults for things like: figure size, subplot spacing, colours, font sizes, grid styles, etc. One way to modify these things is to use the <code>rc</code> method. E.g., to set the global default figure size to be 10 x 10, you could enter the following: <br>
    <code>plt.rc('figure', figsize = (10, 10))</code><br>
<p>If you want to adjust multiple things, you can write the options down as a dict and pass it in one go. Note that the font size must be a number (in points); a relative name such as 'small' is not accepted here. E.g.,<br>
<code>font_options = {'family' : 'monospace',
    'weight' : 'bold',
    'size' : 10}
plt.rc('font', **font_options)</code><br><br>
<p><b> See matplotlibrc in the matplotlib/mpl-data directory for more of the possible options!
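A minimal sketch of the round trip: change a few defaults with `plt.rc`, read them back from `plt.rcParams`, and restore matplotlib's built-in defaults with `plt.rcdefaults()`. The specific values are just examples.

```python
import matplotlib.pyplot as plt

# Change the global defaults
plt.rc('figure', figsize = (10, 10))
plt.rc('font', family = 'monospace', weight = 'bold', size = 10)

# The new values are visible in the rcParams mapping
size_after_rc = list(plt.rcParams['figure.figsize'])
weight_after_rc = plt.rcParams['font.weight']

# Restore matplotlib's built-in defaults when done
plt.rcdefaults()
size_after_reset = list(plt.rcParams['figure.figsize'])
```

Because `plt.rc` changes global state, every later figure in the session is affected until the defaults are restored.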

<h2> <font color = #39abed> 9.2 Plotting with pandas and seaborn
    </h2>

<h3> <font color = #39abed>Line Plots
    </h3>
<p> Both Series and DataFrames have a <code>plot</code> attribute. By default, <code>plot()</code> will make line plots.

In the next chunk of code, the Series object's index is used for the x-axis. The arguments to <code>np.arange(0, 100, 10)</code> are the start, the stop, and the step: the index starts at 0 and goes up in increments of 10, stopping before 100 (so the last value is 90).

In [3]:
s = pd.Series(np.random.randn(10).cumsum(), index = np.arange(0, 100, 10))
s.plot()

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x185640b7df0>

For DataFrames, the <code>plot</code> method plots each of its <b>columns</b> as a different line on the same subplot, creating a legend automatically. See below:

In [4]:
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                 columns = ['A', 'B', 'C', 'D'],
                 index = np.arange(0, 100, 10))
df.plot()

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x1856545b4f0>

Also note that the plot attribute has a whole "family" of methods for different plot types. E.g., <code>df.plot()</code> is the same as <code>df.plot.line()</code>

#### See table 9.3 on page 271 for a table of more plot method arguments.
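As a sketch of a few of those arguments in action, the call below puts each column on its own subplot with a shared x-axis. The data, figure size and title are placeholders.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # assumption: placeholder data
df = pd.DataFrame(rng.standard_normal((10, 4)).cumsum(0),
                  columns = ['A', 'B', 'C', 'D'],
                  index = np.arange(0, 100, 10))

# subplots = True draws one subplot per column; the call returns an array of axes
axes = df.plot(subplots = True, sharex = True, figsize = (6, 8),
               title = 'Random walks', style = 'k--')
```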

<h3> <font color = #39abed>Barplots
    </h3>
<p><code>plot.bar()</code> and <code>plot.barh()</code> make vertical and horizontal bar plots. The Series or DataFrame index will be used as the x(bar) or y(barh) ticks. See below:

In [5]:
fig, axes = plt.subplots(2, 1) # Creating the blank 'canvas'

<IPython.core.display.Javascript object>

In [9]:
# Now, create the data with a Series
data = pd.Series(np.random.rand(16), index = list('abcdefghijklmnop'))

For the two bar plots below, the options mean the following: <code>color = 'k'</code> makes the bars black, and <code>alpha = 0.7</code> makes the fill partially transparent.

In [7]:
# Create the plot, a standard vertical bar plot
data.plot.bar(ax = axes[0], color = 'k', alpha = 0.7)

<matplotlib.axes._subplots.AxesSubplot at 0x1856610bdc0>

In [8]:
# Create the horizontal barplot
data.plot.barh(ax = axes[1], color = 'k', alpha = 0.7)

<matplotlib.axes._subplots.AxesSubplot at 0x1856610b910>

For DataFrames, bar plots group the values in each row together, drawing one bar per column side by side within each group. See below:

In [15]:
# Remember: the index labels become the group labels on the x-axis
df = pd.DataFrame(np.random.rand(6, 4),
                  index = ['one', 'two', 'three', 'four', 'five', 'six'],
                  columns = pd.Index(['A', 'B', 'C', 'D'], name = "Genus")) # name = the title of the legend
df

Genus,A,B,C,D
one,0.329564,0.914651,0.976639,0.900793
two,0.239806,0.168014,0.46205,0.52221
three,0.804,0.459123,0.964743,0.862453
four,0.431255,0.683266,0.889616,0.194949
five,0.296793,0.493651,0.042827,0.715262
six,0.758821,0.089367,0.368489,0.631343


In [16]:
df.plot.bar()

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x1856611e7f0>

In [17]:
# If you want the bars to be stacked, pass stacked = True
df.plot.barh(stacked = True, alpha = 0.5)

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x18567bc1400>

<blockquote>Returning to the tipping dataset used earlier in the book, suppose we wanted to make a stacked bar plot showing the percentage of data points for each party size on each day. I load the data using read_csv and make a cross-tabulation by day and party size:

In [42]:
tips = pd.read_csv('C:\\Users\\Kitty\\Desktop\\learnpy\\tips.csv')

In [43]:
party_counts = pd.crosstab(tips['day'], tips['size'])
party_counts

size,1,2,3,4,5,6
day,,,,,,
Fri,1,16,1,1,0,0
Sat,2,53,18,13,1,0
Sun,0,39,15,18,3,1
Thur,1,48,4,5,1,3


In [44]:
# There aren't many 1 and 6 person parties, so we can take them away
party_counts = party_counts.loc[:, 2:5]

In [45]:
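# Normalise each row so that it sums to 1. Each day has a different number of
# parties, so converting counts to proportions makes the days comparable.
party_pcts = party_counts.div(party_counts.sum(axis = 1), axis = 0)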
party_pcts

size,2,3,4,5
day,,,,
Fri,0.888889,0.055556,0.055556,0.0
Sat,0.623529,0.211765,0.152941,0.011765
Sun,0.52,0.2,0.24,0.04
Thur,0.827586,0.068966,0.086207,0.017241


In [25]:
party_pcts.plot.bar()

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x18568853280>
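To see what the normalisation does without needing tips.csv, the sketch below rebuilds the trimmed count table from the numbers printed above, divides each row by its total, and draws the proportions as a stacked bar plot. Typing the counts in by hand is a shortcut for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Counts for party sizes 2-5, copied from the crosstab output above
party_counts = pd.DataFrame(
    [[16, 1, 1, 0], [53, 18, 13, 1], [39, 15, 18, 3], [48, 4, 5, 1]],
    index = pd.Index(['Fri', 'Sat', 'Sun', 'Thur'], name = 'day'),
    columns = pd.Index([2, 3, 4, 5], name = 'size'))

# Divide every row by its own total, so each day sums to 1
party_pcts = party_counts.div(party_counts.sum(axis = 1), axis = 0)
row_totals = party_pcts.sum(axis = 1)

# With rows summing to 1, stacked bars all reach the same height
ax = party_pcts.plot.bar(stacked = True)
```

Because every bar has the same total height, the plot compares the mix of party sizes per day rather than how busy each day was.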

<blockquote>So you can see that party sizes appear to increase on the weekend in this dataset. With data that requires aggregation or summarization before making a plot, using the seaborn package can make things much simpler. Let’s look now at the tipping percentage by day with seaborn.

In [46]:
# Now let's do the same with SEABORN
import seaborn as sns

In [47]:
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

In [41]:
tips.head()

Unnamed: 0,total_bill,tip,smoker,day,time,size,tip_pct
0,16.99,1.01,No,Sun,Dinner,2,0.063204
1,10.34,1.66,No,Sun,Dinner,3,0.191244
2,21.01,3.5,No,Sun,Dinner,3,0.199886
3,23.68,3.31,No,Sun,Dinner,2,0.162494
4,24.59,3.61,No,Sun,Dinner,4,0.172069


In [41]:
# NOTE: create a new figure first (e.g. with plt.figure()); otherwise seaborn draws onto the previous tips bar plot!
sns.barplot(x = 'tip_pct', y = 'day', data = tips, orient = 'h')

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x1856b6780a0>

Seaborn is more similar to R, in that its plotting functions take a 'data' argument, which can be a pandas DataFrame. The other arguments refer to column names. <br>
Because there are multiple observations for each value in 'day', the bars in the graph above show the average tip_pct for each day. The black lines drawn on the bars represent the 95% confidence interval. <br><br>
<p><code>seaborn.barplot</code> has a 'hue' option that allows you to split by an <i>additional</i> categorical value! See below:

In [42]:
sns.barplot(x = 'tip_pct', y = 'day', hue = 'time', data = tips, orient = 'h')

# Note: without a new figure, this is drawn onto the same axes as the plot above (the output below shows the same address).

<matplotlib.axes._subplots.AxesSubplot at 0x1856b6780a0>
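The bar lengths in the two seaborn plots are group means, which can be reproduced with a plain pandas `groupby`. This sketch uses a tiny stand-in table: the first five rows are copied from the `tips.head()` output above, and the two 'Thur' rows are invented so that there is more than one group.

```python
import pandas as pd

# Stand-in for tips.csv: five rows from tips.head(), plus two hypothetical 'Thur' rows
tips_sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59, 27.20, 22.76],
    'tip':        [1.01, 1.66, 3.50, 3.31, 3.61, 4.00, 3.00],
    'day':        ['Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Thur', 'Thur'],
    'time':       ['Dinner'] * 5 + ['Lunch'] * 2,
})
tips_sample['tip_pct'] = tips_sample['tip'] / (tips_sample['total_bill'] - tips_sample['tip'])

# What barplot(x = 'tip_pct', y = 'day') draws as bar lengths
mean_by_day = tips_sample.groupby('day')['tip_pct'].mean()

# What adding hue = 'time' splits each bar into
mean_by_day_and_time = tips_sample.groupby(['day', 'time'])['tip_pct'].mean()

print(mean_by_day)
print(mean_by_day_and_time)
```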

<h3> <font color = #39abed>Histograms and Density Plots
    </h3>
<p>

In [48]:
# using plot.hist for a series!
tips['tip_pct'].plot.hist(bins = 50)

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x1856c33cfa0>

Another, related plot type is the <i>density plot</i>. It is formed by computing an estimate of a continuous probability distribution that might have generated the observed data... See pg. 278 for more info. The usual procedure approximates this distribution as a mixture of "kernels", i.e., simpler distributions like the normal distribution. Density plots are therefore also known as <b>kernel density estimate (KDE) plots</b>. See below for an example.

In [62]:
tips['tip_pct'].plot.density()

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x1856c447580>

In [9]:
comp1 = np.random.normal(0, 1, size = 200) # 200 draws from a normal distribution with mean 0 and sd 1

In [10]:
comp2 = np.random.normal(10, 2, size = 200) # 200 draws from a second normal distribution, with mean 10 and sd 2

In [11]:
values = pd.Series(np.concatenate([comp1, comp2]))

In [14]:
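# distplot draws a histogram and a continuous density estimate together.
# It is deprecated since seaborn 0.11; sns.histplot(values, bins = 100, color = 'k', kde = True) is a close replacement.
sns.distplot(values, bins = 100, color = 'k')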

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x259dad09730>
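To make the "mixture of kernels" idea concrete, here is a hand-rolled Gaussian KDE in plain NumPy on the same kind of bimodal sample. It is a teaching sketch, not what pandas or seaborn run internally: the function name `gaussian_kde_estimate` is made up, and the bandwidth is fixed by hand, whereas the libraries pick one with a rule of thumb.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded only for repeatability
samples = np.concatenate([rng.normal(0, 1, size = 200),
                          rng.normal(10, 2, size = 200)])

def gaussian_kde_estimate(samples, grid, bandwidth):
    """Average one normal 'kernel' centred on every sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis = 1)

grid = np.linspace(-10, 25, 701)
density = gaussian_kde_estimate(samples, grid, bandwidth = 0.8)

# A density should integrate to roughly 1 over the grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one smooths the two modes together.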

<h3> <font color = #39abed>Scatter or Point Plots
    </h3>
<p>In this example, we will load the 'macrodata' data set. 

In [20]:
macro = pd.read_csv('C:\\Users\\Kitty\\Desktop\\learnpy\\macrodata.csv')

In [21]:
data = macro[['cpi', 'm1', 'tbilrate', 'unemp']]

In [22]:
trans_data = np.log(data).diff().dropna() # log differences; dropna removes the first row, which has no previous value

In [18]:
trans_data[-5:]

Unnamed: 0,cpi,m1,tbilrate,unemp
198,-0.007904,0.045361,-0.396881,0.105361
199,-0.021979,0.066753,-2.277267,0.139762
200,0.00234,0.010286,0.606136,0.160343
201,0.008419,0.037461,-0.200671,0.127339
202,0.008894,0.012202,-0.405465,0.04256
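The log-difference transform used for `trans_data` can be illustrated on a toy series that grows by 10% per step (the numbers are invented). The log differences come out constant and close to the percentage change, which is why the transform is a common way to look at growth rates.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 121.0])   # hypothetical values, +10% per step

log_diff = np.log(prices).diff().dropna()   # log(x_t) - log(x_{t-1})
pct_change = prices.pct_change().dropna()   # (x_t - x_{t-1}) / x_{t-1}

print(log_diff.tolist())
print(pct_change.tolist())
```

The first element has no predecessor, so `diff` produces a missing value there and `dropna` removes it.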


In [27]:
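# regplot makes a scatter plot and fits a linear regression line.
# Recent seaborn versions require x and y to be passed as keywords.
sns.regplot(x = 'm1', y = 'unemp', data = trans_data)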

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x259dc19a880>

In [28]:
# To make a scatter plot matrix (pairs plot), as is common in exploratory data analysis,
# use seaborn's 'pairplot'. See example below:

sns.pairplot(trans_data, diag_kind =  'kde', plot_kws = {'alpha': 0.2})

<IPython.core.display.Javascript object>

<seaborn.axisgrid.PairGrid at 0x259dc21be80>

In the above, the <code>plot_kws</code> argument passes configuration options down to the individual plotting calls on the off-diagonal elements; here, <code>{'alpha': 0.2}</code> makes the scatter points mostly transparent. Look up 'seaborn.pairplot' for more info.

<h3> <font color = #39abed>Facet Grids and Categorical Data
    </h3>
<p>For datasets where you have <b>additional grouping dimensions</b> (i.e., many categorical variables), use facet grids with seaborn's <code>catplot</code>. It was called <code>factorplot</code> in older seaborn versions.

In [36]:
sns.catplot(x = 'day', y = 'tip_pct', hue = "time", col = "smoker",
            kind = 'bar', data = tips[tips.tip_pct < 1])



<IPython.core.display.Javascript object>

<seaborn.axisgrid.FacetGrid at 0x259dd517370>

In [48]:
# Instead of grouping by 'time' with different bar colours within a facet,
# you can expand the facet grid by adding one row per time value. See below:

sns.catplot(x = 'day', y = 'tip_pct', row = 'time',
           col = 'smoker',
           kind = 'bar', data = tips[tips.tip_pct < 1])

<IPython.core.display.Javascript object>

<seaborn.axisgrid.FacetGrid at 0x259dd5b0bb0>

<h2> <font color = #39abed> 9.3 Other Python Visualisation Tools
    </h2>

<h2> <font color = #39abed> 9.4 Conclusion
    </h2>