
<p><img align="left" src="https://www.cqf.com/themes/custom/creode/logo.svg" style="vertical-align: top; padding-top: 23px;" width="10%"/>
<img align="right" src="https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg" style="vertical-align: middle;" width="12%"/>
<font color="#306998"><h1><center>Python Labs</center></h1></font></p>
<p></p><h1><center>Introduction to Data Visualization</center></h1>
<center><h3>Kannan Singaravelu</h3></center>
<center>kannan.singaravelu@fitchlearning.com</center>



<h2 id="Data-Visualization">Data Visualization<a class="anchor-link" href="#Data-Visualization">¶</a></h2><p>Data visualization is the graphical representation of data. It involves producing images that communicate relationships in the data and is a critical part of data science. We will use the <code>matplotlib</code> and <code>seaborn</code> libraries for static plotting and <code>cufflinks</code> for interactive visualization.</p>



<h3 id="Installing-Libraries">Installing Libraries<a class="anchor-link" href="#Installing-Libraries">¶</a></h3><p>The libraries used in this lab can be installed by uncommenting the commands below.</p>


In [None]:

# ! pip install matplotlib
# ! pip install seaborn
# ! pip install cufflinks==0.16.0



In [None]:

# Import required libraries
import pandas as pd
import numpy as np




<h3 id="Loading-Datasets">Loading Datasets<a class="anchor-link" href="#Loading-Datasets">¶</a></h3>


In [None]:

# Load data to plot
df = pd.read_csv('data/faang_stocks.csv', index_col=0, parse_dates=True)['2013':]
df.head()



In [None]:

spy = pd.read_csv('data/spy.csv', index_col=0, parse_dates=True)['2020':]
spy.tail()




<h2 id="Matplotlib">Matplotlib<a class="anchor-link" href="#Matplotlib">¶</a></h2>



<p><code>Matplotlib</code> is a multiplatform data visualization library built on NumPy arrays; it converts all sequences to NumPy arrays internally. Originally written as a Python alternative for MATLAB users, it is one of the most widely adopted visualization packages. Its biggest advantage is that it works well across many operating systems and graphics backends. Matplotlib is a comprehensive library for creating publication-quality plots, and the ability to customize plot properties makes it a go-to tool for data visualization.</p>
<p>Matplotlib has two interfaces: a) a <code>MATLAB-style</code> interface and b) an <code>Object-oriented</code> interface. While the former is convenient, the latter is more powerful.</p>



<h3 id="Importing-Matplotlib">Importing Matplotlib<a class="anchor-link" href="#Importing-Matplotlib">¶</a></h3>


In [None]:

import matplotlib as mpl
import matplotlib.pyplot as plt

# Set the plot style
plt.style.use('dark_background')




<p><code>matplotlib.pyplot</code> is a collection of command-style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: for example, it creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with labels. Various states are preserved across function calls, so pyplot keeps track of things like the current figure and plotting area, and the plotting functions are directed to the current axes.</p>
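
<p>The idea of a "current" figure and axes can be checked directly. The following is a minimal sketch, assuming only <code>plt.gcf()</code> and <code>plt.gca()</code> (which return the current figure and current axes); the variable names are illustrative.</p>

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 100)

# Creating a figure makes it the current figure
fig1 = plt.figure()
plt.plot(x, np.sin(x))      # implicitly creates an Axes on fig1 and draws on it
ax1 = plt.gca()             # the current Axes that plt.plot() just used

# A second figure becomes the new current figure
fig2 = plt.figure()
plt.plot(x, np.cos(x))

current_is_fig2 = plt.gcf() is fig2
ax1_belongs_to_fig1 = ax1.figure is fig1

# Make fig1 current again by its number; plt.plot() is then directed back to ax1
plt.figure(fig1.number)
plt.plot(x, np.cos(x))
lines_on_ax1 = len(ax1.lines)
```

<p>Because this bookkeeping is implicit, it is easy to draw on the wrong figure once several are open, which is one reason the object-oriented interface is preferred for more complex plots.</p>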



<h3 id="MATLAB-Style-Interface">MATLAB-Style Interface<a class="anchor-link" href="#MATLAB-Style-Interface">¶</a></h3>


In [None]:

x = np.linspace(-5,5,100)
fig = plt.figure()
plt.title('Sine-Cosine Plot')
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x));




<p>In the transition between MATLAB-style functions and object-oriented methods, most <code>plt</code> functions translate directly to <code>ax</code> methods, for example <code>plt.plot() -&gt; ax.plot()</code> and <code>plt.legend() -&gt; ax.legend()</code>. However, the functions that set limits, labels and titles are renamed slightly.</p>
<p>Refer to the table below for these changes.</p>
<table>
<thead><tr>
<th>MATLAB-style</th>
<th>Object-oriented</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>plt.xlabel()</code></td>
<td><code>ax.set_xlabel()</code></td>
</tr>
<tr>
<td><code>plt.ylabel()</code></td>
<td><code>ax.set_ylabel()</code></td>
</tr>
<tr>
<td><code>plt.xlim()</code></td>
<td><code>ax.set_xlim()</code></td>
</tr>
<tr>
<td><code>plt.ylim()</code></td>
<td><code>ax.set_ylim()</code></td>
</tr>
<tr>
<td><code>plt.title()</code></td>
<td><code>ax.set_title()</code></td>
</tr>
</tbody>
</table>
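
<p>The mappings in the table can be sketched as follows. The axis labels, limits and titles are arbitrary illustrative values; the second figure uses <code>ax.set()</code>, which accepts the same properties as keyword arguments in a single call.</p>

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)')

# Object-oriented counterparts of plt.xlabel(), plt.ylabel(), plt.xlim(), ...
ax.set_xlabel('x')
ax.set_ylabel('sin(x)')
ax.set_xlim(-5, 5)
ax.set_ylim(-1.5, 1.5)
ax.set_title('Sine')
ax.legend()

# The same properties set in one call with ax.set()
fig2, ax2 = plt.subplots()
ax2.plot(x, np.cos(x))
ax2.set(xlabel='x', ylabel='cos(x)', xlim=(-5, 5), ylim=(-1.5, 1.5), title='Cosine');
```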



<h3 id="Object-oriented-Interface">Object-oriented Interface<a class="anchor-link" href="#Object-oriented-Interface">¶</a></h3>


In [None]:

fig = plt.figure()
ax = plt.axes()
ax.set_title('Sine-Cosine Plot')
ax.plot(x, np.sin(x), color='blue')
ax.plot(x, np.cos(x), color='red');




<h3 id="Subplots">Subplots<a class="anchor-link" href="#Subplots">¶</a></h3>


In [None]:

# MATLAB-style
# create a plot figure
plt.figure() 

# create the first of two panels and set current axis
plt.subplot(2, 1, 1) # (rows, columns, panel number)
plt.title('Sine')
plt.plot(x, np.sin(x))

# create the second panel and set current axis
plt.subplot(2, 1, 2)
plt.title('Cosine')
plt.plot(x, np.cos(x));



In [None]:

# Object-oriented
# First create a grid of plots; ax will be an array of two Axes objects
fig, ax = plt.subplots(2)
# Call plot() method on the appropriate object
ax[0].plot(x, np.sin(x), color='blue')
ax[0].set_title('Sine')
ax[1].plot(x, np.cos(x), color='red')
ax[1].set_title('Cosine');



In [None]:

fig, ax = plt.subplots(2,2, figsize=(12, 10))

# Call plot() method on the appropriate object
ax[0,0].plot(x, np.sin(x), color='blue')
ax[0,0].set_title('Sine')

ax[0,1].plot(x, np.tan(x), color='green')
ax[0,1].set_title('Tangent')

ax[1,0].plot(x, np.cos(x), color='red')
ax[1,0].set_title('Cosine')

ax[1,1].plot(x, np.tanh(x), color='orange')
ax[1,1].set_ylim(-1,1)
ax[1,1].set_title('Hyperbolic Tangent');




<p>The object-oriented approach offers a high level of customization, especially for figures with multiple subplots. The subplots above can also be generated with a for loop.</p>


In [None]:

# Using for loop to generate subplots
fig, ax = plt.subplots(2,2, figsize=(12,10), sharex=True)
flist = [np.sin(x), np.cos(x), np.tan(x), np.tanh(x)]
ftitle = ['Sine', 'Cosine', 'Tangent', 'Hyperbolic Tangent']
fcolor = ['blue', 'red', 'green', 'orange']

# ax.flat walks the 2x2 grid row by row, so no manual counter is needed
for axis, values, title, color in zip(ax.flat, flist, ftitle, fcolor):
    axis.plot(x, values, color=color)
    axis.set_title(title)




<h2 id="Chart-Types">Chart Types<a class="anchor-link" href="#Chart-Types">¶</a></h2>



<h3 id="Histogram">Histogram<a class="anchor-link" href="#Histogram">¶</a></h3>


In [None]:

plt.hist(np.random.randn(100000), bins=50, histtype='stepfilled', color='orange', alpha=1);




<h3 id="Bar-Graph">Bar Graph<a class="anchor-link" href="#Bar-Graph">¶</a></h3>


In [None]:

x = np.array(list("ABCDEFGHIJ"))
y = np.arange(1, 11)
y1 = y - 5

plt.bar(x,y, color='blue')
plt.bar(x,y1, color='orange');




<h3 id="Scatter-Plot">Scatter Plot<a class="anchor-link" href="#Scatter-Plot">¶</a></h3>


In [None]:

plt.scatter(x,y, color='blue')
plt.scatter(x,y1, color='red');




<h3 id="Stylesheets">Stylesheets<a class="anchor-link" href="#Stylesheets">¶</a></h3>



<p>Switch to a stylesheet by using <em><code>plt.style.use('stylename')</code></em>. This changes the style for the rest of the session. To set a style temporarily (for a particular plot), use the style context manager <em><code>plt.style.context('stylename')</code></em>.</p>
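
<p>The difference between the two can be seen by inspecting <code>plt.rcParams</code>, the dictionary of settings that a stylesheet overrides. The sketch below assumes the <code>axes.facecolor</code> entry as a convenient setting to watch; any other entry changed by the style would serve equally well.</p>

```python
import matplotlib.pyplot as plt

# Value of one style-controlled setting before entering the context
before = plt.rcParams['axes.facecolor']

with plt.style.context('dark_background'):
    # Inside the block the stylesheet's value is active
    inside = plt.rcParams['axes.facecolor']

# On exit the previous value is restored
after = plt.rcParams['axes.facecolor']

print(before, inside, after)
```

<p>With <code>plt.style.use()</code> there is no such restoration: the new values stay in effect until another style is applied.</p>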


In [None]:

# List of available styles
plt.style.available[:6]




<h3 id="Style-:-dark_background">Style : dark_background<a class="anchor-link" href="#Style-:-dark_background">¶</a></h3>


In [None]:

with plt.style.context('dark_background'):
    plt.hist(np.random.randn(100000), bins=50)
    plt.title('dark_background')




<h3 id="Style-:-fivethirtyeight">Style : fivethirtyeight<a class="anchor-link" href="#Style-:-fivethirtyeight">¶</a></h3>


In [None]:

# plt.setp(plt.title('fivethirtyeight'), color='red')
with plt.style.context('fivethirtyeight'):
    plt.hist(np.random.randn(100000), bins=50)
    plt.title('fivethirtyeight')




<h3 id="Style:-ggplot">Style: ggplot<a class="anchor-link" href="#Style:-ggplot">¶</a></h3>


In [None]:

with plt.style.context('ggplot'):
    plt.hist(np.random.randn(100000), bins=50)
    plt.title('ggplot')




<h3 id="Style:-seaborn">Style: seaborn<a class="anchor-link" href="#Style:-seaborn">¶</a></h3>


In [None]:

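# Matplotlib 3.6 renamed the bundled seaborn styles with a 'seaborn-v0_8' prefix
seaborn_style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
with plt.style.context(seaborn_style):
    plt.hist(np.random.randn(100000), bins=50)
    plt.title('Seaborn')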




<h3 id="Saving-Plots">Saving Plots<a class="anchor-link" href="#Saving-Plots">¶</a></h3>


In [None]:

# Get supported file types
fig.canvas.get_supported_filetypes()



In [None]:

# Saving figure
# fig.savefig('first_matplotlib.pdf')
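
<p>A minimal sketch of saving a figure follows. It writes a PNG to an in-memory buffer so that nothing is created on disk; passing a filename such as <code>'figure.png'</code> (a hypothetical name) instead of the buffer would write a file. The <code>dpi</code> and <code>bbox_inches</code> values are illustrative choices, not requirements.</p>

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title('Sine')

# Save to an in-memory buffer; the format must be given explicitly
# because there is no file extension to infer it from
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
png_bytes = buffer.getvalue()

print(len(png_bytes), 'bytes written')
```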




<h2 id="Seaborn">Seaborn<a class="anchor-link" href="#Seaborn">¶</a></h2>



<p><code>Seaborn</code> is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn addresses some key limitations of matplotlib with respect to sophisticated statistical visualization and plotting Pandas DataFrame data. While matplotlib has been addressing these issues in newer versions, seaborn remains a natural choice.</p>


In [None]:

import seaborn as sns
sns.set() # set the seaborn style




<h3 id="Overriding-the-matplotlib-parameters">Overriding the matplotlib parameters<a class="anchor-link" href="#Overriding-the-matplotlib-parameters">¶</a></h3>


In [None]:

plt.plot(df)
plt.legend(df.columns, ncol=2, loc='upper left');




<h3 id="Pairplot">Pairplot<a class="anchor-link" href="#Pairplot">¶</a></h3><p>Pair plots are useful for exploring pairwise relationships and correlations in multidimensional data.</p>


In [None]:

sns.pairplot(df.pct_change(1).fillna(0))
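
<p>A pair plot shows correlations visually; the correlation matrix gives the corresponding numbers. The sketch below uses synthetic returns generated with NumPy (not the stock data loaded earlier), where the hypothetical series <code>A</code> and <code>B</code> share a common component and <code>C</code> is independent.</p>

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic daily returns: A and B share a common factor, C does not
rng = np.random.default_rng(42)
common = rng.normal(0, 0.01, size=500)
returns = pd.DataFrame({
    'A': common + rng.normal(0, 0.005, size=500),
    'B': common + rng.normal(0, 0.005, size=500),
    'C': rng.normal(0, 0.01, size=500),
})

# Pairwise Pearson correlations between the columns
corr = returns.corr()

# Annotated view of the correlation matrix
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt='.2f', vmin=-1, vmax=1, cmap='coolwarm', ax=ax)
ax.set_title('Correlation of synthetic returns');
```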




<h3 id="KDE-Plot">KDE Plot<a class="anchor-link" href="#KDE-Plot">¶</a></h3>


In [None]:

# fill=True shades the area under the curve (the older shade=True is deprecated)
sns.kdeplot(df['AAPL'].pct_change().fillna(0), fill=True);




<h3 id="Histogram---Distribution-Plot">Histogram - Distribution Plot<a class="anchor-link" href="#Histogram---Distribution-Plot">¶</a></h3>


In [None]:

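# distplot is deprecated since seaborn 0.11; histplot with kde=True gives a histogram with a density curve
sns.histplot(df['AAPL'].pct_change().fillna(0), kde=True, stat='density');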




<h3 id="Violin-Plot">Violin Plot<a class="anchor-link" href="#Violin-Plot">¶</a></h3>


In [None]:

# Draw one violin per calendar year of AAPL daily returns, with a box plot inside each
sns.violinplot(x=df.index.year, y=df['AAPL'].pct_change().fillna(0), inner="box")
plt.title('violin plot');




<h3 id="Heatmap">Heatmap<a class="anchor-link" href="#Heatmap">¶</a></h3>


In [None]:

# Load the example flights dataset (long-form) and pivot it to wide-form
flights_long = sns.load_dataset("flights")
flights = flights_long.pivot(index="month", columns="year", values="passengers")

# Draw a heatmap with the numeric values in each cell
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
ax.set_title('Heat Map');




<h1 id="References">References<a class="anchor-link" href="#References">¶</a></h1><ul>
<li><p>Matplotlib documentation <a href="https://matplotlib.org">https://matplotlib.org</a></p>
</li>
<li><p>Seaborn documentation <a href="https://seaborn.pydata.org/index.html">https://seaborn.pydata.org/index.html</a></p>
</li>
<li><p>Cufflinks documentation <a href="https://github.com/santosjorge/cufflinks">https://github.com/santosjorge/cufflinks</a> and <a href="https://plotly.com/python/cufflinks/">https://plotly.com/python/cufflinks/</a></p>
</li>
<li><p>Python resources <a href="https://github.com/kannansingaravelu/PythonResources">https://github.com/kannansingaravelu/PythonResources</a></p>
</li>
</ul>
