# Comprehensive Guide to Matplotlib / Seaborn / Plotly

Matplotlib is a multiplatform data visualization library built on NumPy arrays and
designed to work with the broader SciPy stack. One of Matplotlib’s most important features is its ability to play well with many operating systems and graphics backends. Matplotlib supports dozens of backends and
output types, which means you can count on it to work regardless of which operating
system you are using or which output format you need.

# Table of Contents

* [Section 1 - Importing matplotlib & Classic Graph](#section-one)
* [Section 2 - loading from a script](#section-two)
* [Section 3 - Adjusting the Plot: Line Colors and Styles](#section-Three)
* [Section 4 - Simple Scatter Plots](#section-four)
* [Section 5 - Visualizing Errors Density and Contour Plots](#section-five)
* [Section 6 - Histograms, Binnings, and Density](#section-six)
* [Section 7 - Customizing Plot Legends](#section-seven)
* [Section 8 - Multiple Subplots](#section-eight)
* [Section 9 - Multiple Plots](#section-nine)
* [Section 10 - Text & Annotation/Text Position/Arrow Position](#section-ten)
* [Section 11 - Customizing Matplotlib: Configurations and Stylesheets](#section-eleven)
* [Section 12 - Three-Dimensional Plotting in Matplotlib](#section-twelve)
* [Section 13 - Visualization with Seaborn](#section-thirteen)
* [Section 14 - Visualization with Plotly](#section-fourteen)
* [Section 15 - Read data from input files for Seaborn Plots](#section-fifteen)
* [Section 16 - Bar Plot using Seaborn](#section-sixteen)
* [Section 17 - Point Plot using Seaborn](#section-seventeen)
* [Section 18 - Joint Plot using Seaborn](#section-eighteen)
* [Section 19 - Pie Plot using Seaborn](#section-ninteen)
* [Section 20 - Lm Plot using Seaborn](#section-twenty)
* [Section 21 - Kde Plot using Seaborn](#section-twentyone)
* [Section 22 - Violin Plot using Seaborn](#section-twentytwo)
* [Section 23 - Heatmap](#section-twentythree)
* [Section 24 - Box plot](#section-twentyfour)
* [Section 25 - Swarm Plot](#section-twentyfive)
* [Section 26 - Pair Plot](#section-twentysix)
* [Section 27 - Count Plot](#section-twentyseven)
* [Section 28 - Read data from input files for Plotly Plots](#section-twentyeight)
* [Section 29 - Line Charts Plotly Plots](#section-twentynine)
* [Section 30 - Scatter Charts Plotly Plots](#section-thirty)
* [Section 31 - Bar Charts Plotly Plots](#section-thirtyone)
* [Section 32 - Pie Charts Plotly Plots](#section-thirtytwo)
* [Section 33 - Bubble Charts Plotly Plots](#section-thirtythree)
* [Section 34 - Histogram Plotly Plots](#section-thirtyfour)
* [Section 35 - Word Cloud Plotly Plots](#section-thirtyfive)
* [Section 36 - Box Plots Plotly Plots](#section-thirtysix)
* [Section 37 - Scatter Matrix Plotly Plots](#section-thirtyseven)


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Importing matplotlib
Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will
use some standard shorthands for Matplotlib imports:

Setting Styles
We will use the plt.style directive to choose appropriate aesthetic styles for our figures.
Here we will set the classic style, which ensures that the plots we create use the
classic Matplotlib style:


<a id="section-one"></a>
# Section 1 - Importing matplotlib & Classic Graph

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# plot simple sin & cos function

plt.style.use('classic')

x = np.linspace(1,10,200)
plt.plot(x,np.sin(x))
plt.plot(x,np.cos(x))
plt.show()  # plt.show() starts an event loop, looks for all currently active figure objects, and opens one or more interactive windows that display your figure or figures.



Plotting from an IPython shell

It can be very convenient to use Matplotlib interactively within an IPython shell. IPython is built to work well with Matplotlib if you specify Matplotlib
mode. To enable this mode, you can use the %matplotlib magic command after starting
ipython:

In [None]:
%matplotlib

Plotting from an IPython notebook

The IPython notebook is a browser-based interactive data analysis tool that can combine
narrative, code, graphics, HTML elements, and much more into a single executable
document.

Plotting interactively within an IPython notebook can be done with the %matplotlib
command, and works in a similar way to the IPython shell. In the IPython notebook,
you also have the option of embedding graphics directly in the notebook, with two
possible options:

• %matplotlib notebook will lead to interactive plots embedded within the
notebook
• %matplotlib inline will lead to static images of your plot embedded in the
notebook

In this guide, we will generally opt for %matplotlib inline:

%matplotlib inline

After you run this command (it needs to be done only once per kernel/session), any
cell within the notebook that creates a plot will embed a PNG image of the resulting
graphic.




Saving Figures to File
One nice feature of Matplotlib is the ability to save figures in a wide variety of formats.
You can save a figure using the savefig() command. For example, to save the
previous figure as a PNG file, you can run this:

<a id="section-two"></a>
# Section 2 - loading from a script

In [None]:
x = np.linspace(0, 10, 100)
fig=plt.figure()
plt.plot(x, np.sin(x), '-')   # solid line
plt.plot(x, np.cos(x), '--')  # dashed line

fig.savefig('my_figure.png')


The file format is inferred from the extension of the given filename.
Depending on what backends you have installed, many different file formats are
available. You can find the list of supported file types for your system by using the
following method of the figure canvas object:

In [None]:
fig.canvas.get_supported_filetypes()
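
The two steps above can be combined in a small sketch: check which formats the current backend supports, then save. To avoid touching the file system, this example writes to an in-memory buffer; the use of `io.BytesIO` and the variable names are illustrative assumptions, not part of the original example.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-')

# Dictionary mapping file extensions to format descriptions for this backend
supported = fig.canvas.get_supported_filetypes()

# savefig() also accepts a file-like object; the format must then be given explicitly
buffer = io.BytesIO()
if 'png' in supported:
    fig.savefig(buffer, format='png')
png_bytes = buffer.getvalue()
```

When saving to a filename instead, the `format` argument can be omitted because the format is inferred from the extension.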


In [None]:
# one more way to draw graph

plt.figure()

# create the first of two panels and set current axis

plt.subplot(2,1,1) # (rows, columns, panel number)
plt.plot(x, np.sin(x))

# create the second panel and set current axis

plt.subplot(2,1,2)
plt.plot(x, np.cos(x))

In [None]:
# Simple Line Plots
# Perhaps the simplest of all plots is the visualization of a single function y = f(x). Here we will take a first look at creating a simple plot of this type. As with all the following
# sections, we’ll start by setting up the notebook for plotting and importing the functions we will use:

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np

# For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a figure and axes can be created as follows

fig = plt.figure()
ax = plt.axes()  # plt.axes() creates an Axes; plt.axis() would only return the current axis limits

# Once we have created an axes, we can use the ax.plot function to plot some data. Let’s start with a simple sinusoid

x = np.linspace(0, 10, 2000)
ax.plot(x, np.sin(x))

# If we want to create a single figure with multiple lines, we can simply call the plot function multiple times
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
# plt.plot(x, np.tan(x))
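
As a minimal sketch of the same idea in the object-oriented style, the figure and axes can be created in one call with `plt.subplots()`, and each line is then added through the axes object. The variable name `n_lines` is only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 2000)

# plt.subplots() returns the figure and a single Axes in one call
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))

# Each call to ax.plot() adds one Line2D object to the same axes
n_lines = len(ax.get_lines())
```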



<a id="section-Three"></a>
# Section 3 - Adjusting the Plot: Line Colors and Styles


In [None]:
# Adjusting the Plot: Line Colors and Styles

# The first adjustment you might wish to make to a plot is to control the line colors and styles. The plt.plot() function takes additional arguments that can be used to specify
# these. To adjust the color, you can use the color keyword, which accepts a string argument representing virtually any imaginable color. The color can be specified in a variety of ways


plt.plot(x, np.sin(x - 0), color='blue')      # specify color by name
plt.plot(x, np.sin(x - 1), color='g')        # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75')     # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44')      # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3))    # RGB tuple, values between 0 and 1
plt.plot(x, np.sin(x - 5), color='chartreuse');     # all HTML color names supported

# If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines.

# Similarly, you can adjust the line style using the linestyle keyword

plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted')
    
# For short, you can use the following codes:

plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':') # dotted


In [None]:
# If you would like to be extremely terse, these linestyle and color codes can be combined into a single non-keyword argument to the plt.plot() function

plt.plot(x, x + 0, '-g') # solid green (each call draws the straight line y = x + c)
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red

# These single-character color codes reflect the standard abbreviations in the RGB (Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/blacK) color systems, commonly used for digital color graphics.
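
To make the shorthand concrete, the sketch below draws one line with a combined format string and one with the equivalent keywords, then compares the resulting line properties. The variable names are illustrative; colors are compared through `matplotlib.colors.to_rgba` so that the comparison does not depend on how each color was spelled.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()

# ax.plot() returns a list of Line2D objects; unpack the single line
shorthand, = ax.plot(x, x + 0, '--c')
explicit, = ax.plot(x, x + 1, linestyle='dashed', color='c')

same_linestyle = shorthand.get_linestyle() == explicit.get_linestyle()
same_color = to_rgba(shorthand.get_color()) == to_rgba(explicit.get_color())
```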


In [None]:
# Adjusting the Plot: Axes Limits

# Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it’s nice to have finer control. The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() functions

plt.plot(x,np.sin(x))
plt.xlim(0, 11)
plt.ylim(0, 1.5)

In [None]:
# If for some reason you’d like either axis to be displayed in reverse, you can simply reverse the order of the arguments

plt.plot(x,np.sin(x))
plt.xlim(10,0)
plt.ylim(1.2, -1.2)

In [None]:
# A useful related method is plt.axis() (note here the potential confusion between axes with an e, and axis with an i). The plt.axis() method allows you to set the x
# and y limits with a single call, by passing a list that specifies [xmin, xmax, ymin, ymax]

plt.plot(x,np.sin(x))
plt.axis([-1,11,0,6])
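
The limit-setting calls above have direct counterparts on the axes object. The following sketch, with illustrative limit values, sets all four limits at once and then reverses the x-axis, reading the limits back to confirm what was applied.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Set all four limits at once: [xmin, xmax, ymin, ymax]
ax.axis([-1, 11, -1.5, 1.5])
limits = (*ax.get_xlim(), *ax.get_ylim())

# Passing the limits in reverse order flips the axis
ax.set_xlim(10, 0)
reversed_xlim = ax.get_xlim()
```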

In [None]:
# The plt.axis() method goes even beyond this, allowing you to do things like automatically tighten the bounds around the current plot

plt.plot(x,np.sin(x))
plt.axis('tight')

In [None]:
# It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit in x is equal to one unit in y

plt.plot(x,np.sin(x))
plt.axis('equal')

In [None]:
# Labeling Plots
# We’ll briefly look at the labeling of plots: titles, axis labels, and simple legends.
# Titles and axis labels are the simplest such labels, and there are functions that can be used to quickly set them

plt.plot(x,np.sin(x))
plt.title('A sine curve')
plt.xlabel("x value")
plt.ylabel("sin(x) value")

In [None]:
# When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type.
# Again, Matplotlib has a built-in way of quickly creating such a legend. It is done via the (you guessed it) plt.legend() method.

plt.plot(x,np.sin(x),'g',label='sin(x)')
plt.plot(x,np.cos(x), 'r',label='cos(x)')

plt.axis('equal')
plt.legend()  # this method is responsible for displaying legend

# As you can see, the plt.legend() function keeps track of the line style and color, and matches these with the correct label. More information on specifying and formatting
# plot legends can be found in the plt.legend() docstring.

In [None]:
# In the object-oriented interface to plotting, rather than calling these functions individually, it is often more convenient to use the ax.set() method to set all these properties at once

ax = plt.axes()
ax.plot(x, np.sin(x))
ax.set(xlim=(0,10),ylim=(-2,2),xlabel='x',ylabel='sin(x)',title='A sine curve')
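
For reference, here is a sketch of how the individual `plt` functions used earlier map onto axes methods. Most follow the pattern `plt.xlabel()` to `ax.set_xlabel()`; the values are the same as in the `ax.set()` call above, and the `labels` tuple is only there to read the results back.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# One-at-a-time equivalents of the plt functions used earlier
ax.set_xlabel('x')            # plt.xlabel()
ax.set_ylabel('sin(x)')       # plt.ylabel()
ax.set_xlim(0, 10)            # plt.xlim()
ax.set_ylim(-2, 2)            # plt.ylim()
ax.set_title('A sine curve')  # plt.title()

labels = (ax.get_xlabel(), ax.get_ylabel(), ax.get_title())
```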



<a id="section-four"></a>
# Section 4 - Simple Scatter Plots

In [None]:
# Simple Scatter Plots
# Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape. 

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.plot(x,y,'o',color='black')

# The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as '-' and '--' to control
# the line style, the marker style has its own set of short string codes.


In [None]:
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,label="marker='{0}'".format(marker))

plt.legend(numpoints=1)
plt.xlim(0, 1.8);

In [None]:
# For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them
plt.plot(x,y,'-ok') # line (-), circle marker (o), black (k)

In [None]:
# Additional keyword arguments to plt.plot specify a wide range of properties of the lines and markers

plt.plot(x,y,'-p',color='gray',markersize=15,linewidth=4,markerfacecolor='white',markeredgecolor='gray',markeredgewidth=2)
plt.ylim(-1.2,1.2)
plt.xlim(0,3)


In [None]:
# Scatter Plots with plt.scatter

plt.scatter(x,y,marker='o')

In [None]:
# Let’s show this by creating a random scatter plot with points of many colors and sizes.

rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,cmap='viridis')
plt.colorbar(); # show color scale

# Notice that the color argument is automatically mapped to a color scale (shown here by the colorbar() command), and the size argument is given in points squared (the marker area). In this way,
# the color and size of points can be used to convey information in the visualization, in order to illustrate multidimensional data.
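
A short sketch of what `scatter` keeps track of: unlike `plot`, it stores one size and one color value per point, which is what makes per-point styling possible. The data mirrors the cell above; the names `points`, `n_sizes`, and `n_color_values` are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
fig.colorbar(points, ax=ax)

# The returned PathCollection keeps one size and one color value per point
n_sizes = len(points.get_sizes())
n_color_values = len(points.get_array())
```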

In [None]:
# For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured

from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T
plt.scatter(features[0], features[1], alpha=0.2,s=100*features[3], c=iris.target, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])


<a id="section-five"></a>
# Section 5 - Visualizing Errors Density and Contour Plots

In [None]:
# Basic Errorbars
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np

x=np.linspace(0,10,50)
dy=0.9
y=np.sin(x)+dy*np.random.randn(50)

plt.errorbar(x,y,yerr=dy,fmt='.k')

# Here the fmt is a format code controlling the appearance of lines and points, and has the same syntax as the shorthand used in plt.plot

In [None]:
# In addition to these basic options, the errorbar function has many options to fine-tune the outputs. Using these additional options you can easily customize the aesthetics of your errorbar plot

plt.errorbar(x,y,yerr=dy,fmt='o',color='black',ecolor='lightgray',elinewidth=3,capsize=0)
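
One of those additional options is asymmetric error bars. As a hedged sketch with made-up error sizes, `yerr` can be given as a pair of arrays holding the lower and upper errors for each point:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.3 * rng.randn(50)

# A (2, N) array gives separate lower and upper error sizes for each point
lower = np.full(50, 0.2)   # illustrative values
upper = np.full(50, 0.6)   # illustrative values

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=[lower, upper], fmt='o', color='black',
                        ecolor='lightgray', elinewidth=3, capsize=0)
```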

<a id="section-six"></a>
# Section 6 - Histograms, Binnings, and Density

Sometimes it is useful to display three-dimensional data in two dimensions using
contours or color-coded regions. There are three Matplotlib functions that can be
helpful for this task: plt.contour for contour plots, plt.contourf for filled contour
plots, and plt.imshow for showing images.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np

# Visualizing a Three-Dimensional Function


# We’ll start by demonstrating a contour plot using a function z = f (x, y)

def f(x,y):
    return np.sin(x)**10+np.cos(10+y*x)*np.cos(x)


# A contour plot can be created with the plt.contour function. It takes three arguments: a grid of x values, a grid of y values, and a grid of z values. The x and y values
# represent positions on the plot, and the z values will be represented by the contour levels. 

x=np.linspace(0,5,60)
y=np.linspace(0,5,50)

# most straightforward way to prepare such data is to use the np.meshgrid function, which builds two-dimensional grids from one-dimensional arrays

X,Y=np.meshgrid(x,y)
Z=f(X,Y)

# Now let’s look at this with a standard line-only contour plot

plt.contour(X, Y, Z, colors='black')  # the keyword is colors (plural), not color

# Notice that by default when a single color is used, negative values are represented by dashed lines, and positive values by solid lines.
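
To see what `np.meshgrid` produces in the cell above, this NumPy-only sketch inspects the shapes of the two grids. The check variables are illustrative.

```python
import numpy as np

x = np.linspace(0, 5, 60)   # 60 positions along x
y = np.linspace(0, 5, 50)   # 50 positions along y
X, Y = np.meshgrid(x, y)

# Both grids have shape (len(y), len(x)): rows follow y, columns follow x
grid_shape = X.shape

# Every row of X repeats x; every column of Y repeats y
rows_repeat_x = bool(np.all(X == x))
cols_repeat_y = bool(np.all(Y == y[:, np.newaxis]))
```

Because `X` and `Y` have the same shape, any vectorized function of the two, such as `f(X, Y)`, returns a grid of z values with that shape as well.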



In [None]:
plt.contour(X, Y, Z, 20, cmap='RdGy')  # we chose the RdGy (short for Red-Gray) colormap

In [None]:
# Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can change this by switching to a filled contour plot using the plt.contourf()
# function (notice the f at the end), which uses largely the same syntax as plt.contour()

plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar()

# The colorbar makes it clear that the black regions are “peaks,” while the red regions are “valleys.”

In [None]:
# The filled contour plot uses discrete color steps, which can look splotchy. A smoother alternative is the plt.imshow() function, which interprets a two-dimensional grid of data as an image.

plt.imshow(Z,extent=[0,5,0,5],origin='lower',cmap='RdGy')
plt.colorbar()
plt.axis('image')  # equal scaling with limits fitted to the image

There are a few potential gotchas with imshow(), however:

• plt.imshow() doesn’t accept an x and y grid, so you must manually specify the
extent [xmin, xmax, ymin, ymax] of the image on the plot.

• plt.imshow() by default follows the standard image array definition where the
origin is in the upper left, not in the lower left as in most contour plots. This
must be changed when showing gridded data.


• plt.imshow() will automatically adjust the axis aspect ratio to match the input
data; you can change this by setting, for example, plt.axis('image') to
make x and y units match.
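
The points in the list above can be checked with a small sketch that sets the extent and origin explicitly and reads them back from the image object. The function values reuse the formula for f(x, y) defined earlier; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 60)
y = np.linspace(0, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) ** 10 + np.cos(10 + Y * X) * np.cos(X)

fig, ax = plt.subplots()
# extent places the image in data coordinates; origin='lower' puts row 0 at the bottom
image = ax.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy')
ax.axis('image')  # equal scaling, with the limits fitted to the image
fig.colorbar(image, ax=ax)

extent = list(image.get_extent())
```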


In [None]:
# Finally, it can sometimes be useful to combine contour plots and image plots. For example, we can use a partially transparent
# background image (with transparency set via the alpha parameter) and over-plot contours with labels on the contours themselves (using the plt.clabel() function)

contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours,inline=True,fontsize=8)

plt.imshow(Z,extent=[0,5,0,5],origin='lower',cmap='RdGy',alpha=0.5)
plt.colorbar()

# The combination of these three functions (plt.contour, plt.contourf, and plt.imshow) gives nearly limitless possibilities for displaying this sort of three-dimensional data within a two-dimensional plot.


<a id="section-seven"></a>
# Section 7 - Customizing Plot Legends

In [None]:
# A simple histogram can be a great first step in understanding a dataset. Matplotlib’s histogram function creates a basic histogram in one line

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('seaborn-white')
data=np.random.randn(1000)
plt.hist(data)




In [None]:
# The hist() function has many options to tune both the calculation and the display; here’s an example of a more customized histogram

plt.hist(data, bins=30, alpha=1,histtype='stepfilled', color='red',edgecolor='none')

# The plt.hist docstring has more information on other customization options available. I find this combination of histtype='stepfilled' along with some transparency
# alpha to be very useful when comparing histograms of several distributions

In [None]:
x1=np.random.normal(0,0.8,1000)
x2=np.random.normal(1,2,1000)
x3=np.random.normal(3,4,1000)

test = dict(histtype='stepfilled', alpha=0.3, bins=40)

plt.hist(x1,**test)
plt.hist(x2,**test)
plt.hist(x3,**test)


In [None]:
# If you would like to simply compute the histogram (that is, count the number of points in a given bin) and not display it, the np.histogram() function is available

counts, bin_edges = np.histogram(data, bins=5)
print(counts)
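
A NumPy-only sketch of what `np.histogram` returns, using seeded random data as a stand-in for `data`. The second call, with explicit bin edges, is an illustrative variation and not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=5)

# Five bins are delimited by six edges, and every point lands in exactly one bin
n_edges = len(bin_edges)
total = int(counts.sum())

# Passing explicit edges instead of a bin count fixes the bin boundaries;
# points outside [-3, 3] are then not counted
fixed_counts, fixed_edges = np.histogram(data, bins=np.linspace(-3, 3, 7))
```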


In [None]:
# Two-Dimensional Histograms and Binnings

# Just as we create histograms in one dimension by dividing the number line into bins, we can also create histograms in two dimensions by dividing points among two-dimensional bins

# We’ll start by defining some data—an x and y array drawn from a multivariate Gaussian distribution

mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T

# plt.hist2d: Two-dimensional histogram

# One straightforward way to plot a two-dimensional histogram is to use Matplotlib’s plt.hist2d function

plt.hist2d(x,y,bins=30,cmap='Blues')
cb=plt.colorbar()
cb.set_label('Counts in bin')


Just as with plt.hist, plt.hist2d has a number of extra options to fine-tune the plot
and the binning, which are nicely outlined in the function docstring. Further, just as
plt.hist has a counterpart in np.histogram, plt.hist2d has a counterpart in
np.histogram2d, which can be used as follows:

In [None]:
counts,xedges,yedges=np.histogram2d(x,y,bins=30)

# For the generalization of this histogram binning in dimensions higher than two, see the np.histogramdd function
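
As a brief sketch of that generalization, `np.histogramdd` takes an array with one row per sample and one column per dimension. The three-dimensional random sample and the bin counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# 5000 samples in three dimensions, one row per sample
sample = rng.standard_normal((5000, 3))

# One bin count per dimension; counts has one axis per dimension
counts, edges = np.histogramdd(sample, bins=(5, 6, 7))

counts_shape = counts.shape
edge_lengths = [len(e) for e in edges]
```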

In [None]:
# plt.hexbin: Hexagonal binnings
# The two-dimensional histogram creates a tessellation of squares across the axes. Another natural shape for such a tessellation is the regular hexagon. For this purpose,
# Matplotlib provides the plt.hexbin routine, which represents a two-dimensional dataset binned within a grid of hexagons

plt.hexbin(x, y, gridsize=30, cmap='Blues')
cb = plt.colorbar(label='count in bin')

# plt.hexbin has a number of interesting options, including the ability to specify weights for each point, and to change the output in each bin to any NumPy aggregate
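
A hedged sketch of the weights option mentioned above: the `C` argument supplies a value for each point and `reduce_C_function` chooses how those values are aggregated within each hexagon. The `weights` array here is a made-up quantity used only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T
weights = x ** 2 + y ** 2   # hypothetical per-point value to aggregate

fig, ax = plt.subplots()
# C supplies a value for each point; reduce_C_function aggregates them per hexagon
hb = ax.hexbin(x, y, C=weights, reduce_C_function=np.mean,
               gridsize=30, cmap='Blues')
fig.colorbar(hb, ax=ax, label='mean weight in bin')

bin_values = hb.get_array()
```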


In [None]:
# Kernel density estimation

# Another common method of evaluating densities in multiple dimensions is kernel density estimation (KDE)

# One extremely quick and simple KDE implementation exists in the scipy.stats package. Here is a quick example of using the KDE on this data

from scipy.stats import gaussian_kde

# fit an array of size [Ndim, Nsamples]

data=np.vstack([x,y])
kde=gaussian_kde(data)

# evaluate on a regular grid

xgrid = np.linspace(-3.5, 3.5, 40)
ygrid = np.linspace(-6, 6, 40)
Xgrid, Ygrid = np.meshgrid(xgrid, ygrid)
Z = kde.evaluate(np.vstack([Xgrid.ravel(), Ygrid.ravel()]))

# Plot the result as an image

plt.imshow(Z.reshape(Xgrid.shape),origin='lower', aspect='auto', extent=[-3.5, 3.5, -6, 6],cmap='Blues')
cb = plt.colorbar()
cb.set_label("density")




<a id="section-eight"></a>
# Section 8 - Multiple Subplots


In [None]:
# Plot legends give meaning to a visualization, assigning labels to the various plot elements. We previously saw how to create a simple legend; here we’ll take a look at customizing
# the placement and aesthetics of the legend in Matplotlib

import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np

x=np.linspace(0,10,1000)
fig,ax=plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg=ax.legend()


In [None]:
# But there are many ways we might want to customize such a legend. For example, we can specify the location and turn off the frame

ax.legend(loc='upper left', frameon=False)  # frameon must be the boolean False; the string 'false' is truthy
fig

In [None]:
# We can use the ncol argument to specify the number of columns in the legend

ax.legend(frameon=False,loc='lower center',ncol=2)
fig

In [None]:
# We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value) of the frame, or change the padding around the text

ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)
fig

In [None]:
# Choosing Elements for the Legend

# The plt.plot() command is able to create multiple lines at once, and returns a list of created line instances. Passing any of
# these to plt.legend() will tell it which to identify, along with the labels we’d like to specify

y=np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))
lines=plt.plot(x,y)

# lines is a list of plt.Line2D instances
plt.legend(lines[:2], ['first', 'second'])


In [None]:
# I generally find in practice that it is clearer to use the first method, applying labels to the plot elements you’d like to show on the legend

plt.plot(x, y[:, 0], label='first')
plt.plot(x, y[:, 1], label='second')
plt.plot(x, y[:, 2:])
plt.legend(framealpha=1, frameon=True)

In [None]:
# Multiple Legends
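# Calling ax.legend() a second time replaces the existing legend rather than adding a new one. We can work around this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist() method to manually add the second artist to the plot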

fig,ax=plt.subplots()
lines=[]
styles= ['-', '--', '-.', ':']
x = np.linspace(0, 10, 1000)

for i in range(4):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2),styles[i], color='black')
    
ax.axis('equal')    

# specify the lines and labels of the first legend
ax.legend(lines[:2], ['line A', 'line B'],loc='upper right', frameon=False)

# Create the second legend and add the artist manually.
from matplotlib.legend import Legend

leg = Legend(ax, lines[2:], ['line C', 'line D'],loc='lower right', frameon=False)
ax.add_artist(leg)
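
An alternative sketch that avoids constructing `Legend` by hand: create the first legend with `ax.legend()`, re-add it with `ax.add_artist()`, and then call `ax.legend()` again for the second one. This is a variation on the cell above, not a replacement for it, and the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
styles = ['-', '--', '-.', ':']

fig, ax = plt.subplots()
lines = []
for i in range(4):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), styles[i], color='black')

# Create the first legend, then re-add it as a plain artist so that
# the next ax.legend() call does not discard it
first = ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)
ax.add_artist(first)
second = ax.legend(lines[2:], ['line C', 'line D'], loc='lower right', frameon=False)

first_labels = [t.get_text() for t in first.get_texts()]
second_labels = [t.get_text() for t in second.get_texts()]
```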




In [None]:
# Customizing Colorbars

import matplotlib.pyplot as plt
plt.style.use('classic')

%matplotlib inline
import numpy as np

# As we have seen several times throughout this section, the simplest colorbar can be created with the plt.colorbar function 

x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

plt.imshow(I)
plt.colorbar()



In [None]:
# We can specify the colormap using the cmap argument to the plotting function that is creating the visualization

plt.imshow(I,cmap='gray')


In [None]:
from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(cmap):
    """Return a grayscale version of the given colormap"""
    cmap = plt.get_cmap(cmap)  # accepts a name or a Colormap instance
    colors = cmap(np.arange(cmap.N))
    # convert RGBA to perceived grayscale luminance
    # cf. http://alienryderflex.com/hsp.html
    RGB_weight = [0.299, 0.587, 0.114]
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, RGB_weight))
    colors[:, :3] = luminance[:, np.newaxis]
    return LinearSegmentedColormap.from_list(cmap.name + "_gray", colors, cmap.N)


In [None]:
def view_colormap(cmap):
    """Plot a colormap with its grayscale equivalent"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))
    cmap = grayscale_cmap(cmap)
    grayscale = cmap(np.arange(cmap.N))
    fig, ax = plt.subplots(2, figsize=(6, 2),
                           subplot_kw=dict(xticks=[], yticks=[]))
    ax[0].imshow([colors], extent=[0, 10, 0, 1])
    ax[1].imshow([grayscale], extent=[0, 10, 0, 1])

# Call the function at module level; calling it from inside its own body would recurse forever
view_colormap('jet')

In [None]:
# Color limits and extensions

# Matplotlib allows for a large range of colorbar customization. The colorbar is drawn in its own plt.Axes instance, so all of the axes and tick formatting tricks we’ve
# learned are applicable. The colorbar has some interesting flexibility; for example, we can narrow the color limits and indicate the out-of-bounds values with a triangular
# arrow at the top and bottom by setting the extend property. This can come in handy when displaying an image that is subject to noise

In [None]:
# make noise in 1% of the image pixels
speckles = (np.random.random(I.shape) < 0.01)
I[speckles] = np.random.normal(0, 3, np.count_nonzero(speckles))
plt.figure(figsize=(10, 3.5))
plt.subplot(1, 2, 1)
plt.imshow(I, cmap='RdBu')
plt.colorbar()
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1)

In [None]:
# Discrete colorbars

# Colormaps are by default continuous, but sometimes you’d like to represent discrete values. The easiest way to do this is to use the plt.get_cmap() function, and pass
# the name of a suitable colormap along with the number of desired bins

plt.imshow(I, cmap=plt.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1,1)
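
A small sketch of what the discrete colormap contains: resampling `'Blues'` to six entries yields a colormap whose lookup table has exactly six colors. The check with `distinct` is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Resample the continuous 'Blues' colormap into six discrete colors
discrete = plt.get_cmap('Blues', 6)
n_colors = discrete.N

# Values in [0, 1] are mapped onto those six colors only
values = np.linspace(0, 1, 100)
distinct = {tuple(rgba) for rgba in discrete(values)}
```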

In [None]:
# Example: Handwritten Digits 

# load images of the digits 0 through 5 and visualize several of them

from sklearn.datasets import load_digits
digits=load_digits(n_class=6)

fig,ax=plt.subplots(8,8,figsize=(6, 6))
for i, axi in enumerate(ax.flat):
     axi.imshow(digits.images[i], cmap='binary')
     axi.set(xticks=[], yticks=[])

In [None]:
# Project the digits into 2 dimensions using Isomap

from sklearn.manifold import Isomap
iso=Isomap(n_components=2)
projection=iso.fit_transform(digits.data)

# plot the results

plt.scatter(projection[:, 0], projection[:, 1], lw=0.1, c=digits.target, cmap=plt.get_cmap('cubehelix', 6))
plt.colorbar(ticks=range(6), label='digit value')
plt.clim(-0.5, 5.5)



<a id="section-nine"></a>
# Section 9 - Multiple Plots

The most basic method of creating an axes is to use the plt.axes function. As we’ve
seen previously, by default this creates a standard axes object that fills the entire figure.
plt.axes also takes an optional argument that is a list of four numbers in the
figure coordinate system. These numbers represent [left, bottom, width,
height] in the figure coordinate system, which ranges from 0 at the bottom left of the
figure to 1 at the top right of the figure.
For example, we might create an inset axes at the top-right corner of another axes by
setting the x and y position to 0.65 (that is, starting at 65% of the width and 65% of
the height of the figure) and the x and y extents to 0.2 (that is, the size of the axes is
20% of the width and 20% of the height of the figure).
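
The rectangle convention described above can be verified with a short sketch that creates an inset axes and reads its position back in figure coordinates. The main axes rectangle and the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])       # large axes
inset_ax = fig.add_axes([0.65, 0.65, 0.2, 0.2])    # [left, bottom, width, height]

# get_position() reports the same rectangle in figure coordinates
bounds = inset_ax.get_position().bounds
matches = bool(np.allclose(bounds, (0.65, 0.65, 0.2, 0.2)))
```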

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np

# plt.axes: Subplots by Hand

ax1=plt.axes()  # standard axes
ax2=plt.axes([0.65,0.65,0.2,0.2])

In [None]:
# The equivalent of this command within the object-oriented interface is fig.add_axes(). Let’s use this to create two vertically stacked axes
fig=plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],ylim=(-1.2, 1.2))
x=np.linspace(0,10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x))

In [None]:
# plt.subplot: Simple Grids of Subplots
for i in range(1, 7):
 plt.subplot(2, 3, i)
 plt.text(0.5, 0.5, str((2, 3, i)),fontsize=18, ha='center')
    

In [None]:
# The command plt.subplots_adjust can be used to adjust the spacing between these plots.

fig=plt.figure()
fig.subplots_adjust(hspace=0.4,wspace=0.4)
for i in range(1,7):
    ax = fig.add_subplot(2, 3, i)
    ax.text(0.5, 0.5, str((2, 3, i)),fontsize=18, ha='center')

In [None]:
# plt.subplots: The Whole Grid in One Go

# Here we’ll create a 2×3 grid of subplots, where all axes in the same row share their y-axis scale, and all axes in the same column share their x-axis scale 

fig,ax= plt.subplots(2,3,sharex='col',sharey='row')


In [None]:
# axes are in a two-dimensional array, indexed by [row, col]
for i in range(2):
    for j in range(3):
        ax[i,j].text(0.5,0.5,str((i,j)),fontsize=18,ha='center')
        
fig

# In comparison to plt.subplot(), plt.subplots() is more consistent with Python’s conventional 0-based indexing
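
To show what `sharex='col'` and `sharey='row'` do in practice, this sketch changes the y limits on one axes and reads them back from its neighbors. The limit values are illustrative.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')
grid_shape = ax.shape

# Changing the y limits of one axes updates every axes in the same row only
ax[0, 0].set_ylim(-2, 2)
same_row = ax[0, 2].get_ylim()
other_row = ax[1, 0].get_ylim()
```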

In [None]:
# plt.GridSpec: More Complicated Arrangements

grid=plt.GridSpec(2,3,wspace=0.4,hspace=0.4)

# From this we can specify subplot locations and extents using the familiar Python slicing syntax

plt.subplot(grid[0,0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2])
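
As a sketch of what the slicing syntax returns, indexing a `GridSpec` gives a `SubplotSpec` that records which rows and columns it spans. The variable names are illustrative.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.4)

# Indexing a GridSpec returns a SubplotSpec describing the cells it covers
wide = grid[0, 1:]
rows_covered = list(wide.rowspan)
cols_covered = list(wide.colspan)

# The SubplotSpec can be passed to add_subplot() just like in plt.subplot()
ax = fig.add_subplot(wide)
```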

In [None]:
# Create some normally distributed data
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 3000).T
# Set up the axes with gridspec
fig = plt.figure(figsize=(6, 6))
grid = plt.GridSpec(4, 4, hspace=0.2, wspace=0.2)
main_ax = fig.add_subplot(grid[:-1, 1:])
y_hist = fig.add_subplot(grid[:-1, 0], xticklabels=[], sharey=main_ax)
x_hist = fig.add_subplot(grid[-1, 1:], yticklabels=[], sharex=main_ax)


# scatter points on the main axes
main_ax.plot(x, y, 'ok', markersize=3, alpha=0.2)
# histogram on the attached axes
x_hist.hist(x, 40, histtype='stepfilled',
            orientation='vertical', color='gray')
x_hist.invert_yaxis()
y_hist.hist(y, 40, histtype='stepfilled',orientation='horizontal', color='gray')
y_hist.invert_xaxis()


<a id="section-ten"></a>
# Section 10 - Text & Annotation/Text Position/Arrow Position

In [None]:
# Text and Annotation
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
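# 'seaborn-whitegrid' was renamed to 'seaborn-v0_8-whitegrid' in Matplotlib 3.6; use whichever name is available
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')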
import numpy as np
import pandas as pd

In [None]:
fig,ax=plt.subplots(facecolor='lightgray')
ax.axis([0,10,0,10])

# transform=ax.transData is the default, but we'll specify it anyway

ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure)

ax.set_ylim(-6,6)
ax.set_xlim(0,2)
fig
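
The three transforms map their own coordinate system to display (pixel) coordinates, so they can be chained to convert a point from one system to another. The sketch below is an illustration with the same initial axis limits; the variable names are my own:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

# data coordinates -> display (pixel) coordinates
pixel_xy = ax.transData.transform((1, 5))

# display coordinates -> axes-fraction coordinates
axes_xy = ax.transAxes.inverted().transform(pixel_xy)

# display coordinates -> figure-fraction coordinates
figure_xy = fig.transFigure.inverted().transform(pixel_xy)

print(axes_xy, figure_xy)
```

With limits of 0 to 10 on both axes, the data point (1, 5) sits 10% of the way across the axes and halfway up, which is what `axes_xy` reports. Changing the limits afterwards moves text placed with `transData` but not text placed with `transAxes` or `transFigure`.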

In [None]:
# Arrows and Annotation

# Arrows can be drawn with the plt.annotate() function, which creates some text and an arrow; the arrows can be specified very flexibly.

%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis('equal')

# shrink is the fraction of the arrow length trimmed from both ends, so it should be well below 1
ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),arrowprops=dict(facecolor='red', shrink=0.05))
ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),arrowprops=dict(arrowstyle="->",connectionstyle="angle3,angleA=0,angleB=-90"))

In [None]:
# The arrow style is controlled through the arrowprops dictionary, which has numerous options available.

%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = np.linspace(0, 20, 2000)
ax.plot(x, np.cos(x))
ax.axis('equal')
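
A few `arrowprops` combinations are sketched below on the same cosine curve. The label texts and positions are made up for illustration; `arrowstyle`, `connectionstyle`, and `bbox` are standard `annotate` options:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0, 20, 2000)
ax.plot(x, np.cos(x))
ax.axis('equal')

# plain arrow head, straight line
simple = ax.annotate('simple arrow', xy=(2 * np.pi, 1), xytext=(10, 4),
                     arrowprops=dict(arrowstyle='->'))

# curved connection between the text and the point
curved = ax.annotate('curved arrow', xy=(5 * np.pi, -1), xytext=(2, -6),
                     arrowprops=dict(arrowstyle='->',
                                     connectionstyle='arc3,rad=-0.3'))

# filled arrow with a boxed label
boxed = ax.annotate('boxed label', xy=(3 * np.pi, -1), xytext=(12, -5),
                    bbox=dict(boxstyle='round', fc='white', ec='gray'),
                    arrowprops=dict(arrowstyle='fancy', fc='0.6', ec='none'))

annotations = [simple, curved, boxed]
```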

In [None]:
# Customizing Ticks: Major and Minor Ticks

%matplotlib inline
import matplotlib.pyplot as plt
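# 'seaborn-whitegrid' was renamed to 'seaborn-v0_8-whitegrid' in Matplotlib 3.6; use whichever name is available
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')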
import numpy as np

ax = plt.axes(xscale='log', yscale='log')

print(ax.xaxis.get_major_locator())
print(ax.xaxis.get_minor_locator())

In [None]:
print(ax.xaxis.get_major_formatter())
print(ax.xaxis.get_minor_formatter())

In [None]:
# Hiding Ticks or Labels

# The most common tick/label formatting operation is hiding ticks or labels. We can do this using plt.NullLocator() and plt.NullFormatter()

ax=plt.axes()
ax.plot(np.random.rand(50))

ax.yaxis.set_major_locator(plt.NullLocator())
ax.xaxis.set_major_formatter(plt.NullFormatter())


In [None]:
fig,ax=plt.subplots(5,5,figsize=(5,5))
fig.subplots_adjust(hspace=0,wspace=0)

# Get some face data from scikit-learn

from sklearn.datasets import fetch_olivetti_faces
faces = fetch_olivetti_faces().images

for i in range(5):
    for j in range(5):
        ax[i, j].xaxis.set_major_locator(plt.NullLocator())
        ax[i, j].yaxis.set_major_locator(plt.NullLocator())
        ax[i, j].imshow(faces[10 * i + j], cmap="bone")

In [None]:
# Reducing or Increasing the Number of Ticks

fig,ax=plt.subplots(4,4,sharex=True,sharey=True)

In [None]:
# plt.MaxNLocator() allows us to specify the maximum number of ticks that will be displayed
# For every axis, set the x and y major locator

for axi in ax.flat:
    axi.xaxis.set_major_locator(plt.MaxNLocator(3))
    axi.yaxis.set_major_locator(plt.MaxNLocator(3))

    
fig


In [None]:
# Fancy Tick Formats

# Plot a sine and cosine curve
fig, ax = plt.subplots()
x = np.linspace(0, 3 * np.pi, 1000)
ax.plot(x, np.sin(x), lw=3, label='Sine')
ax.plot(x, np.cos(x), lw=3, label='Cosine')

# Set up grid, legend, and limits

ax.grid(True)
ax.legend(frameon=False)
ax.axis('equal')
ax.set_xlim(0, 3 * np.pi)


ax.xaxis.set_major_locator(plt.MultipleLocator(np.pi / 2))
ax.xaxis.set_minor_locator(plt.MultipleLocator(np.pi / 4))
fig

In [None]:
# Instead of a built-in formatter, we’ll use plt.FuncFormatter, which accepts a user-defined function giving fine-grained control over the tick outputs
def format_func(value,tick_number):
    # find number of multiples of pi/2
    N=int(np.round(2*value/np.pi))
    if N==0:
        return "0"
    elif N==1:
        return r"$\pi/2$"
    elif N==2:
        return r"$\pi$"
    elif N%2>0:
        return r"${0}\pi/2$".format(N)
    else:
        return r"${0}\pi$".format(N // 2)


ax.xaxis.set_major_formatter(plt.FuncFormatter(format_func))

fig
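
The same idea extends to other fractions of π. The sketch below is a hypothetical variant, not part of the original notebook: the helper `pi_fraction_label` and its `max_denominator` parameter are my own names, and it uses the standard-library `Fraction` type to reduce each tick value to a fraction of π.

```python
from fractions import Fraction

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def pi_fraction_label(value, tick_number=None, max_denominator=4):
    """Format a tick value as a reduced fraction of pi (hypothetical helper)."""
    frac = Fraction(float(value) / np.pi).limit_denominator(max_denominator)
    n, d = frac.numerator, frac.denominator
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    numerator = r"\pi" if n == 1 else rf"{n}\pi"
    if d == 1:
        return rf"${sign}{numerator}$"
    return rf"${sign}{numerator}/{d}$"


fig, ax = plt.subplots()
x = np.linspace(0, 3 * np.pi, 1000)
ax.plot(x, np.sin(x))
ax.xaxis.set_major_locator(MultipleLocator(np.pi / 4))
ax.xaxis.set_major_formatter(FuncFormatter(pi_fraction_label))
```

`FuncFormatter` calls the function with the tick value and its position, so the extra keyword argument keeps its default unless the function is wrapped.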
    
    

<a id="section-eleven"></a>
# Section 11 - Customizing Matplotlib: Configurations and Stylesheets

In [None]:
# Plot Customization by Hand

import matplotlib.pyplot as plt
plt.style.use('classic')
import numpy as np
%matplotlib inline

x=np.random.randn(1000)
plt.hist(x)

# We can adjust this by hand to make it a much more visually pleasing plot


# draw solid white grid lines
plt.grid(color='w', linestyle='solid')

# hide axis spines (plt.gca() returns the axes that plt.hist just drew on)
ax = plt.gca()
for spine in ax.spines.values():
    spine.set_visible(False)




In [None]:
#Changing the Defaults: rcParams
# We’ll start by saving a copy of the current rcParams dictionary, so we can easily reset these changes in the current session

IPython_default=plt.rcParams.copy()

#Now we can use the plt.rc function to change some of these settings

from matplotlib import cycler
colors=cycler('color',['#EE6666', '#3388BB', '#9988DD','#EECC55', '#88BB44', '#FFBBBB'])
plt.rc('axes', facecolor='#E6E6E6', edgecolor='none',axisbelow=True, grid=True, prop_cycle=colors)
plt.rc('grid', color='w', linestyle='solid')
plt.rc('xtick', direction='out', color='gray')
plt.rc('ytick', direction='out', color='gray')
plt.rc('patch', edgecolor='#E6E6E6')
plt.rc('lines', linewidth=2)

plt.hist(x)

In [None]:
for i in range(4):
    plt.plot(np.random.rand(10))
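
Calls to `plt.rc` change the defaults for the rest of the session. When a change should apply to a single figure only, `plt.rc_context` can scope it; the sketch below uses made-up values to show that the setting is restored on exit:

```python
import matplotlib.pyplot as plt

before = plt.rcParams['lines.linewidth']

# settings inside the with-block apply only to artists created there
with plt.rc_context({'lines.linewidth': 4, 'axes.grid': True}):
    inside = plt.rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    line, = ax.plot([0, 1, 2], [0, 1, 0])

after = plt.rcParams['lines.linewidth']
print(before, inside, after)
```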
    

In [None]:
# Stylesheets
# The available styles are listed in plt.style.available

plt.style.available[:5]

# The basic way to switch to a stylesheet is to call plt.style.use('stylename')



In [None]:
# Let’s create a function that will make two basic types of plot:

def hist_and_lines():
    np.random.seed(0)
    fig,ax=plt.subplots(1,2,figsize=(11, 4))
    ax[0].hist(np.random.randn(1000))
    for i in range(3):
        ax[1].plot(np.random.rand(10))
    ax[1].legend(['a', 'b', 'c'], loc='lower left') 

In [None]:
#Default style

# reset rcParams to the copy saved earlier in IPython_default
plt.rcParams.update(IPython_default)

hist_and_lines()

In [None]:
#FiveThirtyEight style
with plt.style.context('fivethirtyeight'):
    hist_and_lines()

In [None]:
# ggplot
with plt.style.context('ggplot'):
    hist_and_lines()

In [None]:
#Bayesian Methods for Hackers style
with plt.style.context('bmh'):
    hist_and_lines()

In [None]:
# Dark background

with plt.style.context('dark_background'):
    hist_and_lines()

In [None]:
# Grayscale
with plt.style.context('grayscale'):
    hist_and_lines()

In [None]:
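# Seaborn style

# Recent Seaborn releases no longer restyle Matplotlib on import, so apply the style explicitly
import seaborn
with seaborn.axes_style('darkgrid'):
    hist_and_lines()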

<a id="section-twelve"></a>
# Section 12 - Three-Dimensional Plotting in Matplotlib

In [None]:
# We enable three-dimensional plots by importing the mplot3d toolkit
from mpl_toolkits import mplot3d

# Once this submodule is imported, we can create a three-dimensional axes by passing the keyword projection='3d' to any of the normal axes creation routines

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')

In [None]:
# Three-Dimensional Points and Lines
# We can create these using the ax.plot3D and ax.scatter3D functions.

ax=plt.axes(projection='3d')
# Data for a three-dimensional line

zline=np.linspace(0,15,1000)
yline=np.cos(zline)
xline=np.sin(zline)

ax.plot3D(xline,yline,zline,'red')

# Data for three-dimensional scattered points

zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')




In [None]:
#Three-Dimensional Contour Plots

def f(x,y):
    return np.sin(np.sqrt(x**2+y**2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

X,Y=np.meshgrid(x,y)
Z=f(X,Y)

fig=plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

#Sometimes the default viewing angle is not optimal, in which case we can use the view_init method to set the elevation and azimuthal angles. 

ax.view_init(60,35)
fig
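
`view_init` simply stores the elevation and azimuth (in degrees) on the axes, where they can be read back as `ax.elev` and `ax.azim`. A minimal sketch:

```python
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # registers the '3d' projection on older Matplotlib

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

default_view = (ax.elev, ax.azim)   # viewing angles before any change
ax.view_init(elev=60, azim=35)      # 60 degrees above the x-y plane, rotated 35 degrees about z
new_view = (ax.elev, ax.azim)
print(default_view, new_view)
```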

In [None]:
#Wireframes and Surface Plots
fig=plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe')

In [None]:
# A surface plot is like a wireframe plot, but each face of the wireframe is a filled polygon. Adding a colormap to the filled polygons can aid perception of the topology of
# the surface being visualized

ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,cmap='viridis', edgecolor='none')
ax.set_title('surface')

In [None]:
r=np.linspace(0,6,20)
theta = np.linspace(-0.9 * np.pi, 0.8 * np.pi, 40)
r,theta=np.meshgrid(r,theta)
X=r*np.sin(theta)
Y=r*np.cos(theta)
Z=f(X,Y)
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,cmap='viridis', edgecolor='none')

In [None]:
# Surface Triangulations

theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)

# We could create a scatter plot of the points to get an idea of the surface we’re sampling from

ax=plt.axes(projection='3d')
ax.scatter(x,y,z,c=z,cmap='viridis',linewidth=0.5)



In [None]:
# The function that will help us in this case is ax.plot_trisurf, which creates a surface by first finding a set of triangles formed
# between adjacent points

ax=plt.axes(projection='3d')
ax.plot_trisurf(x,y,z,cmap='viridis',edgecolor='none')

In [None]:
# Example: Visualizing a Möbius strip
theta=np.linspace(0,2*np.pi,30)
w=np.linspace(-0.25,0.25,8)
w,theta=np.meshgrid(w,theta)
phi=0.5*theta

# radius in x-y plane
r = 1 + w * np.cos(phi)

x = np.ravel(r * np.cos(theta))
y = np.ravel(r * np.sin(theta))
z = np.ravel(w * np.sin(phi))          


In [None]:
# triangulate in the underlying parameterization

from matplotlib.tri import Triangulation
tri = Triangulation(np.ravel(w), np.ravel(theta))
ax = plt.axes(projection='3d')
ax.plot_trisurf(x, y, z, triangles=tri.triangles,cmap='viridis', linewidths=0.2)
ax.set_xlim(-1, 1); ax.set_ylim(-1, 1); ax.set_zlim(-1, 1)
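
The key step is that the triangles are computed in the flat (w, θ) parameter plane and then reused for the embedded (x, y, z) points. The sketch below inspects the triangulation on its own, under the same grid sizes; `n_points` is just a name chosen for this illustration:

```python
import numpy as np
from matplotlib.tri import Triangulation

theta = np.linspace(0, 2 * np.pi, 30)
w = np.linspace(-0.25, 0.25, 8)
w, theta = np.meshgrid(w, theta)

# Delaunay triangulation of the flat parameter grid
tri = Triangulation(np.ravel(w), np.ravel(theta))
n_points = w.size

# each row of tri.triangles holds three indices into the flattened point arrays,
# so the same rows can index the raveled x, y, z arrays of the strip
print(n_points, tri.triangles.shape)
```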

<a id="section-thirteen"></a>
# Section 13 - Visualization with Seaborn

In [None]:
#Seaborn Versus Matplotlib

import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
import pandas as pd

#Now we create some random walk data:

# Create some data

rng = np.random.RandomState(0)
x=np.linspace(0,10,500)
y=np.cumsum(rng.randn(500,6),0)

# Plot the data with Matplotlib defaults
plt.plot(x,y)
plt.legend('ABCDEF',ncol=2,loc='upper left')

Now let’s take a look at how it works with Seaborn. As we will see, Seaborn has many
of its own high-level plotting routines, but it can also override Matplotlib’s default
parameters, so that even simple Matplotlib scripts produce more polished
output. We can set the style by calling Seaborn’s set() function. By convention,
Seaborn is imported as sns.

In [None]:
import seaborn as sns
sns.set()

#Now let’s rerun the same two lines as before
plt.plot(x,y)
plt.legend('ABCDEF', ncol=2, loc='upper left')

In [None]:
# Histograms, KDE, and densities
# Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables

data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
     plt.hist(data[col], alpha=0.5)

In [None]:
# Rather than a histogram, we can get a smooth estimate of the distribution using a kernel density estimation, which Seaborn does with sns.kdeplot

for col in 'xy':
    sns.kdeplot(data[col], fill=True)  # fill replaces the deprecated shade argument (Seaborn 0.11+)
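
A kernel density estimate places a small Gaussian bump on every sample and averages the bumps. The NumPy sketch below shows the idea only: the helper name `gaussian_kde_1d`, the synthetic samples, and the normal-reference bandwidth rule (1.06 · σ · n^(-1/5)) are assumptions for illustration, and `sns.kdeplot` may choose its bandwidth differently.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample (illustrative helper)."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

# normal-reference rule of thumb for the bandwidth
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(-6, 6, 601)
density = gaussian_kde_1d(samples, grid, bandwidth)

# a density should integrate to roughly 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth follows the samples more closely and gives a bumpier curve; a larger one smooths more.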

In [None]:
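# Histograms and KDE can be combined; older Seaborn used distplot for this, which newer releases replace with histplot(kde=True)

sns.histplot(data['x'], kde=True, stat='density')
sns.histplot(data['y'], kde=True, stat='density')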

In [None]:
# If we pass both columns of the two-dimensional dataset to kdeplot, we will get a two-dimensional visualization of the data
sns.kdeplot(data=data, x='x', y='y')

In [None]:
# We can see the joint distribution and the marginal distributions together using sns.jointplot. For this plot, we’ll set the style to a white background

with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='kde')

In [None]:
# There are other parameters that can be passed to jointplot—for example, we can use a hexagonally based histogram instead
with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='hex')

In [None]:
# Pair plots

iris=sns.load_dataset("iris")
iris.head()

In [None]:
# Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot

sns.pairplot(iris,hue='species',height=2.5)  # the size argument was renamed to height in Seaborn 0.9

In [None]:
# Faceted histograms

tips=sns.load_dataset('tips')
tips.head()

In [None]:
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']
grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15))

In [None]:
# Factor plots (sns.factorplot was renamed to sns.catplot in Seaborn 0.9)

with sns.axes_style(style='ticks'):
    g = sns.catplot(x="day", y="total_bill", hue="sex", data=tips, kind="box")
    g.set_axis_labels("Day", "Total Bill")


In [None]:
# Joint distributions

# Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution between different datasets, along with the associated marginal distributions
with sns.axes_style('white'):
    sns.jointplot(x="total_bill", y="tip", data=tips, kind='hex')

In [None]:
# The joint plot can even do some automatic kernel density estimation and regression

sns.jointplot(x="total_bill", y="tip", data=tips, kind='reg')

In [None]:
# Bar plots

# Time series can be plotted with sns.catplot (formerly sns.factorplot)

planets=sns.load_dataset('planets')
planets.head()

In [None]:
with sns.axes_style('white'):
    g=sns.catplot(x="year",data=planets,aspect=2,kind="count",color="steelblue")
    g.set_xticklabels(step=5)

In [None]:
with sns.axes_style('white'):
    g = sns.catplot(x="year", data=planets, aspect=4.0, kind='count',hue='method', order=range(2001, 2015))
    g.set_ylabels('Number of Planets Discovered')

    
    

<a id="section-fourteen"></a>
# Section 14 - Visualization with Plotly

Plotly library: Plotly's Python graphing library makes interactive, publication-quality graphs online. It can produce line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple axes, polar charts, and bubble charts.

In [None]:
# import plotly
import plotly as py
from plotly.offline import init_notebook_mode, iplot,plot
init_notebook_mode(connected=True)
import plotly.graph_objs as go

# word cloud library
from wordcloud import WordCloud

In [None]:
# pip install plotly==3.10.0

<a id="section-fifteen"></a>
# Section 15 - Read data from input files for Seaborn Plots

In [None]:
# Read data from input files for Seaborn Plots

import numpy as np
import csv as csv
import pandas as pd

median_house_hold_in_come = pd.read_csv('/kaggle/input/fatalpoliceshootingsintheus/MedianHouseholdIncome2015.csv', encoding="windows-1252")
percentage_people_below_poverty_level = pd.read_csv('/kaggle/input/fatalpoliceshootingsintheus/PercentagePeopleBelowPovertyLevel.csv', encoding="windows-1252")
percent_over_25_completed_highSchool = pd.read_csv('/kaggle/input/fatalpoliceshootingsintheus/PercentOver25CompletedHighSchool.csv', encoding="windows-1252")
share_race_city = pd.read_csv('/kaggle/input/fatalpoliceshootingsintheus/ShareRaceByCity.csv/ShareRaceByCity.csv', encoding="windows-1252")
kill = pd.read_csv('/kaggle/input/fatalpoliceshootingsintheus/PoliceKillingsUS.csv', encoding="windows-1252")


In [None]:
median_house_hold_in_come.head(20)

In [None]:
percentage_people_below_poverty_level.head(20)

In [None]:
percent_over_25_completed_highSchool.head(20)

In [None]:
share_race_city.head(20)

In [None]:
kill.head(20)

In [None]:
percentage_people_below_poverty_level['Geographic Area'].unique()

<a id="section-sixteen"></a>
# Section 16 - Bar Plot using Seaborn

In [None]:
# Poverty rate of each state
percentage_people_below_poverty_level.replace(['-'],0.0,inplace=True)
percentage_people_below_poverty_level.poverty_rate = percentage_people_below_poverty_level.poverty_rate.astype(float)
area_list=list((percentage_people_below_poverty_level['Geographic Area'].unique()))
area_poverty_ratio = []
for i in area_list:
    x = percentage_people_below_poverty_level[percentage_people_below_poverty_level['Geographic Area']==i]
    area_poverty_rate = sum(x.poverty_rate)/len(x)
    area_poverty_ratio.append(area_poverty_rate)
data = pd.DataFrame({'area_list': area_list,'area_poverty_ratio':area_poverty_ratio})
new_index = (data['area_poverty_ratio'].sort_values(ascending=False)).index.values
sorted_data = data.reindex(new_index)

# visualization

plt.figure(figsize=(15,10))
sns.barplot(x=sorted_data['area_list'], y=sorted_data['area_poverty_ratio'])
plt.xticks(rotation=45)
plt.xlabel('states')
plt.ylabel('Poverty Rate')
plt.title('Poverty Rate Given States')
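
The per-state loop above can also be written as a pandas `groupby`. The sketch below uses a tiny made-up frame with the same two column names, since the Kaggle file is not available here. Note one deliberate difference: it converts the `'-'` placeholder to NaN, which the mean then skips, whereas the cell above counts those rows as 0.

```python
import pandas as pd

# toy stand-in for percentage_people_below_poverty_level (values are invented)
toy = pd.DataFrame({
    'Geographic Area': ['AL', 'AL', 'AK', 'AK', 'AK'],
    'poverty_rate': ['10.0', '20.0', '5.0', '-', '7.0'],
})

# '-' cannot be parsed, so errors='coerce' turns it into NaN
toy['poverty_rate'] = pd.to_numeric(toy['poverty_rate'], errors='coerce')

# mean poverty rate per state, highest first
state_means = (
    toy.groupby('Geographic Area')['poverty_rate']
       .mean()
       .sort_values(ascending=False)
)
print(state_means)
```

The resulting Series can be passed to `sns.barplot` as `x=state_means.index, y=state_means.values`.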

In [None]:
kill.head()

In [None]:
kill.name.value_counts()

In [None]:
percent_over_25_completed_highSchool.info()

In [None]:
# High school graduation rate of the population that is older than 25 in states
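# assign the result back; an in-place replace on a selected column may not update the DataFrame
percent_over_25_completed_highSchool['percent_completed_hs'] = percent_over_25_completed_highSchool['percent_completed_hs'].replace(['-'],0.0)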
percent_over_25_completed_highSchool.percent_completed_hs = percent_over_25_completed_highSchool.percent_completed_hs.astype(float)
area_list = list(percent_over_25_completed_highSchool['Geographic Area'].unique())
area_highschool = []
for i in area_list:
    x = percent_over_25_completed_highSchool[percent_over_25_completed_highSchool['Geographic Area']==i]
    area_highschool_rate = sum(x.percent_completed_hs)/len(x)
    area_highschool.append(area_highschool_rate)
# sorting
data = pd.DataFrame({'area_list': area_list,'area_highschool_ratio':area_highschool})
new_index = (data['area_highschool_ratio'].sort_values(ascending=True)).index.values
sorted_data2 = data.reindex(new_index)
# visualization
plt.figure(figsize=(15,10))
sns.barplot(x=sorted_data2['area_list'], y=sorted_data2['area_highschool_ratio'])
plt.xticks(rotation= 90)
plt.xlabel('States')
plt.ylabel('High School Graduate Rate')
plt.title("Percentage of Given State's Population Above 25 that Has Graduated High School")

In [None]:
percentage_people_below_poverty_level.head()

In [None]:
percentage_people_below_poverty_level.info()

In [None]:
percentage_people_below_poverty_level['Geographic Area'].unique()

In [None]:
share_race_city.head()

In [None]:
# Percentage of state's population according to races that are black,white,native american, asian and hispanic
share_race_city.replace(['-'],0.0,inplace = True)
share_race_city.replace(['(X)'],0.0,inplace = True)
# convert the share columns to float by assigning whole columns, so the new dtype is kept
race_columns = ['share_white','share_black','share_native_american','share_asian','share_hispanic']
share_race_city[race_columns] = share_race_city[race_columns].astype(float)
area_list = list(share_race_city['Geographic area'].unique())
share_white = []
share_black = []
share_native_american = []
share_asian = []
share_hispanic = []
for i in area_list:
    x = share_race_city[share_race_city['Geographic area']==i]
    share_white.append(sum(x.share_white)/len(x))
    share_black.append(sum(x.share_black) / len(x))
    share_native_american.append(sum(x.share_native_american) / len(x))
    share_asian.append(sum(x.share_asian) / len(x))
    share_hispanic.append(sum(x.share_hispanic) / len(x))
    
# Visualization
f,ax = plt.subplots(figsize = (9,15))
sns.barplot(x=share_white,y=area_list,color='green',alpha = 0.5,label='White' )
sns.barplot(x=share_black,y=area_list,color='blue',alpha = 0.7,label='African American')
sns.barplot(x=share_native_american,y=area_list,color='cyan',alpha = 0.6,label='Native American')
sns.barplot(x=share_asian,y=area_list,color='yellow',alpha = 0.6,label='Asian')
sns.barplot(x=share_hispanic,y=area_list,color='red',alpha = 0.6,label='Hispanic')

ax.legend(loc='lower right',frameon = True)
ax.set(xlabel='Percentage of Races', ylabel='States',title = "Percentage of State's Population According to Races ")

<a id="section-seventeen"></a>
# Section 17 - Point Plot using Seaborn

In [None]:
# high school graduation rate vs Poverty rate of each state
sorted_data['area_poverty_ratio'] = sorted_data['area_poverty_ratio']/max( sorted_data['area_poverty_ratio'])
sorted_data2['area_highschool_ratio'] = sorted_data2['area_highschool_ratio']/max( sorted_data2['area_highschool_ratio'])
data = pd.concat([sorted_data,sorted_data2['area_highschool_ratio']],axis=1)
data.sort_values('area_poverty_ratio',inplace=True)

# visualize
f,ax1 = plt.subplots(figsize =(20,10))
sns.pointplot(x='area_list',y='area_poverty_ratio',data=data,color='lime',alpha=0.8)
sns.pointplot(x='area_list',y='area_highschool_ratio',data=data,color='red',alpha=0.8)
plt.text(40,0.6,'high school graduate ratio',color='red',fontsize = 17,style = 'italic')
plt.text(40,0.55,'poverty ratio',color='lime',fontsize = 18,style = 'italic')
plt.xlabel('States',fontsize = 15,color='blue')
plt.ylabel('Values',fontsize = 15,color='blue')
plt.title('High School Graduate  VS  Poverty Rate',fontsize = 20,color='blue')
plt.grid()

In [None]:
data.head()

<a id="section-eighteen"></a>
# Section 18 - Joint Plot using Seaborn

In [None]:
# Visualization of high school graduation rate vs poverty rate of each state with a different style of seaborn code
# Pearson's r: a value of 1 means a perfect positive linear correlation, and -1 a perfect negative one.
# A value of zero means there is no linear correlation between the variables
# Show the joint distribution using kernel density estimation

sns.jointplot(x=data.area_poverty_ratio, y=data.area_highschool_ratio, kind='kde')
plt.savefig('graph.png')
plt.show()
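
The three reference values of Pearson's r can be checked numerically with `np.corrcoef`. The arrays below are synthetic and only serve to illustrate the extremes:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# perfectly linear, increasing: r is 1
r_positive = np.corrcoef(x, 2 * x + 1)[0, 1]

# perfectly linear, decreasing: r is -1
r_negative = np.corrcoef(x, -3 * x + 2)[0, 1]

# two independent noise samples: r is close to 0
rng = np.random.default_rng(0)
r_noise = np.corrcoef(rng.normal(size=2000), rng.normal(size=2000))[0, 1]

print(r_positive, r_negative, r_noise)
```

Pearson's r only measures linear association, so a value near zero does not rule out a nonlinear relationship.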

In [None]:
data.head()

In [None]:
# You can change the parameters of the joint plot
# kind : { "scatter" | "reg" | "resid" | "kde" | "hex" }
# Different usage of parameters, but the same data as the previous plot
g = sns.jointplot(x="area_poverty_ratio", y="area_highschool_ratio", data=data, height=5, ratio=3, color="r")


<a id="section-ninteen"></a>
# Section 19 - Pie Plot using Seaborn

In [None]:
kill.race.head()

In [None]:
kill.race.value_counts()

In [None]:
# Share of each race in the kill data
race_counts = kill.race.dropna().value_counts()
labels = race_counts.index
colors = ['grey','blue','red','yellow','green','brown']
explode = [0,0,0,0,0,0]
sizes = race_counts.values

# visual
plt.figure(figsize = (7,7))
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%')
plt.title('Killed People According to Races',color = 'blue',fontsize = 15)

<a id="section-twenty"></a>
# Section 20 - Lm Plot using Seaborn

In [None]:
data.head()

In [None]:
# Visualization of high school graduation rate vs Poverty rate of each state with different style of seaborn code
# lmplot 
# Show the results of a linear regression within each dataset
sns.lmplot(x="area_poverty_ratio", y="area_highschool_ratio", data=data)
plt.show()

<a id="section-twentyone"></a>
# Section 21 - Kde Plot using Seaborn

In [None]:
data.head()

In [None]:
# Visualization of high school graduation rate vs poverty rate of each state with a different style of seaborn code
# two-dimensional KDE plot
sns.kdeplot(x=data.area_poverty_ratio, y=data.area_highschool_ratio, fill=True, cut=3)
plt.show()

<a id="section-twentytwo"></a>
# Section 22 - Violin Plot using Seaborn

In [None]:
data.head()

In [None]:
# Show each distribution with both violins and points
# Use cubehelix to get a custom sequential palette
pal = sns.cubehelix_palette(2, rot=-.6, dark=.4)
sns.violinplot(data=data, palette=pal, inner="points")
plt.show()

<a id="section-twentythree"></a>
# Section 23 - Heatmap

In [None]:
# correlate only the numeric columns (area_list holds state names)
data.select_dtypes('number').corr()

In [None]:
# correlation map
# Visualization of high school graduation rate vs poverty rate of each state with a different style of seaborn code
f,ax = plt.subplots(figsize=(5, 10))
sns.heatmap(data.select_dtypes('number').corr(), annot=True, linewidths=0.8,linecolor="blue", fmt= '.1f',ax=ax)
plt.show()

<a id="section-twentyfour"></a>
# Section 24 - Box plot

In [None]:
kill.head()

In [None]:
kill.manner_of_death.unique()

In [None]:
sns.boxplot(x="gender", y="age", hue="manner_of_death", data=kill, palette="PRGn")
plt.show()

<a id="section-twentyfive"></a>
# Section 25 - Swarm Plot

In [None]:
sns.swarmplot(x="gender", y="age",hue="manner_of_death", data=kill)
plt.show()

<a id="section-twentysix"></a>
# Section 26 - Pair Plot

In [None]:
data.head()

In [None]:
sns.pairplot(data)
plt.show()

<a id="section-twentyseven"></a>
# Section 27 - Count Plot

In [None]:
kill.gender.value_counts()

In [None]:
kill.head()

In [None]:
# kill properties
# Gender
sns.countplot(x=kill.gender)


In [None]:
# kill weapon
armed = kill.armed.value_counts()
#print(armed)
plt.figure(figsize=(10,7))
sns.barplot(x=armed[:7].index,y=armed[:7].values)
plt.ylabel('Number of Weapon')
plt.xlabel('Weapon Types')
plt.title('Kill weapon',color = 'blue',fontsize=15)

In [None]:
# Race of killed people
kill.race.value_counts()
sns.countplot(data=kill,x='race')
plt.title('Race of killed people',color = 'blue',fontsize=15)

In [None]:
# Most dangerous cities
city = kill.city.value_counts()
plt.figure(figsize=(12,8))
sns.barplot(x=city[:12].index,y=city[:12].values)
plt.xticks(rotation=60)
plt.title('Most dangerous cities',color = 'red',fontsize=20)

In [None]:
# most dangerous states
state = kill.state.value_counts()
plt.figure(figsize=(10,7))
sns.barplot(x=state[:20].index,y=state[:20].values)
plt.title('Most dangerous state',color = 'blue',fontsize=15)

<a id="section-twentyeight"></a>
# Section 28 - Read data from input files for Plotly Plots

In [None]:
# plotly
# import plotly.plotly as py
from plotly.offline import init_notebook_mode, iplot, plot
import plotly as py
init_notebook_mode(connected=True)
import plotly.graph_objs as go
# word cloud library
from wordcloud import WordCloud


In [None]:
# Read data from input files for Plotly Plots

import numpy as np
import csv as csv
import pandas as pd

#educational_attainment_supplementary_data = pd.read_csv("/kaggle/input/worlduniversityrankings-data/educational_attainment_supplementary_data.csv")
cwurData = pd.read_csv('/kaggle/input/worlduniversityrankings-data/cwurData.csv')
#education_expenditure_supplementary_data = pd.read_csv("/kaggle/input/worlduniversityrankings-data/education_expenditure_supplementary_data.csv")
school_and_country_table = pd.read_csv('/kaggle/input/worlduniversityrankings-data/school_and_country_table.csv')
shanghaiData = pd.read_csv('/kaggle/input/worlduniversityrankings-data/shanghaiData.csv')
timesData = pd.read_csv('/kaggle/input/worlduniversityrankings-data/timesData.csv')


In [None]:
timesData.head(20)

In [None]:
timesData.info()

<a id="section-twentynine"></a>
# Section 29 - Line Charts Plotly Plots

In [None]:
# prepare data frame
df = timesData.iloc[:100,:]

# import graph objects as "go"
import plotly.graph_objs as go

# Creating trace1
trace1 = go.Scatter(
                    x = df.world_rank,
                    y = df.citations,
                    mode = "lines",
                    name = "citations",
                    marker = dict(color = 'rgba(16, 112, 2, 0.8)'),
                    text= df.university_name)

# Creating trace2
trace2 = go.Scatter(
                    x = df.world_rank,
                    y = df.teaching,
                    mode = "lines+markers",
                    name = "teaching",
                    marker = dict(color = 'rgba(80, 26, 80, 0.8)'),
                    text= df.university_name)

data = [trace1, trace2]
layout = dict(title = 'Citation and Teaching vs World Rank of Top 100 Universities',
              xaxis= dict(title= 'World Rank',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
iplot(fig)

<a id="section-thirty"></a>
# Section 30 - Scatter Charts Plotly Plots

In [None]:
# prepare data frames
df2014 = timesData[timesData.year == 2014].iloc[:100,:]
df2015 = timesData[timesData.year == 2015].iloc[:100,:]
df2016 = timesData[timesData.year == 2016].iloc[:100,:]
# import graph objects as "go"
import plotly.graph_objs as go

# creating trace1
trace1 =go.Scatter(
                    x = df2014.world_rank,
                    y = df2014.citations,
                    mode = "markers",
                    name = "2014",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text= df2014.university_name)
# creating trace2
trace2 =go.Scatter(
                    x = df2015.world_rank,
                    y = df2015.citations,
                    mode = "markers",
                    name = "2015",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text= df2015.university_name)
# creating trace3
trace3 =go.Scatter(
                    x = df2016.world_rank,
                    y = df2016.citations,
                    mode = "markers",
                    name = "2016",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text= df2016.university_name)

data = [trace1, trace2, trace3]
layout = dict(title = 'Citation vs world rank of top 100 universities with 2014, 2015 and 2016 years',
              xaxis= dict(title= 'World Rank',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Citation',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
iplot(fig)

<a id="section-thirtyone"></a>
# Section 31 - Bar Charts Plotly Plots

In [None]:
# prepare data frames
df2014 = timesData[timesData.year == 2014].iloc[:3,:]
df2014

In [None]:
# prepare data frames
df2014 = timesData[timesData.year == 2014].iloc[:3,:]
# import graph objects as "go"
import plotly.graph_objs as go
# create trace1 
trace1 = go.Bar(
                x = df2014.university_name,
                y = df2014.citations,
                name = "citations",
                marker = dict(color = 'rgba(255, 174, 255, 0.5)',
                             line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df2014.country)
# create trace2 
trace2 = go.Bar(
                x = df2014.university_name,
                y = df2014.teaching,
                name = "teaching",
                marker = dict(color = 'rgba(255, 255, 128, 0.5)',
                              line=dict(color='rgb(0,0,0)',width=1.5)),
                text = df2014.country)
data = [trace1, trace2]
layout = go.Layout(barmode = "group")
fig = go.Figure(data = data, layout = layout)
iplot(fig)

<a id="section-thirtytwo"></a>
# Section 32 - Pie Charts Plotly Plots

In [None]:
# data preparation
df2016 = timesData[timesData.year == 2016].iloc[:7,:]
pie1 = df2016.num_students
pie1_list = [float(each.replace(',', '.')) for each in df2016.num_students]  # replace the comma so the string parses as a float, e.g. "2,4" -> "2.4" -> 2.4
labels = df2016.university_name
# figure
fig = {
  "data": [
    {
      "values": pie1_list,
      "labels": labels,
      "domain": {"x": [0, .5]},
      "name": "Number Of Students Rates",
      "hoverinfo":"label+percent+name",
      "hole": .3,
      "type": "pie"
    },],
  "layout": {
        "title":"Universities Number of Students rates",
        "annotations": [
            { "font": { "size": 20},
              "showarrow": False,
              "text": "Number of Students",
                "x": 0.20,
                "y": 1
            },
        ]
    }
}
iplot(fig)
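
Replacing the comma with a dot works for the pie chart because only the proportions matter. If the comma is a thousands separator (an assumption about how `num_students` is formatted; the sample strings below are invented), removing it gives the actual counts, and the shares come out the same either way:

```python
def parse_count(text):
    """Parse a string such as '2,243' as the integer 2243 (hypothetical helper)."""
    return int(text.replace(',', ''))


raw = ['2,243', '11,074', '19,919']   # illustrative values, not taken from timesData

counts = [parse_count(s) for s in raw]
shares = [c / sum(counts) for c in counts]

# the approach used above: every value is scaled down by the same factor of 1000
scaled = [float(s.replace(',', '.')) for s in raw]
scaled_shares = [v / sum(scaled) for v in scaled]

print(counts)
print(shares)
```

The equivalence only holds when every value contains exactly one comma; real counts are the safer choice when the numbers are used as magnitudes, such as marker sizes.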

<a id="section-thirtythree"></a>
# Section 33 - Bubble Charts Plotly Plots

In [None]:
df2016.info()

In [None]:
# data preparation: first 20 universities of 2016
df2016 = timesData[timesData.year == 2016].iloc[:20,:]
# marker size comes from num_students, marker color from the international score
num_students_size = [float(each.replace(',', '.')) for each in df2016.num_students]
international_color = [float(each) for each in df2016.international]
data = [
    {
        'y': df2016.teaching,
        'x': df2016.world_rank,
        'mode': 'markers',
        'marker': {
            'color': international_color,
            'size': num_students_size,
            'showscale': True
        },
        'text': df2016.university_name
    }
]
iplot(data)
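
Plotly reads marker sizes in pixels, so raw data values can give bubbles that are too large or too small. The sketch below applies the scaling rule from the Plotly bubble-chart documentation for `sizemode='area'`: `sizeref = 2 * max(size) / desired_max_px ** 2`. The helper name `bubble_sizeref`, the 40-pixel target, and the sample sizes are assumptions made for illustration.

```python
def bubble_sizeref(sizes, max_marker_px=40):
    """Return a sizeref so the largest bubble is about max_marker_px wide.

    Follows the rule for sizemode='area':
    sizeref = 2 * max(sizes) / max_marker_px ** 2
    """
    return 2.0 * max(sizes) / (max_marker_px ** 2)


# Illustrative sizes only (assumed, not read from timesData)
sizes = [2.243, 19.919, 15.596]
sizeref = bubble_sizeref(sizes)

# Marker dictionary in the same style as the cell above
marker = {
    "size": sizes,
    "sizemode": "area",
    "sizeref": sizeref,
    "showscale": True,
}
print(marker["sizeref"])
```

The resulting `marker` dictionary could replace the `'marker'` entry of the trace above, with `'color'` added back in.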

<a id="section-thirtyfour"></a>
# Section 34 - Histogram Plotly Plots

In [None]:
# prepare data
x2011 = timesData.student_staff_ratio[timesData.year == 2011]
x2012 = timesData.student_staff_ratio[timesData.year == 2012]

trace1 = go.Histogram(
    x=x2011,
    opacity=0.75,
    name = "2011",
    marker=dict(color='rgba(171, 50, 96, 0.6)'))
trace2 = go.Histogram(
    x=x2012,
    opacity=0.75,
    name = "2012",
    marker=dict(color='rgba(12, 50, 196, 0.6)'))

data = [trace1, trace2]
layout = go.Layout(barmode='overlay',
                   title='Student-staff ratio in 2011 and 2012',
                   xaxis=dict(title='Student-staff ratio'),
                   yaxis=dict(title='Count'),
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

<a id="section-thirtyfive"></a>
# Section 35 - Word Cloud Plotly Plots

In [None]:
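# imports repeated here so the cell runs on its own
# (the word cloud is drawn with the wordcloud package and Matplotlib, not Plotly)
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# data preparation: country of every university ranked in 2011
x2011 = timesData.country[timesData.year == 2011]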
plt.subplots(figsize=(8,8))
wordcloud = WordCloud(
                          background_color='white',
                          width=512,
                          height=384
                         ).generate(" ".join(x2011))
plt.imshow(wordcloud)
plt.axis('off')
plt.savefig('graph.png')

plt.show()

<a id="section-thirtysix"></a>
# Section 36 - Box Plots Plotly Plots

In [None]:
# data preparation
x2015 = timesData[timesData.year == 2015]

trace0 = go.Box(
    y=x2015.total_score,
    name = 'total score of universities in 2015',
    marker = dict(
        color = 'rgb(12, 12, 140)',
    )
)
trace1 = go.Box(
    y=x2015.research,
    name = 'research of universities in 2015',
    marker = dict(
        color = 'rgb(12, 128, 128)',
    )
)
data = [trace0, trace1]
iplot(data)

<a id="section-thirtyseven"></a>
# Section 37 - Scatter Matrix Plotly Plots

In [None]:
# import figure factory
import plotly.figure_factory as ff
# prepare data
dataframe = timesData[timesData.year == 2015]
data2015 = dataframe.loc[:,["research","international", "total_score"]]
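# add a 1-based "index" column; it is passed below as the index column of the matrix
data2015["index"] = np.arange(1,len(data2015)+1)
# scatter matrix with box plots on the diagonal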
fig = ff.create_scatterplotmatrix(data2015, diag='box', index='index',colormap='Portland',
                                  colormap_type='cat',
                                  height=750, width=750)
iplot(fig)