# Plotly and Cufflinks
Plotly is a library for creating interactive plots that you can use in dashboards or websites (you can save them as HTML files or static images). Cufflinks connects Plotly to pandas so that plots can be called directly off of a DataFrame.

## Installation

For all of this to work, you'll need to install plotly, plus cufflinks to call plots directly off of a pandas DataFrame. Both libraries are available through **pip**; depending on your setup and channels they may also be available through **conda**. Install the libraries at your command line/terminal using:

    pip install plotly
    pip install cufflinks

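**NOTE:** Make sure pip installs into the same Python installation that your notebook runs in. If you have more than one installation of Python on your computer, the packages may end up in the wrong one and the imports below will not work.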
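
If you are not sure which Python your notebook is using, a quick check like the one below can help. It is only a sketch using the standard library; `is_installed` is a hypothetical helper name, not part of plotly or cufflinks.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if this interpreter can find `package_name` for import."""
    return importlib.util.find_spec(package_name) is not None


# The interpreter running this code; pip should install into this one.
print("Python executable:", sys.executable)

for name in ("plotly", "cufflinks"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, running `python -m pip install plotly cufflinks` with the executable printed above installs it into that same environment.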

## Imports and Set-up

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline

In [2]:
from plotly import __version__

In [3]:
print(__version__)

4.6.0


In [3]:
import cufflinks as cf

In [4]:
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot

In [5]:
#Needed for everything to work in the notebook
init_notebook_mode(connected=True)
#This connects the Plotly JavaScript library (plotly.js) to the notebook

In [6]:
#This will allow us to use cufflinks offline
cf.go_offline()

### Fake Data

In [8]:
# Let's create some fake data: 100 rows of random normal values in 4 columns
df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())
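
Because no seed is set, the values change every time the cell is re-run, which is why the tables further down do not all show the same numbers. If you want repeatable data, one option is a seeded generator. This is a sketch only: the seed `101` and the name `df_seeded` are arbitrary choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same "random" numbers on every run.
rng = np.random.default_rng(seed=101)  # 101 is an arbitrary seed

# Same shape and column names as df above: 100 rows, columns A to D.
df_seeded = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD"))
print(df_seeded.shape)
```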

In [9]:
df.head()

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -0.539386 | 0.446992 | -0.677487 | 0.07582 |
| 1 | 1.269562 | -0.335222 | -1.205976 | -0.011548 |
| 2 | -0.109124 | -0.382679 | -1.441782 | -0.087426 |
| 3 | -0.790729 | 0.366779 | -1.928552 | 0.662423 |
| 4 | -0.052207 | 0.094617 | 0.722747 | 1.456849 |


In [10]:
df2 = pd.DataFrame({'Category':['A','B','C'],'Values':[32,43,50]})

In [11]:
df2

|   | Category | Values |
|---|---|---|
| 0 | A | 32 |
| 1 | B | 43 |
| 2 | C | 50 |



## Using Cufflinks and iplot()

* scatter
* bar
* box
* spread
* ratio
* heatmap
* surface
* histogram
* bubble

## Line Plot

In [20]:
#Using iplot() instead of plot() gives you an interactive plot.
#The line plot that pandas would draw with matplotlib is drawn with Plotly instead
df.iplot()
#Now you can zoom in and out and hover to see the values at different points
#You can edit and save the chart
#You can download it
#You can also click certain columns on and off in the legend
#This makes it an incredibly powerful data visualization tool

## Scatter

In [25]:
#Scatter plots
#You have to specify mode='markers' so that the points are separate and not connected with lines
#To see bigger points you can specify size (e.g. size=20)
df.iplot(kind='scatter',x='A',y='B',mode='markers',size=20)

In [26]:
df2.head()

|   | Category | Values |
|---|---|---|
| 0 | A | 32 |
| 1 | B | 43 |
| 2 | C | 50 |


## Bar Plots

In [24]:
#Barplots
df2.iplot(kind='bar',x='Category',y='Values')

In [27]:
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -0.531155 | 0.237445 | -0.024494 | 0.168319 |
| 1 | 0.191832 | 0.554728 | 0.997129 | 0.480920 |
| 2 | 0.958364 | -0.333033 | 0.870559 | -0.570730 |
| 3 | 0.697636 | -0.297803 | 0.858096 | 0.867976 |
| 4 | 0.250548 | -0.470834 | 2.182585 | -0.025597 |
| ... | ... | ... | ... | ... |
| 95 | 0.134959 | 0.147584 | -1.516900 | 0.396282 |
| 96 | 0.786914 | 0.318750 | -0.710816 | -0.793789 |
| 97 | -1.756719 | 0.283249 | -0.345563 | -0.597327 |
| 98 | -0.049455 | -1.106398 | 0.327688 | -1.816496 |


In [29]:
#Using aggregate functions to draw bar plots of the columns.
#Say you want the number of non-null entries in each column as a bar plot:
#use df.count() and then call your bar plot on the result
df.count().iplot(kind='bar')

In [30]:
#Sum of each column as a bar plot
df.sum().iplot(kind='bar')

In [31]:
#Bar plots usually become a lot more powerful when they are used with some sort of aggregate function
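
To illustrate the point about aggregates, here is a small sketch with made-up data. The `sales` DataFrame and its values are hypothetical; the idea is that the raw rows are first reduced to one number per category, and that reduced result is what you hand to the bar plot.

```python
import pandas as pd

# Hypothetical raw data: several rows per category.
sales = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A"],
    "Values": [10, 20, 30, 40, 50, 60],
})

# Aggregate first: one total per category.
totals = sales.groupby("Category")["Values"].sum()
print(totals)

# In the notebook, after cf.go_offline(), the result could then be plotted with:
# totals.iplot(kind='bar')
```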

In [32]:
df.head()

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -0.531155 | 0.237445 | -0.024494 | 0.168319 |
| 1 | 0.191832 | 0.554728 | 0.997129 | 0.48092 |
| 2 | 0.958364 | -0.333033 | 0.870559 | -0.57073 |
| 3 | 0.697636 | -0.297803 | 0.858096 | 0.867976 |
| 4 | 0.250548 | -0.470834 | 2.182585 | -0.025597 |


## Boxplots

In [33]:
#Box plot
#You can turn individual columns on and off by clicking on them in the legend
df.iplot(kind='box')
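
For reference, the numbers a box plot summarises (median and quartiles) can also be computed directly with pandas. This is a sketch on made-up data, with `demo` as a hypothetical stand-in for `df`. Plotly may compute quartiles with a slightly different method than pandas' default, so the values you read off the interactive plot may differ a little.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD"))

# First quartile, median and third quartile for every column.
quartiles = demo.quantile([0.25, 0.5, 0.75])
quartiles.index = ["Q1", "median", "Q3"]

# Interquartile range: the height of each box.
iqr = quartiles.loc["Q3"] - quartiles.loc["Q1"]

print(quartiles)
print(iqr)
```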

In [37]:
#3D surface plot
#We are going to make a new dataframe
df3 = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,20,10],'z':[5,4,3,2,1]})

In [38]:
df3

|   | x | y | z |
|---|---|---|---|
| 0 | 1 | 10 | 5 |
| 1 | 2 | 20 | 4 |
| 2 | 3 | 30 | 3 |
| 3 | 4 | 20 | 2 |
| 4 | 5 | 10 | 1 |


## 3D Surface

In [40]:
# Now we have df3, which has 3 columns: x, y and z.
# We can create a 3-dimensional surface plot of these three variables.
# You can change the colorscale of the plot,
# for instance colorscale='rdylbu' gives a red, yellow, blue colorscale.
# See the documentation for more.
df3.iplot(kind='surface',colorscale='rdylbu')
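
A surface needs a height for every combination of the two horizontal coordinates. My understanding, which you should treat as an assumption and check against the cufflinks documentation, is that the DataFrame is read as a grid of heights rather than as a list of (x, y, z) points. Under that assumption, a grid-shaped DataFrame could be built as sketched below; the function `z = x**2 + y**2` and the name `grid` are made up for illustration.

```python
import numpy as np
import pandas as pd

x = np.linspace(-2, 2, 5)   # 5 positions along one axis
y = np.linspace(-1, 1, 3)   # 3 positions along the other axis

# meshgrid pairs every y with every x; both outputs have shape (3, 5).
xx, yy = np.meshgrid(x, y)
zz = xx**2 + yy**2          # one height per (x, y) combination

# Rows are labelled by y, columns by x, cells hold the heights.
grid = pd.DataFrame(zz, index=y, columns=x)
print(grid.shape)

# In the notebook, after cf.go_offline(), this could be plotted with:
# grid.iplot(kind='surface', colorscale='rdylbu')
```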

## Histogram

In [43]:
df['A'].iplot(kind='hist',bins=20)

In [44]:
#To see histograms of all the columns at once:
#they will overlap, but you can turn columns on and off in the legend to see one histogram at a time
df.iplot(kind='hist')
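
To see what a histogram counts, the same kind of binning can be done by hand with NumPy. This is a sketch on made-up data (`values` is hypothetical). Plotly chooses its own bin edges, so the interactive chart may not match these counts exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(100)

# Split the range of the data into 20 equal-width bins and count the values in each.
counts, edges = np.histogram(values, bins=20)

print(counts)   # number of values in each bin
print(edges)    # 21 edges delimit 20 bins
```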

## Spread

In [46]:
#Spread plot
#This type of plot is mostly used with stock data to see the spread between two stocks
df[['A','B']].iplot(kind='spread')
#The spread is the difference between two values
#(e.g. A-B is shown as the spread for each pair of values of A and B)


The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead
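
The spread described above is simply a row-by-row subtraction, so it can be checked by hand. The sketch below uses made-up prices; `prices` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical prices of two stocks on four days.
prices = pd.DataFrame({
    "A": [10.0, 11.0, 12.5, 12.0],
    "B": [9.0, 10.5, 11.0, 12.5],
})

# Spread: A minus B for each row.
spread = prices["A"] - prices["B"]
print(spread.tolist())  # [1.0, 0.5, 1.5, -0.5]
```

A negative spread on the last day means B closed above A.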



## Bubble

In [47]:
# Bubble plot
# Similar to a scatter plot, but here the size of the data points changes with the values of another variable
# You'll see this type of plot for things like
# world GDP in comparison to population and happiness factor, etc.
# It is quite common in United Nations reports
df.iplot(kind='bubble',x='A',y='B',size='C')

## scatter_matrix()
Similar to sns.pairplot()

In [48]:
#Scatter matrix plot
#Very similar to seaborn's pairplot
#It creates scatter plots for every pair of columns it can
#For a very large dataset, it will take a while to load and will be very slow,
#so use it with care; otherwise, you may crash your Python kernel
df.scatter_matrix()
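
One way to keep a scatter matrix responsive on a large dataset is to plot a random sample of the rows instead of all of them. This is a sketch, not a rule: `big` is a hypothetical large DataFrame and the sample size of 1,000 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame(rng.standard_normal((100_000, 4)), columns=list("ABCD"))

# Draw 1,000 rows at random; random_state makes the sample repeatable.
sample = big.sample(n=1_000, random_state=0)
print(sample.shape)

# In the notebook, after cf.go_offline(), plot the sample instead of the full data:
# sample.scatter_matrix()
```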