# Plotly and Cufflinks

## Importing packages

The basic packages numpy and pandas are imported along with plotly and cufflinks. We also import a few functions from plotly.offline, since we are working in offline mode.

In [3]:
import pandas as pd
import numpy as np
import plotly
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

To make sure that the plots are rendered inside the notebook, run the code below.

In [5]:
init_notebook_mode(connected=True)

To make sure that cufflinks can be used offline, run the code below.

In [6]:
cf.go_offline()

## Create dataframe

We will create three dataframes to plot. The first has 100 rows and 4 columns of random values. The second holds categorical keys with one value for each key. The third has 5 rows and 3 columns of fixed integer values.

In [8]:
dataset1 = pd.DataFrame(np.random.randn(100,4),columns='a b c d'.split())
dataset1.head()

|   | a | b | c | d |
|---|---|---|---|---|
| 0 | 1.160856 | 0.023137 | -0.581574 | 1.088865 |
| 1 | 0.10139 | -1.334721 | -0.118557 | 0.679182 |
| 2 | 1.066625 | -0.494051 | 1.370822 | 1.001304 |
| 3 | -0.725962 | -1.195338 | -0.016472 | -1.799026 |
| 4 | -1.547782 | 1.310474 | 0.536567 | 0.043641 |
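
Because the values are random, the numbers above change on every run. A minimal sketch of one way to make them reproducible, assuming numpy's seeded generator is acceptable here (the seed 42 and the name `seeded` are arbitrary choices for illustration, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same "random" values on every run.
# The seed value is an arbitrary assumption for this sketch.
rng = np.random.default_rng(42)

# Same shape as dataset1: 100 rows and 4 columns of standard-normal values.
seeded = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

print(seeded.shape)
print(seeded.head())
```

Re-running the cell then gives the same table, which makes the plots comparable between sessions.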


In [9]:
dataset2=pd.DataFrame({'Key':['a','b','c'], 'Value':[11,22,33]})
dataset2.head()

|   | Key | Value |
|---|---|---|
| 0 | a | 11 |
| 1 | b | 22 |
| 2 | c | 33 |


In [19]:
dataset3 = pd.DataFrame({'a':[11,22,33,44,55],'b':[10,20,30,10,20],'c':[5,4,3,2,1]})
dataset3.head()

|   | a | b | c |
|---|---|---|---|
| 0 | 11 | 10 | 5 |
| 1 | 22 | 20 | 4 |
| 2 | 33 | 30 | 3 |
| 3 | 44 | 10 | 2 |
| 4 | 55 | 20 | 1 |


## Line plot

The interactive line plot is created with the 'iplot' method. You can zoom in and out, hover over the plot to view the values, and save the plot as a PNG. Clicking a column name in the legend toggles that column's line on or off.

In [10]:
dataset1.iplot()

## Scatter plot

The interactive scatter plot is created by passing 'scatter' as the 'kind' when calling the 'iplot' method. The 'x' and 'y' arguments name the columns that are plotted against each other.

In [11]:
dataset1.iplot(kind='scatter',x='a',y='c')

By default, the points are connected by lines. To view the data points without the connecting lines, set 'mode' to 'markers'.

In [12]:
dataset1.iplot(kind='scatter',x='a',y='c',mode='markers')

## Bar plot

A bar plot is created by passing 'bar' as the value of the 'kind' argument. The 'x' and 'y' arguments name the columns used for the two axes.

In [14]:
dataset2.iplot(kind='bar',x='Key',y='Value')

Aggregations such as count() or sum() can also be applied to the dataframe first, and the result plotted as bars.

In [16]:
dataset1.count().iplot(kind='bar')

In [17]:
dataset1.sum().iplot(kind='bar')
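
To see what these bar plots are drawn from, it can help to look at the aggregated result itself. A small sketch using only pandas and numpy; the frame `df` stands in for dataset1 and its seed is an assumption:

```python
import numpy as np
import pandas as pd

# Stand-in for dataset1: 100 rows, 4 columns of random values (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

# Each aggregation collapses the frame to one value per column.
counts = df.count()  # number of non-null entries in each column
totals = df.sum()    # sum of each column

# Both results are Series indexed by column name, so each column
# becomes one bar when the Series is plotted with kind='bar'.
print(counts)
print(totals)
```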

## Box plots

Passing 'box' as the 'kind' argument creates a box plot. Hovering over a box shows the quartiles and median along with the minimum and maximum values. Click the column names in the legend to choose which boxes are displayed.

In [18]:
dataset1.iplot(kind='box')
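
The statistics shown on hover can be approximated directly in pandas. This is only a sketch: plotly computes quartiles with its own method, so the hover values may differ slightly from these. The frame `df` is a stand-in for dataset1 with an arbitrary seed.

```python
import numpy as np
import pandas as pd

# Stand-in for dataset1 (seed is an arbitrary assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

# Five-number summary per column: minimum, quartiles, median, maximum.
summary = df.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
summary.index = ["min", "q1", "median", "q3", "max"]

print(summary)
```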

## Surface plot

Passing 'surface' as the 'kind' creates a 3D surface plot, which can be rotated to view it from different angles.

In [20]:
dataset3.iplot(kind='surface')

The color scheme can be changed with the 'colorscale' argument. The scale name is built from color abbreviations such as rd (red), yl (yellow) and bu (blue), so 'rdylbu' runs from red through yellow to blue.

In [24]:
dataset3.iplot(kind='surface',colorscale='rdylbu')

## Histogram

Histograms can be drawn for a particular column of the dataset by setting 'kind' to 'hist'. The number of bins can also be set with the 'bins' argument.

In [25]:
dataset1['d'].iplot(kind='hist', bins=30)
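
To make the effect of the bin count concrete, the same kind of binning can be sketched with numpy. This is an illustration of binning in general, not necessarily the exact bin edges plotly chooses; `df` is a stand-in for dataset1 with an arbitrary seed.

```python
import numpy as np
import pandas as pd

# Stand-in for dataset1 (seed is an arbitrary assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

# Split the range of column 'd' into 30 equal-width bins and
# count how many values fall into each one.
counts, edges = np.histogram(df["d"], bins=30)

print(len(counts))   # one count per bin
print(len(edges))    # one more edge than there are bins
print(counts.sum())  # every row lands in exactly one bin
```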

If 'hist' is called on an entire dataframe, the histograms of all columns are overlaid on the same axes. Click the legend entries to select which columns are shown.

In [26]:
dataset1.iplot(kind='hist')

## Spread plot

A spread plot shows the line plot for the two selected columns, together with a second panel showing the spread, that is, the difference between them. It is commonly used in stock data analysis. Pass 'spread' as the 'kind' argument to create this plot.

In [28]:
dataset1[['a','d']].iplot(kind='spread')
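
The spread itself can also be computed by hand. The sketch below assumes the spread is the first selected column minus the second, which matches how the plot appears to behave but is stated here as an assumption; `df` is a stand-in for dataset1 with an arbitrary seed.

```python
import numpy as np
import pandas as pd

# Stand-in for dataset1 (seed is an arbitrary assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("abcd"))

# Row-by-row difference between the two selected columns.
# Assumption: the spread panel shows column 'a' minus column 'd'.
spread = df["a"] - df["d"]

print(spread.head())
```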



## Bubble plot

This plot is similar to the scatter plot. The only difference is that the size of each marker changes according to a third variable. Set 'kind' to 'bubble' to create this plot. The 'x' and 'y' values are passed just as in a scatter plot, and the column that controls the marker size is passed to the 'size' argument. Bubble plots are commonly used for comparisons such as GDP across countries.

In [29]:
dataset1.iplot(kind='bubble',x='a',y='b',size='c')
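
The idea of mapping a column to marker sizes can be sketched with a simple min-max rescaling. This is an illustration of the concept only, not cufflinks' actual scaling code; the helper `scale_sizes` and the 10 to 50 size range are hypothetical.

```python
import numpy as np

def scale_sizes(values, smallest=10.0, largest=50.0):
    """Linearly map values onto marker sizes between smallest and largest.

    Hypothetical helper for illustration; the size range is an assumption.
    """
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are equal, so every marker gets the middle size.
        return np.full(values.shape, (smallest + largest) / 2)
    return smallest + (values - values.min()) / span * (largest - smallest)

# Negative values are fine: only the relative position in the range matters.
print(scale_sizes([-1.5, 0.0, 2.5]).tolist())  # [10.0, 25.0, 50.0]
```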

## Scatter matrix plot

This plot is similar to the pair plot in the seaborn library. The 'scatter_matrix' method is called on the dataframe. All columns must be numerical for the scatter matrix to be drawn.

In [31]:
dataset1.scatter_matrix()
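
When a dataframe mixes text and numeric columns, one way to meet the numeric-only requirement is to filter the columns first. A minimal sketch with pandas; the frame `mixed` and its 'Score' column are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
mixed = pd.DataFrame({
    "Key": ["a", "b", "c"],
    "Value": [11, 22, 33],
    "Score": [0.5, 1.5, 2.5],
})

# Keep only the integer and float columns.
numeric_only = mixed.select_dtypes(include="number")

print(numeric_only.columns.tolist())  # ['Value', 'Score']
```

The filtered frame can then be passed to 'scatter_matrix' in place of the original.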