# CrashDS

#### Module 5 : Graphics with Plotly

Datasets from ISLR by *James et al.* : `Advertising.csv` and `Heart.csv`         
Source: http://faculty.marshall.usc.edu/gareth-james/ISL/data.html     

---

### Essential Libraries

Let us begin by importing the essential Python libraries.    
Most of them come pre-installed with the Anaconda platform; any missing library can be installed using `conda install <library>`.

> NumPy : Library for Numeric Computations in Python  
> Pandas : Library for Data Acquisition and Preparation  
> Matplotlib : Low-level library for Data Visualization  
> Seaborn : Higher-level library for Data Visualization  

In [None]:
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt # we only need pyplot
sb.set() # set the default Seaborn style for graphics

Import the most common `plotly` modules for graphics and visualization.     
Plotly may not come pre-installed with your Anaconda distribution; if it is missing, install it using `conda install plotly`.

In [None]:
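import plotly.offline as py                 # offline plotting inside the notebook (iplot)
import plotly.figure_factory as ff          # high-level figure factories
import plotly.graph_objects as go           # traces, layouts and figures (formerly plotly.graph_objs)
from plotly.subplots import make_subplots   # grids of subplots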

In [None]:
# Activate inline plotting in notebook
py.init_notebook_mode(connected = False)
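If the imports above fail with a `ModuleNotFoundError`, Plotly is probably not installed in the active environment. The following is a minimal sketch of a check using only the standard library; the helper name `is_installed` is hypothetical and not part of any package.

```python
import importlib.util

def is_installed(module_name):
    """Hypothetical helper: True if `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None

if is_installed("plotly"):
    import plotly
    print("plotly", plotly.__version__, "is installed")
else:
    print("plotly is missing: run `conda install plotly` (or `pip install plotly`)")
```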

---

## Case Study : Advertising Budget vs Sales


### Import the Dataset

The dataset is in CSV format; hence we use the `read_csv` function from Pandas.  
Immediately after importing, take a quick look at the data using the `head` function.

In [None]:
# Load the CSV file and check the format
advData = pd.read_csv('Advertising.csv')
advData.head()

### Format the Dataset

Drop the `Unnamed: 0` column: it only holds the row numbers that were saved into the CSV file, so it contributes nothing to the problem.   
Rename the other columns for a consistent naming style.      

Check the format and summary statistics of the modified dataframe.     

In [None]:
# Drop the first column (axis = 1) by its name
advData = advData.drop('Unnamed: 0', axis = 1)

# Rename the other columns as per your choice ("TV" keeps its name)
advData = advData.rename(columns = {"radio": "RD", "newspaper": "NP", "sales": "Sales"})

# Check the modified dataset: summary statistics, then column types and counts
print(advData.describe())
advData.info()
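As an alternative to dropping the unnamed column afterwards, `read_csv` can use it as the row index directly via `index_col = 0`. The sketch below shows this on a tiny made-up CSV string (placeholder values, not the real data) so that it runs without `Advertising.csv`; only the header layout is meant to mimic the file.

```python
import io
import pandas as pd

# Tiny stand-in for Advertising.csv: same header layout, placeholder values only
csvText = """,TV,radio,newspaper,sales
1,10.0,1.0,2.0,3.0
2,20.0,2.0,4.0,6.0
"""

# index_col = 0 turns the unnamed first column into the row index,
# so no 'Unnamed: 0' column is created
demo = pd.read_csv(io.StringIO(csvText), index_col = 0)
demo = demo.rename(columns = {"radio": "RD", "newspaper": "NP", "sales": "Sales"})
print(demo.columns.tolist())   # ['TV', 'RD', 'NP', 'Sales']
```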

### Distribution of Sales

You can use `plotly` to produce a cool histogram for `Sales`.

In [None]:
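# Get Histogram from the graph objects module (go)
# histnorm = 'density' makes each bar height = (count in bin) / (bin width)
trace = go.Histogram(x = advData['Sales'], histnorm = 'density')
layout = go.Layout(title = 'Sales Distribution',
                   xaxis = dict(title = 'Sales'),
                   yaxis = dict(title = 'Density'))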
data = [trace]
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)
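With `histnorm = 'density'`, each bar height is the count in the bin divided by the bin width, so the bar areas add up to the number of observations; `histnorm = 'probability density'` additionally divides by the sample size, so the areas add up to 1. The NumPy sketch below reproduces both scalings on placeholder values. Plotly chooses its own bins, so this only illustrates the arithmetic, not the exact bars of the plot above.

```python
import numpy as np

values = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 7.5])  # placeholder data
counts, edges = np.histogram(values, bins = 4)
widths = np.diff(edges)

density = counts / widths                        # like histnorm = 'density'
probDensity = counts / (widths * len(values))    # like histnorm = 'probability density'

print(np.sum(density * widths))      # total area, approximately len(values)
print(np.sum(probDensity * widths))  # total area, approximately 1
```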

### Distribution of Other Variables

You can visualize cool boxplots with `plotly` too.

In [None]:
trace0 = go.Box(x = advData['TV'], showlegend = False, name = "TV")
trace1 = go.Box(x = advData['RD'], showlegend = False, name = "Radio")
trace2 = go.Box(x = advData['NP'], showlegend = False, name = "Newspaper")

data = [trace0, trace1, trace2]
py.iplot(data)
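Each box spans the first to the third quartile with a line at the median, and by default Plotly ends the whiskers at the most extreme sample points that stay within 1.5 times the interquartile range (IQR) of the box, showing anything beyond as an outlier. A rough NumPy sketch of that rule follows, on placeholder data; the helper `box_summary` is hypothetical, and Plotly's default `quartilemethod` may compute quartiles slightly differently from `np.percentile`, so treat the numbers as an approximation.

```python
import numpy as np

def box_summary(values):
    """Hypothetical helper: approximate the statistics a box plot displays."""
    values = np.sort(np.asarray(values, dtype = float))
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lowBound, highBound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lowBound) & (values <= highBound)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lowerWhisker": inside.min(),   # most extreme points still within the bounds
        "upperWhisker": inside.max(),
        "outliers": values[(values < lowBound) | (values > highBound)].tolist(),
    }

print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In the notebook you could, for instance, call `box_summary(advData['NP'])` to compare against the Newspaper box.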

### Sales vs Other Variables

You can also use `plotly` subplots to stack scatter plots of `Sales` against `TV` and against `Radio`.

In [None]:
fig = make_subplots(rows = 2, cols = 1, 
                    subplot_titles = ('TV vs. Sales', 
                                      'Radio vs. Sales'))

p1 = go.Scatter(
        x = advData['TV'],
        y = advData['Sales'],
        mode = 'markers', showlegend = False)

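fig.add_trace(p1, row = 1, col = 1)

p2 = go.Scatter(
        x = advData['RD'],
        y = advData['Sales'],
        mode = 'markers', showlegend = False)

fig.add_trace(p2, row = 2, col = 1)

# Label the axes of each subplot
fig.update_xaxes(title_text = 'TV', row = 1, col = 1)
fig.update_xaxes(title_text = 'Radio', row = 2, col = 1)
fig.update_yaxes(title_text = 'Sales')

fig.update_layout(height = 1200, width = 1000)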
py.iplot(fig)

### Relationship between the Variables

You can of course plot the heatmap for correlation as well.

In [None]:
# Correlation Matrix (computed once, reused for the heatmap)
corrMatrix = advData.corr()
print(corrMatrix)

# Heatmap of the Correlation Matrix
trace = go.Heatmap(z = corrMatrix.values, 
                   x = corrMatrix.columns, 
                   y = corrMatrix.index, 
                   colorscale = 'Hot',
                   reversescale = True)
data = [trace]
py.iplot(data, filename='labelled-heatmap')

#### References

More such statistical graphics are demonstrated at https://plot.ly/python/statistical-charts/     
Basic charts and plots by Plotly are demonstrated at https://plot.ly/python/basic-charts/    
Fundamental components of Plotly are illustrated at https://plot.ly/python/plotly-fundamentals/    

---

## Cool Examples

Here are a few cool examples from the `plotly` website.

### 3D Interactive Plot

In [None]:
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
sGrid, tGrid = np.meshgrid(s, t)   # sGrid varies along s, tGrid along t

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = go.Surface(x=x, y=y, z=z)
data = [surface]

# Shared style for the three axes of the 3D scene
axisStyle = dict(
    gridcolor='rgb(255, 255, 255)',
    zerolinecolor='rgb(255, 255, 255)',
    showbackground=True,
    backgroundcolor='rgb(230, 230, 230)'
)

layout = go.Layout(
    title='Parametric Plot',
    scene=dict(xaxis=axisStyle, yaxis=axisStyle, zaxis=axisStyle)
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-parametric_plot')
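The surface above is built like spherical coordinates whose radius varies with position: each point sits at distance r = 2 + sin(7s + 5t) from the origin. A quick NumPy check of that claim, reusing the same formulas on a coarser grid, is sketched below.

```python
import numpy as np

# Same parametrisation as the cell above, on a coarser 30 x 30 grid
s = np.linspace(0, 2 * np.pi, 30)
t = np.linspace(0, np.pi, 30)
sGrid, tGrid = np.meshgrid(s, t)   # sGrid varies along s, tGrid along t

r = 2 + np.sin(7 * sGrid + 5 * tGrid)
x = r * np.cos(sGrid) * np.sin(tGrid)
y = r * np.sin(sGrid) * np.sin(tGrid)
z = r * np.cos(tGrid)

# Distance from the origin equals r, since cos^2 + sin^2 = 1 is applied twice
distance = np.sqrt(x**2 + y**2 + z**2)
print(np.allclose(distance, r))
```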

### Gapminder-style World Statistics

In [None]:
url = 'https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv'
dataset = pd.read_csv(url)

# make list of years (as strings) straight from the data, so that no year is skipped
years = [str(y) for y in sorted(dataset['year'].unique())]
# make list of continents, in order of first appearance
continents = list(dataset['continent'].unique())
# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [30, 85], 'title': 'Life Expectancy'}
figure['layout']['yaxis'] = {'title': 'GDP per Capita', 'type': 'log'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                                  'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]

sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

# helper: one scatter trace (as a dict) per continent for a given year
def make_traces(year):
    traces = []
    for continent in continents:
        dataset_by_year_and_cont = dataset[(dataset['year'] == int(year)) &
                                           (dataset['continent'] == continent)]
        traces.append({
            'x': list(dataset_by_year_and_cont['lifeExp']),
            'y': list(dataset_by_year_and_cont['gdpPercap']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_cont['country']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 200000,
                'size': list(dataset_by_year_and_cont['pop'])
            },
            'name': continent
        })
    return traces

# make data: the initial view shows the first year
figure['data'] = make_traces(years[0])

# make frames: one animation frame and one slider step per year
for year in years:
    figure['frames'].append({'data': make_traces(year), 'name': str(year)})
    slider_step = {
        'args': [
            [year],
            {'frame': {'duration': 300, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': 300}}
        ],
        'label': year,
        'method': 'animate'
    }
    sliders_dict['steps'].append(slider_step)

figure['layout']['sliders'] = [sliders_dict]

py.iplot(figure)

**Find out what else you can do with Plotly!** Explore on your own.