# Interactive Data Visualisations



## Introduction


In this lesson, we will introduce two new libraries, Plotly and cufflinks, to construct our interactive visualizations. Plotly is an open-source visualization library, and cufflinks is a wrapper around Plotly that lets you produce interactive visualizations directly from pandas DataFrames.


In this lesson we will learn:

- The basics of interactive visuals.

- How we can extend these tools to create interactive widgets.

Before we jump into the lesson, let's make sure both of these libraries are installed.

In [None]:
#$ pip install plotly
#$ pip install cufflinks
#$ pip install chart-studio
# conda install -c conda-forge jupyterlab-plotly-extension




Let's also import the libraries we are going to need for this lesson and set cufflinks to offline mode, so that our interactive charts are rendered directly in the Jupyter Notebook. Without offline mode, each visualization we create would be pushed up to your Plotly account.

**NB: FOLLOW THESE STEPS:**

- pip install chart-studio
- conda install -c conda-forge jupyterlab-plotly-extension (to be able to run the plots in your notebook)
- pip install ipywidgets
- jupyter labextension install @jupyter-widgets/jupyterlab-manager (to be able to run widget-driven plots in your notebook)
- Import/call ALL of the following:

    - import chart_studio.plotly as py
    - import cufflinks as cf
    - import pandas as pd
    - import numpy as np
    - %matplotlib inline
    - cf.go_offline()
    - import ipywidgets as widgets
    - from ipywidgets import interact, interactive, fixed, interact_manual



In [1]:
# Cufflinks is a library that connects the Pandas data frame with Plotly 
# enabling users to create visualizations directly from Pandas.
import chart_studio.plotly as py
import cufflinks as cf
import pandas as pd
%matplotlib inline

cf.go_offline()

## Our Data


The data set we will be using for this lesson is the Telco Customer Churn data set from Kaggle. Let's go ahead and import the data set using the pandas read_csv function and take a look at the columns we have to work with.

In [2]:
df = pd.read_csv('churn.csv')
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [3]:
# Show the first five rows
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


To make our lives easier, let's first convert the Churn labels ("Yes"/"No") to ones and zeros.

In [4]:
# Map the Yes/No labels to 1/0 so the column is numeric (int64)
df['Churn'] = df['Churn'].map({'No': 0, 'Yes': 1})

In [6]:
df['Churn']

0       0
1       0
2       1
3       0
4       1
       ..
7038    0
7039    0
7040    0
7041    1
7042    0
Name: Churn, Length: 7043, dtype: int64

In [7]:
df = df.rename(columns={"Churn": "ChurnBinary"})

## Interactive Histograms

The first type of interactive visualization we will be generating with Plotly and cufflinks is a histogram. 

Histograms help us visualize how the values in a specific field are distributed. 

Let's take the TotalCharges column of our data set and create a basic interactive histogram for it.

In [8]:
data = df['TotalCharges']

data.iplot(kind='hist', xTitle='Total Charges', yTitle='Count', 
           title='Total Charges Distribution')
 

Instead of counts, we can also show percentages by passing an additional argument: histnorm='percent'.

In [9]:
data.iplot(kind='hist', xTitle='Total Charges', histnorm='percent', yTitle='Percentage', 
           title='Total Charges Distribution')
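To see what histnorm='percent' changes, here is a small sketch that computes percentage-normalised bin heights by hand with NumPy. The charge values and the choice of four bins are made up for illustration; they are not taken from the data set. Each bar becomes its count divided by the total number of observations, times 100, so the bars add up to 100 instead of to the number of rows.

```python
import numpy as np

# Made-up charge values standing in for df['TotalCharges'] (illustrative only)
charges = np.array([20.0, 25.0, 30.0, 110.0, 115.0, 400.0, 420.0, 800.0])

# Raw counts per bin, as in the default (count) histogram
counts, edges = np.histogram(charges, bins=4)

# Percent normalisation: each bin's share of all observations, times 100
percent = counts / counts.sum() * 100

print(counts)
print(percent)
```

The shape of the histogram is identical in both versions; only the y-axis scale changes.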

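Now let's make a histogram with more dimensions: the monthly charge split by payment method. To do this, we first pivot the data so that there is one row per customer and each payment method becomes its own column.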

In [11]:
data = df.pivot_table(values='MonthlyCharges', columns='PaymentMethod', index='customerID', aggfunc='sum')
data

| customerID \ PaymentMethod | Bank transfer (automatic) | Credit card (automatic) | Electronic check | Mailed check |
|---|---|---|---|---|
| 0002-ORFBO | | | | 65.60 |
| 0003-MKNFE | | | | 59.90 |
| 0004-TLHLJ | | | 73.9 | |
| 0011-IGKFF | | | 98.0 | |
| 0013-EXCHZ | | | | 83.90 |
| ... | ... | ... | ... | ... |
| 9987-LUTYD | | | | 55.15 |
| 9992-RRAMN | | | 85.1 | |
| 9992-UJOEL | | | | 50.30 |
| 9993-LHIEB | | | | 67.85 |
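The pivot is what makes a multi-series histogram possible: cufflinks draws one histogram per column. The sketch below reproduces the reshaping on a tiny, made-up frame (the customer IDs and charges are invented). Each customer's charge lands in the column for their payment method, and the cells for the other methods are NaN, so each column ends up holding only the charges for one method.

```python
import pandas as pd

# Hypothetical miniature version of the churn data (values are made up)
toy = pd.DataFrame({
    'customerID': ['A', 'B', 'C', 'D'],
    'PaymentMethod': ['Mailed check', 'Electronic check',
                      'Mailed check', 'Credit card (automatic)'],
    'MonthlyCharges': [65.6, 73.9, 59.9, 98.0],
})

# One row per customer, one column per payment method; a customer has no
# charge under the other methods, so those cells are NaN
wide = toy.pivot_table(values='MonthlyCharges', index='customerID',
                       columns='PaymentMethod', aggfunc='sum')

print(wide.columns.tolist())
print(wide['Mailed check'].dropna().tolist())
```

Because every customer appears exactly once, aggfunc='sum' simply returns each customer's own charge.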


In [24]:
data.iplot(kind='hist', xTitle='Charge', histnorm='percent', yTitle='Percentage of Customers Who Paid That Amount', 
           title='Monthly Charge by Payment Method')

This plot contains the information we want. However, the four distributions overlap a lot, which makes the results hard to interpret.

We can remedy this by splitting the plot into four subplots, one per payment method. We only need to include one additional argument: subplots=True.

In [22]:
data.iplot(kind='hist', xTitle='Charge', subplots=True, histnorm = 'percent', yTitle='Percentage', 
           title='Monthly Charge by Payment Method')

## Interactive Line Charts

Let us now make some interactive line charts. 

In order to do so, we will look at the average churn rate for each tenure. 

Tenure is simply the number of months the customer has stayed with the company. 

In [25]:
data = df.groupby('tenure', as_index=False).agg({'ChurnBinary':'mean'})
data.head()

| | tenure | ChurnBinary |
|---|---|---|
| 0 | 0 | 0.0 |
| 1 | 1 | 0.619902 |
| 2 | 2 | 0.516807 |
| 3 | 3 | 0.47 |
| 4 | 4 | 0.471591 |


Keep in mind that a higher churn rate means that customers are more likely to leave the company, either for a competitor or because they simply stop using the product.
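Because ChurnBinary holds 1 for customers who left and 0 for those who stayed, its mean within each tenure group is exactly the fraction of those customers who churned. A minimal check on made-up rows (not from the data set):

```python
import pandas as pd

# Made-up rows: three customers at tenure 1 (two churned), two at tenure 2 (none churned)
toy = pd.DataFrame({'tenure': [1, 1, 1, 2, 2],
                    'ChurnBinary': [1, 1, 0, 0, 0]})

# The aggregation used in the lesson
rates = toy.groupby('tenure', as_index=False).agg({'ChurnBinary': 'mean'})

# The same numbers computed explicitly as churned / total per tenure
manual = toy.groupby('tenure')['ChurnBinary'].sum() / toy.groupby('tenure').size()

print(rates)
print(manual)
```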

Let us now plot the development of the average churn rate for each tenure (months of being a customer). 

In [26]:
data.iplot(kind='line', x='tenure', xTitle='Tenure', color='blue',
           yTitle='Avg. Churn Rate', title='Avg. Churn Rate by Tenure')

This gives us a sense of what happens with aggregate churn rates, but how does that differ by demographics? 



To see this, we can again create pivot tables, one for each of three demographic fields (gender, SeniorCitizen, and Partner), to add this extra layer of granularity to the plot.

In [27]:
gender = df.pivot_table(values='ChurnBinary', columns='gender', 
                        index='tenure', aggfunc='mean')

senior = df.pivot_table(values='ChurnBinary', columns='SeniorCitizen', 
                        index='tenure', aggfunc='mean')

partner = df.pivot_table(values='ChurnBinary', columns='Partner', 
                         index='tenure', aggfunc='mean')

In [28]:
gender.head()

| tenure \ gender | Female | Male |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | 0.65493 | 0.589666 |
| 2 | 0.523077 | 0.509259 |
| 3 | 0.49505 | 0.444444 |
| 4 | 0.534091 | 0.409091 |


In [29]:
senior.head()

| tenure \ SeniorCitizen | 0 | 1 |
|---|---|---|
| 0 | 0.0 | |
| 1 | 0.580645 | 0.860465 |
| 2 | 0.476923 | 0.697674 |
| 3 | 0.422857 | 0.8 |
| 4 | 0.466667 | 0.5 |


Before we can plot, we need to concatenate all three separate dataframes. 

In [30]:
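data = pd.concat([gender, senior, partner], axis=1)
# Names follow the pivot column order: gender (Female, Male),
# SeniorCitizen (0, 1), Partner (No, Yes)
data.columns = ['Female', 'Male', 'NonSenior', 'Senior', 'Single', 'Partner']
data = data.reset_index()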

In [31]:
data

| | tenure | Female | Male | NonSenior | Senior | Single | Partner |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.000000 | 0.000000 | 0.000000 | | 0.000000 | 0.000000 |
| 1 | 1 | 0.654930 | 0.589666 | 0.580645 | 0.860465 | 0.609709 | 0.673469 |
| 2 | 2 | 0.523077 | 0.509259 | 0.476923 | 0.697674 | 0.542105 | 0.416667 |
| 3 | 3 | 0.495050 | 0.444444 | 0.422857 | 0.800000 | 0.464052 | 0.489362 |
| 4 | 4 | 0.534091 | 0.409091 | 0.466667 | 0.500000 | 0.476562 | 0.458333 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 68 | 68 | 0.083333 | 0.096154 | 0.057471 | 0.307692 | 0.038462 | 0.108108 |
| 69 | 69 | 0.025000 | 0.127273 | 0.073171 | 0.153846 | 0.160000 | 0.057143 |
| 70 | 70 | 0.080645 | 0.105263 | 0.088235 | 0.117647 | 0.130435 | 0.083333 |
| 71 | 71 | 0.011905 | 0.058140 | 0.035971 | 0.032258 | 0.034483 | 0.035461 |
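Assigning the six column names by position depends on the column order of each pivot table (Female/Male, 0/1, No/Yes). As an alternative sketch, pd.concat with keys= labels every column with the field it came from, and the names can then be generated from those labels. The toy rows are made up, the example uses gender and Partner only to stay short, and the 'field=value' naming scheme is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up rows standing in for the churn data
toy = pd.DataFrame({
    'tenure': [1, 1, 2, 2],
    'gender': ['Female', 'Male', 'Female', 'Male'],
    'Partner': ['Yes', 'No', 'No', 'Yes'],
    'ChurnBinary': [1, 0, 0, 1],
})

fields = ['gender', 'Partner']
tables = [toy.pivot_table(values='ChurnBinary', columns=f, index='tenure', aggfunc='mean')
          for f in fields]

# keys= adds an outer column level naming the field each table came from
combined = pd.concat(tables, axis=1, keys=fields)

# Flatten the two-level columns into names such as 'gender=Female'
combined.columns = [f'{field}={value}' for field, value in combined.columns]
combined = combined.reset_index()

print(combined.columns.tolist())
```

The names now travel with the data, so reordering or adding fields cannot silently mislabel a column.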


We now have the data in the format we need to create an interactive multi-line chart. To do that, we call the iplot method again and pass the appropriate values to each argument.

In [32]:
data.iplot(kind='line', x='tenure', xTitle='Tenure', 
           yTitle='Avg. Churn Rate', title='Avg. Churn Rate by Demographics')

## Interactive Scatter Plots


Another useful type of interactive visualization is the scatter plot. As with line charts, one of the most valuable features of an interactive scatter plot is the ability to hover over each point and see its values.

With thousands of customers, however, a scatter plot quickly becomes crowded and the points overlap. In the following example, we keep the benefit of hovering while avoiding this clutter by filtering our data set down to a subset we would like to investigate: customers on one-year contracts who pay by credit card.

In [33]:
data = df[(df['Contract']=='One year') & (df['PaymentMethod']=='Credit card (automatic)')]

In [34]:
data.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | ChurnBinary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 8091-TTVAX | Male | 0 | Yes | No | 58 | Yes | Yes | Fiber optic | No | ... | Yes | No | Yes | Yes | One year | No | Credit card (automatic) | 100.35 | 5681.1 | 0 |
| 54 | 4667-QONEA | Female | 1 | Yes | Yes | 60 | Yes | No | DSL | Yes | ... | Yes | Yes | No | Yes | One year | Yes | Credit card (automatic) | 74.85 | 4456.35 | 0 |
| 56 | 8769-KKTPH | Female | 0 | Yes | Yes | 63 | Yes | Yes | Fiber optic | Yes | ... | No | No | Yes | Yes | One year | Yes | Credit card (automatic) | 99.65 | 6311.2 | 0 |
| 63 | 0557-ASKVU | Female | 0 | Yes | Yes | 18 | Yes | No | DSL | No | ... | Yes | Yes | No | No | One year | Yes | Credit card (automatic) | 54.4 | 957.1 | 0 |
| 76 | 6416-JNVRK | Female | 0 | No | No | 46 | Yes | No | DSL | No | ... | No | No | No | Yes | One year | No | Credit card (automatic) | 55.65 | 2688.85 | 0 |
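Each condition inside the brackets is a Boolean Series, and & keeps only the rows where both are True. The parentheses are required because in Python & binds more tightly than ==. DataFrame.query expresses the same filter as a string; the sketch below, on made-up rows, checks that both approaches return the same subset.

```python
import pandas as pd

# Made-up rows with the two columns used in the filter
toy = pd.DataFrame({
    'Contract': ['One year', 'One year', 'Month-to-month', 'One year'],
    'PaymentMethod': ['Credit card (automatic)', 'Mailed check',
                      'Credit card (automatic)', 'Credit card (automatic)'],
    'tenure': [58, 12, 3, 46],
})

# Boolean-mask version, as in the lesson
mask = (toy['Contract'] == 'One year') & (toy['PaymentMethod'] == 'Credit card (automatic)')
subset = toy[mask]

# Equivalent query() version
same = toy.query("Contract == 'One year' and PaymentMethod == 'Credit card (automatic)'")

print(subset.index.tolist())
```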


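We then call the iplot method to generate our interactive visualization. The categories argument gives each InternetService type its own color, and hovering over a point shows its tenure and total charges.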

In [35]:
data.iplot(x='tenure', y='TotalCharges', categories='InternetService',
           xTitle='Tenure', yTitle='Total Charges',
           title='Charges vs. Tenure: One Year Contract, Credit Card Customers')

## ipywidgets

In the first part of this lesson, we covered how to make interactive, JavaScript-based visualizations using Python and the Plotly and cufflinks libraries.

The interactivity doesn't have to stop there. Here we are going to learn how to make our visualizations even more interactive with widgets such as sliders, drop-down boxes, and check boxes.

To incorporate widgets into our visualizations, we need to make sure the ipywidgets library is installed.

In [None]:
# $ pip install ipywidgets
# jupyter labextension install @jupyter-widgets/jupyterlab-manager

Let's go ahead and import everything we are going to need for this lesson.

In [36]:
import chart_studio.plotly as py
import cufflinks as cf
import pandas as pd

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

cf.go_offline()
%matplotlib inline


Finally, for this part of the lesson we will keep using the same churn data set from the Plotly and cufflinks part, so make sure it is loaded as well, including the ChurnBinary column created earlier.

### The Interact Decorator

The ipywidgets library offers a variety of ways to create interactive widgets, but the interact decorator is the easiest way to get started and one of the most useful.


The interact decorator accepts a few different types of inputs, and the format of those inputs determines the type of widget that is displayed:

- **Slider**: a numeric value, (min, max), or (min, max, step);
- **Drop-Down Box**: a list or dictionary;
- **Check Box**: True or False values.


We will see an example of each of these in the sections below.
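The mapping from argument value to widget type can be summarised as a small lookup. The function below is a hypothetical, simplified sketch written for this lesson, not ipywidgets' own code. It tests for bool before numbers because in Python True and False are also integers, and a check box should win over a slider. The real library handles more cases than this (for example, a plain string becomes a text box).

```python
from numbers import Real

def widget_kind(abbrev):
    """Hypothetical helper: name the widget type interact would build for `abbrev`."""
    # bool must be tested first: isinstance(True, int) is True in Python
    if isinstance(abbrev, bool):
        return 'Checkbox'
    # A single number becomes a slider
    if isinstance(abbrev, Real):
        return 'Slider'
    # So does a (min, max) or (min, max, step) tuple of numbers
    if (isinstance(abbrev, tuple) and len(abbrev) in (2, 3)
            and all(isinstance(v, Real) and not isinstance(v, bool) for v in abbrev)):
        return 'Slider'
    # Lists and dicts become drop-down boxes
    if isinstance(abbrev, (list, dict)):
        return 'Dropdown'
    raise ValueError(f'no widget rule for {abbrev!r} in this sketch')

for value in [10, (8, 73), (0, 100, 5), ['gender', 'Partner'], {'Yes': 1, 'No': 0}, True]:
    print(repr(value), '->', widget_kind(value))
```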

### Interactive Sliders


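One of the most useful widgets for numeric variables is the slider, which lets you adjust the numeric inputs that are fed to your visualizations. In the example below, the tuple (8, len(df['tenure'].unique())) becomes an integer slider that controls the number of histogram bins.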


In [37]:
@interact(bins = (8, len(df['tenure'].unique())))
def hist(bins):
    df['tenure'].iplot(kind='hist', bins=bins, title='Tenure Distribution')

interactive(children=(IntSlider(value=40, description='bins', max=73, min=8), Output()), _dom_classes=('widget…
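Every time the slider moves, interact calls hist again with the new integer, and the same tenure values are re-binned. The NumPy sketch below uses made-up tenure values and a few arbitrary slider positions to show that changing bins changes how many bars there are, never how many customers are counted. Note also that len(df['tenure'].unique()) is the number of distinct tenure values; for a column without missing values, df['tenure'].nunique() returns the same number.

```python
import numpy as np

# Made-up tenure values in months (illustrative only)
tenure = np.array([1, 2, 2, 5, 10, 24, 36, 60, 71, 72])

for bins in (8, 20, 40):          # a few possible slider positions
    counts, _ = np.histogram(tenure, bins=bins)
    # One count per bar, and the bars always add up to the number of customers
    print(bins, len(counts), counts.sum())
```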

### Interactive Drop-Down Boxes


Another useful widget, when you'd like to view the same visualization for different fields or different field values, is the drop-down box. Here, passing a list of field names creates a drop-down box, and the chart is redrawn for whichever field is selected.

In [38]:
@interact(Selection=['gender', 'SeniorCitizen', 'Partner', 
                     'Dependents', 'InternetService', 'PaymentMethod'])
def linechart(Selection):
    data = df.pivot_table(values='ChurnBinary', columns=Selection,
                          index='tenure', aggfunc='mean').reset_index()

    # Use the field name as-is: str.title() would turn 'SeniorCitizen' into 'Seniorcitizen'
    data.iplot(kind='line', x='tenure', xTitle='Tenure', 
               yTitle='Avg. Churn Rate', title='Avg. Churn Rate by ' + Selection)

interactive(children=(Dropdown(description='Selection', options=('gender', 'SeniorCitizen', 'Partner', 'Depend…
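Whichever field is picked, the function body performs the same reshaping with a different columns= argument. Pulling that step into an ordinary function (churn_by is a hypothetical name) makes it easy to check without any widgets; the rows below are made up.

```python
import pandas as pd

def churn_by(frame, field):
    """Average churn per tenure, one column per value of `field` (hypothetical helper)."""
    return frame.pivot_table(values='ChurnBinary', columns=field,
                             index='tenure', aggfunc='mean').reset_index()

# Made-up rows standing in for the churn data
toy = pd.DataFrame({
    'tenure': [1, 1, 1, 2],
    'gender': ['Female', 'Male', 'Male', 'Female'],
    'Partner': ['Yes', 'No', 'Yes', 'No'],
    'ChurnBinary': [1, 1, 0, 0],
})

for field in ['gender', 'Partner']:
    print(churn_by(toy, field))
```

Groups with no customers at a given tenure (here, men at tenure 2) show up as NaN, which appears as a gap in the line chart.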

Let's look at another example where we use multiple drop-down boxes to filter our data down to a level where we can investigate it in detail.


We create four drop-down boxes containing the unique values of the gender, Partner, InternetService, and PaymentMethod fields respectively. By changing the selections in the drop-down boxes, we can focus on any subgroup of our data.

In [39]:
@interact(Gender=list(df['gender'].unique()), 
          Partner=list(df['Partner'].unique()),
          Internet=list(df['InternetService'].unique()), 
          Payment=list(df['PaymentMethod'].unique())
         )

def scatter(Gender, Partner, Internet, Payment):
    data = df[(df['gender']==Gender) & 
              (df['Partner']==Partner) & 
              (df['InternetService']==Internet) & 
              (df['PaymentMethod']==Payment)]

    data.iplot(kind='scatter', x='tenure', y='MonthlyCharges', 
               categories='Contract', text='customerID', 
               xTitle='Tenure', yTitle='Monthly Charges',
               title='Charges vs. Tenure')

interactive(children=(Dropdown(description='Gender', options=('Female', 'Male'), value='Female'), Dropdown(des…

Note that we added a text argument to the iplot call, so that hovering over a data point shows the customer's customerID in addition to the monthly amount they are paying.
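Not every combination of the four drop-downs matches any customers, and an empty frame produces an empty chart. One option, sketched below with a hypothetical filter_customers helper and made-up rows, is to build the mask in a loop over the selected values and report when nothing matches instead of plotting.

```python
import pandas as pd

def filter_customers(frame, selections):
    """Keep rows matching every {column: value} pair (hypothetical helper)."""
    mask = pd.Series(True, index=frame.index)
    for column, value in selections.items():
        mask &= frame[column] == value
    return frame[mask]

# Made-up rows with the four filter columns
toy = pd.DataFrame({
    'gender': ['Female', 'Male', 'Female'],
    'Partner': ['Yes', 'No', 'No'],
    'InternetService': ['DSL', 'Fiber optic', 'DSL'],
    'PaymentMethod': ['Mailed check', 'Electronic check', 'Mailed check'],
})

subset = filter_customers(toy, {'gender': 'Female', 'Partner': 'No',
                                'InternetService': 'DSL', 'PaymentMethod': 'Mailed check'})
if subset.empty:
    print('No customers match this combination')
else:
    print(subset.index.tolist())
```

Inside the decorated scatter function, the same emptiness check would go just before the iplot call.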

### Interactive Check Boxes



Interactive check boxes can also help you explore a data set, especially when there are binary fields whose impact you'd like to visualize. 


The way to do this is to map the state of each check box to a corresponding condition on the binary field, and then filter the data set using those conditions.

Below is an example that does exactly this. The interact decorator has two True/False arguments, which it renders as check boxes: one for Senior and one for PhoneService. Inside the barchart function, conditional statements translate those True/False values into conditions that we use to filter our data.

In [40]:
@interact(Senior=True, PhoneService=False)

def barchart(Senior, PhoneService):
    if Senior==True:
        senior = df['SeniorCitizen']==1
    else:
        senior = df['SeniorCitizen']==0
    
    if PhoneService==True:
        phone = df['PhoneService']=='Yes'
    else:
        phone = df['PhoneService']=='No'
    
    data = df[(senior) & (phone)].groupby('PaymentMethod').agg({'ChurnBinary':'mean'}).reset_index()
    
    data.iplot(kind='bar', x='PaymentMethod', xTitle='Payment Method',
               yTitle='Avg. Churn Rate', color='blue', 
               title='Churn Rate by Payment Method')

interactive(children=(Checkbox(value=True, description='Senior'), Checkbox(value=False, description='PhoneServ…
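Since each check box value is already a bool, the two if/else blocks can each be collapsed into a single comparison: int(True) is 1, and a conditional expression picks 'Yes' or 'No'. A sketch on made-up rows (the helper name checkbox_mask is hypothetical):

```python
import pandas as pd

def checkbox_mask(frame, senior, phone_service):
    """Translate two check box states into one row filter (hypothetical helper)."""
    is_senior = frame['SeniorCitizen'] == int(senior)            # True -> 1, False -> 0
    has_phone = frame['PhoneService'] == ('Yes' if phone_service else 'No')
    return is_senior & has_phone

# Made-up rows standing in for the churn data
toy = pd.DataFrame({
    'SeniorCitizen': [1, 1, 0, 0],
    'PhoneService': ['No', 'Yes', 'No', 'Yes'],
    'PaymentMethod': ['Mailed check', 'Electronic check',
                      'Mailed check', 'Bank transfer (automatic)'],
    'ChurnBinary': [1, 0, 0, 1],
})

# Same state as the lesson's defaults: Senior checked, PhoneService unchecked
rows = toy[checkbox_mask(toy, senior=True, phone_service=False)]
print(rows.groupby('PaymentMethod').agg({'ChurnBinary': 'mean'}).reset_index())
```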

## Summary

In this lesson we learnt:

- How to make histograms, line charts, and scatter plots with Plotly and cufflinks.

- How to make our plots even more interactive with ipywidgets.