# Interactive Data Visualisations



## Introduction

So far in the program, we have covered a variety of ways to visualize information. We have seen histograms, bar charts, line charts, scatter plots, and even some more advanced visualizations like box plots, violin plots, and scatter matrices.

However, these visualizations have been static and, while perfectly suitable and informative, the world is moving in the direction of interactive visualization. When we add interactivity to our visualizations, one of the benefits we immediately derive is the ability to view values of different data points simply by hovering over them. We also get the ability to zoom in and out as well as remove and add back different series of data points, which allows us to explore our data in ways that we simply could not with static visualizations.

In the past, constructing interactive visualizations in the browser meant learning JavaScript and a library such as D3. These days, there are Python libraries that sit on top of D3 and let us generate interactive visualizations directly from Python. In this lesson, we will be using two of these libraries, Plotly and cufflinks, to construct our interactive visualizations. Plotly is an open source visualization library, and cufflinks is a wrapper around Plotly that binds it to pandas data frames, so you can often produce an interactive visualization with a single line of code!

Before we jump into the lesson, let's make sure both of these libraries are installed.

In [None]:
#$ pip install plotly
#$ pip install cufflinks
#$ pip install chart-studio
# conda install -c conda-forge jupyterlab-plotly-extension




Let's also import the libraries we are going to need for this lesson and set cufflinks to offline mode so that our interactive charts render directly in the Jupyter Notebook. If we did not switch to offline mode, cufflinks would push each visualization we create up to your Plotly account.

**NB: Follow these steps:**

- pip install chart-studio
- conda install -c conda-forge jupyterlab-plotly-extension (to be able to run the plots in your notebook)
- pip install ipywidgets
- jupyter labextension install @jupyter-widgets/jupyterlab-manager (to be able to use interactive widgets in your notebook)
- Import/call ALL of the following:

    - import chart_studio.plotly as py
    - import cufflinks as cf
    - import pandas as pd
    - import numpy as np
    - %matplotlib inline
    - cf.go_offline()
    - import ipywidgets as widgets
    - from ipywidgets import interact, interactive, fixed, interact_manual



In [None]:
import chart_studio.plotly as py
import cufflinks as cf
import pandas as pd
%matplotlib inline

cf.go_offline()

## THE DATA
The data set we will be using for this lesson is the Telco Customer Churn data set from Kaggle. Let's go ahead and import the data set using the pandas read_csv function and take a look at what columns we have to work with.

In [None]:
df = pd.read_csv('../data/churn.csv')
df.columns

In [None]:
df.head()

In [None]:
# Recode Churn as numeric 0/1 so that averaging it later gives a churn rate
df['Churn'] = df['Churn'].map({'No': 0, 'Yes': 1})

In [None]:
df.head()

In [None]:
df = df.rename(columns={"Churn": "ChurnBinary"})

In [None]:
df.head()

In [None]:
#df['tenure']

## INTERACTIVE HISTOGRAMS
The first type of interactive visualization we will be generating with Plotly and cufflinks is a histogram. As mentioned in prior lessons, histograms help us visualize how the values in a specific field are distributed. For example, let's take the TotalCharges column of our data set and create a basic interactive histogram for it.

To do that, we call the iplot method and pass "hist" to the kind argument, the axis labels to the xTitle and yTitle arguments, and the chart title to the title argument. We will pass these last three arguments to every interactive chart we generate.

In [None]:
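# TotalCharges is read in as text (a few entries are blank), so convert it to numbers first
data = pd.to_numeric(df['TotalCharges'], errors='coerce')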

data.iplot(kind='hist', xTitle='Total Charges', yTitle='Count', 
           title='Total Charges Distribution')

The resulting plot is interactive. When you hover over each bar of the histogram, it should display the number of records that fall into the bin that bar represents.

Plotly and cufflinks allow us to do a few other interesting things with histograms. In the example below, we are creating a pivot table from our original data frame that has the different Internet Services represented in separate columns and Monthly Charges by customer as the values in the table.

We then create a histogram from this data set and, by default, it overlays the histograms of Monthly Charges for all three Internet Service categories. Note also that we have added a new histnorm argument and passed it the value "percent". This converts the y axis to show the percentage of records in each bin instead of the count.

In [None]:
data = df.pivot_table(values='MonthlyCharges', columns='InternetService', 
                      index='customerID', aggfunc='sum')

In [None]:
data

In [None]:
data.iplot(kind='hist', histnorm='percent', xTitle='Value', 
           yTitle='Percent', title='Monthly Charge by Internet Service')
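
To make the pivot table and the histnorm option more concrete, here is a minimal sketch on a made-up four-row data set. The values are invented for illustration, and the two bins used for the percentage calculation are an arbitrary choice rather than the bins Plotly would pick.

```python
import pandas as pd

# Toy stand-in for the churn data (values are made up for illustration).
toy = pd.DataFrame({
    'customerID': ['a', 'b', 'c', 'd'],
    'InternetService': ['DSL', 'Fiber optic', 'DSL', 'No'],
    'MonthlyCharges': [30.0, 80.0, 50.0, 20.0],
})

# One row per customer, one column per service. Each customer has a value
# only in the column for their own service and NaN everywhere else.
wide = toy.pivot_table(values='MonthlyCharges', columns='InternetService',
                       index='customerID', aggfunc='sum')
print(wide)

# histnorm='percent' rescales each histogram so that its bars sum to 100.
# The same idea by hand for the DSL column, using two arbitrary bins:
counts = pd.cut(wide['DSL'].dropna(), bins=[0, 40, 100]).value_counts(sort=False)
percent = counts / counts.sum() * 100
print(percent.tolist())
```

Because every column holds only the customers of one service, iplot can treat each column as its own series and draw one histogram per service.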

In addition to being able to hover over components in the visualization and see their values, another useful feature of these types of interactive visualizations is that you can filter categories when there is more than one of them and they are included in a legend. For example, to remove the No category, simply click on it in the legend. It disappears and the visualization automatically rescales itself according to the remaining data points. If you click it again, it reappears in the visualization.

Another useful feature that is great for histograms is the subplots argument. When it is included and set to True, instead of overlapping histograms, you get back a grid of histograms. In the example below, we create a pivot table where the columns represent different payment methods and the values represent monthly charges by customer. When we call the iplot method to generate histograms from this data and include the subplots argument, it returns four separate histograms, one for each type of payment method.

In [None]:
data = df.pivot_table(values='MonthlyCharges', columns='PaymentMethod', index='customerID', aggfunc='sum')

In [None]:
data.head()

In [None]:
# Make histogram (raw counts, so the y axis is labelled Count)
data.iplot(kind='hist', xTitle='Value', yTitle='Count', title='Monthly Charge by Payment Method')

In [None]:
# Scale to percentages 
data.iplot(kind='hist', histnorm='percent', xTitle='Value', yTitle='Percent', title='Monthly Charge by Payment Method')

In [None]:
# Divide into subplots
data.iplot(kind='hist', histnorm='percent', xTitle='Value', subplots=True, yTitle='Percent', title='Monthly Charge by Payment Method')

## INTERACTIVE BAR CHARTS
In addition to histograms, another useful type of interactive visualization is the interactive bar chart. Below is a basic example where we calculate the average churn rate for each payment method and then create a bar chart by setting the kind argument to "bar."

In [None]:
data = df.groupby('PaymentMethod', as_index=False).agg({"ChurnBinary":'mean'})

In [None]:
data.head()

In [None]:
data.iplot(kind='bar', x='PaymentMethod', xTitle='Payment Method', color='purple',
           yTitle='Avg. Churn %', title='Avg. Churn Rate by Payment Method')

Note that we also included a color argument to specify that we would like the bars to be purple. Feel free to try changing it to whatever your favorite color is.
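
The reason a plain mean works here is that ChurnBinary is coded as 0 and 1, so its average within a group is the share of customers in that group who churned. A minimal sketch on invented data:

```python
import pandas as pd

# Made-up rows: 2 mailed-check customers, 4 electronic-check customers.
toy = pd.DataFrame({
    'PaymentMethod': ['Mailed check', 'Mailed check', 'Electronic check',
                      'Electronic check', 'Electronic check', 'Electronic check'],
    'ChurnBinary': [0, 1, 1, 1, 0, 1],
})

# The mean of a 0/1 column is the fraction of 1s, i.e. the churn rate.
rates = toy.groupby('PaymentMethod', as_index=False).agg({'ChurnBinary': 'mean'})
print(rates)
```

With as_index=False, PaymentMethod stays a regular column, which is what lets us pass it to iplot as x.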

Basic bar charts are very helpful, but sometimes the need arises to create more complex bar charts, such as when we need to see bars in groups. To set this up with the data set we are currently working with, below we are creating three pivot tables that each calculate average monthly charges by tenure level but for different demographic variables (gender, senior citizen, and partner). We combine all three pivot tables (concatenating them column-wise) and sort according to the intuitive order of customer tenure level: new customers, regular customers, loyal customers, and very loyal customers.
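
A minimal sketch of that set-up on invented data follows. The TenureLevel column and the cut points used to build it (12, 36, and 60 months) are assumptions made for illustration, as is the avg_charges_by helper; they are not taken from the churn data set itself.

```python
import pandas as pd

# Made-up customers, two per tenure level.
toy = pd.DataFrame({
    'tenure': [2, 30, 50, 70, 5, 20, 40, 65],
    'gender': ['Female', 'Male', 'Female', 'Male',
               'Male', 'Female', 'Male', 'Female'],
    'SeniorCitizen': [0, 1, 0, 1, 0, 0, 1, 1],
    'Partner': ['No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes'],
    'MonthlyCharges': [20.0, 60.0, 80.0, 100.0, 40.0, 50.0, 70.0, 90.0],
})

# Hypothetical tenure levels: the bin edges (in months) are an assumption.
levels = ['New', 'Regular', 'Loyal', 'Very Loyal']
toy['TenureLevel'] = pd.cut(toy['tenure'], bins=[0, 12, 36, 60, 72],
                            labels=levels, include_lowest=True).astype(str)

def avg_charges_by(column):
    """Average monthly charges by tenure level, one column per category."""
    return toy.pivot_table(values='MonthlyCharges', columns=column,
                           index='TenureLevel', aggfunc='mean')

# Combine the three pivot tables side by side and name the columns.
data = pd.concat([avg_charges_by('gender'), avg_charges_by('SeniorCitizen'),
                  avg_charges_by('Partner')], axis=1)
data.columns = ['Female', 'Male', 'NonSenior', 'Senior', 'Single', 'Partner']

# Pivot tables sort the levels alphabetically, so restore the intuitive order.
data = data.reindex(levels)
print(data)
```

Passing a table shaped like this to iplot with kind='bar' should draw one bar per column within each tenure level, which is the grouped layout described above.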

## INTERACTIVE LINE CHARTS
In addition to histograms and bar charts, another very useful interactive visualization is the line chart. Interactive line charts not only let us visualize how a variable changes, they also let us hover over the line and read off exact values, even when there are a lot of points along the x axis.

In the example below, we are calculating the average churn rate by tenure and then creating a basic single line chart that shows the overall decrease in churn rate as tenure increases. Note that all we had to do was pass "line" to the kind argument and then the appropriate fields, titles, and colors to their respective arguments.

In [None]:
data = df.groupby('tenure', as_index=False).agg({'ChurnBinary':'mean'})

In [None]:
data.head()

In [None]:
data.iplot(kind='line', x='tenure', xTitle='Tenure', color='blue',
           yTitle='Avg. Churn Rate', title='Avg. Churn Rate by Tenure')

This gives us a sense of what happens with aggregate churn rates, but how does that differ by demographics? To find out, we can take the same approach as we did with our grouped bar chart. Below, we create three pivot tables again and combine them.

In [None]:
gender = df.pivot_table(values='ChurnBinary', columns='gender', 
                        index='tenure', aggfunc='mean')

senior = df.pivot_table(values='ChurnBinary', columns='SeniorCitizen', 
                        index='tenure', aggfunc='mean')

partner = df.pivot_table(values='ChurnBinary', columns='Partner', 
                         index='tenure', aggfunc='mean')

In [None]:
gender

In [None]:
data = pd.concat([gender, senior, partner], axis=1)
data.columns = ['Female', 'Male', 'NonSenior', 'Senior', 'Single', 'Partner']
data = data.reset_index()

In [None]:
data

We now have the data in the format we need to create an interactive multi-line chart. To do that, we call the iplot method again and pass the appropriate values to each argument.

In [None]:
data.iplot(kind='line', x='tenure', xTitle='Tenure', 
           yTitle='Avg. Churn Rate', title='Avg. Churn Rate by Demographics')

## INTERACTIVE SCATTER PLOTS
Another useful type of interactive visualization is the scatter plot. Like with line charts, one of the valuable features of interactivity with scatter plots is the ability to hover over each of the points and see their value. It is often also useful to apply the filtering by group like we did with the histograms earlier. However, one drawback about creating interactive scatter plots is that scatter plots tend to have a lot of data points, so creating interactive ones can quickly get computationally intensive and the results can look a bit cluttered.

In the example below, we capitalize on the advantages mentioned above while avoiding the disadvantages by filtering our data set for a subset that we would like to investigate (customers on one year contracts that pay with credit cards).

In [None]:
data = df[(df['Contract']=='One year') & (df['PaymentMethod']=='Credit card (automatic)')]

In [None]:
data.head()

We then call the iplot method to generate our interactive visualization. Note that we did not need to pass anything to the kind argument, since cufflinks draws a scatter-type chart by default, but we did need to specify which fields to plot on the x and y axes. We also passed the InternetService field to a new categories argument so that we can distinguish between those groups in our visualization.

In [None]:
data.iplot(x='tenure', y='TotalCharges', categories='InternetService',
           xTitle='Tenure', yTitle='Total Charges',
           title='Charges vs. Tenure: One Year Contract, Credit Card Customers')

Due to our filtering of the data, the resulting visualization contains a reasonable number of data points and allows us to clearly distinguish between the different groups in our data.
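
The filtering step relies on combining two boolean masks. A minimal sketch on invented rows shows the pattern; the column names match the churn data, but the values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'Contract': ['One year', 'One year', 'Two year', 'Month-to-month'],
    'PaymentMethod': ['Credit card (automatic)', 'Mailed check',
                      'Credit card (automatic)', 'Credit card (automatic)'],
    'tenure': [12, 24, 48, 3],
})

# Each comparison yields a True/False value per row.
one_year = toy['Contract'] == 'One year'
credit_card = toy['PaymentMethod'] == 'Credit card (automatic)'

# & keeps only the rows where both conditions hold.
subset = toy[one_year & credit_card]
print(subset)
```

When the comparisons are written inline instead of being stored in variables, each one needs its own parentheses, because & binds more tightly than == in Python.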

## INTERACTIVE BUBBLE CHARTS
Bubble charts are essentially scatter plots with an additional dimension: the size of the bubbles. This can provide us with additional insights, but it also means the visualization can get cluttered even more easily than a scatter plot. Because of this, it is best to use interactive bubble charts when we have filtered our data down to a number of data points that is reasonable for this type of visualization.

Below, we reload the data set, convert the TotalCharges column to numbers so that it can drive the bubble size, and keep only the customers that do not have phone service. This makes our data sufficiently small to visualize using a bubble chart.

To create the bubble chart, we call iplot, pass "bubble" to the kind argument, and then fill in all the rest of the arguments with appropriate values.

In [None]:
# Read and reset the df db
df = pd.read_csv('../data/churn.csv')

In [None]:
df.columns

In [None]:
# Recode Churn as numeric 0/1
df['Churn'] = df['Churn'].map({'No': 0, 'Yes': 1})

In [None]:
# Rename churn column
df = df.rename(columns={"Churn": "ChurnBinary"})

In [None]:
df.dtypes

In [None]:
# Convert TotalCharges to numbers; entries that cannot be parsed become NaN
df['TotalChargesFloat'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

In [None]:
df.dtypes

In [None]:
# Drop the rows where TotalCharges could not be converted
df = df[df['TotalChargesFloat'].notnull()]
df.head()

In [None]:
df = df[df['PhoneService']=='No']

In [None]:
df.iplot(kind='bubble', x='tenure', y='MonthlyCharges', size='TotalChargesFloat',
         categories='gender', xTitle='Tenure', yTitle='Monthly Charges',
         title='Monthly Charges vs. Tenure: Customers Without Phone Service')

The result is an informative, interactive bubble chart that conveys information about both monthly and total charges, tenure, and gender for this specific segment of customers.
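
The preparation above leans on pd.to_numeric with errors='coerce'. A minimal sketch on a few invented strings shows what that option does with entries that are not valid numbers:

```python
import pandas as pd

# Made-up values: two valid amounts, one blank, one empty string.
raw = pd.Series(['29.85', ' ', '1889.5', ''])

# errors='coerce' turns anything that cannot be parsed into NaN
# instead of raising an error.
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric)

# Rows that failed to convert can then be dropped before plotting.
clean = numeric.dropna()
print(clean.tolist())
```

Dropping the unconverted rows matters for a bubble chart because every point needs a numeric size.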

## INTERACTIVE HEATMAPS
The last type of interactive visualization we will cover in this lesson is the heatmap. Heatmaps are useful for seeing how relatively high or low values are across combinations of categories. For example, below we create a pivot table containing average churn rates by online backup status and tenure.

In [None]:
#df.dtypes

In [None]:
data = df.pivot_table(values='ChurnBinary', columns='OnlineBackup', 
                      index='tenure', aggfunc='mean')

In [None]:
data.head()

From there, we call iplot, pass "heatmap" to the kind argument, specify an appropriate colorscale, and then fill in the rest of the arguments. This generates an informative heatmap that looks like the following.

In [None]:
data.iplot(kind='heatmap', colorscale='YlOrRd', xTitle='Tenure', 
           yTitle='Online Back-Up', title='Online Back-Up by Tenure')

In this heatmap, the cells at the red end of the colorscale mark the combinations of tenure and online backup status with the highest churn rates, which occur particularly early in the customer tenure.

## Summary 

In this lesson, we have explored a variety of interactive visualizations. We began the lesson by introducing the plotly and cufflinks libraries, which when used together make generating interactive visualizations as easy as writing a single line of Python code. We then looked at examples of how to create interactive histograms, bar charts, line charts, scatter plots, bubble charts, and heatmaps. We hope that the ease with which we were able to create all of these makes you strongly consider incorporating interactive visualizations into your analytical workflow.

# iPyWidgets

In the last lesson, we covered how to make interactive, JavaScript-based visualizations using Python and the plotly and cufflinks libraries. This provided us with the ability to hover over the plot points in the chart to see the values, and it also enabled us to include and exclude groups from the visualization by simply clicking on them in the legend.

The interactivity doesn't have to stop there. In this lesson, we are going to learn how to make our visualizations even more interactive via the use of widgets such as sliders, drop-down boxes, check boxes, and text boxes. These widgets control the values that get passed to our visualizations, so changing a widget's value causes the visualization to update itself. We will be using these widgets to extend the interactivity of charts generated with plotly and cufflinks.

To incorporate widgets into our visualizations, we will need to ensure that we have the iPyWidgets library installed.

In [None]:
# $ pip install ipywidgets
# jupyter labextension install @jupyter-widgets/jupyterlab-manager

Let's also go ahead and import everything we are going to need for this lesson.

In [None]:
import chart_studio.plotly as py
import cufflinks as cf
import pandas as pd
%matplotlib inline
from ipywidgets import interact
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

cf.go_offline()

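Finally, for this lesson we will continue using the same churn data set that we used for the plotly and cufflinks lesson, so make sure it is loaded as well. Because the previous lesson ended by filtering df down to customers without phone service, reload the full data set with pd.read_csv and recode Churn as the numeric ChurnBinary column before continuing.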

### The Interact Decorator

The iPyWidgets library has a variety of functionality for creating interactive widgets, but the interact decorator is both the easiest way to get started and the most useful, so we will be focusing on it for this lesson.

The interact decorator accepts a few different types of inputs, and the format of those inputs determines the type of widget that is displayed.

- Slider: a numeric value, (min, max), or (min, max, step)
- Drop-Down Box: a list or dictionary
- Check Box: True or False values
- Text Box: a string enclosed in quotes

We will see examples of each of these in the sections below, both individually and then all together.
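
As a memory aid, the rules in the list above can be paraphrased in plain Python. The widget_for helper below is hypothetical and written only for this lesson; it is not part of iPyWidgets and does not reproduce the library's own logic, it simply restates the list.

```python
def widget_for(value):
    """Hypothetical helper: name the widget the list above associates with a value."""
    # Check bool before numbers, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return 'check box'
    if isinstance(value, str):
        return 'text box'
    if isinstance(value, (int, float)):
        return 'slider'
    if isinstance(value, tuple):
        # (min, max) or (min, max, step)
        return 'slider'
    if isinstance(value, (list, dict)):
        return 'drop-down box'
    raise TypeError('no rule in the list above for ' + type(value).__name__)

print(widget_for(True))              # a True/False value
print(widget_for(''))                # a string
print(widget_for((8, 72)))           # a (min, max) pair
print(widget_for(['gender', 'Partner']))  # a list of choices
```

The ordering of the checks is the one design point worth noting: testing for True/False first keeps a check box from being mistaken for a numeric slider.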

### Interactive Sliders

One of the most useful types of widgets you can use for numeric variables is the slider. Sliders allow you to modify numeric inputs that are fed to your visualizations.

We can see a basic example of this with the histogram below. One of the challenges of working with histograms is determining an appropriate number of bins. Widgets can help us with this by allowing us to quickly and easily view different numbers of bins without having to manually input them into our code and regenerate the visualization.

We can use the interact decorator to create a slider widget where we specify a range of bins (from as few as 8 to as many as there are unique tenures). Then, we can simply define a hist function (it really doesn't matter what you name it) that accepts the number of bins chosen with the slider and plugs it into a plotly histogram, so that the number of bins in the visualization updates as you move the slider.

In [None]:
@interact(bins = (8, len(df['tenure'].unique())))
def hist(bins):
    df['tenure'].iplot(kind='hist', bins=bins, title='Tenure Distribution')
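
To see why the number of bins matters, here is a minimal sketch that bins a few invented tenure values by hand. The bin_counts helper is hypothetical, and it uses equal-width binning with pandas, which only approximates what the chart does with the number it receives.

```python
import pandas as pd

# Made-up tenure values in months.
tenure = pd.Series([1, 2, 3, 10, 20, 30, 40, 50, 60, 72])

def bin_counts(bins):
    """Count how many tenure values fall into each of `bins` equal-width bins."""
    return pd.cut(tenure, bins=bins).value_counts(sort=False).tolist()

# The same data looks quite different depending on the bin count,
# which is the value the slider lets us change on the fly.
print(bin_counts(2))
print(bin_counts(4))
```

A function decorated with interact is still an ordinary function of its arguments, so it is easy to reason about what each slider position will produce.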

### Interactive Drop-Down Boxes
Another useful widget when you have the same visualization that you'd like to view for either different fields or for different field values is the drop-down box. To generate one, we can pass a list to our interact decorator consisting of the fields we would like to be able to choose from. We can then define a linechart function that accepts the input we have chosen from the drop-down box, pivots our data set so that the columns represent the categorical values in the field we have chosen, and then generates a plotly line chart with lines showing changes in churn rate for each category.

In [None]:
@interact(Selection=['gender', 'SeniorCitizen', 'Partner',
                     'Dependents', 'InternetService', 'PaymentMethod'])
def linechart(Selection):
    data = df.pivot_table(values='ChurnBinary', columns=Selection,
                            index='tenure', aggfunc='mean').reset_index()

    data.iplot(kind='line', x='tenure', xTitle='Tenure', 
               yTitle='Avg. Churn Rate', title='Avg. Churn Rate by ' + Selection.title())

As you can see, having the drop-down box where we can choose the field we want to see saves us from having to jump back into the code, create a pivot table for the specific field, and then regenerate the visualization. This has the potential to save us a significant amount of time in our data exploration workflow.

Let's look at another example where we use multiple drop-down boxes to filter our data down to a point where we can investigate it at a granular level.

Below, we are creating 4 different drop-down boxes containing the unique categorical values in the gender, Partner, InternetService, and PaymentMethod fields respectively. We then define a scatter function where we filter our data down by the values chosen in each of the drop-down boxes and then generate a scatter plot that plots how much each customer is charged by their tenure and color codes the points by the type of contract they have.

In [None]:
@interact(Gender=list(df['gender'].unique()),
          Partner=list(df['Partner'].unique()),
          Internet=list(df['InternetService'].unique()),
          Payment=list(df['PaymentMethod'].unique())
         )
def scatter(Gender, Partner, Internet, Payment):
    data = df[(df['gender']==Gender) & 
              (df['Partner']==Partner) & 
              (df['InternetService']==Internet) & 
              (df['PaymentMethod']==Payment)]

    data.iplot(kind='scatter', x='tenure', y='MonthlyCharges', 
               categories='Contract', text='customerID', 
               xTitle='Tenure', yTitle='Monthly Charges',
               title='Charges vs. Tenure')

Note that we added a text argument to our iplot method that will show us each customer's customerID in addition to the monthly amount they are paying when we hover over the data points.

### Interactive Check Boxes
Interactive check boxes can also help you explore a data set, especially when there are binary fields whose impact you'd like to visualize. The way to do this is to map whether the check box is checked to a corresponding condition for the binary field. Once that is done, you'd just need to filter the data set based on those conditions.

Below is an example that does exactly this. The interact decorator has two True/False arguments which it will render as check boxes - one for Senior and one for PhoneService. Inside our barchart function, we write some conditional statements that will translate those True/False options into conditions that we can use to filter our data. We then apply those filters and group the data by PaymentMethod, calculating the average churn rate for each. Finally, we generate a bar chart that displays the average churn rate for each payment method based on the filters applied via the check boxes.

In [None]:
@interact(Senior=True, PhoneService=False)
def barchart(Senior, PhoneService):
    # Translate each check box state into a boolean filter on the data
    if Senior:
        senior = df['SeniorCitizen']==1
    else:
        senior = df['SeniorCitizen']==0
    
    if PhoneService:
        phone = df['PhoneService']=='Yes'
    else:
        phone = df['PhoneService']=='No'
    
    data = df[(senior) & (phone)].groupby('PaymentMethod').agg({'ChurnBinary':'mean'}).reset_index()
    
    data.iplot(kind='bar', x='PaymentMethod', xTitle='Payment Method',
               yTitle='Avg. Churn Rate', color='blue', 
               title='Churn Rate by Payment Method')
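
The mapping from check box states to filters can be tried out without any widgets. The sketch below uses invented rows and a hypothetical churn_by_payment function that takes the two True/False values directly.

```python
import pandas as pd

# Made-up customers for illustration.
toy = pd.DataFrame({
    'SeniorCitizen': [1, 1, 0, 0, 1],
    'PhoneService': ['No', 'No', 'Yes', 'No', 'Yes'],
    'PaymentMethod': ['Mailed check', 'Mailed check', 'Mailed check',
                      'Electronic check', 'Electronic check'],
    'ChurnBinary': [1, 0, 0, 1, 1],
})

def churn_by_payment(senior, phone_service):
    """Average churn rate per payment method for the selected segment."""
    # Map each True/False input to the value stored in the data.
    senior_mask = toy['SeniorCitizen'] == (1 if senior else 0)
    phone_mask = toy['PhoneService'] == ('Yes' if phone_service else 'No')
    segment = toy[senior_mask & phone_mask]
    return segment.groupby('PaymentMethod')['ChurnBinary'].mean()

# Equivalent to ticking Senior and leaving PhoneService unticked.
print(churn_by_payment(True, False))
```

Calling the function with each combination of True and False is a quick way to check that no combination leaves the chart with an empty data set.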

### Interactive Text Boxes

The last type of widget we are going to cover in this lesson is the interactive text box. These are good for filtering when you have categorical values that are different but have some string in common. For example, the PaymentMethod field in our data set has four unique values.

In [None]:
df['PaymentMethod'].unique()
# array(['Electronic check', 'Mailed check', 'Bank transfer (automatic)',
#        'Credit card (automatic)'], dtype=object)

Notice that there are similarities between pairs of these unique categories: two of them have the string 'check' in common and the other two have the string 'automatic' in common. We can use a text box to provide a flexible means by which we can visualize any of these options based on the unique strings they contain, or groups of them based on the strings they share.

To do this, we will once again use the interact decorator, passing a Payment argument with a blank string. This will create a text box into which we can type any string we want. In our chart function, we then filter our data set to just the payment methods that contain the string typed into the text box, aggregate monthly charges by tenure, and visualize the result as a bar chart.

In [None]:
@interact(Payment='')
def chart(Payment):
    # Keep the payment methods that contain the typed text (plain match, not a regex)
    data = df[df['PaymentMethod'].str.contains(Payment, regex=False)]
    # groupby puts tenure in the index, already in ascending order
    data = data.groupby('tenure').agg({'MonthlyCharges':'sum'})
    
    data.iplot(kind='bar', xTitle='Tenure', yTitle='Monthly Charges')
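
One detail is worth knowing when filtering on typed text: str.contains treats its pattern as a regular expression by default, so typing a lone parenthesis, as in "(automatic", would raise an error. A minimal sketch, using the four payment methods listed above and a hypothetical matching helper, shows the plain-text alternative with regex=False.

```python
import pandas as pd

methods = pd.Series(['Electronic check', 'Mailed check',
                     'Bank transfer (automatic)', 'Credit card (automatic)'])

def matching(text):
    """Return the payment methods that contain `text` as a plain substring."""
    return methods[methods.str.contains(text, regex=False)].tolist()

print(matching('check'))       # the two check-based methods
print(matching('(automatic'))  # safe even with an unbalanced parenthesis
print(matching(''))            # an empty box matches every method
```

The last call also explains why the chart shows all customers before anything is typed: every value contains the empty string.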

## Summary

In this lesson, we have introduced iPyWidgets and how we can use them to easily add more interactivity to our visualizations. We learned about the powerful interact decorator and covered examples of each type of widget (sliders, drop-down boxes, check boxes, and text boxes). We covered each of these individually, but there is nothing stopping you from combining several widgets to create very intricately interactive visualizations. In fact, we encourage you to challenge yourself to do so. You already have all the tools you need in your arsenal, and incorporating widgets into your analytical workflow can allow you to explore your data more easily while saving you a significant amount of time.