# Interactive Data Analysis with ipywidgets Dropdown Menus and Plotly in Jupyter Notebook #

An example of how to set up interactive dropdown menu widgets and use Plotly to display the results of a dataset analysis in a Jupyter notebook, using IPython and Pandas.

## The Challenge ##

Recently, while working on a project for one of my graduate classes, I had to decide how best to present the analysis of several datasets I was processing in a Jupyter notebook. Each dataset contained information collected over a number of years and had multiple stratification levels, such as gender and race, for each of the questions asked. In addition, each dataset was broken down by state. Modifying the code and re-running every cell each time you want to look at a different state, or view the data by race or gender, is cumbersome and inefficient. The solution to this problem lies in [ipywidgets](https://ipywidgets.readthedocs.io/en/latest/). Ipywidgets are interactive HTML widgets for Jupyter notebooks and the IPython kernel. They let the user interact with the data and see the effect of each change quickly and easily.

In this article, I will show how to set up interactive dropdown menu widgets for a simple data analysis, using the CDC's 'Alzheimer's Disease and Healthy Aging Data' dataset as an example. Three dropdown menu widgets will form a user-friendly menu for choosing a state, a stratification category (gender, race or overall) and the question to see information for. I will use Plotly, an interactive graphing library, to plot the information stored in the dataset.

 
The CSV file used in this article is publicly available for download on the [CDC website](https://chronicdata.cdc.gov/Healthy-Aging/Alzheimer-s-Disease-and-Healthy-Aging-Data/hfr9-rurv).

#### First, let’s import all libraries and extensions needed: ####

In [1]:
import numpy as np
import pandas as pd
import textwrap

import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import IPython.display
from IPython.display import display, clear_output

import plotly.graph_objects as go

#### Second, let’s get the CSV file ready to work on: ####

+ For simplicity of the output, we will only look at the following 5 questions being evaluated in the dataset:


    1. Percentage of older adults who are currently obese, with a body mass index (BMI) of 30 or more. 

    2. Percentage of older adults who have not had any leisure time physical activity in the past month. 

    3. Percentage of older adults with a lifetime diagnosis of depression.

    4. Percentage of older adults who have been told they have high blood pressure who report currently 
       taking medication for their high blood pressure. 

    5. Percentage of older adults who are eating 3 or more vegetables daily.

+ Columns that are not needed will be dropped.
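
The two preparation steps listed above can be tried out on a small stand-in table first. This is only a sketch under assumptions: the toy rows and values are invented, `Not_A_Column` is a deliberately missing name, and only the column names `Question`, `Datasource` and `Data_Value` are borrowed from the dataset.

```python
import pandas as pd

# Toy stand-in for the CDC table; the rows and values are made up for illustration
toy = pd.DataFrame({
    'Question': ['Percentage of older adults with a lifetime diagnosis of depression',
                 'Some other question',
                 'Percentage of older adults who are eating 3 or more vegetables daily'],
    'Datasource': ['example', 'example', 'example'],
    'Data_Value': [15.2, 40.1, 12.7],
})

questions_of_interest = [
    'Percentage of older adults with a lifetime diagnosis of depression',
    'Percentage of older adults who are eating 3 or more vegetables daily',
]

# Keep only the rows whose question is in the list of interest
filtered = toy[toy['Question'].isin(questions_of_interest)]

# Drop columns that are not needed; errors='ignore' skips names that are absent
trimmed = filtered.drop(columns=['Datasource', 'Not_A_Column'], errors='ignore')

print(trimmed)
```

Passing `errors='ignore'` is a design choice for files whose column set may change between downloads; without it, a missing name raises a `KeyError`.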

In [9]:
# saving common path to use to read in dataset
path = ''

healthy_aging_data = pd.read_csv(path + 'Alzheimer_s_Disease_and_Healthy_Aging_Data.csv')

# dropping the columns that are not needed; errors='ignore' skips any name missing from the file
healthy_aging_data = healthy_aging_data.drop(columns =
            ['Datasource', 'Class', 'DataValueTypeID', 'Response', 'Data_Value_Alt', 'Data_Value_Footnote_Symbol',
             'Data_Value_Footnote', 'Low_Confidence_Limit', 'High_Confidence_Limit', 'Sample_Size',
             'StratificationCategory3', 'Stratification3', 'ClassID', 'TopicID', 'QuestionID', 'ResponseID',
             'LocationID' , 'Geolocation', 'StratificationCategoryID3', 'StratificationID3', 'Report',
             'YearStart', 'LocationDesc', 'Topic', 'Data_Value_Type', 'StratificationID1', 'RowId',
             'StratificationCategory2', 'Stratification2', 'StratificationCategoryID1'], errors='ignore')

questions_list = ['Percentage of older adults who are currently obese, with a body mass index (BMI) of 30 or more', \
                 'Percentage of older adults who have not had any leisure time physical activity in the past month', \
                 'Percentage of older adults who have been told they have high blood pressure who ' \
                 'report currently taking medication for their high blood pressure', \
                 'Percentage of older adults who are eating 3 or more vegetables daily', \
                 'Percentage of older adults with a lifetime diagnosis of depression']

display(healthy_aging_data)



#### Now, let’s set up all the functions needed for graphical representation of the information stored in the 'Alzheimer's Disease and Healthy Aging Data' database: ####

+ For simplicity of the output, we will only concentrate on the all-ages group of datapoints.

+ Only data available for the years 2015 through 2019 will be analyzed.

The function plot_healthy_aging_data_gender takes 3 dataframes with the information needed to make a Plotly graph, a 2-letter state abbreviation and a question. It is called when the user selects the gender stratification category in the dropdown widget.

In [10]:
def plot_healthy_aging_data_gender(state_to_plot_gender, state_to_plot_race, state_to_plot_overall, state_name, question):
   
    state_to_plot = state_to_plot_gender.sort_values('YearEnd').groupby('StratificationID2')
    
    state_to_plot_male = state_to_plot.get_group('MALE')
    state_to_plot_female = state_to_plot.get_group('FEMALE')
        
    fig = go.Figure()
     
    fig.add_trace(go.Scatter(x=state_to_plot_male.YearEnd,
            y=state_to_plot_male.Data_Value,
                    mode='lines+markers',
                    name='Male'))
    
    fig.add_trace(go.Scatter(x=state_to_plot_female.YearEnd,
            y=state_to_plot_female.Data_Value,
                    mode='lines+markers',
                    name='Female'))
    
    # wrap the question and the state into a title of at most 50 characters per line
    title = textwrap.fill(question + ' in ' + state_name, width=50)
    print('\n')
    print(title)
    
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="%",
        font=dict(
            family="Courier New, monospace",
            size=18,
            color="RebeccaPurple"))
    
    layout = dict(
        xaxis=dict(
        tickmode="array",
        tickvals=state_to_plot_male['YearEnd'].astype(int),
        ticktext=state_to_plot_male['YearEnd'],
        tickformat='%Y',
        tickangle=45))

    fig.update_layout(layout)
    fig.update_yaxes(rangemode="tozero")
    
    fig.show()
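
A short note on the printed title: `textwrap.fill` already returns a single string with line breaks inserted, so it can be concatenated or printed directly. The standard-library sketch below illustrates this on one of the questions; the state abbreviation `NY` and the sample word `obese` are just sample inputs.

```python
import textwrap

question = ('Percentage of older adults who have not had any leisure time '
            'physical activity in the past month')
state_name = 'NY'  # sample input

# fill() returns ONE string with '\n' inserted so that no line exceeds the width
title = textwrap.fill(question + ' in ' + state_name, width=50)
print(title)

# wrap() returns the same lines as a list instead
lines = textwrap.wrap(question + ' in ' + state_name, width=50)

# Caution: str.join iterates over its argument, so joining a plain string
# puts the separator between every single character
spaced = ' '.join('obese')
print(spaced)
```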

The function plot_healthy_aging_data_race takes 3 dataframes with the information needed to make a Plotly graph, a 2-letter state abbreviation and a question. It is called when the user selects the race stratification category in the dropdown widget.

In [11]:
def plot_healthy_aging_data_race(state_to_plot_gender, state_to_plot_race, state_to_plot_overall, state_name, question):

    # drop the rows that have no data value
    state_to_plot_race = state_to_plot_race.dropna(subset=['Data_Value'])
    
    fig = go.Figure()
    
    if state_to_plot_race.isin(['NAA']).any().any() and state_to_plot_race.isin(['ASN']).any().any():
        
        state_to_plot_race = state_to_plot_race.sort_values('YearEnd').groupby('StratificationID2')
        
        state_to_plot_white = state_to_plot_race.get_group('WHT')
        state_to_plot_hispanic = state_to_plot_race.get_group('HIS')
        state_to_plot_black = state_to_plot_race.get_group('BLK')
        state_to_plot_asian = state_to_plot_race.get_group('ASN')
        state_to_plot_native = state_to_plot_race.get_group('NAA')
        
        fig.add_trace(go.Scatter(x=state_to_plot_white.YearEnd,
                y=state_to_plot_white.Data_Value,
                        mode='lines+markers',
                        name='White'))

        fig.add_trace(go.Scatter(x=state_to_plot_hispanic.YearEnd,
                y=state_to_plot_hispanic.Data_Value,
                        mode='lines+markers',
                        name='Hispanic'))

        fig.add_trace(go.Scatter(x=state_to_plot_black.YearEnd,
                y=state_to_plot_black.Data_Value,
                        mode='lines+markers',
                        name='African-American'))

        fig.add_trace(go.Scatter(x=state_to_plot_asian.YearEnd,
                y=state_to_plot_asian.Data_Value,
                        mode='lines+markers',
                        name='Asian'))

        fig.add_trace(go.Scatter(x=state_to_plot_native.YearEnd,
                y=state_to_plot_native.Data_Value,
                        mode='lines+markers',
                        name='Native American'))
        
    elif state_to_plot_race.isin(['ASN']).any().any():
        
        state_to_plot_race = state_to_plot_race.sort_values('YearEnd').groupby('StratificationID2')
    
        state_to_plot_white = state_to_plot_race.get_group('WHT')
        state_to_plot_hispanic = state_to_plot_race.get_group('HIS')
        state_to_plot_black = state_to_plot_race.get_group('BLK')
        state_to_plot_asian = state_to_plot_race.get_group('ASN')
        
        fig.add_trace(go.Scatter(x=state_to_plot_white.YearEnd,
                y=state_to_plot_white.Data_Value,
                        mode='lines+markers',
                        name='White'))

        fig.add_trace(go.Scatter(x=state_to_plot_hispanic.YearEnd,
                y=state_to_plot_hispanic.Data_Value,
                        mode='lines+markers',
                        name='Hispanic'))

        fig.add_trace(go.Scatter(x=state_to_plot_black.YearEnd,
                y=state_to_plot_black.Data_Value,
                        mode='lines+markers',
                        name='African-American'))

        fig.add_trace(go.Scatter(x=state_to_plot_asian.YearEnd,
                y=state_to_plot_asian.Data_Value,
                        mode='lines+markers',
                        name='Asian'))
        
    else:
        
        state_to_plot_race = state_to_plot_race.sort_values('YearEnd').groupby('StratificationID2')
        
        state_to_plot_white = state_to_plot_race.get_group('WHT')
        state_to_plot_hispanic = state_to_plot_race.get_group('HIS')
        state_to_plot_black = state_to_plot_race.get_group('BLK')
        
        fig.add_trace(go.Scatter(x=state_to_plot_white.YearEnd,
            y=state_to_plot_white.Data_Value,
                    mode='lines+markers',
                    name='White'))
    
        fig.add_trace(go.Scatter(x=state_to_plot_hispanic.YearEnd,
                y=state_to_plot_hispanic.Data_Value,
                        mode='lines+markers',
                        name='Hispanic'))

        fig.add_trace(go.Scatter(x=state_to_plot_black.YearEnd,
                y=state_to_plot_black.Data_Value,
                        mode='lines+markers',
                        name='African-American'))        
    
    # wrap the question and the state into a title of at most 50 characters per line
    title = textwrap.fill(question + ' in ' + state_name, width=50)
    print('\n')
    print(title)
    
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="%",
        font=dict(
            family="Courier New, monospace",
            size=18,
            color="RebeccaPurple"))
    
    layout = dict(
        xaxis=dict(
        tickmode="array",
        tickvals=state_to_plot_white['YearEnd'].astype(int),
        ticktext=state_to_plot_white['YearEnd'],
        tickformat='%Y',
        tickangle=45))

    fig.update_layout(layout)
    fig.update_yaxes(rangemode="tozero")
    
    fig.show()
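
The branches of the race function differ only in which race groups are present, so the same traces could also be built in a loop over whatever groups a state actually has. The following is a sketch of that alternative, not the code used in this article: `build_traces` and the `labels` mapping are hypothetical names, the toy numbers are invented, and the traces are produced as plain dictionaries, which `go.Figure(data=...)` accepts in place of `go.Scatter` objects.

```python
import pandas as pd

# Hypothetical mapping from StratificationID2 codes to legend names
labels = {'WHT': 'White', 'HIS': 'Hispanic', 'BLK': 'African-American',
          'ASN': 'Asian', 'NAA': 'Native American'}

def build_traces(df, labels):
    """Return one scatter-trace dict per group that has at least one data value."""
    df = df.dropna(subset=['Data_Value']).sort_values('YearEnd')
    traces = []
    for code, group in df.groupby('StratificationID2'):
        traces.append(dict(type='scatter',
                           x=group['YearEnd'].tolist(),
                           y=group['Data_Value'].tolist(),
                           mode='lines+markers',
                           name=labels.get(code, code)))
    return traces

# Invented toy data: the ASN group has no usable values at all
toy = pd.DataFrame({
    'YearEnd': [2016, 2015, 2015, 2016, 2015],
    'Data_Value': [30.0, 28.5, 35.1, 36.0, float('nan')],
    'StratificationID2': ['WHT', 'WHT', 'BLK', 'BLK', 'ASN'],
})

traces = build_traces(toy, labels)
print([trace['name'] for trace in traces])
```

Passing `traces` to `go.Figure(data=traces)` would then give one line per group, and a group without data is simply skipped instead of raising a `KeyError`.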

The function plot_healthy_aging_data_overall takes 3 dataframes with the information needed to make a Plotly graph, a 2-letter state abbreviation and a question. It is called when the user selects the overall stratification category in the dropdown widget.

In [12]:
def plot_healthy_aging_data_overall(state_to_plot_gender, state_to_plot_race, state_to_plot_overall, state_name, question):
        
    state_to_plot_overall = state_to_plot_overall.sort_values('YearEnd') \
        .groupby('StratificationID2').get_group('OVERALL')
    
    fig = go.Figure()
        
    fig.add_trace(go.Scatter(x=state_to_plot_overall.YearEnd,
            y=state_to_plot_overall.Data_Value,
                    mode='lines+markers',
                    name='Overall'))
  
    # wrap the question and the state into a title of at most 50 characters per line
    title = textwrap.fill(question + ' in ' + state_name, width=50)
    print('\n')
    print(title)
    
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="%",
        font=dict(
            family="Courier New, monospace",
            size=18,
            color="RebeccaPurple"))
    
    layout = dict(
        xaxis=dict(
        tickmode="array",
        tickvals=state_to_plot_overall['YearEnd'].astype(int),
        ticktext=state_to_plot_overall['YearEnd'],
        tickformat='%Y',
        tickangle=45))

    fig.update_layout(layout)
    fig.update_yaxes(rangemode="tozero")
    
    fig.show()

The function dataset_analysis_aging queries the original dataset and partitions the results into 3 different dataframes, one for each stratification category: race, gender and overall (all races and genders). It pulls the information needed to make a Plotly graph once the user has selected the inputs.

As mentioned previously, for simplicity of the output, we will only concentrate on the all-ages group of datapoints which is labeled as Overall in the 'Stratification1' column of the dataset.
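
The filtering steps described above can be tried out on a toy table first. This sketch uses boolean masks rather than `query` and `get_group`; `split_by_category` is a hypothetical helper, the rows and the question label `Q1` are invented, and only the column names come from the dataset.

```python
import pandas as pd

def split_by_category(df, state, question, first_year=2015, last_year=2019):
    """Return {category: DataFrame} for one state and question (all-ages rows only)."""
    mask = ((df['LocationAbbr'] == state)
            & (df['Question'] == question)
            & df['YearEnd'].between(first_year, last_year)  # inclusive on both ends
            & (df['Stratification1'] == 'Overall'))
    subset = df.loc[mask, ['YearEnd', 'Data_Value', 'StratificationID2',
                           'StratificationCategoryID2']]
    return {category: group.drop(columns='StratificationCategoryID2')
            for category, group in subset.groupby('StratificationCategoryID2')}

# Invented rows: one is outside the year range and one belongs to another state
toy = pd.DataFrame({
    'LocationAbbr': ['NY', 'NY', 'NY', 'NY', 'CA'],
    'Question': ['Q1'] * 5,
    'YearEnd': [2019, 2019, 2019, 2014, 2019],
    'Stratification1': ['Overall'] * 5,
    'StratificationCategoryID2': ['GENDER', 'GENDER', 'OVERALL', 'GENDER', 'GENDER'],
    'StratificationID2': ['MALE', 'FEMALE', 'OVERALL', 'MALE', 'MALE'],
    'Data_Value': [20.0, 22.0, 21.0, 19.0, 18.0],
})

tables = split_by_category(toy, 'NY', 'Q1')
print(sorted(tables))
print(tables['GENDER'])
```

With this layout a category that has no rows is simply absent from the dictionary, so a caller can test `'RACE' in tables` rather than catch a `KeyError`.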

In [13]:
def dataset_analysis_aging(dataset, state_name, question):
           
    data_by_question = dataset.query('Question == @question')

    data_by_question_by_state = data_by_question.groupby('LocationAbbr').get_group(state_name) \
            .query("YearEnd == 2019 | YearEnd == 2018 | YearEnd == 2017 | YearEnd == 2016 | YearEnd == 2015")

    data_by_state_age = data_by_question_by_state.groupby('Stratification1').get_group('Overall')

    data_by_state_age_gender_race = data_by_state_age \
            .query("StratificationCategoryID2 == 'RACE' | StratificationCategoryID2 == 'GENDER' |  \
                   StratificationCategoryID2 == 'OVERALL'")

    data_by_state_age_gender = data_by_state_age_gender_race \
            .groupby('StratificationCategoryID2').get_group('GENDER')

    data_by_state_age_race = data_by_state_age_gender_race \
            .groupby('StratificationCategoryID2').get_group('RACE')
    
    data_by_state_age_overall = data_by_state_age_gender_race \
            .groupby('StratificationCategoryID2').get_group('OVERALL')

    data_by_state_age_gender_output_table = pd.DataFrame(data_by_state_age_gender,
            columns = ['YearEnd', 'Data_Value', 'StratificationID2'])

    data_by_state_age_race_output_table = pd.DataFrame(data_by_state_age_race,
            columns = ['YearEnd', 'Data_Value', 'StratificationID2'])
    
    data_by_state_age_overall_output_table = pd.DataFrame(data_by_state_age_overall,
            columns = ['YearEnd', 'Data_Value', 'StratificationID2'])
   
    return data_by_state_age_gender_output_table, data_by_state_age_race_output_table, \
            data_by_state_age_overall_output_table, state_name, question

The functions dataset_analysis_aging_gender, dataset_analysis_aging_race and dataset_analysis_aging_overall are helper functions, one per stratification category (gender, race or overall). Each passes the dataset, state and question to the dataset_analysis_aging function and then hands the returned dataframes to the plotting function for its stratification category.

In [14]:
def dataset_analysis_aging_gender(dataset, state_name, question):

    output_table = dataset_analysis_aging(dataset, state_name, question)
    
    plot_healthy_aging_data_gender(output_table[0], output_table[1], output_table[2], state_name, question)
              
def dataset_analysis_aging_race(dataset, state_name, question):
    
    output_table = dataset_analysis_aging(dataset, state_name, question)
    
    plot_healthy_aging_data_race(output_table[0], output_table[1], output_table[2], state_name, question)
        
def dataset_analysis_aging_overall(dataset, state_name, question):
    
    output_table = dataset_analysis_aging(dataset, state_name, question)
    
    plot_healthy_aging_data_overall(output_table[0], output_table[1], output_table[2], state_name, question) 

Now we are ready to set up the 3 dropdown menu widgets used to analyze the 'Alzheimer's Disease and Healthy Aging Data' dataset. This will be done by writing a function that creates dropdown menus for choosing a particular state, question and stratification category and passes these choices to the dataset analysis functions to get a specific output.

The function dropdown_menu_widget_healthy_aging takes in a dataset related to healthy aging, 3 helper functions and a list of questions to query over. Each helper function runs the analysis and plotting for one stratification category, i.e., gender, race or overall data.

This function first displays the dropdown menus for choosing a state, a question and a stratification category. Once the user has chosen the inputs, it displays the data analysis based on the dropdown menu choices made.
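
Before the full function, here is the observer pattern it relies on, reduced to a minimal sketch. The three state options and the `Label` widget that stands in for the plot are placeholders, not part of the code in this article.

```python
import ipywidgets as widgets
from IPython.display import display

dropdown = widgets.Dropdown(options=['CA', 'NY', 'TX'], value=None, description='State:')
label = widgets.Label(value='Nothing selected yet')

def on_state_change(change):
    # 'change' carries the old and the new value of the observed trait
    label.value = 'Selected state: ' + str(change['new'])

# names='value' restricts the callback to changes of the selected value
dropdown.observe(on_state_change, names='value')

display(widgets.VBox([dropdown, label]))
```

Every time a different state is picked, ipywidgets calls `on_state_change` and the label updates without re-running the cell. The function in the next cell applies the same idea to three dropdowns.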

In [15]:
def dropdown_menu_widget_healthy_aging(dataset, dataset_analysis_function_gender, dataset_analysis_function_race, 
                                       dataset_analysis_function_overall, questions):
    
    # the plots are rendered inside this Output widget, below the menus
    output = widgets.Output()

    dropdown_state = widgets.Dropdown(options = sorted(dataset.LocationAbbr.unique()), value=None, description='State:')
    dropdown_question = widgets.Dropdown(options = questions, value=None, description='Question:')
    dropdown_stratification = widgets.Dropdown(options = dataset.StratificationCategoryID2.unique(), 
                                               value=None, description='Stratification:')
    
    # maps each stratification category to its analysis function
    analysis_functions = {'GENDER': dataset_analysis_function_gender,
                          'RACE': dataset_analysis_function_race,
                          'OVERALL': dataset_analysis_function_overall}
    
    def output_by_state(state, question, stratification):
        """
        Takes in a state, a question and a stratification category and plots the matching data.
        This function is called by the dropdown handler below whenever the user changes an input.
        """
        # wait until all 3 menus have a selection
        if state is None or question is None or stratification is None:
            return
        
        with output:
            # replace the previous plot with the new one
            clear_output(wait=True)
            try:
                analysis_functions[stratification](dataset, state, question)
            except KeyError:
                # raised when the chosen combination has no matching rows
                print('No data available for this combination of inputs.')
        
    def dropdown_eventhandler(change):
        """
        Event handler shared by the 3 dropdown widgets
        """
        output_by_state(dropdown_state.value, dropdown_question.value, dropdown_stratification.value)
            
    dropdown_state.observe(dropdown_eventhandler, names='value')
    dropdown_question.observe(dropdown_eventhandler, names='value')
    dropdown_stratification.observe(dropdown_eventhandler, names='value')

    input_widgets = widgets.HBox([dropdown_state, dropdown_question, dropdown_stratification])
    
    display(input_widgets, output)

Finally, we are ready to call the interactive dropdown_menu_widget_healthy_aging function. After the initial input choices one can continue to change the inputs in the menu to see graphical outputs for different inputs without having to re-run any cells. 

In [16]:
dropdown_menu_widget_healthy_aging(healthy_aging_data, dataset_analysis_aging_gender, dataset_analysis_aging_race, 
                                   dataset_analysis_aging_overall, questions_list)


Thank you for reading, 

Diana