# State of Data Science and Machine Learning 2020

## Table of Contents
1. [Introduction](#Introduction)<br>
2. [Demographics](#Demographic)<br>
    2.1. [Geographic Distribution](#Geographic-Distribution)<br>
    2.2. [Age Distribution](#Age-Distribution)<br>
    2.3. [Gender Distribution](#Gender-Distribution)<br>
3. [Making of a Data Scientist](#Making-of-Data-Scientist)<br>
    3.1. [Academic Qualification](#Academic-Qualification)<br>
    3.2. [Data Science Learning Platform](#Data-Science-Learning-Platform)<br>
    3.3. [Programming Experience](#Programming-Experience)<br>
    3.4. [First Programming Language](#First-Programming-Language)<br>
    3.5. [Programming Languages](#Programming-Language)<br>
4. [Being a Data Scientist](#Being-A-Data-Scientist)<br>
    4.1. [Salary](#Salary)<br>
    4.2. [Job Title](#Job-Title)<br>
    4.3. [Daily Activities](#Daily-Activities)<br>
    4.4. [Company Size](#Company-Size)<br>
    4.5. [Data Science Team Size](#Data-Science-Team-Size)<br>
    4.6. [Data Scientist Arsenal](#Data-Scientist-Arsenal)<br>
    4.6.1. [Integrated Development Environments](#Integrated-Development-Environments)<br>
    4.6.2. [Hosted Notebook Products](#Hosted-Notebook-Products)<br>
    4.6.3. [Primary Data Analysis Tools](#Primary-Data-Analysis-Tools)<br>
    4.6.4. [Data Visualization Tools](#Data-Visualization-Tools)<br>
    4.6.5. [Business Intelligence](#Business-Intelligence)<br>
    4.6.5.1. [Primary Business Intelligence Tools](#Primary-Business-Intelligence-Tools)<br>
    4.6.5.2. [Business Intelligence Tools](#Business-Intelligence-Tools)<br>
    4.6.5.3. [Business Intelligence Tools To Learn](#Business-Intelligence-Tools-To-Learn)<br>
    4.6.6. [Big Data](#Big-Data)<br>
    4.6.6.1. [Primary Big Data Products](#Primary-Big-Data-Products)<br>
    4.6.6.2. [Big Data Products](#Big-Data-Products)<br>
    4.6.6.3. [Big Data Products To Learn](#Big-Data-Products-To-Learn)<br>
    4.6.7. [Machine Learning](#Machine-Learning)<br>
    4.6.7.1. [Machine Learning Experience](#Machine-Learning-Experience)<br>
    4.6.7.2. [Primary Machine Learning Framework](#Primary-Machine-Learning-Framework)<br>
    4.6.7.3. [Machine Learning Algorithm](#Machine-Learning-Algorithm)<br>
    4.6.7.4. [Computer Vision Methods](#Computer-Vision-Methods)<br>
    4.6.7.5. [Natural Language Processing Methods](#NLP-Methods)<br>
    4.6.7.6. [Machine Learning in Production](#ML-PROD)<br>
    4.6.7.7. [Machine Learning Repository](#ML-Repository)<br>
    4.6.7.8. [Machine Learning Repository To Learn](#ML-Repository-To-Learn)<br>
    4.6.7.9. [Machine Learning Public Deployment Tools](#ML-Public-Deployment-Tools)<br>
    4.6.8. [Auto Machine Learning](#Auto-ML)<br>
    4.6.8.1. [Auto ML Methods](#Auto-ML-Methods)<br>
    4.6.8.2. [Auto ML Methods To Learn](#Auto-ML-Methods-To-Learn)<br>
    4.6.8.3. [Auto ML Tools](#Auto-ML-Tools)<br>
    4.6.8.4. [Auto ML Tools To Learn](#Auto-ML-Tools-To-Learn)<br>
    4.7. [Computing Environment](#Computing-Environment)<br>
    4.7.1. [Computing Platform](#Computing-Platform)<br>
    4.7.2. [Specialized Hardware](#Specialized-Hardware)<br>
    4.7.3. [TPU Usage](#TPU-Usage)<br>
    4.7.4. [Cloud Environments](#Cloud-Environments)<br>
    4.7.4.1. [Cloud Budget](#Cloud-Budget)<br>
    4.7.4.2. [Cloud Platforms](#Cloud-Platforms)<br>
    4.7.4.3. [Cloud Computing Products](#Cloud-Computing-Products)<br>
    4.7.4.4. [Cloud Platforms To Learn](#Cloud-Platforms-To-Learn)<br>
    4.7.4.5. [Cloud Computing Products To Learn](#Cloud-Computing-Products-To-Learn)


##  <a class="anchor" id="Introduction">1. Introduction</a>

This notebook is an Exploratory Data Analysis (EDA) of the 2020 Kaggle Data Science & Machine Learning survey. The survey was live from 07-October-2020 to 30-October-2020 and captured impressions from 20,036 participants.

This analysis is intended for:
1. Students - to learn the path to becoming a Data Scientist/Machine Learning practitioner.
2. Software Developers - to learn which frameworks and technologies their peers are using.
3. Product Managers - to learn about the costs associated with planning and executing a DS/ML project.
4. Recruiters - to get an accurate snapshot of the workforce and set realistic requirements in job posts.

All the visualizations in this notebook can be toggled between the global and specific country level view.
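
The global/country toggle comes down to one filtering step: use every response when "All" is selected, otherwise keep only the rows for the chosen country. The sketch below illustrates that step on made-up data. The helper name `filter_by_country` and the toy values are assumptions for illustration; the notebook itself inlines this logic in each widget callback, with the country stored in column `Q3`.

```python
import pandas as pd


def filter_by_country(df, country, country_column="Q3"):
    """Return every row for 'All', otherwise only the rows of one country."""
    if country == "All":
        return df
    return df[df[country_column] == country].reset_index(drop=True)


# Toy data standing in for the survey responses (values are made up).
toy = pd.DataFrame({
    "Q3": ["India", "Brazil", "India", "Japan"],
    "Q1": ["22-24", "25-29", "30-34", "22-24"],
})

all_rows = filter_by_country(toy, "All")
india_only = filter_by_country(toy, "India")
print(len(all_rows), len(india_only))
```

Each dropdown callback then only has to push the filtered values into the existing figure traces.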

In [None]:
"""
Uncomment the code below if the dependencies are not already installed.
"""
'''
!pip install plotly
!pip install pycountry
!pip install ipywidgets
'''

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import gc

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

import pycountry

import plotly.graph_objs as go
import plotly.express as px
from ipywidgets import widgets

from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)

import warnings
warnings.filterwarnings('ignore')

In [None]:
surveyDF = pd.read_csv('/kaggle/input/kaggle-survey-2020/kaggle_survey_2020_responses.csv',low_memory=False)
surveyDF.drop(0,inplace=True)
surveyDF.reset_index(drop=True,inplace=True)

In [None]:
def verticalHistogramSingleColumn(surveyDF, column, title, xaxis, yaxis, labels, width, height):
    """
    This utility method will process the single column that is passed as a parameter and generate a vertical histogram.
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :param column: Column used to construct the Histogram.
    :param title: Default title of the visualization.
    :param xaxis: X-Axis label.
    :param yaxis: Y-Axis label.
    :param labels: Distinct values present in the column, listed in the order in which the bins should be
            displayed left to right.
    :param width: Visualization width in pixels.
    :param height: Visualization height in pixels.
    :return: FigureWidget Histogram.
    """
    # Ten-color palette, cycled when there are more than ten labels.
    palette = ['rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(171,171,171,1)', 'rgba(89,89,89,1)',
               'rgba(95,158,209,1)', 'rgba(255,188,121,1)', 'rgba(207,207,207,1)', 'rgba(200,82,0,1)',
               'rgba(162,200,236,1)', 'rgba(137,137,137,1)']

    traceData = []

    for counter, label in enumerate(labels):
        # One trace per label so that each bin gets its own color.
        trace = go.Histogram(x=surveyDF[surveyDF[column] == label][column],
                             name=label,
                             marker=dict(color=palette[counter % len(palette)]))
        traceData.append(trace)

    layout = go.Layout(
        title=title,
        xaxis=dict(title=xaxis),
        yaxis=dict(title=yaxis),
        barmode='group',
        autosize=True,
        bargap=0.25
    )

    fig = go.FigureWidget(data=traceData, layout=layout)

    fig.update_layout(showlegend=False,
                      autosize=False,
                      width=width,
                      height=height, )

    gc.collect()

    return fig

In [None]:
def verticalHistogramMultipleColumns(surveyDF, columns, title, xaxis, yaxis, width, height):
    """
    This utility method will process the list of columns that are passed as a parameter and generate a vertical
    histogram.
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :param columns: Columns used to construct the Histogram.
    :param title: Default title of the visualization.
    :param xaxis: X-Axis label.
    :param yaxis: Y-Axis label.
    :param width: Visualization width in pixels.
    :param height: Visualization height in pixels.
    :return: FigureWidget Histogram.
    """
    # Ten-color palette, cycled when there are more than ten columns.
    palette = ['rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(171,171,171,1)', 'rgba(89,89,89,1)',
               'rgba(95,158,209,1)', 'rgba(255,188,121,1)', 'rgba(207,207,207,1)', 'rgba(200,82,0,1)',
               'rgba(162,200,236,1)', 'rgba(137,137,137,1)']

    traceData = []

    for counter, column in enumerate(columns):
        # One trace per column; missing values mean the option was not selected.
        trace = go.Histogram(x=surveyDF[column].dropna(),
                             name=column,
                             marker=dict(color=palette[counter % len(palette)]))
        traceData.append(trace)

    layout = go.Layout(
        title=title,
        xaxis=dict(title=xaxis),
        yaxis=dict(title=yaxis),
        barmode='group',
        autosize=True,
        bargap=0.25
    )

    fig = go.FigureWidget(data=traceData, layout=layout)

    fig.update_layout(showlegend=False,
                      autosize=False,
                      width=width,
                      height=height, )

    gc.collect()

    return fig

In [None]:
def horizontalHistogramSingleColumn(surveyDF, column, title, xaxis, yaxis, labels, width, height):
    """
    This utility method will process the single column that is passed as a parameter and generate a horizontal histogram.
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :param column: Column used to construct the Histogram.
    :param title: Default title of the visualization.
    :param xaxis: X-Axis label.
    :param yaxis: Y-Axis label.
    :param labels: Distinct values present in the column, listed in the order in which the bins should be
            displayed bottom to top.
    :param width: Visualization width in pixels.
    :param height: Visualization height in pixels.
    :return: FigureWidget Histogram.
    """
    # Ten-color palette, cycled when there are more than ten labels.
    palette = ['rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(171,171,171,1)', 'rgba(89,89,89,1)',
               'rgba(95,158,209,1)', 'rgba(255,188,121,1)', 'rgba(207,207,207,1)', 'rgba(200,82,0,1)',
               'rgba(162,200,236,1)', 'rgba(137,137,137,1)']

    traceData = []

    for counter, label in enumerate(labels):
        # One trace per label so that each bin gets its own color.
        trace = go.Histogram(y=surveyDF[surveyDF[column] == label][column],
                             name=label,
                             marker=dict(color=palette[counter % len(palette)]))
        traceData.append(trace)

    layout = go.Layout(
        title=title,
        xaxis=dict(title=xaxis),
        yaxis=dict(title=yaxis),
        barmode='group',
        autosize=True,
        bargap=0.25
    )

    fig = go.FigureWidget(data=traceData, layout=layout)

    fig.update_layout(showlegend=False, autosize=False,
                      width=width,
                      height=height, )

    gc.collect()

    return fig

In [None]:
def horizontalHistogramMultipleColumns(surveyDF, columns, title, xaxis, yaxis, width, height):
    """
    This utility method will process the list of columns that are passed as a parameter and generate a horizontal 
    histogram. 
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :param columns: Columns used to construct the Histogram.
    :param title: Default title of the visualization.
    :param xaxis: X-Axis label.
    :param yaxis: Y-Axis label.
    :param width: Visualization width in pixels.
    :param height: Visualization height in pixels.
    :return: FigureWidget Histogram.
    """
    # Ten-color palette, cycled when there are more than ten columns.
    palette = ['rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(171,171,171,1)', 'rgba(89,89,89,1)',
               'rgba(95,158,209,1)', 'rgba(255,188,121,1)', 'rgba(207,207,207,1)', 'rgba(200,82,0,1)',
               'rgba(162,200,236,1)', 'rgba(137,137,137,1)']

    traceData = []

    for counter, column in enumerate(columns):
        # One trace per column; missing values mean the option was not selected.
        trace = go.Histogram(y=surveyDF[column].dropna(),
                             name=column,
                             marker=dict(color=palette[counter % len(palette)]))
        traceData.append(trace)

    layout = go.Layout(
        title=title,
        xaxis=dict(title=xaxis),
        yaxis=dict(title=yaxis),
        barmode='group',
        autosize=True,
        bargap=0.25
    )

    fig = go.FigureWidget(data=traceData, layout=layout)

    fig.update_layout(showlegend=False,autosize=False,
                      width=width,
                      height=height,)

    gc.collect()

    return fig

In [None]:
def donutPie(surveyDF, column, title, colors, pull, hole, width, height):
    """
    This utility method will process the single column that is passed as a parameter and generate a pie chart.
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :param column: Column used to construct the pie chart.
    :param title: Default title of the visualization.
    :param colors: List of RGBA colors used in the Pie visualization.
    :param pull: Per-slice pull, i.e. the fraction by which each slice is offset from the center.
    :param hole: Size of the center hole.
    :param width: Visualization width in pixels.
    :param height: Visualization height in pixels.
    :return: Pie FigureWidget
    """

    labels = surveyDF[column].value_counts().index
    values = surveyDF[column].value_counts().values

    trace = go.Pie(labels=labels,
                   values=values,
                   hole=hole,
                   textinfo='label+percent',
                   pull=pull,
                   marker_colors=colors)

    layout = go.Layout(
        title=title
    )

    fig = go.FigureWidget(data=[trace],
                          layout=layout)

    fig.update_layout(showlegend=False,
                      autosize=False,
                      width=width,
                      height=height, )

    gc.collect()

    return fig


##  <a class="anchor" id="Demographic">2. Demographics</a>
###  <a class="anchor" id="Geographic-Distribution">2.1. Geographic Distribution</a>
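
The choropleth locates countries by ISO 3166-1 alpha-3 code, so survey answers whose spelling differs from the reference country name need an explicit alias. The sketch below shows that lookup with a small hand-written table, an assumption made to keep the example self-contained; the notebook builds its table from `pycountry.countries` instead. The names `ISO_ALPHA_3`, `SURVEY_ALIASES` and `to_alpha_3` are hypothetical.

```python
# Hand-written subset of an ISO 3166-1 name -> alpha-3 table (illustrative only).
ISO_ALPHA_3 = {
    "India": "IND",
    "Japan": "JPN",
    "Korea, Republic of": "KOR",
    "Iran, Islamic Republic of": "IRN",
}

# Survey spellings that do not match the reference names above.
SURVEY_ALIASES = {
    "South Korea": "KOR",
    "Republic of Korea": "KOR",
    "Iran, Islamic Republic of...": "IRN",
}


def to_alpha_3(name):
    """Look up a survey country name; return None when no code is known."""
    return SURVEY_ALIASES.get(name, ISO_ALPHA_3.get(name))


survey_answers = ["India", "South Korea", "Other", "Iran, Islamic Republic of..."]
codes = {name: to_alpha_3(name) for name in survey_answers}

# Answers without a code (such as "Other") cannot be placed on the map.
unmapped = [name for name, code in codes.items() if code is None]
print(codes, unmapped)
```

Listing the unmapped answers before plotting makes it easy to spot a spelling that still needs an alias.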

In [None]:
countries = sorted(set(np.append(surveyDF.Q3.values, "All")))


def getCountryISODictionary():
    """
    This method will add the alpha_3 code for the missing entries in pycountry.countries.
    :return: Dictionary of country name along with the alpha 3 code.
    """
    countries = {}

    for country in pycountry.countries:
        countries[country.name] = country.alpha_3

    countries['Taiwan'] = 'TWN'
    countries['Iran'] = 'IRN'
    countries['United Kingdom of Great Britain and Northern Ireland'] = 'GBR'
    countries['South Korea'] = 'KOR'
    countries['Republic of Korea'] = 'KOR'
    countries['United States of America'] = 'USA'
    countries['Russia'] = 'RUS'

    return countries


def createLocationDF(countries, surveyDF):
    """
    This method will create locationDF with the country name, alpha 3 code and the number of participants from
    each country.
    :param countries: Dictionary of country name along with the alpha 3 code.
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :return: Pandas dataframe with the country name, alpha 3 code and the number of participants from each
    country.
    """
    # Work on a copy so that the in-place edits below never touch surveyDF.
    locationDF = surveyDF[['Q3']].copy()

    locationDF.drop(locationDF[locationDF.Q3 == 'Other'].index, inplace=True)
    locationDF.reset_index(drop=True, inplace=True)

    locationDF.loc[locationDF.Q3 == 'Iran, Islamic Republic of...', 'Q3'] = 'Iran'

    for country in set(locationDF.Q3.values):
        locationDF.loc[locationDF.Q3 == country, 'alpha_3'] = countries[country]

    locationDF['country'] = locationDF['Q3']

    locationDF = locationDF.groupby(['country', 'alpha_3'])['Q3'].count().reset_index(name="count")

    return locationDF


def geoDistribution(surveyDF):
    """
    This method will create the geographical distribution visualization.
    :param surveyDF: Pandas dataframe generated by reading the survey responses.
    :return: None.
    """
    countries = getCountryISODictionary()

    locationDF = createLocationDF(countries, surveyDF)

    trace = go.Choropleth(
        locations=locationDF['alpha_3'],
        z=locationDF['count'],
        text=locationDF['country'],
        colorscale='Blues',
        autocolorscale=False,
        marker_line_color='darkgray',
        marker_line_width=0.5,
        colorbar_title='Number of participants', )

    layout = go.Layout(title='Participants Geographic Distribution',
                       geo=dict(
                           showframe=False,
                           showcoastlines=False,
                           projection_type='equirectangular'
                       ))

    fig = go.Figure(data=[trace], layout=layout)

    iplot(fig)

    del locationDF
    gc.collect()


geoDistribution(surveyDF)

###  <a class="anchor" id="Age-Distribution">2.2. Age Distribution</a>

In [None]:
ageDistributionLabels = sorted(surveyDF['Q1'].value_counts().index)

ageDistributionFig = verticalHistogramSingleColumn(surveyDF,
                                                   column='Q1',
                                                   title='Participants Age Distribution',
                                                   xaxis='Age',
                                                   yaxis='Count',
                                                   labels=ageDistributionLabels,
                                                   width=990,
                                                   height=600)

ageDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

ageDistributionContainer = widgets.HBox(children=[ageDistributionTextbox])


def ageDistributionResponse(change):
    if (ageDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == ageDistributionTextbox.value][['Q1']].reset_index(drop=True)

    with ageDistributionFig.batch_update():
        for i in range(len(ageDistributionFig.data)):
            ageDistributionFig.data[i].x = df[df['Q1'] == ageDistributionFig.data[i].name]['Q1'].values
        if (ageDistributionTextbox.value == 'All'):
            ageDistributionFig.layout.title = dict(text="Participants Age Distribution")
        else:
            ageDistributionFig.layout.title = dict(text=ageDistributionTextbox.value + " Participants Age Distribution")


ageDistributionTextbox.observe(ageDistributionResponse, names="value")

widgets.VBox([ageDistributionContainer,
              ageDistributionFig])

###  <a class="anchor" id="Gender-Distribution">2.3. Gender Distribution</a>

In [None]:
colors = ['rgba(171,171,171,1)', 'rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)']

pull = [0.2, 0.2, 0.2, 0.2, 0.2]

genderDistributionFig = donutPie(surveyDF, column='Q2', title="Participants Gender Distribution", colors=colors,
                                 pull=pull, hole=.3, width=990, height=600)

genderDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

genderDistributionContainer = widgets.HBox(children=[genderDistributionTextBox])


def genderDistributionResponse(change):
    if (genderDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == genderDistributionTextBox.value][['Q2']].reset_index(drop=True)

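    with genderDistributionFig.batch_update():
        # Align the counts with the existing slice labels so that each value stays on its own slice and color.
        counts = df['Q2'].value_counts().reindex(list(genderDistributionFig.data[0].labels), fill_value=0)
        genderDistributionFig.data[0].values = counts.values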
        if (genderDistributionTextBox.value == 'All'):
            genderDistributionFig.layout.title = dict(
                text="Participants Gender Distribution")
        else:
            genderDistributionFig.layout.title = dict(
                text=genderDistributionTextBox.value + " Participants Gender Distribution")


genderDistributionTextBox.observe(genderDistributionResponse, names="value")

widgets.VBox([genderDistributionContainer,
              genderDistributionFig])

##  <a class="anchor" id="Making-of-Data-Scientist">3. Making of a Data Scientist</a>

###  <a class="anchor" id="Academic-Qualification">3.1. Academic Qualification</a>

Participants are asked for the highest level of formal education that they have attained or plan to attain within the next 2 years, with the following options to choose from:

* No formal education past high school
* Some college/university study without earning a bachelor’s degree
* Bachelor’s degree
* Master’s degree
* Doctoral degree
* Professional degree
* I prefer not to answer 

In [None]:
surveyDF['Q4'] = surveyDF['Q4'].str.strip()

surveyDF.loc[
    surveyDF.Q4 == 'Some college/university study without earning a bachelor’s degree',
    'Q4'
] = 'Some college/university study without<br>earning a bachelor’s degree'

academicQualificationLabels = ['I prefer not to answer', 'No formal education past high school',
                               'Some college/university study without<br>earning a bachelor’s degree',
                               'Professional degree',
                               'Bachelor’s degree', 'Master’s degree', 'Doctoral degree']

academicQualificationDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q4',
                                                                       title='Participants Academic Qualification',
                                                                       xaxis='Count',
                                                                       yaxis='Qualification',
                                                                       labels=academicQualificationLabels, width=990,
                                                                       height=600)

academicQualificationDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

academicQualificationDistributionContainer = widgets.HBox(children=[academicQualificationDistributionTextbox])


def academicQualificationDistributionResponse(change):
    if (academicQualificationDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == academicQualificationDistributionTextbox.value][['Q4']].reset_index(drop=True)

    with academicQualificationDistributionFig.batch_update():
        for i in range(len(academicQualificationDistributionFig.data)):
            academicQualificationDistributionFig.data[i].y = \
                df[df['Q4'] == academicQualificationDistributionFig.data[i].name]['Q4'].values
        if (academicQualificationDistributionTextbox.value == 'All'):
            academicQualificationDistributionFig.layout.title = dict(
                text="Participants Academic Qualification")
        else:
            academicQualificationDistributionFig.layout.title = dict(
                text=academicQualificationDistributionTextbox.value + " Participants Academic Qualification")


academicQualificationDistributionTextbox.observe(academicQualificationDistributionResponse, names="value")

widgets.VBox([academicQualificationDistributionContainer,
              academicQualificationDistributionFig])

###  <a class="anchor" id="Data-Science-Learning-Platform">3.2. Data Science Learning Platform</a>

Participants are asked to select the platforms on which they have begun or completed data science courses. (Select all that apply)
* Coursera
* edX
* Kaggle Learn Courses
* DataCamp
* Fast.ai
* Udacity
* Udemy
* LinkedIn Learning
* Cloud-certification programs (direct from AWS, Azure, GCP, or similar)
* University Courses (resulting in a university degree)
* None
* Other
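
Because this is a "select all that apply" question, the analysis treats each option as its own column (`Q37_Part_1`, `Q37_Part_2`, ...) that holds the option label when selected and a missing value otherwise. That layout is inferred from how the notebook drops missing values per column. The sketch below counts selections on made-up data under that assumption; the toy values are not survey results.

```python
import pandas as pd

# Toy frame: column names mimic the survey layout, values are made up.
toy = pd.DataFrame({
    "Q37_Part_1": ["Coursera", None, "Coursera"],
    "Q37_Part_2": [None, "edX", "edX"],
    "Q37_Part_3": [None, None, None],
})

option_columns = ["Q37_Part_1", "Q37_Part_2", "Q37_Part_3"]

# count() skips missing values, so it returns the number of selections per option.
selections = toy[option_columns].count()

# Respondents who picked at least one option.
respondents = int(toy[option_columns].notna().any(axis=1).sum())

print(selections.to_dict(), respondents)
```

Since one respondent can pick several platforms, the selections can add up to more than the number of respondents, so the bars should be read as counts per platform rather than shares of a whole.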

In [None]:
dsLearningPlatformColumns = ['Q37_Part_1', 'Q37_Part_2', 'Q37_Part_3', 'Q37_Part_4', 'Q37_Part_5', 'Q37_Part_6',
                             'Q37_Part_7',
                             'Q37_Part_8',
                             'Q37_Part_9', 'Q37_Part_10', 'Q37_Part_11', 'Q37_OTHER']

surveyDF['Q37_Part_9'] = surveyDF['Q37_Part_9'].str.strip()
surveyDF['Q37_Part_10'] = surveyDF['Q37_Part_10'].str.strip()
surveyDF.loc[
    surveyDF.Q37_Part_9 == 'Cloud-certification programs (direct from AWS, Azure, GCP, or similar)'
    , 'Q37_Part_9'] = 'Cloud-certification programs<br>(direct from AWS, Azure, GCP, or similar)'
surveyDF.loc[
    surveyDF.Q37_Part_10 == 'University Courses (resulting in a university degree)'
    , 'Q37_Part_10'] = 'University Courses<br>(resulting in a university degree)'

dsLearningPlatformDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=dsLearningPlatformColumns,
                                                                       title='Data Science Learning Platform',
                                                                       xaxis='Count',
                                                                       yaxis='Data Science Learning Platform',
                                                                       width=990, height=800)

dsLearningPlatformDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

dsLearningPlatformDistributionContainer = widgets.HBox(children=[dsLearningPlatformDistributionTextbox])


def dsLearningPlatformDistributionResponse(change):
    if (dsLearningPlatformDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == dsLearningPlatformDistributionTextbox.value][
            dsLearningPlatformColumns].reset_index(drop=True)

    with dsLearningPlatformDistributionFig.batch_update():
        for i in range(len(dsLearningPlatformDistributionFig.data)):
            dsLearningPlatformDistributionFig.data[i].y = df[dsLearningPlatformDistributionFig.data[i].name].dropna()
        if (dsLearningPlatformDistributionTextbox.value == 'All'):
            dsLearningPlatformDistributionFig.layout.title = dict(text="Data Science Learning Platform")
        else:
            dsLearningPlatformDistributionFig.layout.title = dict(
                text=dsLearningPlatformDistributionTextbox.value + " Data Science Learning Platform")


dsLearningPlatformDistributionTextbox.observe(dsLearningPlatformDistributionResponse, names="value")

widgets.VBox([dsLearningPlatformDistributionContainer,
              dsLearningPlatformDistributionFig])
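
For select-all-that-apply questions such as this one, each option lives in its own `Q37_Part_*` column, which holds the option label when it was selected and is empty otherwise. The sketch below shows one way such columns could be tallied with pandas. The toy frame and the `count_multi_select` helper are illustrative assumptions; they are not part of the survey data or of the plotting helpers used in this notebook.

```python
import pandas as pd


def count_multi_select(df, columns):
    """Count how often each option was selected across one-column-per-option answers."""
    # melt() puts every cell of the chosen columns into a single 'value' column;
    # value_counts() then ignores the empty (unselected) cells by default.
    return df[columns].melt()['value'].value_counts()


# Toy data: three respondents, two options (hypothetical values).
toy = pd.DataFrame({
    'Q37_Part_1': ['Coursera', None, 'Coursera'],
    'Q37_Part_2': [None, 'edX', 'edX'],
})
counts = count_multi_select(toy, ['Q37_Part_1', 'Q37_Part_2'])
print(counts.to_dict())
```

The same call would apply to any of the multi-select column lists defined in this notebook.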

###  <a class="anchor" id="Programming-Experience">3.3. Programming Experience</a>

Participants are asked about their coding/programming experience.
* I have never written code
* < 1 years
* 1-2 years
* 3-5 years
* 5-10 years
* 10-20 years
* 20+ years

In [None]:
codingExperience = ['I have never written code', '< 1 years', '1-2 years', '3-5 years',
                    '5-10 years', '10-20 years', '20+ years', ]

codingExperienceDistributionFig = verticalHistogramSingleColumn(surveyDF, column='Q6',
                                                                title='Participants Coding Experience',
                                                                xaxis='Experience', yaxis='Count',
                                                                labels=codingExperience, width=990, height=600)

codingExperienceDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

codingExperienceDistributionContainer = widgets.HBox(children=[codingExperienceDistributionTextbox])


def codingExperienceDistributionResponse(change):
    if (codingExperienceDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == codingExperienceDistributionTextbox.value][['Q6']].reset_index(drop=True)

    with codingExperienceDistributionFig.batch_update():
        for i in range(len(codingExperienceDistributionFig.data)):
            codingExperienceDistributionFig.data[i].x = df[df['Q6'] == codingExperienceDistributionFig.data[i].name][
                'Q6'].values
        if (codingExperienceDistributionTextbox.value == 'All'):
            codingExperienceDistributionFig.layout.title = dict(
                text="Participants Coding Experience")
        else:
            codingExperienceDistributionFig.layout.title = dict(
                text=codingExperienceDistributionTextbox.value + " Participants Coding Experience")


codingExperienceDistributionTextbox.observe(codingExperienceDistributionResponse, names="value")

widgets.VBox([codingExperienceDistributionContainer,
              codingExperienceDistributionFig])
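
The experience bands have a natural order, whereas `value_counts()` sorts by frequency. How `verticalHistogramSingleColumn` uses `labels` is not shown in this section, so the following is only a sketch of one way to get counts in a fixed label order; `ordered_counts` and the toy answers are hypothetical.

```python
import pandas as pd


def ordered_counts(series, labels):
    """Count answers and return them in the given label order (0 for unseen labels)."""
    return series.value_counts().reindex(labels, fill_value=0)


labels = ['I have never written code', '< 1 years', '1-2 years']
# Toy answers, including one missing response (hypothetical values).
answers = pd.Series(['1-2 years', '< 1 years', '1-2 years', None])
result = ordered_counts(answers, labels)
print(result)
```

Because of `fill_value=0`, a band that nobody in the selected country chose still appears in the result, which keeps the axis stable when the dropdown changes.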

###  <a class="anchor" id="First-Programming-Language">3.4. First Programming Language</a>

Participants are asked to select the programming language they would recommend an aspiring data scientist learn first.
* Python
* R
* SQL
* C
* C++
* Java
* Javascript
* Julia
* Swift
* Bash
* MATLAB
* None
* Other

In [None]:
programmingLanguageLabels = sorted(surveyDF['Q8'].value_counts().index)

firstProgrammingLanguageDistributionFig = verticalHistogramSingleColumn(surveyDF, column='Q8',
                                                                        title='Programming Language For Aspiring Data Scientist',
                                                                        xaxis='Programming Language', yaxis='Count',
                                                                        labels=programmingLanguageLabels, width=990,
                                                                        height=600)

firstProgrammingLanguageDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

firstProgrammingLanguageDistributionContainer = widgets.HBox(children=[firstProgrammingLanguageDistributionTextbox])


def firstProgrammingLanguageDistributionResponse(change):
    if (firstProgrammingLanguageDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == firstProgrammingLanguageDistributionTextbox.value][['Q8']].reset_index(
            drop=True)

    with firstProgrammingLanguageDistributionFig.batch_update():
        for i in range(len(firstProgrammingLanguageDistributionFig.data)):
            firstProgrammingLanguageDistributionFig.data[i].x = \
                df[df['Q8'] == firstProgrammingLanguageDistributionFig.data[i].name]['Q8'].values
        if (firstProgrammingLanguageDistributionTextbox.value == 'All'):
            firstProgrammingLanguageDistributionFig.layout.title = dict(
                text="Programming Language For Aspiring Data Scientist")
        else:
            firstProgrammingLanguageDistributionFig.layout.title = dict(
                text=firstProgrammingLanguageDistributionTextbox.value + " Programming Language For Aspiring Data Scientist")


firstProgrammingLanguageDistributionTextbox.observe(firstProgrammingLanguageDistributionResponse, names="value")

widgets.VBox([firstProgrammingLanguageDistributionContainer,
              firstProgrammingLanguageDistributionFig])

###  <a class="anchor" id="Programming-Language">3.5. Programming Languages</a>

Participants are asked to select the programming languages that they use on a regular basis. (Select all that apply)
* Python 
* R
* SQL 
* C
* C++
* Java
* Javascript 
* Julia
* Swift
* Bash
* MATLAB 
* None
* Other

In [None]:
programmingLanguageColumns = ['Q7_Part_1', 'Q7_Part_2', 'Q7_Part_3', 'Q7_Part_4', 'Q7_Part_5', 'Q7_Part_6', 'Q7_Part_7',
                              'Q7_Part_8',
                              'Q7_Part_9', 'Q7_Part_10', 'Q7_Part_11', 'Q7_Part_12', 'Q7_OTHER']

programmingLanguageDistributionFig = verticalHistogramMultipleColumns(surveyDF, columns=programmingLanguageColumns,
                                                                      title='Programming Language',
                                                                      xaxis='Programming Language', yaxis='Count',
                                                                      width=990, height=600)

programmingLanguageDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

programmingLanguageDistributionContainer = widgets.HBox(children=[programmingLanguageDistributionTextbox])


def programmingLanguageDistributionResponse(change):
    if (programmingLanguageDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == programmingLanguageDistributionTextbox.value][
            programmingLanguageColumns].reset_index(drop=True)

    with programmingLanguageDistributionFig.batch_update():
        for i in range(len(programmingLanguageDistributionFig.data)):
            programmingLanguageDistributionFig.data[i].x = df[programmingLanguageDistributionFig.data[i].name].dropna()

        if (programmingLanguageDistributionTextbox.value == 'All'):
            programmingLanguageDistributionFig.layout.title = dict(text="Programming Language")
        else:
            programmingLanguageDistributionFig.layout.title = dict(
                text=programmingLanguageDistributionTextbox.value + " Programming Language")


programmingLanguageDistributionTextbox.observe(programmingLanguageDistributionResponse, names="value")

widgets.VBox([programmingLanguageDistributionContainer,
              programmingLanguageDistributionFig])
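
Every dropdown callback in this notebook repeats the same country filter on `Q3`. As a sketch of how that step could be factored out, the helper below returns the requested columns for one country, or for all respondents when the dropdown value is `'All'`. The name `filter_by_country` and the toy frame are assumptions made for illustration.

```python
import pandas as pd


def filter_by_country(df, country, columns, country_column='Q3'):
    """Return `columns` for respondents from `country`, or for everyone if 'All'."""
    if country == 'All':
        return df[columns]
    # Boolean mask on the country column, then a fresh 0..n-1 index.
    return df.loc[df[country_column] == country, columns].reset_index(drop=True)


# Toy data (hypothetical values).
toy = pd.DataFrame({
    'Q3': ['India', 'Canada', 'India'],
    'Q7_Part_1': ['Python', None, 'Python'],
})
india = filter_by_country(toy, 'India', ['Q7_Part_1'])
print(india)
```

A callback could then start with `df = filter_by_country(surveyDF, textbox.value, columns)` in place of the repeated `if`/`else` block.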

###  <a class="anchor" id="Being-A-Data-Scientist">4. Being a Data Scientist</a>
###  <a class="anchor" id="Salary">4.1. Salary</a>

In [None]:
salaryLabels = ['$0-999', '1,000-1,999', '2,000-2,999', '3,000-3,999', '4,000-4,999', '5,000-7,499',
                '7,500-9,999', '10,000-14,999', '15,000-19,999', '20,000-24,999', '25,000-29,999',
                '30,000-39,999', '40,000-49,999', '50,000-59,999', '60,000-69,999', '70,000-79,999',
                '80,000-89,999', '90,000-99,999', '100,000-124,999', '125,000-149,999', '150,000-199,999',
                '200,000-249,999', '250,000-299,999', '300,000-500,000', '> $500,000']

salaryDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q24', title='Participants Salary',
                                                        xaxis='Count',
                                                        yaxis='Salary', labels=salaryLabels, width=990, height=1000)

salaryDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

salaryDistributionContainer = widgets.HBox(children=[salaryDistributionTextbox])


def salaryDistributionResponse(change):
    if (salaryDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == salaryDistributionTextbox.value][['Q24']].reset_index(drop=True)

    with salaryDistributionFig.batch_update():
        for i in range(len(salaryDistributionFig.data)):
            salaryDistributionFig.data[i].y = df[df['Q24'] == salaryDistributionFig.data[i].name]['Q24'].values
        if (salaryDistributionTextbox.value == 'All'):
            salaryDistributionFig.layout.title = dict(text="Participants Salary")
        else:
            salaryDistributionFig.layout.title = dict(text=salaryDistributionTextbox.value + " Participants Salary")


salaryDistributionTextbox.observe(salaryDistributionResponse, names="value")

widgets.VBox([salaryDistributionContainer,
              salaryDistributionFig])

###  <a class="anchor" id="Job-Title">4.2. Job Title</a>
Participants are asked to select the title most similar to their current role (or most recent title if retired):
* Business Analyst
* Data Analyst
* Data Engineer
* Data Scientist
* DBA/Database Engineer
* Machine Learning Engineer
* Product/Project Manager
* Research Scientist
* Software Engineer
* Statistician
* Student
* Currently not employed
* Other 

In [None]:
colors = ['rgba(171,171,171,1)', 'rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)',
          'rgba(89,89,89,1)', 'rgba(255,188,121,1)', 'rgba(207,207,207,1)', 'rgba(200,82,0,1)',
          'rgba(162,200,236,1)', 'rgba(137,137,137,1)', 'rgba(171,171,171,1)', 'rgba(95,158,209,1)',
          'rgba(0,107,164,1)']

pull = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]

jobTitleDistributionFig = donutPie(surveyDF, column='Q5', title='Participants Job Title', colors=colors, pull=pull,
                                   hole=.0, width=990, height=600)

jobTitleDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

jobTitleDistributionContainer = widgets.HBox(children=[jobTitleDistributionTextBox])


def jobTitleDistributionResponse(change):
    if (jobTitleDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == jobTitleDistributionTextBox.value][['Q5']].reset_index(drop=True)

    with jobTitleDistributionFig.batch_update():
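        counts = df['Q5'].value_counts()
        # Update labels together with values: value_counts() orders by frequency,
        # so the slice order can differ from one country to the next.
        jobTitleDistributionFig.data[0].labels = counts.index.tolist()
        jobTitleDistributionFig.data[0].values = counts.values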
        if (jobTitleDistributionTextBox.value == 'All'):
            jobTitleDistributionFig.layout.title = dict(
                text="Participants Job Title")
        else:
            jobTitleDistributionFig.layout.title = dict(
                text=jobTitleDistributionTextBox.value + " Participants Job Title")


jobTitleDistributionTextBox.observe(jobTitleDistributionResponse, names="value")

widgets.VBox([jobTitleDistributionContainer,
              jobTitleDistributionFig])

###  <a class="anchor" id="Daily-Activities">4.3. Daily Activities</a>

Participants are asked to select the activities that make up an important part of their role at work. (Select all that apply)
* Analyze and understand data to influence product or business decisions
* Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data
* Build prototypes to explore applying machine learning to new areas
* Build and/or run a machine learning service that operationally improves my product or workflows
* Experimentation and iteration to improve existing ML models
* Do research that advances the state of the art of machine learning
* None of these activities are an important part of my role at work
* Other

In [None]:
dailyActivityColumns = ['Q23_Part_1', 'Q23_Part_2', 'Q23_Part_3', 'Q23_Part_4', 'Q23_Part_5', 'Q23_Part_6',
                        'Q23_Part_7',
                        'Q23_OTHER']

# Strip stray whitespace from every fixed-choice column (all but the free-text 'Q23_OTHER').
for column in dailyActivityColumns[:-1]:
    surveyDF[column] = surveyDF[column].str.strip()

surveyDF.loc[
    surveyDF.Q23_Part_1 == 'Analyze and understand data to influence product or business decisions',
    'Q23_Part_1'
] = 'Analyze and understand data to influence<br>product or business decisions'

surveyDF.loc[
    surveyDF.Q23_Part_2 == 'Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data',
    'Q23_Part_2'
] = 'Build and/or run the data infrastructure that<br>my business uses for storing, analyzing,<br>and operationalizing data'

surveyDF.loc[
    surveyDF.Q23_Part_3 == 'Build prototypes to explore applying machine learning to new areas',
    'Q23_Part_3'
] = 'Build prototypes to explore applying<br>machine learning to new areas'

surveyDF.loc[
    surveyDF.Q23_Part_4 == 'Build and/or run a machine learning service that operationally improves my product or workflows',
    'Q23_Part_4'
] = 'Build and/or run a machine learning service<br>that operationally improves my product or workflows'

surveyDF.loc[
    surveyDF.Q23_Part_5 == 'Experimentation and iteration to improve existing ML models',
    'Q23_Part_5'
] = 'Experimentation and iteration to<br>improve existing ML models'

surveyDF.loc[
    surveyDF.Q23_Part_6 == 'Do research that advances the state of the art of machine learning',
    'Q23_Part_6'
] = 'Do research that advances the state<br>of the art of machine learning'

surveyDF.loc[
    surveyDF.Q23_Part_7 == 'None of these activities are an important part of my role at work',
    'Q23_Part_7'
] = 'None of these activities are an<br>important part of my role at work'

dailyActivityDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=dailyActivityColumns,
                                                                  title='Daily Activities',
                                                                  xaxis='Count',
                                                                  yaxis='Daily Activity', width=990, height=600)

dailyActivityDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

dailyActivityDistributionContainer = widgets.HBox(children=[dailyActivityDistributionTextbox])


def dailyActivityDistributionResponse(change):
    if (dailyActivityDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == dailyActivityDistributionTextbox.value][dailyActivityColumns].reset_index(
            drop=True)

    with dailyActivityDistributionFig.batch_update():
        for i in range(len(dailyActivityDistributionFig.data)):
            dailyActivityDistributionFig.data[i].y = df[dailyActivityDistributionFig.data[i].name].dropna()
        if (dailyActivityDistributionTextbox.value == 'All'):
            dailyActivityDistributionFig.layout.title = dict(text="Daily Activities")
        else:
            dailyActivityDistributionFig.layout.title = dict(
                text=dailyActivityDistributionTextbox.value + " Daily Activities")


dailyActivityDistributionTextbox.observe(dailyActivityDistributionResponse, names="value")

widgets.VBox([dailyActivityDistributionContainer,
              dailyActivityDistributionFig])
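
The long option labels above are wrapped by hand with `<br>` so that they fit on the chart axis. A minimal sketch of doing the same wrapping automatically with the standard library follows; `wrap_label` and the `width` value are hypothetical choices, and automatic wrapping will not always break at the same points as the hand-written labels.

```python
import textwrap


def wrap_label(label, width=40):
    """Break a long axis label into lines of at most `width` characters, joined by <br>."""
    return '<br>'.join(textwrap.wrap(label, width=width))


wrapped = wrap_label('Build prototypes to explore applying machine learning to new areas')
print(wrapped)
# Build prototypes to explore applying<br>machine learning to new areas
```

Short labels such as `'Other'` pass through unchanged, so the helper could be applied to a whole column with `Series.map`.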

###  <a class="anchor" id="Company-Size">4.4. Company Size</a>
Participants are asked to select the size of the company where they are employed.
* 0-49 employees
* 50-249 employees 
* 250-999 employees 
* 1000-9,999 employees 
* 10,000 or more employees

In [None]:
colors = ['rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)', 'rgba(171,171,171,1)', ]

pull = [0.2, 0.2, 0.2, 0.2, 0.2]

employeeCountDistributionFig = donutPie(surveyDF, column='Q20', title='Participants Company Size',
                                        colors=colors, pull=pull, hole=.0, width=990, height=600)

employeeCountDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

employeeCountDistributionContainer = widgets.HBox(children=[employeeCountDistributionTextBox])


def employeeCountDistributionResponse(change):
    if (employeeCountDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == employeeCountDistributionTextBox.value][['Q20']].reset_index(drop=True)

    with employeeCountDistributionFig.batch_update():
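        counts = df['Q20'].value_counts()
        # Update labels together with values: value_counts() orders by frequency,
        # so the slice order can differ from one country to the next.
        employeeCountDistributionFig.data[0].labels = counts.index.tolist()
        employeeCountDistributionFig.data[0].values = counts.values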
        if (employeeCountDistributionTextBox.value == 'All'):
            employeeCountDistributionFig.layout.title = dict(
                text="Participants Company Size")
        else:
            employeeCountDistributionFig.layout.title = dict(
                text=employeeCountDistributionTextBox.value + " Participants Company Size")


employeeCountDistributionTextBox.observe(employeeCountDistributionResponse, names="value")

widgets.VBox([employeeCountDistributionContainer,
              employeeCountDistributionFig])

###  <a class="anchor" id="Data-Science-Team-Size">4.5. Data Science Team Size</a>

Participants are asked approximately how many individuals are responsible for data science workloads at their place of business.
* 0
* 1-2
* 3-4
* 5-9
* 10-14
* 15-19
* 20+

In [None]:
colors = ['rgba(171,171,171,1)', 'rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)',
          'rgba(255,188,121,1)', 'rgba(207,207,207,1)', ]

pull = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]

dsTeamSizeDistributionFig = donutPie(surveyDF, column='Q21', title='Participants Data Science Team Size',
                                     colors=colors, pull=pull, hole=.0, width=990, height=600)

dsTeamSizeDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

dsTeamSizeDistributionContainer = widgets.HBox(children=[dsTeamSizeDistributionTextBox])


def dsTeamSizeDistributionResponse(change):
    if (dsTeamSizeDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == dsTeamSizeDistributionTextBox.value][['Q21']].reset_index(drop=True)

    with dsTeamSizeDistributionFig.batch_update():
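        counts = df['Q21'].value_counts()
        # Update labels together with values: value_counts() orders by frequency,
        # so the slice order can differ from one country to the next.
        dsTeamSizeDistributionFig.data[0].labels = counts.index.tolist()
        dsTeamSizeDistributionFig.data[0].values = counts.values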
        if (dsTeamSizeDistributionTextBox.value == 'All'):
            dsTeamSizeDistributionFig.layout.title = dict(
                text="Participants Data Science Team Size")
        else:
            dsTeamSizeDistributionFig.layout.title = dict(
                text=dsTeamSizeDistributionTextBox.value + " Participants Data Science Team Size")


dsTeamSizeDistributionTextBox.observe(dsTeamSizeDistributionResponse, names="value")

widgets.VBox([dsTeamSizeDistributionContainer,
              dsTeamSizeDistributionFig])

###  <a class="anchor" id="Data-Scientist-Arsenal">4.6. Data Scientist Arsenal</a>
###  <a class="anchor" id="Integrated-Development-Environments">4.6.1. Integrated Development Environments</a>

Participants are asked to select the integrated development environments (IDEs) they use on a regular basis. (Select all that apply)
* __[Jupyter (JupyterLab, Jupyter Notebooks, etc)](https://jupyter.org/)__
* __[RStudio](https://rstudio.com/)__
* __[Visual Studio](https://visualstudio.microsoft.com/)__
* __[Visual Studio Code (VSCode)](https://code.visualstudio.com/)__
* __[PyCharm](https://www.jetbrains.com/pycharm/)__
* __[Spyder](https://www.spyder-ide.org/)__
* __[Notepad++](https://notepad-plus-plus.org/)__
* __[Sublime Text](https://www.sublimetext.com/)__
* __[Vim, Emacs, or similar](https://www.vim.org/)__
* __[MATLAB](https://www.mathworks.com/products/matlab.html)__
* None
* Other

In [None]:
ideColumns = ['Q9_Part_1', 'Q9_Part_2', 'Q9_Part_3', 'Q9_Part_4', 'Q9_Part_5', 'Q9_Part_6', 'Q9_Part_7', 'Q9_Part_8',
              'Q9_Part_9', 'Q9_Part_10', 'Q9_Part_11', 'Q9_OTHER']

surveyDF['Q9_Part_1'] = surveyDF['Q9_Part_1'].str.strip()
surveyDF['Q9_Part_4'] = surveyDF['Q9_Part_4'].str.strip()

surveyDF.loc[
    surveyDF.Q9_Part_1 == 'Jupyter (JupyterLab, Jupyter Notebooks, etc)',
    'Q9_Part_1'
] = 'Jupyter<br>(JupyterLab, Jupyter Notebooks, etc)'

surveyDF.loc[
    surveyDF.Q9_Part_4 == 'Visual Studio Code (VSCode)',
    'Q9_Part_4'
] = 'Visual Studio Code<br>(VSCode)'

ideDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=ideColumns, title='IDE Usage', xaxis='Count',
                                                        yaxis='IDE', width=990, height=600)

ideDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

ideDistributionContainer = widgets.HBox(children=[ideDistributionTextbox])


def ideDistributionResponse(change):
    if (ideDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == ideDistributionTextbox.value][ideColumns].reset_index(drop=True)

    with ideDistributionFig.batch_update():
        for i in range(len(ideDistributionFig.data)):
            ideDistributionFig.data[i].y = df[ideDistributionFig.data[i].name].dropna()
        if (ideDistributionTextbox.value == 'All'):
            ideDistributionFig.layout.title = dict(text="IDE Usage")
        else:
            ideDistributionFig.layout.title = dict(text=ideDistributionTextbox.value + " IDE Usage")


ideDistributionTextbox.observe(ideDistributionResponse, names="value")

widgets.VBox([ideDistributionContainer,
              ideDistributionFig])

###  <a class="anchor" id="Hosted-Notebook-Products">4.6.2. Hosted Notebook Products</a>

Participants are asked to select the hosted notebook products they use on a regular basis. (Select all that apply)
* __[Kaggle Notebooks](https://www.kaggle.com/notebooks/)__
* __[Colab Notebooks](https://colab.research.google.com/notebooks/intro.ipynb#recent=true)__
* __[Azure Notebooks](https://notebooks.azure.com/)__
* __[Paperspace / Gradient](https://gradient.paperspace.com/)__
* __[Binder / JupyterHub](https://mybinder.org/)__
* __[Code Ocean](https://codeocean.com/)__
* __[IBM Watson Studio](https://www.ibm.com/cloud/watson-studio)__
* __[Amazon Sagemaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html)__
* __[Amazon EMR Notebooks](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html)__
* __[Google Cloud AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks/)__
* __[Google Cloud Datalab Notebooks](https://cloud.google.com/datalab/docs/how-to/working-with-notebooks/)__
* __[Databricks Collaborative Notebooks](https://databricks.com/product/collaborative-notebooks)__
* None
* Other

In [None]:
notebookColumns = ['Q10_Part_1', 'Q10_Part_2', 'Q10_Part_3', 'Q10_Part_4', 'Q10_Part_5', 'Q10_Part_6', 'Q10_Part_7',
                   'Q10_Part_8',
                   'Q10_Part_9', 'Q10_Part_10', 'Q10_Part_11', 'Q10_Part_12', 'Q10_Part_13', 'Q10_OTHER']

notebookDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=notebookColumns,
                                                             title='Interactive Notebooks Usage', xaxis='Count',
                                                             yaxis='Notebooks', width=990, height=600)

notebookDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

notebookDistributionContainer = widgets.HBox(children=[notebookDistributionTextbox])


def notebookDistributionResponse(change):
    if (notebookDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == notebookDistributionTextbox.value][notebookColumns].reset_index(drop=True)

    with notebookDistributionFig.batch_update():
        for i in range(len(notebookDistributionFig.data)):
            notebookDistributionFig.data[i].y = df[notebookDistributionFig.data[i].name].dropna()
        if (notebookDistributionTextbox.value == 'All'):
            notebookDistributionFig.layout.title = dict(text="Interactive Notebooks Usage")
        else:
            notebookDistributionFig.layout.title = dict(
                text=notebookDistributionTextbox.value + " Interactive Notebooks Usage")


notebookDistributionTextbox.observe(notebookDistributionResponse, names="value")

widgets.VBox([notebookDistributionContainer,
              notebookDistributionFig])

###  <a class="anchor" id="Primary-Data-Analysis-Tools">4.6.3. Primary Data Analysis Tools</a>

Participants are asked to select the primary tool that they use at work or school to analyze data. (Include text response)
* Basic statistical software (Microsoft Excel, Google Sheets, etc.) 
* Advanced statistical software (SPSS, SAS, etc.)
* Business intelligence software (Salesforce, Tableau, Spotfire, etc.) 
* Local development environments (RStudio, JupyterLab, etc.) 
* Cloud-based data software & APIs (AWS, GCP, Azure, etc.)
* Other

In [None]:
surveyDF['Q38'] = surveyDF['Q38'].str.strip()

surveyDF.loc[
    surveyDF.Q38 == 'Local development environments (RStudio, JupyterLab, etc.)', 'Q38'] = 'Local development environments<br>(RStudio, JupyterLab, etc.)'
surveyDF.loc[
    surveyDF.Q38 == 'Basic statistical software (Microsoft Excel, Google Sheets, etc.)', 'Q38'] = 'Basic statistical software<br>(Microsoft Excel, Google Sheets, etc.)'
surveyDF.loc[
    surveyDF.Q38 == 'Business intelligence software (Salesforce, Tableau, Spotfire, etc.)', 'Q38'] = 'Business intelligence software<br>(Salesforce, Tableau, Spotfire, etc.)'
surveyDF.loc[
    surveyDF.Q38 == 'Advanced statistical software (SPSS, SAS, etc.)', 'Q38'] = 'Advanced statistical software<br>(SPSS, SAS, etc.)'
surveyDF.loc[
    surveyDF.Q38 == 'Cloud-based data software & APIs (AWS, GCP, Azure, etc.)', 'Q38'] = 'Cloud-based data software & APIs<br>(AWS, GCP, Azure, etc.)'

primaryDataAnalysisLabels = surveyDF.Q38.value_counts().index

primaryDataAnalysisDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q38',
                                                                     title='Primary Data Analysis Tools', xaxis='Count',
                                                                     yaxis='Data Analysis Product',
                                                                     labels=primaryDataAnalysisLabels,
                                                                     width=990, height=600)

primaryDataAnalysisDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

primaryDataAnalysisDistributionContainer = widgets.HBox(children=[primaryDataAnalysisDistributionTextbox])


def primaryDataAnalysisDistributionResponse(change):
    if (primaryDataAnalysisDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == primaryDataAnalysisDistributionTextbox.value][['Q38']].reset_index(drop=True)

    with primaryDataAnalysisDistributionFig.batch_update():
        for i in range(len(primaryDataAnalysisDistributionFig.data)):
            primaryDataAnalysisDistributionFig.data[i].y = \
                df[df['Q38'] == primaryDataAnalysisDistributionFig.data[i].name]['Q38'].values
        if (primaryDataAnalysisDistributionTextbox.value == 'All'):
            primaryDataAnalysisDistributionFig.layout.title = dict(text="Primary Data Analysis Tools")
        else:
            primaryDataAnalysisDistributionFig.layout.title = dict(
                text=primaryDataAnalysisDistributionTextbox.value + " Primary Data Analysis Tools")


primaryDataAnalysisDistributionTextbox.observe(primaryDataAnalysisDistributionResponse, names="value")

widgets.VBox([primaryDataAnalysisDistributionContainer,
              primaryDataAnalysisDistributionFig])

###  <a class="anchor" id="Data-Visualization-Tools">4.6.4. Data Visualization Tools</a>

Participants are asked to select the data visualization libraries or tools they use on a regular basis. (Select all that apply)
* __[Matplotlib](https://matplotlib.org/)__
* __[Seaborn](https://seaborn.pydata.org/)__
* __[Plotly / Plotly Express](https://plotly.com/)__
* __[Ggplot / ggplot2](https://ggplot2.tidyverse.org/reference/ggplot.html)__
* __[Shiny](https://cran.r-project.org/web/packages/shiny/index.html)__
* __[D3js](https://d3js.org/)__
* __[Altair](https://altair-viz.github.io/)__
* __[Bokeh](https://docs.bokeh.org/en/latest/index.html)__
* __[Geoplotlib](https://github.com/andrea-cuttone/geoplotlib/)__
* __[Leaflet / Folium](https://leafletjs.com/)__
* None
* Other

In [None]:
visualizationToolColumns = ['Q14_Part_1', 'Q14_Part_2', 'Q14_Part_3', 'Q14_Part_4', 'Q14_Part_5', 'Q14_Part_6',
                            'Q14_Part_7',
                            'Q14_Part_8',
                            'Q14_Part_9', 'Q14_Part_10', 'Q14_Part_11', 'Q14_OTHER']

visualizationToolDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=visualizationToolColumns,
                                                                      title='Visualization Framework', xaxis='Count',
                                                                      yaxis='Visualization Framework', width=990,
                                                                      height=600)

visualizationToolDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

visualizationToolDistributionContainer = widgets.HBox(children=[visualizationToolDistributionTextbox])


def visualizationToolDistributionResponse(change):
    if (visualizationToolDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == visualizationToolDistributionTextbox.value][
            visualizationToolColumns].reset_index(drop=True)

    with visualizationToolDistributionFig.batch_update():
        for i in range(len(visualizationToolDistributionFig.data)):
            visualizationToolDistributionFig.data[i].y = df[visualizationToolDistributionFig.data[i].name].dropna()
        if (visualizationToolDistributionTextbox.value == 'All'):
            visualizationToolDistributionFig.layout.title = dict(text="Visualization Framework")
        else:
            visualizationToolDistributionFig.layout.title = dict(
                text=visualizationToolDistributionTextbox.value + " Visualization Framework")


visualizationToolDistributionTextbox.observe(visualizationToolDistributionResponse, names="value")

widgets.VBox([visualizationToolDistributionContainer,
              visualizationToolDistributionFig])
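
The callback above calls `dropna()` on each `Q14_Part_*` column, which suggests that every column holds the option's label when the respondent selected it and a missing value otherwise. Under that assumption, the counts behind the chart could be reproduced roughly as follows. The toy frame and the `count_selections` helper are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for surveyDF (assumed layout): one column per answer option,
# holding the option label when selected and a missing value otherwise.
toy = pd.DataFrame({
    'Q3': ['India', 'India', 'Canada'],
    'Q14_Part_1': ['Matplotlib', 'Matplotlib', None],
    'Q14_Part_2': ['Seaborn', None, 'Seaborn'],
})


def count_selections(df, columns, country='All'):
    """Count non-missing answers per option column, optionally for one country."""
    if country != 'All':
        df = df[df['Q3'] == country]
    # DataFrame.count() skips missing values, so it counts how often
    # each option was selected.
    return df[columns].count()


counts = count_selections(toy, ['Q14_Part_1', 'Q14_Part_2'], country='India')
print(counts.to_dict())
```

Passing `country='All'` skips the filter, mirroring the `'All'` branch of the dropdown callback.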

###  <a class="anchor" id="Business-Intelligence">4.6.5. Business Intelligence</a>
    
###  <a class="anchor" id="Primary-Business-Intelligence-Tools">4.6.5.1. Primary Business Intelligence Tools</a>
    
Participants are asked to select the business intelligence tool they use most often.
* __[Amazon QuickSight](https://aws.amazon.com/quicksight/)__
* __[Microsoft Power BI](https://powerbi.microsoft.com/en-us/)__
* __[Google Data Studio](https://datastudio.google.com/u/0/navigation/reporting)__
* __[Looker](https://cloud.google.com/looker)__
* __[Tableau](https://www.tableau.com/solutions/salesforce)__
* __[Salesforce](https://www.salesforce.com/ca/?ir=1)__
* __[Einstein Analytics](https://www.salesforce.com/products/crm-analytics/overview/)__
* __[Qlik](https://www.qlik.com/us/)__
* __[Domo](https://www.domo.com/)__
* __[TIBCO Spotfire](https://www.tibco.com/products/tibco-spotfire)__
* __[Alteryx](https://www.alteryx.com/)__
* __[Sisense](https://www.sisense.com/)__
* __[SAP Analytics Cloud](https://saphanajourney.com/sap-analytics-cloud/)__
* None
* Other

In [None]:
primaryBILabels = surveyDF.Q32.value_counts().index

primaryBIDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q32',
                                                           title='Primary Business Intelligence Tool', xaxis='Count',
                                                           yaxis='BI Product', labels=primaryBILabels, width=990,
                                                           height=1000)

primaryBIDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

primaryBIDistributionContainer = widgets.HBox(children=[primaryBIDistributionTextbox])


def primaryBIDistributionResponse(change):
    if (primaryBIDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == primaryBIDistributionTextbox.value][['Q32']].reset_index(drop=True)

    with primaryBIDistributionFig.batch_update():
        for i in range(len(primaryBIDistributionFig.data)):
            primaryBIDistributionFig.data[i].y = df[df['Q32'] == primaryBIDistributionFig.data[i].name]['Q32'].values
        if (primaryBIDistributionTextbox.value == 'All'):
            primaryBIDistributionFig.layout.title = dict(text="Primary Business Intelligence Tool")
        else:
            primaryBIDistributionFig.layout.title = dict(
                text=primaryBIDistributionTextbox.value + " Primary Business Intelligence Tool")


primaryBIDistributionTextbox.observe(primaryBIDistributionResponse, names="value")

widgets.VBox([primaryBIDistributionContainer,
              primaryBIDistributionFig])
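
For a single-answer question such as `Q32`, the per-country update amounts to counting each label within the filtered rows. The sketch below shows one way to do that with `value_counts`, keeping the overall label order and reporting zero for tools nobody in the selected country chose. The toy data and the `primary_counts` helper are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for surveyDF with a single-answer column.
toy = pd.DataFrame({
    'Q3': ['India', 'India', 'Canada', 'Canada'],
    'Q32': ['Tableau', 'Microsoft Power BI', 'Tableau', None],
})

# Same idea as primaryBILabels: labels ordered by overall frequency.
labels = toy['Q32'].value_counts().index


def primary_counts(df, country='All'):
    """Count answers per label for one country, keeping the overall label order."""
    if country != 'All':
        df = df[df['Q3'] == country]
    # value_counts() drops missing answers; reindex() restores labels
    # that do not occur in this country with a count of zero.
    return df['Q32'].value_counts().reindex(labels, fill_value=0)


canada = primary_counts(toy, 'Canada')
print(canada.to_dict())
```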

###  <a class="anchor" id="Business-Intelligence-Tools">4.6.5.2 Business Intelligence Tools</a>

Participants are asked to select the business intelligence tools they use on a regular basis. (Select all that apply)
* __[Amazon QuickSight](https://aws.amazon.com/quicksight/)__
* __[Microsoft Power BI](https://powerbi.microsoft.com/en-us/)__
* __[Google Data Studio](https://datastudio.google.com/u/0/navigation/reporting)__
* __[Looker](https://cloud.google.com/looker)__
* __[Tableau](https://www.tableau.com/solutions/salesforce)__
* __[Salesforce](https://www.salesforce.com/ca/?ir=1)__
* __[Einstein Analytics](https://www.salesforce.com/products/crm-analytics/overview/)__
* __[Qlik](https://www.qlik.com/us/)__
* __[Domo](https://www.domo.com/)__
* __[TIBCO Spotfire](https://www.tibco.com/products/tibco-spotfire)__
* __[Alteryx](https://www.alteryx.com/)__
* __[Sisense](https://www.sisense.com/)__
* __[SAP Analytics Cloud](https://saphanajourney.com/sap-analytics-cloud/)__
* None
* Other

In [None]:
biColumns = ['Q31_A_Part_1', 'Q31_A_Part_2', 'Q31_A_Part_3', 'Q31_A_Part_4', 'Q31_A_Part_5', 'Q31_A_Part_6',
           'Q31_A_Part_7', 'Q31_A_Part_8',
           'Q31_A_Part_9', 'Q31_A_Part_10', 'Q31_A_Part_11', 'Q31_A_Part_12', 'Q31_A_Part_13', 'Q31_A_Part_14',
           'Q31_A_OTHER']

biDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=biColumns,
                                                       title='Business Intelligence Products',
                                                       xaxis='Count',
                                                       yaxis='Business Intelligence Product', width=990, height=800)

biDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

biDistributionContainer = widgets.HBox(children=[biDistributionTextbox])


def biDistributionResponse(change):
    if (biDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == biDistributionTextbox.value][biColumns].reset_index(drop=True)

    with biDistributionFig.batch_update():
        for i in range(len(biDistributionFig.data)):
            biDistributionFig.data[i].y = df[biDistributionFig.data[i].name].dropna()
        if (biDistributionTextbox.value == 'All'):
            biDistributionFig.layout.title = dict(text="Business Intelligence Products")
        else:
            biDistributionFig.layout.title = dict(text=biDistributionTextbox.value + " Business Intelligence Products")


biDistributionTextbox.observe(biDistributionResponse, names="value")

widgets.VBox([biDistributionContainer,
              biDistributionFig])

###  <a class="anchor" id="Business-Intelligence-Tools-To-Learn">4.6.5.3. Business Intelligence Tools To Learn</a>

Non-professional participants are asked to select the business intelligence tools they hope to become more familiar with in the next 2 years. (Select all that apply)
* __[Amazon QuickSight](https://aws.amazon.com/quicksight/)__
* __[Microsoft Power BI](https://powerbi.microsoft.com/en-us/)__
* __[Google Data Studio](https://datastudio.google.com/u/0/navigation/reporting)__
* __[Looker](https://cloud.google.com/looker)__
* __[Tableau](https://www.tableau.com/solutions/salesforce)__
* __[Salesforce](https://www.salesforce.com/ca/?ir=1)__
* __[Einstein Analytics](https://www.salesforce.com/products/crm-analytics/overview/)__
* __[Qlik](https://www.qlik.com/us/)__
* __[Domo](https://www.domo.com/)__
* __[TIBCO Spotfire](https://www.tibco.com/products/tibco-spotfire)__
* __[Alteryx](https://www.alteryx.com/)__
* __[Sisense](https://www.sisense.com/)__
* __[SAP Analytics Cloud](https://saphanajourney.com/sap-analytics-cloud/)__
* None
* Other

Note - Non-professionals received an alternate phrasing of this question: they were asked which tools they hope to become familiar with in the next 2 years rather than which tools they use on a regular basis. Non-professionals were defined as students, unemployed respondents, and respondents who have never spent any money in the cloud.

In [None]:
biToLearnColumns = ['Q31_B_Part_1', 'Q31_B_Part_2', 'Q31_B_Part_3', 'Q31_B_Part_4', 'Q31_B_Part_5', 'Q31_B_Part_6',
                    'Q31_B_Part_7', 'Q31_B_Part_8',
                    'Q31_B_Part_9', 'Q31_B_Part_10', 'Q31_B_Part_11', 'Q31_B_Part_12', 'Q31_B_Part_13', 'Q31_B_Part_14',
                    'Q31_B_OTHER']

biToLearnDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=biToLearnColumns,
                                                              title='Business Intelligence Products To Learn',
                                                              xaxis='Count',
                                                              yaxis='Business Intelligence Product', width=990,
                                                              height=800)

biToLearnDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

biToLearnDistributionContainer = widgets.HBox(children=[biToLearnDistributionTextbox])


def biToLearnDistributionResponse(change):
    if (biToLearnDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == biToLearnDistributionTextbox.value][biToLearnColumns].reset_index(drop=True)

    with biToLearnDistributionFig.batch_update():
        for i in range(len(biToLearnDistributionFig.data)):
            biToLearnDistributionFig.data[i].y = df[biToLearnDistributionFig.data[i].name].dropna()
        if (biToLearnDistributionTextbox.value == 'All'):
            biToLearnDistributionFig.layout.title = dict(text="Business Intelligence Products To Learn")
        else:
            biToLearnDistributionFig.layout.title = dict(
                text=biToLearnDistributionTextbox.value + " Business Intelligence Products To Learn")


biToLearnDistributionTextbox.observe(biToLearnDistributionResponse, names="value")

widgets.VBox([biToLearnDistributionContainer,
              biToLearnDistributionFig])

###  <a class="anchor" id="Big-Data">4.6.6. Big Data</a>

###  <a class="anchor" id="Primary-Big-Data-Products">4.6.6.1 Primary Big Data Products</a>

Participants are asked to select the big data product (relational database, data warehouse, data lake, or similar) they use most often.
* __[MySQL](https://www.mysql.com/)__
* __[PostgreSQL](https://www.postgresql.org/)__
* __[SQLite](https://www.sqlite.org/index.html)__
* __[Oracle Database](https://www.oracle.com/database/)__
* __[MongoDB](https://www.mongodb.com/)__
* __[Snowflake](https://www.snowflake.com/)__
* __[IBM Db2](https://www.ibm.com/analytics/db2)__
* __[Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server)__
* __[Microsoft Access](https://www.microsoft.com/en-us/microsoft-365/access/)__
* __[Microsoft Azure Data Lake Storage](https://azure.microsoft.com/en-us/services/storage/data-lake-storage/#overview/)__
* __[Amazon Redshift](https://aws.amazon.com/redshift/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc)__
* __[Amazon Athena](https://aws.amazon.com/athena/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc)__
* __[Amazon DynamoDB](https://aws.amazon.com/dynamodb/)__
* __[Google Cloud BigQuery](https://cloud.google.com/bigquery/)__
* __[Google Cloud SQL](https://cloud.google.com/sql/)__
* __[Google Cloud Firestore](https://cloud.google.com/firestore/)__
* None
* Other

In [None]:
primaryBigDataLabels = surveyDF.Q30.value_counts().index

primaryBigDataDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q30', title='Primary BigData Product',
                                                                xaxis='Count',
                                                                yaxis='BigData Product', labels=primaryBigDataLabels,
                                                                width=990,
                                                                height=1000)

primaryBigDataDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

primaryBigDataDistributionContainer = widgets.HBox(children=[primaryBigDataDistributionTextbox])


def primaryBigDataDistributionResponse(change):
    if (primaryBigDataDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == primaryBigDataDistributionTextbox.value][['Q30']].reset_index(drop=True)

    with primaryBigDataDistributionFig.batch_update():
        for i in range(len(primaryBigDataDistributionFig.data)):
            primaryBigDataDistributionFig.data[i].y = df[df['Q30'] == primaryBigDataDistributionFig.data[i].name][
                'Q30'].values
        if (primaryBigDataDistributionTextbox.value == 'All'):
            primaryBigDataDistributionFig.layout.title = dict(text="Primary BigData Product")
        else:
            primaryBigDataDistributionFig.layout.title = dict(
                text=primaryBigDataDistributionTextbox.value + " Primary BigData Product")


primaryBigDataDistributionTextbox.observe(primaryBigDataDistributionResponse, names="value")

widgets.VBox([primaryBigDataDistributionContainer,
              primaryBigDataDistributionFig])

###  <a class="anchor" id="Big-Data-Products">4.6.6.2 Big Data Products</a>

Participants are asked to select the big data products (relational databases, data warehouses, data lakes, or similar) they use on a regular basis. (Select all that apply)
* __[MySQL](https://www.mysql.com/)__
* __[PostgreSQL](https://www.postgresql.org/)__
* __[SQLite](https://www.sqlite.org/index.html)__
* __[Oracle Database](https://www.oracle.com/database/)__
* __[MongoDB](https://www.mongodb.com/)__
* __[Snowflake](https://www.snowflake.com/)__
* __[IBM Db2](https://www.ibm.com/analytics/db2)__
* __[Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server)__
* __[Microsoft Access](https://www.microsoft.com/en-us/microsoft-365/access/)__
* __[Microsoft Azure Data Lake Storage](https://azure.microsoft.com/en-us/services/storage/data-lake-storage/#overview/)__
* __[Amazon Redshift](https://aws.amazon.com/redshift/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc)__
* __[Amazon Athena](https://aws.amazon.com/athena/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc)__
* __[Amazon DynamoDB](https://aws.amazon.com/dynamodb/)__
* __[Google Cloud BigQuery](https://cloud.google.com/bigquery/)__
* __[Google Cloud SQL](https://cloud.google.com/sql/)__
* __[Google Cloud Firestore](https://cloud.google.com/firestore/)__
* None
* Other

In [None]:
bigDataColumns = ['Q29_A_Part_1', 'Q29_A_Part_2', 'Q29_A_Part_3', 'Q29_A_Part_4', 'Q29_A_Part_5', 'Q29_A_Part_6',
                  'Q29_A_Part_7', 'Q29_A_Part_8',
                  'Q29_A_Part_9', 'Q29_A_Part_10', 'Q29_A_Part_11', 'Q29_A_Part_12', 'Q29_A_Part_13', 'Q29_A_Part_14',
                  'Q29_A_Part_15', 'Q29_A_Part_16', 'Q29_A_Part_17', 'Q29_A_OTHER']

bigDataDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=bigDataColumns,
                                                            title='Big Data Products',
                                                            xaxis='Count',
                                                            yaxis='BigData Product', width=990, height=800)

bigDataDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

bigDataDistributionContainer = widgets.HBox(children=[bigDataDistributionTextbox])


def bigDataDistributionResponse(change):
    if (bigDataDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == bigDataDistributionTextbox.value][bigDataColumns].reset_index(drop=True)

    with bigDataDistributionFig.batch_update():
        for i in range(len(bigDataDistributionFig.data)):
            bigDataDistributionFig.data[i].y = df[bigDataDistributionFig.data[i].name].dropna()
        if (bigDataDistributionTextbox.value == 'All'):
            bigDataDistributionFig.layout.title = dict(text="Big Data Products")
        else:
            bigDataDistributionFig.layout.title = dict(text=bigDataDistributionTextbox.value + " Big Data Products")


bigDataDistributionTextbox.observe(bigDataDistributionResponse, names="value")

widgets.VBox([bigDataDistributionContainer,
              bigDataDistributionFig])
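
The column lists in these cells follow a regular `<question>_Part_<n>` pattern with an optional `<question>_OTHER` column at the end, so they could be generated rather than typed out. The `part_columns` helper below is a hypothetical convenience, not something the notebook defines.

```python
def part_columns(question, n_parts, other=True):
    """Build the column names for a multi-select question.

    question: column prefix such as 'Q29_A'
    n_parts:  number of '<question>_Part_<n>' columns
    other:    whether to append the '<question>_OTHER' column
    """
    columns = [f'{question}_Part_{i}' for i in range(1, n_parts + 1)]
    if other:
        columns.append(f'{question}_OTHER')
    return columns


# Reproduces the list used for the big data products chart.
big_data_columns = part_columns('Q29_A', 17)
print(big_data_columns[:2], big_data_columns[-1])
```

Generating the names this way avoids miscounting parts when the list is long.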

###  <a class="anchor" id="Big-Data-Products-To-Learn">4.6.6.3 Big Data Products To Learn</a>

Non-professional participants are asked to select the big data products (relational databases, data warehouses, data lakes, or similar) they hope to become more familiar with in the next 2 years. (Select all that apply)
* __[MySQL](https://www.mysql.com/)__
* __[PostgreSQL](https://www.postgresql.org/)__
* __[SQLite](https://www.sqlite.org/index.html)__
* __[Oracle Database](https://www.oracle.com/database/)__
* __[MongoDB](https://www.mongodb.com/)__
* __[Snowflake](https://www.snowflake.com/)__
* __[IBM Db2](https://www.ibm.com/analytics/db2)__
* __[Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server)__
* __[Microsoft Access](https://www.microsoft.com/en-us/microsoft-365/access/)__
* __[Microsoft Azure Data Lake Storage](https://azure.microsoft.com/en-us/services/storage/data-lake-storage/#overview/)__
* __[Amazon Redshift](https://aws.amazon.com/redshift/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc)__
* __[Amazon Athena](https://aws.amazon.com/athena/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc)__
* __[Amazon DynamoDB](https://aws.amazon.com/dynamodb/)__
* __[Google Cloud BigQuery](https://cloud.google.com/bigquery/)__
* __[Google Cloud SQL](https://cloud.google.com/sql/)__
* __[Google Cloud Firestore](https://cloud.google.com/firestore/)__
* None
* Other

Note - Non-professionals received an alternate phrasing of this question: they were asked which tools they hope to become familiar with in the next 2 years rather than which tools they use on a regular basis. Non-professionals were defined as students, unemployed respondents, and respondents who have never spent any money in the cloud.

In [None]:
bigDataToLearnColumns = ['Q29_B_Part_1', 'Q29_B_Part_2', 'Q29_B_Part_3', 'Q29_B_Part_4', 'Q29_B_Part_5', 'Q29_B_Part_6',
                         'Q29_B_Part_7', 'Q29_B_Part_8',
                         'Q29_B_Part_9', 'Q29_B_Part_10', 'Q29_B_Part_11', 'Q29_B_Part_12', 'Q29_B_Part_13',
                         'Q29_B_Part_14',
                         'Q29_B_Part_15', 'Q29_B_Part_16', 'Q29_B_Part_17', 'Q29_B_OTHER']

bigDataToLearnDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=bigDataToLearnColumns,
                                                                   title='Big Data Products To Learn',
                                                                   xaxis='Count',
                                                                   yaxis='BigData Product', width=990, height=800)

bigDataToLearnDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

bigDataToLearnDistributionContainer = widgets.HBox(children=[bigDataToLearnDistributionTextbox])


def bigDataToLearnDistributionResponse(change):
    if (bigDataToLearnDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == bigDataToLearnDistributionTextbox.value][bigDataToLearnColumns].reset_index(
            drop=True)

    with bigDataToLearnDistributionFig.batch_update():
        for i in range(len(bigDataToLearnDistributionFig.data)):
            bigDataToLearnDistributionFig.data[i].y = df[bigDataToLearnDistributionFig.data[i].name].dropna()
        if (bigDataToLearnDistributionTextbox.value == 'All'):
            bigDataToLearnDistributionFig.layout.title = dict(text="Big Data Products To Learn")
        else:
            bigDataToLearnDistributionFig.layout.title = dict(
                text=bigDataToLearnDistributionTextbox.value + " Big Data Products To Learn")


bigDataToLearnDistributionTextbox.observe(bigDataToLearnDistributionResponse, names="value")

widgets.VBox([bigDataToLearnDistributionContainer,
              bigDataToLearnDistributionFig])

###  <a class="anchor" id="Machine-Learning">4.6.7. Machine Learning</a>

###  <a class="anchor" id="Machine-Learning-Experience">4.6.7.1. Machine Learning Experience</a>

Participants are asked to select the number of years they have used machine learning methods.
* I do not use machine learning methods
* Under 1 year
* 1-2 years
* 2-3 years
* 3-4 years
* 4-5 years
* 5-10 years
* 10-20 years
* 20 or more years

In [None]:
machineLearningExperience = ['I do not use machine learning methods', 'Under 1 year', '1-2 years', '2-3 years',
                             '3-4 years', '4-5 years', '5-10 years', '10-20 years', '20 or more years']

mlExperienceDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q15',
                                                              title='Machine Learning Experience', xaxis='Count',
                                                              yaxis='Experience', labels=machineLearningExperience,
                                                              width=990, height=600)

mlExperienceDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlExperienceDistributionContainer = widgets.HBox(children=[mlExperienceDistributionTextbox])


def mlExperienceDistributionResponse(change):
    if (mlExperienceDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlExperienceDistributionTextbox.value][['Q15']].reset_index(drop=True)

    with mlExperienceDistributionFig.batch_update():
        for i in range(len(mlExperienceDistributionFig.data)):
            mlExperienceDistributionFig.data[i].y = df[df['Q15'] == mlExperienceDistributionFig.data[i].name][
                'Q15'].values
        if (mlExperienceDistributionTextbox.value == 'All'):
            mlExperienceDistributionFig.layout.title = dict(text="Machine Learning Experience")
        else:
            mlExperienceDistributionFig.layout.title = dict(
                text=mlExperienceDistributionTextbox.value + " Machine Learning Experience")


mlExperienceDistributionTextbox.observe(mlExperienceDistributionResponse, names="value")

widgets.VBox([mlExperienceDistributionContainer,
              mlExperienceDistributionFig])

###  <a class="anchor" id="Primary-Machine-Learning-Framework">4.6.7.2. Primary Machine Learning Framework</a>

Participants are asked to select the machine learning frameworks they use on a regular basis. (Select all that apply)
* __[Scikit-learn](https://scikit-learn.org/stable/)__
* __[TensorFlow](https://www.tensorflow.org/)__
* __[Keras](https://keras.io/)__
* __[PyTorch](https://pytorch.org/)__
* __[Fast.ai](https://docs.fast.ai/)__
* __[MXNet](https://github.com/apache/incubator-mxnet/)__
* __[Xgboost](https://xgboost.readthedocs.io/en/latest/)__
* __[LightGBM](https://lightgbm.readthedocs.io/en/latest/)__
* __[CatBoost](https://catboost.ai/docs/)__
* __[Prophet](https://facebook.github.io/prophet/)__
* __[H2O3](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html/)__
* __[Caret](https://github.com/topepo/caret/)__
* __[Tidymodels](https://github.com/tidymodels/tidymodels/)__ 
* __[JAX](https://github.com/google/jax/)__
* None
* Other

In [None]:
mlFrameworkColumns = ['Q16_Part_1', 'Q16_Part_2', 'Q16_Part_3', 'Q16_Part_4', 'Q16_Part_5', 'Q16_Part_6', 'Q16_Part_7',
                      'Q16_Part_8',
                      'Q16_Part_9', 'Q16_Part_10', 'Q16_Part_11', 'Q16_Part_12', 'Q16_Part_13', 'Q16_Part_14',
                      'Q16_Part_15',
                      'Q16_OTHER']

mlFrameworkDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=mlFrameworkColumns,
                                                                title='Machine Learning Framework', xaxis='Count',
                                                                yaxis='ML Framework', width=990, height=900)

mlFrameworkDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlFrameworkDistributionContainer = widgets.HBox(children=[mlFrameworkDistributionTextbox])


def mlFrameworkDistributionResponse(change):
    if (mlFrameworkDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlFrameworkDistributionTextbox.value][mlFrameworkColumns].reset_index(drop=True)

    with mlFrameworkDistributionFig.batch_update():
        for i in range(len(mlFrameworkDistributionFig.data)):
            mlFrameworkDistributionFig.data[i].y = df[mlFrameworkDistributionFig.data[i].name].dropna()
        if (mlFrameworkDistributionTextbox.value == 'All'):
            mlFrameworkDistributionFig.layout.title = dict(text="Machine Learning Framework")
        else:
            mlFrameworkDistributionFig.layout.title = dict(
                text=mlFrameworkDistributionTextbox.value + " Machine Learning Framework")


mlFrameworkDistributionTextbox.observe(mlFrameworkDistributionResponse, names="value")

widgets.VBox([mlFrameworkDistributionContainer,
              mlFrameworkDistributionFig])

###  <a class="anchor" id="Machine-Learning-Algorithm">4.6.7.3. Machine Learning Algorithm</a>

Participants are asked to select the ML algorithms they use on a regular basis. (Select all that apply)
* Linear or Logistic Regression
* Decision Trees or Random Forests
* Gradient Boosting Machines (xgboost, lightgbm, etc)
* Bayesian Approaches
* Evolutionary Approaches
* Dense Neural Networks (MLPs, etc)
* Convolutional Neural Networks
* Generative Adversarial Networks
* Recurrent Neural Networks
* Transformer Networks (BERT, gpt-3, etc)
* None
* Other

In [None]:
mlAlogColumns = ['Q17_Part_1', 'Q17_Part_2', 'Q17_Part_3', 'Q17_Part_4', 'Q17_Part_5', 'Q17_Part_6', 'Q17_Part_7',
                 'Q17_Part_8',
                 'Q17_Part_9', 'Q17_Part_10', 'Q17_Part_11', 'Q17_OTHER']

surveyDF['Q17_Part_3'] = surveyDF['Q17_Part_3'].str.strip()
surveyDF['Q17_Part_6'] = surveyDF['Q17_Part_6'].str.strip()
surveyDF['Q17_Part_10'] = surveyDF['Q17_Part_10'].str.strip()

surveyDF.loc[
    surveyDF.Q17_Part_3 == 'Gradient Boosting Machines (xgboost, lightgbm, etc)',
    'Q17_Part_3'
] = 'Gradient Boosting Machines<br>(xgboost, lightgbm, etc)'

surveyDF.loc[
    surveyDF.Q17_Part_6 == 'Dense Neural Networks (MLPs, etc)',
    'Q17_Part_6'
] = 'Dense Neural Networks<br>(MLPs, etc)'

surveyDF.loc[
    surveyDF.Q17_Part_10 == 'Transformer Networks (BERT, gpt-3, etc)',
    'Q17_Part_10'
] = 'Transformer Networks<br>(BERT, gpt-3, etc)'

mlAlogDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=mlAlogColumns,
                                                           title='Machine Learning Algorithm',
                                                           xaxis='Count', yaxis='ML Algorithm', width=990, height=600)

mlAlogDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlAlogDistributionContainer = widgets.HBox(children=[mlAlogDistributionTextbox])


def mlAlogDistributionResponse(change):
    if (mlAlogDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlAlogDistributionTextbox.value][mlAlogColumns].reset_index(drop=True)

    with mlAlogDistributionFig.batch_update():
        for i in range(len(mlAlogDistributionFig.data)):
            mlAlogDistributionFig.data[i].y = df[mlAlogDistributionFig.data[i].name].dropna()
        if (mlAlogDistributionTextbox.value == 'All'):
            mlAlogDistributionFig.layout.title = dict(text="Machine Learning Algorithm")
        else:
            mlAlogDistributionFig.layout.title = dict(
                text=mlAlogDistributionTextbox.value + " Machine Learning Algorithm")


mlAlogDistributionTextbox.observe(mlAlogDistributionResponse, names="value")

widgets.VBox([mlAlogDistributionContainer,
              mlAlogDistributionFig])
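
The cell above shortens long axis labels by inserting an HTML `<br>` before the parenthesised part of each answer, one `loc` assignment per column. A small helper could express the same rule once. The `wrap_label` function below is a hypothetical sketch that only breaks before the first `" ("`, so labels that need several breaks would still have to be handled by hand.

```python
def wrap_label(label, marker=' (', line_break='<br>('):
    """Insert an HTML line break before the first parenthesised part of a label.

    Non-string values (for example missing answers) are returned unchanged.
    """
    if not isinstance(label, str):
        return label
    # strip() mirrors the .str.strip() calls applied before the replacement.
    return label.strip().replace(marker, line_break, 1)


print(wrap_label('Gradient Boosting Machines (xgboost, lightgbm, etc)'))
print(wrap_label(' Dense Neural Networks (MLPs, etc) '))
```

With pandas, such a helper could be applied to a column through `Series.map`, for example `surveyDF['Q17_Part_3'].map(wrap_label)`.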

###  <a class="anchor" id="Computer-Vision-Methods">4.6.7.4. Computer Vision Methods</a>

Participants are asked to select the computer vision methods they use on a regular basis. (Select all that apply)
* General purpose image/video tools (PIL, cv2, skimage, etc)
* Image segmentation methods (U-Net, Mask R-CNN, etc)
* Object detection methods (YOLOv3, RetinaNet, etc)
* Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc)
* Generative Networks (GAN, VAE, etc)
* None
* Other

Note - This question was only asked of respondents who selected the relevant answer choices in the Machine Learning Algorithm question.

In [None]:
cvColumns = ['Q18_Part_1', 'Q18_Part_2', 'Q18_Part_3', 'Q18_Part_4', 'Q18_Part_5', 'Q18_Part_6']

surveyDF['Q18_Part_1'] = surveyDF['Q18_Part_1'].str.strip()
surveyDF['Q18_Part_2'] = surveyDF['Q18_Part_2'].str.strip()
surveyDF['Q18_Part_3'] = surveyDF['Q18_Part_3'].str.strip()
surveyDF['Q18_Part_4'] = surveyDF['Q18_Part_4'].str.strip()
surveyDF['Q18_Part_5'] = surveyDF['Q18_Part_5'].str.strip()

surveyDF.loc[
    surveyDF.Q18_Part_1 == 'General purpose image/video tools (PIL, cv2, skimage, etc)',
    'Q18_Part_1'
] = 'General purpose image/video tools<br>(PIL, cv2, skimage, etc)'

surveyDF.loc[
    surveyDF.Q18_Part_2 == 'Image segmentation methods (U-Net, Mask R-CNN, etc)',
    'Q18_Part_2'
] = 'Image segmentation methods<br>(U-Net, Mask R-CNN, etc)'

surveyDF.loc[
    surveyDF.Q18_Part_3 == 'Object detection methods (YOLOv3, RetinaNet, etc)',
    'Q18_Part_3'
] = 'Object detection methods<br>(YOLOv3, RetinaNet, etc)'

surveyDF.loc[
    surveyDF.Q18_Part_4 == 'Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc)',
    'Q18_Part_4'
] = 'Image classification and other<br>general purpose networks (VGG, Inception,<br>ResNet, ResNeXt, NASNet, EfficientNet, etc)'

surveyDF.loc[
    surveyDF.Q18_Part_5 == 'Generative Networks (GAN, VAE, etc)',
    'Q18_Part_5'
] = 'Generative Networks<br>(GAN, VAE, etc)'

cvDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=cvColumns, title='Computer Vision Methods',
                                                       xaxis='Count', yaxis='Computer Vision Methods', width=990,
                                                       height=600)

cvDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

cvDistributionContainer = widgets.HBox(children=[cvDistributionTextbox])


def cvDistributionResponse(change):
    if (cvDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == cvDistributionTextbox.value][cvColumns].reset_index(drop=True)

    with cvDistributionFig.batch_update():
        for i in range(len(cvDistributionFig.data)):
            cvDistributionFig.data[i].y = df[cvDistributionFig.data[i].name].dropna()
        if (cvDistributionTextbox.value == 'All'):
            cvDistributionFig.layout.title = dict(text="Computer Vision Methods")
        else:
            cvDistributionFig.layout.title = dict(text=cvDistributionTextbox.value + " Computer Vision Methods")


cvDistributionTextbox.observe(cvDistributionResponse, names="value")

widgets.VBox([cvDistributionContainer,
              cvDistributionFig])
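
The multi-select sections all repeat the same callback: filter by country, refresh each trace from the column it is named after, and update the title. One way to factor that out is a small factory, sketched below. `make_country_response` is a hypothetical helper, and the stand-in objects exist only so the example runs without Plotly or ipywidgets. It assumes, as the callbacks above do, that each trace's `name` is the column it displays.

```python
from contextlib import nullcontext
from types import SimpleNamespace

import pandas as pd


def make_country_response(fig, dropdown, survey_df, base_title, country_col='Q3'):
    """Build an observer that refreshes `fig` for the country picked in `dropdown`."""

    def response(change):
        country = dropdown.value
        if country == 'All':
            df = survey_df
        else:
            df = survey_df[survey_df[country_col] == country]
        with fig.batch_update():
            for trace in fig.data:
                # Each trace is assumed to be named after its source column.
                trace.y = df[trace.name].dropna()
            title = base_title if country == 'All' else f'{country} {base_title}'
            fig.layout.title = dict(text=title)

    return response


# Stand-ins for the figure widget and the dropdown (illustration only).
toy_df = pd.DataFrame({'Q3': ['India', 'Canada'],
                       'Q18_Part_1': ['General purpose tools', None]})
fake_fig = SimpleNamespace(
    data=[SimpleNamespace(name='Q18_Part_1', y=None)],
    layout=SimpleNamespace(title=None),
    batch_update=nullcontext,
)
fake_dropdown = SimpleNamespace(value='India')

respond = make_country_response(fake_fig, fake_dropdown, toy_df, 'Computer Vision Methods')
respond(None)
print(fake_fig.layout.title['text'])
```

In the notebook, the returned function would be passed to the dropdown's `observe(..., names="value")` call in place of the hand-written callback. The single-answer charts name their traces after answer labels instead, so they would need a separate variant.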

###  <a class="anchor" id="NLP-Methods">4.6.7.5. Natural Language Processing Methods</a>

Participants are asked to select the natural language processing (NLP) methods they use on a regular basis. (Select all that apply)
* Word embeddings/vectors (GLoVe, fastText, word2vec)
* Encoder-decoder models (seq2seq, vanilla transformers)
* Contextualized embeddings (ELMo, CoVe)
* Transformer language models (GPT-3, BERT, XLnet, etc)
* None
* Other

Note - This question was only asked of respondents who selected the relevant answer choices in the Machine Learning Algorithm question.

In [None]:
nlpColumns = ['Q19_Part_1', 'Q19_Part_2', 'Q19_Part_3', 'Q19_Part_4', 'Q19_Part_5']

surveyDF['Q19_Part_1'] = surveyDF['Q19_Part_1'].str.strip()
surveyDF['Q19_Part_2'] = surveyDF['Q19_Part_2'].str.strip()
surveyDF['Q19_Part_3'] = surveyDF['Q19_Part_3'].str.strip()
surveyDF['Q19_Part_4'] = surveyDF['Q19_Part_4'].str.strip()

surveyDF.loc[
    surveyDF.Q19_Part_1 == 'Word embeddings/vectors (GLoVe, fastText, word2vec)',
    'Q19_Part_1'
] = 'Word embeddings/vectors<br>(GLoVe, fastText, word2vec)'

surveyDF.loc[
    surveyDF.Q19_Part_2 == 'Encoder-decorder models (seq2seq, vanilla transformers)',
    'Q19_Part_2'
] = 'Encoder-decorder models<br>(seq2seq, vanilla transformers)'

surveyDF.loc[
    surveyDF.Q19_Part_3 == 'Contextualized embeddings (ELMo, CoVe)',
    'Q19_Part_3'
] = 'Contextualized embeddings<br>(ELMo, CoVe)'

surveyDF.loc[
    surveyDF.Q19_Part_4 == 'Transformer language models (GPT-3, BERT, XLnet, etc)',
    'Q19_Part_4'
] = 'Transformer language models<br>(GPT-3, BERT, XLnet, etc)'

nlpDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=nlpColumns,
                                                        title='Natural Language Processing Methods',
                                                        xaxis='Count', yaxis='Natural Language Processing Methods',
                                                        width=990, height=600)

nlpDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

nlpDistributionContainer = widgets.HBox(children=[nlpDistributionTextbox])


def nlpDistributionResponse(change):
    if (nlpDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == nlpDistributionTextbox.value][nlpColumns].reset_index(drop=True)

    with nlpDistributionFig.batch_update():
        for i in range(len(nlpDistributionFig.data)):
            nlpDistributionFig.data[i].y = df[nlpDistributionFig.data[i].name].dropna()
        if (nlpDistributionTextbox.value == 'All'):
            nlpDistributionFig.layout.title = dict(text="Natural Language Processing Methods")
        else:
            nlpDistributionFig.layout.title = dict(
                text=nlpDistributionTextbox.value + " Natural Language Processing Methods")


nlpDistributionTextbox.observe(nlpDistributionResponse, names="value")

widgets.VBox([nlpDistributionContainer,
              nlpDistributionFig])
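
The relabelling assignments in the cell above all follow the same rule: a `<br>` is placed before the parenthetical so that Plotly renders the label on two lines. As a sketch only, that rule could be written as a small helper. `wrap_label` is a hypothetical name and is not defined elsewhere in this notebook.

```python
def wrap_label(label):
    """Insert a Plotly line break before the first parenthetical, if any."""
    # Only the first " (" is replaced; labels without one are returned unchanged.
    return label.replace(" (", "<br>(", 1)


wrapped = wrap_label("Contextualized embeddings (ELMo, CoVe)")
unchanged = wrap_label("None")
```

With pandas, the same replacement can probably be applied to a whole column through `Series.str.replace(" (", "<br>(", n=1, regex=False)`, which would avoid spelling out every option text by hand.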

###  <a class="anchor" id="ML-PROD">4.6.7.6. Machine Learning in Production</a>

Participants are asked whether their current employer incorporates machine learning methods into its business.
* We are exploring ML methods (and may one day put a model into production)
* We use ML methods for generating insights (but do not put working models into production) 
* We recently started using ML methods (i.e., models in production for less than 2 years)
* We have well established ML methods (i.e., models in production for more than 2 years) 
* No (we do not use ML methods)
* I do not know

In [None]:
surveyDF['Q22'] = surveyDF['Q22'].str.strip()

surveyDF.loc[
    surveyDF.Q22 == 'We are exploring ML methods (and may one day put a model into production)',
    'Q22'
] = 'We are exploring ML methods<br>(and may one day put a<br>model into production)'

surveyDF.loc[
    surveyDF.Q22 == 'We have well established ML methods (i.e., models in production for more than 2 years)',
    'Q22'
] = 'We have well established ML methods<br>(i.e., models in production<br>for more than 2 years)'

surveyDF.loc[
    surveyDF.Q22 == 'We recently started using ML methods (i.e., models in production for less than 2 years)',
    'Q22'
] = 'We recently started using ML methods<br>(i.e., models in production<br>for less than 2 years)'

surveyDF.loc[
    surveyDF.Q22 == 'We use ML methods for generating insights (but do not put working models into production)',
    'Q22'
] = 'We use ML methods for generating<br>insights (but do not put working<br>models into production)'

mlInProduction = surveyDF.Q22.value_counts().index

mlInProdDistributionFig = horizontalHistogramSingleColumn(surveyDF, column='Q22',
                                                          title='Machine Learning Deployed in Production',
                                                          xaxis='Count',
                                                          yaxis='ML in Production', labels=mlInProduction, width=990,
                                                          height=600)

mlInProdDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlInProdDistributionContainer = widgets.HBox(children=[mlInProdDistributionTextbox])


def mlInProdDistributionResponse(change):
    if (mlInProdDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlInProdDistributionTextbox.value][['Q22']].reset_index(drop=True)

    with mlInProdDistributionFig.batch_update():
        for i in range(len(mlInProdDistributionFig.data)):
            mlInProdDistributionFig.data[i].y = df[df['Q22'] == mlInProdDistributionFig.data[i].name]['Q22'].values
        if (mlInProdDistributionTextbox.value == 'All'):
            mlInProdDistributionFig.layout.title = dict(text="Machine Learning Deployed in Production")
        else:
            mlInProdDistributionFig.layout.title = dict(
                text=mlInProdDistributionTextbox.value + " Machine Learning Deployed in Production")


mlInProdDistributionTextbox.observe(mlInProdDistributionResponse, names="value")

widgets.VBox([mlInProdDistributionContainer,
              mlInProdDistributionFig])

###  <a class="anchor" id="ML-Repository">4.6.7.7. Machine Learning Repository</a>

Participants are asked which tools they use to manage machine learning experiments. (Select all that apply)
* __[Neptune.ai](https://neptune.ai/)__
* __[Weights & Biases](https://wandb.ai/site)__
* __[Comet.ml](https://www.comet.ml/site/)__
* __[Sacred + Omniboard](https://github.com/IDSIA/sacred/)__
* __[TensorBoard](https://www.tensorflow.org/tensorboard/)__
* __[Guild.ai](https://guild.ai/)__
* __[Polyaxon](https://polyaxon.com/)__
* __[Trains](https://github.com/allegroai/clearml)__
* __[Domino Model Monitor](https://www.dominodatalab.com/product/domino-model-monitor/)__
* No / None
* Other

In [None]:
mlRepositoryColumns = ['Q35_A_Part_1', 'Q35_A_Part_2', 'Q35_A_Part_3', 'Q35_A_Part_4', 'Q35_A_Part_5', 'Q35_A_Part_6',
                       'Q35_A_Part_7', 'Q35_A_Part_8',
                       'Q35_A_Part_9', 'Q35_A_Part_10', 'Q35_A_OTHER']

mlRepositoryDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=mlRepositoryColumns,
                                                                 title='Machine Learning Repositories',
                                                                 xaxis='Count',
                                                                 yaxis='ML Repository', width=990, height=800)

mlRepositoryDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlRepositoryDistributionContainer = widgets.HBox(children=[mlRepositoryDistributionTextbox])


def mlRepositoryDistributionResponse(change):
    if (mlRepositoryDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlRepositoryDistributionTextbox.value][mlRepositoryColumns].reset_index(
            drop=True)

    with mlRepositoryDistributionFig.batch_update():
        for i in range(len(mlRepositoryDistributionFig.data)):
            mlRepositoryDistributionFig.data[i].y = df[mlRepositoryDistributionFig.data[i].name].dropna()
        if (mlRepositoryDistributionTextbox.value == 'All'):
            mlRepositoryDistributionFig.layout.title = dict(text="Machine Learning Repositories")
        else:
            mlRepositoryDistributionFig.layout.title = dict(
                text=mlRepositoryDistributionTextbox.value + " Machine Learning Repositories")


mlRepositoryDistributionTextbox.observe(mlRepositoryDistributionResponse, names="value")

widgets.VBox([mlRepositoryDistributionContainer,
              mlRepositoryDistributionFig])
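
Every dropdown callback in this notebook applies the same two rules: keep all rows when `All` is selected and otherwise keep only the chosen country, then prefix the chart title with the country name. The sketch below isolates those rules in plain Python so they can be checked without widgets. `chart_title` and `rows_for_country` are hypothetical helpers and the rows are toy data.

```python
def chart_title(country, base_title):
    """Return the base title, prefixed with the country unless 'All' is selected."""
    return base_title if country == "All" else f"{country} {base_title}"


def rows_for_country(rows, country, country_key="Q3"):
    """Return every row for 'All', otherwise only the rows of the given country."""
    if country == "All":
        return list(rows)
    return [row for row in rows if row.get(country_key) == country]


# Toy rows (hypothetical, not survey results).
rows = [
    {"Q3": "India", "Q35_A_Part_5": "TensorBoard"},
    {"Q3": "Brazil", "Q35_A_Part_5": None},
]

india_rows = rows_for_country(rows, "India")
india_title = chart_title("India", "Machine Learning Repositories")
```

Factoring the rules out this way is only one option. It would let each callback shrink to a filter call, a trace update and a title update.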

###  <a class="anchor" id="ML-Repository-To-Learn">4.6.7.8. Machine Learning Repository To Learn</a>

Non-professional participants are asked which tools for managing machine learning experiments they hope to become more familiar with in the next 2 years. (Select all that apply)
* __[Neptune.ai](https://neptune.ai/)__
* __[Weights & Biases](https://wandb.ai/site)__
* __[Comet.ml](https://www.comet.ml/site/)__
* __[Sacred + Omniboard](https://github.com/IDSIA/sacred/)__
* __[TensorBoard](https://www.tensorflow.org/tensorboard/)__
* __[Guild.ai](https://guild.ai/)__
* __[Polyaxon](https://polyaxon.com/)__
* __[Trains](https://github.com/allegroai/clearml)__
* __[Domino Model Monitor](https://www.dominodatalab.com/product/domino-model-monitor/)__
* No / None
* Other

In [None]:
mlRepositoryToLearnColumns = ['Q35_B_Part_1', 'Q35_B_Part_2', 'Q35_B_Part_3', 'Q35_B_Part_4', 'Q35_B_Part_5',
                              'Q35_B_Part_6',
                              'Q35_B_Part_7', 'Q35_B_Part_8',
                              'Q35_B_Part_9', 'Q35_B_Part_10', 'Q35_B_OTHER']

mlRepositoryToLearnDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=mlRepositoryToLearnColumns,
                                                                        title='Machine Learning Repositories To Learn',
                                                                        xaxis='Count',
                                                                        yaxis='ML Repository Product', width=990,
                                                                        height=800)

mlRepositoryToLearnDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlRepositoryToLearnDistributionContainer = widgets.HBox(children=[mlRepositoryToLearnDistributionTextbox])


def mlRepositoryToLearnDistributionResponse(change):
    if (mlRepositoryToLearnDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlRepositoryToLearnDistributionTextbox.value][
            mlRepositoryToLearnColumns].reset_index(drop=True)

    with mlRepositoryToLearnDistributionFig.batch_update():
        for i in range(len(mlRepositoryToLearnDistributionFig.data)):
            mlRepositoryToLearnDistributionFig.data[i].y = df[mlRepositoryToLearnDistributionFig.data[i].name].dropna()
        if (mlRepositoryToLearnDistributionTextbox.value == 'All'):
            mlRepositoryToLearnDistributionFig.layout.title = dict(text="Machine Learning Repositories To Learn")
        else:
            mlRepositoryToLearnDistributionFig.layout.title = dict(
                text=mlRepositoryToLearnDistributionTextbox.value + " Machine Learning Repositories To Learn")


mlRepositoryToLearnDistributionTextbox.observe(mlRepositoryToLearnDistributionResponse, names="value")

widgets.VBox([mlRepositoryToLearnDistributionContainer,
              mlRepositoryToLearnDistributionFig])

###  <a class="anchor" id="ML-Public-Deployment-Tools">4.6.7.9. Machine Learning Public Deployment Tools</a>

Participants are asked where they publicly share or deploy their data analysis or machine learning applications. (Select all that apply)
* __[Plotly Dash](https://plotly.com/dash/)__
* __[Streamlit](https://www.streamlit.io/)__
* __[NBViewer](https://nbviewer.jupyter.org/)__
* __[GitHub](https://github.com/)__
* __[Personal blog](https://medium.com/)__
* __[Kaggle](https://www.kaggle.com/)__
* __[Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true)__
* __[Shiny](https://shiny.rstudio.com/)__
* None / I do not share my work publicly 
* Other

In [None]:
mlPublicDeploymentColumns = ['Q36_Part_1', 'Q36_Part_2', 'Q36_Part_3', 'Q36_Part_4', 'Q36_Part_5', 'Q36_Part_6',
                             'Q36_Part_7',
                             'Q36_Part_8',
                             'Q36_Part_9', 'Q36_OTHER']

mlPublicDeploymentDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=mlPublicDeploymentColumns,
                                                                       title='Machine Learning Public Deployment',
                                                                       xaxis='Count',
                                                                       yaxis='ML Public Deployment Product', width=990,
                                                                       height=800)

mlPublicDeploymentDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

mlPublicDeploymentDistributionContainer = widgets.HBox(children=[mlPublicDeploymentDistributionTextbox])


def mlPublicDeploymentDistributionResponse(change):
    if (mlPublicDeploymentDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == mlPublicDeploymentDistributionTextbox.value][
            mlPublicDeploymentColumns].reset_index(drop=True)

    with mlPublicDeploymentDistributionFig.batch_update():
        for i in range(len(mlPublicDeploymentDistributionFig.data)):
            mlPublicDeploymentDistributionFig.data[i].y = df[mlPublicDeploymentDistributionFig.data[i].name].dropna()
        if (mlPublicDeploymentDistributionTextbox.value == 'All'):
            mlPublicDeploymentDistributionFig.layout.title = dict(text="Machine Learning Public Deployment")
        else:
            mlPublicDeploymentDistributionFig.layout.title = dict(
                text=mlPublicDeploymentDistributionTextbox.value + " Machine Learning Public Deployment")


mlPublicDeploymentDistributionTextbox.observe(mlPublicDeploymentDistributionResponse, names="value")

widgets.VBox([mlPublicDeploymentDistributionContainer,
              mlPublicDeploymentDistributionFig])

###  <a class="anchor" id="Auto-ML">4.6.8. Auto Machine Learning</a>
###  <a class="anchor" id="Auto-ML-Methods">4.6.8.1. Auto ML Methods</a>

Participants are asked which categories of automated machine learning tools (or partial AutoML tools) they use on a regular basis. (Select all that apply)
* Automated data augmentation (e.g. imgaug, albumentations)
* Automated feature engineering/selection (e.g. tpot, boruta_py)
* Automated model selection (e.g. auto-sklearn, xcessiv)
* Automated model architecture searches (e.g. darts, enas)
* Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)
* Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)
* No / None
* Other

In [None]:
surveyDF.loc[(
                     surveyDF.Q33_A_Part_1 == 'Automated data augmentation (e.g. imgaug, albumentations)'), 'Q33_A_Part_1'] = 'Automated data augmentation<br>(e.g. imgaug, albumentations)'
surveyDF.loc[(
                     surveyDF.Q33_A_Part_2 == 'Automated feature engineering/selection (e.g. tpot, boruta_py)'), 'Q33_A_Part_2'] = 'Automated feature engineering/selection<br>(e.g. tpot, boruta_py)'
surveyDF.loc[(
                     surveyDF.Q33_A_Part_3 == 'Automated model selection (e.g. auto-sklearn, xcessiv)'), 'Q33_A_Part_3'] = 'Automated model selection<br>(e.g. auto-sklearn, xcessiv)'
surveyDF.loc[(
                     surveyDF.Q33_A_Part_4 == 'Automated model architecture searches (e.g. darts, enas)'), 'Q33_A_Part_4'] = 'Automated model architecture searches<br>(e.g. darts, enas)'
surveyDF.loc[(
                     surveyDF.Q33_A_Part_5 == 'Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)'), 'Q33_A_Part_5'] = 'Automated hyperparameter tuning<br>(e.g. hyperopt, ray.tune, Vizier)'
surveyDF.loc[(
                     surveyDF.Q33_A_Part_6 == 'Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)'), 'Q33_A_Part_6'] = 'Automation of full ML pipelines<br>(e.g. Google AutoML, H20 Driverless AI)'

autoMLColumns = ['Q33_A_Part_1', 'Q33_A_Part_2', 'Q33_A_Part_3', 'Q33_A_Part_4', 'Q33_A_Part_5', 'Q33_A_Part_6',
                 'Q33_A_Part_7', 'Q33_A_OTHER']

autoMLDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=autoMLColumns,
                                                           title='Automated Machine Learning Methods',
                                                           xaxis='Count',
                                                           yaxis='Automated Machine Learning Methods', width=990,
                                                           height=800)

autoMLDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

autoMLDistributionContainer = widgets.HBox(children=[autoMLDistributionTextbox])


def autoMLDistributionResponse(change):
    if (autoMLDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == autoMLDistributionTextbox.value][autoMLColumns].reset_index(drop=True)

    with autoMLDistributionFig.batch_update():
        for i in range(len(autoMLDistributionFig.data)):
            autoMLDistributionFig.data[i].y = df[autoMLDistributionFig.data[i].name].dropna()
        if (autoMLDistributionTextbox.value == 'All'):
            autoMLDistributionFig.layout.title = dict(text="Automated Machine Learning Methods")
        else:
            autoMLDistributionFig.layout.title = dict(
                text=autoMLDistributionTextbox.value + " Automated Machine Learning Methods")


autoMLDistributionTextbox.observe(autoMLDistributionResponse, names="value")

widgets.VBox([autoMLDistributionContainer,
              autoMLDistributionFig])

###  <a class="anchor" id="Auto-ML-Methods-To-Learn">4.6.8.2. Auto ML Methods To Learn</a>

Non-professional participants are asked which categories of automated machine learning tools (or partial AutoML tools) they hope to become more familiar with in the next 2 years. (Select all that apply)
* Automated data augmentation (e.g. imgaug, albumentations)
* Automated feature engineering/selection (e.g. tpot, boruta_py)
* Automated model selection (e.g. auto-sklearn, xcessiv)
* Automated model architecture searches (e.g. darts, enas)
* Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)
* Automation of full ML pipelines (e.g. Google Cloud AutoML, H20 Driverless AI)
* None
* Other

In [None]:
autoMLToLearnColumns = ['Q33_B_Part_1', 'Q33_B_Part_2', 'Q33_B_Part_3', 'Q33_B_Part_4', 'Q33_B_Part_5', 'Q33_B_Part_6',
                        'Q33_B_Part_7', 'Q33_B_OTHER']

surveyDF.loc[(
                     surveyDF.Q33_B_Part_1 == 'Automated data augmentation (e.g. imgaug, albumentations)'), 'Q33_B_Part_1'] = 'Automated data augmentation<br>(e.g. imgaug, albumentations)'
surveyDF.loc[(
                     surveyDF.Q33_B_Part_2 == 'Automated feature engineering/selection (e.g. tpot, boruta_py)'), 'Q33_B_Part_2'] = 'Automated feature engineering/selection<br>(e.g. tpot, boruta_py)'
surveyDF.loc[(
                     surveyDF.Q33_B_Part_3 == 'Automated model selection (e.g. auto-sklearn, xcessiv)'), 'Q33_B_Part_3'] = 'Automated model selection<br>(e.g. auto-sklearn, xcessiv)'
surveyDF.loc[(
                     surveyDF.Q33_B_Part_4 == 'Automated model architecture searches (e.g. darts, enas)'), 'Q33_B_Part_4'] = 'Automated model architecture searches<br>(e.g. darts, enas)'
surveyDF.loc[(
                     surveyDF.Q33_B_Part_5 == 'Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)'), 'Q33_B_Part_5'] = 'Automated hyperparameter tuning<br>(e.g. hyperopt, ray.tune, Vizier)'
surveyDF.loc[(
                     surveyDF.Q33_B_Part_6 == 'Automation of full ML pipelines (e.g. Google Cloud AutoML, H20 Driverless AI)'), 'Q33_B_Part_6'] = 'Automation of full ML pipelines<br>(e.g. Google Cloud AutoML, H20 Driverless AI)'

autoMLToLearnDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=autoMLToLearnColumns,
                                                                  title='Automated Machine Learning Methods To Learn',
                                                                  xaxis='Count',
                                                                  yaxis='Automated Machine Learning Methods', width=990,
                                                                  height=800)

autoMLToLearnDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

autoMLToLearnDistributionContainer = widgets.HBox(children=[autoMLToLearnDistributionTextbox])


def autoMLToLearnDistributionResponse(change):
    if (autoMLToLearnDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == autoMLToLearnDistributionTextbox.value][autoMLToLearnColumns].reset_index(
            drop=True)

    with autoMLToLearnDistributionFig.batch_update():
        for i in range(len(autoMLToLearnDistributionFig.data)):
            autoMLToLearnDistributionFig.data[i].y = df[autoMLToLearnDistributionFig.data[i].name].dropna()
        if (autoMLToLearnDistributionTextbox.value == 'All'):
            autoMLToLearnDistributionFig.layout.title = dict(text="Automated Machine Learning Methods To Learn")
        else:
            autoMLToLearnDistributionFig.layout.title = dict(
                text=autoMLToLearnDistributionTextbox.value + " Automated Machine Learning Methods To Learn")


autoMLToLearnDistributionTextbox.observe(autoMLToLearnDistributionResponse, names="value")

widgets.VBox([autoMLToLearnDistributionContainer,
              autoMLToLearnDistributionFig])

###  <a class="anchor" id="Auto-ML-Tools">4.6.8.3. Auto ML Tools</a>

Participants are asked which automated machine learning tools (or partial AutoML tools) they use on a regular basis. (Select all that apply)
* __[Google Cloud AutoML](https://cloud.google.com/automl/)__
* __[H20 Driverless AI](https://www.h2o.ai/products/h2o-driverless-ai/)__
* __[Databricks AutoML](https://databricks.com/product/automl-on-databricks)__
* __[DataRobot AutoML](https://www.datarobot.com/lp/automated-machine-learning-works-business/)__
* __[Tpot](https://github.com/EpistasisLab/tpot/)__
* __[Auto-Keras](https://github.com/keras-team/autokeras/)__
* __[Auto-Sklearn](https://github.com/automl/auto-sklearn/)__
* __[Auto_ml](https://github.com/ClimbsRocks/auto_ml/)__
* __[Xcessiv](https://github.com/reiinakano/xcessiv/)__
* __[MLbox](https://github.com/AxeldeRomblay/MLBox/)__
* No / None
* Other

In [None]:
autoMLToolColumns = ['Q34_A_Part_1', 'Q34_A_Part_2', 'Q34_A_Part_3', 'Q34_A_Part_4', 'Q34_A_Part_5', 'Q34_A_Part_6',
                     'Q34_A_Part_7', 'Q34_A_Part_8',
                     'Q34_A_Part_9', 'Q34_A_Part_10', 'Q34_A_Part_11', 'Q34_A_OTHER']

autoMLToolDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=autoMLToolColumns,
                                                               title='Auto Machine Learning Tools',
                                                               xaxis='Count',
                                                               yaxis='Auto ML Tool', width=990, height=800)

autoMLToolDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

autoMLToolDistributionContainer = widgets.HBox(children=[autoMLToolDistributionTextbox])


def autoMLToolDistributionResponse(change):
    if (autoMLToolDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == autoMLToolDistributionTextbox.value][autoMLToolColumns].reset_index(drop=True)

    with autoMLToolDistributionFig.batch_update():
        for i in range(len(autoMLToolDistributionFig.data)):
            autoMLToolDistributionFig.data[i].y = df[autoMLToolDistributionFig.data[i].name].dropna()
        if (autoMLToolDistributionTextbox.value == 'All'):
            autoMLToolDistributionFig.layout.title = dict(text="Auto Machine Learning Tools")
        else:
            autoMLToolDistributionFig.layout.title = dict(
                text=autoMLToolDistributionTextbox.value + " Auto Machine Learning Tools")


autoMLToolDistributionTextbox.observe(autoMLToolDistributionResponse, names="value")

widgets.VBox([autoMLToolDistributionContainer,
              autoMLToolDistributionFig])

###  <a class="anchor" id="Auto-ML-Tools-To-Learn">4.6.8.4. Auto ML Tools To Learn</a>

Non-professional participants are asked which automated machine learning tools (or partial AutoML tools) they hope to become more familiar with in the next 2 years. (Select all that apply)
* __[Google Cloud AutoML](https://cloud.google.com/automl/)__
* __[H20 Driverless AI](https://www.h2o.ai/products/h2o-driverless-ai/)__
* __[Databricks AutoML](https://databricks.com/product/automl-on-databricks)__
* __[DataRobot AutoML](https://www.datarobot.com/lp/automated-machine-learning-works-business/)__
* __[Tpot](https://github.com/EpistasisLab/tpot/)__
* __[Auto-Keras](https://github.com/keras-team/autokeras/)__
* __[Auto-Sklearn](https://github.com/automl/auto-sklearn/)__
* __[Auto_ml](https://github.com/ClimbsRocks/auto_ml/)__
* __[Xcessiv](https://github.com/reiinakano/xcessiv/)__
* __[MLbox](https://github.com/AxeldeRomblay/MLBox/)__
* No / None
* Other

In [None]:
autoMLToolToLearnColumns = ['Q34_B_Part_1', 'Q34_B_Part_2', 'Q34_B_Part_3', 'Q34_B_Part_4', 'Q34_B_Part_5',
                            'Q34_B_Part_6',
                            'Q34_B_Part_7', 'Q34_B_Part_8',
                            'Q34_B_Part_9', 'Q34_B_Part_10', 'Q34_B_Part_11', 'Q34_B_OTHER']

autoMLToolToLearnDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=autoMLToolToLearnColumns,
                                                                      title='Auto Machine Learning Tools To Learn',
                                                                      xaxis='Count',
                                                                      yaxis='Auto ML Tools', width=990, height=800)

autoMLToolToLearnDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

autoMLToolToLearnDistributionContainer = widgets.HBox(children=[autoMLToolToLearnDistributionTextbox])


def autoMLToolToLearnDistributionResponse(change):
    if (autoMLToolToLearnDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == autoMLToolToLearnDistributionTextbox.value][
            autoMLToolToLearnColumns].reset_index(drop=True)

    with autoMLToolToLearnDistributionFig.batch_update():
        for i in range(len(autoMLToolToLearnDistributionFig.data)):
            autoMLToolToLearnDistributionFig.data[i].y = df[autoMLToolToLearnDistributionFig.data[i].name].dropna()
        if (autoMLToolToLearnDistributionTextbox.value == 'All'):
            autoMLToolToLearnDistributionFig.layout.title = dict(text="Auto Machine Learning Tools To Learn")
        else:
            autoMLToolToLearnDistributionFig.layout.title = dict(
                text=autoMLToolToLearnDistributionTextbox.value + " Auto Machine Learning Tools To Learn")


autoMLToolToLearnDistributionTextbox.observe(autoMLToolToLearnDistributionResponse, names="value")

widgets.VBox([autoMLToolToLearnDistributionContainer,
              autoMLToolToLearnDistributionFig])

###  <a class="anchor" id="Computing-Environment">4.7. Computing Environment</a>

###  <a class="anchor" id="Computing-Platform">4.7.1. Computing Platform</a>

Participants are asked which type of computing platform they use most often for their data science projects.
* A personal computer or laptop
* A deep learning workstation (NVIDIA GTX, LambdaLabs, etc)
* A cloud computing platform (AWS, Azure, GCP, hosted notebooks, etc)
* None
* Other

In [None]:
colors = ['rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)', 'rgba(171,171,171,1)', ]

pull = [0.2, 0.2, 0.2, 0.2, 0.2]

surveyDF['Q11'] = surveyDF['Q11'].str.strip()

surveyDF.loc[
    surveyDF.Q11 == 'A cloud computing platform (AWS, Azure, GCP, hosted notebooks, etc)',
    'Q11'
] = 'A cloud computing platform<br>(AWS, Azure, GCP, hosted notebooks, etc)'

surveyDF.loc[
    surveyDF.Q11 == 'A deep learning workstation (NVIDIA GTX, LambdaLabs, etc)',
    'Q11'
] = 'A deep learning workstation<br>(NVIDIA GTX, LambdaLabs, etc)'

computingPlatformDistributionFig = donutPie(surveyDF, column='Q11', title='Computing Platform', colors=colors,
                                            pull=pull, hole=.3, width=990, height=600)

computingPlatformDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

computingPlatformDistributionContainer = widgets.HBox(children=[computingPlatformDistributionTextBox])


def computingPlatformDistributionResponse(change):
    if (computingPlatformDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == computingPlatformDistributionTextBox.value][['Q11']].reset_index(drop=True)

    with computingPlatformDistributionFig.batch_update():
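        counts = df['Q11'].value_counts()
        # value_counts() sorts by frequency, so the option order can change per country;
        # refresh the labels together with the values to keep each slice correctly named.
        computingPlatformDistributionFig.data[0].labels = counts.index.tolist()
        computingPlatformDistributionFig.data[0].values = counts.values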
        if (computingPlatformDistributionTextBox.value == 'All'):
            computingPlatformDistributionFig.layout.title = dict(text="Computing Platform")
        else:
            computingPlatformDistributionFig.layout.title = dict(
                text=computingPlatformDistributionTextBox.value + " Computing Platform")


computingPlatformDistributionTextBox.observe(computingPlatformDistributionResponse, names="value")

widgets.VBox([computingPlatformDistributionContainer,
              computingPlatformDistributionFig])
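
Because `value_counts()` orders options by frequency, the option order can differ from one country to another. One way to keep pie slices and their colours stable across selections is to count against a fixed list of labels. The following is a minimal sketch in plain Python. `aligned_counts` is a hypothetical helper and the responses are toy data.

```python
from collections import Counter


def aligned_counts(labels, responses):
    """Return one count per label, in label order, using 0 for absent labels."""
    counts = Counter(responses)
    return [counts.get(label, 0) for label in labels]


# Toy data (hypothetical, not survey results).
labels = ["A personal computer or laptop", "A cloud computing platform", "Other"]
responses = ["Other", "A personal computer or laptop", "Other"]

values = aligned_counts(labels, responses)
```

In pandas, a comparable result should be obtainable with `df['Q11'].value_counts().reindex(labels, fill_value=0)`, assuming `labels` holds the option texts in the order used when the figure was first drawn.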

###  <a class="anchor" id="Specialized-Hardware">4.7.2. Specialized Hardware</a>

Participants are asked which types of specialized hardware they use on a regular basis. (Select all that apply)
* GPUs 
* TPUs
* None
* Other

In [None]:
splHardwareColumns = ['Q12_Part_1', 'Q12_Part_2', 'Q12_Part_3', 'Q12_OTHER']

splHardwareDistributionFig = verticalHistogramMultipleColumns(surveyDF, columns=splHardwareColumns,
                                                              title='Specialized Hardware Usage',
                                                              xaxis='Specialized Hardware', yaxis='Count', width=990, height=600)

splHardwareDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

splHardwareDistributionContainer = widgets.HBox(children=[splHardwareDistributionTextbox])


def splHardwareDistributionResponse(change):
    if (splHardwareDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == splHardwareDistributionTextbox.value][splHardwareColumns].reset_index(drop=True)

    with splHardwareDistributionFig.batch_update():
        for i in range(len(splHardwareDistributionFig.data)):
            splHardwareDistributionFig.data[i].x = df[splHardwareDistributionFig.data[i].name].dropna()
        if (splHardwareDistributionTextbox.value == 'All'):
            splHardwareDistributionFig.layout.title = dict(text="Specialized Hardware Usage")
        else:
            splHardwareDistributionFig.layout.title = dict(
                text=splHardwareDistributionTextbox.value + " Specialized Hardware Usage")


splHardwareDistributionTextbox.observe(splHardwareDistributionResponse, names="value")

widgets.VBox([splHardwareDistributionContainer,
              splHardwareDistributionFig])

###  <a class="anchor" id="TPU-Usage">4.7.3. TPU Usage</a>

Participants are asked approximately how many times they have used a TPU (tensor processing unit).
* Never
* Once
* 2-5 times
* 6-25 times
* More than 25 times

In [None]:
colors = ['rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)', 'rgba(171,171,171,1)', ]

pull = [0.2, 0.2, 0.2, 0.2, 0.2]

tpuUsageDistributionFig = donutPie(surveyDF, column='Q13', title='TPU Usage', colors=colors, pull=pull, hole=.3,
                                   width=990, height=600)

tpuUsageDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

tpuUsageDistributionContainer = widgets.HBox(children=[tpuUsageDistributionTextBox])


def tpuUsageDistributionResponse(change):
    if (tpuUsageDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == tpuUsageDistributionTextBox.value][['Q13']].reset_index(drop=True)

    with tpuUsageDistributionFig.batch_update():
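        # Reindex to the pie's existing labels so each count stays on its own slice
        counts = df['Q13'].value_counts()
        values = counts.reindex(list(tpuUsageDistributionFig.data[0].labels), fill_value=0).values
        tpuUsageDistributionFig.data[0].values = values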
        if (tpuUsageDistributionTextBox.value == 'All'):
            tpuUsageDistributionFig.layout.title = dict(
                text="TPU Usage")
        else:
            tpuUsageDistributionFig.layout.title = dict(
                text=tpuUsageDistributionTextBox.value + " TPU Usage")


tpuUsageDistributionTextBox.observe(tpuUsageDistributionResponse, names="value")

widgets.VBox([tpuUsageDistributionContainer,
              tpuUsageDistributionFig])

###  <a class="anchor" id="Cloud-Environments">4.7.4. Cloud Environments</a>
###  <a class="anchor" id="Cloud-Budget">4.7.4.1. Cloud Budget</a>

Participants are asked approximately how much money they (or their team) have spent on machine learning and/or cloud computing services at home (or at work) in the past 5 years (approximate USD).
* 0 (USD)
* 1-99
* 100-999
* 1000-9,999
* 10,000-99,999
* 100,000 or more (USD)

In [None]:
colors = ['rgba(171,171,171,1)', 'rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)',
          'rgba(255,188,121,1)', ]

pull = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2]

cloudBudgetDistributionFig = donutPie(surveyDF, column='Q25', title='Cloud Budget', colors=colors, pull=pull, hole=.0,
                                      width=990, height=600)

cloudBudgetDistributionTextBox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

cloudBudgetDistributionContainer = widgets.HBox(children=[cloudBudgetDistributionTextBox])


def cloudBudgetDistributionResponse(change):
    if (cloudBudgetDistributionTextBox.value == 'All'):
        df = surveyDF
    else:
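        df = surveyDF[surveyDF['Q3'] == cloudBudgetDistributionTextBox.value][['Q25']].reset_index(drop=True)

    with cloudBudgetDistributionFig.batch_update():
        # Use the same column as the initial figure (Q25) and reindex to the pie's
        # existing labels so each count stays on its own slice
        counts = df['Q25'].value_counts()
        values = counts.reindex(list(cloudBudgetDistributionFig.data[0].labels), fill_value=0).values
        cloudBudgetDistributionFig.data[0].values = values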
        if (cloudBudgetDistributionTextBox.value == 'All'):
            cloudBudgetDistributionFig.layout.title = dict(
                text="Cloud Budget")
        else:
            cloudBudgetDistributionFig.layout.title = dict(
                text=cloudBudgetDistributionTextBox.value + " Cloud Budget")


cloudBudgetDistributionTextBox.observe(cloudBudgetDistributionResponse, names="value")

widgets.VBox([cloudBudgetDistributionContainer,
              cloudBudgetDistributionFig])

###  <a class="anchor" id="Cloud-Platforms">4.7.4.2. Cloud Platforms</a>

Participants are asked to select the cloud computing platforms they use on a regular basis. (Select all that apply)
* __[Amazon Web Services (AWS)](https://aws.amazon.com/)__
* __[Microsoft Azure](https://azure.microsoft.com/en-us/)__
* __[Google Cloud Platform (GCP)](https://cloud.google.com/gcp/)__
* __[IBM Cloud / Red Hat](https://www.ibm.com/cloud)__
* __[Oracle Cloud](https://www.oracle.com/cloud/)__
* __[SAP Cloud](https://www.sap.com/products/cloud-platform.html?btp=470d9626-7bbe-4ce8-afd2-1f36444df031)__
* __[Salesforce Cloud](https://www.salesforce.com/products/sales-cloud/features/)__
* __[VMware Cloud](https://cloud.vmware.com/)__
* __[Alibaba Cloud](https://us.alibabacloud.com/)__
* __[Tencent Cloud](https://intl.cloud.tencent.com/)__
* None
* Other
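
Because this is a select-all-that-apply question, each option appears to be stored in its own column (`Q26_A_Part_1`, `Q26_A_Part_2`, ...) that holds the option label when selected and is empty otherwise. Below is a minimal sketch of deriving per-platform counts from such columns. The three-column frame is made-up data for illustration.

```python
import pandas as pd

# Toy frame (made-up rows) mimicking three of the multi-select columns.
toy = pd.DataFrame({
    'Q26_A_Part_1': ['Amazon Web Services (AWS)', None, 'Amazon Web Services (AWS)'],
    'Q26_A_Part_2': [None, 'Microsoft Azure', 'Microsoft Azure'],
    'Q26_A_Part_3': [None, None, 'Google Cloud Platform (GCP)'],
})

# Unpivot every option column into a single column, then drop the empty cells
# so that each remaining entry is one selection by one respondent.
selections = toy.melt(value_name='platform')['platform'].dropna()
platform_counts = selections.value_counts()
print(platform_counts.to_dict())
```

Since a respondent can select several platforms, these counts can sum to more than the number of respondents.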

In [None]:
cloudProviderColumns = ['Q26_A_Part_1', 'Q26_A_Part_2', 'Q26_A_Part_3', 'Q26_A_Part_4', 'Q26_A_Part_5', 'Q26_A_Part_6',
                        'Q26_A_Part_7', 'Q26_A_Part_8',
                        'Q26_A_Part_9', 'Q26_A_Part_10', 'Q26_A_Part_11', 'Q26_A_OTHER']

# Strip surrounding whitespace from every cloud provider answer column
for column in cloudProviderColumns:
    surveyDF[column] = surveyDF[column].str.strip()

cloudProviderDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=cloudProviderColumns,
                                                                  title='Cloud Provider Usage',
                                                                  xaxis='Count',
                                                                  yaxis='Cloud Provider', width=990, height=600)

cloudProviderDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

cloudProviderDistributionContainer = widgets.HBox(children=[cloudProviderDistributionTextbox])


def cloudProviderDistributionResponse(change):
    if (cloudProviderDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == cloudProviderDistributionTextbox.value][cloudProviderColumns].reset_index(
            drop=True)

    with cloudProviderDistributionFig.batch_update():
        for i in range(len(cloudProviderDistributionFig.data)):
            cloudProviderDistributionFig.data[i].y = df[cloudProviderDistributionFig.data[i].name].dropna()
        if (cloudProviderDistributionTextbox.value == 'All'):
            cloudProviderDistributionFig.layout.title = dict(text="Cloud Provider Usage")
        else:
            cloudProviderDistributionFig.layout.title = dict(
                text=cloudProviderDistributionTextbox.value + " Cloud Provider Usage")


cloudProviderDistributionTextbox.observe(cloudProviderDistributionResponse, names="value")

widgets.VBox([cloudProviderDistributionContainer,
              cloudProviderDistributionFig])

###  <a class="anchor" id="Cloud-Computing-Products">4.7.4.3. Cloud Computing Products</a>

Participants are asked to select the cloud computing products they use on a regular basis. (Select all that apply)
* __[Amazon EC2](https://aws.amazon.com/ec2/?ec2-whats-new.sort-by=item.additionalFields.postDateTime&ec2-whats-new.sort-order=desc)__
* __[AWS Lambda](https://aws.amazon.com/lambda/)__
* __[Amazon Elastic Container Service](https://aws.amazon.com/ecs/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc&ecs-blogs.sort-by=item.additionalFields.createdDate&ecs-blogs.sort-order=desc)__
* __[Azure Cloud Services](https://azure.microsoft.com/en-us/services/cloud-services/)__
* __[Microsoft Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/)__
* __[Azure Functions](https://azure.microsoft.com/en-us/services/functions/)__
* __[Google Cloud Compute Engine](https://cloud.google.com/compute)__
* __[Google Cloud Functions](https://cloud.google.com/functions/)__
* __[Google Cloud Run](https://cloud.google.com/run/)__
* __[Google Cloud App Engine](https://cloud.google.com/appengine/)__
* No / None
* Other
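
The chart for this question nests each product under the provider that offers it, which requires rows of (provider, product) pairs rather than one column per option. Below is a minimal sketch of building those pairs with `DataFrame.melt`, shown as one possible approach rather than the notebook's own method. The toy frame is made-up data for illustration.

```python
import pandas as pd

# Toy frame (made-up rows): one provider column and two of its product columns.
toy = pd.DataFrame({
    'Q26_A_Part_1': ['Amazon Web Services (AWS)', 'Amazon Web Services (AWS)', None],
    'Q27_A_Part_1': ['Amazon EC2', 'Amazon EC2', 'Amazon EC2'],
    'Q27_A_Part_2': ['AWS Lambda', None, None],
})

# Keep only respondents who selected the provider, then unpivot the product
# columns into (cloud_provider, cloud_product) rows.
aws = toy[toy['Q26_A_Part_1'] == 'Amazon Web Services (AWS)']
pairs = aws.melt(id_vars='Q26_A_Part_1',
                 value_vars=['Q27_A_Part_1', 'Q27_A_Part_2'],
                 value_name='cloud_product').rename(columns={'Q26_A_Part_1': 'cloud_provider'})

# groupby skips rows whose product is missing, leaving one count per pair.
pair_counts = (pairs.groupby(['cloud_provider', 'cloud_product'])
               .size().reset_index(name='count'))
print(pair_counts)
```

The resulting `count` column has the shape that a sunburst chart with a provider-then-product path expects.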

In [None]:
def cloudProductUsage(surveyDF):
    # (provider column, provider label) -> product columns paired with that provider
    providerProducts = {
        ('Q26_A_Part_1', 'Amazon Web Services (AWS)'): [
            'Q27_A_Part_1', 'Q27_A_Part_2', 'Q27_A_Part_3', 'Q27_A_Part_11', 'Q27_A_OTHER',
            'Q28_A_Part_1', 'Q28_A_Part_2', 'Q28_A_Part_3', 'Q28_A_Part_10', 'Q28_A_OTHER'],
        ('Q26_A_Part_2', 'Microsoft Azure'): [
            'Q27_A_Part_4', 'Q27_A_Part_5', 'Q27_A_Part_6', 'Q27_A_Part_11', 'Q27_A_OTHER',
            'Q28_A_Part_4', 'Q28_A_Part_5', 'Q28_A_Part_10', 'Q28_A_OTHER'],
        ('Q26_A_Part_3', 'Google Cloud Platform (GCP)'): [
            'Q27_A_Part_7', 'Q27_A_Part_8', 'Q27_A_Part_9', 'Q27_A_Part_10', 'Q27_A_Part_11', 'Q27_A_OTHER',
            'Q28_A_Part_6', 'Q28_A_Part_7', 'Q28_A_Part_8', 'Q28_A_Part_9', 'Q28_A_Part_10', 'Q28_A_OTHER'],
    }

    # Build one (cloud_provider, cloud_product) frame per provider/product column pair
    frames = []
    for (providerColumn, provider), productColumns in providerProducts.items():
        providerDF = surveyDF[surveyDF[providerColumn] == provider]
        for productColumn in productColumns:
            frames.append(providerDF[[providerColumn, productColumn]].rename(
                columns={providerColumn: "cloud_provider", productColumn: "cloud_product"}))

    df = pd.concat(frames).reset_index(drop=True)

    df = df.groupby(['cloud_provider', 'cloud_product'])['cloud_provider'].count().reset_index(name="count")

    colors = ['rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)',
              'rgba(255,188,121,1)',
              'rgba(207,207,207,1)', 'rgba(200,82,0,1)', 'rgba(162,200,236,1)', 'rgba(137,137,137,1)',
              'rgba(171,171,171,1)', ]

    fig = px.sunburst(df,
                      path=['cloud_provider', 'cloud_product'],
                      values='count',
                      color='cloud_product',
                      branchvalues='total',
                      color_discrete_sequence=colors)

    fig.update_layout(title="Cloud Computing Products Usage",
                      autosize=False,
                      width=990,
                      height=800)

    fig.show()

    gc.collect()


cloudProductUsage(surveyDF)

###  <a class="anchor" id="Cloud-Platforms-To-Learn">4.7.4.4. Cloud Platforms To Learn</a>

Non-professional participants are asked to select the cloud computing platforms they hope to become more familiar with in the next 2 years.
* __[Amazon Web Services (AWS)](https://aws.amazon.com/)__
* __[Microsoft Azure](https://azure.microsoft.com/en-us/)__
* __[Google Cloud Platform (GCP)](https://cloud.google.com/gcp/)__
* __[IBM Cloud / Red Hat](https://www.ibm.com/cloud)__
* __[Oracle Cloud](https://www.oracle.com/cloud/)__
* __[SAP Cloud](https://www.sap.com/products/cloud-platform.html?btp=470d9626-7bbe-4ce8-afd2-1f36444df031)__
* __[Salesforce Cloud](https://www.salesforce.com/products/sales-cloud/features/)__
* __[VMware Cloud](https://cloud.vmware.com/)__
* __[Alibaba Cloud](https://us.alibabacloud.com/)__
* __[Tencent Cloud](https://intl.cloud.tencent.com/)__
* None
* Other

Note - Non-professionals received questions with an alternate phrasing (questions for non-professionals asked what tools they hope to become familiar with in the next 2 years instead of asking what tools they use on a regular basis). Non-professionals were defined as students, unemployed, and respondents that have never spent any money in the cloud.
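
The definition in the note can be expressed as a boolean mask. Below is a minimal sketch under stated assumptions: the column names `job_title` and `cloud_spend` and the answer labels used here are hypothetical stand-ins, not the survey's actual column names or values.

```python
import pandas as pd

# Hypothetical column names and labels, chosen for illustration only.
toy = pd.DataFrame({
    'job_title': ['Student', 'Data Scientist', 'Currently not employed', 'Data Analyst'],
    'cloud_spend': ['100-999', '1000-9,999', '0 (USD)', '0 (USD)'],
})

# A respondent counts as non-professional if any condition from the note holds:
# student, unemployed, or no money ever spent in the cloud.
is_non_professional = (
    toy['job_title'].isin(['Student', 'Currently not employed'])
    | (toy['cloud_spend'] == '0 (USD)')
)
print(is_non_professional.tolist())
```

The conditions are combined with a logical OR, so an employed respondent with zero cloud spend is still treated as a non-professional.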

In [None]:
cloudProviderToLearnColumns = ['Q26_B_Part_1', 'Q26_B_Part_2', 'Q26_B_Part_3', 'Q26_B_Part_4', 'Q26_B_Part_5',
                               'Q26_B_Part_6',
                               'Q26_B_Part_7', 'Q26_B_Part_8',
                               'Q26_B_Part_9', 'Q26_B_Part_10', 'Q26_B_Part_11', 'Q26_B_OTHER']

# Strip surrounding whitespace from every "cloud provider to learn" answer column
for column in cloudProviderToLearnColumns:
    surveyDF[column] = surveyDF[column].str.strip()

cloudProviderToLearnDistributionFig = horizontalHistogramMultipleColumns(surveyDF, columns=cloudProviderToLearnColumns,
                                                                         title='Cloud Provider To Learn',
                                                                         xaxis='Count',
                                                                         yaxis='Cloud Provider', width=990, height=600)

cloudProviderToLearnDistributionTextbox = widgets.Dropdown(
    description='Country:   ',
    value='All',
    options=countries
)

cloudProviderToLearnDistributionContainer = widgets.HBox(children=[cloudProviderToLearnDistributionTextbox])


def cloudProviderToLearnDistributionResponse(change):
    if (cloudProviderToLearnDistributionTextbox.value == 'All'):
        df = surveyDF
    else:
        df = surveyDF[surveyDF['Q3'] == cloudProviderToLearnDistributionTextbox.value][
            cloudProviderToLearnColumns].reset_index(drop=True)

    with cloudProviderToLearnDistributionFig.batch_update():
        for i in range(len(cloudProviderToLearnDistributionFig.data)):
            cloudProviderToLearnDistributionFig.data[i].y = df[
                cloudProviderToLearnDistributionFig.data[i].name].dropna()
        if (cloudProviderToLearnDistributionTextbox.value == 'All'):
            cloudProviderToLearnDistributionFig.layout.title = dict(text="Cloud Provider To Learn")
        else:
            cloudProviderToLearnDistributionFig.layout.title = dict(
                text=cloudProviderToLearnDistributionTextbox.value + " Cloud Provider To Learn")


cloudProviderToLearnDistributionTextbox.observe(cloudProviderToLearnDistributionResponse, names="value")

widgets.VBox([cloudProviderToLearnDistributionContainer,
              cloudProviderToLearnDistributionFig])

###  <a class="anchor" id="Cloud-Computing-Products-To-Learn">4.7.4.5. Cloud Computing Products To Learn</a>

Non-professional participants are asked to select the cloud computing products they hope to become more familiar with in the next 2 years.
* __[Amazon EC2](https://aws.amazon.com/ec2/?ec2-whats-new.sort-by=item.additionalFields.postDateTime&ec2-whats-new.sort-order=desc)__
* __[AWS Lambda](https://aws.amazon.com/lambda/)__
* __[Amazon Elastic Container Service](https://aws.amazon.com/ecs/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc&ecs-blogs.sort-by=item.additionalFields.createdDate&ecs-blogs.sort-order=desc)__
* __[Azure Cloud Services](https://azure.microsoft.com/en-us/services/cloud-services/)__
* __[Microsoft Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/)__
* __[Azure Functions](https://azure.microsoft.com/en-us/services/functions/)__
* __[Google Cloud Compute Engine](https://cloud.google.com/compute)__
* __[Google Cloud Functions](https://cloud.google.com/functions/)__
* __[Google Cloud Run](https://cloud.google.com/run/)__
* __[Google Cloud App Engine](https://cloud.google.com/appengine/)__
* No / None
* Other

Note - Non-professionals received questions with an alternate phrasing (questions for non-professionals asked what tools they hope to become familiar with in the next 2 years instead of asking what tools they use on a regular basis). Non-professionals were defined as students, unemployed, and respondents that have never spent any money in the cloud.

In [None]:
def cloudProductToLearn(surveyDF):
    # (provider column, provider label) -> product columns paired with that provider
    providerProducts = {
        ('Q26_B_Part_1', 'Amazon Web Services (AWS)'): [
            'Q27_B_Part_1', 'Q27_B_Part_2', 'Q27_B_Part_3', 'Q27_B_Part_11', 'Q27_B_OTHER',
            'Q28_B_Part_1', 'Q28_B_Part_2', 'Q28_B_Part_3', 'Q28_B_Part_10', 'Q28_B_OTHER'],
        ('Q26_B_Part_2', 'Microsoft Azure'): [
            'Q27_B_Part_4', 'Q27_B_Part_5', 'Q27_B_Part_6', 'Q27_B_Part_11', 'Q27_B_OTHER',
            'Q28_B_Part_4', 'Q28_B_Part_5', 'Q28_B_Part_10', 'Q28_B_OTHER'],
        ('Q26_B_Part_3', 'Google Cloud Platform (GCP)'): [
            'Q27_B_Part_7', 'Q27_B_Part_8', 'Q27_B_Part_9', 'Q27_B_Part_10', 'Q27_B_Part_11', 'Q27_B_OTHER',
            'Q28_B_Part_6', 'Q28_B_Part_7', 'Q28_B_Part_8', 'Q28_B_Part_9', 'Q28_B_Part_10', 'Q28_B_OTHER'],
    }

    # Build one (cloud_provider, cloud_product) frame per provider/product column pair
    frames = []
    for (providerColumn, provider), productColumns in providerProducts.items():
        providerDF = surveyDF[surveyDF[providerColumn] == provider]
        for productColumn in productColumns:
            frames.append(providerDF[[providerColumn, productColumn]].rename(
                columns={providerColumn: "cloud_provider", productColumn: "cloud_product"}))

    df = pd.concat(frames).reset_index(drop=True)

    df = df.groupby(['cloud_provider', 'cloud_product'])['cloud_provider'].count().reset_index(name="count")

    colors = ['rgba(95,158,209,1)', 'rgba(0,107,164,1)', 'rgba(255,128,14,1)', 'rgba(89,89,89,1)',
              'rgba(255,188,121,1)',
              'rgba(207,207,207,1)', 'rgba(200,82,0,1)', 'rgba(162,200,236,1)', 'rgba(137,137,137,1)',
              'rgba(171,171,171,1)', ]

    fig = px.sunburst(df,
                      path=['cloud_provider', 'cloud_product'],
                      values='count',
                      color='cloud_product',
                      branchvalues='total',
                      color_discrete_sequence=colors)

    fig.update_layout(title="Cloud Products To Learn",
                      autosize=False,
                      width=990,
                      height=800)

    fig.show()

    gc.collect()


cloudProductToLearn(surveyDF)