---
# kaggle's 2021 survey responses

a workbook to view responses to kaggle's questionnaire on machine learning and data science

---
The setup:
kaggle's questionnaire on machine learning and data science was broken down into 3 sections: demographics, data science and machine learning.
Overall, respondents answered 51 questions; some questions were shown conditionally, as highlighted below and in the [footnotes](#fns).


__Jump to a specific topic__ : _total 51 questions_ <a class="anchor" id="TOC"></a>

---
> __participants' demographics__ : _6 questions_  <a class="anchor" id="DEM"></a>
    
* _Q1 - Q6_ : responses about [participants in the questionnaire](#demographic)
* [time to complete the questionnaire](#time_taken)
    
---
> __data science__ : _26 questions_ <a class="anchor" id="DS"></a>
    
* _Q7 - Q10_ responses about [data science setup](#programming_setup) 
* _Q11 - Q15_ responses about [platforms and hardware](#platform_hardware) 
* _Q24 - Q26_ responses about [industry activities, compensation and cloud expenditure](#activities) 
* _Q34A, Q34B, Q35_ responses about [business intelligence](#bi) [<sup>5</sup>](#fn5)
* _Q27A, Q28, Q29A, Q30A_ responses about [cloud computing choices in industry](#ind_cc) [<sup>3</sup>](#fn3)
* _Q27B, Q29B, Q30B_ responses about [cloud computing choices among non professionals](#np_cc) [<sup>2</sup>](#fn2)
* _Q39 - Q42_ responses about [sharing work and data science knowledge portals](#ds_kp)
---
> __machine learning__ : _19 questions_<a class="anchor" id="ML"></a>

* _Q20 - Q23_ responses about [industry using data science](#ind_ds)
* _Q16 - Q19_ responses about [machine learning, nlp and computer vision](#ml_nlp_cv) [<sup>1</sup>](#fn1)
* _Q31A, Q31B, Q32A, Q32B, Q33_ responses about [big data and managed machine learning](#mml) [<sup>4</sup>](#fn4)
* _Q36A, Q37A, Q38A_ responses about [auto-ml choices in industry](#ind_aml) [<sup>6</sup>](#fn6)
* _Q36B, Q37B, Q38B_ responses about [auto-ml choices among non professionals](#np_aml) [<sup>2</sup>](#fn2)
--- 
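
The A/B question variants listed above (for example Q27A and Q27B) and the multi-select questions are addressed later in this workbook by a shared column prefix such as 'Q27_A' or 'Q27_B'. A minimal sketch of tallying one such group of columns; the toy dataframe, its `_Part_N` column names and the `tally` helper are assumptions made for illustration, not the survey data or the workbook's own helper:

```python
import pandas as pd

# Toy stand-in for the responses table (invented values and column names).
# Each option of a multi-select question gets its own column; a cell holds
# the option text when selected and is empty otherwise.
toy = pd.DataFrame({
    "Q27_A_Part_1": ["AWS", None, "AWS", None],
    "Q27_A_Part_2": [None, None, "GCP", None],
    "Q27_B_Part_1": [None, None, None, "AWS"],
})

def tally(df, prefix):
    """Count selected options across all columns whose name starts with prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    # melt() stacks the option columns into one long column; dropna() removes
    # the empty cells so that only selected options are counted.
    long_form = df[cols].melt(value_name="answer")["answer"].dropna()
    return long_form.value_counts()

counts_a = tally(toy, "Q27_A")  # variant A columns only
counts_b = tally(toy, "Q27_B")  # variant B columns only
print(counts_a.to_dict())
print(counts_b.to_dict())
```

Matching on the start of the column name keeps the A and B variants apart, so each group of respondents is counted separately.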


In [None]:
# Importing necessary libraries and dataset.
import pandas as pd
import numpy as np
from IPython.display import display
from IPython.display import Markdown as md
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

df_2021 = pd.read_csv('../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv', dtype = object)

In [None]:
# Separating question text and responses
questions_2021  = df_2021.iloc[:1,:]
response_2021 = df_2021.iloc[1:, :]
# Rename the demographic columns Q1-Q6 so that the substring match in count_multiple_choice
# does not confuse e.g. 'Q1' with 'Q11', or 'Q2' with 'Q2X_X'.
answers_2021 = response_2021.copy()
answers_2021['Time from Start to Finish (seconds)'] = (response_2021['Time from Start to Finish (seconds)'].astype(int)/3600).round(2)
answers_2021 = answers_2021.rename(columns={'Q1': 'age', 'Q2': 'gender',
                                  'Q3': 'country', 'Q4': 'education',
                                 'Q5': 'role', 'Q6': 'code_exp',
                                 'Time from Start to Finish (seconds)' : 'time_hours'})
response_time_2021 = answers_2021.copy()
response_time_2021['time_category'] = pd.cut(answers_2021['time_hours'],3, labels = ['low', 'medium', 'high'])

In [None]:
def count_multiple_choice(df,response_col,response_type,response_share):
    """
    A helper function to count how often each response option was selected

    Parameters
    ----------
    df : (dataframe) The answers dataframe
    response_col : (string) Substring identifying the question's response column(s) e.g. 'Q7' or 'Q27_A'
    response_type : (string) The name for returned dataframe's response categories column
    response_share : (string) The name of returned dataframe's response count column

    Returns
    -------
    df : (dataframe) one row per response_type with its count in the response_share column
    """
    responses = [col for col in df.columns if response_col in col]
    df = df[list(responses)].apply(pd.Series.value_counts)
    df = pd.DataFrame(df.sum(axis = 1))
    df = df.reset_index()
    df = df.rename(columns = {df.columns[0]: response_type , df.columns[1]: response_share})
#     df[response_share] = (df[response_share]/df[response_share].sum()).round(2)
    return df

def get_max(df,var,count):
    """Return [label, count, % of total] for the most common response; `var` is not used."""
    max_r = np.array(df.loc[df[count] == df[count].max()])
    max_n, max_v = max_r[0,0], max_r[0,1]
    max_p = (max_v/df[count].sum()*100).round(2)
    return [max_n, max_v, max_p]

In [None]:
df_dm_age_2021 = count_multiple_choice(answers_2021, 'age', 'age', 'age_count')
df_dm_gender_2021 = count_multiple_choice(answers_2021, 'gender', 'gender', 'gender_count')
df_dm_country_2021 = count_multiple_choice(answers_2021, 'country', 'country', 'country_count')
df_dm_education_2021 = count_multiple_choice(answers_2021, 'education', 'education', 'education_count')
df_dm_role_2021 = count_multiple_choice(answers_2021, 'role', 'role', 'role_count')
df_dm_coding_exp_2021 = count_multiple_choice(answers_2021, 'code_exp', 'code_exp', 'code_exp_count')

In [None]:
md(f'''

# __participants in the questionnaire 2021__ <a class="anchor" id="demographic"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__What is your age (# years)?__      | {get_max(df_dm_age_2021,'age','age_count')[0]}     | {get_max(df_dm_age_2021,'age','age_count')[1]}    | {get_max(df_dm_age_2021,'age','age_count')[2]}
__What is your gender?__                    | {get_max(df_dm_gender_2021,'gender','gender_count')[0]}     | {get_max(df_dm_gender_2021,'gender','gender_count')[1]}    | {get_max(df_dm_gender_2021,'gender','gender_count')[2]}
__In which country do you currently reside?__    | {str(get_max(df_dm_country_2021,'country','country_count')[0])}   | {get_max(df_dm_country_2021,'country','country_count')[1]}  | {get_max(df_dm_country_2021,'country','country_count')[2]}
__What is the highest level of formal education that you have attained or plan to attain within the next 2 years?__    | {str(get_max(df_dm_education_2021,'education','education_count')[0])}   | {get_max(df_dm_education_2021,'education','education_count')[1]}  | {get_max(df_dm_education_2021,'education','education_count')[2]}
__Select the title most similar to your current role or most recent title if retired__    | {str(get_max(df_dm_role_2021,'role','role_count')[0])}   | {get_max(df_dm_role_2021,'role','role_count')[1]}  | {get_max(df_dm_role_2021,'role','role_count')[2]}
__For how many years have you been writing code and/or programming?__    | {str(get_max(df_dm_coding_exp_2021,'code_exp','code_exp_count')[0])}   | {get_max(df_dm_coding_exp_2021,'code_exp','code_exp_count')[1]}  | {get_max(df_dm_coding_exp_2021,'code_exp','code_exp_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=3, cols=2,
    row_heights=[0.33,0.33,0.33],
    column_widths=[0.5,0.5],
    subplot_titles=['age group','education', 'gender','role', 'country','coding experience'],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_dm_age_2021.age,
    y=df_dm_age_2021['age_count'],
    mode="lines+markers",
    name="age group 2021"),row=1, col=1)
fig.add_trace(go.Scatter(
    x=df_dm_gender_2021.gender,
    y=df_dm_gender_2021['gender_count'],
    mode="lines+markers",
    name="gender 2021"),row=2, col=1)
fig.add_trace(go.Scatter(
    x=df_dm_country_2021.country,
    y=df_dm_country_2021['country_count'],
    mode="lines+markers",
    name="country 2021"),row=3, col=1)
fig.add_trace(go.Scatter(
    x=df_dm_education_2021.education,
    y=df_dm_education_2021['education_count'],
    name="education 2021",
    mode="lines+markers",
    xaxis="x2"),row=1, col=2)
fig.add_trace(go.Scatter(
    x=df_dm_role_2021.role,
    y=df_dm_role_2021['role_count'],
    mode="lines+markers",
    name="role 2021",
    xaxis="x3"),row=2, col=2)
fig.add_trace(go.Scatter(
    x=df_dm_coding_exp_2021.code_exp,
    y=df_dm_coding_exp_2021['code_exp_count'],
    mode="lines+markers",
    name="code_exp_2021",
    xaxis="x4"),row=3, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    showlegend = False,
    title ='participants 2021',
    margin=dict(r=5, t=25, b=40, l=5))
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 22000])
fig.show()

# time to complete questionnaire <a class="anchor" id="time_taken"></a> 
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---

### time from start to finish


* The time each participant took to finish the survey is recorded in the 'Time from Start to Finish (seconds)' column. The values vary widely, with some very large ones (max = 692 hours!). For further analysis, the time taken by a participant is expressed in hours as 'time_hours'.
* A 'time_category' column bins the responses by 'time_hours' into three equal-width categories: low, medium and high.
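
Because `pd.cut` with an integer bin count splits the observed range into equal-width intervals, a few extreme values can leave almost every response in the lowest bin. A minimal sketch on made-up numbers (the `hours` values are invented, not taken from the survey); `pd.qcut` appears only as a contrast and is not what this workbook uses:

```python
import pandas as pd

# Made-up completion times in hours; one extreme value mimics the long tail.
hours = pd.Series([0.1, 0.2, 0.3, 0.5, 600.0])
labels = ["low", "medium", "high"]

# Integer `bins`: the range from min to max is split into 3 equal-width intervals.
equal_width = pd.cut(hours, 3, labels=labels)

# Contrast: quantile-based bins hold roughly equal numbers of observations.
equal_count = pd.qcut(hours, 3, labels=labels)

print(equal_width.value_counts().to_dict())
print(equal_count.value_counts().to_dict())
```

With equal-width bins, the 'high' category isolates the extreme tail rather than the slowest third of respondents.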

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=['high time: age group',
                   'high time: gender','high time: roles','high time: education'],
    row_heights=[0.5,0.5],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Box(
    x=response_time_2021.loc[response_time_2021['time_category'] == 'high'].age,
    y=response_time_2021.loc[response_time_2021['time_category'] == 'high'].time_hours,
    name="age 2021"),row=1, col=1)
fig.add_trace(go.Box(
    x=response_time_2021.loc[response_time_2021['time_category'] == 'high'].role,
    y=response_time_2021.loc[response_time_2021['time_category'] == 'high'].time_hours,
    name="roles 2021"),row=2, col=1)
fig.add_trace(go.Box(
    x=response_time_2021.loc[response_time_2021['time_category'] == 'high'].gender,
    y=response_time_2021.loc[response_time_2021['time_category'] == 'high'].time_hours,
    name="gender 2021"),row=1, col=2)
fig.add_trace(go.Box(
    x=response_time_2021.loc[response_time_2021['time_category'] == 'high'].education,
    y=response_time_2021.loc[response_time_2021['time_category'] == 'high'].time_hours,
    name="education 2021"),row=2, col=2)
fig.update_layout(
    template="ggplot2",
    showlegend = False,
    height = 800,
    margin=dict(r=5, t=25, b=40, l=5)
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(showticklabels=True, range=[0, 750])
fig.show()

In [None]:
df_ps_use_lang = count_multiple_choice(answers_2021, 'Q7', 'used_language','used_language_count')
df_ps_rec_lang = count_multiple_choice(answers_2021, 'Q8', 'recommended_language', 'recommended_language_count')
df_ps_ide = count_multiple_choice(answers_2021, 'Q9', 'ide','ide_count')
df_ps_hosted_nb = count_multiple_choice(answers_2021, 'Q10', 'hosted_nb','hosted_nb_count')

In [None]:
md(f'''

# __data science setup__ <a class="anchor" id="programming_setup"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__What programming languages do you use on a regular basis?__      | {get_max(df_ps_use_lang,'used_language','used_language_count')[0]}     | {get_max(df_ps_use_lang,'used_language','used_language_count')[1]}    | {get_max(df_ps_use_lang,'used_language','used_language_count')[2]}
__What programming language would you recommend an aspiring data scientist to learn first?__                    | {get_max(df_ps_rec_lang,'recommended_language','recommended_language_count')[0]}     | {get_max(df_ps_rec_lang,'recommended_language','recommended_language_count')[1]}    | {get_max(df_ps_rec_lang,'recommended_language','recommended_language_count')[2]}
__Which of the following integrated development environments (IDE's) do you use on a regular basis?__    | {str(get_max(df_ps_ide,'ide','ide_count')[0])}   | {get_max(df_ps_ide,'ide','ide_count')[1]}  | {get_max(df_ps_ide,'ide','ide_count')[2]}
__Which of the following hosted notebook products do you use on a regular basis?__    | {str(get_max(df_ps_hosted_nb,'hosted_nb','hosted_nb_count')[0])}   | {get_max(df_ps_hosted_nb,'hosted_nb','hosted_nb_count')[1]}  | {get_max(df_ps_hosted_nb,'hosted_nb','hosted_nb_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_ps_use_lang.used_language,
    y=df_ps_use_lang['used_language_count'],
    mode="lines+markers",
    name="used language"),row=1, col=1)
fig.add_trace(go.Scatter(
    x=df_ps_rec_lang.recommended_language,
    y=df_ps_rec_lang['recommended_language_count'],
    mode="lines+markers",
    name="recommended language"),row=2, col=1)
fig.add_trace(go.Scatter(
    x=df_ps_ide.ide,
    y=df_ps_ide['ide_count'],
    mode="lines+markers",
    name="IDE used"),row=1, col=2)
fig.add_trace(go.Scatter(
    x=df_ps_hosted_nb.hosted_nb,
    y=df_ps_hosted_nb['hosted_nb_count'],
    name="hosted notebooks used",
    mode="lines+markers",
    xaxis="x2"),row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : programming setup'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 25000])
fig.show()

In [None]:
df_ds_platform = count_multiple_choice(answers_2021, 'Q11', 'platform','platform_count')
df_ds_sp_hdw = count_multiple_choice(answers_2021, 'Q12', 'hardware','hardware_count')
df_ds_tpu_use = count_multiple_choice(answers_2021, 'Q13', 'tpu','tpu_count')
df_ds_data_viz = count_multiple_choice(answers_2021, 'Q14', 'data_viz','data_viz_count')
df_ds_ml_exp = count_multiple_choice(answers_2021, 'Q15', 'ml_exp','ml_exp_count')

In [None]:
md(f'''

# __platforms and hardware__ <a class="anchor" id="platform_hardware"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__What type of computing platform do you use most often for your data science projects?__      | {get_max(df_ds_platform,'platform','platform_count')[0]}     | {get_max(df_ds_platform,'platform','platform_count')[1]}    | {get_max(df_ds_platform,'platform','platform_count')[2]}
__Which types of specialized hardware do you use on a regular basis?__                    | {get_max(df_ds_sp_hdw,'hardware','hardware_count')[0]}     | {get_max(df_ds_sp_hdw,'hardware','hardware_count')[1]}    | {get_max(df_ds_sp_hdw,'hardware','hardware_count')[2]}
__Approximately how many times have you used a TPU (tensor processing unit)?__    | {str(get_max(df_ds_tpu_use,'tpu','tpu_count')[0])}   | {get_max(df_ds_tpu_use,'tpu','tpu_count')[1]}  | {get_max(df_ds_tpu_use,'tpu','tpu_count')[2]}
__What data visualization libraries or tools do you use on a regular basis?__    | {str(get_max(df_ds_data_viz,'data_viz','data_viz_count')[0])}   | {get_max(df_ds_data_viz,'data_viz','data_viz_count')[1]}  | {get_max(df_ds_data_viz,'data_viz','data_viz_count')[2]}
__For how many years have you used machine learning methods?__    | {str(get_max(df_ds_ml_exp,'ml_exp','ml_exp_count')[0])}   | {get_max(df_ds_ml_exp,'ml_exp','ml_exp_count')[1]}  | {get_max(df_ds_ml_exp,'ml_exp','ml_exp_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=3, cols=2,
    row_heights=[0.33,0.33,0.33],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_ds_platform.platform,
    y=df_ds_platform['platform_count'],
    mode="lines+markers",
    name="computing platform used"),row=1, col=1)
fig.add_trace(go.Scatter(
    x=df_ds_sp_hdw.hardware,
    y=df_ds_sp_hdw['hardware_count'],
    mode="lines+markers",
    name="specialized hardware"),row=2, col=1)
fig.add_trace(go.Scatter(
    x=df_ds_tpu_use.tpu,
    y=df_ds_tpu_use['tpu_count'],
    mode="lines+markers",
    name="TPU used"),row=3, col=1)
fig.add_trace(go.Scatter(
    x=df_ds_data_viz.data_viz,
    y=df_ds_data_viz['data_viz_count'],
    name="data visualization libraries",
    mode="lines+markers",
    xaxis="x2"),row=1, col=2)
fig.add_trace(go.Scatter(
    x=df_ds_ml_exp.ml_exp,
    y=df_ds_ml_exp['ml_exp_count'],
    name="ML experience",
    mode="lines+markers",
    xaxis="x2"),row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : platforms and hardware'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 25000])
fig.show()

In [None]:
df_ds_ml_framework = count_multiple_choice(answers_2021, 'Q16', 'ml_framework','ml_framework_count')
df_ds_ml_algo = count_multiple_choice(answers_2021, 'Q17', 'ml_algorithm','ml_algorithm_count')
df_ds_ml_cv_method = count_multiple_choice(answers_2021, 'Q18', 'cv_method','cv_method_count')
df_ds_ml_nlp_method = count_multiple_choice(answers_2021, 'Q19', 'nlp_method','nlp_method_count')

In [None]:
md(f'''

# __machine learning, nlp and computer vision__ <a class="anchor" id="ml_nlp_cv"></a>

---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__Which of the following machine learning frameworks do you use on a regular basis?__      | {get_max(df_ds_ml_framework,'ml_framework','ml_framework_count')[0]}     | {get_max(df_ds_ml_framework,'ml_framework','ml_framework_count')[1]}    | {get_max(df_ds_ml_framework,'ml_framework','ml_framework_count')[2]}
__Which of the following ML algorithms do you use on a regular basis?__                    | {get_max(df_ds_ml_algo,'ml_algorithm','ml_algorithm_count')[0]}     | {get_max(df_ds_ml_algo,'ml_algorithm','ml_algorithm_count')[1]}    | {get_max(df_ds_ml_algo,'ml_algorithm','ml_algorithm_count')[2]}
__Which categories of computer vision methods do you use on a regular basis?__    | {str(get_max(df_ds_ml_cv_method,'cv_method','cv_method_count')[0])}   | {get_max(df_ds_ml_cv_method,'cv_method','cv_method_count')[1]}  | {get_max(df_ds_ml_cv_method,'cv_method','cv_method_count')[2]}
__Which of the following natural language processing (NLP) methods do you use on a regular basis?__    | {str(get_max(df_ds_ml_nlp_method,'nlp_method','nlp_method_count')[0])}   | {get_max(df_ds_ml_nlp_method,'nlp_method','nlp_method_count')[1]}  | {get_max(df_ds_ml_nlp_method,'nlp_method','nlp_method_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.35,0.5],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_ds_ml_framework.ml_framework,
    y=df_ds_ml_framework['ml_framework_count'],
    mode="lines+markers",
    name="machine learning framework"),row=1, col=1)
fig.add_trace(go.Scatter(
    x=df_ds_ml_algo.ml_algorithm,
    y=df_ds_ml_algo['ml_algorithm_count'],
    mode="lines+markers",
    name="machine learning algorithm"),row=2, col=1)
fig.add_trace(go.Scatter(
    x=df_ds_ml_cv_method.cv_method,
    y=df_ds_ml_cv_method['cv_method_count'],
    mode="lines+markers",
    name="computer vision method"),row=1, col=2)
fig.add_trace(go.Scatter(
    x=df_ds_ml_nlp_method.nlp_method,
    y=df_ds_ml_nlp_method['nlp_method_count'],
    name="NLP method",
    mode="lines+markers",
    xaxis="x2"),row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : machine learning, nlp and computer vision'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 20000])
fig.show()

In [None]:
df_in_industry = count_multiple_choice(answers_2021, 'Q20', 'industry', 'industry_count')
df_in_company_size = count_multiple_choice(answers_2021, 'Q21', 'company_size', 'company_size_count')
df_in_fte_ds = count_multiple_choice(answers_2021, 'Q22', 'fte_ds', 'fte_ds_count')
df_in_ml_inc = count_multiple_choice(answers_2021, 'Q23', 'ml_included', 'ml_included_count')

In [None]:
md(f'''

# __industry using data science__ <a class="anchor" id="ind_ds"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__In what industry is your current employer/contract (or your most recent employer if retired)?__      | {get_max(df_in_industry,'industry','industry_count')[0]}     | {get_max(df_in_industry,'industry','industry_count')[1]}    | {get_max(df_in_industry,'industry','industry_count')[2]}
__What is the size of the company where you are employed?__                    | {get_max(df_in_company_size,'company_size','company_size_count')[0]}     | {get_max(df_in_company_size,'company_size','company_size_count')[1]}    | {get_max(df_in_company_size,'company_size','company_size_count')[2]}
__Approximately how many individuals are responsible for data science workloads at your place of business?__    | {str(get_max(df_in_fte_ds,'fte_ds','fte_ds_count')[0])}   | {get_max(df_in_fte_ds,'fte_ds','fte_ds_count')[1]}  | {get_max(df_in_fte_ds,'fte_ds','fte_ds_count')[2]}
__Does your current employer incorporate machine learning methods into their business?__    | {str(get_max(df_in_ml_inc,'ml_included','ml_included_count')[0])}   | {get_max(df_in_ml_inc,'ml_included','ml_included_count')[1]}  | {get_max(df_in_ml_inc,'ml_included','ml_included_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_in_industry.industry,
    y=df_in_industry['industry_count'],
    mode="lines+markers",
    name="employer industry"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_in_company_size.company_size,
    y=df_in_company_size['company_size_count'],
    name="company size",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_in_fte_ds.fte_ds,
    y=df_in_fte_ds['fte_ds_count'],
    mode="lines+markers",
    name="data science individuals",
    xaxis="x3"),row=1, col=2)

fig.add_trace(go.Scatter(
    x=df_in_ml_inc.ml_included,
    y=df_in_ml_inc['ml_included_count'],
    mode="lines+markers",
    name="ml inclusion",
    xaxis="x4"),row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : industry using data science'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 6000])
fig.show()

In [None]:
df_in_activity = count_multiple_choice(answers_2021, 'Q24', 'activities', 'activities_count')
df_in_comp = count_multiple_choice(answers_2021, 'Q25', 'compensation', 'compensation_count')
df_in_spent = count_multiple_choice(answers_2021, 'Q26', 'expenditure_ml_cloud', 'spent_count')

In [None]:
md(f'''# industry activities, compensation and cloud expenditure <a class="anchor" id="activities"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---

\n

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__Select any activities that make up an important part of your role at work__      | {get_max(df_in_activity,'activities','activities_count')[0]}    | {get_max(df_in_activity,'activities','activities_count')[1]}    | {get_max(df_in_activity,'activities','activities_count')[2]}
__What is your current yearly compensation (approximate USD)?__                   | {get_max(df_in_comp,'compensation','compensation_count')[0]}     | {get_max(df_in_comp,'compensation','compensation_count')[1]}    | {get_max(df_in_comp,'compensation','compensation_count')[2]}

question                                                                         | most common response                                                 | number of responses                                             | % of total 
-------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------
__How much money have you spent on ML and/or CC services in the past 5 years?__    | {str(get_max(df_in_spent,'expenditure_ml_cloud','spent_count')[0])}   | {get_max(df_in_spent,'expenditure_ml_cloud','spent_count')[1]}  | {get_max(df_in_spent,'expenditure_ml_cloud','spent_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_in_activity.activities,
    y=df_in_activity['activities_count'],
    mode="lines+markers",
    name="industry activities"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_in_comp.compensation,
    y=df_in_comp['compensation_count'],
    name="compensation",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_in_spent.expenditure_ml_cloud,
    y=df_in_spent['spent_count'],
    mode="lines+markers",
    name="cloud expense",
    xaxis="x3"),row=1, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : industry activities, compensation & cloud expenditure'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 10000])
fig.show()

In [None]:
df_cc_pro_platform = count_multiple_choice(answers_2021, 'Q27_A', 'cc_pro_platform', 'cc_pro_platform_count')
df_cc_dev_exp = count_multiple_choice(answers_2021, 'Q28', 'cc_developer_experience', 'cc_developer_experience_count')
df_cc_pro_products = count_multiple_choice(answers_2021, 'Q29_A', 'cc_pro_products', 'cc_pro_products_count')
df_cc_pro_ds_products = count_multiple_choice(answers_2021, 'Q30_A', 'cc_pro_ds_products', 'cc_pro_ds_products_count')

In [None]:
md(f'''# cloud computing choices in industry<a class="anchor" id="ind_cc"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
_Which of the following cloud computing platforms do you use on a regular basis?_ | __{get_max(df_cc_pro_platform,'cc_pro_platform','cc_pro_platform_count')[0].strip()}__ | {get_max(df_cc_pro_platform,'cc_pro_platform','cc_pro_platform_count')[1]}  | {get_max(df_cc_pro_platform,'cc_pro_platform','cc_pro_platform_count')[2]}
_Of the cloud platforms that you are familiar with, which has the best developer experience (most enjoyable to use)?_ | __{get_max(df_cc_dev_exp,'cc_developer_experience','cc_developer_experience_count')[0].strip()}__ | {get_max(df_cc_dev_exp,'cc_developer_experience','cc_developer_experience_count')[1]}| {get_max(df_cc_dev_exp,'cc_developer_experience','cc_developer_experience_count')[2]}
_Do you use any of the following cloud computing products on a regular basis?_ | __{get_max(df_cc_pro_products,'cc_pro_products','cc_pro_products_count')[0].strip()}__ | {get_max(df_cc_pro_products,'cc_pro_products','cc_pro_products_count')[1]} | {get_max(df_cc_pro_products,'cc_pro_products','cc_pro_products_count')[2]}
_Do you use any of the following data storage products on a regular basis?_ | __{get_max(df_cc_pro_ds_products,'cc_pro_ds_products','cc_pro_ds_products_count')[0].strip()}__ | {get_max(df_cc_pro_ds_products,'cc_pro_ds_products','cc_pro_ds_products_count')[1]}| {get_max(df_cc_pro_ds_products,'cc_pro_ds_products','cc_pro_ds_products_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_cc_pro_platform.cc_pro_platform,
    y=df_cc_pro_platform['cc_pro_platform_count'],
    mode="lines+markers",
    name="regular cloud computing platform"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_cc_dev_exp.cc_developer_experience,
    y=df_cc_dev_exp['cc_developer_experience_count'],
    name="best developer experience",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_cc_pro_products.cc_pro_products,
    y=df_cc_pro_products['cc_pro_products_count'],
    mode="lines+markers",
    name="professional cloud product",
    xaxis="x3"),row=1, col=2)

fig.add_trace(go.Scatter(
    x=df_cc_pro_ds_products.cc_pro_ds_products,
    y=df_cc_pro_ds_products['cc_pro_ds_products_count'],
    mode="lines+markers",
    name="professional data storage product",
    xaxis="x4"),row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : cloud computing in industry'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 4000])
fig.show()

In [None]:
df_cc_np_platform = count_multiple_choice(answers_2021, 'Q27_B', 'cc_np_platform', 'cc_np_platform_count')
df_cc_np_products = count_multiple_choice(answers_2021, 'Q29_B', 'cc_np_products', 'cc_np_products_count')
# df_cc_np_ds_products = count_multiple_choice(answers_2021, 'Q30_B_', 'cc_np_ds_products', 'cc_np_ds_products_count')
# list(answers.columns)
# df_cc_np_ds_products

In [None]:
md(f'''# cloud computing choices among non professionals <a class="anchor" id="np_cc"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
__Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years?__ | {get_max(df_cc_np_platform,'cc_np_platform','cc_np_platform_count')[0].strip()} | {get_max(df_cc_np_platform,'cc_np_platform','cc_np_platform_count')[1]}  | {get_max(df_cc_np_platform,'cc_np_platform','cc_np_platform_count')[2]}
__In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products?__ | {get_max(df_cc_np_products,'cc_np_products','cc_np_products_count')[0].strip()} | {get_max(df_cc_np_products,'cc_np_products','cc_np_products_count')[1]}| {get_max(df_cc_np_products,'cc_np_products','cc_np_products_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=1,
    row_heights=[0.5,0.5],
#     column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_cc_np_platform.cc_np_platform,
    y=df_cc_np_platform['cc_np_platform_count'],
    mode="lines+markers",
    name="preferred cloud computing platform"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_cc_np_products.cc_np_products,
    y=df_cc_np_products['cc_np_products_count'],
    name="preferred cloud computing tool",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

# fig.add_trace(go.Scatter(
#     x=df_cc_np_ds_products.cc_np_ds_products,
#     y=df_cc_np_ds_products['cc_np_ds_products_count'],
#     mode="lines+markers",
#     name="preferred data storage product",
#     xaxis="x3"),row=1, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : cloud computing among non professionals'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 12000])
fig.show()

In [None]:
df_mml_pro_products = count_multiple_choice(answers_2021, 'Q31_A', 'mml_pro_products', 'mml_pro_products_count')
df_mml_pro_bd_products = count_multiple_choice(answers_2021, 'Q32_A', 'mml_pro_bd_products', 'mml_pro_bd_products_count')
df_mml_bd_products = count_multiple_choice(answers_2021, 'Q33', 'mml_bd_products', 'mml_bd_products_count')
df_mml_np_products = count_multiple_choice(answers_2021, 'Q31_B', 'mml_np_products', 'mml_np_products_count')
df_mml_np_bd_products = count_multiple_choice(answers_2021, 'Q32_B', 'mml_np_bd_products', 'mml_np_bd_products_count')

In [None]:
md(f'''# big data and managed machine learning <a class="anchor" id="mml"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
__Do you use any of the following managed machine learning products on a regular basis?__ | {get_max(df_mml_pro_products,'mml_pro_products','mml_pro_products_count')[0].strip()} | {get_max(df_mml_pro_products,'mml_pro_products','mml_pro_products_count')[1]}  | {get_max(df_mml_pro_products,'mml_pro_products','mml_pro_products_count')[2]}
__Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis?__ | {get_max(df_mml_pro_bd_products,'mml_pro_bd_products','mml_pro_bd_products_count')[0].strip()} | {get_max(df_mml_pro_bd_products,'mml_pro_bd_products','mml_pro_bd_products_count')[1]}| {get_max(df_mml_pro_bd_products,'mml_pro_bd_products','mml_pro_bd_products_count')[2]}
__Which of the following big data products (relational database, data warehouse, data lake, or similar) do you use most often?__ | {get_max(df_mml_bd_products,'mml_bd_products','mml_bd_products_count')[0].strip()} | {get_max(df_mml_bd_products,'mml_bd_products','mml_bd_products_count')[1]} | {get_max(df_mml_bd_products,'mml_bd_products','mml_bd_products_count')[2]}
__In the next 2 years, do you hope to become more familiar with any of these managed machine learning products?__ | {get_max(df_mml_np_products,'mml_np_products','mml_np_products_count')[0].strip()} | {get_max(df_mml_np_products,'mml_np_products','mml_np_products_count')[1]} | {get_max(df_mml_np_products,'mml_np_products','mml_np_products_count')[2]}
__Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years?__ | {get_max(df_mml_np_bd_products,'mml_np_bd_products','mml_np_bd_products_count')[0].strip()} | {get_max(df_mml_np_bd_products,'mml_np_bd_products','mml_np_bd_products_count')[1]} | {get_max(df_mml_np_bd_products,'mml_np_bd_products','mml_np_bd_products_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=3, cols=2,
    row_heights=[0.33,0.33,0.33],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_mml_pro_products.mml_pro_products,
    y=df_mml_pro_products['mml_pro_products_count'],
    mode="lines+markers",
    name="Professional: regular managed ML product"),row=1, col=1)

# row/col decide which subplot axes each trace uses, so no explicit xaxis is needed
fig.add_trace(go.Scatter(
    x=df_mml_pro_bd_products.mml_pro_bd_products,
    y=df_mml_pro_bd_products['mml_pro_bd_products_count'],
    name="Professional: regular big data product",
    mode="lines+markers"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_mml_bd_products.mml_bd_products,
    y=df_mml_bd_products['mml_bd_products_count'],
    mode="lines+markers",
    name="regular big data product"),row=3, col=1)

fig.add_trace(go.Scatter(
    x=df_mml_np_products.mml_np_products,
    y=df_mml_np_products['mml_np_products_count'],
    mode="lines+markers",
    name="Non professional: preferred managed ML product"),row=1, col=2)

fig.add_trace(go.Scatter(
    x=df_mml_np_bd_products.mml_np_bd_products,
    y=df_mml_np_bd_products['mml_np_bd_products_count'],
    mode="lines+markers",
    name="Non professional: preferred big data product"),row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : big data and managed machine learning'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 8000])
fig.show()

In [None]:
df_bi_pro_products = count_multiple_choice(answers_2021, 'Q34_A', 'bi_pro_products', 'bi_pro_products_count')
df_bi_np_products = count_multiple_choice(answers_2021, 'Q34_B', 'bi_np_products', 'bi_np_products_count')
df_bi_tools = count_multiple_choice(answers_2021, 'Q35', 'bi_tools_used', 'bi_tools_used_count')

In [None]:
md(f'''# business intelligence <a class="anchor" id="bi"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
__Which of the following business intelligence tools do you use on a regular basis?__ | {get_max(df_bi_pro_products,'bi_pro_products','bi_pro_products_count')[0].strip()} | {get_max(df_bi_pro_products,'bi_pro_products','bi_pro_products_count')[1]}  | {get_max(df_bi_pro_products,'bi_pro_products','bi_pro_products_count')[2]}
__Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years?__ | {get_max(df_bi_np_products,'bi_np_products','bi_np_products_count')[0].strip()} | {get_max(df_bi_np_products,'bi_np_products','bi_np_products_count')[1]}| {get_max(df_bi_np_products,'bi_np_products','bi_np_products_count')[2]}
__Which of the following business intelligence tools do you use most often?__ | {get_max(df_bi_tools,'bi_tools_used','bi_tools_used_count')[0]} | {get_max(df_bi_tools,'bi_tools_used','bi_tools_used_count')[1]} | {get_max(df_bi_tools,'bi_tools_used','bi_tools_used_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    # titles are assigned in row-major order: (1,1), (1,2), (2,1)
    subplot_titles=["regular use BI tool",
                    "BI tool used most often",
                    "preferred BI tool (non professionals)"
                   ],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_bi_pro_products.bi_pro_products,
    y=df_bi_pro_products['bi_pro_products_count'],
    mode="lines+markers",
    name="regular use BI tool"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_bi_np_products.bi_np_products,
    y=df_bi_np_products['bi_np_products_count'],
    name="preferred BI tool (Non professionals)",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_bi_tools.bi_tools_used,
    y=df_bi_tools['bi_tools_used_count'],
    mode="lines+markers",
    name="BI tool used most often",
    xaxis="x3"),row=1, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    showlegend =False,
    title = 'count of responses : business intelligence'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 6000])
fig.show()

In [None]:
df_aml_pro_tool_o = count_multiple_choice(answers_2021,'Q36_A','aml_pro_tool_o','pto_count')
df_aml_pro_tool_r = count_multiple_choice(answers_2021,'Q37_A','aml_pro_tool_r','ptr_count')
df_aml_pro_tool_m = count_multiple_choice(answers_2021,'Q38_A','aml_pro_tool_m','ptm_count')

In [None]:
md(f'''# auto-ml choices in industry<a class="anchor" id="ind_aml"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---

\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
__Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?__ | {get_max(df_aml_pro_tool_o,'aml_pro_tool_o','pto_count')[0].strip()} | {get_max(df_aml_pro_tool_o,'aml_pro_tool_o','pto_count')[1]}  | {get_max(df_aml_pro_tool_o,'aml_pro_tool_o','pto_count')[2]}
__Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?__                                           | {get_max(df_aml_pro_tool_r,'aml_pro_tool_r','ptr_count')[0].strip()} | {get_max(df_aml_pro_tool_r,'aml_pro_tool_r','ptr_count')[1]}| {get_max(df_aml_pro_tool_r,'aml_pro_tool_r','ptr_count')[2]}
__Do you use any tools to help manage machine learning experiments?__ | {get_max(df_aml_pro_tool_m,'aml_pro_tool_m','ptm_count')[0]} | {get_max(df_aml_pro_tool_m,'aml_pro_tool_m','ptm_count')[1]} | {get_max(df_aml_pro_tool_m,'aml_pro_tool_m','ptm_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    column_widths=[0.5,0.5],
    # titles are assigned in row-major order: (1,1), (1,2), (2,1)
    subplot_titles=["do you use auto-ml?",
                    "tools to manage ml",
                    "auto-ml tools used"
                   ],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_aml_pro_tool_o.aml_pro_tool_o,
    y=df_aml_pro_tool_o['pto_count'],
    mode="lines+markers",
    name="Do you use Auto ML?"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_aml_pro_tool_r.aml_pro_tool_r,
    y=df_aml_pro_tool_r['ptr_count'],
    name="Auto ML tools used",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_aml_pro_tool_m.aml_pro_tool_m,
    y=df_aml_pro_tool_m['ptm_count'],
    showlegend =False,
    mode="lines+markers",
    name="tools to manage ML",
    xaxis="x3"),row=1, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
    showlegend =False,
#     margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : auto-ml in industry'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 7000])
fig.show()

In [None]:
df_aml_np_tool_o = count_multiple_choice(answers_2021,'Q36_B','aml_np_tool_o','nto_count')
df_aml_np_tool_r = count_multiple_choice(answers_2021,'Q37_B','aml_np_tool_r','ntr_count')
df_aml_np_tool_m = count_multiple_choice(answers_2021,'Q38_B','aml_np_tool_m','ntm_count')

In [None]:
md(f'''# auto-ml choices among non professionals<a class="anchor" id="np_aml"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
__categories of automated machine learning tools__ | {get_max(df_aml_np_tool_o,'aml_np_tool_o','nto_count')[0].strip()} | {get_max(df_aml_np_tool_o,'aml_np_tool_o','nto_count')[1]}  | {get_max(df_aml_np_tool_o,'aml_np_tool_o','nto_count')[2]}
__specific automated machine learning tools__                                            | {get_max(df_aml_np_tool_r,'aml_np_tool_r','ntr_count')[0].strip()} | {get_max(df_aml_np_tool_r,'aml_np_tool_r','ntr_count')[1]}| {get_max(df_aml_np_tool_r,'aml_np_tool_r','ntr_count')[2]}
__tools for managing ML experiments__ | {get_max(df_aml_np_tool_m,'aml_np_tool_m','ntm_count')[0]} | {get_max(df_aml_np_tool_m,'aml_np_tool_m','ntm_count')[1]} | {get_max(df_aml_np_tool_m,'aml_np_tool_m','ntm_count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    # titles are assigned in row-major order: (1,1), (1,2), (2,1)
    subplot_titles=["preferred auto-ml tool categories",
                    "preferred tool to manage ml",
                    "preferred specific auto-ml tool"],
    column_widths=[0.5,0.5],
    shared_yaxes = False,
    vertical_spacing=0.1)

fig.add_trace(go.Scatter(
    x=df_aml_np_tool_o.aml_np_tool_o,
    y=df_aml_np_tool_o['nto_count'],
    showlegend=False,
    mode="lines+markers",
    name="preferred categories Auto ML tool"),row=1, col=1)

fig.add_trace(go.Scatter(
    x=df_aml_np_tool_r.aml_np_tool_r,
    y=df_aml_np_tool_r['ntr_count'],
    showlegend=False,
    name="preferred specific Auto ML tool",
    mode="lines+markers",
    xaxis="x2"),row=2, col=1)

fig.add_trace(go.Scatter(
    x=df_aml_np_tool_m.aml_np_tool_m,
    y=df_aml_np_tool_m['ntm_count'],
    showlegend=False,
    text = df_aml_np_tool_m.aml_np_tool_m,
    mode="lines+markers",
    name="preferred tool to manage ML",
    xaxis="x3"),row=1, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    template="ggplot2",
#     margin=dict(r=5, t=45, b=10, l=5),
    title = 'count of responses : auto-ml among non professionals'
)
fig.update_xaxes(showticklabels=False) # hide all the xticks
fig.update_yaxes(range=[0, 6000])
fig.show()

In [None]:
df_sh_public = count_multiple_choice(answers_2021,'Q39','sh_public','count')
df_sh_course_platform = count_multiple_choice(answers_2021,'Q40','sh_course_platform','count')
df_sh_da_tool = count_multiple_choice(answers_2021,'Q41','sh_da_tool','count')
df_sh_media = count_multiple_choice(answers_2021,'Q42','sh_media','count')

In [None]:
md(f'''# sharing work and data science knowledge portals<a class="anchor" id="ds_kp"></a>
---
[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
\n

question                                                                       | most common response                                                     | number of responses                                           | % of total 
------------------------------------------------------------------------------ | ---------------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------
__Publicly sharing or deploying data analysis or machine learning applications__ | {get_max(df_sh_public,'sh_public','count')[0].strip()} | {get_max(df_sh_public,'sh_public','count')[1]}  | {get_max(df_sh_public,'sh_public','count')[2]}
__Platform for data science courses__                                            | {get_max(df_sh_course_platform,'sh_course_platform','count')[0]} | {get_max(df_sh_course_platform,'sh_course_platform','count')[1]}| {get_max(df_sh_course_platform,'sh_course_platform','count')[2]}
__Primary tool used at work or school to analyze data__ | {get_max(df_sh_da_tool,'sh_da_tool','count')[0]} | {get_max(df_sh_da_tool,'sh_da_tool','count')[1]} | {get_max(df_sh_da_tool,'sh_da_tool','count')[2]}
__Favorite media source that reports on data science topics__ | {get_max(df_sh_media,'sh_media','count')[0]} | {get_max(df_sh_media,'sh_media','count')[1]} | {get_max(df_sh_media,'sh_media','count')[2]}''')

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    row_heights=[0.5,0.5],
    shared_xaxes = False,
    # titles are assigned in row-major order: (1,1), (1,2), (2,1), (2,2)
    subplot_titles=("sharing work publicly", "data analysis tool", "course platform", "media sources"),
    specs=[[{"type": "xy"}, {"type": "xy"}],
           [{"type": "xy"}, {"type": "xy"}]],
    vertical_spacing=0.1)

fig.add_trace(go.Bar(
    y=df_sh_public.sh_public,
    x=df_sh_public['count'],
    name="sharing work publicly",
    showlegend=False,
    orientation='h',
    text=df_sh_public.sh_public,
    textposition='auto'
),
              row=1, col=1)


fig.add_trace(go.Bar(
    y=df_sh_course_platform.sh_course_platform,
    x=df_sh_course_platform['count'],
    name="course platform",
    showlegend=False,
    orientation='h',
    text=df_sh_course_platform.sh_course_platform,
    textposition='auto'
),
              row=2, col=1)

fig.add_trace(go.Bar(
    y=df_sh_da_tool.sh_da_tool,
    x=df_sh_da_tool['count'],
    name="data analysis tool",
    showlegend=False,
    orientation='h',
    text=df_sh_da_tool.sh_da_tool,
    textposition='auto'
),
              row=1, col=2)

fig.add_trace(go.Bar(
    y=df_sh_media.sh_media,
    x=df_sh_media['count'],
    name="media sources",
    showlegend=False,
    hoverinfo='skip',
    orientation='h',
    text=df_sh_media.sh_media,
    textfont_size = 8,
    textposition='auto'
),
              row=2, col=2)

# Set theme, margin, and annotation in layout
fig.update_layout(
    height=800,
    uniformtext_minsize=6, uniformtext_mode='hide',
    template="ggplot2",
    margin=dict(r=5, t=25, b=40, l=5),
    barmode='stack'
)
fig.update_yaxes(showticklabels=False) # hide the y-axis tick labels (bar text carries the category names)
fig.layout.annotations[0].update(x=0.09, font= {'size': 12})
fig.layout.annotations[1].update(x=0.6, font= {'size': 12})
fig.layout.annotations[2].update(x=0.06, font= {'size': 12})
fig.layout.annotations[3].update(x=0.6, font= {'size': 12})
fig.update_traces()
fig.show()

# <span id="fns"> Footnotes</span>

[all questions](#TOC) || [demographics](#DEM) || [data science](#DS) || [machine learning](#ML)

---
<span id="fn1"><sup>1</sup></span> __Questions 18 and 19 (which specific ML methods)__ were each asked only of respondents who selected the relevant answer choices for Question 17 (which categories of algorithms).
    
<span id="fn2"><sup>2</sup></span> __Non-professionals__ (_defined as students, unemployed respondents, and respondents who have never spent any money in the cloud_) received questions with alternate phrasing: they were asked which tools they hope to become familiar with in the next 2 years, rather than which tools they use on a regular basis.

<span id="fn3"><sup>3</sup></span> __Question 28 (which specific product)__ was asked only of respondents who selected more than one choice for Question 27-A (which of the following products). __Questions 29-A and 30-A (which specific AWS/Azure/GCP products)__ were each asked only of respondents who selected the relevant answer choices for Question 27-A (which of the following companies).

<span id="fn4"><sup>4</sup></span> __Question 33 (which specific product)__ was asked only of respondents who selected more than one choice for Question 32-A (which of the following products).

<span id="fn5"><sup>5</sup></span> __Question 35 (which specific product)__ was asked only of respondents who selected more than one choice for Question 34-A (which of the following products).

<span id="fn6"><sup>6</sup></span> __Question 37-A (which specific product)__ was asked only of respondents who answered affirmatively to Question 36-A (which of the following categories of products).
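
Because the follow-up questions in these footnotes were shown only to a subset of respondents, a percentage for such a question is arguably more meaningful against the respondents who were eligible to see it than against all participants. The sketch below illustrates that idea on made-up data; the helper `eligible_for_follow_up`, the toy frame, and the one-column-per-option layout (`Q34_A_Part_1`, `Q34_A_Part_2`) are assumptions for illustration, not part of this workbook's helpers.

```python
import pandas as pd

# Toy data only: the values are invented, and the column names assume a
# multi-select question stored as one column per answer option.
toy = pd.DataFrame({
    "Q34_A_Part_1": ["Tableau", None, "Tableau", None],
    "Q34_A_Part_2": ["Power BI", "Power BI", None, None],
    "Q35": ["Tableau", None, None, None],
})

def eligible_for_follow_up(df, prefix, min_choices=2):
    """Boolean mask of rows with at least `min_choices` answers under `prefix`."""
    # Note: a plain prefix match would also pick up columns such as
    # "<prefix>_OTHER" if the real data has them.
    parts = [c for c in df.columns if c.startswith(prefix)]
    return df[parts].notna().sum(axis=1) >= min_choices

# Footnote 5: Q35 was shown only when more than one Q34-A option was selected.
mask = eligible_for_follow_up(toy, "Q34_A")
eligible = int(mask.sum())
answered = int(toy.loc[mask, "Q35"].notna().sum())
share = 100 * answered / eligible if eligible else 0.0
print(f"{answered} of {eligible} eligible respondents answered Q35 ({share:.1f}%)")
```

The same mask could be reused for the other "more than one choice" conditions (Questions 28 and 33) by changing the prefix and the follow-up column, provided the real column names follow this layout.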
