#  **How Far Is It to Get Your Feet Into the Door of DS & ML World**

> Hi everyone. This is my very first individual public Kaggle project. I graduated with a master's degree in Economics and learned some basic ideas about data science and machine learning from the coursework. A professional career in DS & ML involves endless learning, and as a recent graduate I need a lot of practice. Coding skill is not the only criterion that determines whether a person can do a good job. Soft skills such as understanding the industry, interpreting data for an audience, and providing users with valuable findings are just as important. While soft skills are not easy to measure, a good understanding of frequently used tools can give a clearer idea of which skill to improve next, whether you are a novice or simply want to get better. This project starts with some demographic data visualization to understand the current workforce in this field.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from ipywidgets import interact
from wordcloud import WordCloud, STOPWORDS
import io
import re

In [None]:
all_response = pd.read_csv("../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv")
question_list=all_response.iloc[0,:]

all_response = all_response.drop(all_response.index[0]).reset_index()

# Create a column to identify if the participant has ML experience
all_response['is_ML'] = all_response.Q15.apply(lambda x: 'ML' if x in['1-2 years', '3-4 years','Under 1 year', 
                                               '2-3 years', '4-5 years', '5-10 years', '20 or more years', '10-20 years'] else 'non ML')

# Get subgroup of people who have ML experience
ml_response = all_response.loc[~all_response['Q15'].isin(['I do not use machine learning methods'])].dropna(subset=['Q15'])
ml_response.shape # (17961, 356)

# Get subgroup of people who have NO ML experience
no_ml_response = all_response.loc[all_response['Q15'].isin(['I do not use machine learning methods'])].dropna(subset=['Q15'])
no_ml_response.shape # (2075, 357)

    The dataset provided for this project has 20,036 usable responses. Questions range from education level to experience with tools and environments. 
	First of all, let’s take a look at the people who participated in this survey.
	This histogram is sorted in descending order to make it easy to compare how many people hold each job title. Student is the largest group. Since we don’t have sufficient information about the Other and Currently not employed groups, the following sections do not discuss these two groups in detail but still include them in the charts.
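The descending sort described above can also be obtained directly from `value_counts`, which orders counts from largest to smallest by default. This is a minimal sketch on made-up data; the `toy` frame and its values are illustrative assumptions, not survey results.

```python
import pandas as pd

# Illustrative stand-in for the survey's job title column (Q5); values are made up.
toy = pd.DataFrame({"Q5": ["Student", "Data Scientist", "Student",
                           "Data Analyst", "Student", "Data Scientist"]})

# value_counts() counts each unique value and sorts by count, descending by default.
title_counts = toy["Q5"].value_counts()
print(title_counts)
```

The resulting Series can be passed to a bar plot as is, with its index as the x axis.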


In [None]:
# Sorted histogram
def get_hist(x_axis, x_label, src):
    # Get unique value and sort by count
    title_group_size = src.groupby(x_axis).size().to_frame().reset_index(level = [x_axis])
    title_group_size.columns = [x_label, "Count"]
    # Sort
    title_group_size_sort = title_group_size.sort_values(["Count"], ascending = False).reset_index(drop=True)
    plt.figure(figsize=(11,8))
    ax = sns.barplot(x = x_label, y = "Count", errwidth = 10, palette="Blues_d", data = title_group_size_sort)
    ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
    ax.set_xlabel(x_label, fontsize = 15)
    ax.set_ylabel("Count", fontsize = 15)
    ax.set_title('Histogram of ' + x_label, fontsize = 30)

    #Customize x axis and show the total in each bar
    ax.set_xticklabels(title_group_size_sort.iloc[:,0])
    for item in ax.get_xticklabels():
        item.set_rotation(90)
    for i, v in enumerate(title_group_size_sort["Count"].items()):
        ax.text(i, v[1], "{:,}".format(v[1]), color = 'navy', va = 'bottom', ha = 'center')

    plt.tight_layout()
    plt.show()

In [None]:
get_hist('Q5', 'Job Title', all_response)

    In addition, I would like to see how participants with ML experience differ from participants with no ML experience. I split the dataset into two parts based on their responses to the question about years of ML experience.
	From the histogram below, we can tell that Data Scientist, Research Scientist, and Machine Learning Engineer have a higher ML to non ML ratio than the other job titles.
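The ML to non ML ratio mentioned above can be computed explicitly rather than read off the chart. This is a minimal sketch, assuming a frame with the `Q5` and `is_ML` columns built earlier; the `toy` data is made up for illustration.

```python
import pandas as pd

# Made-up rows that mimic the Q5 (job title) and is_ML columns.
toy = pd.DataFrame({
    "Q5": ["Data Scientist"] * 4 + ["Business Analyst"] * 4,
    "is_ML": ["ML", "ML", "ML", "non ML",
              "ML", "non ML", "non ML", "non ML"],
})

# Count respondents per job title and ML group, then divide the two columns.
counts = pd.crosstab(toy["Q5"], toy["is_ML"])
ratio = (counts["ML"] / counts["non ML"]).sort_values(ascending=False)
print(ratio)
```

A job title with no non ML respondents would produce an infinite ratio, so small groups may need separate handling.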


In [None]:

title_group_size = all_response.groupby('Q5').size().to_frame()

sns.catplot(x = "Q5", kind = "count", hue = "is_ML", aspect = 2, data = all_response, palette="ch:.25", legend = False,
           order = title_group_size.sort_values(title_group_size.columns[0], ascending = False).index)

plt.xticks(rotation=70, fontsize=14)
plt.yticks(fontsize=14)
plt.legend(loc='upper right')
plt.title('Histogram of Positions Regarding ML Experience', fontsize = 18)
plt.show()

    The three heatmaps below show column percentages, so they reflect the distribution within each category regardless of the category size. However, the group of participants with no ML experience is smaller and might not provide accurate demographic and professional information. The following charts still show both groups to give an overall view.
	Master’s degree is the most common education level for most job titles. In the group of participants with ML experience, the percentages are even more concentrated in the Master’s degree or Doctoral degree, whichever is the most common level for that job title in the no ML experience group. 
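The column percentages described above can be sketched with `pd.crosstab` and `normalize="columns"`, which divides each cell by its column total. This is an illustrative alternative to the `pivot` helper used in this notebook, shown on made-up data.

```python
import pandas as pd

# Made-up education (Q4) and job title (Q5) answers.
toy = pd.DataFrame({
    "Q4": ["Master's degree", "Master's degree", "Doctoral degree",
           "Bachelor's degree", "Master's degree", "Bachelor's degree"],
    "Q5": ["Data Scientist", "Data Scientist", "Data Scientist",
           "Data Scientist", "Student", "Student"],
})

# normalize="columns" makes every job title column sum to 1,
# so large and small job titles can be compared on the same scale.
col_pct = pd.crosstab(toy["Q4"], toy["Q5"], normalize="columns")
print(col_pct)
```

Because every column sums to 1, a dark cell means a large share within that job title, not a large absolute count.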


In [None]:
# Pivot two categorical columns and plot a heatmap
def pivot_heatmap(pvt_index, pvt_col, pvt_src, pvt_filter_col = None, pvt_filter_val= None, index_cate = None):
    if pvt_filter_col:
        pvt_df = pvt_src.loc[pvt_src[pvt_filter_col].isin([pvt_filter_val])]
        pvt_df = pd.DataFrame(pvt_df[[pvt_index, pvt_col, pvt_filter_col]])
        df_heatmap = pvt_df.pivot_table(index= pvt_index, columns = pvt_col, aggfunc='count') 
    else:
        pvt_df = pd.DataFrame(pvt_src[[pvt_index, pvt_col]])
        pvt_df['value'] = 1
        df_heatmap = pvt_df.pivot_table(index= pvt_index, columns = pvt_col, aggfunc='sum') 

    if index_cate:
        df_heatmap.index = pd.CategoricalIndex(df_heatmap.index, categories = index_cate)
        df_heatmap.sort_index(level=0, inplace=True)
    df_heatmap=df_heatmap.apply(lambda x:x/x.sum())

    # Create a heatmap
    sns.heatmap(df_heatmap,annot=True, fmt = '.1%', cmap='YlGnBu', linewidths=.5, xticklabels = df_heatmap.columns.get_level_values(1))
    plt.show()

In [None]:
# Pivot two categorical columns
def pivot(pvt_index, pvt_col, pvt_src, pvt_filter_col = None, pvt_filter_val= None, index_cate = None):
    if pvt_filter_col:
        pvt_df = pvt_src.loc[pvt_src[pvt_filter_col].isin([pvt_filter_val])]
        pvt_df = pd.DataFrame(pvt_df[[pvt_index, pvt_col, pvt_filter_col]])
        df_heatmap = pvt_df.pivot_table(index= pvt_index, columns = pvt_col, aggfunc='count') 
    else:
        pvt_df = pd.DataFrame(pvt_src[[pvt_index, pvt_col]])
        pvt_df['value'] = 1
        df_heatmap = pvt_df.pivot_table(index= pvt_index, columns = pvt_col, aggfunc='sum') 

    if index_cate:
        df_heatmap.index = pd.CategoricalIndex(df_heatmap.index, categories = index_cate)
        df_heatmap.sort_index(level=0, inplace=True)
    df_heatmap=df_heatmap.apply(lambda x:x/x.sum())

    return df_heatmap

In [None]:
# Heatmap Education level vs. Job Title
cate = ["Doctoral degree", 
        'Master’s degree',
        "Professional degree",
        'Bachelor’s degree',
        'Some college/university study without earning a bachelor’s degree',
        "No formal education past high school",
        "I prefer not to answer"]

plt.figure(figsize=(15,10))

plt.subplot(2,1,1)
plt.title('Distribution of Education in Each Job Title (ML experience vs no ML experience)', fontsize = 16)
#plt.set_xlabel('Job Title')
a = pivot(pvt_index = 'Q4', pvt_col = 'Q5', pvt_src = ml_response, index_cate = cate)
sns.heatmap(a,annot=True, fmt = '.1%', cmap='YlGnBu', linewidths=.5, xticklabels = a.columns.get_level_values(1))

plt.subplot(2,1,2)
#plt.figure(figsize=(12,5))
b=pivot(pvt_index = 'Q4', pvt_col = 'Q5', pvt_src = no_ml_response, index_cate = cate)
sns.heatmap(b,annot=True, fmt = '.1%', cmap='YlGnBu', linewidths=.5, xticklabels = b.columns.get_level_values(1))

plt.tight_layout()

    Participants who have ML experience tend to have more years of coding experience on average. The darker cells are spread fairly evenly in the upper plot, while in the lower plot most of the darker cells settle at the bottom, where the years of experience are fewer.

In [None]:
# Heatmap Years of Experience vs. Job Title
cate = ['20+ years', '10-20 years', '5-10 years', '3-5 years', '1-2 years', '< 1 years', 'I have never written code']

plt.figure(figsize=(12,10))

plt.subplot(2,1,1)
plt.title('Distribution of Years of Experience in Each Job Title (ML experience vs no ML experience)', fontsize = 16)
a = pivot(pvt_index = 'Q6', pvt_col = 'Q5', pvt_src = ml_response, index_cate = cate)
sns.heatmap(a,annot=True, fmt = '.1%', cmap='Purples', linewidths=.5, xticklabels = a.columns.get_level_values(1))

plt.subplot(2,1,2)
b=pivot(pvt_index = 'Q6', pvt_col = 'Q5', pvt_src = no_ml_response, index_cate = cate)
sns.heatmap(b,annot=True, fmt = '.1%', cmap='Purples', linewidths=.5, xticklabels = b.columns.get_level_values(1))

plt.tight_layout()

    The age structure in the no ML experience group is clustered in a couple of age groups, while it spreads across more age groups in the ML experience group.

In [None]:
# Heatmap Age vs. Job Title
#cate = ['20+ years', '10-20 years', '5-10 years', '3-5 years', '1-2 years', '< 1 years', 'I have never written code']

plt.figure(figsize=(12,12))

plt.subplot(2,1,1)
plt.title('Distribution of Age in Each Job Title (ML experience vs no ML experience)', fontsize = 16)
a = pivot(pvt_index = 'Q1', pvt_col = 'Q5', pvt_src = ml_response)
sns.heatmap(a,annot=True, fmt = '.1%', cmap='bone_r', linewidths=.5, xticklabels = a.columns.get_level_values(1))

plt.subplot(2,1,2)
b=pivot(pvt_index = 'Q1', pvt_col = 'Q5', pvt_src = no_ml_response)
sns.heatmap(b,annot=True, fmt = '.1%', cmap='bone_r', linewidths=.5, xticklabels = b.columns.get_level_values(1))

plt.tight_layout()

    We can expect years of experience to be correlated with age. The following heatmaps show the relationship between the two factors for each job title in the ML group. Data Engineer, Software Engineer, Data Scientist, Research Scientist, and Machine Learning Engineer show a clear association between years of experience and age, indicated by the blue diagonal bands.
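The association between age and years of experience can also be summarized as a single number. This is a minimal sketch, assuming the answer categories are treated as ordered and a Spearman rank correlation (Pearson correlation on ranks) is an acceptable summary; the `toy` data and the shortened category lists are illustrative only.

```python
import pandas as pd

# Ordered answer categories (a shortened, illustrative subset).
age_order = ["18-21", "22-24", "25-29", "30-34"]
exp_order = ["< 1 years", "1-2 years", "3-5 years", "5-10 years"]

# Made-up age (Q1) and coding experience (Q6) answers.
toy = pd.DataFrame({
    "Q1": ["18-21", "22-24", "22-24", "25-29", "30-34", "30-34"],
    "Q6": ["< 1 years", "< 1 years", "1-2 years",
           "3-5 years", "3-5 years", "5-10 years"],
})

# Convert each answer to its position in the ordered category list.
age_code = pd.Series(pd.Categorical(toy["Q1"], categories=age_order, ordered=True).codes)
exp_code = pd.Series(pd.Categorical(toy["Q6"], categories=exp_order, ordered=True).codes)

# Spearman correlation is the Pearson correlation of the ranks.
rho = age_code.rank().corr(exp_code.rank())
print(round(rho, 3))
```

A value close to 1 corresponds to the diagonal band seen in the heatmaps; a value near 0 would correspond to no visible band.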

In [None]:
# One heatmap per job title, in the same order as before.
# Note: `cate` is the years-of-experience category list defined in the earlier cell.
job_titles = ['Student', 'Data Scientist', 'Software Engineer', 'Other',
              'Currently not employed', 'Data Analyst', 'Research Scientist',
              'Machine Learning Engineer', 'Business Analyst', 'Product/Project Manager',
              'Data Engineer', 'Statistician', 'DBA/Database Engineer']

plt.figure(figsize=(20,18))

for position, job_title in enumerate(job_titles, start=1):
    plt.subplot(7,2,position)
    plt.title(job_title, fontsize = 16)
    pt = pivot(pvt_index = 'Q6', pvt_col = 'Q1', pvt_src = ml_response, pvt_filter_col = 'Q5', pvt_filter_val= job_title, index_cate = cate)
    sns.heatmap(pt,annot=True, fmt = '.1%', cmap='YlGnBu', linewidths=.5, xticklabels = pt.columns.get_level_values(1))

plt.tight_layout()

In [None]:
# Use a regex to find the columns that contain the responses to a question
# and rename each response column with the response text

# single_response can be a list of header names
# q_num is the question number included in the header
def get_question_columns(single_response, q_num, src):
    #selc_choice_list = all_response[single_response]
    #q_num = re.search('^\S*', question).group().replace('-','_')
    selc_choice_list = pd.DataFrame(src[single_response])
    col_num = 0
    for i in src.columns:
        if re.search(q_num,i):
            
            # The response is at the end of the question row (which was the row below headers) and after "Selected Choice - "
            if re.search('Selected Choice - ',question_list[i]):
                hdr_name= question_list[i][re.search('Selected Choice - ',question_list[i]).end():len(question_list[i])]
            else:
                hdr_name = q_num

            selc_choice_list = selc_choice_list.merge(src.iloc[:, col_num], left_index=True, right_index=True)
            # Rename new column
            selc_choice_list = selc_choice_list.rename(columns={selc_choice_list.columns[-1]:hdr_name})
        col_num += 1
        
    return selc_choice_list
    


In [None]:
# Get unique value on column 1 and sum the count for the rest of the columns
def long_short(df):
    # Get unique value from column 1 and drop nan.
    unique_list = pd.DataFrame(df.iloc[:,0].unique()).dropna().reset_index(drop=True)
    unique_df = []
    #Iterate each unique value
    for i in range(len(unique_list)):
        this_group = df.groupby(df.columns[0]).get_group(unique_list.iloc[i,0])
        unique_row = [unique_list[0][i]]
        # Append count of each following column
        for col_name in this_group.columns[1:]:
            unique_row.append(this_group[col_name].count())

        unique_df.append(unique_row)
    # Convert to a DataFrame
    unique_df = pd.DataFrame(unique_df, columns = df.columns)
    unique_df = unique_df.style.background_gradient(cmap='Blues')
    return unique_df
        
#long_short(Q7_df)

In [None]:
# Divide into two groups: people who have DS degrees and people who do not
with_degree = ml_response.loc[ml_response['Q37_Part_10'].isin(['University Courses (resulting in a university degree)'])]
no_degree = ml_response.loc[~ml_response['Q37_Part_10'].isin(['University Courses (resulting in a university degree)'])]

    Next, let’s dig deeper into the respondents who have ML experience. DS and ML require good quantitative and technical skills even for an entry-level job. Only 22.7% of the ML experience group have degrees in data science. This is easy to understand: some disciplines, such as statistics, math, and computer science, help students build a skill set that is a good fit for DS and ML. Also, some school programs list data science as a track or concentration under other major disciplines. 
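The share quoted above is the mean of a boolean mask. This is a minimal sketch, assuming the degree answer is stored in a single column as in the split above; the `toy` values are made up, so the printed percentage is not the survey figure.

```python
import pandas as pd

DEGREE_ANSWER = "University Courses (resulting in a university degree)"

# Made-up column: the answer text when selected, missing otherwise.
toy = pd.DataFrame({"Q37_Part_10": [DEGREE_ANSWER, None, None, DEGREE_ANSWER, None]})

# isin() gives True/False per respondent; the mean of booleans is the share of True.
has_degree = toy["Q37_Part_10"].isin([DEGREE_ANSWER])
share = has_degree.mean()
print(f"{share:.1%}")
```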

In [None]:
# Pie chart
labels = ['Has a DS degree', "Doesn't have a DS degree"]
sizes = [len(with_degree), len(no_degree)]
# Slice colors
colors = ['cornflowerblue', 'sienna']
fig1, ax1 = plt.subplots()
ax1.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
ax1.set_title('Distribution of People With and Without A DS Degree (%)', fontsize = 20);
plt.show()

    In general, there isn’t a big difference between people who have a DS degree and people who don’t. The percentage differences in the Research Scientist and Software Engineer groups are slightly larger, but they have not been tested for significance yet. 
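One way such a difference could be checked is a Pearson chi-square test on a table of counts. This is a minimal sketch on a made-up 2x2 table (not survey results), computed by hand with NumPy and without a continuity correction. The threshold 3.841 is the standard 5% critical value for one degree of freedom; a larger table would need a different critical value.

```python
import numpy as np

# Made-up 2x2 table of counts (NOT survey results):
# rows = has a DS degree / no DS degree,
# columns = Research Scientist / all other job titles.
observed = np.array([[30, 70],
                     [20, 80]], dtype=float)

# Expected counts if row and column were independent.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic and its degrees of freedom.
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

CRITICAL_5PCT_DOF1 = 3.841  # chi-square critical value, alpha = 0.05, 1 degree of freedom
print(f"chi2 = {chi2:.3f}, dof = {dof}")
print("significant at 5%" if chi2 > CRITICAL_5PCT_DOF1 else "not significant at 5%")
```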

In [None]:
# Split data by whether participants have earned or are working on a DS degree.
# Normalized
degree = with_degree.groupby('Q5').size()
degree_y = pd.DataFrame(degree/degree.sum()*100)
degree_y = degree_y.sort_values(degree_y.columns[0], ascending = False)

degree = no_degree.groupby('Q5').size()
degree_n = pd.DataFrame(-degree/degree.sum()*100)
degree_n = degree_n.reindex(degree_y.index)

In [None]:


fig = plt.figure(figsize=[12,7])
gs = fig.add_gridspec(1, 1)
ax0 = fig.add_subplot(gs[0, 0], ylim=[-50, 50])

bg_color = '#f7f7f7'
fig.patch.set_facecolor(bg_color)
ax0.set_facecolor(bg_color)

sns.barplot(x = degree_y.index, y = degree_y[0], data = degree_y, errwidth = 10, palette="Blues_d", label='Yes')
sns.barplot(x = degree_n.index, y = degree_n[0], data = degree_n, errwidth = 10, palette="Oranges_r", label='No')

# Use .iloc for positional access: the Series index holds job titles, not integers
for i in range(len(degree_y)):
    ax0.annotate('{:.2f}%'.format(degree_y[0].iloc[i]),
                xy=(i, degree_y[0].iloc[i]+3),
                va='center', 
                ha='center',
                alpha=0.5,
                fontsize=10)
for j in range(len(degree_n)):
    ax0.annotate('{:.2f}%'.format(-degree_n[0].iloc[j]),
                xy=(j, degree_n[0].iloc[j]-3),
                va='center',
                ha='center',
                alpha=0.5,
                fontsize=10)

x_axis = degree_y.index
ticks, labels = plt.xticks()
ax0.set_xticklabels(x_axis, rotation=90, fontsize=10)

yticks = [-30, -20, -10, 0, 10, 20, 30]
plt.yticks(ticks = yticks, labels = np.abs(yticks))

for s in ['top', 'left', 'right']:
    ax0.spines[s].set_visible(False)
    
ax0.grid(which='major', axis='y', color='lightgrey', zorder=0)
ax0.axhline(c='black', lw=1, ls=':')

ax0.legend(title='Have or working on a DS degree', 
           fontsize=10,
           title_fontsize=10,
           bbox_to_anchor=[1, 0.88], 
           loc='upper right',
           framealpha=0)

ax0.text(-1, 50, 
         'Distribution of Position in People With and Without A DS Degree (Normalized %)', 
         fontsize=14, 
         fontweight='bold');

    Last but not least, the word cloud below shows how often each machine learning framework was mentioned by participants with ML experience.

In [None]:
ml_tool_df = get_question_columns('Q5', 'Q16', ml_response)

mt={'framework': [], 'count':[]}

for col in ml_tool_df.iloc[:,1:].columns:
    mt['framework'].append(col)
    mt['count'].append(ml_tool_df.iloc[:,1:][col].count())
mt = pd.DataFrame(mt)

data = dict(zip(mt['framework'].tolist(),mt['count'].tolist()))
#print(data)
wc = WordCloud(width=600, height=300, min_font_size = 15).generate_from_frequencies(data)
plt.figure(figsize=(12,12))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.title('Frequencies of Machine Learning Framework Mentioned', fontsize=20)
plt.show()

In the tables below, the background color reflects the counts in the columns.

There are more details that we can dig into. The following interactive visualization helped me a lot in exploring interesting findings: select a question in the dropdown box to see how its responses are distributed across job titles. Hope you have fun with it!

For questions that have responses in multiple columns:

In [None]:
def see_mul_response(question, source):

    q_num = re.search(r'^\S*', question).group().replace('-','_')
    print(q_num)
    if source == 'All':
        data_src = all_response
    elif source == 'ML Response':
        data_src = ml_response
    elif source == 'no ML Response':
        data_src = no_ml_response
    df = get_question_columns('Q5', q_num, data_src)
    #print(df)
    df = long_short(df)
    df.background_gradient(cmap='Greens', axis = 1)
    return df

interact(see_mul_response, question = ['Q7 What programming languages do you use on a regular basis? (Select all that apply)',
                                       "Q9 Which of the following integrated development environments (IDE's) do you use on a regular basis?",
                                       "Q10 Which of the following hosted notebook products do you use on a regular basis? (Select all that apply)",
                                       "Q12 Which types of specialized hardware do you use on a regular basis? (Select all that apply)",
                                       "Q14 What data visualization libraries or tools do you use on a regular basis? (Select all that apply)",
                                       "Q16 Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply)",
                                       "Q17 Which of the following ML algorithms do you use on a regular basis? (Select all that apply)",
                                       "Q18 Which categories of computer vision methods do you use on a regular basis? (Select all that apply)",
                                       "Q19 Which of the following natural language processing (NLP) methods do you use on a regular basis? (Select all that apply)",
                                       "Q23 Select any activities that make up an important part of your role at work: (Select all that apply)",
                                       "Q26-A Which of the following cloud computing platforms do you use on a regular basis? (Select all that apply)",
                                       "Q27-A Do you use any of the following cloud computing products on a regular basis? (Select all that apply)",
                                       "Q28-A Do you use any of the following machine learning products on a regular basis? (Select all that apply)",
                                       "Q29-A Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis? (Select all that apply)",
                                       "Q31-A Which of the following business intelligence tools do you use on a regular basis? (Select all that apply)",
                                       "Q33-A Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis? (Select all that apply)",
                                       "Q34-A Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis? (Select all that apply)",
                                       "Q35-A Do you use any tools to help manage machine learning experiments? (Select all that apply)",
                                       "Q36 Where do you publicly share or deploy your data analysis or machine learning applications? (Select all that apply)",
                                       "Q37 On which platforms have you begun or completed data science courses? (Select all that apply)",
                                       'Q39 Who/what are your favorite media sources that report on data science topics? (Select all that apply)',
                                       "Q27-B In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products? (Select all that apply)",
                                       "Q28-B In the next 2 years, do you hope to become more familiar with any of these specific machine learning products? (Select all that apply)",
                                       "Q29-B Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years? (Select all that apply)",
                                       "Q31-B Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years? (Select all that apply)", 
                                       "Q33-B Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply)",
                                       "Q34-B Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years? (Select all that apply)",
                                       "Q35-B In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply)"
                                      ],
        source = ['All', 'ML Response', 'no ML Response'])
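The per-title counts that `long_short` builds for a multi-response question can also be sketched with a plain `groupby`. This sketch assumes, as the code above does, that each option column holds the option text when selected and is missing otherwise. The `toy` frame, its column names, and its values are made up for illustration.

```python
import pandas as pd

# Made-up multi-response question: one column per answer option.
toy = pd.DataFrame({
    "Q5": ["Student", "Student", "Data Scientist"],
    "Q7_Part_1": ["Python", "Python", "Python"],
    "Q7_Part_2": ["R", None, "R"],
    "Q7_Part_3": [None, None, "SQL"],
})

# Select the option columns of the question by their shared prefix.
part_cols = [c for c in toy.columns if c.startswith("Q7_Part_")]

# count() ignores missing values, so it counts how often each option was selected.
counts = toy.groupby("Q5")[part_cols].count()

# Label each column with the option text it contains.
counts.columns = [toy[c].dropna().iloc[0] for c in part_cols]
print(counts)
```

The result is a plain DataFrame, so `counts.style.background_gradient(cmap='Greens', axis=1)` can be applied afterwards to get the same kind of shaded table.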