# ***Kaggle-ML-DS-Survey***

**Description**

This year Kaggle is once again launching an annual Data Science Survey Challenge, where we will be awarding a prize pool of $30,000 to notebook authors who tell a rich story about a subset of the data science and machine learning community.

In our fourth year running this survey, we were once again awed by the global, diverse, and dynamic nature of the data science and machine learning industry. This survey data EDA provides an overview of the industry on an aggregate scale, but it also leaves us wanting to know more about the many specific communities comprised within the survey. For that reason, we’re inviting the Kaggle community to dive deep into the survey datasets and help us tell the diverse stories of data scientists from around the world.

<br><br>

In [None]:
import numpy as np
import pandas as pd

# Viz Library
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import show

In [None]:
# Reading the Response Dataset
response = pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
response = response.drop(response.index[0], axis = 0)
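
The first row of the CSV holds the question text rather than an answer, which is why it is dropped above. A minimal sketch, using a tiny made-up frame in place of the real file (the `raw`, `questions`, and `answers` names are hypothetical), of keeping that wording as a lookup before dropping the row:

```python
import pandas as pd

# Hypothetical stand-in for the survey file: row 0 holds the question text.
raw = pd.DataFrame({
    "Q1": ["What is your age (# years)?", "25-29", "18-21"],
    "Q2": ["What is your gender?", "Man", "Woman"],
})

# Keep the question wording as a column -> text mapping.
questions = raw.iloc[0].to_dict()

# Drop the question row and renumber the remaining respondents from 0.
answers = raw.iloc[1:].reset_index(drop=True)

print(questions["Q1"])
print(answers.shape)
```

Having `questions` around makes it easy to look up what a column such as `Q7_Part_3` actually asked.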

In [None]:
response.head()

In [None]:
response.info()

In [None]:
response.describe()

In [None]:
response.shape

***Question 1: Age Distribution***

In [None]:
data = response['Q1'].sort_values(ascending=True)

In [None]:
sns.set(font_scale=1.4)
sns.set_palette("tab10")  # color_palette() only returns a palette; set_palette() applies it
plt.figure(figsize=(15,10))

total = float(len(response)) # one person per row 

ax = sns.countplot(x = data, data = response)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format((height/total)*100),
            ha="center") 

plt.title('Age Distribution',
         fontsize =30)

plt.xlabel('Age', fontsize = 24)
plt.ylabel('Count', fontsize = 24)  # the axis shows counts; percentages are the bar labels

We can see that most respondents are in the 18 to 29 age range. About 0.38% of respondents are 70 or older, which is awesome 😎. The number of respondents also decreases as age increases.
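
The percentages printed on the bars can be cross-checked without a plot. A small sketch, assuming a made-up `ages` Series in place of `response['Q1']`:

```python
import pandas as pd

# Hypothetical age-bracket answers standing in for response['Q1'].
ages = pd.Series(["18-21", "25-29", "25-29", "22-24",
                  "70+", "25-29", "18-21", "22-24"])

# normalize=True returns shares instead of counts; sort_index() orders the
# brackets alphabetically, which matches their numeric order for these labels.
share = (ages.value_counts(normalize=True).sort_index() * 100).round(2)
print(share)
```

Note that `value_counts` ignores missing answers by default, so these shares are relative to respondents who answered the question.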

***Question 2: Gender Identity of Data Scientist***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(15,10))

ax = sns.countplot(y = "Q2", data = response, palette = "rainbow")


total = len(response['Q2'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Data Scientist as per Gender',
         fontsize =30)

plt.xlabel('Count', fontsize = 24)
plt.ylabel('Gender', fontsize = 24)

plt.show()

We can clearly see that men dominate this field at 78.8% of responses, while 19.4% are women, 0.3% are nonbinary, 0.3% prefer to self-describe, and 1.3% prefer not to say. In my view, the shares other than men are very low for DS/ML responses.

***Question 3: In which country do you currently reside ?***

In [None]:
'''
Plotting ten of the countries with the most responses
'''

sns.set(font_scale=1.4)
plt.figure(figsize=(15,10))

# Hard-coded display order for the countries shown on the chart
top_countries = ['India', 'United States of America', 'Other', 'Brazil',
                 'Japan', 'Russia', 'United Kingdom of Great Britain and Northern Ireland',
                 'Nigeria', 'Germany', 'Spain']

sns.countplot(y = 'Q3', palette = 'spring', data = response, order = top_countries)

plt.title('Top Countries by Responses',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Country', fontsize = 24)

plt.show()

India has the highest number of respondents, followed by the United States of America, Other, and so on.
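
Rather than hard-coding the country order, it can be derived from the data. A minimal sketch, assuming a made-up `countries` Series in place of `response['Q3']`:

```python
import pandas as pd

# Hypothetical answers standing in for response['Q3'].
countries = pd.Series(
    ["India"] * 4
    + ["United States of America"] * 3
    + ["Other"] * 2
    + ["Brazil"]
)

# value_counts() sorts by frequency, so head(n) gives the n most common values.
top_n = countries.value_counts().head(3).index.tolist()
print(top_n)
```

The resulting list can be passed to the `order=` argument of `sns.countplot` so the chart always reflects the actual ranking.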

***Question 4: What is the highest level of formal education that you have attained or plan to attain within the next 2 years?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(15,10))

data = response['Q4'].sort_values(ascending=True)

ax = sns.countplot(y = data, data = response, palette = "rainbow")

total = len(response['Q4'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Formal Education',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Education', fontsize = 24)

plt.show()

A smaller share of respondents hold a `Doctoral degree`, which is great 😎. 39.2% of respondents have or plan to attain a `Master's degree`, followed by 34.8% with a `Bachelor's degree`. Another 1.2% have no formal education; they may be learning ML/DS on their own 🙌.

***Question 5: Select the title most similar to your current role (or most recent title if retired)***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

data = response['Q5'].sort_values(ascending=True)

ax = sns.countplot(y = data, data = response, palette = "spring")

total = len(response['Q5'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Job Title',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Job', fontsize = 24)

plt.show()

The largest group is `Student`. 13.4% are Data Scientists and 5.4% are ML Engineers. Only 1.4% list Statistician as their job title. Also, 8.2% of respondents are `Not Employed`, which is sad 😭 news. Hope they get a job soon!

***Question 6: For how many years have you been writing code and/or programming?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q6', data = response, palette = "spring")

total = len(response['Q6'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Years of Programming Experience',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Years Coding', fontsize = 24)

plt.show()

5.6% of respondents have never written code. Beyond the first few years, the number of respondents decreases as years of experience increase. 22.5% of respondents have 1-2 years of experience and 22.7% have 3-5 years.
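
The bar-annotation loop is repeated in every plotting cell. As a sketch only, it could be wrapped in a helper; the name `annotate_share` is hypothetical, and the `matplotlib.use("Agg")` line is there just so the example runs without a display (drop it inside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt


def annotate_share(ax, total):
    """Label each horizontal bar with its width as a percentage of `total`."""
    labels = []
    for p in ax.patches:
        label = '{:.1f}%'.format(100 * p.get_width() / total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height() / 2
        ax.annotate(label, (x, y))
        labels.append(label)
    return labels


# Toy horizontal bar chart: 3 respondents in one bracket, 1 in another.
fig, ax = plt.subplots()
ax.barh(["1-2 years", "3-5 years"], [3, 1])
labels = annotate_share(ax, total=4)
print(labels)
```

With seaborn, the same helper could be called on the axes returned by `sns.countplot(y=...)`, since those bars are also stored in `ax.patches`.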

***Question 7: What programming languages do you use on a regular basis?***

In [None]:
# Combining All the Question 7 answers

question_7 = np.concatenate([response.Q7_Part_1,
                             response.Q7_Part_2,
                             response.Q7_Part_3,
                             response.Q7_Part_4,
                             response.Q7_Part_5,
                             response.Q7_Part_6,
                             response.Q7_Part_7,
                             response.Q7_Part_8,
                             response.Q7_Part_9,
                             response.Q7_Part_10,
                             response.Q7_Part_11,
                             response.Q7_Part_12,
                             response.Q7_OTHER,
])

ques_7 = pd.concat([response, pd.DataFrame(question_7)], ignore_index = True, axis = 1)

ques_7.columns = np.append(response.columns.values, 'Q7')
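
The concatenate-and-append pattern above works, but it pads the frame with many empty rows. As an alternative sketch (not the approach used in this notebook), the multi-select columns can be selected by prefix and flattened directly. The `toy` frame below is made-up data with the same column layout:

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the survey: each Q7_* column holds the option name or NaN.
toy = pd.DataFrame({
    "Q7_Part_1": ["Python", "Python", np.nan, "Python"],
    "Q7_Part_2": [np.nan, "R", "R", np.nan],
    "Q7_Part_3": ["SQL", np.nan, np.nan, "SQL"],
    "Q7_OTHER":  [np.nan, np.nan, "Other", np.nan],
})

# Select every Q7 column by prefix, melt to one long column, and drop the NaNs.
picks = toy.filter(regex=r"^Q7_").melt(value_name="Q7")["Q7"].dropna()

counts = picks.value_counts()
share = counts / len(toy) * 100  # share of respondents who picked each option
print(share)
```

Because respondents can tick several options, these shares are per respondent and do not need to sum to 100%.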

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_7.Q7, data = response, palette = "rainbow")

total = len(response['Q7_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Programming Language Used Most Often!',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Programming Language', fontsize = 24)

plt.show()

The result is very clear: `Python` takes 1st place among the programming languages used most often, followed by `SQL` in 2nd place and `R` in 3rd, and then the other languages.

***Question 8: What programming language would you recommend an aspiring data scientist to learn first?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q8', data = response, palette = "rainbow")

total = len(response['Q8'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Programming Language To Learn First For Data Science Aspirant!',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Programming Language', fontsize = 24)

plt.show()

`Python` is the trending language to learn first for data science; `R` is the 2nd preference, followed by `SQL`.

***Question 9: Which of the following integrated development environments (IDE's) do you use on a regular basis?***

In [None]:
# Combining All the Question 9 answers

question_9 = np.concatenate([response.Q9_Part_1,
                             response.Q9_Part_2,
                             response.Q9_Part_3,
                             response.Q9_Part_4,
                             response.Q9_Part_5,
                             response.Q9_Part_6,
                             response.Q9_Part_7,
                             response.Q9_Part_8,
                             response.Q9_Part_9,
                             response.Q9_Part_10,
                             response.Q9_Part_11,
                             response.Q9_OTHER
])

ques_9 = pd.concat([response, pd.DataFrame(question_9)], ignore_index = True, axis = 1)

ques_9.columns = np.append(response.columns.values, 'Q9')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_9.Q9, data = response, palette = "rainbow")

total = len(response['Q9_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('IDEs Used Most Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('IDEs', fontsize = 24)

plt.show()

56% of respondents use **Jupyter**, making it the most used IDE. 29.3% use **VS Code**, followed by **PyCharm**. 1.9% do not use any IDE 😢.

***Question 10: Which of the following hosted notebook products do you use on a regular basis?***

In [None]:
# Combining All the Question 10 answers

question_10 = np.concatenate([response.Q10_Part_1,
                              response.Q10_Part_2,
                              response.Q10_Part_3,
                              response.Q10_Part_4,
                              response.Q10_Part_5,
                              response.Q10_Part_6,
                              response.Q10_Part_7,
                              response.Q10_Part_8,
                              response.Q10_Part_9,
                              response.Q10_Part_10,
                              response.Q10_Part_11,
                              response.Q10_Part_12,
                              response.Q10_Part_13,
                              response.Q10_OTHER
])

ques_10 = pd.concat([response, pd.DataFrame(question_10)], ignore_index = True, axis = 1)

ques_10.columns = np.append(response.columns.values, 'Q10')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_10.Q10, data = response, palette = "rainbow")

total = len(response['Q10_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Hosted Notebook Commonly Used',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Hosted Notebooks', fontsize = 24)

plt.show()

31.6% of respondents use hosted **Colab Notebooks** for their ML/DS work, 29.9% use **Kaggle Notebooks**, and 26.4% **do not use any hosted notebook.**

***Question 11: What type of computing platform do you use most often for your data science projects?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q11', data = response, palette = "rainbow")

total = len(response['Q11'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Computing Platform Most used',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Computing Platform', fontsize = 24)

plt.show()

Most respondents use their **own machine** for ML/DS work. Only 11.8% use a **cloud computing platform**, and 4.2% use a **deep learning workstation**.

***Question 12: Which types of specialized hardware do you use on a regular basis?***

In [None]:
# Combining All the Question 12 answers

question_12 = np.concatenate([response.Q12_Part_1,
                              response.Q12_Part_2,
                              response.Q12_Part_3,
                              response.Q12_OTHER
])

ques_12 = pd.concat([response, pd.DataFrame(question_12)], ignore_index = True, axis = 1)

ques_12.columns = np.append(response.columns.values, 'Q12')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_12.Q12, data = response, palette = "rainbow")

total = len(response['Q12_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Specialized Hardware Used Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Specialized Hardware', fontsize = 24)

plt.show()

**GPUs** are used most often. Almost 40% of respondents **do not use** any specialized hardware. Only 4.8% use **TPUs**.

***Question 13: Approximately how many times have you used a TPU?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q13', data = response, palette = "rainbow")

total = len(response['Q13'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Number of times TPU used',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('TPU used', fontsize = 24)

plt.show()

60% of respondents have never used a **TPU**. 10.5% have used one **2-5 times**. Very few have used one **More than 25 times**!

***Question 14: What data visualization libraries or tools do you use on a regular basis?***

In [None]:
question_14 = np.concatenate([response.Q14_Part_1,
                              response.Q14_Part_2,
                              response.Q14_Part_3,
                              response.Q14_Part_4,
                              response.Q14_Part_5,
                              response.Q14_Part_6,
                              response.Q14_Part_7,
                              response.Q14_Part_8,
                              response.Q14_Part_9,
                              response.Q14_Part_10,
                              response.Q14_Part_11,
                              response.Q14_OTHER
])

ques_14 = pd.concat([response, pd.DataFrame(question_14)], ignore_index = True, axis = 1)

ques_14.columns = np.append(response.columns.values, 'Q14')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_14.Q14, data = response, palette = "rainbow")

total = len(response['Q14_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Visualization Libraries or Tools Used on a Regular Basis',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Visualization Tools', fontsize = 24)

plt.show()

**Matplotlib** is used most often, followed by **Seaborn**. **Plotly and Ggplot** both stand at 20.6% of respondents. **Altair** is not used very often.

***Question 15: For how many years have you used machine learning methods?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q15', data = response, palette = "rainbow")

total = len(response['Q15'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('For how many years have you used machine learning methods?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Years', fontsize = 24)

plt.show()

31.5% of respondents have used ML methods for **Under 1 year**. 10.4% have **never** used them. We can also see that the number of respondents drops as the years of ML experience increase!
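
Experience brackets read more naturally when the bars follow their logical order instead of the order in which values appear in the data. A small sketch, assuming a made-up `answers` Series and a shortened, assumed list of bracket labels:

```python
import pandas as pd

# Assumed bracket labels in their logical order (shortened for the example).
bracket_order = ["Under 1 year", "1-2 years", "2-3 years", "5-10 years"]

# Hypothetical answers standing in for response['Q15'].
answers = pd.Series(["1-2 years", "Under 1 year", "5-10 years",
                     "Under 1 year", "2-3 years"])

# reindex() rearranges the counts to follow bracket_order.
counts = answers.value_counts().reindex(bracket_order)
print(counts)
```

The same `bracket_order` list can be passed as `order=` to `sns.countplot` so the chart and the table agree.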

***Question 16: Which of the following machine learning frameworks do you use on a regular basis?***

In [None]:
question_16 = np.concatenate([response.Q16_Part_1,
                              response.Q16_Part_2,
                              response.Q16_Part_3,
                              response.Q16_Part_4,
                              response.Q16_Part_5,
                              response.Q16_Part_6,
                              response.Q16_Part_7,
                              response.Q16_Part_8,
                              response.Q16_Part_9,
                              response.Q16_Part_10,
                              response.Q16_Part_11,
                              response.Q16_Part_12,
                              response.Q16_Part_13,
                              response.Q16_Part_14,
                              response.Q16_Part_15,
                              response.Q16_OTHER
])

ques_16 = pd.concat([response, pd.DataFrame(question_16)], ignore_index = True, axis = 1)

ques_16.columns = np.append(response.columns.values, 'Q16')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_16.Q16, data = response, palette = "rainbow")

total = len(response['Q16_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('ML Frameworks Used Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('ML Framework', fontsize = 24)

plt.show()

Most respondents use **Scikit-learn**. 34.6% use **TensorFlow**, and **Keras** is used by 30.9%.
**PyTorch** and **XGBoost** are used by 20.9% and 19.6% of respondents respectively!

***Question 17: Which of the following ML algorithms do you use on a regular basis?***

In [None]:
question_17 = np.concatenate([response.Q17_Part_1,
                              response.Q17_Part_2,
                              response.Q17_Part_3,
                              response.Q17_Part_4,
                              response.Q17_Part_5,
                              response.Q17_Part_6,
                              response.Q17_Part_7,
                              response.Q17_Part_8,
                              response.Q17_Part_9,
                              response.Q17_Part_10,
                              response.Q17_Part_11,
                              response.Q17_OTHER
])

ques_17 = pd.concat([response, pd.DataFrame(question_17)], ignore_index = True, axis = 1)

ques_17.columns = np.append(response.columns.values, 'Q17')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_17.Q17, data = response, palette = "rainbow")

total = len(response['Q17_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('ML Algorithm Used Most Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('ML Algorithm', fontsize = 24)

plt.show()

**Linear Regression** is used most often, followed by **Decision Trees** or **Random Forests**. **CNNs** are used by 29.2% of respondents. Only 2% use other ML algorithms.

***Question 18: Which categories of computer vision methods do you use on a regular basis?***

In [None]:
question_18 = np.concatenate([response.Q18_Part_1,
                              response.Q18_Part_2,
                              response.Q18_Part_3,
                              response.Q18_Part_4,
                              response.Q18_Part_5,
                              response.Q18_Part_6,
                              response.Q18_OTHER
])

ques_18 = pd.concat([response, pd.DataFrame(question_18)], ignore_index = True, axis = 1)

ques_18.columns = np.append(response.columns.values, 'Q18')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_18.Q18, data = response, palette = "rainbow")

total = len(response['Q18_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Computer Vision Technique Used Most Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('CV Technique', fontsize = 24)

plt.show()

**Image classification and other general-purpose networks** are used by 17.5% of respondents. **Object detection, image segmentation, and general-purpose image/video tools** are each used by the same percentage of respondents.

***Question 19: Which of the following natural language processing (NLP) methods do you use on a regular basis?***

In [None]:
question_19 = np.concatenate([response.Q19_Part_1,
                              response.Q19_Part_2,
                              response.Q19_Part_3,
                              response.Q19_Part_4,
                              response.Q19_Part_5,
                              response.Q19_OTHER
])

ques_19 = pd.concat([response, pd.DataFrame(question_19)], ignore_index = True, axis = 1)

ques_19.columns = np.append(response.columns.values, 'Q19')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_19.Q19, data = response, palette = "rainbow")

total = len(response['Q19_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('NLP methods Used Most Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('NLP methods', fontsize = 24)

plt.show()

**Word embeddings/vectors** are the most commonly used NLP method. 5.2% of respondents do not use any **NLP technique**. Only 0.4% use **other techniques**.
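
The percentages in these charts divide by all rows, including respondents who left the question blank. As a sketch of how the choice of denominator changes the number, the made-up `toy` frame below uses two hypothetical Q19 columns:

```python
import numpy as np
import pandas as pd

# Hypothetical answers: NaN means the option was not ticked.
toy = pd.DataFrame({
    "Q19_Part_1": ["Word embeddings/vectors", np.nan, np.nan, "Word embeddings/vectors"],
    "Q19_Part_5": [np.nan, "None", np.nan, np.nan],
})

cols = toy.filter(regex=r"^Q19_")

# Respondents who ticked at least one option for this question.
answered = cols.notna().any(axis=1).sum()
picked = cols["Q19_Part_1"].notna().sum()

share_all = 100 * picked / len(toy)       # share of every row in the survey
share_answered = 100 * picked / answered  # share of those who answered Q19
print(round(share_all, 1), round(share_answered, 1))
```

Which denominator is appropriate depends on the story being told; stating it next to the chart avoids confusion.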

***Question 20: What is the size of the company where you are employed?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q20', data = response, palette = "rainbow")

total = len(response['Q20'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Size of the Company where you are Working',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Company Size', fontsize = 24)

plt.show()

The largest group of respondents works at companies with **0-49** employees, which may be startups. 11.2% of respondents work at companies with more than **10,000** employees.

In [None]:
# Clear every non-dunder global to free memory before reloading the data
for name in dir():
    if name[0:2] != "__":
        del globals()[name]

del name
print(dir())

In [None]:
import numpy as np
import pandas as pd

# Viz Library
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import show

# Reading the Response Dataset
response = pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
response = response.drop(response.index[0], axis = 0)

***Question 21: Approximately how many individuals are responsible for data science workloads at your place of business?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q21', data = response, palette = "rainbow")

total = len(response['Q21'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Total People responsible for data science workload',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Size', fontsize = 24)

plt.show()

***Question 22: Does your current employer incorporate machine learning methods into their business?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q22', data = response, palette = "rainbow")

total = len(response['Q22'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Employer Use of Machine Learning Methods',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Responses', fontsize = 24)

plt.show()

***Question 23: Select any activities that make up an important part of your role at work?***

In [None]:
question_23 = np.concatenate([response.Q23_Part_1,
                              response.Q23_Part_2,
                              response.Q23_Part_3,
                              response.Q23_Part_4,
                              response.Q23_Part_5,
                              response.Q23_Part_6,
                              response.Q23_Part_7,
                              response.Q23_OTHER
])

ques_23 = pd.concat([response, pd.DataFrame(question_23)], ignore_index = True, axis = 1)

ques_23.columns = np.append(response.columns.values, 'Q23')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_23.Q23, data = response, palette = "rainbow")

total = len(response['Q23_Part_1'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Important Activities in Current Role',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Work', fontsize = 24)

plt.show()

The most common activity at work is **Analyzing and understanding the data**, which is very important before moving on to ML/DS work!

***Question 24: What is your current yearly compensation (approximate $USD)?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q24', data = response, palette = "rainbow")

total = len(response['Q24'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Yearly Compensation',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Compensation', fontsize = 24)

plt.show()

***Question 25: Approximately how much money have you (or your team) spent on machine learning and/or cloud computing services at home (or at work) in the past 5 years (approximate $USD)?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q25', data = response, palette = "rainbow")

total = len(response['Q25'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Money Spent on Cloud Computing and Machine Learning in the Past 5 Years',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Money', fontsize = 24)

plt.show()

***Question 26: Which of the following cloud computing platforms do you use on a regular basis? Part-A***

In [None]:
question_26 = np.concatenate([response.Q26_A_Part_1,
                              response.Q26_A_Part_2,
                              response.Q26_A_Part_3,
                              response.Q26_A_Part_4,
                              response.Q26_A_Part_5,
                              response.Q26_A_Part_6,
                              response.Q26_A_Part_7,
                              response.Q26_A_Part_8,
                              response.Q26_A_Part_9,
                              response.Q26_A_Part_10,
                              response.Q26_A_Part_11,
                              response.Q26_A_OTHER
])

ques_26 = pd.concat([response, pd.DataFrame(question_26)], ignore_index = True, axis = 1)

ques_26.columns = np.append(response.columns.values, 'Q26')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_26.Q26, data = response, palette = "rainbow")

total = len(response['Q26_A_Part_1'])
for p in ax.patches:
        percentage = '{:.5}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Cloud Computing Platforms Used Regularly',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Cloud Service Provider', fontsize = 24)

plt.show()

***Question 26 Part-B: Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years?***

In [None]:
question_26_b = np.concatenate([response.Q26_B_Part_1,
                              response.Q26_B_Part_2,
                              response.Q26_B_Part_3,
                              response.Q26_B_Part_4,
                              response.Q26_B_Part_5,
                              response.Q26_B_Part_6,
                              response.Q26_B_Part_7,
                              response.Q26_B_Part_8,
                              response.Q26_B_Part_9,
                              response.Q26_B_Part_10,
                              response.Q26_B_Part_11,
                              response.Q26_B_OTHER
])

ques_26_b = pd.concat([response, pd.DataFrame(question_26_b)], ignore_index = True, axis = 1)

ques_26_b.columns = np.append(response.columns.values, 'Q26')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_26_b.Q26, data = response, palette = "rainbow")

total = len(response['Q26_B_Part_1'])
for p in ax.patches:
        percentage = '{:.5}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which of the following cloud computing platforms do you hope to become more familiar with in the next 2 years',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Cloud Service Provider', fontsize = 24)

plt.show()

***Question 27-Part_A: Do you use any of the following cloud computing products on a regular basis?***

In [None]:
question_27 = np.concatenate([response.Q27_A_Part_1,
                              response.Q27_A_Part_2,
                              response.Q27_A_Part_3,
                              response.Q27_A_Part_4,
                              response.Q27_A_Part_5,
                              response.Q27_A_Part_6,
                              response.Q27_A_Part_7,
                              response.Q27_A_Part_8,
                              response.Q27_A_Part_9,
                              response.Q27_A_Part_10,
                              response.Q27_A_Part_11,
                              response.Q27_A_OTHER
])

ques_27 = pd.concat([response, pd.DataFrame(question_27)], ignore_index = True, axis = 1)

ques_27.columns = np.append(response.columns.values, 'Q27')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_27.Q27, data = response, palette = "rainbow")

total = len(response['Q27_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Cloud Computing Products used',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Products', fontsize = 24)

plt.show()

**Amazon EC2** is used on a regular basis by 8.24% of users.
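
The percentage quoted above is the number of respondents who ticked an option divided by the total number of respondents. As a minimal sketch of that calculation, the snippet below tallies a multi-select question without listing every column by hand. The toy frame, its values, and the helper name `multi_select_share` are assumptions made for illustration; with the real data, `response` would be passed instead.

```python
import pandas as pd

# Toy stand-in for `response`: each Q27_A_* column holds the option label
# when the respondent ticked it and None otherwise (hypothetical values).
toy = pd.DataFrame({
    "Q27_A_Part_1": ["Amazon EC2", None, "Amazon EC2", None],
    "Q27_A_Part_2": [None, "AWS Lambda", "AWS Lambda", None],
    "Q27_A_OTHER": [None, None, None, "Other"],
})

def multi_select_share(df, prefix):
    """Percentage of all rows (respondents) that ticked each option of a multi-select question."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    # melt() puts every cell into one "value" column; value_counts() skips missing cells
    counts = df[cols].melt()["value"].value_counts()
    return (100 * counts / len(df)).round(2)

shares = multi_select_share(toy, "Q27_A_")
print(shares)
```

Because respondents can tick several options, these shares are not expected to sum to 100%.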

***Question 27-Part_B: In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products?***

In [None]:
question_27_b = np.concatenate([response.Q27_B_Part_1,
                              response.Q27_B_Part_2,
                              response.Q27_B_Part_3,
                              response.Q27_B_Part_4,
                              response.Q27_B_Part_5,
                              response.Q27_B_Part_6,
                              response.Q27_B_Part_7,
                              response.Q27_B_Part_8,
                              response.Q27_B_Part_9,
                              response.Q27_B_Part_10,
                              response.Q27_B_Part_11,
                              response.Q27_B_OTHER
])

ques_27_b = pd.concat([response, pd.DataFrame(question_27_b)], ignore_index = True, axis = 1)

ques_27_b.columns = np.append(response.columns.values, 'Q27_b')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_27_b.Q27_b, data = response, palette = "rainbow")

total = len(response['Q27_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('In the next 2 years, do you hope to become more familiar with any of these specific cloud computing products?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Responses', fontsize = 24)

plt.show()

**Google Cloud Compute Engine** is the product most people hope to learn in the next 2 years!
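
One way to put the "use now" and "hope to learn" answers side by side is a small comparison table. The sketch below is only an illustration: the toy frame, the column-to-product mapping, and the helper name `option_counts` are hypothetical, not taken from the survey file.

```python
import pandas as pd

# Hypothetical miniature of the survey: part A = used regularly,
# part B = hope to learn in the next 2 years.
toy = pd.DataFrame({
    "Q27_A_Part_1": ["Amazon EC2", None, "Amazon EC2", None],
    "Q27_A_Part_3": [None, "Google Cloud Compute Engine", None, None],
    "Q27_B_Part_1": [None, "Amazon EC2", None, None],
    "Q27_B_Part_3": ["Google Cloud Compute Engine", None,
                     "Google Cloud Compute Engine", "Google Cloud Compute Engine"],
})

def option_counts(df, prefix):
    """Count how many respondents selected each option under a column prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[cols].melt()["value"].value_counts()

comparison = pd.DataFrame({
    "use_now": option_counts(toy, "Q27_A_"),
    "hope_to_learn": option_counts(toy, "Q27_B_"),
}).fillna(0).astype(int)

# Positive gap: more people want to learn the product than use it today
comparison["gap"] = comparison["hope_to_learn"] - comparison["use_now"]
print(comparison.sort_values("gap", ascending=False))
```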

***Question 28_A: Do you use any of the following machine learning products on a regular basis?***

In [None]:
question_28 = np.concatenate([response.Q28_A_Part_1,
                              response.Q28_A_Part_2,
                              response.Q28_A_Part_3,
                              response.Q28_A_Part_4,
                              response.Q28_A_Part_5,
                              response.Q28_A_Part_6,
                              response.Q28_A_Part_7,
                              response.Q28_A_Part_8,
                              response.Q28_A_Part_9,
                              response.Q28_A_Part_10,
                              response.Q28_A_OTHER
])

ques_28 = pd.concat([response, pd.DataFrame(question_28)], ignore_index = True, axis = 1)

ques_28.columns = np.append(response.columns.values, 'Q28')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_28.Q28, data = response, palette = "rainbow")

total = len(response['Q28_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Machine Learning Products used often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('ML Products', fontsize = 24)

plt.show()

***Question 28_B: In the next 2 years, do you hope to become more familiar with any of these specific machine learning products?***

In [None]:
question_28_b = np.concatenate([response.Q28_B_Part_1,
                              response.Q28_B_Part_2,
                              response.Q28_B_Part_3,
                              response.Q28_B_Part_4,
                              response.Q28_B_Part_5,
                              response.Q28_B_Part_6,
                              response.Q28_B_Part_7,
                              response.Q28_B_Part_8,
                              response.Q28_B_Part_9,
                              response.Q28_B_Part_10,
                              response.Q28_B_OTHER
])

ques_28_b = pd.concat([response, pd.DataFrame(question_28_b)], ignore_index = True, axis = 1)

ques_28_b.columns = np.append(response.columns.values, 'Q28_b')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_28_b.Q28_b, data = response, palette = "rainbow")

total = len(response['Q28_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('In the next 2 years, do you hope to become more familiar with any of these specific machine learning products?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Responses', fontsize = 24)

plt.show()

***Question 29_A: Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis?***

In [None]:
question_29 = np.concatenate([response.Q29_A_Part_1,
                              response.Q29_A_Part_2,
                              response.Q29_A_Part_3,
                              response.Q29_A_Part_4,
                              response.Q29_A_Part_5,
                              response.Q29_A_Part_6,
                              response.Q29_A_Part_7,
                              response.Q29_A_Part_8,
                              response.Q29_A_Part_9,
                              response.Q29_A_Part_10,
                              response.Q29_A_Part_11,
                              response.Q29_A_Part_12,
                              response.Q29_A_Part_13,
                              response.Q29_A_Part_14,
                              response.Q29_A_Part_15,
                              response.Q29_A_Part_16,
                              response.Q29_A_Part_17,
                              response.Q29_A_OTHER
])

ques_29 = pd.concat([response, pd.DataFrame(question_29)], ignore_index = True, axis = 1)

ques_29.columns = np.append(response.columns.values, 'Q29')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_29.Q29, data = response, palette = "rainbow")

total = len(response['Q29_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Big Data Products used often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Big Data Products', fontsize = 24)

plt.show()

**MySQL** is the most used big data product, followed by **PostgreSQL**!

***Question 29_B: Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you hope to become more familiar with in the next 2 years?***

In [None]:
question_29_B = np.concatenate([response.Q29_B_Part_1,
                              response.Q29_B_Part_2,
                              response.Q29_B_Part_3,
                              response.Q29_B_Part_4,
                              response.Q29_B_Part_5,
                              response.Q29_B_Part_6,
                              response.Q29_B_Part_7,
                              response.Q29_B_Part_8,
                              response.Q29_B_Part_9,
                              response.Q29_B_Part_10,
                              response.Q29_B_Part_11,
                              response.Q29_B_Part_12,
                              response.Q29_B_Part_13,
                              response.Q29_B_Part_14,
                              response.Q29_B_Part_15,
                              response.Q29_B_Part_16,
                              response.Q29_B_Part_17,
                              response.Q29_B_OTHER
])

ques_29_b = pd.concat([response, pd.DataFrame(question_29_B)], ignore_index = True, axis = 1)

ques_29_b.columns = np.append(response.columns.values, 'Q29_b')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_29_b.Q29_b, data = response, palette = "rainbow")

total = len(response['Q29_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which of the following big data products do you hope to become more familiar with in the next 2 years?\n',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Big Data Products', fontsize = 24)

plt.show()

***Question 30: Which of the following big data products (relational database, data warehouse, data lake, or similar) do you use most often?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q30', data = response, palette = "rainbow")

total = len(response['Q30'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which of the following big data products (relational database, data warehouse, data lake, or similar) do you use most often?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Big Data Products', fontsize = 24)

plt.show()

***Question 31: Which of the following business intelligence tools do you use on a regular basis?***

In [None]:
question_31 = np.concatenate([response.Q31_A_Part_1,
                              response.Q31_A_Part_2,
                              response.Q31_A_Part_3,
                              response.Q31_A_Part_4,
                              response.Q31_A_Part_5,
                              response.Q31_A_Part_6,
                              response.Q31_A_Part_7,
                              response.Q31_A_Part_8,
                              response.Q31_A_Part_9,
                              response.Q31_A_Part_10,
                              response.Q31_A_Part_11,
                              response.Q31_A_Part_12,
                              response.Q31_A_Part_13,
                              response.Q31_A_Part_14,
                              response.Q31_A_OTHER
])

ques_31 = pd.concat([response, pd.DataFrame(question_31)], ignore_index = True, axis = 1)

ques_31.columns = np.append(response.columns.values, 'Q31')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_31.Q31, data = response, palette = "rainbow")

total = len(response['Q31_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which business intelligence tools do you use on a regular basis?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('BI tools', fontsize = 24)

plt.show()

***Question 31_b: Which of the following business intelligence tools do you hope to become more familiar with in the next 2 years?***

In [None]:
question_31_b = np.concatenate([response.Q31_B_Part_1,
                              response.Q31_B_Part_2,
                              response.Q31_B_Part_3,
                              response.Q31_B_Part_4,
                              response.Q31_B_Part_5,
                              response.Q31_B_Part_6,
                              response.Q31_B_Part_7,
                              response.Q31_B_Part_8,
                              response.Q31_B_Part_9,
                              response.Q31_B_Part_10,
                              response.Q31_B_Part_11,
                              response.Q31_B_Part_12,
                              response.Q31_B_Part_13,
                              response.Q31_B_Part_14,
                              response.Q31_B_OTHER
])

ques_31_b = pd.concat([response, pd.DataFrame(question_31_b)], ignore_index = True, axis = 1)

ques_31_b.columns = np.append(response.columns.values, 'Q31')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_31_b.Q31, data = response, palette = "rainbow")

total = len(response['Q31_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which business intelligence tools do you hope to become more familiar with in the next 2 years?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('BI tools', fontsize = 24)

plt.show()

***Question 32: Which of the following business intelligence tools do you use most often?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q32', data = response, palette = "rainbow")

total = len(response['Q32'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Business Intelligence Tools used Often',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('BI tools', fontsize = 24)

plt.show()

In [None]:
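# Free memory before the second half of the notebook: delete every global
# whose name does not start with a double underscore; imports and data are reloaded below.
for name in dir():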
    if name[0:2] != "__":
        del globals()[name]

del name
print(dir())

In [None]:
import numpy as np
import pandas as pd

# Viz Library
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import show

# Reading the Response Dataset
response = pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
response = response.drop(response.index[0], axis = 0)

***Question 33: Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?***

In [None]:
question_33 = np.concatenate([response.Q33_A_Part_1,
                              response.Q33_A_Part_2,
                              response.Q33_A_Part_3,
                              response.Q33_A_Part_4,
                              response.Q33_A_Part_5,
                              response.Q33_A_Part_6,
                              response.Q33_A_Part_7,
                              response.Q33_A_OTHER
])

ques_33 = pd.concat([response, pd.DataFrame(question_33)], ignore_index = True, axis = 1)

ques_33.columns = np.append(response.columns.values, 'Q33')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_33.Q33, data = response, palette = "rainbow")

total = len(response['Q33_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Automated machine learning tools used on a regular basis',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Automated ML tools', fontsize = 24)

plt.show()

***Question 33_b: Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?***

In [None]:
question_33_b = np.concatenate([response.Q33_B_Part_1,
                              response.Q33_B_Part_2,
                              response.Q33_B_Part_3,
                              response.Q33_B_Part_4,
                              response.Q33_B_Part_5,
                              response.Q33_B_Part_6,
                              response.Q33_B_Part_7,
                              response.Q33_B_OTHER
])

ques_33_b = pd.concat([response, pd.DataFrame(question_33_b)], ignore_index = True, axis = 1)

ques_33_b.columns = np.append(response.columns.values, 'Q33')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_33_b.Q33, data = response, palette = "rainbow")

total = len(response['Q33_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Automated/Partial ML tools', fontsize = 24)

plt.show()

***Question 34: Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?***

In [None]:
question_34 = np.concatenate([response.Q34_A_Part_1,
                              response.Q34_A_Part_2,
                              response.Q34_A_Part_3,
                              response.Q34_A_Part_4,
                              response.Q34_A_Part_5,
                              response.Q34_A_Part_6,
                              response.Q34_A_Part_7,
                              response.Q34_A_Part_8,
                              response.Q34_A_Part_9,
                              response.Q34_A_Part_10,
                              response.Q34_A_Part_11,
                              response.Q34_A_OTHER
])

ques_34 = pd.concat([response, pd.DataFrame(question_34)], ignore_index = True, axis = 1)

ques_34.columns = np.append(response.columns.values, 'Q34')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_34.Q34, data = response, palette = "rainbow")

total = len(response['Q34_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which of the following automated machine learning tools (or partial AutoML tools) do you use on a regular basis?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Automated or Partial ML tools', fontsize = 24)

plt.show()

***Question 34_B: Which specific automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?***

In [None]:
question_34_b = np.concatenate([response.Q34_B_Part_1,
                              response.Q34_B_Part_2,
                              response.Q34_B_Part_3,
                              response.Q34_B_Part_4,
                              response.Q34_B_Part_5,
                              response.Q34_B_Part_6,
                              response.Q34_B_Part_7,
                              response.Q34_B_Part_8,
                              response.Q34_B_Part_9,
                              response.Q34_B_Part_10,
                              response.Q34_B_Part_11,
                              response.Q34_B_OTHER
])

ques_34_b = pd.concat([response, pd.DataFrame(question_34_b)], ignore_index = True, axis = 1)

ques_34_b.columns = np.append(response.columns.values, 'Q34_b')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_34_b.Q34_b, data = response, palette = "rainbow")

total = len(response['Q34_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Which automated/partial machine learning tools do you hope to become more familiar with in the next 2 years?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Automated or Partial ML tools', fontsize = 24)

plt.show()

***Question 35: Do you use any tools to help manage machine learning experiments?***

In [None]:
question_35 = np.concatenate([response.Q35_A_Part_1,
                              response.Q35_A_Part_2,
                              response.Q35_A_Part_3,
                              response.Q35_A_Part_4,
                              response.Q35_A_Part_5,
                              response.Q35_A_Part_6,
                              response.Q35_A_Part_7,
                              response.Q35_A_Part_8,
                              response.Q35_A_Part_9,
                              response.Q35_A_Part_10,
                              response.Q35_A_OTHER
])

ques_35 = pd.concat([response, pd.DataFrame(question_35)], ignore_index = True, axis = 1)

ques_35.columns = np.append(response.columns.values, 'Q35')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_35.Q35, data = response, palette = "rainbow")

total = len(response['Q35_A_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Do you use any tools to help manage machine learning experiments?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Tools', fontsize = 24)

plt.show()

***Question 35_b: In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments?***

In [None]:
question_35_b = np.concatenate([response.Q35_B_Part_1,
                              response.Q35_B_Part_2,
                              response.Q35_B_Part_3,
                              response.Q35_B_Part_4,
                              response.Q35_B_Part_5,
                              response.Q35_B_Part_6,
                              response.Q35_B_Part_7,
                              response.Q35_B_Part_8,
                              response.Q35_B_Part_9,
                              response.Q35_B_Part_10,
                              response.Q35_B_OTHER
])

ques_35_b = pd.concat([response, pd.DataFrame(question_35_b)], ignore_index = True, axis = 1)

ques_35_b.columns = np.append(response.columns.values, 'Q35')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

# Plot the Part B answers (ques_35_b), not the Part A frame from the previous question
ax = sns.countplot(y = ques_35_b.Q35, data = response, palette = "rainbow")

total = len(response['Q35_B_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Tools', fontsize = 24)

plt.show()

23.6% of people answered **No/None** when asked whether they hope to become more familiar with any of these tools for managing ML experiments.

***Question 36: Where do you publicly share or deploy your data analysis or machine learning applications?***

In [None]:
question_36 = np.concatenate([response.Q36_Part_1,
                              response.Q36_Part_2,
                              response.Q36_Part_3,
                              response.Q36_Part_4,
                              response.Q36_Part_5,
                              response.Q36_Part_6,
                              response.Q36_Part_7,
                              response.Q36_Part_8,
                              response.Q36_Part_9,
                              response.Q36_OTHER
])

ques_36 = pd.concat([response, pd.DataFrame(question_36)], ignore_index = True, axis = 1)

ques_36.columns = np.append(response.columns.values, 'Q36')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_36.Q36, data = response, palette = "rainbow")

total = len(response['Q36_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Where do you Publicly share or Deploy your DA or ML application',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Deploy tools', fontsize = 24)

plt.show()

11.6% of people **do not share/deploy** their applications, which is sad news 😢. Most people use **GitHub** for deploying/sharing their code, and 9.37% of people use **Kaggle**.

***Question 37: On which platforms have you begun or completed data science courses?***

In [None]:
question_37 = np.concatenate([response.Q37_Part_1,
                              response.Q37_Part_2,
                              response.Q37_Part_3,
                              response.Q37_Part_4,
                              response.Q37_Part_5,
                              response.Q37_Part_6,
                              response.Q37_Part_7,
                              response.Q37_Part_8,
                              response.Q37_Part_9,
                              response.Q37_Part_10,
                              response.Q37_Part_11,
                              response.Q37_OTHER
])

ques_37 = pd.concat([response, pd.DataFrame(question_37)], ignore_index = True, axis = 1)

ques_37.columns = np.append(response.columns.values, 'Q37')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_37.Q37, data = response, palette = "rainbow")

total = len(response['Q37_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('On which platforms have you begun or completed data science courses?',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Online Learning Platform', fontsize = 24)

plt.show()

Most people have begun or completed courses on **Coursera**. 24.2% of people have taken a **Kaggle micro-course**, and 23.1% have taken an online course on **Udemy**. Only 5.28% have taken courses on **Fast.ai**.
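
A ranking like this is easier to read off the chart when the bars are sorted by frequency. As a minimal sketch (the answer values below are made up for illustration), `value_counts()` already returns the options from most to least frequent, so its index can serve as the bar order:

```python
import pandas as pd

# Hypothetical answers, one entry per tick box (None = box left empty)
q37 = pd.Series(["Coursera", "Udemy", "Coursera", None,
                 "Kaggle Learn Courses", "Coursera", "Udemy", None])

# value_counts() ignores missing values and sorts from most to least frequent
order = q37.value_counts().index.tolist()
print(order)
```

Passing this list as `order=order` to `sns.countplot` would draw the bars from the most to the least selected platform.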

***Question 38: What is the primary tool that you use at work or school to analyze data?***

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = 'Q38', data = response, palette = "rainbow")

total = len(response['Q38'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Primary Tool for analyzing data',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Tools', fontsize = 24)

plt.show()

**`Local development environments`** are the most used tools. We can see that 21.1% of people use **`Basic statistical software`**. Only 3.4% of people use **`Cloud-based`** data software.
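
Note that the plot above divides by `len(response['Q38'])`, which also counts respondents who skipped the question, so the percentages are shares of all respondents rather than of those who answered. The sketch below, using made-up answers, shows how the two denominators differ:

```python
import pandas as pd

# Hypothetical single-choice answers; None marks a respondent who skipped Q38
q38 = pd.Series([
    "Local development environments",
    "Basic statistical software",
    "Local development environments",
    None,
    None,
])

counts = q38.value_counts()                         # missing answers are not counted
share_of_all = 100 * counts / len(q38)              # same denominator as the plot above
share_of_answered = 100 * counts / q38.notna().sum()

print(share_of_all.round(1))
print(share_of_answered.round(1))
```

Which denominator is appropriate depends on whether non-answers should be treated as part of the population being described.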

***Question 39: Who/what are your favorite media sources that report on data science topics?***

In [None]:
question_39 = np.concatenate([response.Q39_Part_1,
                              response.Q39_Part_2,
                              response.Q39_Part_3,
                              response.Q39_Part_4,
                              response.Q39_Part_5,
                              response.Q39_Part_6,
                              response.Q39_Part_7,
                              response.Q39_Part_8,
                              response.Q39_Part_9,
                              response.Q39_Part_10,
                              response.Q39_Part_11,
                              response.Q39_OTHER
])

ques_39 = pd.concat([response, pd.DataFrame(question_39)], ignore_index = True, axis = 1)

ques_39.columns = np.append(response.columns.values, 'Q39')

In [None]:
sns.set(font_scale=1.4)
plt.figure(figsize=(17,10))

ax = sns.countplot(y = ques_39.Q39, data = response, palette = "rainbow")

total = len(response['Q39_Part_1'])
for p in ax.patches:
        percentage = '{:.3}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))


plt.title('Favorite media sources that report on data science topics',
         fontsize =30)

plt.xlabel('Frequency', fontsize = 24)
plt.ylabel('Media Resources', fontsize = 24)

plt.show()

Most people use **`Kaggle`**; **`YouTube`** is ranked 2nd, followed by **`Blogs`**.