**Welcome here..**   
This notebook analyzes what the Kaggle survey data says about different aspects of Bangladeshi Kagglers in the Kaggle community. It is a work in progress, so suggestions and recommendations are welcome. If you find it useful, please consider upvoting.

In [None]:
import os
import numpy as np 
import pandas as pd 
import plotly.express as px
import seaborn as sns
from plotly.subplots import make_subplots
import plotly.graph_objects as go

import plotly.offline as pyo
pyo.init_notebook_mode()
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Where are we?

In [None]:
import plotly.express as px
fig = px.choropleth(locations=["BGD"], locationmode = "ISO-3", color_continuous_scale="Plasma")
fig.show()

**Let's take a look at where we are on the world map (marked in blue)..**

In [None]:
#reading data in pandas dataframe
df_survey_20 = pd.read_csv("../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv", low_memory=False)
df_survey_19 = pd.read_csv("../input/kagglesurvey2019/kaggle-survey-2019/multiple_choice_responses.csv", low_memory=False)
df_survey_18 = pd.read_csv("../input/kaggle-survey-2018/multipleChoiceResponses.csv", low_memory=False)
df_survey_17 = pd.read_csv("../input/kaggle-survey-2017/multipleChoiceResponses.csv", low_memory=False, encoding='ISO-8859-1')

An analysis of the survey questions from 2018 to 2020 can be found [here](https://www.kaggle.com/sharif485/what-kaggle-is-asking-over-the-time-2018-2020). In it, I went through all the questions to find out the range of topics Kaggle asked about over time. That notebook helped a lot in creating this one. Feel free to check it out.

# We are not in kaggle-survey-2017 data!

Let's start with the **kaggle-survey-2017** data.

In [None]:
df_survey_17.Country.unique()

It seems that Bangladesh is not in the list of countries! As per the rules of [kaggle-survey-2017](https://www.kaggle.com/kaggle/kaggle-survey-2017), a country name appears in the country list only if its total response count exceeds 50; otherwise its responses are grouped under "Other". Unfortunately, this is what happened to our country in 2017, so I have to skip the 2017 survey data. All the comments below are based on the data. Let's explore the other years...
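Rather than scanning the printed array by eye, the same check can be done with a membership test. The sketch below uses a small made-up frame (`toy_17`, a hypothetical stand-in for `df_survey_17`); its rows are illustrative only, not real survey data.

```python
import pandas as pd

# Toy stand-in for df_survey_17; the rows are illustrative, not real survey data.
toy_17 = pd.DataFrame({"Country": ["India", "United States", "Other", "Other", None]})

# Collect the distinct, non-missing country names.
countries = set(toy_17["Country"].dropna().unique())
has_bangladesh = "Bangladesh" in countries

# Responses from small countries would be hidden inside this bucket.
other_count = int((toy_17["Country"] == "Other").sum())

print(has_bangladesh, other_count)
```

On the real data, replacing `toy_17` with `df_survey_17` should give the same kind of answer without reading through the whole list.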

# Summary of total survey response from Bangladesh:

Let's start with a summary of the data: how many responses from Bangladesh did we get each year?

In [None]:
count_df_list = []
df_bd_20 = df_survey_20.loc[df_survey_20["Q3"] == "Bangladesh"]
df_bd_20_row, col = df_bd_20.shape
count_df_list.append(df_bd_20_row)
df_bd_19 = df_survey_19.loc[df_survey_19["Q3"] == "Bangladesh"]
df_bd_19_row, col = df_bd_19.shape
count_df_list.append(df_bd_19_row)
df_bd_18 = df_survey_18.loc[df_survey_18["Q3"] == "Bangladesh"]
df_bd_18_row, col = df_bd_18.shape
count_df_list.append(df_bd_18_row)
years_ = ["2020", "2019", "2018"]

data = [go.Bar(
   x = years_,
   y = count_df_list
)]
fig = go.Figure(data=data)
fig.update_layout(title_text='Survey participants per year', width=500, title_x=0.5)

The number of survey responses from our country has increased over time: it went from 107 in 2018 to 137 in 2019, and this year (2020) we got 143 responses.
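To put these counts in perspective, the year-over-year growth can be worked out directly. This is a minimal sketch that uses only the three counts quoted above; the variable names are illustrative.

```python
# Response counts quoted in the text above.
responses = {2018: 107, 2019: 137, 2020: 143}

years = sorted(responses)
growth = {}
for prev, curr in zip(years, years[1:]):
    # Percentage change relative to the previous year, rounded to one decimal.
    growth[curr] = round((responses[curr] - responses[prev]) / responses[prev] * 100, 1)

print(growth)
```

This gives roughly 28% growth from 2018 to 2019 and about 4% from 2019 to 2020, so most of the increase happened in the first of the two steps.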

# Let's have a look in other questions...

I will start with the questions asked most frequently over time (2018-2020). I have already [shared a notebook](https://www.kaggle.com/sharif485/what-kaggle-is-asking-over-the-time-2018-2020) (if you found it useful, don't forget to upvote) that covers the different types of questions Kaggle asked over time. I found that 15 questions were asked in every year. Let's focus on those first...


# Age Distribution..

In [None]:
#age distribution
df_age_20 = df_bd_20.groupby("Q1").count().Q2
df_age_19 = df_bd_19.groupby("Q1").count().Q2
df_age_18 = df_bd_18.groupby("Q2").count().Q1

fig = make_subplots(rows = 1, cols = 3,
                   specs = [[{"type": "bar"}, {"type": "bar"}, {"type": "bar"}]],
                   subplot_titles=('2020', '2019', '2018'))

fig.add_trace(go.Bar(x=df_age_20.index, y=df_age_20.values,name='2020'),row=1, col=1)
fig.add_trace(go.Bar(x=df_age_19.index, y=df_age_19.values,name='2019'),row=1, col=2)
fig.add_trace(go.Bar(x=df_age_18.index, y=df_age_18.values,name='2018'),row=1, col=3)

fig.update_layout(title_text='Age Group Distribution', title_x=0.5)

From the figures above, people aged 22 to 29 form the largest group of respondents, which suggests that this age range is the most interested in the data science sector in Bangladesh.

# Gender Distribution..

In [None]:
#gender distribution
# df_bd_* is already filtered to Bangladesh, so the question-text row is gone
# and no leading row needs to be skipped.
gen_dist_20 = df_bd_20.Q2.value_counts()
gen_dist_19 = df_bd_19.Q2.value_counts()
gen_dist_18 = df_bd_18.Q1.value_counts()
#pie slice label renaming
as_list = gen_dist_20.index.tolist()
idx_man = as_list.index('Man')
idx_woman = as_list.index('Woman')
as_list[idx_man] = 'Male'
as_list[idx_woman] = 'Female'
gen_dist_20.index = as_list

fig=make_subplots(rows=1,cols=3,
                  specs=[[{"type": "pie"},{"type": "pie"},{"type": "pie"}]]
                 ,subplot_titles=('2020','2019','2018'))
fig.add_trace(go.Pie(labels=gen_dist_20.index[:2], values=gen_dist_20.values[:2], hole=0.2, name='2020', pull=[0,0.3]), row=1, col=1)
fig.add_trace(go.Pie(labels=gen_dist_19.index[:2], values=gen_dist_19.values[:2], hole=0.2, name='2019', pull=[0,0.3]), row=1, col=2)
fig.add_trace(go.Pie(labels=gen_dist_18.index[:2], values=gen_dist_18.values[:2], hole=0.2, name='2018', pull=[0,0.3]), row=1, col=3)

fig.update_layout(title_text='Gender Distribution', title_x=0.5)

Based on the pie charts, the share of responses from female Kagglers increased noticeably in 2020. This suggests that women are engaging with data science more and more, and may continue to do so in the coming years.
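Because the number of respondents differs from year to year, shares are easier to compare than raw counts. The sketch below shows one way to compute them with `value_counts(normalize=True)`; the two series are hypothetical answers, not the real Bangladeshi responses.

```python
import pandas as pd

# Hypothetical answers for two survey years; not the real Bangladeshi data.
answers_a = pd.Series(["Male"] * 9 + ["Female"] * 1)
answers_b = pd.Series(["Man"] * 8 + ["Woman"] * 2)

# normalize=True returns fractions of the total instead of raw counts.
share_a = answers_a.value_counts(normalize=True)
share_b = answers_b.value_counts(normalize=True)

print(share_a["Female"], share_b["Woman"])
```

The same call on `df_bd_20.Q2`, `df_bd_19.Q2` and `df_bd_18.Q1` would yield the fractions that the pie charts display.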

# Educational Background..

In [None]:
# educational background
# df_bd_* contains only Bangladeshi respondents, so no leading row is skipped
deg_dist_20=df_bd_20.Q4.value_counts()
deg_dist_19=df_bd_19.Q4.value_counts()
deg_dist_18=df_bd_18.Q4.value_counts()

# shorten the longest answer text so it fits in the pie legend
as_list_20 = deg_dist_20.index.tolist()
as_list_19 = deg_dist_19.index.tolist()
as_list_18 = deg_dist_18.index.tolist()
idx_20 = as_list_20.index("Some college/university study without earning a bachelor’s degree")
idx_19 = as_list_19.index("Some college/university study without earning a bachelor’s degree")
idx_18 = as_list_18.index("Some college/university study without earning a bachelor’s degree")
as_list_20[idx_20] = "No bachelor's degree"
as_list_19[idx_19] = "No bachelor's degree"
as_list_18[idx_18] = "No bachelor's degree"
deg_dist_20.index = as_list_20
deg_dist_19.index = as_list_19
deg_dist_18.index = as_list_18

fig=make_subplots(rows=1,cols=3,
                  specs=[[{"type": "pie"},{"type": "pie"},{"type": "pie"}]]
                 ,subplot_titles=('2020','2019','2018'))
fig.add_trace(go.Pie(labels=deg_dist_20.index[:4], values=deg_dist_20.values[:4], hole=0.2,name='2020', pull=[0,0.1]), row=1, col=1)
fig.add_trace(go.Pie(labels=deg_dist_19.index[:4], values=deg_dist_19.values[:4], hole=0.2,name='2019', pull=[0,0.1]), row=1, col=2)
fig.add_trace(go.Pie(labels=deg_dist_18.index[:4], values=deg_dist_18.values[:4], hole=0.2, name='2018', pull=[0,0.1]), row=1, col=3)

fig.update_layout(title_text='Educational Background', title_x=0.5)

Kagglers with bachelor's degrees dominate on Kaggle, which also matches reality. The charts above show that the percentage of master's degree holders is not increasing the way the percentage of bachelor's degree holders is. There is also a slight increase in doctoral degree holders in Bangladesh.

# Most used Programming Language..

In [None]:
#most used language in day to day life
val_count_20 ={
    "Python": (df_bd_20["Q7_Part_1"].value_counts().values[0]),
    "R": (df_bd_20["Q7_Part_2"].value_counts().values[0]),
    "SQL": (df_bd_20["Q7_Part_3"].value_counts().values[0]),
    "C/C++": (df_bd_20["Q7_Part_4"].value_counts().values[0] + df_bd_20["Q7_Part_5"].value_counts().values[0]),
    "Java": (df_bd_20["Q7_Part_6"].value_counts().values[0]),
    "JavaScript": (df_bd_20["Q7_Part_7"].value_counts().values[0]),
    "Bash": (df_bd_20["Q7_Part_10"].value_counts().values[0]),
    "Matlab": (df_bd_20["Q7_Part_11"].value_counts().values[0]),
}

val_count_19 ={
    "Python": (df_bd_19["Q18_Part_1"].value_counts().values[0]),
    "R": (df_bd_19["Q18_Part_2"].value_counts().values[0]),
    "SQL": (df_bd_19["Q18_Part_3"].value_counts().values[0]),
    # both parts must come from the 2019 frame (the second one previously read df_bd_20)
    "C/C++": (df_bd_19["Q18_Part_4"].value_counts().values[0] + df_bd_19["Q18_Part_5"].value_counts().values[0]),
    "Java": (df_bd_19["Q18_Part_6"].value_counts().values[0]),
    "JavaScript": (df_bd_19["Q18_Part_7"].value_counts().values[0]),
    "Bash": (df_bd_19["Q18_Part_9"].value_counts().values[0]),
    "Matlab": (df_bd_19["Q18_Part_10"].value_counts().values[0]),
}

val_count_18 ={
    "Python": (df_bd_18["Q16_Part_1"].value_counts().values[0]),
    "R": (df_bd_18["Q16_Part_2"].value_counts().values[0]),
    "SQL": (df_bd_18["Q16_Part_3"].value_counts().values[0]),
    "C/C++": (df_bd_18["Q16_Part_8"].value_counts().values[0]),
    "Java": (df_bd_18["Q16_Part_5"].value_counts().values[0]),
    "JavaScript": (df_bd_18["Q16_Part_6"].value_counts().values[0]),
    "Bash": (df_bd_18["Q16_Part_4"].value_counts().values[0]),
    "Matlab": (df_bd_18["Q16_Part_9"].value_counts().values[0]),
}

lst_2018 = []
lst_2019 = []
lst_2020 = []
label_keys = list(val_count_20.keys())
for i in range(len(label_keys)):
    lst_2018.append(val_count_18[label_keys[i]])
    lst_2019.append(val_count_19[label_keys[i]])
    lst_2020.append(val_count_20[label_keys[i]])
data = {
    "dt_18":lst_2018,
    "dt_19": lst_2019,
    "dt_20": lst_2020,
    "labels": label_keys
}

fig = go.Figure(
    data=[
        go.Bar(
            name="2020",
            x=data["labels"],
            y=data["dt_20"],
            offsetgroup=0,
        ),
        go.Bar(
            name="2019",
            x=data["labels"],
            y=data["dt_19"],
            offsetgroup=1,
        ),
        
        go.Bar(
            name="2018",
            x=data["labels"],
            y=data["dt_18"],
            offsetgroup=2,
        )
    ],
    layout=go.Layout(
        title="Most used programming language in day to day life",
        yaxis_title="Number of Kaggler",
        title_x=0.5
    )
)

fig.show()

Since the respondents are data science enthusiasts, it is no surprise that Python has been the most used programming language among Kagglers in every year. Along with Python, languages such as SQL, C/C++, Java, JavaScript, and R are also widely used.
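The counts above rely on `value_counts().values[0]`, which raises an `IndexError` when nobody in the subset selected an option. A more defensive way to count a multi-select column is sketched below. `count_selected` is a hypothetical helper, not part of this notebook, and it assumes each `..._Part_n` column holds either the option text or a missing value.

```python
import pandas as pd

def count_selected(df, column):
    """Number of respondents who ticked the option stored in `column`."""
    if column not in df.columns:
        return 0  # option not offered in that year's survey
    # Every non-missing cell is one respondent who selected the option.
    return int(df[column].notna().sum())

# Toy frame with one selected option and one option nobody picked.
toy = pd.DataFrame({
    "Q7_Part_1": ["Python", None, "Python"],
    "Q7_Part_2": [None, None, None],
})

print(count_selected(toy, "Q7_Part_1"),
      count_selected(toy, "Q7_Part_2"),
      count_selected(toy, "Q7_Part_99"))
```

With such a helper, an unselected or unlisted option simply counts as 0 instead of stopping the cell with an exception.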

# Time invested to study ML..

In [None]:
#studying ML
# df_bd_* contains only Bangladeshi respondents, so no leading row is skipped
study_ml_20=df_bd_20.Q15.value_counts()
study_ml_19=df_bd_19.Q23.value_counts()
study_ml_18=df_bd_18.Q25.value_counts()

as_list_20 = study_ml_20.index.tolist()
as_list_19 = study_ml_19.index.tolist()
as_list_18 = study_ml_18.index.tolist()
idx_20 = as_list_20.index('Under 1 year')
idx_19 = as_list_19.index('< 1 years')
idx_18 = as_list_18.index('I have never studied machine learning but plan to learn in the future')
as_list_20[idx_20] = '< 1 year'
as_list_19[idx_19] = '< 1 year'
as_list_18[idx_18] = 'never studied ML'
study_ml_20.index = as_list_20
study_ml_19.index = as_list_19
study_ml_18.index = as_list_18

fig=make_subplots(rows=1,cols=3,
                  specs=[[{"type": "pie"},{"type": "pie"},{"type": "pie"}]]
                 ,subplot_titles=('2020','2019','2018'))
fig.add_trace(go.Pie(labels=study_ml_20.index[:6], values=study_ml_20.values[:6], hole=0.1, name='2020', pull=[0,0.1]), row=1, col=1)
fig.add_trace(go.Pie(labels=study_ml_19.index[:6], values=study_ml_19.values[:6], hole=0.1, name='2019', pull=[0,0.1]), row=1, col=2)
fig.add_trace(go.Pie(labels=study_ml_18.index[:6], values=study_ml_18.values[:6], hole=0.1, name='2018', pull=[0,0.1]), row=1, col=3)

fig.update_layout(title_text='How much time is invested to study ML', title_x=0.5)

The charts show that most of the Kaggle participants are newcomers: they are either fresh graduates with less than a year of machine learning experience or have between 1 and 2 years of experience.
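The answer wording for this question changes between survey years (for example `'Under 1 year'` versus `'< 1 years'`), which is why the labels are rewritten before plotting. Note that `list.index()` raises a `ValueError` if a label is absent from the subset. A possible alternative, sketched here on hypothetical counts, is `Series.rename` with a mapping, which leaves unmatched labels untouched.

```python
import pandas as pd

# Hypothetical counts indexed by the raw answer text of one survey year.
raw = pd.Series({"Under 1 year": 4, "1-2 years": 3, "2-3 years": 1})

# Map each year's wording onto one shared label; unmapped labels stay unchanged.
label_map = {"Under 1 year": "< 1 year", "< 1 years": "< 1 year"}
harmonized = raw.rename(index=label_map)

print(list(harmonized.index))
```

The same `label_map` could then be applied to all three years, so the pie slices use identical names.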

# Impact of Online Mediums..

In [None]:
#importance of online courses
val_count_20 ={
    "Coursera": (df_bd_20["Q37_Part_1"].value_counts().values[0]),
    "Edx": (df_bd_20["Q37_Part_2"].value_counts().values[0]),
    "Kaggle_Learn": (df_bd_20["Q37_Part_3"].value_counts().values[0]),
    "DataCamp": (df_bd_20["Q37_Part_4"].value_counts().values[0]),
    "Fastai": (df_bd_20["Q37_Part_5"].value_counts().values[0]),
    "Udacity": (df_bd_20["Q37_Part_6"].value_counts().values[0]),
    "Udemy": (df_bd_20["Q37_Part_7"].value_counts().values[0]),
    "Linkedin_learning": (df_bd_20["Q37_Part_8"].value_counts().values[0]),
    "From_university": (df_bd_20["Q37_Part_10"].value_counts().values[0])
}
val_count_19 ={
    "Coursera": (df_bd_19["Q13_Part_2"].value_counts().values[0]),
    "Edx": (df_bd_19["Q13_Part_3"].value_counts().values[0]),
    "Kaggle_Learn": (df_bd_19["Q13_Part_6"].value_counts().values[0]),
    "DataCamp": (df_bd_19["Q13_Part_4"].value_counts().values[0]),
    "Fastai": (df_bd_19["Q13_Part_7"].value_counts().values[0]),
    "Udacity": (df_bd_19["Q13_Part_1"].value_counts().values[0]),
    "Udemy": (df_bd_19["Q13_Part_8"].value_counts().values[0]),
    "Linkedin_learning": (df_bd_19["Q13_Part_9"].value_counts().values[0]),
    "From_university": (df_bd_19["Q13_Part_10"].value_counts().values[0]),
}
val_count_18 ={
    "Coursera": (df_bd_18["Q36_Part_2"].value_counts().values[0]),
    "Edx": (df_bd_18["Q36_Part_3"].value_counts().values[0]),
    "Kaggle_Learn": (df_bd_18["Q36_Part_6"].value_counts().values[0]),
    "DataCamp": (df_bd_18["Q36_Part_4"].value_counts().values[0]),
    "Fastai": (df_bd_18["Q36_Part_7"].value_counts().values[0]),
    "Udacity": (df_bd_18["Q36_Part_1"].value_counts().values[0]),
    "Udemy": (df_bd_18["Q36_Part_9"].value_counts().values[0]),
    #No data given for Linkedin_learning
    "Linkedin_learning": 0,
    "From_university": (df_bd_18["Q36_Part_11"].value_counts().values[0]),
}
lst_2018 = []
lst_2019 = []
lst_2020 = []
label_keys = list(val_count_20.keys())
for i in range(len(label_keys)):
    lst_2018.append(val_count_18[label_keys[i]])
    lst_2019.append(val_count_19[label_keys[i]])
    lst_2020.append(val_count_20[label_keys[i]])
data = {
    "dt_18":lst_2018,
    "dt_19": lst_2019,
    "dt_20": lst_2020,
    "labels": label_keys
}

fig = go.Figure(
    data=[
        go.Bar(
            name="2020",
            x=data["labels"],
            y=data["dt_20"],
            offsetgroup=0,
        ),
        go.Bar(
            name="2019",
            x=data["labels"],
            y=data["dt_19"],
            offsetgroup=1,
        ),
        go.Bar(
            name="2018",
            x=data["labels"],
            y=data["dt_18"],
            offsetgroup=2,
        )
        
    ],
    layout=go.Layout(
        title="Importance of online courses",
        yaxis_title="Number of Kaggler",
        title_x=0.5
    )
)

fig.show()

It is not surprising that, for gaining knowledge, we mostly depend on online learning platforms, which offer a variety of free and paid courses. The chart also shows that students depend far more on online courses than on the courses taught at university. Among those platforms, Coursera has received the most attention in every year, likely because of its well-designed courses.
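Some options were not listed in every survey year, which is why a few entries in the dictionaries are hard-coded to 0. One possible way to handle this, sketched below with hypothetical counts and label names, is to leave missing options out and let `dict.get` supply the zero when the per-year lists are built.

```python
# Hypothetical per-year counts; a platform missing from a year is simply absent.
counts_by_year = {
    "2020": {"Coursera": 5, "Udemy": 3, "LinkedIn Learning": 1},
    "2019": {"Coursera": 4, "Udemy": 2, "LinkedIn Learning": 1},
    "2018": {"Coursera": 3, "Udemy": 1},
}

# Use the most recent year's options as the x-axis labels.
labels = list(counts_by_year["2020"])

# dict.get(label, 0) fills in 0 for options a given year did not list.
series = {
    year: [counts.get(label, 0) for label in labels]
    for year, counts in counts_by_year.items()
}

print(series["2018"])
```

Each list in `series` lines up with `labels`, so it could be passed as the `y` values of the corresponding `go.Bar` trace.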

# Most preferred IDEs..

In [None]:
#which IDE is used mostly
val_count_20 ={
    "Jupyter": (df_bd_20["Q9_Part_1"].value_counts().values[0]),
    "RStudio": (df_bd_20["Q9_Part_2"].value_counts().values[0]),
    "Pycharm": (df_bd_20["Q9_Part_5"].value_counts().values[0]),
    "MATLAB": (df_bd_20["Q9_Part_10"].value_counts().values[0]),
    "VS_Std+Code": (df_bd_20["Q9_Part_3"].value_counts().values[0]) +
                    (df_bd_20["Q9_Part_4"].value_counts().values[0]),
    "Spyder": (df_bd_20["Q9_Part_6"].value_counts().values[0]),
    "VIM": (df_bd_20["Q9_Part_9"].value_counts().values[0]),
    "Sublime": (df_bd_20["Q9_Part_8"].value_counts().values[0]),
    "Notepad++": (df_bd_20["Q9_Part_7"].value_counts().values[0]),
}                   
val_count_19 ={
    "Jupyter": (df_bd_19["Q16_Part_1"].value_counts().values[0]),
    "RStudio": (df_bd_19["Q16_Part_2"].value_counts().values[0]),
    "Pycharm": (df_bd_19["Q16_Part_3"].value_counts().values[0]),
    "MATLAB": (df_bd_19["Q16_Part_5"].value_counts().values[0]),
    "VS_Std+Code": (df_bd_19["Q16_Part_6"].value_counts().values[0]),
    "Spyder": (df_bd_19["Q16_Part_7"].value_counts().values[0]),
    "VIM": (df_bd_19["Q16_Part_8"].value_counts().values[0]),
    "Sublime": (df_bd_19["Q16_Part_10"].value_counts().values[0]),
    "Notepad++": (df_bd_19["Q16_Part_9"].value_counts().values[0]),
}
val_count_18 ={
    "Jupyter": (df_bd_18["Q13_Part_1"].value_counts().values[0]),
    "RStudio": (df_bd_18["Q13_Part_2"].value_counts().values[0]),
    "Pycharm": (df_bd_18["Q13_Part_3"].value_counts().values[0]),
    "MATLAB": (df_bd_18["Q13_Part_7"].value_counts().values[0]),
    "VS_Std+Code": (df_bd_18["Q13_Part_4"].value_counts().values[0]) + 
                (df_bd_18["Q13_Part_8"].value_counts().values[0]),
    "Spyder": (df_bd_18["Q13_Part_12"].value_counts().values[0]),
    "VIM": (df_bd_18["Q13_Part_11"].value_counts().values[0]),
    "Sublime": (df_bd_18["Q13_Part_10"].value_counts().values[0]),
    "Notepad++": (df_bd_18["Q13_Part_9"].value_counts().values[0]),
}

lst_2018 = []
lst_2019 = []
lst_2020 = []
label_keys = list(val_count_20.keys())
for i in range(len(label_keys)):
    lst_2018.append(val_count_18[label_keys[i]])
    lst_2019.append(val_count_19[label_keys[i]])
    lst_2020.append(val_count_20[label_keys[i]])
data = {
    "dt_18":lst_2018,
    "dt_19": lst_2019,
    "dt_20": lst_2020,
    "labels": label_keys
}

fig = go.Figure(
    data=[
        go.Bar(
            name="2020",
            x=data["labels"],
            y=data["dt_20"],
            offsetgroup=0,
        ),
        go.Bar(
            name="2019",
            x=data["labels"],
            y=data["dt_19"],
            offsetgroup=1,
        ),
        go.Bar(
            name="2018",
            x=data["labels"],
            y=data["dt_18"],
            offsetgroup=2,
        )
        
    ],
    layout=go.Layout(
        title="Which IDE is most liked",
        yaxis_title="Number of Kaggler",
        title_x=0.5
    )
)

fig.show()

# Usages of Hosted Notebooks..

In [None]:
#0 denotes the product was not listed in that year's survey
val_count_20 ={
    "Kaggle_NBook": (df_bd_20["Q10_Part_1"].value_counts().values[0]),
    "Colab_NBook": (df_bd_20["Q10_Part_2"].value_counts().values[0]),
    "Azure_NBook": (df_bd_20["Q10_Part_3"].value_counts().values[0]),
    "Paperspace/Gradient": (df_bd_20["Q10_Part_4"].value_counts().values[0]),
    "Binder/JupyterHub": (df_bd_20["Q10_Part_5"].value_counts().values[0]),
    "IBM_WNBook": (df_bd_20["Q10_Part_7"].value_counts().values[0]),
    "Amazon_EMR/SM": (df_bd_20["Q10_Part_8"].value_counts().values[0]) + (df_bd_20["Q10_Part_9"].value_counts().values[0]),
    "Google_AI/Datalab": (df_bd_20["Q10_Part_10"].value_counts().values[0]) + (df_bd_20["Q10_Part_11"].value_counts().values[0]),
}

val_count_19 ={
    "Kaggle_NBook": (df_bd_19["Q17_Part_1"].value_counts().values[0]),
    "Colab_NBook": (df_bd_19["Q17_Part_2"].value_counts().values[0]),
    "Azure_NBook": (df_bd_19["Q17_Part_3"].value_counts().values[0]),
    "Paperspace/Gradient": 0,
    "Binder/JupyterHub": (df_bd_19["Q17_Part_7"].value_counts().values[0]),
    "IBM_WNBook": (df_bd_19["Q17_Part_8"].value_counts().values[0]),
    "Amazon_EMR/SM": (df_bd_19["Q17_Part_10"].value_counts().values[0]),
    "Google_AI/Datalab": (df_bd_19["Q17_Part_4"].value_counts().values[0]),
}

val_count_18 ={
    "Kaggle_NBook": (df_bd_18["Q14_Part_1"].value_counts().values[0]),
    "Colab_NBook": (df_bd_18["Q14_Part_2"].value_counts().values[0]),
    "Azure_NBook": (df_bd_18["Q14_Part_3"].value_counts().values[0]),
    "Paperspace/Gradient": (df_bd_18["Q14_Part_6"].value_counts().values[0]),
    "Binder/JupyterHub": (df_bd_18["Q14_Part_9"].value_counts().values[0]),
    "IBM_WNBook": 0,
    "Amazon_EMR/SM": 0,
    "Google_AI/Datalab": (df_bd_18["Q14_Part_5"].value_counts().values[0]),
}

lst_2018 = []
lst_2019 = []
lst_2020 = []
label_keys = list(val_count_20.keys())
for i in range(len(label_keys)):
    lst_2018.append(val_count_18[label_keys[i]])
    lst_2019.append(val_count_19[label_keys[i]])
    lst_2020.append(val_count_20[label_keys[i]])
data = {
    "dt_18":lst_2018,
    "dt_19": lst_2019,
    "dt_20": lst_2020,
    "labels": label_keys
}

fig = go.Figure(
    data=[
        go.Bar(
            name="2020",
            x=data["labels"],
            y=data["dt_20"],
            offsetgroup=0,
        ),
        go.Bar(
            name="2019",
            x=data["labels"],
            y=data["dt_19"],
            offsetgroup=1,
        ),
        go.Bar(
            name="2018",
            x=data["labels"],
            y=data["dt_18"],
            offsetgroup=2,
        ),
        
    ],
    layout=go.Layout(
        title="Used Hosted Notebook",
        yaxis_title="Number of Kaggler",
        title_x=0.5
    )
)

fig.show()

# Employment Status..

In [None]:
#employment status
# df_bd_* contains only Bangladeshi respondents, so no leading row is skipped
emp_stat_20=df_bd_20.Q5.value_counts()
emp_stat_19=df_bd_19.Q5.value_counts()
emp_stat_18=df_bd_18.Q6.value_counts()

#pie slice label renaming
as_list = emp_stat_20.index.tolist()
idx_not = as_list.index('Currently not employed')
as_list[idx_not] = 'Not employed'
emp_stat_20.index = as_list

fig=make_subplots(rows=1,cols=3,
                  specs=[[{"type": "pie"},{"type": "pie"},{"type": "pie"}]]
                 ,subplot_titles=('2020','2019','2018'))
fig.add_trace(go.Pie(labels=emp_stat_20.index[:6], values=emp_stat_20.values[:6], hole=0.1, name='2020', pull=[0,0.2]), row=1, col=1)
fig.add_trace(go.Pie(labels=emp_stat_19.index[:6], values=emp_stat_19.values[:6], hole=0.1, name='2019', pull=[0,0.2]), row=1, col=2)
fig.add_trace(go.Pie(labels=emp_stat_18.index[:6], values=emp_stat_18.values[:6], hole=0.1, name='2018', pull=[0,0.2]), row=1, col=3)

fig.update_layout(title_text='Employment Status', title_x=0.5)

Most of the Kaggle participants from our country are students, and in 2020 their share increased significantly. A noticeable change in the figure is that the share of respondents who are not employed has increased dramatically compared to other years. The global pandemic may be the root cause of that. Another noticeable point is that in 2020 employment in machine learning engineer positions increased considerably.

# Most used Machine Learning or Deep Learning Frameworks..

In [None]:
#most used Framework
val_count_20 ={
    "Scikit-Learn": (df_bd_20["Q16_Part_1"].value_counts().values[0]),
    "TensorFlow": (df_bd_20["Q16_Part_2"].value_counts().values[0]),
    "Keras": (df_bd_20["Q16_Part_3"].value_counts().values[0]),
    "PyTorch": (df_bd_20["Q16_Part_4"].value_counts().values[0]),
    "Fastai": (df_bd_20["Q16_Part_5"].value_counts().values[0]),
    "Boosting": (df_bd_20["Q16_Part_7"].value_counts().values[0]) + 
                (df_bd_20["Q16_Part_8"].value_counts().values[0]) + 
                (df_bd_20["Q16_Part_9"].value_counts().values[0]),
    "Caret": (df_bd_20["Q16_Part_12"].value_counts().values[0]),
    "Mxnet+H20+Prophet": 0 + (df_bd_20["Q16_Part_11"].value_counts().values[0]) +
                        (df_bd_20["Q16_Part_10"].value_counts().values[0]),
}
val_count_19 ={
    "Scikit-Learn": (df_bd_19["Q28_Part_1"].value_counts().values[0]),
    "TensorFlow": (df_bd_19["Q28_Part_2"].value_counts().values[0]),
    "Keras": (df_bd_19["Q28_Part_3"].value_counts().values[0]),
    "PyTorch": (df_bd_19["Q28_Part_6"].value_counts().values[0]),
    "Fastai": (df_bd_19["Q28_Part_10"].value_counts().values[0]),
    "Boosting": (df_bd_19["Q28_Part_5"].value_counts().values[0]) + (df_bd_19["Q28_Part_8"].value_counts().values[0]),
    "Caret": (df_bd_19["Q28_Part_7"].value_counts().values[0]),
    #no data is given for Mxnet+H20+Prophet
    "Mxnet+H20+Prophet": 0+0+0,
}
val_count_18 ={
    # 2018 values must come from the 2018 frame (these four previously read df_bd_20)
    "Scikit-Learn": (df_bd_18["Q19_Part_1"].value_counts().values[0]),
    "TensorFlow": (df_bd_18["Q19_Part_2"].value_counts().values[0]),
    "Keras": (df_bd_18["Q19_Part_3"].value_counts().values[0]),
    "PyTorch": (df_bd_18["Q19_Part_4"].value_counts().values[0]),
    "Fastai": (df_bd_18["Q19_Part_7"].value_counts().values[0]),
    "Boosting": (df_bd_18["Q19_Part_10"].value_counts().values[0]) +
                (df_bd_18["Q19_Part_13"].value_counts().values[0]) +
                (df_bd_18["Q19_Part_14"].value_counts().values[0]),
    "Caret": (df_bd_18["Q19_Part_9"].value_counts().values[0]),
    "Mxnet+H20+Prophet": (df_bd_18["Q19_Part_8"].value_counts().values[0]) +
                        (df_bd_18["Q19_Part_6"].value_counts().values[0]) +
                        (df_bd_18["Q19_Part_12"].value_counts().values[0]),
}
lst_2018 = []
lst_2019 = []
lst_2020 = []
label_keys = list(val_count_20.keys())
for i in range(len(label_keys)):
    lst_2018.append(val_count_18[label_keys[i]])
    lst_2019.append(val_count_19[label_keys[i]])
    lst_2020.append(val_count_20[label_keys[i]])
data = {
    "dt_18":lst_2018,
    "dt_19": lst_2019,
    "dt_20": lst_2020,
    "labels": label_keys
}

fig = go.Figure(
    data=[
        go.Bar(
            name="2020",
            x=data["labels"],
            y=data["dt_20"],
            offsetgroup=2,
        ),
        go.Bar(
            name="2019",
            x=data["labels"],
            y=data["dt_19"],
            offsetgroup=1,
        ),
        go.Bar(
            name="2018",
            x=data["labels"],
            y=data["dt_18"],
            offsetgroup=0,
        )
        
    ],
    layout=go.Layout(
        title="Most used machine learning or deep learning framework",
        yaxis_title="Number of Kaggler",
        title_x=0.5
    )
)

fig.show()

When it comes to the favourite deep learning frameworks, TensorFlow and Keras are still the most popular in the community. The number of PyTorch users is also growing gradually, but not as fast as for TensorFlow or Keras. For machine learning, Scikit-Learn always gets top priority thanks to its rich set of modules. The use of different kinds of boosting algorithms is also on the rise.

# Earnings..

In [None]:
#Earning
df_sal_20 = df_bd_20.groupby("Q24").count().Q1
df_sal_19 = df_bd_19.groupby("Q10").count().Q1
df_sal_18 = df_bd_18.groupby("Q9").count()

as_list = df_sal_18.index.tolist()
idx = as_list.index('I do not wish to disclose my approximate yearly compensation')
as_list[idx] = 'concealed'
df_sal_18.index = as_list
df_sal_18 = df_sal_18.Q1

fig = make_subplots(rows = 1, cols = 3,
                   specs = [[{"type": "bar"}, {"type": "bar"}, {"type": "bar"}]],
                   subplot_titles=('2020', '2019', '2018'))

# groupby already returns a sorted index; sorting x alone could misalign labels and counts
fig.add_trace(go.Bar(x=df_sal_20.index, y=df_sal_20.values, name='2020'), row=1, col=1)
fig.add_trace(go.Bar(x=df_sal_19.index, y=df_sal_19.values, name='2019'), row=1, col=2)
fig.add_trace(go.Bar(x=df_sal_18.index, y=df_sal_18.values, name='2018'), row=1, col=3)

fig.update_layout(title_text='Earnings', title_x=0.5)

# Mostly used Cloud Service Providers..

In [None]:
#mostly used cloud service providers
val_count_20 ={
    "AWS": (df_bd_20["Q26_A_Part_1"].value_counts().values[0]),
    "Microsoft_Azure": (df_bd_20["Q26_A_Part_2"].value_counts().values[0]),
    "GCP": (df_bd_20["Q26_A_Part_3"].value_counts().values[0]),
    "IBM_Cloud": (df_bd_20["Q26_A_Part_4"].value_counts().values[0]),
    "Alibaba_cloud": (df_bd_20["Q26_A_Part_8"].value_counts().values[0]),
    "Oracle+Salesforce+VMware_cloud": (df_bd_20["Q26_A_Part_5"].value_counts().values[0]) + 
                                    #total VMware_cloud count = 0
                                    (df_bd_20["Q26_A_Part_6"].value_counts().values[0]) + 0
}
val_count_19 ={
    "AWS": (df_bd_19["Q29_Part_2"].value_counts().values[0]),
    "Microsoft_Azure": (df_bd_19["Q29_Part_3"].value_counts().values[0]),
    "GCP": (df_bd_19["Q29_Part_1"].value_counts().values[0]),
    #Added RED Hat cloud values with this
    "IBM_Cloud": (df_bd_19["Q29_Part_4"].value_counts().values[0]) +
                (df_bd_19["Q29_Part_9"].value_counts().values[0]),
    "Alibaba_cloud": (df_bd_19["Q29_Part_5"].value_counts().values[0]),
    "Oracle+Salesforce+VMware_cloud": (df_bd_19["Q29_Part_7"].value_counts().values[0]) + 
                                    #total VMware_cloud count = 0
                                    (df_bd_19["Q29_Part_6"].value_counts().values[0]) + 0
}
val_count_18 ={
    "AWS": (df_bd_18["Q15_Part_2"].value_counts().values[0]),
    "Microsoft_Azure": (df_bd_18["Q15_Part_3"].value_counts().values[0]),
    "GCP": (df_bd_18["Q15_Part_1"].value_counts().values[0]),
    "IBM_Cloud": (df_bd_18["Q15_Part_4"].value_counts().values[0]),
     #total Alibaba_cloud count = 0
    "Alibaba_cloud": 0,
    "Oracle+Salesforce+VMware_cloud": 0 + 0 + 0
}
lst_2018 = []
lst_2019 = []
lst_2020 = []
label_keys = list(val_count_20.keys())
for i in range(len(label_keys)):
    lst_2018.append(val_count_18[label_keys[i]])
    lst_2019.append(val_count_19[label_keys[i]])
    lst_2020.append(val_count_20[label_keys[i]])
data = {
    "dt_18":lst_2018,
    "dt_19": lst_2019,
    "dt_20": lst_2020,
    "labels": label_keys
}

fig = go.Figure(
    data=[
        go.Bar(
            name="2020",
            x=data["labels"],
            y=data["dt_20"],
            offsetgroup=2,
        ),
        go.Bar(
            name="2019",
            x=data["labels"],
            y=data["dt_19"],
            offsetgroup=1,
        ),
        go.Bar(
            name="2018",
            x=data["labels"],
            y=data["dt_18"],
            offsetgroup=0,
        )
    ],
    layout=go.Layout(
        title="Mostly used Cloud Service Providers",
        yaxis_title="Number of Kaggler",
        title_x=0.5
    )
)

fig.show()

It seems that Amazon, Microsoft, and Google are all gaining popularity again after a drop following 2018. Other cloud services also remain in the market.

# Still working on it..  

If you have any suggestions or comments, feel free to post them.