<img src='https://i.imgur.com/qCDfv08.gif' width="1200" height="1000">


<div style="
            border-style: solid;
            border-color:#a7a2e2;
            border-radius: 10px;
            border-width: 5px;
            color:#000000;
            font-size:20px;
            font-family: Tan Mon Cheri;
            background-color:#f4f1e9;
            letter-spacing:0.5px;
            padding: 0.7em;
            text-align:left">
    
    
Kaggle conducts an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. The survey was live from 09/01/2021 to 10/04/2021, and after cleaning the data we finished with 25,973 responses.<br>

There's a lot to explore here. The results include raw numbers about who is working with data, what’s happening with machine learning in different industries, and the best ways for new data scientists to break into the field. We've published the data in as raw a format as possible without compromising anonymization, which makes it an unusual example of a survey dataset.<br>

In our fifth year running this survey, we were once again awed by the global, diverse, and dynamic nature of the data science and machine learning industry. This survey data EDA provides an overview of the industry on an aggregate scale, but it also leaves us wanting to know more about the many specific communities represented in the survey. For that reason, we’re inviting the Kaggle community to dive deep into the survey datasets and help us tell the diverse stories of data scientists from around the world.</div>

<a id='top'></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:37px;text-align:center;border-radius:60px 40px;">Table Of Contents</p>

* [1. Loading Data](#1)
* [2. Countries Distribution](#2)
* [3. Kagglers Continents](#3)
* [4. Kagglers Age](#4)
* [5. Kagglers Gender](#5)
* [6. Kagglers Education](#6)
* [7. Kagglers Role](#7)
* [8. Kagglers Programming Experience](#8)
* [9. Kagglers Programming Languages](#9)
* [10. Kagglers IDE](#10)
* [11. Kagglers Visualization Libraries](#11)
* [12. Kagglers Machine Learning Algorithms](#12)
* [13. Kagglers Machine Learning Framework](#13)
* [14. Kagglers NLP Methods](#14)
* [15. Kagglers Computing Platform](#15)
* [16. Kagglers Cloud Computing Platform](#16)

<a id="1"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Loading Data</p>

In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import squarify
import warnings
warnings.filterwarnings('ignore')

In [None]:
survey_df = pd.read_csv("../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv")
survey_df.head(10)

In [None]:
survey_all = survey_df.iloc[1:,:]
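
The cell above skips the first row of the file, presumably because that row holds the question text rather than a response. As a minimal sketch under that assumption, the question text can be kept as a lookup table before it is dropped. The frame `raw` and the names `questions` and `responses` are hypothetical toy stand-ins, not part of the notebook:

```python
import pandas as pd

# Toy stand-in for the survey file (assumption: row 0 holds the question text).
raw = pd.DataFrame({
    "Q1": ["What is your age (# years)?", "25-29", "18-21"],
    "Q3": ["In which country do you currently reside?", "India", "Japan"],
})

# Keep the question text as a column -> question lookup, then drop it from the answers.
questions = raw.iloc[0].to_dict()
responses = raw.iloc[1:].reset_index(drop=True)

print(questions["Q3"])
print(responses["Q3"].nunique(), "countries in the toy data")
```

Keeping the lookup around makes it easier to label plots with the full question instead of a code such as `Q3`.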

In [None]:
print("\033[1m{} countries\033[0m took part in this survey".format(survey_all["Q3"].nunique()))

<a id="2"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Countries Distribution</p>

In [None]:
fig = px.treemap(survey_all, path=['Q3'], color='Q3')
fig.update_layout(margin = dict(t=60, l=15, r=15, b=15),
                  title_text="<b>Countries Distribution</b>",
                  title_x=0.5,
                  font=dict(family="serif", size=20, color='rgb(5, 14, 48)'))
fig.show()

<a id="3"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Continents</p>

* Asia has the most Kagglers, and most of them come from India, which has more Kagglers than any other country in the world.
* Except in Europe, one country dominates each continent, for example the USA, Nigeria and Australia.
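
One way to put a number on "dominates" is the share of each continent's respondents that comes from its largest country. The sketch below uses a small toy frame with the same column names as the notebook (`Continents`, `Q3`). The values are made up for illustration and are not survey results:

```python
import pandas as pd

# Toy data, not survey results: one row per respondent.
toy = pd.DataFrame({
    "Continents": ["Asia"] * 5 + ["Europe"] * 4,
    "Q3": ["India", "India", "India", "India", "Japan",
           "Germany", "France", "Spain", "Italy"],
})

# Respondents per (continent, country).
counts = toy.groupby(["Continents", "Q3"]).size()

# Share of each continent held by its largest country.
by_continent = counts.groupby(level="Continents")
top_share = by_continent.max() / by_continent.sum()
print(top_share)
```

A share close to 1 means a single country accounts for almost the whole continent, while a low share points to an even spread, as the bullet above describes for Europe.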

In [None]:
continents_df = pd.read_csv("../input/2020-kaggle-survey-supporting/2020 Continents.csv")
continents_dict = dict(zip(continents_df["Country"], continents_df["Continents"]))
# delete the first row
survey_df.drop(survey_df.index[0], inplace=True)
# add continent columns
survey_df["Continents"] = survey_df["Q3"].map(continents_dict)
# replacing long name countries
country_long_name_dict = {"United States of America": "USA", 
                          "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",                
                         }
survey_df["Q3"] = survey_df["Q3"].replace(country_long_name_dict)
# replacing gender other than man and woman into others
gender_dict = {"Nonbinary": "Others", "Prefer not to say": "Others", "Prefer to self-describe": "Others"}
survey_df["Q2"] = survey_df["Q2"].replace(gender_dict)

In [None]:
survey_asia_df = survey_df[survey_df["Continents"]=="Asia"]
survey_america_df = survey_df[survey_df["Continents"]=="America"]
survey_europe_df = survey_df[survey_df["Continents"]=="Europe"]
survey_africa_df = survey_df[survey_df["Continents"]=="Africa"]
survey_australia_df = survey_df[survey_df["Continents"]=="Australia"]
survey_others_df = survey_df[survey_df["Continents"]=="Others"]

In [None]:
# Continents: percentage of respondents per continent, as a single-row frame
# (index "Percentage", one column per continent) for the stacked bar chart
continents_count_df = (
    survey_df["Continents"]
    .value_counts(normalize=True)
    .mul(100)
    .to_frame(name="Percentage")
    .T
)

# Asia
asia_country_count_df = pd.DataFrame(survey_asia_df["Q3"].value_counts())
asia_country_count_df = asia_country_count_df.reset_index(drop=False)
asia_country_count_df.columns = ["Country","Count"]

# America
america_country_count_df = pd.DataFrame(survey_america_df["Q3"].value_counts())
america_country_count_df = america_country_count_df.reset_index(drop=False)
america_country_count_df.columns = ["Country","Count"]

# Africa
africa_country_count_df = pd.DataFrame(survey_africa_df["Q3"].value_counts())
africa_country_count_df = africa_country_count_df.reset_index(drop=False)
africa_country_count_df.columns = ["Country","Count"]

# Europe
europe_country_count_df = pd.DataFrame(survey_europe_df["Q3"].value_counts())
europe_country_count_df = europe_country_count_df.reset_index(drop=False)
europe_country_count_df.columns = ["Country","Count"]
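
The per-continent tables above repeat the same three steps. As a sketch only, a small helper could build each table in one call. `count_table` is a hypothetical name that does not exist in the notebook, and the data below is a toy example:

```python
import pandas as pd

def count_table(df, column, label):
    """Return a two-column frame of value counts for df[column], largest first."""
    counts = df[column].value_counts()
    return pd.DataFrame({label: counts.index, "Count": counts.to_numpy()})

# Toy data, not survey results.
toy = pd.DataFrame({"Q3": ["India", "India", "Japan", "India", "Japan", "China"]})
table = count_table(toy, "Q3", "Country")
print(table)
```

With such a helper, the Asia table would be `count_table(survey_asia_df, "Q3", "Country")`, and the same call with `"Q1"` or `"Q2"` would serve the later sections.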

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(18,17)) # create figure
gs = fig.add_gridspec(3, 2)
gs.update(wspace=0, hspace=0.8)
ax0 = fig.add_subplot(gs[0, 0:2])
ax1 = fig.add_subplot(gs[1, 0], ylim=(0, 6000)) # create axes
ax2 = fig.add_subplot(gs[1, 1], ylim=(0, 6000)) # create axes
ax3 = fig.add_subplot(gs[2, 0], ylim=(0, 600)) # create axes
ax4 = fig.add_subplot(gs[2, 1], ylim=(0, 600)) # create axes

# Color selection
color_map = ["#4A4655" for _ in range(5)]
color_map[0] = "#4898EF"

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color

# Continents
ax0.barh(continents_count_df.index, continents_count_df['Asia'],
       color="#4898EF", zorder=3, label="Asia"
)
ax0.barh(continents_count_df.index, continents_count_df['America'], 
       left=continents_count_df['Asia'],
       color="#112A86", zorder=3, label="America"
)
ax0.barh(continents_count_df.index, continents_count_df['Europe'], 
       left=continents_count_df['America']+continents_count_df['Asia'],
       color="#2CC8E4", zorder=3, label="Europe"
)
ax0.barh(continents_count_df.index, continents_count_df['Others'], 
       left=continents_count_df['Europe']+continents_count_df['America']+continents_count_df['Asia'],
       color="#FDBF08", zorder=3, label="Others"
)
ax0.barh(continents_count_df.index, continents_count_df['Africa'], 
       left=continents_count_df['Others']+continents_count_df['Europe']+continents_count_df['America']
       +continents_count_df['Asia'],
       color="#62751C", zorder=3, label="Africa"
)
ax0.barh(continents_count_df.index, continents_count_df['Australia'], 
       left=continents_count_df['Africa']+continents_count_df['Others']+continents_count_df['Europe']
       +continents_count_df['America']+continents_count_df['Asia'],
       color="#4A4655", zorder=3, label="Australia"
)

for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)

ax0.xaxis.set_major_formatter(mtick.PercentFormatter())    
ax0.legend(loc='lower center', ncol=6, bbox_to_anchor=(0.48, -0.3))

ax0.text(0, 0.8, 
         'Kagglers Continents', 
         fontsize=30, 
         fontweight='bold', 
         fontfamily='serif')

ax0.text(0, 0.7, 
         'with top 5 countries', 
         fontsize=18, 
         fontweight='light', 
         fontfamily='serif')

ax0.text(0, 0.53, 
         'Most Kagglers are from Asia, America and Europe, followed by', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

ax0.text(0, 0.45, 
         'Others, Africa and Australia', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

# Asia
asia_country = asia_country_count_df["Country"]
ax1.bar(asia_country_count_df.iloc[0:5, 0], asia_country_count_df.iloc[0:5, 1], 
       color=color_map, zorder=3
)
ax1.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
# bar() already labels the ticks with the five plotted countries; only rotate them
ax1.tick_params(axis='x', labelrotation=90)

ax1.text(-1.2, 7500, 
         'Asian Kagglers', 
         fontsize=20, 
         fontweight='bold', 
         fontfamily='serif')

ax1.text(-1.2, 6950, 
         'India has the most Kagglers in Asia and in', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

ax1.text(-1.2, 6450, 
         'the world; no other country comes close', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

# America
america_country = america_country_count_df["Country"]
ax2.bar(america_country_count_df.iloc[0:5, 0], america_country_count_df.iloc[0:5, 1], 
       color=color_map, zorder=3
)
ax2.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
# bar() already labels the ticks with the five plotted countries; only rotate them
ax2.tick_params(axis='x', labelrotation=90)

ax2.text(-0.5, 7500, 
         'American Kagglers', 
         fontsize=20, 
         fontweight='bold', 
         fontfamily='serif')

ax2.text(-0.5, 6950, 
         'The USA dominates the American', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

ax2.text(-0.5, 6450, 
         'continent, well ahead of Brazil', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

# Europe
europe_country = europe_country_count_df["Country"]
ax3.bar(europe_country_count_df.iloc[0:5, 0], europe_country_count_df.iloc[0:5, 1], 
       color=color_map, zorder=3
)
ax3.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
# bar() already labels the ticks with the five plotted countries; only rotate them
ax3.tick_params(axis='x', labelrotation=90)

ax3.text(-0.5, 740, 
         'European Kagglers', 
         fontsize=20, 
         fontweight='bold', 
         fontfamily='serif')

ax3.text(-0.5, 680, 
         'No single country dominates, unlike in', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

ax3.text(-0.5, 630, 
         'Asia or America', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

# Africa
africa_country = africa_country_count_df["Country"]
ax4.bar(africa_country_count_df.iloc[0:5, 0], africa_country_count_df.iloc[0:5, 1], 
       color=color_map, zorder=3
)
ax4.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
# bar() already labels the ticks with the five plotted countries; only rotate them
ax4.tick_params(axis='x', labelrotation=90)

ax4.text(-0.5, 740, 
         'African Kagglers', 
         fontsize=20, 
         fontweight='bold', 
         fontfamily='serif')

ax4.text(-0.5, 680, 
         'Nigeria has the largest number of', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

ax4.text(-0.5, 630, 
         'Kagglers in Africa', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif')

# Remove top, right and left line 
for s in ["top","right","left"]:
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    
ax0.set_yticklabels([])
ax2.set_yticklabels([])
ax4.set_yticklabels([])

ax2.tick_params(left=False)
ax4.tick_params(left=False)
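
The stacked continent bar in the cell above repeats one `barh` call per continent with a hand-written `left` offset. A minimal sketch of the same idea as a loop with a running offset is shown below. The percentages are toy values rather than survey results, and the `Agg` backend is chosen only so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy percentages, not survey results.
shares = pd.Series({"Asia": 50.0, "America": 30.0, "Europe": 20.0})
colors = ["#4898EF", "#112A86", "#2CC8E4"]

fig, ax = plt.subplots(figsize=(8, 1.5))
left = 0.0
for (name, value), color in zip(shares.items(), colors):
    ax.barh("Percentage", value, left=left, color=color, label=name)
    left += value  # the next segment starts where this one ends

ax.legend(ncol=len(shares), loc="lower center", bbox_to_anchor=(0.5, -0.9))
```

The running `left` value replaces the growing sums of columns, so adding or reordering continents only changes the input series.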

<a id="4"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Age</p>

* Kaggle is dominated by people under 30, with a peak at age 25-29.
* The continents fall into three groups by most common age range: (1) Asia peaks at 18-21, (2) America, Europe, Others and Australia peak at 25-29, and (3) Africa peaks at 22-24.
* Asia has the most young Kagglers, aged 18-21, and the counts fall as age increases. Data-related jobs seem to attract strong interest from younger Asians.
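
The grouping in the second bullet amounts to finding the most common age range per continent. A compact way to check it, sketched on toy data with the notebook's column names (`Continents`, `Q1`), is a cross-tabulation followed by `idxmax`:

```python
import pandas as pd

# Toy data, not survey results.
toy = pd.DataFrame({
    "Continents": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "Q1": ["18-21", "18-21", "25-29", "25-29", "25-29", "22-24"],
})

# Rows: continents, columns: age ranges, cells: respondent counts.
age_by_continent = pd.crosstab(toy["Continents"], toy["Q1"])

# Most common age range for each continent.
peak_age = age_by_continent.idxmax(axis=1)
print(peak_age)
```

The resulting series could also drive the highlight colour in each bar chart instead of hard-coded `color_map` positions.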

In [None]:
continents_count_df = pd.DataFrame(survey_df["Q1"].value_counts())
continents_count_df = continents_count_df.reset_index(drop=False)
continents_count_df.columns = ["Age","Count"]
continents_count_df = continents_count_df.sort_values(by="Age")

asia_country_count_df = pd.DataFrame(survey_asia_df["Q1"].value_counts())
asia_country_count_df = asia_country_count_df.reset_index(drop=False)
asia_country_count_df.columns = ["Age","Count"]
asia_country_count_df = asia_country_count_df.sort_values(by="Age")

america_country_count_df = pd.DataFrame(survey_america_df["Q1"].value_counts())
america_country_count_df = america_country_count_df.reset_index(drop=False)
america_country_count_df.columns = ["Age","Count"]
america_country_count_df = america_country_count_df.sort_values(by="Age")

europe_country_count_df = pd.DataFrame(survey_europe_df["Q1"].value_counts())
europe_country_count_df = europe_country_count_df.reset_index(drop=False)
europe_country_count_df.columns = ["Age","Count"]
europe_country_count_df = europe_country_count_df.sort_values(by="Age")

others_country_count_df = pd.DataFrame(survey_others_df["Q1"].value_counts())
others_country_count_df = others_country_count_df.reset_index(drop=False)
others_country_count_df.columns = ["Age","Count"]
others_country_count_df = others_country_count_df.sort_values(by="Age")

africa_country_count_df = pd.DataFrame(survey_africa_df["Q1"].value_counts())
africa_country_count_df = africa_country_count_df.reset_index(drop=False)
africa_country_count_df.columns = ["Age","Count"]
africa_country_count_df = africa_country_count_df.sort_values(by="Age")

australia_country_count_df = pd.DataFrame(survey_australia_df["Q1"].value_counts())
australia_country_count_df = australia_country_count_df.reset_index(drop=False)
australia_country_count_df.columns = ["Age","Count"]
australia_country_count_df = australia_country_count_df.sort_values(by="Age")

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(18,25)) # create figure
gs = fig.add_gridspec(7, 1)
gs.update(wspace=0, hspace=0.5)
ax0 = fig.add_subplot(gs[0, 0], ylim=(0, 4500))
ax1 = fig.add_subplot(gs[1, 0], ylim=(0, 3000)) # create axes
ax2 = fig.add_subplot(gs[2, 0], ylim=(0, 800)) # create axes
ax3 = fig.add_subplot(gs[3, 0], ylim=(0, 600)) # create axes
ax4 = fig.add_subplot(gs[4, 0], ylim=(0, 400)) # create axes
ax5 = fig.add_subplot(gs[5, 0], ylim=(0, 400)) # create axes
ax6 = fig.add_subplot(gs[6, 0], ylim=(0, 60)) # create axes


# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color
ax6.set_facecolor(background_color) # axes background color

# World
color_map = ["#4A4655" for _ in range(11)]
color_map[2] = "#4898EF"

ax0.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax0.bar(continents_count_df["Age"], continents_count_df['Count'], 
       color=color_map, zorder=3
)

ax0.text(-1.8, 4450, 
         'World', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

ax0.text(-1.8, 7000, 
         'Kagglers Age', 
         fontsize=20, 
         fontweight='bold', 
         fontfamily='serif',
        )

ax0.text(-1.8, 6000, 
         'Most Kagglers are under 30 years old, with the peak at age 25-29', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif',
        )

# Asia
color_map = ["#4A4655" for _ in range(11)]
color_map[0] = "#4898EF"

ax1.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax1.bar(asia_country_count_df["Age"], asia_country_count_df['Count'], 
       color=color_map, zorder=3
)

ax1.text(-1.8, 3300, 
         'Asia', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

# America
color_map = ["#4A4655" for _ in range(11)]
color_map[2] = "#4898EF"

ax2.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax2.bar(america_country_count_df["Age"], america_country_count_df['Count'], 
       color=color_map, zorder=3
)

ax2.text(-1.8, 880, 
         'America', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

# Europe
color_map = ["#4A4655" for _ in range(11)]
color_map[2] = "#4898EF"

ax3.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax3.bar(europe_country_count_df["Age"], europe_country_count_df['Count'], 
       color=color_map, zorder=3
)

ax3.text(-1.8, 660, 
         'Europe', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

# Others
ax4.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax4.bar(others_country_count_df["Age"], others_country_count_df['Count'], 
       color=color_map, zorder=3
)

ax4.text(-1.8, 440, 
         'Others', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

# Africa
color_map = ["#4A4655" for _ in range(11)]
color_map[1] = "#4898EF"

ax5.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax5.bar(africa_country_count_df["Age"], africa_country_count_df['Count'], 
       color=color_map, zorder=3
)

ax5.text(-1.8, 440, 
         'Africa', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

# Australia
color_map = ["#4A4655" for _ in range(11)]
color_map[2] = "#4898EF"

ax6.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax6.bar(australia_country_count_df["Age"], australia_country_count_df['Count'], 
       color=color_map, zorder=3
)

ax6.text(-1.8, 66, 
         'Australia', 
         fontsize=15, 
         fontweight='bold', 
         fontfamily='serif',
        )

for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)
    ax6.spines[s].set_visible(False)

<a id="5"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Gender</p>

* Most Kagglers are men, and this is consistent across all continents.
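
To compare continents of very different sizes, row-wise percentages are easier to read than raw counts. The sketch below uses toy data with the notebook's column names (`Continents`, `Q2`) and the three gender groups produced by the earlier recoding step:

```python
import pandas as pd

# Toy data, not survey results.
toy = pd.DataFrame({
    "Continents": ["Asia"] * 4 + ["Africa"] * 4,
    "Q2": ["Man", "Man", "Man", "Woman",
           "Man", "Man", "Woman", "Others"],
})

# normalize="index" makes every row (continent) sum to 1; scale to percent.
gender_share = pd.crosstab(toy["Continents"], toy["Q2"], normalize="index") * 100
print(gender_share.round(1))

# True when "Man" is the largest group in every continent.
print((gender_share.idxmax(axis=1) == "Man").all())
```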

In [None]:
continents_count_df = pd.DataFrame(survey_df["Q2"].value_counts())
continents_count_df = continents_count_df.reset_index(drop=False)
continents_count_df.columns = ["Gender","Count"]

asia_country_count_df = pd.DataFrame(survey_asia_df["Q2"].value_counts())
asia_country_count_df = asia_country_count_df.reset_index(drop=False)
asia_country_count_df.columns = ["Gender","Count"]

america_country_count_df = pd.DataFrame(survey_america_df["Q2"].value_counts())
america_country_count_df = america_country_count_df.reset_index(drop=False)
america_country_count_df.columns = ["Gender","Count"]

europe_country_count_df = pd.DataFrame(survey_europe_df["Q2"].value_counts())
europe_country_count_df = europe_country_count_df.reset_index(drop=False)
europe_country_count_df.columns = ["Gender","Count"]

others_country_count_df = pd.DataFrame(survey_others_df["Q2"].value_counts())
others_country_count_df = others_country_count_df.reset_index(drop=False)
others_country_count_df.columns = ["Gender","Count"]

africa_country_count_df = pd.DataFrame(survey_africa_df["Q2"].value_counts())
africa_country_count_df = africa_country_count_df.reset_index(drop=False)
africa_country_count_df.columns = ["Gender","Count"]

australia_country_count_df = pd.DataFrame(survey_australia_df["Q2"].value_counts())
australia_country_count_df = australia_country_count_df.reset_index(drop=False)
australia_country_count_df.columns = ["Gender","Count"]

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(10,5)) # create figure
gs = fig.add_gridspec(2, 4)
gs.update(wspace=0.1, hspace=0)
ax0 = fig.add_subplot(gs[0:2, 0])
ax1 = fig.add_subplot(gs[0, 1]) # create axes
ax2 = fig.add_subplot(gs[1, 1]) # create axes
ax3 = fig.add_subplot(gs[0, 2]) # create axes
ax4 = fig.add_subplot(gs[1, 2]) # create axes
ax5 = fig.add_subplot(gs[0, 3]) # create axes
ax6 = fig.add_subplot(gs[1, 3]) # create axes

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color)
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color) 
ax3.set_facecolor(background_color) 
ax4.set_facecolor(background_color) 
ax5.set_facecolor(background_color) 
ax6.set_facecolor(background_color) 

color_map = ["#112A86", "#f04fc5", "#bdbdbd"]

# World
ax0.pie(x=continents_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax0.text(-1.5, 3.5, 'Kagglers Gender', fontsize=20, fontweight='bold', fontfamily='serif')
ax0.text(-1.5, 3.1, 'Most Kagglers are men, and this is consistent across all continents', 
         fontsize=13, fontweight='light', fontfamily='serif')

ax0.text(0, 1.2, 'World', fontsize=13, fontweight='bold', fontfamily='serif', horizontalalignment='center')

ax0.legend(continents_count_df["Gender"], loc="lower center", bbox_to_anchor=(0.5, -0.5))

# Asia
ax1.pie(x=asia_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))
ax1.text(0, 1.2, 'Asia', fontsize=13, fontweight='bold', fontfamily='serif', horizontalalignment='center')

# America
ax2.pie(x=america_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))
ax2.text(0, 1.2, 'America', fontsize=13, fontweight='bold', fontfamily='serif', horizontalalignment='center')

# Europe
ax3.pie(x=europe_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))
ax3.text(0, 1.2, 'Europe', fontsize=13, fontweight='bold', fontfamily='serif', horizontalalignment='center')

# Others
ax4.pie(x=others_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))
ax4.text(0, 1.2, 'Others', fontsize=13, fontweight='bold', fontfamily='serif', horizontalalignment='center')

# Africa
ax5.pie(x=africa_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))
ax5.text(0, 1.2, 'Africa', fontsize=13, fontweight='bold', fontfamily='serif',horizontalalignment='center')

# Australia
ax6.pie(x=australia_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))
ax6.text(0, 1.2, 'Australia', fontsize=13, fontweight='bold', fontfamily='serif', horizontalalignment='center')

<a id="6"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Education</p>

* The most common degree among Kagglers is a Master's degree, followed by a Bachelor's degree. A Master's degree is the most common level of education in every continent except Asia and Africa.
* In Asia and Africa, most Kagglers have a Bachelor's degree.

In [None]:
education_categorical_list = ["No formal education past high school",
                            "Some college/university study without earning a bachelor’s degree",
                            "Bachelor’s degree",
                            "Master’s degree",
                            "Doctoral degree",
                            "Professional degree",
                            "I prefer not to answer"]

continents_count_df = pd.DataFrame(survey_df["Q4"].value_counts())
continents_count_df = continents_count_df.reset_index(drop=False)
continents_count_df.columns = ["Education","Count"]
continents_count_df["Education"] = pd.Categorical(
    continents_count_df["Education"], education_categorical_list)
continents_count_df = continents_count_df.sort_values(by="Education")

asia_country_count_df = pd.DataFrame(survey_asia_df["Q4"].value_counts())
asia_country_count_df = asia_country_count_df.reset_index(drop=False)
asia_country_count_df.columns = ["Education","Count"]
asia_country_count_df["Education"] = pd.Categorical(
    asia_country_count_df["Education"], education_categorical_list)
asia_country_count_df = asia_country_count_df.sort_values(by="Education")

america_country_count_df = pd.DataFrame(survey_america_df["Q4"].value_counts())
america_country_count_df = america_country_count_df.reset_index(drop=False)
america_country_count_df.columns = ["Education","Count"]
america_country_count_df["Education"] = pd.Categorical(
    america_country_count_df["Education"], education_categorical_list)
america_country_count_df = america_country_count_df.sort_values(by="Education")

europe_country_count_df = pd.DataFrame(survey_europe_df["Q4"].value_counts())
europe_country_count_df = europe_country_count_df.reset_index(drop=False)
europe_country_count_df.columns = ["Education","Count"]
europe_country_count_df["Education"] = pd.Categorical(
    europe_country_count_df["Education"], education_categorical_list)
europe_country_count_df = europe_country_count_df.sort_values(by="Education")

others_country_count_df = pd.DataFrame(survey_others_df["Q4"].value_counts())
others_country_count_df = others_country_count_df.reset_index(drop=False)
others_country_count_df.columns = ["Education","Count"]
others_country_count_df["Education"] = pd.Categorical(
    others_country_count_df["Education"], education_categorical_list)
others_country_count_df = others_country_count_df.sort_values(by="Education")

africa_country_count_df = pd.DataFrame(survey_africa_df["Q4"].value_counts())
africa_country_count_df = africa_country_count_df.reset_index(drop=False)
africa_country_count_df.columns = ["Education","Count"]
africa_country_count_df["Education"] = pd.Categorical(
    africa_country_count_df["Education"], education_categorical_list)
africa_country_count_df = africa_country_count_df.sort_values(by="Education")

australia_country_count_df = pd.DataFrame(survey_australia_df["Q4"].value_counts())
australia_country_count_df = australia_country_count_df.reset_index(drop=False)
australia_country_count_df.columns = ["Education","Count"]
australia_country_count_df["Education"] = pd.Categorical(
    australia_country_count_df["Education"], education_categorical_list)
australia_country_count_df = australia_country_count_df.sort_values(by="Education")
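
The scatter plot that follows draws one point per education level, so each table needs exactly one row per level. If a continent had no respondents for some level, `value_counts` would simply omit it. A hedged sketch of a guard against that is to reindex the counts against the full ordered list, shown here on a shortened toy list with hypothetical names:

```python
import pandas as pd

# Shortened toy list of levels, in display order (the notebook uses seven).
levels = ["Bachelor's degree", "Master's degree", "Doctoral degree"]

# Toy answers, not survey results: nobody chose "Doctoral degree".
toy = pd.Series(["Master's degree", "Bachelor's degree", "Master's degree"])

# reindex puts the counts in display order and fills missing levels with 0.
counts = toy.value_counts().reindex(levels, fill_value=0)
print(counts)
```

The result always has `len(levels)` rows in a fixed order, which keeps it aligned with the fixed `y` positions used for plotting.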

In [None]:
fig = plt.figure(figsize=(15,10)) # create figure
gs = fig.add_gridspec(1, 1)
gs.update(wspace=0.1, hspace=0)
ax0 = fig.add_subplot(gs[0, 0])

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color

# World
color_map1 = ["#c7c7c7" for _ in range(7)]
color_map1[3] = "#112A86"
color_map2 = ["#c7c7c7" for _ in range(7)]
color_map2[2] = "#112A86"
color_map3 = ["#c7c7c7" for _ in range(7)]
color_map3[3] = "#112A86"
color_map4 = ["#c7c7c7" for _ in range(7)]
color_map4[3] = "#112A86"
color_map5 = ["#c7c7c7" for _ in range(7)]
color_map5[3] = "#112A86"
color_map6 = ["#c7c7c7" for _ in range(7)]
color_map6[2] = "#112A86"
color_map7 = ["#c7c7c7" for _ in range(7)]
color_map7[3] = "#112A86"

y_dummy = np.arange(1, 8)  # one y position per education level
ax0.scatter([1 for _ in range(7)], y_dummy, color=color_map1, s=continents_count_df["Count"]/3)
ax0.scatter([2 for _ in range(7)], y_dummy, color=color_map2, s=asia_country_count_df["Count"]/3)
ax0.scatter([3 for _ in range(7)], y_dummy, color=color_map3, s=america_country_count_df["Count"]/3)
ax0.scatter([4 for _ in range(7)], y_dummy, color=color_map4, s=europe_country_count_df["Count"]/3)
ax0.scatter([5 for _ in range(7)], y_dummy, color=color_map5, s=others_country_count_df["Count"]/3)
ax0.scatter([6 for _ in range(7)], y_dummy, color=color_map6, s=africa_country_count_df["Count"]/3)
ax0.scatter([7 for _ in range(7)], y_dummy, color=color_map7, s=australia_country_count_df["Count"]/3)

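# Pin the tick positions before labelling them so each label lines up with its row or column
ax0.set_yticks(np.arange(1, 8))
ax0.set_yticklabels(["No education past high school", "University study w/o bachelor", "Bachelor’s degree",
                     "Master’s degree", "Doctoral degree", "Professional degree", "No answer"])
ax0.set_xticks(np.arange(1, 8))
ax0.set_xticklabels(["World", "Asia", "America", "Europe", "Others", "Africa", "Australia"])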
ax0.invert_yaxis()

ax0.text(-1.5, 0.25, 'Kagglers Education', fontsize=20, fontweight='bold', fontfamily='serif')
ax0.text(-1.5, 0.5, 'Most of Kagglers have Master / Bachelor degrees', fontsize=13, fontweight='light', fontfamily='serif')

# Only ax0 belongs to this figure; ax1..ax6 are left over from the previous plot
for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)

<a id="7"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Role</p>

* The most common role among Kagglers is Student, a good sign of growing interest among younger people in data-related jobs. Asia has more students than any other continent.
* Outside Asia and Africa, the Student and Data Scientist roles are fairly balanced, as seen in America, Europe, Australia and Others.
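
"Balanced" can be made concrete as the ratio of students to data scientists in each continent. The sketch below computes that ratio on toy data with the notebook's column names (`Continents`, `Q5`). The numbers are illustrative and are not survey results:

```python
import pandas as pd

# Toy data, not survey results.
toy = pd.DataFrame({
    "Continents": ["Asia"] * 5 + ["Europe"] * 4,
    "Q5": ["Student", "Student", "Student", "Student", "Data Scientist",
           "Student", "Student", "Data Scientist", "Data Scientist"],
})

# Rows: continents, columns: roles, cells: respondent counts.
roles = pd.crosstab(toy["Continents"], toy["Q5"])

# A ratio near 1 means the two roles are balanced; well above 1 means students dominate.
ratio = roles["Student"] / roles["Data Scientist"]
print(ratio)
```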

In [None]:
continents_count_df = pd.DataFrame(survey_df["Q5"].value_counts())
continents_count_df = continents_count_df.reset_index(drop=False)
continents_count_df.columns = ["Role","Count"]

asia_country_count_df = pd.DataFrame(survey_asia_df["Q5"].value_counts())
asia_country_count_df = asia_country_count_df.reset_index(drop=False)
asia_country_count_df.columns = ["Role","Count"]

america_country_count_df = pd.DataFrame(survey_america_df["Q5"].value_counts())
america_country_count_df = america_country_count_df.reset_index(drop=False)
america_country_count_df.columns = ["Role","Count"]

europe_country_count_df = pd.DataFrame(survey_europe_df["Q5"].value_counts())
europe_country_count_df = europe_country_count_df.reset_index(drop=False)
europe_country_count_df.columns = ["Role","Count"]

others_country_count_df = pd.DataFrame(survey_others_df["Q5"].value_counts())
others_country_count_df = others_country_count_df.reset_index(drop=False)
others_country_count_df.columns = ["Role","Count"]

africa_country_count_df = pd.DataFrame(survey_africa_df["Q5"].value_counts())
africa_country_count_df = africa_country_count_df.reset_index(drop=False)
africa_country_count_df.columns = ["Role","Count"]

australia_country_count_df = pd.DataFrame(survey_australia_df["Q5"].value_counts())
australia_country_count_df = australia_country_count_df.reset_index(drop=False)
australia_country_count_df.columns = ["Role","Count"]

# Merging into 1 dataframe
continents_count_df = continents_count_df.set_index("Role")
asia_country_count_df = asia_country_count_df.set_index("Role")
america_country_count_df = america_country_count_df.set_index("Role")
europe_country_count_df = europe_country_count_df.set_index("Role")
others_country_count_df = others_country_count_df.set_index("Role")
africa_country_count_df = africa_country_count_df.set_index("Role")
australia_country_count_df = australia_country_count_df.set_index("Role")
all_country_count_df = pd.DataFrame()
all_country_count_df = pd.concat([
    asia_country_count_df, america_country_count_df,
    europe_country_count_df, others_country_count_df,
    africa_country_count_df, australia_country_count_df
    ], axis=1
)
all_country_count_df.columns = ["Asia", "America", "Europe", "Others", "Africa", "Australia"]
all_country_count_df = all_country_count_df.T

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(14,7)) # create figure
gs = fig.add_gridspec(1, 1)
gs.update(wspace=0, hspace=0.5)
ax0 = fig.add_subplot(gs[0, 0], ylim=(0, 3500))

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color

x = np.arange(len(all_country_count_df))
bar_width = 0.06

ax0.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax0.bar(x, all_country_count_df["Student"], width=bar_width, color="#008294", label="Student", zorder=3)
ax0.bar(x+bar_width+0.01, all_country_count_df["Data Scientist"], width=bar_width, color="#006c7b", label="Data Scientist", zorder=3)
ax0.bar(x+bar_width*2+0.01*2, all_country_count_df["Software Engineer"], width=bar_width, color="#005561", label="Software Engineer", zorder=3)
ax0.bar(x+bar_width*3+0.01*3, all_country_count_df["Currently not employed"], width=bar_width, color="#003f48", label="Currently not employed", zorder=3)
ax0.bar(x+bar_width*4+0.01*4, all_country_count_df["Other"], width=bar_width, color="#003d46", label="Other", zorder=3)
ax0.bar(x+bar_width*5+0.01*5, all_country_count_df["Data Analyst"], width=bar_width, color="#002c32", label="Data Analyst", zorder=3)
ax0.bar(x+bar_width*6+0.01*6, all_country_count_df["Machine Learning Engineer"], width=bar_width, color="#001b1e", label="Machine Learning Engineer", zorder=3)
ax0.bar(x+bar_width*7+0.01*7, all_country_count_df["Research Scientist"], width=bar_width, color="#001b1e", label="Research Scientist", zorder=3)
ax0.bar(x+bar_width*8+0.01*8, all_country_count_df["Business Analyst"], width=bar_width, color="#4b4b4c", label="Business Analyst", zorder=3)
#ax0.bar(x+bar_width*9+0.01*9, all_country_count_df["Product/Project Manager"], width=bar_width, color="#676767", label="Product/Project Manager", zorder=3)
ax0.bar(x+bar_width*10+0.01*10, all_country_count_df["Data Engineer"], width=bar_width, color="#808080", label="Data Engineer", zorder=3)
ax0.bar(x+bar_width*11+0.01*11, all_country_count_df["Statistician"], width=bar_width, color="#989898", label="Statistician", zorder=3)
ax0.bar(x+bar_width*12+0.01*12, all_country_count_df["DBA/Database Engineer"], width=bar_width, color="#c6c6c6", label="DBA/Database Engineer", zorder=3)

# Fix the x-axes.
ax0.set_xticks(x + bar_width)
x_labels = list(all_country_count_df.index)
ax0.set_xticklabels(x_labels)

ax0.text(-0.5, 4200, 'Kagglers Role', fontsize=20, fontweight='bold', fontfamily='serif')
ax0.text(-0.5, 3900, 'Most Kagglers are Students or Data Scientists', fontsize=13, fontweight='light', fontfamily='serif')

for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    
ax0.legend(loc='lower center', ncol=4, bbox_to_anchor=(0.48, -0.48))

<a id="8"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Programming Experience</p>

* Most Asian and African Kagglers have 1-3 years of programming experience, and very few Kagglers have 20+ years.
* American, European and Australian Kagglers, on the other hand, mostly have 1-5 years of experience.
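
The percentages behind these observations are row-wise shares: each continent's counts divided by that continent's total. A minimal sketch of that calculation follows, using an invented `toy_df` in place of `survey_df` and assuming the same `Continents` and `Q6` columns.

```python
import pandas as pd

year_lst = ["I have never written code", "< 1 years", "1-3 years", "3-5 years",
            "5-10 years", "10-20 years", "20+ years"]

# Hypothetical toy data standing in for survey_df (values are invented).
toy_df = pd.DataFrame({
    "Continents": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "Q6": ["1-3 years", "1-3 years", "< 1 years", "20+ years",
           "3-5 years", "1-3 years"],
})

# normalize="index" divides every row by its own total, giving shares per continent.
share = pd.crosstab(toy_df["Continents"], toy_df["Q6"], normalize="index") * 100

# Put the experience bands in their natural order; bands nobody chose become 0.
share = share.reindex(columns=year_lst, fill_value=0.0)

print(share.round(1))
```

Each row of `share` sums to 100, which is what makes the stacked horizontal bars comparable across continents of very different sizes.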

In [None]:
experience_df = pd.DataFrame(survey_df.groupby(["Continents", "Q6"])["Continents"].count())
experience_df.columns = ["Count"]
experience_df = experience_df.reset_index(drop=False)

year_lst = ['I have never written code', '< 1 years', '1-3 years', '3-5 years', '5-10 years', '10-20 years', '20+ years']
experience_df["Q6"] = pd.Categorical(
    experience_df["Q6"], year_lst)
experience_df = experience_df.sort_values(by=["Continents", "Q6"])

# Build one percentage row per continent.
# .copy() avoids pandas' SettingWithCopyWarning when adding a column to a filtered frame.
continent_rows = []
for continent in ["Asia", "America", "Europe", "Others", "Africa", "Australia"]:
    continent_df = experience_df[experience_df["Continents"] == continent].copy()
    continent_df[continent] = continent_df["Count"] / continent_df["Count"].sum() * 100
    continent_df = continent_df.set_index("Q6").drop(["Count", "Continents"], axis=1)
    continent_rows.append(continent_df.T)

# One row per continent, one column per experience band
experience_merge_df = pd.concat(continent_rows, axis=0)

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(12,5)) # create figure
gs = fig.add_gridspec(1, 1)
gs.update(wspace=0, hspace=0)
ax0 = fig.add_subplot(gs[0, 0])

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color

ax0.barh(experience_merge_df.index, experience_merge_df['I have never written code'],
       color="#008294", zorder=3, label="No Experience"
)

ax0.barh(experience_merge_df.index, experience_merge_df['< 1 years'], 
       left=experience_merge_df['I have never written code'],
       color="#003f48", zorder=3, label="< 1 years"
)

ax0.barh(experience_merge_df.index, experience_merge_df['1-3 years'], 
       left=experience_merge_df['I have never written code']+experience_merge_df['< 1 years'],
       color="#4b4b4c", zorder=3, label="1-3 years"
)

ax0.barh(experience_merge_df.index, experience_merge_df['3-5 years'], 
       left=experience_merge_df['I have never written code']+experience_merge_df['< 1 years']+experience_merge_df['1-3 years'],
       color="#676767", zorder=3, label="3-5 years"
)

ax0.barh(experience_merge_df.index, experience_merge_df['5-10 years'], 
       left=experience_merge_df['I have never written code']+experience_merge_df['< 1 years']+experience_merge_df['1-3 years']+experience_merge_df['3-5 years'],
       color="#808080", zorder=3, label="5-10 years"
)

ax0.barh(experience_merge_df.index, experience_merge_df['10-20 years'], 
       left=experience_merge_df['I have never written code']+experience_merge_df['< 1 years']+experience_merge_df['1-3 years']+experience_merge_df['3-5 years']+experience_merge_df['5-10 years'],
       color="#989898", zorder=3, label="10-20 years"
)

ax0.barh(experience_merge_df.index, experience_merge_df['20+ years'], 
       left=experience_merge_df['I have never written code']+experience_merge_df['< 1 years']+experience_merge_df['1-3 years']+experience_merge_df['3-5 years']+experience_merge_df['5-10 years']+experience_merge_df['10-20 years'],
       color="#c6c6c6", zorder=3, label="20+ years"
)

ax0.invert_yaxis()
ax0.xaxis.set_major_formatter(mtick.PercentFormatter())    
ax0.legend(loc='lower center', ncol=7, bbox_to_anchor=(0.48, -0.25))

# Remove top, right and left line 
for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    
ax0.text(-0.5, -1.3, 'Kagglers Programming Experience', fontsize=20, fontweight='bold', fontfamily='serif')
ax0.text(-0.5, -0.75, 'Most Kagglers have 1-3 or 3-5 years of experience', fontsize=13, fontweight='light', 
         fontfamily='serif')

<a id="9"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;"> Kagglers Programming Languages</p>

* Python is the clear winner here: it is the most popular programming language among Kagglers in every continent.
* The second most popular programming language among Kagglers is SQL.
* R is not popular among Kagglers in Asia, where it ranks 7th; in the rest of the world R is the 3rd most popular programming language.
* C and C++ are also still more popular in Asia than in other continents.
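
Rank positions like these can be read directly from the multi-select columns, where a non-null cell means the respondent ticked that language. The sketch below is illustrative only: `toy_df` is invented, and it uses just three of the `Q7_Part_*` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a non-null cell means the language was selected.
toy_df = pd.DataFrame({
    "Continents": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "Q7_Part_1": ["Python"] * 6,
    "Q7_Part_2": [np.nan, np.nan, "R", "R", "R", np.nan],
    "Q7_Part_3": ["SQL", "SQL", np.nan, "SQL", np.nan, np.nan],
})

# count() ignores NaN, so this is the number of selections per continent.
counts = toy_df.groupby("Continents")[["Q7_Part_1", "Q7_Part_2", "Q7_Part_3"]].count()
counts.columns = ["Python", "R", "SQL"]

# Rank 1 is the most selected language within each continent.
ranks = counts.rank(axis=1, ascending=False, method="min").astype(int)

print(counts)
print(ranks)
```

With `method="min"`, tied languages share the better rank, so a tie for second place shows up as two 2s rather than an arbitrary ordering.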

In [None]:
languange_lst = ["Q7_Part_1", "Q7_Part_2", "Q7_Part_3", "Q7_Part_4", "Q7_Part_5", "Q7_Part_6",
                "Q7_Part_7", "Q7_Part_8", "Q7_Part_9", "Q7_Part_10", "Q7_Part_11", "Q7_Part_12", "Q7_OTHER"] 
languange_df = survey_df.groupby(["Continents"])[languange_lst].count()
languange_df.columns = ["Python", "R", "SQL", "C", "C++", "Java", "Javascript", "Julia", "Swift", "Bash", "MATLAB", "None", "Other"]
languange_df = languange_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
languange_df = languange_df.T

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(20,9)) # create figure
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.2, hspace=0.7)
ax0 = fig.add_subplot(gs[0, 0], ylim=(0, 10000))
ax1 = fig.add_subplot(gs[0, 1], ylim=(0, 4000)) # create axes
ax2 = fig.add_subplot(gs[0, 2], ylim=(0, 2500)) # create axes
ax3 = fig.add_subplot(gs[1, 0], ylim=(0, 1000)) # create axes
ax4 = fig.add_subplot(gs[1, 1], ylim=(0, 1000)) # create axes
ax5 = fig.add_subplot(gs[1, 2], ylim=(0, 200)) # create axes

color_map = ["#bdbdbd" for _ in range(13)]
color_map[0] = "#112A86"

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

ax0.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax1.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax2.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax3.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax4.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
ax5.grid(color='black', linestyle=':', axis='y', zorder=0,  dashes=(1,5))

languange_df = languange_df.sort_values(by="Asia", ascending=False)
ax0.bar(languange_df.index, height=languange_df['Asia'], color=color_map, zorder=3)
ax0.set_xticklabels(languange_df.index, rotation=90)
languange_df = languange_df.sort_values(by="America", ascending=False)
ax1.bar(languange_df.index, height=languange_df['America'], color=color_map, zorder=3)
ax1.set_xticklabels(languange_df.index, rotation=90)
languange_df = languange_df.sort_values(by="Europe", ascending=False)
ax2.bar(languange_df.index, height=languange_df['Europe'], color=color_map, zorder=3)
ax2.set_xticklabels(languange_df.index, rotation=90)
languange_df = languange_df.sort_values(by="Others", ascending=False)
ax3.bar(languange_df.index, height=languange_df['Others'], color=color_map, zorder=3)
ax3.set_xticklabels(languange_df.index, rotation=90)
languange_df = languange_df.sort_values(by="Africa", ascending=False)
ax4.bar(languange_df.index, height=languange_df['Africa'], color=color_map, zorder=3)
ax4.set_xticklabels(languange_df.index, rotation=90)
languange_df = languange_df.sort_values(by="Australia", ascending=False)
ax5.bar(languange_df.index, height=languange_df['Australia'], color=color_map, zorder=3)
ax5.set_xticklabels(languange_df.index, rotation=90)

ax0.text(-2.5, 14000, 
         'Kagglers Programming Languages', 
         fontsize=20, fontweight='bold', fontfamily='serif')

ax0.text(-2.5, 12700, 
         'The two most popular programming languages among Kagglers are Python and SQL', 
         fontsize=13, fontweight='light', fontfamily='serif')

ax0.text(-2.5, 11000, 
         'Asia', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax1.text(-2.5, 4400, 
         'America', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax2.text(-2.5, 2750, 
         'Europe', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax3.text(-2.5, 1100, 
         'Others', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax4.text(-2.5, 1100, 
         'Africa', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax5.text(-2.5, 220, 
         'Australia', 
         fontsize=15, fontweight='bold', fontfamily='serif')

# This figure only has six axes (ax0-ax5)
for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)

<a id="10"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;"> Kagglers IDE</p>

* Jupyter is the most used IDE among Kagglers around the world.
* Although VSCode is the 2nd most used IDE in the world, it trails Jupyter by a very wide margin.
* Third place varies by continent: most Kagglers choose PyCharm, except in America and Australia, where RStudio is preferred over PyCharm.
* R is the 3rd most popular language in every continent except Asia, but RStudio is not as popular as R itself in Europe, Others and Africa, where PyCharm still ranks ahead of it.

In [None]:
ide_lst = ["Q9_Part_1", "Q9_Part_2", "Q9_Part_3", "Q9_Part_4", "Q9_Part_5", "Q9_Part_6",
           "Q9_Part_7", "Q9_Part_8", "Q9_Part_9", "Q9_Part_10","Q9_Part_11"] 
ide_df = survey_df.groupby(["Continents"])[ide_lst].count()
ide_df.columns = ["Jupyter", "RStudio", "Visual Studio", "VSCode", "PyCharm", "Spyder", "Notepad++", 
                  "Sublime Text", "Vim/ Emacs", "MATLAB","Jupyter Notebook"]
ide_df = ide_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
ide_df = ide_df.T

asia_country_count_df = pd.DataFrame(ide_df["Asia"]).sort_values(by="Asia", ascending=True)
america_country_count_df = pd.DataFrame(ide_df["America"]).sort_values(by="America", ascending=True)
europe_country_count_df = pd.DataFrame(ide_df["Europe"]).sort_values(by="Europe", ascending=True)
others_country_count_df = pd.DataFrame(ide_df["Others"]).sort_values(by="Others", ascending=True)
africa_country_count_df = pd.DataFrame(ide_df["Africa"]).sort_values(by="Africa", ascending=True)
australia_country_count_df = pd.DataFrame(ide_df["Australia"]).sort_values(by="Australia", ascending=True)

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(20,9)) # create figure
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.5, hspace=0.5)
ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[0, 1]) # create axes
ax2 = fig.add_subplot(gs[0, 2]) # create axes
ax3 = fig.add_subplot(gs[1, 0]) # create axes
ax4 = fig.add_subplot(gs[1, 1]) # create axes
ax5 = fig.add_subplot(gs[1, 2]) # create axes

color_map = ["#bdbdbd" for _ in range(13)]
color_map[10] = "#112A86"

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

ax0.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax1.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax2.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax3.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax4.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax5.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))

ax0.barh(asia_country_count_df.index, asia_country_count_df["Asia"], color=color_map, zorder=3)
ax1.barh(america_country_count_df.index, america_country_count_df["America"], color=color_map, zorder=3)
ax2.barh(europe_country_count_df.index, europe_country_count_df["Europe"], color=color_map, zorder=3)
ax3.barh(others_country_count_df.index, others_country_count_df["Others"], color=color_map, zorder=3)
ax4.barh(africa_country_count_df.index, africa_country_count_df["Africa"], color=color_map, zorder=3)
ax5.barh(australia_country_count_df.index, australia_country_count_df["Australia"], color=color_map, zorder=3)

ax0.text(-1500, 16, 'Kagglers IDE', fontsize=20, fontweight='bold', fontfamily='serif')
ax0.text(-1500, 14.5, 'Jupyter is the most used IDE among Kagglers', fontsize=13, fontweight='light', fontfamily='serif')
ax0.text(-1300, 12, 'Asia', fontsize=15, fontweight='bold', fontfamily='serif')
ax1.text(-500, 12, 'America', fontsize=15, fontweight='bold', fontfamily='serif')
ax2.text(-350, 12, 'Europe', fontsize=15, fontweight='bold', fontfamily='serif')
ax3.text(-150, 12, 'Others', fontsize=15, fontweight='bold', fontfamily='serif')
ax4.text(-150, 12, 'Africa', fontsize=15, fontweight='bold', fontfamily='serif')
ax5.text(-25, 12, 'Australia', fontsize=15, fontweight='bold', fontfamily='serif')

# This figure only has six axes (ax0-ax5)
for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)

<a id="11"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;"> Kagglers Visualization Libraries</p>

* Matplotlib is the most popular visualization library, followed by Seaborn, reflecting Python's popularity among Kagglers.
* Plotly/Plotly Express and Ggplot/ggplot2 are almost equally popular, with Plotly/Plotly Express slightly ahead.
* Ggplot/ggplot2 is more popular than Pyplot everywhere except America.

In [None]:
viz_lst = ["Q14_Part_1", "Q14_Part_2", "Q14_Part_3", "Q14_Part_4", "Q14_Part_5", "Q14_Part_6", "Q14_Part_7", "Q14_Part_8", 
             "Q14_Part_9", "Q14_Part_10", "Q14_Part_11", "Q14_OTHER"] 
viz_df = survey_df.groupby(["Continents"])[viz_lst].count()
viz_df.columns = ["Matplotlib", "Seaborn", "Plotly/Plotly Express", "Ggplot/ggplot2", "Shiny", "D3 js", "Altair",
                  "Bokeh", "Geoplotlib", "Leaflet/Folium", "None", "Other"]
viz_df = viz_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
viz_df = viz_df.T
viz_df["World"] = viz_df.sum(axis=1)
viz_df = viz_df.sort_values(by="World", ascending=True)
viz_df["Min"] = viz_df.min(axis=1)
viz_df["Max"] = viz_df.max(axis=1)

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(17,8)) # create figure
gs = fig.add_gridspec(1, 1)
gs.update(wspace=0, hspace=0)
ax0 = fig.add_subplot(gs[0, 0])

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color

y_dummy = np.arange(1,len(viz_df.index)+1)

ax0.hlines(y=y_dummy, xmin=viz_df["Min"], xmax=viz_df["Max"], color='grey', alpha=0.4, zorder=3)
ax0.scatter(viz_df['World'], y_dummy, color='red', label='World')
ax0.scatter(viz_df['Asia'], y_dummy, color='#4898EF', label='Asia')
ax0.scatter(viz_df['America'], y_dummy, color='#112A86', label='America')
ax0.scatter(viz_df['Europe'], y_dummy, color='#2CC8E4', label='Europe')
ax0.scatter(viz_df['Others'], y_dummy, color='#FDBF08', label='Others')
ax0.scatter(viz_df['Africa'], y_dummy, color='#62751C', label='Africa')
ax0.scatter(viz_df['Australia'], y_dummy, color='#4A4655', label='Australia')

y_label = list(viz_df.index)
y_label.insert(0, "")
ax0.yaxis.set_major_locator(mtick.MultipleLocator(1))
ax0.set_yticklabels(y_label)
ax0.set_xticklabels([])
ax0.tick_params(bottom=False)
   
ax0.text(-100, 14, 
         'Kagglers Visualization Libraries', 
         fontsize=20, fontweight='bold', fontfamily='serif')

ax0.text(-100, 13.2, 
         'Matplotlib is the most popular visualization library among Kagglers', 
         fontsize=13, fontweight='light', fontfamily='serif')

ax0.legend(loc='lower center', ncol=7, bbox_to_anchor=(0.53, -0.1))

for s in ["top","right", "left", "bottom"]:
    ax0.spines[s].set_visible(False)

<a id="12"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Machine Learning Algorithms</p>

* The two algorithms Kagglers use most regularly are the same around the world: Linear/Logistic Regression and Decision Trees/Random Forests.
* Asian and African Kagglers rank Convolutional Neural Networks 3rd and Gradient Boosting Machines 4th. The order is reversed in America and Europe, where Gradient Boosting Machines rank 3rd.
* Fifth place falls into three groups: (a) Recurrent Neural Networks in Asia and Others, (b) Bayesian Approaches in America, Africa and Australia, and (c) Dense Neural Networks in Europe.

In [None]:
ml_algo_lst = ["Q17_Part_1", "Q17_Part_2", "Q17_Part_3", "Q17_Part_4", "Q17_Part_5", "Q17_Part_6",
               "Q17_Part_7", "Q17_Part_8", "Q17_Part_9", "Q17_Part_10", "Q17_Part_11", "Q17_OTHER"] 
ml_algo_df = survey_df.groupby(["Continents"])[ml_algo_lst].count()
ml_algo_df.columns = ["Linear/Logistic\nRegression", "Decision Trees/\nRandom Forests", "Gradient\nBoosting\nMachines", 
                      "Bayesian\nApproaches", "Evolutionary\nApproaches", "Dense\nNeural\nNetworks",
                      "Convolutional\nNeural\nNetworks", "Generative\nAdversarial\nNetworks", "Recurrent\nNeural\nNetworks",
                      "Transformer\nNetworks", "None", "Other"]     
ml_algo_df = ml_algo_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
ml_algo_df = ml_algo_df.T

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(20,9)) # create figure
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.05, hspace=0.2)
ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[0, 1]) # create axes
ax2 = fig.add_subplot(gs[0, 2]) # create axes
ax3 = fig.add_subplot(gs[1, 0]) # create axes
ax4 = fig.add_subplot(gs[1, 1]) # create axes
ax5 = fig.add_subplot(gs[1, 2]) # create axes

color_map = ["#008294", "#007180", "#00606d", "#004e59", "#003d46", "#002c32", "#4b4b4c", "#676767",
             "#808080", "#989898", "#c6c6c6", "#d3d3d3"]

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

ax0.set_yticklabels([])
ax1.set_yticklabels([])
ax2.set_yticklabels([])
ax3.set_yticklabels([])
ax4.set_yticklabels([])
ax5.set_yticklabels([])

ax0.set_xticklabels([])
ax1.set_xticklabels([])
ax2.set_xticklabels([])
ax3.set_xticklabels([])
ax4.set_xticklabels([])
ax5.set_xticklabels([])

ax0.tick_params(left=False, bottom=False)
ax1.tick_params(left=False, bottom=False)
ax2.tick_params(left=False, bottom=False)
ax3.tick_params(left=False, bottom=False)
ax4.tick_params(left=False, bottom=False)
ax5.tick_params(left=False, bottom=False)

ml_algo_df = ml_algo_df.sort_values(by="Asia", ascending=False)
squarify.plot(sizes=ml_algo_df['Asia'], label=ml_algo_df.index[:5], color=color_map, ax=ax0)
ml_algo_df = ml_algo_df.sort_values(by="America", ascending=False)
squarify.plot(sizes=ml_algo_df['America'], label=ml_algo_df.index[:5], color=color_map, ax=ax1)
ml_algo_df = ml_algo_df.sort_values(by="Europe", ascending=False)
squarify.plot(sizes=ml_algo_df['Europe'], label=ml_algo_df.index[:5], color=color_map, ax=ax2)
ml_algo_df = ml_algo_df.sort_values(by="Others", ascending=False)
squarify.plot(sizes=ml_algo_df['Others'], label=ml_algo_df.index[:5], color=color_map, ax=ax3)
ml_algo_df = ml_algo_df.sort_values(by="Africa", ascending=False)
squarify.plot(sizes=ml_algo_df['Africa'], label=ml_algo_df.index[:5], color=color_map, ax=ax4)
ml_algo_df = ml_algo_df.sort_values(by="Australia", ascending=False)
squarify.plot(sizes=ml_algo_df['Australia'], label=ml_algo_df.index[:5], color=color_map, ax=ax5)

ax0.text(0, 130, 
         'Kagglers Machine Learning Algorithms', 
         fontsize=20, fontweight='bold', fontfamily='serif')

ax0.text(0, 119, 
         'Linear/Logistic Regression and Decision Trees/Random Forests are the two most popular algorithms among Kagglers', 
         fontsize=13, fontweight='light', fontfamily='serif')

ax0.text(0, 103, 
         'Asia', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax1.text(0, 103, 
         'America', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax2.text(0, 103, 
         'Europe', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax3.text(0, 103, 
         'Others', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax4.text(0, 103, 
         'Africa', 
         fontsize=15, fontweight='bold', fontfamily='serif')

ax5.text(0, 103, 
         'Australia', 
         fontsize=15, fontweight='bold', fontfamily='serif')

for s in ["top","right","left", "bottom"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)

<a id="13"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Machine Learning Framework</p>

* Kagglers around the world agree on the top three frameworks: Scikit-learn, TensorFlow and Keras, ranked 1st to 3rd respectively.
* The gap between TensorFlow and Keras is small, since Keras is the high-level API for TensorFlow. Kagglers who use Keras are usually also using TensorFlow, but Kagglers who use TensorFlow may not be using Keras, which would also explain why TensorFlow is more popular than Keras.
* Xgboost is more popular than PyTorch in Africa and Australia, while the rest of the world prefers PyTorch.
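
The Keras/TensorFlow relationship described above is an interpretation rather than something the bar counts prove, but it can be checked against the responses. The sketch below shows one possible check on an invented `toy_df`; it assumes, as in the column mapping used in this notebook, that `Q16_Part_2` is TensorFlow and `Q16_Part_3` is Keras.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a non-null cell means the framework was selected.
toy_df = pd.DataFrame({
    "Q16_Part_2": ["TensorFlow", "TensorFlow", "TensorFlow", "TensorFlow", np.nan, np.nan],
    "Q16_Part_3": ["Keras", "Keras", np.nan, np.nan, "Keras", np.nan],
})

uses_tf = toy_df["Q16_Part_2"].notna()
uses_keras = toy_df["Q16_Part_3"].notna()

# 2x2 table of respondents by TensorFlow use (rows) and Keras use (columns).
overlap = pd.crosstab(uses_tf, uses_keras, rownames=["TensorFlow"], colnames=["Keras"])

both = (uses_tf & uses_keras).sum()
keras_also_tf = both / uses_keras.sum() * 100   # share of Keras users who also use TensorFlow
tf_also_keras = both / uses_tf.sum() * 100      # share of TensorFlow users who also use Keras

print(overlap)
print(round(keras_also_tf, 1), round(tf_also_keras, 1))
```

If the interpretation holds on the real data, `keras_also_tf` should be close to 100 while `tf_also_keras` should be noticeably lower.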

In [None]:
ml_fm_lst = ["Q16_Part_1", "Q16_Part_2", "Q16_Part_3", "Q16_Part_4", "Q16_Part_5", "Q16_Part_6", "Q16_Part_7", "Q16_Part_8", 
             "Q16_Part_9", "Q16_Part_10", "Q16_Part_11", "Q16_Part_12", "Q16_Part_13", "Q16_Part_14", "Q16_Part_15", "Q16_OTHER"] 
ml_fm_df = survey_df.groupby(["Continents"])[ml_fm_lst].count()
ml_fm_df.columns = ["Scikit-learn", "TensorFlow", "Keras", "PyTorch", "Fast.ai", "MXNet", "Xgboost", "LightGBM", "CatBoost",
                    "Prophet", "H2O 3", "Caret", "Tidymodels", "JAX", "None", "Other"]
ml_fm_df = ml_fm_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
ml_fm_df = ml_fm_df.T.sort_values(by="Asia")

# Share of each framework within a continent's total selections (vectorized, one sum per continent)
for continent in ["Asia", "America", "Europe", "Others", "Africa", "Australia"]:
    ml_fm_df[f"{continent}_Percentage"] = ml_fm_df[continent] / ml_fm_df[continent].sum() * 100

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(17,10)) # create figure
gs = fig.add_gridspec(3, 2)
gs.update(wspace=0.2, hspace=1)
ax0 = fig.add_subplot(gs[0, 0], ylim=(0, 30))
ax1 = fig.add_subplot(gs[0, 1], ylim=(0, 30))
ax2 = fig.add_subplot(gs[1, 0], ylim=(0, 30))
ax3 = fig.add_subplot(gs[1, 1], ylim=(0, 30))
ax4 = fig.add_subplot(gs[2, 0], ylim=(0, 30))
ax5 = fig.add_subplot(gs[2, 1], ylim=(0, 30))

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

# Draw one panel per continent, with frameworks ordered by that continent's share
panels = [(ax0, "Asia", "red"), (ax1, "America", "#4898EF"), (ax2, "Europe", "#112A86"),
          (ax3, "Others", "#2CC8E4"), (ax4, "Africa", "#FDBF08"), (ax5, "Australia", "#62751C")]
for ax, continent, line_color in panels:
    pct_col = continent + "_Percentage"
    ml_fm_df = ml_fm_df.sort_values(by=pct_col, ascending=False)
    ax.step(ml_fm_df.index, ml_fm_df[pct_col], color=line_color)
    ax.plot(ml_fm_df.index, ml_fm_df[pct_col], 'o--', color="#4b4b4c", alpha=0.3)
    # Rotate the categorical tick labels instead of relabeling the ticks by hand
    ax.tick_params(axis="x", labelrotation=90)

ax0.set_xlabel("")
ax1.set_xlabel("")
ax1.set_ylabel("")
ax2.set_xlabel("")
ax2.set_ylabel("")
ax3.set_xlabel("")
ax3.set_ylabel("")
ax4.set_ylabel("")
ax5.set_ylabel("")

ax0.yaxis.set_major_formatter(mtick.PercentFormatter())
ax1.yaxis.set_major_formatter(mtick.PercentFormatter())
ax2.yaxis.set_major_formatter(mtick.PercentFormatter())
ax3.yaxis.set_major_formatter(mtick.PercentFormatter())
ax4.yaxis.set_major_formatter(mtick.PercentFormatter())
ax5.yaxis.set_major_formatter(mtick.PercentFormatter())
 
ax0.text(-2.5, 43, 
         'Kagglers Machine Learning Framework', 
         fontsize=20, fontweight='bold', fontfamily='serif')

ax0.text(-2.5, 38, 
         'Scikit-learn is the most popular machine learning framework among Kagglers', 
         fontsize=13, fontweight='light', fontfamily='serif')


ax0.text(0, 32, 'Asia',fontsize=15, fontweight='bold', fontfamily='serif')
ax1.text(0, 32, 'America', fontsize=15, fontweight='bold', fontfamily='serif')
ax2.text(0, 32, 'Europe', fontsize=15, fontweight='bold', fontfamily='serif')
ax3.text(0, 32, 'Others', fontsize=15, fontweight='bold', fontfamily='serif')
ax4.text(0, 32, 'Africa', fontsize=15, fontweight='bold', fontfamily='serif')
ax5.text(0, 32, 'Australia', fontsize=15, fontweight='bold', fontfamily='serif')

for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)

<a id="14"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers NLP Methods</p>

* Word embeddings/vectors are the most popular NLP method among Kagglers around the world.
* Encoder-decoder models are the second most popular NLP method among Asian and African Kagglers, whereas Kagglers in America, Europe and Australia prefer Transformer language models.
* Many Kagglers in the Others group don't use any NLP method: "None" ranks second there, and its count is close to those of encoder-decoder models and Transformer language models.

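The rankings above come from counting non-missing answers per continent: in the multi-select columns, a cell is filled only when the respondent picked that option. A small sketch of the idea on a made-up frame (the rows and the `toy_survey` name are assumptions for illustration, not survey data):

```python
import numpy as np
import pandas as pd

# Hypothetical mini survey: a filled cell means the option was selected
toy_survey = pd.DataFrame({
    "Continents": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "Q19_Part_1": ["Word embeddings/vectors", "Word embeddings/vectors", np.nan,
                   "Word embeddings/vectors", np.nan],
    "Q19_Part_4": [np.nan, "Transformer language models", np.nan,
                   "Transformer language models", "Transformer language models"],
})

# count() ignores NaN, so it returns the number of selections per option
toy_nlp = toy_survey.groupby("Continents")[["Q19_Part_1", "Q19_Part_4"]].count()
toy_nlp.columns = ["Word embeddings/vectors", "Transformer language models"]

# Most selected method per continent
top_method = toy_nlp.idxmax(axis=1)

print(toy_nlp)
print(top_method)
```

Respondents who picked nothing (the third toy row) simply contribute zero to every column.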
In [None]:
nlp_lst = ["Q19_Part_1", "Q19_Part_2", "Q19_Part_3", "Q19_Part_4", "Q19_Part_5", "Q19_OTHER"] 
nlp_df = survey_df.groupby(["Continents"])[nlp_lst].count()
nlp_df.columns = ["Word embeddings\n/vectors", "Encoder-decoder\nmodels", "Contextualized\nembeddings", "Transformer\nlanguage models",
                  "None", "Other"]

nlp_df = nlp_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
nlp_df = nlp_df.T
nlp_df["World"] = nlp_df.sum(axis=1)

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(20,10)) # create figure
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.3, hspace=0.8)
ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[0, 1])
ax2 = fig.add_subplot(gs[0, 2])
ax3 = fig.add_subplot(gs[1, 0])
ax4 = fig.add_subplot(gs[1, 1])
ax5 = fig.add_subplot(gs[1, 2])

ax0.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax1.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax2.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax3.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax4.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))
ax5.grid(color='black', linestyle=':', axis='x', zorder=0,  dashes=(1,5))

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

# One panel per continent, with methods ordered by that continent's counts
for ax, continent in zip([ax0, ax1, ax2, ax3, ax4, ax5],
                         ["Asia", "America", "Europe", "Others", "Africa", "Australia"]):
    nlp_df = nlp_df.sort_values(by=continent, ascending=False)
    ax.fill_between(x=nlp_df.index, y1=nlp_df[continent], color="#d3d3d3", zorder=3, alpha=0.5)
    ax.scatter(x=nlp_df.index, y=nlp_df[continent], s=75, color="#FDBF08", zorder=4)
    # Rotate the categorical tick labels instead of relabeling the ticks by hand
    ax.tick_params(axis="x", labelrotation=90)

ax0.text(0, 15, 'Asia',fontsize=15, fontweight='bold', fontfamily='serif')
ax1.text(0, 15, 'America', fontsize=15, fontweight='bold', fontfamily='serif')
ax2.text(0, 15, 'Europe', fontsize=15, fontweight='bold', fontfamily='serif')
ax3.text(0, 15, 'Others', fontsize=15, fontweight='bold', fontfamily='serif')
ax4.text(0, 15, 'Africa', fontsize=15, fontweight='bold', fontfamily='serif')
ax5.text(0, 15, 'Australia', fontsize=15, fontweight='bold', fontfamily='serif')

ax0.text(-1, 1800, 
         'Kagglers Natural Language Processing Methods', 
         fontsize=20, fontweight='bold', fontfamily='serif')

ax0.text(-1, 1650, 
         'Word embeddings/vectors are the most popular NLP methods', 
         fontsize=13, fontweight='light', fontfamily='serif')

for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)

<a id="15"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Computing Platform</p>

* Across all continents, personal computers and laptops dominate among Kagglers, especially in Africa.
* Cloud computing platforms are still less popular than PCs and laptops, even though several free hosted notebooks are available.
* As expected, deep learning workstations remain uncommon due to their high price, especially for personal use.

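Because the continents differ a lot in size, a claim like "especially in Africa" is easier to check with shares than with raw counts. A minimal sketch using a row-normalized cross-tabulation (the rows, the short answer labels and the `toy_q11` name are hypothetical, not survey data):

```python
import pandas as pd

# Hypothetical answers, for illustration only (not survey results)
toy_q11 = pd.DataFrame({
    "Continents": ["Africa", "Africa", "Africa", "Africa",
                   "Europe", "Europe", "Europe", "Europe"],
    "Q11": ["Laptop", "Laptop", "Laptop", "PC",
            "Laptop", "Laptop", "PC", "Cloud platform"],
})

# Row-normalized cross-tabulation: each continent's answers sum to 1
platform_share = pd.crosstab(toy_q11["Continents"], toy_q11["Q11"], normalize="index")

print((platform_share * 100).round(1))
```

Each row can then be compared directly across continents, regardless of how many respondents each continent has.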
In [None]:
# Overall platform counts; this order also drives the shared legend in the plot below
continents_count_df = pd.DataFrame(survey_df["Q11"].value_counts())
continents_count_df = continents_count_df.reset_index(drop=False)
continents_count_df.columns = ["Platform","Count"]
platform_order = continents_count_df["Platform"].tolist()

def count_platforms(df):
    # Count Q11 answers in the overall platform order, so that wedge order
    # and colors in every continent's chart match the shared legend
    counts = df["Q11"].value_counts().reindex(platform_order, fill_value=0)
    counts_df = counts.reset_index(drop=False)
    counts_df.columns = ["Platform","Count"]
    return counts_df

asia_country_count_df = count_platforms(survey_asia_df)
america_country_count_df = count_platforms(survey_america_df)
europe_country_count_df = count_platforms(survey_europe_df)
others_country_count_df = count_platforms(survey_others_df)
africa_country_count_df = count_platforms(survey_africa_df)
australia_country_count_df = count_platforms(survey_australia_df)

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(17,7)) # create figure
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.1, hspace=0)
ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[0, 1]) # create axes
ax2 = fig.add_subplot(gs[0, 2]) # create axes
ax3 = fig.add_subplot(gs[1, 0]) # create axes
ax4 = fig.add_subplot(gs[1, 1]) # create axes
ax5 = fig.add_subplot(gs[1, 2]) # create axes

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

color_map = ["#4898EF", "#112A86", "#FDBF08", "#62751C", "#4A4655"]

# Asia
ax0.pie(x=asia_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax0.text(0, 1.2, 
         'Asia', 
         fontsize=13, 
         fontweight='bold', 
         fontfamily='serif',
         horizontalalignment='center'
        )

ax0.text(-1, 3.5, 
         'Kagglers Computing Platform', 
         fontsize=20, 
         fontweight='bold', 
         fontfamily='serif',
        )

ax0.text(-1, 3.1, 
         'PCs and laptops are the main computing platforms for Kagglers', 
         fontsize=13, 
         fontweight='light', 
         fontfamily='serif',
        )

ax0.legend(continents_count_df["Platform"], loc='lower center', ncol=2, bbox_to_anchor=(1.8, 1.2))

# America
ax1.pie(x=america_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax1.text(0, 1.2, 
         'America', 
         fontsize=13, 
         fontweight='bold', 
         fontfamily='serif',
         horizontalalignment='center'
        )

# Europe
ax2.pie(x=europe_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax2.text(0, 1.2, 
         'Europe', 
         fontsize=13, 
         fontweight='bold', 
         fontfamily='serif',
         horizontalalignment='center'
        )

# Others
ax3.pie(x=others_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax3.text(0, 1.2, 
         'Others', 
         fontsize=13, 
         fontweight='bold', 
         fontfamily='serif',
         horizontalalignment='center'
        )

# Africa
ax4.pie(x=africa_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax4.text(0, 1.2, 
         'Africa', 
         fontsize=13, 
         fontweight='bold', 
         fontfamily='serif',
         horizontalalignment='center'
        )

# Australia
ax5.pie(x=australia_country_count_df['Count'], colors=color_map, wedgeprops=dict(width=0.2))

ax5.text(0, 1.2, 
         'Australia', 
         fontsize=13, 
         fontweight='bold', 
         fontfamily='serif',
         horizontalalignment='center')

<a id="16"></a>
# <p style="background-color:#f4f1e9;font-family:roboto;color:#0a0a0b;font-size:150%;text-align:center;border-radius:60px 40px;">Kagglers Cloud Computing Platform</p>

* Amazon Web Services (AWS) is the most widely used cloud computing platform among Kagglers overall.
* Kagglers around the world agree that the top 3 cloud computing platforms are AWS, Google Cloud Platform (GCP) and Microsoft Azure, though their ranking differs between continents.
* In Asia, AWS and GCP are close, followed by Kagglers who don't use any cloud computing platform.
* American Kagglers prefer AWS to GCP by a wide margin, with Azure in third place behind GCP.
* European Kagglers prefer AWS and Azure.
* In the Others group, AWS and GCP have almost the same number of users, and the gaps between AWS, GCP, Azure and None are small.
* African Kagglers choose GCP more often than AWS, while Azure is in 4th position, behind Kagglers who don't use any cloud computing platform.
* As in America and Europe, Azure takes 3rd place in Australia.

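Statements about positions and gaps like the ones above can be read off a rank table and a relative gap between first and second place. A minimal sketch on made-up counts (the numbers and the `toy_cloud` name are assumptions for illustration, not survey results):

```python
import pandas as pd

# Hypothetical counts, for illustration only (not survey results)
toy_cloud = pd.DataFrame(
    {"America": [500, 250, 300, 200], "Africa": [90, 60, 100, 80]},
    index=["Amazon Web Services", "Microsoft Azure", "Google Cloud Platform", "None"],
)

# Rank 1 is the most selected platform within each continent
toy_rank = toy_cloud.rank(ascending=False).astype(int)

# Relative gap between first and second place per continent, in percent
first = toy_cloud.max()
second = toy_cloud.apply(lambda col: col.nlargest(2).iloc[-1])
gap_pct = (first - second) / first * 100

print(toy_rank)
print(gap_pct.round(1))
```

A small `gap_pct` value corresponds to a close race between the two leading platforms, a large one to a clear leader.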
In [None]:
cloud_lst = ["Q27_A_Part_1", "Q27_A_Part_2", "Q27_A_Part_3", "Q27_A_Part_4", "Q27_A_Part_5", "Q27_A_Part_6", "Q27_A_Part_7", 
           "Q27_A_Part_8", "Q27_A_Part_9", "Q27_A_Part_10", "Q27_A_Part_11", "Q27_A_OTHER"] 
cloud_df = survey_df.groupby(["Continents"])[cloud_lst].count()
cloud_df.columns = ["Amazon Web Services", "Microsoft Azure", "Google Cloud Platform", "IBM Cloud/Red Hat", "Oracle Cloud", 
                  "SAP Cloud", "Salesforce Cloud", "VMware Cloud", "Alibaba Cloud", "Tencent Cloud", "None", "Other"]
cloud_df = cloud_df.loc[["Asia", "America", "Europe", "Others", "Africa", "Australia"], :]
cloud_df = cloud_df.T

In [None]:
# Setting up figure and axes
fig = plt.figure(figsize=(25,10)) # create figure
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.1, hspace=1)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0, 1])
ax2 = fig.add_subplot(gs[0, 2])
ax3 = fig.add_subplot(gs[1, 0])
ax4 = fig.add_subplot(gs[1, 1])
ax5 = fig.add_subplot(gs[1, 2])

# Change background color
background_color = "#ffffff"
fig.patch.set_facecolor(background_color) # figure background color
ax0.set_facecolor(background_color) # axes background color
ax1.set_facecolor(background_color) # axes background color
ax2.set_facecolor(background_color) # axes background color
ax3.set_facecolor(background_color) # axes background color
ax4.set_facecolor(background_color) # axes background color
ax5.set_facecolor(background_color) # axes background color

color_map = ["#FDBF08" for _ in range(12)]
color_map[0] = "#112A86"

# One panel per continent, with platforms ordered by that continent's counts
for ax, continent in zip([ax0, ax1, ax2, ax3, ax4, ax5],
                         ["Asia", "America", "Europe", "Others", "Africa", "Australia"]):
    cloud_df = cloud_df.sort_values(by=continent, ascending=False)
    ax.vlines(x=cloud_df.index, ymin=0, ymax=cloud_df[continent], color=color_map)
    ax.scatter(x=cloud_df.index, y=cloud_df[continent], s=75, color=color_map)
    # Rotate the categorical tick labels instead of relabeling the ticks by hand
    ax.tick_params(axis="x", labelrotation=90)

ax0.text(6, 800, 'Asia',fontsize=15, fontweight='bold', fontfamily='serif')
ax1.text(6, 500, 'America', fontsize=15, fontweight='bold', fontfamily='serif')
ax2.text(6, 300, 'Europe', fontsize=15, fontweight='bold', fontfamily='serif')
ax3.text(6, 80, 'Others', fontsize=15, fontweight='bold', fontfamily='serif')
ax4.text(6, 100, 'Africa', fontsize=15, fontweight='bold', fontfamily='serif')
ax5.text(6, 30, 'Australia', fontsize=15, fontweight='bold', fontfamily='serif')

ax0.text(-2, 2200, 
         'Kagglers Cloud Computing Platform', 
         fontsize=20, fontweight='bold', fontfamily='serif')

ax0.text(-2, 2000, 
         'AWS, GCP and Azure are the three most popular cloud computing platforms', 
         fontsize=13, fontweight='light', fontfamily='serif')

for s in ["top","right","left"]:
    ax0.spines[s].set_visible(False)
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)
    ax4.spines[s].set_visible(False)
    ax5.spines[s].set_visible(False)