**A follow-up analysis on African Kagglers:**

**Motivation for this analysis:**

Hello and welcome to the notebook. The year 2020 has been nothing short of a roller coaster ride for many of us. Many industries and economies underwent major changes to cope with a pandemic that is still ravaging our world. Amid all the chaos, the pandemic also forced many industries to digitize in a short span of time. One such example comes from the African continent: https://etradeforall.org/covid-19-is-fuelling-acceleration-in-digital-transformation-in-africa/

Africa is home to some of the fastest-growing economies in the world. More than half of the top 10 fastest-growing economies are in Africa. https://www.nasdaq.com/articles/the-five-fastest-growing-economies-in-the-world-2020-10-16

According to a World Bank report https://blogs.worldbank.org/opendata/preparing-africas-next-generation-leadership-digital-data-and-innovation, there is a growing demand for initiatives that prepare a young generation of Africans in the field of big data and innovation.

The African Development Bank’s Coding for Employment e-learning platform has seen an increase in enrollments by 50% since the pandemic. https://blogs.worldbank.org/digital-development/new-skills-youth-succeed-post-covid-world

This is positive news from a continent that has long been marred by poverty, food insecurity, health crises and internal conflicts. Things are changing at a faster pace than anticipated. These findings have quashed my preconceived notions about the African continent and have inspired me to investigate the African data science community on Kaggle.



**About the Analysis:**

This is an exploratory analysis that focuses on:
1. Kagglers from Africa in the year 2020 and how they have progressed from previous years (2018 or 2019) 
2. The Startup life of African Kagglers
3. African Kagglers and their Data Science/ ML status in 2020
4. African Kagglers and their plans for the next 2 years

In [None]:
import numpy as np 
import pandas as pd 
import plotly.express as px
import plotly.graph_objects as go
import warnings
warnings.filterwarnings("ignore")
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
survey_2018 = pd.read_csv('/kaggle/input/kaggle-survey-2018/multipleChoiceResponses.csv')
survey_2018['year'] = 2018
survey_2020 = pd.read_csv('/kaggle/input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
survey_2020['year'] = 2020
survey_2019 = pd.read_csv('/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv')
survey_2019['year'] = 2019

In [None]:
# create lists of the African countries that appear in the 2018, 2019 and 2020 surveys
african_countries_2018 = ['Nigeria', 'Egypt', 'Kenya', 'Tunisia', 'Morocco', 'South Africa']
african_countries_2020 = ['Nigeria', 'Egypt', 'Kenya', 'South Africa', 'Ghana', 'Morocco', 'Tunisia']
african_countries_2019 = ['Nigeria', 'Egypt', 'Kenya', 'South Africa', 'Algeria', 'Morocco', 'Tunisia']

# Assign values by creating an additional column of african countries
survey_2018['african_country'] = np.where(survey_2018['Q3'].isin(african_countries_2018), 'Yes', 'No')
survey_2020['african_country'] = np.where(survey_2020['Q3'].isin(african_countries_2020), 'Yes', 'No')
survey_2019['african_country'] = np.where(survey_2019['Q3'].isin(african_countries_2019), 'Yes', 'No')


percent_2018 = (survey_2018.iloc[1:]['african_country'].value_counts(normalize = True)).reset_index()
percent_2018['year'] = 2018
percent_2020 = (survey_2020.iloc[1:]['african_country'].value_counts(normalize = True)).reset_index()
percent_2020['year'] = 2020
percent_2019 = (survey_2019.iloc[1:]['african_country'].value_counts(normalize = True)).reset_index()
percent_2019['year'] = 2019

df = pd.concat([percent_2018, percent_2019, percent_2020])

**1. Kagglers from Africa in the year 2020 and how they have progressed from previous years (2018 or 2019)**


**1.1. Status of their presence:**

There has been a consistent **increase** in the **percent** of Kagglers from the **African continent**. In **2018** only **2.85%** of the respondents were from the African continent. That share has grown significantly over the last 2 years, and African Kagglers now make up around **6%** of the survey respondents.
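
As a minimal sketch of the calculation behind this figure, the share of African respondents is just the mean of a boolean membership mask. The helper name `african_share` and the toy responses below are hypothetical and are not part of the survey data.

```python
import pandas as pd


def african_share(responses: pd.DataFrame, countries: list, country_col: str = "Q3") -> float:
    """Return the fraction of rows whose country is listed in `countries`."""
    # isin() yields True/False per respondent; the mean of a boolean mask is a share
    return float(responses[country_col].isin(countries).mean())


# Hypothetical toy responses: 3 of the 8 rows come from the listed countries
toy = pd.DataFrame(
    {"Q3": ["Nigeria", "India", "Kenya", "Brazil", "India", "Japan", "Egypt", "USA"]}
)
share = african_share(toy, ["Nigeria", "Egypt", "Kenya"])
print(f"{share:.2%}")
```

On the real data the same helper could be called once per survey year, after dropping the first row that holds the question text.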

In [None]:
import plotly.express as px
import plotly.graph_objects as go

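# On the pandas version this notebook was written for, reset_index() names the label column 'index'
# .copy() gives an independent frame so the assignment below does not touch a slice of df
df1 = df[df['index'] == 'Yes'].copy()
df1['year'] = df1['year'].astype(str)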

fig = px.bar(df1, x="year", 
             y="african_country", 
             text = "african_country", 
             color = "african_country", 
             color_continuous_scale=px.colors.sequential.Peach)

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Percent of respondents has doubled over the last 2 years", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5})
fig.show()

**1.2. Status of African Countries present:**

Let's highlight those countries that represented their underrepresented continent on this survey:

* Nigeria 
* Egypt 
* Kenya 
* South Africa
* Morocco 
* Tunisia
* Ghana

Out of these 7 countries, we saw **Ghana** getting added to the list in the year 2020. 

**Most** of the African Kagglers come from **Nigeria** in both 2018 and 2020, while the share of Kagglers from **South Africa dropped** in **2020**. Egypt and Kenya improved their positions between 2018 and 2020.

In [None]:
af_2018 = survey_2018.iloc[1:][survey_2018.iloc[1:]['Q3'].isin(african_countries_2018)]
af_2018 = af_2018[['Q3', 'year']].value_counts(normalize = True).reset_index()
af_2020 = survey_2020.iloc[1:][survey_2020.iloc[1:]['Q3'].isin(african_countries_2020)]
af_2020 = af_2020[['Q3', 'year']].value_counts(normalize = True).reset_index()

df = pd.concat([af_2018, af_2020])

In [None]:
colors = ['lightslategray',] * 7
colors[6] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=af_2020['Q3'],
    y=af_2020[0],
    marker_color=colors, text = af_2020[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')
fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Countries represented in 2020 survey- Ghana is the new entrant this year", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

In [None]:
colors = ['lightslategray',] * 6
#colors[3] = 'slategray'

fig = go.Figure(data=[go.Bar(
    x=af_2018['Q3'],
    y=af_2018[0],
    marker_color=colors, text = af_2018[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')
fig.update_layout(title_text='6 Countries were represented in 2018 survey', yaxis_tickformat = '%')

**1.3. The in-house competition:**

My geography is not very strong, but I did learn that Africa is a vast continent made up of many countries. The 7 countries represented on Kaggle are miles apart from each other geographically, so it would be insightful to see which part of the African continent features most prominently on Kaggle.

In **2018**, the **North African** countries (Egypt, Morocco and Tunisia) fared **slightly better** than West Africa, which had only Nigerian participants at the time. This year, **improved numbers** from **Nigeria** and the addition of **Ghana** have made **West Africa a clear winner**. The numbers from South Africa and East Africa have improved as well.
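
The regional comparison boils down to mapping each country to a region and measuring each region's share. Here is a small sketch of that roll-up; the country-to-region mapping mirrors the one used in this notebook, while the toy responses and the name `region_share` are hypothetical.

```python
import pandas as pd

# Country-to-region lookup, same grouping as the notebook's mapping
region_of = {
    "Nigeria": "west africa", "Ghana": "west africa",
    "Egypt": "north africa", "Morocco": "north africa", "Tunisia": "north africa",
    "Kenya": "east africa", "South Africa": "south africa",
}

# Hypothetical toy responses (10 respondents)
toy = pd.DataFrame(
    {"Q3": ["Nigeria"] * 4 + ["Ghana"] + ["Egypt"] * 2 + ["Morocco", "Kenya", "South Africa"]}
)

# Share of respondents per region; rename_axis/reset_index(name=...) keeps
# the column names explicit regardless of pandas version defaults
region_share = (
    toy["Q3"].map(region_of)
    .value_counts(normalize=True)
    .rename_axis("region")
    .reset_index(name="percent")
)
print(region_share)
```

Aggregating to one row per region first makes the regional totals easy to read off, whereas a bar per country has to be summed by eye.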

In [None]:
cat_dict   = {'Nigeria':'west africa', 'Egypt':'north africa', 
              'Kenya':'east africa', 'South Africa':'south africa', 
              'Ghana':'west africa', 'Morocco':'north africa', 
              'Tunisia':'north africa'}

df['region'] = df['Q3'].map(cat_dict)
df= df.rename(columns ={0:'percent'})

fig = px.bar(df, x="region", 
             y='percent', 
             facet_col="year", 
             text = 'percent', 
             color = 'Q3')

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(uniformtext_minsize=8, 
                  uniformtext_mode='hide', 
                  xaxis_tickangle=-45, 
                  yaxis_tickformat = '%')
fig.show()


In [None]:
# Take out all rows belonging to african countries
african_survey_2018 = survey_2018.iloc[1:][survey_2018.iloc[1:]['Q3'].isin(african_countries_2018)]
african_survey_2019 = survey_2019.iloc[1:][survey_2019.iloc[1:]['Q3'].isin(african_countries_2019)]
african_survey_2020 = survey_2020.iloc[1:][survey_2020.iloc[1:]['Q3'].isin(african_countries_2020)]

**1.4. Status of Age:**


When it comes to the age of African Kagglers, we can see that the **percent** of participants in the **22-24** and **25-29** age groups has **reduced in 2020**, whereas the percent of **18-21 year olds** has **increased**.

The African continent is home to some of the youngest populations in the world. https://www.weforum.org/agenda/2019/08/youngest-populations-africa/ So it is insightful to see an increased percent of 18-21 year olds using Kaggle as a platform for a technical revolution in the year 2020.

In [None]:
# Age

# african_survey_* already excludes the question-text row, so a second .iloc[1:] would drop a real respondent
age_2018 = (african_survey_2018[['Q2', 'year']].value_counts(normalize = True)).reset_index()
age_2018 = age_2018.rename(columns= {'Q2': 'age'})

age_2020 = (african_survey_2020[['Q1', 'year']].value_counts(normalize = True)).reset_index()
age_2020 = age_2020.rename(columns= {'Q1': 'age'})

df = pd.concat([age_2018, age_2020])
df = df.rename(columns = {0: 'percent'})
df['year'] = df['year'].astype(str)
fig = px.bar(df, x="age", y='percent', barmode="group", color="year", text = 'percent', 
             color_discrete_sequence=["green", "lightslategray"])

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(uniformtext_minsize=8, 
                  uniformtext_mode='hide', 
                  xaxis_tickangle=-45, 
                  yaxis_tickformat = '%',
                  title = {"text":"Age: Increase in percent of 18-21 year olds in 2020", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5})

fig.show()

**1.5. Status of Gender:**

Women are still underrepresented among African Kagglers, just as they are in the rest of the survey population.

In [None]:
# Gender

gender_2018 = (african_survey_2018[['Q1', 'year']].value_counts(normalize = True)).reset_index()
gender_2018 = gender_2018.rename(columns= {'Q1': 'gender'})

gender_2020 = (african_survey_2020[['Q2', 'year']].value_counts(normalize = True)).reset_index()
gender_2020 = gender_2020.rename(columns= {'Q2': 'gender'})

df = pd.concat([gender_2018, gender_2020])
df = df.rename(columns = {0: 'percent'})
df['gender'] = df['gender'].replace(['Man', 'Woman'], ['Male', 'Female'])

fig = px.bar(df, x="gender", 
             y='percent', 
             barmode="group", 
             facet_col="year", 
             text = 'percent')

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside', 
                  marker_color = "lightslategray")

fig.update_layout(uniformtext_minsize=8, 
                  uniformtext_mode='hide', 
                  xaxis_tickangle=-45, 
                  yaxis_tickformat = '%',
                  title = {"text":"No change in percent of female participation", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})
fig.show()

**1.6. Status of Educational degree:**

It's significant to note that the **gap** between the percent of Bachelor's degree holders and Master's degree holders from Africa **increased to 18 percentage points** in **2020**, up from just around **3 points in 2018**. **Around 7%** **do not have any degree in 2020, compared with 5% in 2018**.
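
The gap quoted here is a difference between two shares. A minimal sketch of that calculation follows; the helper `share_gap` and the toy answers are hypothetical, and the labels in the real survey files may differ (for example in the apostrophe character they use).

```python
import pandas as pd


def share_gap(shares: pd.Series, a: str, b: str) -> float:
    """Difference between the shares of categories `a` and `b` (as fractions)."""
    # .get() falls back to 0.0 when a category does not occur at all
    return float(shares.get(a, 0.0) - shares.get(b, 0.0))


# Hypothetical toy answers: 5 Bachelor's, 3 Master's, 2 Doctoral
toy = pd.Series(
    ["Bachelor's degree"] * 5 + ["Master's degree"] * 3 + ["Doctoral degree"] * 2
)
shares = toy.value_counts(normalize=True)
gap = share_gap(shares, "Bachelor's degree", "Master's degree")
print(f"gap: {gap * 100:.1f} percentage points")
```

Running the helper on each year's shares separately would give the two gaps that are being compared.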

In [None]:
# Education
education_2018 = (african_survey_2018[['Q4', 'year']].value_counts(normalize = True)).reset_index()
education_2018 = education_2018.rename(columns= {'Q4': 'education'})

education_2020 = (african_survey_2020[['Q4', 'year']].value_counts(normalize = True)).reset_index()
education_2020 = education_2020.rename(columns= {'Q4': 'education'})

df = pd.concat([education_2018, education_2020])
df = df.rename(columns = {0: 'percent'})
df['education'] = df['education'].replace(['Some college/university study without earning a bachelor’s degree', 'No formal education past high school'], ['no_degree', 'high_school'])
df["year"] = df["year"].astype(str)


fig = px.bar(df, x="education", 
             y='percent', 
             barmode="group", 
             color="year", text = 'percent', 
             color_discrete_sequence=["green", "lightslategray"])

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(uniformtext_minsize=8, 
                  uniformtext_mode='hide', 
                  xaxis_tickangle=45, 
                  yaxis_tickformat = '%',
                  title = {"text":"Decrease in percent of Master degree holders", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})
fig.show()

**1.7. Status of Employment:**

The **percent** of **Not employed** Kagglers has **more than doubled** in these **two years**. In 2018 it was around 5%, and in 2020 it is 12%.
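
To make the size of this change concrete, it helps to separate the difference in percentage points from the multiplicative change. The sketch below uses the approximate figures quoted above; `describe_change` is a hypothetical helper name.

```python
def describe_change(before: float, after: float) -> dict:
    """Summarise a change between two shares given as fractions."""
    return {
        "points": (after - before) * 100,  # difference in percentage points
        "ratio": after / before,           # multiplicative change
    }


# Approximate shares of 'Not employed' respondents quoted in the text
change = describe_change(0.05, 0.12)
print(f"+{change['points']:.1f} points, x{change['ratio']:.1f}")
```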

In [None]:
# title
title_2018 = (african_survey_2018[['Q6', 'year']].value_counts(normalize = True)).reset_index()
title_2018 = title_2018.rename(columns= {'Q6': 'title'})

title_2020 = (african_survey_2020[['Q5', 'year']].value_counts(normalize = True)).reset_index()
title_2020 = title_2020.rename(columns= {'Q5': 'title'})

df = pd.concat([title_2018, title_2020])
df = df.rename(columns = {0: 'percent'})
df['title'] = df['title'].replace(['Currently not employed'], ['Not employed'])
fig = px.bar(df, x="title", y="percent", barmode="group", facet_row="year", text = 'percent')
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside', marker_color = 'lightslategray')
fig.update_layout(yaxis_tickformat = '%',
                  title = {"text":"Increase in percent of 'not employed' participants", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})
fig.show()

**2. THE START UP LIFE OF AFRICAN KAGGLERS:**

*According to an article in Disrupt Africa https://disrupt-africa.com/2020/12/the-top-5-pan-african-startup-sector-developments-of-2020/, the year 2020 witnessed an increase in investment and funding for startups that help alleviate the impact of the Covid pandemic*

Investigating the **company size** of Kagglers from Africa reveals that most of them work in **small companies** with **fewer than 50 employees**. In **2019 around 49%** of African Kagglers fell into this category; in 2020 the share working in such **very small startups** **increased to 58%**.
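
Comparing the two years only works if both surveys use the same category labels. The sketch below shows one way to harmonize labels before putting the years side by side; the relabeling of `> 10,000 employees` follows the one in this notebook, while `yearly_shares` and the toy answers are hypothetical.

```python
import pandas as pd


def yearly_shares(answers: pd.Series, year: int, relabel=None) -> pd.DataFrame:
    """Normalized shares of each company-size label for one survey year."""
    if relabel:
        # Align older label spellings with the newer ones before counting
        answers = answers.replace(relabel)
    out = (
        answers.value_counts(normalize=True)
        .rename_axis("company_size")
        .reset_index(name="percent")
    )
    out["year"] = str(year)
    return out


# Hypothetical toy answers for the two years
toy_2019 = pd.Series(["0-49 employees"] * 2 + ["> 10,000 employees", "50-249 employees"])
toy_2020 = pd.Series(["0-49 employees"] * 3 + ["10,000 or more employees", "50-249 employees"])

combined = pd.concat(
    [
        yearly_shares(toy_2019, 2019, {"> 10,000 employees": "10,000 or more employees"}),
        yearly_shares(toy_2020, 2020),
    ],
    ignore_index=True,
)

# One row per label, one column per year, for a direct comparison
wide = combined.pivot(index="company_size", columns="year", values="percent")
print(wide)
```

Without the relabeling step, the two spellings of the largest size band would show up as separate categories, each with a missing value for one year.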

In [None]:
# company
company_2019 = (african_survey_2019[['Q6', 'year']].value_counts(normalize = True)).reset_index()
company_2019 = company_2019.rename(columns= {'Q6': 'company_size'})

company_2020 = (african_survey_2020[['Q20', 'year']].value_counts(normalize = True)).reset_index()
company_2020 = company_2020.rename(columns= {'Q20': 'company_size'})

df = pd.concat([company_2019, company_2020])
df = df.rename(columns = {0: 'percent'})
df['company_size'] = df['company_size'].replace(['> 10,000 employees'], ['10,000 or more employees'])
df["year"] = df["year"].astype(str)


fig = px.bar(df, x="company_size", 
             y="percent", 
             barmode="group", 
             color="year", 
             text = 'percent')
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Increase in percent of Kagglers working in very small startups", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})
fig.show()

However, most of the companies have just **1-2 team members** who look after the **Data Science work**.

In [None]:
# data science team size
company_2019 = (african_survey_2019[['Q7', 'year']].value_counts(normalize = True)).reset_index()
company_2019 = company_2019.rename(columns= {'Q7': 'ds_team_size'})

company_2020 = (african_survey_2020[['Q21', 'year']].value_counts(normalize = True)).reset_index()
company_2020 = company_2020.rename(columns= {'Q21': 'ds_team_size'})

df = pd.concat([company_2019, company_2020])
df = df.rename(columns = {0: 'percent'})
#df['company'] = df['company'].replace(['> 10,000 employees'], ['10,000 or more employees'])
df["year"] = df["year"].astype(str)


fig = px.bar(df, x="ds_team_size", 
             y="percent", 
             barmode="group", 
             color="year", 
             text = 'percent')

fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"DS team strength", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})
fig.show()

When it comes to using **Machine Learning**, only about **6.5%** say that their employer has had **well established ML models** in production for more than 2 years. Around **14%** say that they have **recently deployed models in production**. However, the **largest share** of respondents are **yet to utilize any ML models** in their company.

In [None]:
# ML adoption at the current employer
company_2019 = (african_survey_2019[['Q8', 'year']].value_counts(normalize = True)).reset_index()
company_2019 = company_2019.rename(columns= {'Q8': 'ML_strategy'})

company_2020 = (african_survey_2020[['Q22', 'year']].value_counts(normalize = True)).reset_index()
company_2020 = company_2020.rename(columns= {'Q22': 'ML_strategy'})

df = pd.concat([company_2019, company_2020])
df = df.rename(columns = {0: 'percent'})
df['ML_strategy'] = df['ML_strategy'].replace(['We use ML methods for generating insights (but do not put working models into production)', 
                                     'We recently started using ML methods (i.e., models in production for less than 2 years)', 
                                     'We have well established ML methods (i.e., models in production for more than 2 years)', 
                                     'We are exploring ML methods (and may one day put a model into production)', 
                                     'No (we do not use ML methods)'], 
                                    ['generating insights', 'recently started', 'well established', 'exploring', 'no'])



df["year"] = df["year"].astype(str)


fig = px.bar(df, 
             x="ML_strategy", 
             y="percent", 
             barmode="group", 
             color="year", 
             text = 'percent')

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Does current employer use ML", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})
fig.show()

**3. AFRICAN KAGGLERS AND THEIR DATA SCIENCE JOURNEY IN 2020:**


**3.1. Status of Experience in Coding:**

Around **8%** of African Kagglers **do not have any coding experience** and **28%** have between **1-2 years** of coding experience.

In [None]:
# experience coding

experience_2018 = (african_survey_2018[['Q8', 'year']].value_counts(normalize = True)).reset_index()
experience_2018 = experience_2018.rename(columns= {'Q8': 'experience', 0:'percent'})

experience_2020 = (african_survey_2020[['Q6', 'year']].value_counts(normalize = True)).reset_index()
experience_2020 = experience_2020.rename(columns= {'Q6': 'experience', 0:'percent'})

colors = ['lightslategray',] * 7
colors[3] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=experience_2020['experience'],
    y=experience_2020['percent'],
    marker_color=colors, text = experience_2020['percent'],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Experience in coding 2020", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

**3.2. Status of Experience in Machine Learning:**

**13%** of African Kagglers say that they **do not use any Machine Learning** methods, while around 50% have less than a year of Machine Learning experience.

In [None]:
# experience in Machine Learning

experience_2020 = (african_survey_2020[['Q15', 'year']].value_counts(normalize = True)).reset_index()
experience_2020 = experience_2020.rename(columns= {'Q15': 'experience in ML'})

colors = ['lightslategray',] * 9
colors[2] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=experience_2020['experience in ML'],
    y=experience_2020[0],
    marker_color=colors, text = experience_2020[0],
)])

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Experience in Machine Learning in 2020: Around 50% have experience under 1 year", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()


**3.3. Popular Data Science platform:**

**Coursera** remains the most popular Data Science platform for African Kagglers.
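
For multi-select questions such as this one, the cells in this notebook stack the answer columns and normalize the counts, which gives each option's share of all selections made rather than the share of respondents who picked it. The sketch below, on hypothetical toy answers, computes both so the two denominators can be compared; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy answers: one column per option, NaN/None where not selected
toy = pd.DataFrame(
    {
        "Q37_Part_1": ["Coursera", "Coursera", None, "Coursera"],
        "Q37_Part_2": [None, "Udemy", "Udemy", None],
        "Q37_Part_3": ["Kaggle Learn Courses", None, None, None],
    }
)

# One row per cell, then keep only the options that were actually selected
choices = toy.melt(value_name="choice")["choice"].dropna()
counts = choices.value_counts()

# Denominator 1: all selections made (what normalize=True on stacked data gives)
share_of_selections = counts / counts.sum()

# Denominator 2: respondents who selected at least one option
answered = toy.notna().any(axis=1).sum()
share_of_respondents = counts / answered

print(pd.DataFrame({"of_selections": share_of_selections,
                    "of_respondents": share_of_respondents}))
```

Because respondents can pick several options, the share of respondents for an option is never smaller than its share of selections, so it matters which of the two a chart title refers to.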

In [None]:
Q37= ['Q37_Part_1', 'Q37_Part_2', 'Q37_Part_3', 'Q37_Part_4', 'Q37_Part_5', 'Q37_Part_6', 'Q37_Part_7', 'Q37_Part_8', 'Q37_Part_9', 'Q37_Part_10', 'Q37_Part_11', 'Q37_OTHER']

df1 = african_survey_2020[Q37]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
df3['index'] = df3['index'].replace(['University Courses (resulting in a university degree)', 'Cloud-certification programs (direct from AWS, Azure, GCP, or similar)'], ['University Courses', 'Cloud-certification programs'])


colors = ['lightslategray',] * 12
colors[0] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])

fig.update_traces(texttemplate='%{text:.2%}', 
                  textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Coursera is the most popular data science platform", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

**3.4. Status of ML algorithm used:**

The traditional/basic ML models are the most popular ones.

In [None]:
Q17= ['Q17_Part_1', 'Q17_Part_2', 'Q17_Part_3', 'Q17_Part_4', 'Q17_Part_5', 'Q17_Part_6', 'Q17_Part_7', 'Q17_Part_8', 'Q17_Part_9', 'Q17_Part_10', 'Q17_Part_11', 'Q17_OTHER']

df1 = african_survey_2020[Q17]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()


colors = ['lightslategray',] * 12
# highlight the two most used algorithms; slice 0:2 keeps the list at 12 entries
colors[0:2] = ['crimson', 'crimson']

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])

fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"ML algorithms used mostly in 2020", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

**4. AFRICAN KAGGLERS AND THEIR PLAN FOR NEXT 2 YEARS:**


In this segment we will see which ML tools and products African Kagglers used most in 2020, and what their plans are for these tools and products over the next 2 years.

**4.1. Plan for using Cloud Computing Platforms and Products in next 2 years:**

Around **10-17%** of them **do not** use any **cloud computing platforms** or **products** presently but **wish to learn more** about the cloud computing platforms in the **next 2 years**.



In [None]:
Q26= ['Q26_A_Part_1', 'Q26_A_Part_2', 'Q26_A_Part_3', 'Q26_A_Part_4', 'Q26_A_Part_5', 'Q26_A_Part_6', 'Q26_A_Part_7', 'Q26_A_Part_8', 'Q26_A_Part_9', 'Q26_A_Part_10', 'Q26_A_Part_11', 'Q26_A_OTHER']
#df = pd.melt(african_survey_2020, id_vars = african_survey_2020['Q3'], value_vars = [african_survey_2020['Q26_B_Part_1'], african_survey_2020['Q26_B_OTHER']])
df1 = african_survey_2020[Q26]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 12
colors[2] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Cloud computing platforms used in 2020", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

In [None]:
Q26= ['Q26_B_Part_1', 'Q26_B_Part_2', 'Q26_B_Part_3', 'Q26_B_Part_4', 'Q26_B_Part_5', 'Q26_B_Part_6', 'Q26_B_Part_7', 'Q26_B_Part_8', 'Q26_B_Part_9', 'Q26_B_Part_10', 'Q26_B_Part_11', 'Q26_B_OTHER']
#df = pd.melt(african_survey_2020, id_vars = african_survey_2020['Q3'], value_vars = [african_survey_2020['Q26_B_Part_1'], african_survey_2020['Q26_B_OTHER']])
df1 = african_survey_2020[Q26]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 12

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Respondents are looking forward to working on GCP, AWS and Azure in next 2 years", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

In [None]:
Q27 =['Q27_A_Part_1', 'Q27_A_Part_2', 'Q27_A_Part_3', 'Q27_A_Part_4', 'Q27_A_Part_5', 'Q27_A_Part_6', 'Q27_A_Part_7', 'Q27_A_Part_8', 'Q27_A_Part_9', 'Q27_A_Part_10', 'Q27_A_Part_11', 'Q27_A_OTHER']
df1 = african_survey_2020[Q27]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 12
colors[4] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Cloud computing products in 2020: Around 10% do not use any cloud products", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

In [None]:
Q27= ['Q27_B_Part_1', 'Q27_B_Part_2', 'Q27_B_Part_3', 'Q27_B_Part_4', 'Q27_B_Part_5', 'Q27_B_Part_6', 'Q27_B_Part_7', 'Q27_B_Part_8', 'Q27_B_Part_9', 'Q27_B_Part_10', 'Q27_B_Part_11', 'Q27_B_OTHER']

df1 = african_survey_2020[Q27]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 12

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"Cloud computing products plan for next 2 years", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

**4.2. Plans for using ML Products in next 2 years:**

Presently around **26%** of African Kagglers **do not use any ML products** but plan to use ML products **mostly Google Cloud Products** in the next **2 years**. 

In [None]:
Q28= ['Q28_A_Part_1', 'Q28_A_Part_2', 'Q28_A_Part_3', 'Q28_A_Part_4', 'Q28_A_Part_5', 'Q28_A_Part_6', 'Q28_A_Part_7', 'Q28_A_Part_8', 'Q28_A_Part_9', 'Q28_A_Part_10', 'Q28_A_OTHER']

df1 = african_survey_2020[Q28]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 12
colors[0] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"ML products used in 2020: 25% of respondents are yet to use any ML product", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

In [None]:
Q28= ['Q28_B_Part_1', 'Q28_B_Part_2', 'Q28_B_Part_3', 'Q28_B_Part_4', 'Q28_B_Part_5', 'Q28_B_Part_6', 'Q28_B_Part_7', 'Q28_B_Part_8', 'Q28_B_Part_9', 'Q28_B_Part_10', 'Q28_B_OTHER']

df1 = african_survey_2020[Q28]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 12

# highlight bars 0-1 and 3-4; two-element slices keep the list length unchanged
colors[0:2] = ['crimson', 'crimson']
colors[3:5] = ['crimson', 'crimson']


fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"ML product: Most respondents see themselves using Google Cloud products", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

**4.3. Plan for using Business Intelligence tools in next 2 years:**

Around **24%** of African Kagglers use **Microsoft Power BI tools** for **visualization** purposes, but there are still **24%** of African Kagglers who are **yet to use any BI tools**. However, in the **next two years** around **15%** are planning to use **Google Data Studio** which is being used by just **8%** of African Kagglers in the year **2020**. 

In [None]:
Q31A = ['Q31_A_Part_1', 'Q31_A_Part_2', 'Q31_A_Part_3', 'Q31_A_Part_4', 'Q31_A_Part_5', 'Q31_A_Part_6', 'Q31_A_Part_7', 'Q31_A_Part_8', 'Q31_A_Part_9', 'Q31_A_Part_10', 'Q31_A_Part_11', 'Q31_A_Part_12', 'Q31_A_Part_13', 'Q31_A_Part_14', 'Q31_A_OTHER']
df1 = african_survey_2020[Q31A]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 15
colors[1] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"BI tools used in 2020: 24% are yet to use any BI tools", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

In [None]:
Q31B = ['Q31_B_Part_1', 'Q31_B_Part_2', 'Q31_B_Part_3', 'Q31_B_Part_4', 'Q31_B_Part_5', 'Q31_B_Part_6', 'Q31_B_Part_7', 'Q31_B_Part_8', 'Q31_B_Part_9', 'Q31_B_Part_10', 'Q31_B_Part_11', 'Q31_B_Part_12', 'Q31_B_Part_13', 'Q31_B_Part_14', 'Q31_B_OTHER']

df1 = african_survey_2020[Q31B]
df2 = df1.stack().reset_index()
df3= df2[0].value_counts(normalize = True).reset_index()
colors = ['lightslategray',] * 15
colors[2] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=df3['index'],
    y=df3[0],
    marker_color=colors, text = df3[0],
)])
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')

fig.update_layout(yaxis_tickformat = '%', 
                  title = {"text":"The interest in Google Data Studio has jumped from 8% to 15%", 
                           "xanchor":"center", 
                           'y':0.98,
                           'x':0.5,})

fig.show()

**CONCLUSION:**

It is really encouraging to see that the percent of Kagglers from the African continent is increasing year by year. The continent which is largely underrepresented in tech is undergoing a silent tech revolution. 
Even during the pandemic, countries on the African continent were able to accelerate the process of digitization, with startups and small companies playing a significant role. https://institute.global/policy/covid-19-sparked-african-tech-revolution-heres-how-it-happened

From this survey we found that most of the African Kagglers (more than 70%) work in small companies with 0-250 employees. If I had to make an assumption, I would assume that Kagglers from Africa also have a significant role to play in the digital revolution of the continent. We as a community should applaud their efforts and encourage them to feel comfortable and participate in this community by putting the spotlight on them.

I am sure many such analyses have been conducted on the African community in the past, and I feel this is a story worth repeating until we change the status quo. That is when the true power of diversity and inclusiveness will prevail.
