![Africa in AI](https://techeconomy.ng/wp-content/uploads/2019/02/microsoft-dives-into-AI-in-africa.jpg)

# __AFRICA IN AI 2021__

We are almost at the end of 2021, and Kaggle has gifted us this beautiful dataset from its 2021 Data Science and Machine Learning survey.

As we ease out of the Covid-19 pandemic and enter the new normal of rapid adoption of digital technologies, let us see how it has affected the world, this time through the lens of Africa.

With almost everything going digital, the rise of the metaverse and Web 3.0, and the old guard giving way to the new decentralized web, we will witness Africa leapfrogging into the Fourth Industrial Revolution, at least in digital technologies. Artificial intelligence is going to play a huge role in this, which is why these technologies are being adopted so rapidly here in Africa.

So, working with a subset of the dataset, the Africa dataset, we will see which countries are leading this adoption. By comparing that subset with the rest of the world, we will see how Africa fits into this renaissance.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()

In [None]:
responses = pd.read_csv('/kaggle/input/kaggle-survey-2021/kaggle_survey_2021_responses.csv', low_memory=False)

In [None]:
question = responses.iloc[0].T.to_frame()
question.index.name = 'name'
question.columns = ['description']

In [None]:
# Remove question labels from the dataset
responses.drop(0, inplace=True)
responses.reset_index(drop=True, inplace=True)  # drop=True avoids keeping the old index as an extra column

In [None]:
africa = ['Nigeria', 'Egypt', 'South Africa', 'Algeria', 'Tunisia', 'Morocco', 'Kenya', 'Uganda', 'Ghana', 'Ethiopia']

In [None]:
df_africa = responses[responses['Q3'].isin(africa)]
df_others = responses[~responses['Q3'].isin(africa)]

In [None]:
# Label each respondent as 'Africa' or 'others' based on the country question (Q3)
responses['continent'] = np.where(responses['Q3'].isin(africa), 'Africa', 'others')
responses['continent'] = responses['continent'].astype('category')

In [None]:
ide_usage = list(question[question['description'].str.contains('IDE')].index)
notebook_usage = list(question[question['description'].str.contains('notebook products')].index)
lang_usage = list(question[question['description'].str.contains('programming languages')].index)
visual_usage = list(question[question['description'].str.contains('visual')].index)
algo_usage = list(question[question['description'].str.contains('algo')].index)
ml_algo_usage = list(question[question['description'].str.contains('ML algo')].index)
cv_usage = list(question[question['description'].str.contains('computer vision')].index)
nlp_usage = list(question[question['description'].str.contains('natural language processing')].index)
ml_frameworks = list(question[question['description'].str.contains('machine learning frameworks')].index)
cloud_platform = list(question[question['description'].str.contains('cloud computing platform')].index)
cloud_platform_A = [x for x in cloud_platform if 'A' in x]
cloud_platform_B = [x for x in cloud_platform if 'B' in x]
cloud_product = list(question[question['description'].str.contains('cloud computing product')].index)
cloud_product_A = [x for x in cloud_product if 'A' in x]
cloud_product_B = [x for x in cloud_product if 'B' in x]
big_data_product = list(question[question['description'].str.contains('big data product')].index)
big_data_product_A = [x for x in big_data_product if 'A' in x]
big_data_product_B = [x for x in big_data_product if 'B' in x]
big_data_product_often = [x for x in big_data_product if 'Q33' in x]
ml_product = list(question[question['description'].str.contains('machine learning product')].index)
ml_product_A = [x for x in ml_product if 'A' in x]
ml_product_B = [x for x in ml_product if 'B' in x]
automl = list(question[question['description'].str.contains('auto')].index)
automl_categories_A = [x for x in automl if 'Q36_A' in x]
automl_categories_B = [x for x in automl if 'Q36_B' in x]
automl_specific_A = [x for x in automl if 'Q37_A' in x]
automl_specific_B = [x for x in automl if 'Q37_B' in x]
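
The cell above finds the columns of each multi-part question by searching the question text for a keyword. As a minimal sketch of the same idea, the lookup could be wrapped in a small helper. The name `columns_matching` and the toy `question_demo` frame are hypothetical and only stand in for the real `question` frame; the question wording is illustrative, not copied from the survey file.

```python
import pandas as pd

# Toy stand-in for the `question` frame: index = column name, value = question text.
# Column names and wording are illustrative assumptions.
question_demo = pd.DataFrame(
    {"description": [
        "Which IDE do you use? - Selected Choice - VSCode",
        "Which IDE do you use? - Selected Choice - PyCharm",
        "Which programming languages do you use? - Selected Choice - Python",
    ]},
    index=pd.Index(["Q9_Part_1", "Q9_Part_2", "Q7_Part_1"], name="name"),
)


def columns_matching(question, keyword):
    """Return the column names whose question text contains `keyword` literally."""
    mask = question["description"].str.contains(keyword, regex=False)
    return list(question[mask].index)


ide_columns = columns_matching(question_demo, "IDE")
lang_columns = columns_matching(question_demo, "programming languages")
print(ide_columns)
print(lang_columns)
```

A short keyword such as `'algo'` or `'auto'` matches every question whose text contains it, so it is worth printing the returned list once to confirm that only the intended columns were picked up.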

In [None]:
single_columns = {'age': ['Q1'], 'gender': ['Q2'], 'country': ['Q3'], 'formal_edu': ['Q4'], 'current_role': ['Q5'],
 'years_coding': ['Q6'], 'lang_recommend': ['Q8'], 'comp_platform': ['Q11'], 'tpu_usage': ['Q13'], 
 'ml_methods_usage': ['Q15'], 'industry_employer': ['Q20'], 'company_size': ['Q21'], 'ds_workloads': ['Q22'],
 'ml_incorp': ['Q23'], 'yr_compensation': ['Q25'], 'cl_comp_spend': ['Q26'], 'cl_platform_best': ['Q28'], 'pri_tool': ['Q41']}

In [None]:
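def data_frame(alist, responses):
    # Count the answers in each listed column and merge them into one Series.
    # Multi-select questions are spread over several columns, one per option.
    data_dict = dict()
    for column in alist:
        for answer, count in responses[column].value_counts().items():
            data_dict[answer] = count
    data_df = pd.Series(data_dict).sort_values(ascending=False)
    return data_df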
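
To see what this helper produces for a multi-select question, here is a small sketch on made-up data. The function `count_answers` is a hypothetical stand-in that follows the same logic as `data_frame`, and the toy column names are assumptions.

```python
import pandas as pd

# Toy multi-select question: one column per option, missing where the option was not ticked.
toy = pd.DataFrame({
    "Q7_Part_1": ["Python", "Python", None, "Python"],
    "Q7_Part_2": ["SQL", None, "SQL", None],
    "Q7_Part_3": [None, None, "R", None],
})


def count_answers(columns, frame):
    """Count non-missing answers in each column and combine them into one Series."""
    counts = {}
    for column in columns:
        # value_counts() skips missing values, so only ticked options are counted
        for answer, count in frame[column].value_counts().items():
            counts[answer] = count
    return pd.Series(counts).sort_values(ascending=False)


lang_counts = count_answers(["Q7_Part_1", "Q7_Part_2", "Q7_Part_3"], toy)
print(lang_counts.to_dict())
```

One design point to keep in mind: because the counts are stored in a dictionary keyed by answer label, a label that appears in two of the listed columns would keep only the count from the later column. Within a single multi-select question each option has its own column, so this does not arise there.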

In [None]:
country_africa = data_frame(single_columns['country'], df_africa)
gender_africa = data_frame(single_columns['gender'], df_africa)

In [None]:
fig = px.bar(country_africa, y=country_africa.values, x=country_africa.index, text=country_africa.values)
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.show()

Here we see __Nigeria__ leading the list of African countries with over 34 percent of the submissions from these countries, followed by Egypt and Kenya.

This is thanks to the work of __Data Science Nigeria__, led by __Dr. Olubayo Adekanmbi__, whose vision is to build an ['AI-first society where Artificial Intelligence is effectively deployed to solve local problems, particularly the sustainable millennial goals'](https://www.datasciencenigeria.org/adekanmbi-olubayo-2/) [1]. It is also thanks to AI Saturdays Lagos, organized by __Tejumade Afonja__ and __Femi Azeez__, whose goal is to promote Artificial Intelligence ['beyond the borders of Lagos and Nigeria'](https://www.aisaturdayslagos.com/) [2] and to lead the impact of AI in the Fourth Industrial Revolution.

There are also AI startups built by Nigerians that are making an impact, like __Ubenwa__, an AI system built by __Charles Onu__ and __Innocent Udeogu__, which saves newborns from birth asphyxia.

These are some of the impacts of AI on the African continent. Africa is truly advancing in leaps and bounds.

So, if we compare the education in Africa with the rest of the world, we will see how the adoption of Data Science and Machine Learning was achieved.

In [None]:
labels = list(gender_africa.index)
values = list(gender_africa.values)

fig = go.Figure(data=[go.Pie(labels=labels, values=values, textinfo='label+percent')])
fig.show()

Data Science / Machine Learning practice in Africa is still overwhelmingly male-dominated: __79 percent__ of respondents are male and over __20 percent__ are female.

That is why there are organizations like __Lagos Women in Machine Learning and Data Science (WiMLDS)__ to encourage more female participation in Data Science and Artificial Intelligence.
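
The percentages shown on the pie chart can also be computed directly rather than read off the figure. A minimal sketch, using made-up answers instead of the survey data:

```python
import pandas as pd

# Made-up gender answers, only to show the calculation
answers = pd.Series(["Man", "Man", "Man", "Woman", "Man", "Woman", "Man", "Man"])

# normalize=True returns each answer's fraction of the total instead of a raw count
share = answers.value_counts(normalize=True).mul(100).round(1)
print(share.to_dict())
```

In this notebook the same call could be applied to `df_africa['Q2']` for gender or `df_africa['Q3']` for country.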

In [None]:
def normalized(data):
    # Rescale the counts to the [0, 1] range so that both groups fit on one axis
    values = data.values.reshape(-1, 1)
    scaled = min_max_scaler.fit_transform(values)
    return scaled.ravel()


formal_edu_africa = data_frame(single_columns['formal_edu'], df_africa)
formal_edu_others = data_frame(single_columns['formal_edu'], df_others)

fig = go.Figure(data=[
    go.Bar(name='Africa Formal Edu', x=formal_edu_africa.index, y=normalized(formal_edu_africa)),
    go.Bar(name='Others Formal Edu', x=formal_edu_others.index, y=normalized(formal_edu_others))])
# Change the bar mode
fig.update_layout(barmode='group')

fig.show()

We notice that Africa has more __Bachelor's Degree__ holders than Master's Degree holders, unlike places like Europe and America, where there are more Master's Degree holders.

This is because in many African countries, most people begin to provide for their families after their Bachelor's Degree, which means getting or creating jobs that pay. The Master's Degree is obtained later.
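
A note on reading these charts: the counts for each group are rescaled with min-max scaling before plotting, so the most common answer in a group is drawn at 1 and the least common at 0. The bar heights are therefore not shares of respondents. The sketch below, on made-up counts, puts min-max scaling next to a plain share of the total; the share is an alternative offered for comparison, not what the notebook uses.

```python
import pandas as pd

# Made-up answer counts for one group
counts = pd.Series({"Bachelor's degree": 50, "Master's degree": 30, "Doctoral degree": 10})

# Min-max scaling: (x - min) / (max - min), the formula MinMaxScaler applies by default
min_max = (counts - counts.min()) / (counts.max() - counts.min())

# Alternative: each answer's share of all answers in the group
share = counts / counts.sum()

print(min_max.round(2).to_dict())
print(share.round(2).to_dict())
```

With min-max scaling the least common answer always has a height of zero, however many respondents chose it, whereas shares keep every answer visible and sum to one within each group.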

In [None]:
age_africa = data_frame(single_columns['age'], df_africa)
age_others = data_frame(single_columns['age'], df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Age ', x=age_africa.index, y=normalized(age_africa)),
    go.Bar(name='(Others) Age', x=age_others.index, y=normalized(age_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

The majority of those interested in Data Science in Africa are young people in the age range of __25 - 29 years__. One observation: more of those in the age range of __18 - 21 years__ are interested in Data Science in the rest of the world than in Africa. If the majority of the world's young population is in Africa, why is this age range not the leading one? It seems that this is because of __the lack of technological infrastructure__ on the continent. That age range in Africa consists of those who are just leaving secondary school and deciding whether to enter university or learn a trade. But once they enter these institutions (and by then they are a bit more mature), it is easier for them to find their footing in pursuing a career in Data Science/Machine Learning.

Platforms like __Data Science Nigeria__, through their university outreach, are helping students engage early with a career in Data Science, and are also bridging this gap in technological infrastructure.

In [None]:
years_coding_africa = data_frame(single_columns['years_coding'], df_africa)
years_coding_others = data_frame(single_columns['years_coding'], df_others)

fig = go.Figure(data=[
    go.Bar(name='Africa Years Coding', x=years_coding_africa.index, y=normalized(years_coding_africa)),
    go.Bar(name='Others Years Coding', x=years_coding_others.index, y=normalized(years_coding_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

The most common coding experience among Africans is __1 - 3 years__, followed by less than __a year__ of coding. This suggests that there is a growing number of new Data Science and Machine Learning practitioners on the continent.

In [None]:
industry_employer_africa = data_frame(single_columns['industry_employer'], df_africa)
industry_employer_others = data_frame(single_columns['industry_employer'], df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Industry Employer ', x=industry_employer_africa.index, y=normalized(industry_employer_africa)),
    go.Bar(name='(Others) Industry Employer ', x=industry_employer_others.index, y=normalized(industry_employer_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

We notice that there are more Africans in __Education__ than in __Computers/Technology__. This corroborates the fact that there are more students interested in AI than there are practitioners. It means there is a need for platforms or startups that ease students into employment after school, and for an enabling environment that helps students develop AI ideas, launch prototypes and begin their own startups.

In [None]:
curr_role_africa = data_frame(single_columns['current_role'], df_africa)
curr_role_others = data_frame(single_columns['current_role'], df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Current Role', x=curr_role_africa.index, y=normalized(curr_role_africa)),
    go.Bar(name='(Others) Current Role' , x=curr_role_others.index, y=normalized(curr_role_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

As we have observed above, there are more __students__ in Data Science/Machine Learning than there are __Data Scientists__, both in Africa and in other countries.


# __SKILLS IN AI__

Let us see how AI skills and tools are used in Africa compared to other parts of the world.

In [None]:
notebook_usage_africa = data_frame(notebook_usage, df_africa)
notebook_usage_others = data_frame(notebook_usage, df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Notebook Usage', x=notebook_usage_africa.index, y=normalized(notebook_usage_africa)),
    go.Bar(name='(Others) Notebook Usage' , x=notebook_usage_others.index, y=normalized(notebook_usage_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

In [None]:
ide_usage_africa = data_frame(ide_usage, df_africa)
ide_usage_others = data_frame(ide_usage, df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) IDE Usage', x=ide_usage_africa.index, y=normalized(ide_usage_africa)),
    go.Bar(name='(Others) IDE Usage' , x=ide_usage_others.index, y=normalized(ide_usage_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

In [None]:
lang_usage_africa = data_frame(lang_usage, df_africa)
lang_usage_others = data_frame(lang_usage, df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Language Usage', x=lang_usage_africa.index, y=normalized(lang_usage_africa)),
    go.Bar(name='(Others) Language Usage' , x=lang_usage_others.index, y=normalized(lang_usage_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

In [None]:
lang_recommend_africa = data_frame(single_columns['lang_recommend'], df_africa)
lang_recommend_others = data_frame(single_columns['lang_recommend'], df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Language Recommend', x=lang_recommend_africa.index, y=normalized(lang_recommend_africa)),
    go.Bar(name='(Others) Language Recommend' , x=lang_recommend_others.index, y=normalized(lang_recommend_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

We see that __Python__ is the dominant language in Data Science/Artificial Intelligence, which is why it is both the most used and the most recommended in Africa and in the rest of the world. It is used in __Jupyter Notebooks__ and __Kaggle Notebooks__, the dominant IDE and notebook product respectively in Africa. __SQL__ is the next most used language, because some of the data is structured in databases and must be queried and accessed through SQL. The third most used language is __JavaScript__. This is because a model developed in Python is useless if left in a silo; it must be integrated into an application for practical use, and JavaScript is very good for application development.

In [None]:
algo_usage_africa = data_frame(algo_usage, df_africa)
algo_usage_others = data_frame(algo_usage, df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Algorithm Usage', x=algo_usage_africa.index, y=normalized(algo_usage_africa)),
    go.Bar(name='(Others) Algorithm Usage' , x=algo_usage_others.index, y=normalized(algo_usage_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

The most popular algorithm for both Africa and the rest of the world is __Linear or Logistic Regression__. This is because most day-to-day problems can be solved by simple regression algorithms. Next are __Decision Trees__ and __Convolutional Neural Networks__.

Seeing __CNNs__ as the third most popular algorithm shows an emerging adoption of Deep Learning technologies in solving Data Science problems, a welcome development.

In [None]:
ml_frameworks_africa = data_frame(ml_frameworks, df_africa)
ml_frameworks_others = data_frame(ml_frameworks, df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Machine Learning Frameworks', x=ml_frameworks_africa.index, y=normalized(ml_frameworks_africa)),
    go.Bar(name='(Others) Machine Learning Frameworks' , x=ml_frameworks_others.index, y=normalized(ml_frameworks_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

__Scikit-learn__ is the most used Machine Learning framework. This makes sense because it is the framework most used for __Linear/Logistic Regression__.

The next most used is __TensorFlow__, which can be used for __Convolutional Neural Networks__.

In [None]:
visual_usage_africa = data_frame(visual_usage, df_africa)
visual_usage_others = data_frame(visual_usage, df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Data Visualization Usage', x=visual_usage_africa.index, y=normalized(visual_usage_africa)),
    go.Bar(name='(Others) Data Visualization Usage' , x=visual_usage_others.index, y=normalized(visual_usage_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

The most popular Data Visualization library is __Matplotlib__, followed by __Seaborn__. This is because the most popular language is __Python__, with which these two libraries are used. The third most popular is __Ggplot/ggplot2__, used in the __R__ language.

In [None]:
yr_compensation_africa = data_frame(single_columns['yr_compensation'], df_africa)
yr_compensation_others = data_frame(single_columns['yr_compensation'], df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Year Compensation', x=yr_compensation_africa.index, y=normalized(yr_compensation_africa)),
    go.Bar(name='(Others) Year Compensation' , x=yr_compensation_others.index, y=normalized(yr_compensation_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

The largest group earns less than __1,000/year__ in dollars in Data Science, both in Africa and in other countries. The next largest group in Africa earns within the range of __(1,000 - 1,999)/year__ in dollars, which is less than the next largest for other countries, __(10,000 - 14,999)__. If AI job roles are created or made accessible to emerging Data Scientists from Africa, they will earn more.

Hopefully, this can be achieved through remote AI/Data Science/Machine Learning jobs or through ones created locally.

In [None]:
ml_incorp_africa = data_frame(single_columns['ml_incorp'], df_africa)
ml_incorp_others = data_frame(single_columns['ml_incorp'], df_others)

fig = go.Figure(data=[
    go.Bar(name='(Africa) Machine Learning Incorporation', x=ml_incorp_africa.index, y=normalized(ml_incorp_africa)),
    go.Bar(name='(Others) Machine Learning Incorporation', x=ml_incorp_others.index, y=normalized(ml_incorp_others))])

# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

The greater part of the current companies in Africa do not use Machine Learning methods. But don't be fooled by this narrative. What is happening is that the old companies are giving way to new startups. For example, the recent fintech startups in Nigeria like __Paystack__, __Flutterwave__ and __Interswitch__ are all __unicorns__ (valued at more than 1 billion dollars) and are of more value than all Nigerian banks put together. The same phenomenon will happen with the next AI startup tailored to African needs, and it may be these young emerging African Data Scientists who build it. So, watch this space.

# __CONCLUSION__

The future of Artificial Intelligence is indeed bright in Africa. With a teeming young population interested in Artificial Intelligence, we will see a renaissance of cutting-edge AI startups in no distant time. So, whatever you are doing, keep your eye on Africa, especially Nigeria, because as the Nigerian Pidgin expression goes, _'Naija no dey carry last'._

Bye now, and thanks for reading up to this point.

# REFERENCES

[1] Data Science Nigeria https://www.datasciencenigeria.org/adekanmbi-olubayo-2/

[2] AI Saturdays Lagos https://www.aisaturdayslagos.com/