<img src="https://www.epsiloninnovation.com/img/climate-predictive-analytics.png" alt="Drawing" style="width: 2000px;height: 300px;"/>

Image credit: [epsiloninnovation](https://www.epsiloninnovation.com)

<div class="alert alert-block alert-warning">
    <h1> <strong> Exploring Data Analytics in Africa: The People, The Industry and The Tools </strong> 
    </h1>
</div>        

    By: Ariel Jumba
    Email: arieljumba5@gmail.com
    Data by : Kaggle Survey 2021

<div class="alert alert-block alert-warning">
    <h2> <strong> Introduction </strong> 
    </h2>
</div>  

    
In 2021, Kaggle, a global data science community platform, carried out a survey to establish the state of the data science field, including who practises it and which tools they use.

This survey was meant to enable us to understand, among other things, the following:

- The demographic spread of professionals whose main functions and scope involve data and analytics.

- To understand which industries across the various demographics apply data analytics in their operations. This includes the industry types, number of companies per industry, their data analytics employee base and the extent of coverage of Machine Learning. 

- To understand the use of data analytics tools within each demographic. This includes the programming, database, IDEs, hardware, data visualization, business intelligence and cloud computing tools.

- To understand the activities performed by data analysts in different organizations through an analysis of their day-to-day activities.

- To understand the use and adoption of Machine Learning and cloud computing in each of the demographics.

This report examines the above items with respect to the African continent only. 

Data science will play a very important role in the continent's future by transforming people's lives, creating new solutions to current problems and improving processes and procedures.


<h3> <strong> How was Africa represented in the 2021 Survey? </strong> </h3>


In [None]:
######################################code by Ariel Jumba###################################################################################
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
#########################################################################################################################
filepath = "../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv"
# on_bad_lines replaces error_bad_lines, which was deprecated in pandas 1.3 and removed in 2.0
df = pd.read_csv(filepath, on_bad_lines='skip')
#########################################################################################################################
afro_countries = ['Nigeria','Egypt','South Africa','Algeria','Tunisia','Morocco','Kenya','Uganda','Ghana','Ethiopia']
df['Continent'] = np.where(df['Q3'].isin(afro_countries), 'Africa','Other')
# Work on a copy so the column assignments below do not write into a view of df
df_africa = df[df['Continent'] == 'Africa'].copy()

######################################code by Ariel Jumba###################################################################################

yes = ['We have well established ML methods (i.e., models in production for more than 2 years)',
       'We use ML methods for generating insights (but do not put working models into production)',
       'We recently started using ML methods (i.e., models in production for less than 2 years)']
df_africa['ML_Use'] = np.where(df_africa['Q23'].isin(yes),'Yes','No')

######################################code by Ariel Jumba###################################################################################

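# Relabel only the two explicit answers; any other answer is kept out of the 'Female' bucket
df_africa['Q2'] = df_africa['Q2'].map({'Man': 'Male', 'Woman': 'Female'}).fillna('Other / undisclosed')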

######################################code by Ariel Jumba###################################################################################

country = ['India','Indonesia','Pakistan','Mexico','Russia','Turkey','Australia','Nigeria','Greece','Belgium','Japan','Egypt','Singapore','Brazil','Poland','China',
           'Iran, Islamic Republic of...','United States of America','Italy','Viet Nam','Israel','Peru','South Africa','Other','Spain','Bangladesh','United Kingdom of Great Britain and Northern Ireland','France','Switzerland',
           'Algeria','Tunisia','Argentina','Sweden','Colombia','I do not wish to disclose my location','Canada','Chile','Netherlands','Ukraine','Saudi Arabia',
           'Romania','Morocco','Austria','Taiwan','Kenya','Belarus','Ireland','Portugal','Hong Kong (S.A.R.)','Denmark','Germany','South Korea','Philippines',
           'Sri Lanka','United Arab Emirates','Uganda','Ghana','Malaysia','Thailand','Nepal','Kazakhstan','Ethiopia','Iraq','Ecuador','Norway','Czech Republic']

continent = ['Asia','Asia','Asia','North America','Europe','Europe','Oceania','Africa','Europe','Europe','Asia','Africa','Asia','South America','Europe',
             'Asia','Asia','North America','Europe','Asia','Asia','South America','Africa','Other','Europe','Asia','Europe','Europe','Europe','Africa','Africa',
             'South America','Europe','South America','Other','North America','South America','Europe','Europe','Asia','Europe','Africa',
             'Europe','Asia','Africa','Europe','Europe','Europe','Asia','Europe','Europe','Asia','Asia','Asia','Asia','Africa','Africa','Asia','Asia',
             'Asia','Europe','Africa','Asia','South America','Europe','Europe']

# Lookup table pairing each survey country with its continent
countries_df = pd.DataFrame({'country': country, 'continent': continent})

######################################code by Ariel Jumba###################################################################################

In [None]:


###continents ##############################################################################
x = pd.merge(countries_df,df, left_on='country', right_on='Q3',how='left')
x=x.groupby(['continent']).agg({'continent':['count']})
x.reset_index(inplace=True)
x.columns = ['x','count']
x=x.sort_values(by='count', ascending=False)
x['percentage'] = x['count']/x['count'].sum() 


#############################################################################################################################################
from matplotlib.gridspec import GridSpec
sns.set_style('dark')
#sns.set(rc={'axes.facecolor':'cornflowerblue', 'figure.facecolor':'PowderBlue'})
sns.set_context('notebook', font_scale=1.7)
fig, axes = plt.subplots(figsize = (30,20))
gs = GridSpec(nrows=1,ncols=1)

for ax in fig.get_axes():
    ax.tick_params(bottom=False, labelbottom=False, left=False, labelleft=False)

data=x['percentage']
labels=x['x'] 
ax1 = fig.add_subplot(gs[0,0:])
ax1 = plt.pie(data,labels=labels,autopct="%.0f%%",explode=[0.1,0,0,0,0,0,0],shadow = True,startangle=45,textprops={'fontsize': 34})
#inner circle
center_circle = plt.Circle((0,0),0.30,fc='white')
fig=plt.gcf()
fig.gca().add_artist(center_circle)
plt.tight_layout()
plt.show()


<h3> <strong> Demographic Spread of African Participants in the 2021 Survey </strong> </h3>



- From the above chart, African data analytics specialists represent only 8% of the population under study, and Africa ranks 4th among the continents, directly after North America. This is promising and an indicator that Africa is not being left behind in the big data revolution.

- We have continued to see more companies appreciating the role of data and analytics across the continent. The most successful companies are those that have invested heavily in data and analytics. 

- Big data and analytics hubs are being opened across the continent to help bring up a new generation of data specialists.

- However, the continent still lags behind because of inadequate infrastructure and investment to support data initiatives.
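
The continental share discussed above can be reproduced in a few lines. The following is a minimal sketch on made-up toy data, not the survey itself: the `responses` frame, the deliberately incomplete `country_to_continent` lookup and the resulting numbers are all assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the survey's country column (Q3); the values are made up.
responses = pd.DataFrame({
    "Q3": ["Nigeria", "India", "Kenya", "India", "Brazil",
           "Egypt", "India", "Germany", "India", "Japan"]
})

# Hypothetical, deliberately incomplete country-to-continent lookup.
country_to_continent = {
    "Nigeria": "Africa", "Kenya": "Africa", "Egypt": "Africa",
    "India": "Asia", "Japan": "Asia",
    "Brazil": "South America", "Germany": "Europe",
}

# Map each respondent to a continent; unmapped countries fall back to "Other".
responses["continent"] = responses["Q3"].map(country_to_continent).fillna("Other")

# Share of respondents per continent, largest first.
continent_share = responses["continent"].value_counts(normalize=True)
print(continent_share.round(2))
```

Mapping each respondent directly, rather than merging a lookup table into the responses, keeps one row per respondent, so the shares are always computed over the number of people who answered.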

In [None]:
###Set demographic parameters ###############################################################################################################
df_africa_demographics = df_africa[['Q1','Q2','Q3','Q4']]

###Age groups ##############################################################################
Q1 = df_africa_demographics.groupby(['Q1']).agg({'Q1':['count']})
Q1.reset_index( inplace=True)
Q1.columns = ['Q1','count']
Q1['percentage'] = Q1['count']/Q1['count'].sum() 

###Gender ##################################################################################
Q2 = df_africa_demographics.groupby(['Q2','Q3']).agg({'Q2':['count']})
Q2.reset_index( inplace=True)
Q2.columns = ['Q2','Q3','count']
Q2 = Q2.sort_values(by=['count'], ascending=False)
Q2['percentage'] = Q2['count']/Q2['count'].sum() 
Q2_male = Q2[Q2['Q2'] == 'Male']
Q2_male = Q2_male[['Q3','count']]
Q2_male.columns = ['country','male_count']
Q2_female = Q2[Q2['Q2'] == 'Female']
Q2_female = Q2_female[['Q3','count']]
Q2_female.columns = ['country','female_count']
Q2=pd.merge(Q2_male,Q2_female,left_on='country',right_on='country', how='right')
Q2['total'] = Q2['male_count']+Q2['female_count']
Q2 = Q2.sort_values(by=['total'], ascending=False)

####Countries ###############################################################################
Q3 = df_africa_demographics.groupby(['Q3']).agg({'Q3':['count']})
Q3.reset_index( inplace=True)
Q3.columns = ['Q3','count']
Q3 = Q3.sort_values(by=['count'], ascending=False)
Q3['percentage'] = Q3['count']/Q3['count'].sum() 

####Education level #########################################################################
Q4 = df_africa_demographics.groupby(['Q4']).agg({'Q4':['count']})
Q4.reset_index( inplace=True)
Q4.columns = ['Q4','count']
Q4 = Q4.sort_values(by=['count'], ascending=False)
Q4['percentage'] = Q4['count']/Q4['count'].sum() 
#################################################################################################################################

In [None]:
#country#######################################################################################################
# numpy and pandas are already imported above; only plotly's figure API is needed here
import plotly.graph_objects as go



data = dict(type ='choropleth',
            locations = list(Q3['Q3']),
            locationmode='country names',
            z = list(Q3['count']),
            colorscale='OrRd')
layout = dict(geo={'scope':'africa'},margin={"r":0,"t":0,"l":0,"b":0},xaxis=dict(fixedrange=True),yaxis=dict(fixedrange=True))


choropleth_map = go.Figure(data=[data],layout=layout)

choropleth_map.show()

#########################################################################################################
sns.set_style("dark")
sns.set_context("notebook", font_scale=2.0, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (30,10),sharey=True)
gs=GridSpec(nrows=1,ncols=2)
plt.suptitle('Respondents by Gender',fontsize=22,color='Indigo')
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)

#Gender  #####################################################################################################
ax1 = fig.add_subplot(gs[0,0])
ax1 = sns.barplot(x = Q2['male_count'], y = Q2['country'],color='DarkOrange')
ax1.set_title('Male',color='Indigo')
ax1.set(xlabel = None)
ax1.set(ylabel = None)
ax1.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)
labels_male = Q2['male_count'].values
labels_male =['{:,.0f}'.format(x) for x in labels_male] 
ax1.bar_label(container=ax1.containers[0],labels=labels_male,color='Black')
ax1.patch.set_facecolor('PowderBlue')
ax1.invert_xaxis()

ax2 = fig.add_subplot(gs[0,1])
ax2 = sns.barplot(x = Q2['female_count'], y = Q2['country'],color='MediumSeaGreen')
ax2.set_title('Female',color='Indigo')
ax2.set(xlabel = None)
ax2.set(ylabel = None)
ax2.tick_params(bottom=False,labelbottom=False,left=False)
labels_female = Q2['female_count'].values
labels_female =['{:,.0f}'.format(x) for x in labels_female] 
ax2.bar_label(container=ax2.containers[0],labels=labels_female,color='Black')
ax2.patch.set_facecolor('PowderBlue')


plt.show()
####################################################################################################################
fig2, axes = plt.subplots(figsize = (30,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig2.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)

#Age groups#####################################################################################################

ax3=fig2.add_subplot(gs[0,0])
ax3=sns.barplot(y = Q1['count'], x = Q1['Q1'],color='DarkOrange')
ax3.set_title('Respondents by Age Group',color='Indigo')
ax3.set(xlabel = None)
ax3.set(ylabel = None)
ax3.tick_params(bottom=False,labelbottom=True,left=False,labelleft=False)
labels_age= Q1['percentage'].values
labels_age =['{:,.0%}'.format(x) for x in labels_age] 
ax3.bar_label(container=ax3.containers[0],labels=labels_age,color='DarkBlue')
ax3.patch.set_facecolor('PowderBlue')

#Education Level##################################################################################################
ax4=fig2.add_subplot(gs[0,1])
ax4=sns.barplot(y = Q4['count'], x = Q4['Q4'],color='MediumSeaGreen')
ax4.set_title('Respondents by Education level',color='Indigo')
ax4.set(xlabel = None)
ax4.set(ylabel = None)
ax4.tick_params(bottom=False,labelbottom=True,left=False,labelleft=False)
labels_edu= Q4['percentage'].values
labels_edu =['{:,.0%}'.format(x) for x in labels_edu] 
ax4.bar_label(container=ax4.containers[0],labels=labels_edu,color='DarkBlue')
ax4.patch.set_facecolor('PowderBlue')
ax4.tick_params(axis='x',rotation=90)


plt.show()


 
- The data and analytics field is not new; however, its importance has been felt most strongly in recent years, as businesses and their processes have become more complex and competition has grown. <br>
    
    Newer analytics methods such as Deep Learning have also become more popular recently. <br>
    
- Very few schools offer data analytics courses because of the limited number of professionals. This may be why the majority of professionals fall within the "youth" population, aged 18-35, and why most have studied only up to master's level. 

- Nigeria has the highest number of data analytics professionals in Africa followed by Egypt then Kenya.

- Female representation stands at 20%, compared with 79% for their male counterparts. However, there are many initiatives across the continent, such as the Nairobi Women in Data Science and Machine Learning, that aim to bring more women into the field.
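
A gender split per country like the one above can be computed with a row-normalised cross-tabulation. This is a minimal sketch on made-up toy data; the `toy` frame and the `Other / undisclosed` bucket name are assumptions for illustration, and the printed shares say nothing about the real survey.

```python
import pandas as pd

# Toy data standing in for Q2 (gender) and Q3 (country); the values are made up.
toy = pd.DataFrame({
    "Q3": ["Nigeria", "Nigeria", "Nigeria", "Nigeria",
           "Egypt", "Egypt", "Kenya", "Kenya"],
    "Q2": ["Man", "Man", "Man", "Woman",
           "Man", "Woman", "Man", "Prefer not to say"],
})

# Relabel the two explicit answers and bucket every other answer separately,
# so that answers other than "Man" are not silently counted as women.
gender = toy["Q2"].map({"Man": "Male", "Woman": "Female"}).fillna("Other / undisclosed")

# Row-normalised cross-tabulation: each country's row sums to 1.
gender_share = pd.crosstab(toy["Q3"], gender, normalize="index")
print(gender_share.round(2))
```

Normalising by row makes countries with very different respondent counts directly comparable, which raw counts do not.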


<div class="alert alert-block alert-warning">
    <h2> <strong> Industry level Adoption of Data Analytics </strong> 
    </h2>
</div>  


* Almost every industry has, in one way or another, adopted data analytics in its operations.

* The business world is now driven by data. Data is key because it helps a business understand the direction in which it is moving and make better decisions.
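
One way to compare industries of different sizes is the adoption rate within each industry, that is, the fraction of its respondents who report using Machine Learning. The sketch below assumes a Yes/No flag per respondent; the `toy` frame, its column names and its values are hypothetical.

```python
import pandas as pd

# Toy data: one row per respondent, with an industry and a Yes/No ML flag.
toy = pd.DataFrame({
    "industry": ["Academia/Education", "Academia/Education",
                 "Computers/Technology", "Computers/Technology",
                 "Computers/Technology", "Accounting/Finance"],
    "ml_use": ["No", "Yes", "Yes", "Yes", "No", "No"],
})

adoption = (
    toy.assign(uses_ml=toy["ml_use"].eq("Yes"))       # boolean flag per respondent
       .groupby("industry")["uses_ml"]
       .agg(respondents="size", adoption_rate="mean")  # mean of booleans = share of True
       .sort_values("adoption_rate", ascending=False)
)
print(adoption.round(2))
```

Keeping the respondent count next to the rate helps flag industries where a high or low rate rests on only a handful of answers.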

In [None]:
## Professionals per industry, split by ML use ######################################################################################################################
Q20 = df_africa.groupby(['Q20','ML_Use']).agg({'Q20':['count']})
Q20.reset_index( inplace=True)
Q20.columns = ['Q20','ML_Use','count']
Q20 = Q20.sort_values(by=['count'], ascending=False)
Q20['percentage'] = Q20['count']/Q20['count'].sum() 
Q20 = Q20.sort_values(by=['percentage'], ascending=False)
Q20=Q20[Q20['Q20'] !='Other']

Q20_ML = Q20[Q20['ML_Use'] =='Yes']
Q20_no = Q20[Q20['ML_Use']=='No']
Q20 = pd.merge(Q20_ML,Q20_no, left_on='Q20',right_on='Q20', how='left')      
Q20=Q20[['Q20','count_x','count_y']]
Q20.columns = ['industry','Adopted ML','Not Using ML']
Q20['percentage_ML'] = Q20['Adopted ML']/Q20['Adopted ML'].sum() 
Q20['percentage_non_ML'] = Q20['Not Using ML']/Q20['Not Using ML'].sum() 
Q20['Total'] = Q20['Adopted ML'] + Q20['Not Using ML']
Q20 = Q20.sort_values(by=['Total'], ascending=False)
Q20 = Q20.head(20)

##########################################################################################################################################
sns.set_style("dark")
sns.set_context("notebook", font_scale=2.0, rc={"lines.linewidth": 1.0})
fig3, axes = plt.subplots(figsize = (30,12),sharey=True)
gs=GridSpec(nrows=1,ncols=2,wspace=0.7)
for ax in fig3.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)

#Analytics Professionals  #####################################################################################################

ax1 = fig3.add_subplot(gs[0,0])
ax1 = sns.barplot(x = Q20['Adopted ML'], y = Q20['industry'],color='DarkOrange')
ax1.set_title('Adopted Machine Learning',color='Black')
ax1.set(xlabel = None)
ax1.set(ylabel = None)
ax1.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)
labels_ml = Q20['percentage_ML'].values
labels_ml =['{:,.0%}'.format(x) for x in labels_ml] 
ax1.bar_label(container=ax1.containers[0],labels=labels_ml,color='Black')
ax1.patch.set_facecolor('PowderBlue')
ax1.invert_xaxis()

ax2 = fig3.add_subplot(gs[0,1])
ax2 = sns.barplot(x = Q20['Not Using ML'], y = Q20['industry'],color='MediumSeaGreen')
ax2.set_title('Do not apply Machine Learning',color='Black')
ax2.set(xlabel = None)
ax2.set(ylabel = None)
ax2.tick_params(bottom=False,labelbottom=False,left=False)
labels_noml = Q20['percentage_non_ML'].values
labels_noml =['{:,.0%}'.format(x) for x in labels_noml] 
ax2.bar_label(container=ax2.containers[0],labels=labels_noml,color='Black')
ax2.patch.set_facecolor('PowderBlue')
ax2.tick_params(axis='x',rotation=90)


plt.show()

<div class="alert alert-block alert-warning">
    <h2> <strong> What are the Most Popular Data Analytics Tools Used by African Data Analytics Professionals? </strong> 
    </h2>
</div>  

In [None]:
###cloud######################################################################################################################################################################################
df_cloud = df_africa[['Q27_A_Part_1','Q27_A_Part_2','Q27_A_Part_3','Q27_A_Part_4','Q27_A_Part_5','Q27_A_Part_6','Q27_A_Part_7','Q27_A_Part_8','Q27_A_Part_9','Q27_A_Part_10']]
df_cloud.columns = [['Amazon Web Services','Microsoft Azure','Google Cloud Platform','IBM Cloud','Oracle Cloud','SAP Cloud','Salesforce Cloud','VMware Cloud','Alibaba Cloud','Tencent Cloud']]
df_cloud = df_cloud.T
df_cloud = df_cloud.notnull().astype('int')
df_cloud['users'] = df_cloud.sum(axis=1,numeric_only=True)
df_cloud = pd.DataFrame(df_cloud['users']).reset_index()
df_cloud.columns=['Platform','users']
df_cloud['percentage'] = df_cloud['users']/df_cloud['users'].sum()
df_cloud = df_cloud.sort_values(by=['users'], ascending=False)


###programming languages######################################################################################################################################################################################

df_prog = df_africa[['Q7_Part_1','Q7_Part_2','Q7_Part_3','Q7_Part_4','Q7_Part_5','Q7_Part_6','Q7_Part_7','Q7_Part_8','Q7_Part_9','Q7_Part_10','Q7_Part_11']]
df_prog.columns = [['Python','R','SQL','C','C++','Java','Javascript','Julia','Swift','Bash','MATLAB']]
df_prog = df_prog.T
df_prog = df_prog.notnull().astype('int')
df_prog['users'] = df_prog.sum(axis=1,numeric_only=True)
df_prog = pd.DataFrame(df_prog['users']).reset_index()
df_prog.columns=['Platform','users']
df_prog['percentage'] = df_prog['users']/df_prog['users'].sum()
df_prog= df_prog.sort_values(by=['users'], ascending=False)


###Integrated Development Environments######################################################################################################################################################################################
df_IDE = df_africa[['Q9_Part_2','Q9_Part_3','Q9_Part_4','Q9_Part_5','Q9_Part_6','Q9_Part_7','Q9_Part_8','Q9_Part_9','Q9_Part_10','Q9_Part_11']]
df_IDE.columns = [['RStudio','Visual Studio','Visual Studio Code','PyCharm','Spyder','Notepad++','Sublime Text','Vim / Emacs','MATLAB','Jupyter Notebook']]
df_IDE = df_IDE.T
df_IDE = df_IDE.notnull().astype('int')
df_IDE['users'] = df_IDE.sum(axis=1,numeric_only=True)
df_IDE = pd.DataFrame(df_IDE['users']).reset_index()
df_IDE.columns=['Platform','users']
df_IDE['percentage'] = df_IDE['users']/df_IDE['users'].sum()
df_IDE= df_IDE.sort_values(by=['users'], ascending=False)



###DataBase Tools######################################################################################################################################################################################
df_DB = df_africa[['Q32_A_Part_1','Q32_A_Part_2','Q32_A_Part_3','Q32_A_Part_4','Q32_A_Part_5','Q32_A_Part_6','Q32_A_Part_7','Q32_A_Part_8','Q32_A_Part_9','Q32_A_Part_10','Q32_A_Part_11','Q32_A_Part_12',
                    'Q32_A_Part_13','Q32_A_Part_14','Q32_A_Part_15','Q32_A_Part_16','Q32_A_Part_17','Q32_A_Part_18','Q32_A_Part_19']]
df_DB.columns = [['MySQL','PostgreSQL','SQLite','Oracle Database','MongoDB','Snowflake','IBM Db2','Microsoft SQL Server','Microsoft Azure SQL Database','Microsoft Azure Cosmos DB','Amazon Redshift','Amazon Aurora',
                   'Amazon RDS','Amazon DynamoDB','Google Cloud BigQuery','Google Cloud SQL','Google Cloud Firestore','Google Cloud BigTable','Google Cloud Spanner']]
df_DB = df_DB.T
df_DB = df_DB.notnull().astype('int')
df_DB['users'] = df_DB.sum(axis=1,numeric_only=True)
df_DB = pd.DataFrame(df_DB['users']).reset_index()
df_DB.columns=['Platform','users']
df_DB['percentage'] = df_DB['users']/df_DB['users'].sum()
df_DB= df_DB.sort_values(by=['users'], ascending=False)


###BI Tools######################################################################################################################################################################################
df_BI = df_africa[['Q34_A_Part_1','Q34_A_Part_2','Q34_A_Part_3','Q34_A_Part_4','Q34_A_Part_5','Q34_A_Part_6','Q34_A_Part_7','Q34_A_Part_8','Q34_A_Part_9','Q34_A_Part_10','Q34_A_Part_11','Q34_A_Part_12',
                   'Q34_A_Part_13','Q34_A_Part_14','Q34_A_Part_15']]
df_BI.columns = [['Amazon QuickSight','Microsoft Power BI','Google Data Studio','Looker','Tableau','Salesforce','Tableau CRM','Qlik','Domo',
                  'TIBCO Spotfire','Alteryx','Sisense','SAP Analytics Cloud','Microsoft Azure Synapse','Thoughtspot']]
df_BI = df_BI.T
df_BI = df_BI.notnull().astype('int')
df_BI['users'] = df_BI.sum(axis=1,numeric_only=True)
df_BI = pd.DataFrame(df_BI['users']).reset_index()
df_BI.columns=['Platform','users']
df_BI['percentage'] = df_BI['users']/df_BI['users'].sum()
df_BI= df_BI.sort_values(by=['users'], ascending=False)


###Visualization Tools######################################################################################################################################################################################
df_viz = df_africa[['Q14_Part_1','Q14_Part_2','Q14_Part_3','Q14_Part_4','Q14_Part_5','Q14_Part_6','Q14_Part_7','Q14_Part_8','Q14_Part_9','Q14_Part_10']]
df_viz.columns = [['Matplotlib','Seaborn','Plotly','Ggplot','Shiny','D3 js','Altair','Bokeh','Geoplotlib','Leaflet / Folium']]
df_viz = df_viz.T
df_viz = df_viz.notnull().astype('int')
df_viz['users'] = df_viz.sum(axis=1,numeric_only=True)
df_viz = pd.DataFrame(df_viz['users']).reset_index()
df_viz.columns=['Platform','users']
df_viz['percentage'] = df_viz['users']/df_viz['users'].sum()
df_viz= df_viz.sort_values(by=['users'], ascending=False)


###ML Frameworks######################################################################################################################################################################################
df_mlF = df_africa[['Q16_Part_1','Q16_Part_2','Q16_Part_3','Q16_Part_4','Q16_Part_5','Q16_Part_6','Q16_Part_7','Q16_Part_8','Q16_Part_9','Q16_Part_10','Q16_Part_11','Q16_Part_12','Q16_Part_13',
                    'Q16_Part_14','Q16_Part_15','Q16_Part_16']]
df_mlF.columns = [['Scikit-learn','TensorFlow','Keras','PyTorch','Fast_ai','MXNet','Xgboost','LightGBM','CatBoost','Prophet','H2O 3','Caret','Tidymodels','JAX','PyTorch Lightning','Huggingface']]
df_mlF = df_mlF.T
df_mlF = df_mlF.notnull().astype('int')
df_mlF['users'] = df_mlF.sum(axis=1,numeric_only=True)
df_mlF = pd.DataFrame(df_mlF['users']).reset_index()
df_mlF.columns=['Platform','users']
df_mlF['percentage'] = df_mlF['users']/df_mlF['users'].sum()
df_mlF= df_mlF.sort_values(by=['users'], ascending=False)



<h3> <strong> Which are the Most Popular Cloud Computing Platforms in Africa? </strong> </h3>


* Google, Microsoft and Amazon currently dominate the world of cloud computing in Africa with their respective products. However, Amazon Web Services provides a deeper range of products compared to the other two.

* Cloud platforms provide a range of services and functionality on demand, without the need for additional hardware or software. These include AI, computing, storage, networking and automation, as well as security and compliance services.
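
Survey questions such as the cloud platform one are multi-select: each option has its own column, which is empty when a respondent did not tick it. A minimal sketch of how such columns can be tallied is shown below. The helper `tally_multiselect`, the toy frame and the display labels are hypothetical and only illustrate the idea.

```python
import pandas as pd

def tally_multiselect(frame, labels):
    """Count ticked options for a multi-select question (hypothetical helper).

    `labels` maps raw option columns to display names. A cell counts as
    ticked when it is not null.
    """
    picked = frame[list(labels)].notna()
    out = (
        picked.sum()                      # ticks per option column
              .rename(index=labels)       # raw column name -> display name
              .rename_axis("Platform")
              .reset_index(name="users")
              .sort_values("users", ascending=False, ignore_index=True)
    )
    # The two shares answer different questions for multi-select data.
    out["share_of_selections"] = out["users"] / out["users"].sum()
    out["share_of_respondents"] = out["users"] / len(frame)
    return out

# Toy data: four respondents, three option columns; the values are made up.
toy = pd.DataFrame({
    "Q27_A_Part_1": ["AWS", None, "AWS", None],
    "Q27_A_Part_2": [None, "Azure", None, None],
    "Q27_A_Part_3": ["GCP", "GCP", "GCP", None],
})
labels = {
    "Q27_A_Part_1": "Amazon Web Services",
    "Q27_A_Part_2": "Microsoft Azure",
    "Q27_A_Part_3": "Google Cloud Platform",
}
cloud = tally_multiselect(toy, labels)
print(cloud)
```

Because a respondent can tick several options, the share of selections always sums to 100% across platforms, while the share of respondents can exceed that in total. It is worth stating which of the two a chart label shows.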

In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.7, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (20,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)

#Cloud#####################################################################################################

ax1 = fig.add_subplot(gs[0,0:])
ax1 = sns.barplot(x = df_cloud['users'], y = df_cloud['Platform'],color='DarkOrange')
ax1.set(xlabel = None)
ax1.set(ylabel = None)
ax1.tick_params(bottom=False,labelbottom=False,left=False,labelleft=True)
labels_cloud = df_cloud['percentage'].values
labels_cloud =['{:,.0%}'.format(x) for x in labels_cloud] 
ax1.bar_label(container=ax1.containers[0],labels=labels_cloud,color='Black')
ax1.patch.set_facecolor('LightGray')


<h3> <strong> Which are the Most Preferred Programming Languages used by Professionals in Africa? </strong> </h3>



* Python is the most preferred language because of its broad applicability. As a general-purpose language, it can be used in almost every field, be it statistics, software development or engineering.

* It is used by many large organizations, such as [Google](https://finance.yahoo.com/quote/GOOG) and [Facebook](https://finance.yahoo.com/quote/FB/).

* Python is also considered very easy to learn, according to a survey conducted by [Stack Overflow](https://stackoverflow.com) in 2020.


In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.7, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (20,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)
###prog#####################################################################################################

ax2 = fig.add_subplot(gs[0,0:])
ax2 = sns.barplot(x = df_prog['users'], y = df_prog['Platform'],color='MediumSeaGreen')
ax2.set(xlabel = None)
ax2.set(ylabel = None)
ax2.tick_params(bottom=False,labelbottom=False,left=False,labelleft=True)
labels_prog = df_prog['percentage'].values
labels_prog =['{:,.0%}'.format(x) for x in labels_prog] 
ax2.bar_label(container=ax2.containers[0],labels=labels_prog,color='Black')
ax2.patch.set_facecolor('LightGray')


<h3> <strong> Which are the Most Popular Database Management Platforms used by Professionals in Africa? </strong> </h3>


[MySQL](https://www.mysql.com/) is the most popular relational database system, as it is a secure, open-source system. This also makes it a preferred choice for beginners in database management.

In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.7, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (20,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)



###DB#####################################################################################################

ax3 = fig.add_subplot(gs[0,0:])
ax3 = sns.barplot(x = df_DB['users'], y = df_DB['Platform'],color='DarkOrange')
ax3.set(xlabel = None)
ax3.set(ylabel = None)
ax3.tick_params(bottom=False,labelbottom=False,left=False,labelleft=True)
labels_DB = df_DB['percentage'].values
labels_DB =['{:,.0%}'.format(x) for x in labels_DB] 
ax3.bar_label(container=ax3.containers[0],labels=labels_DB,color='Black')
ax3.patch.set_facecolor('LightGray')

<h3> <strong> Which are the Most Popular IDEs used by Data Analytics Professionals in Africa? </strong> </h3>



* [Jupyter notebooks](https://jupyter.org/install) are most preferred by junior analysts due to their user-friendly interface. They are equally useful when making presentations.
    
* They are, however, less suited to running full-fledged applications.

In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.7, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (20,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)

###IDE#####################################################################################################

ax4 = fig.add_subplot(gs[0,0:])
ax4 = sns.barplot(x = df_IDE['users'], y = df_IDE['Platform'],color='MediumSeaGreen')
ax4.set(xlabel = None)
ax4.set(ylabel = None)
ax4.tick_params(bottom=False,labelbottom=False,left=False,labelleft=True)
labels_IDE = df_IDE['percentage'].values
labels_IDE =['{:,.0%}'.format(x) for x in labels_IDE] 
ax4.bar_label(container=ax4.containers[0],labels=labels_IDE,color='Black')
ax4.patch.set_facecolor('LightGray')

<h3> <strong> Which are the Most Popular BI Tools used by Professionals in Africa? </strong> </h3>


In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.7, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (20,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)

###BI#####################################################################################################

ax5 = fig.add_subplot(gs[0,0:])
ax5 = sns.barplot(x = df_BI['users'], y = df_BI['Platform'],color='DarkOrange')
ax5.set(xlabel = None)
ax5.set(ylabel = None)
ax5.tick_params(bottom=False,labelbottom=False,left=False,labelleft=True)
labels_BI = df_BI['percentage'].values
labels_BI =['{:,.0%}'.format(x) for x in labels_BI] 
ax5.bar_label(container=ax5.containers[0],labels=labels_BI,color='Black')
ax5.patch.set_facecolor('LightGray')


<h3> <strong> Which are the Most Popular Data Visualization Tools used by Professionals in Africa? </strong> </h3>


In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.7, rc={"lines.linewidth": 1.0})
fig, axes = plt.subplots(figsize = (20,10))
gs=GridSpec(nrows=1,ncols=2)
for ax in fig.get_axes():
    ax.tick_params(bottom=False,labelbottom=False,left=False,labelleft=False)


###viz#####################################################################################################

ax6 = fig.add_subplot(gs[0,0:])
ax6 = sns.barplot(x = df_viz['users'], y = df_viz['Platform'],color='MediumSeaGreen')
ax6.set(xlabel = None)
ax6.set(ylabel = None)
ax6.tick_params(bottom=False,labelbottom=False,left=False,labelleft=True)
labels_viz = df_viz['percentage'].values
labels_viz =['{:,.0%}'.format(x) for x in labels_viz] 
ax6.bar_label(container=ax6.containers[0],labels=labels_viz,color='Black')
ax6.patch.set_facecolor('LightGray')

<h2> <strong> To be continued </strong> </h2>