## **This is a Notebook Exploring Data on Nigerian Data Scientists**
> Author: [Alao David I.](https://github.com/invest41/)

[<img src="https://user-images.githubusercontent.com/70070334/142493938-e8a3b455-3893-47ef-99ff-590dcfc031c8.jpeg"/> <br/> <quote>  *Source: African Economic Merit Awards* <quote/>](http://www.africaneconomicmeritawards.org/)

### **Introduction:** 
Nigeria is the most populous country in Africa, and 7th in the World, just behind Brazil [which is 9 times bigger in landmass](https://www.mylifeelsewhere.com/country-size-comparison/brazil/nigeria).   <br/><br/>
Around [2.64% of the World's Population is Nigerian](https://www.worldometers.info/world-population/nigeria-population/) and **2.7% of Kaggle Data Scientists are Nigerian**, while **the country's youth population alone exceeds the entire population of the United Kingdom of Great Britain and Northern Ireland.**
 <br/><br/>
*The frontline is ever expanding in the technological sphere, [Data is the New Oil](https://www.forbes.com/sites/forbestechcouncil/2019/11/15/data-is-the-new-oil-and-thats-a-good-thing/?sh=5c25aa787304), and [technological advancements have been proven to be a major driver of economic development](https://www.cio.com/article/3152568/the-growing-importance-of-the-technology-economy.html),* hence, this is **definitely a great time for Nigerians to get in and innovate.** 

### **Aim:**  
*The aim of this analytic endeavour is to highlight certain facts and figures about Data Scientists in Nigeria, and to make certain predictions about what the future holds...*

### **Setting up the environment**

In [None]:
import numpy as np, pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

from fbprophet import Prophet
import os, sys
import warnings
warnings.simplefilter('ignore')

In [None]:
path = '../input/kaggle-survey-2021/'
print('Datasets')
os.listdir(path)

### **Accessing the Dataset and Data Pre-processing**

In [None]:
raw_df = pd.read_csv(path + 'kaggle_survey_2021_responses.csv')
raw_df.head()

In [None]:
#Map columns to actual question asked
print('Map columns to actual question asked', end ='\n\n')
cols = pd.DataFrame(raw_df.columns)
raw_df.iloc[:1]

In [None]:
df = raw_df.drop(0, axis = 0)
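
The first row of the survey file holds the question text rather than a response, which is why it is dropped above. A minimal sketch of keeping that row as a code-to-question lookup follows. It runs on a small made-up frame, and `toy_raw`, `questions` and `responses` are illustrative names that do not appear elsewhere in this notebook.

```python
import pandas as pd

# Made-up stand-in for the survey file: row 0 carries the question text,
# the remaining rows carry responses.
toy_raw = pd.DataFrame({
    'Q1': ['What is your age (# years)?', '25-29', '30-34'],
    'Q3': ['In which country do you currently reside?', 'Nigeria', 'Japan'],
})

# Column code -> question text, taken from the first row
questions = toy_raw.iloc[0].to_dict()

# Responses only, re-indexed from 0
responses = toy_raw.drop(0, axis=0).reset_index(drop=True)

print(questions['Q3'])
print(responses)
```

With a lookup like this, a column code such as `Q3` can be translated back to its question whenever a plot needs a title.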

In [None]:
mask = df['Q3'] == 'Nigeria'       #df['Q3'] == 'In which country do you currently reside?'
NGA = df[mask]
print('Nigeria-specific Dataset - NGA\n')
NGA.head(1)

In [None]:
print ('Columns with missing values - col_mis')
mis = NGA.isnull().sum()
col_mis = pd.DataFrame()

col_mis['Questions Asked'] = raw_df[mis[mis>0].index].iloc[0].values
col_mis['Number of Missing Values'] = mis[mis>0].values
col_mis.index = mis[mis>0].index
col_mis.T

In [None]:
print ('Descriptive Statistics\n')
NGA.describe()

### **Define Necessary Visualization Functions**

In [None]:
def label_plot(title = 'Distribution Plot', x = 'X-axis', y = 'Y-axis'):
    plt.title(title, weight = 'bold', fontsize = 17)
    plt.ylabel(y, weight = 'bold', fontsize = 15)
    plt.xlabel(x, weight = 'bold', fontsize = 15)
    

    
def plot_bar(dataset, column = None, horizontal = False, figsize = (20,10), color = 'darkcyan'):
    # Count the values of the Series itself, or of one column of a DataFrame
    counts = dataset.value_counts() if column is None else dataset[column].value_counts()
    if horizontal:
        plots = counts.plot.barh(figsize = figsize, color = color)
        plt.gca().invert_yaxis()    # most frequent category on top
    else:
        # Vertical bars in both cases (the column branch previously called barh by mistake)
        plots = counts.plot.bar(figsize = figsize, color = color)
    return plots

        

        
def plot_donut(dataset, column = None, title = 'Donut Plot', figsize = (20,10), shadow = False, color = False, explode = False, background = 'white'):
    
    plt.figure(figsize = figsize)
    
    if column is None: data = dataset.value_counts()
    else: data = dataset[column].value_counts()
    labs = data.values
    
    
    plt.title(title, weight = 'bold', fontsize = 30)
    
    
    
    if not color: color = ['darkcyan', 'green', 'darkorange', 'black', 'darkred', 'darkmagenta', 'gold']
    if not explode: 
        try: explode = [0] + (list(np.linspace(0.05,0.1, len(labs) - 2))[::-1]) + [0]
        except: explode = [0] * len(labs)
                
            
    data.plot.pie(
        label = '',
        colors = color,
        shadow = shadow,
        explode = explode,
        autopct = lambda x: f'{round(x, 2)}%')
    
    circle = plt.Circle( (0,0), 0.8, color=background)    # hole colour follows the `background` argument
    plt.gcf().gca().add_artist(circle)
    

    

    

def annot_bar(plots):
    for bar in plots.patches:
        plots.annotate(format(bar.get_height(), '.0f'),
                 (bar.get_x() + bar.get_width() / 2,
                  bar.get_height()), ha='center', va='center',
                  size=15, xytext=(0, 8),
                  textcoords='offset points')

### **Exploratory Data Analysis**

In [None]:
numNGA, numRES = len(NGA), len(df)
perNGA = round((numNGA/numRES) *100, 2)
#print(f'{numNGA}, out of {numRES} respondents are Nigerians')
#print(f'{perNGA}% of total respondents are Nigerian')

In [None]:
#List of African Countries to filter from
africa = ['Nigeria',
 'Ethiopia',
 'Democratic Republic of the Congo',
 'Egypt',
 'South Africa',
 'Tanzania',
 'Kenya',
 'Uganda',
 'Algeria',
 'Sudan',
 'Morocco',
 'Angola',
 'Ghana',
 'Cameroon',
 'Madagascar',
 'Mozambique',
 'Ivory Coast',
 'Niger',
 'Mali',
 'Burkina Faso',
 'Malawi',
 'Chad',
 'Somalia',
 'Zimbabwe',
 'Zambia',
 'Senegal',
 'South Sudan',
 'Rwanda',
 'Guinea',
 'Benin',
 'Tunisia',
 'Burundi',
 'Sierra Leone',
 'Togo',
 'Libya',
 'Eritrea',
 'Republic of the Congo',
 'Liberia',
 'Central African Republic',
 'Mauritania',
 'Gambia',
 'Botswana',
 'Gabon',
 'Namibia',
 'Lesotho',
 'Guinea-Bissau',
 'Mauritius',
 'Equatorial Guinea',
 'Eswatini',
 'Djibouti',
 'Réunion',
 'Comoros',
 'Cape Verde',
 'Western Sahara',
 'Mayotte',
 'São Tomé and Príncipe',
 'Seychelles',
 'Saint Helena, Ascension and Tristan da Cunha']

In [None]:
uniq1 = df['Q3'].drop(df['Q3'][df['Q3'].str.contains('Other')].index, axis = 0)
uniq = uniq1.replace('United Kingdom of Great Britain and Northern Ireland', 'United Kingdom').value_counts()


african, others = {}, {}
for country, pop in zip(uniq.index, uniq.values):
    if country in africa:
        african.update({country:pop})
        if not country == 'Nigeria':
            others.update({country:pop})

nonNGA = sum(others.values())
#print('Number of Non-Nigerian African Data Scientists:', nonNGA)


In [None]:
fig = plt.figure(figsize=(5,2),facecolor='white')
ax = fig.add_subplot(1,1,1)

ax.text(1.0,1, "Key figures",color='black',fontsize=28, fontweight='bold', fontfamily='monospace',ha='center')

ax.text(0, 0.4, f"{numNGA}",color='darkcyan',fontsize=25, fontweight='bold', fontfamily='monospace',ha='center')
ax.text(0, 0.001, f"out of {numRES}  \nrespondents are Nigerian",color='dimgrey',fontsize=17, fontweight='light', fontfamily='monospace',ha='center')

ax.text(1.0, 0.4, f"{perNGA}%",color='darkcyan',fontsize=25, fontweight='bold', fontfamily='monospace',ha='center')
ax.text(1.0, 0.001, "of global respondents \nare Nigerian",color='dimgrey',fontsize=17, fontweight='light', fontfamily='monospace',ha='center')

ax.text(2.0, 0.4, f"{sum(african.values())}",color='darkcyan',fontsize=25, fontweight='bold', fontfamily='monospace',ha='center')
ax.text(2.0, 0.001, "African \nData Scientists",color='dimgrey',fontsize=17, fontweight='light', fontfamily='monospace',ha='center')



ax.text(0.5, -0.4, f"{round((numNGA/sum(african.values()))*100, 1)}%",color='darkcyan',fontsize=25, fontweight='bold', fontfamily='monospace',ha='center')
ax.text(0.5, -0.75, "of African Data Scientists \nare Nigerian – A wide margin",color='dimgrey',fontsize=17, fontweight='light', fontfamily='monospace',ha='center')


ax.text(1.65, -0.4, f"{len(african)}",color='darkcyan',fontsize=25, fontweight='bold', fontfamily='monospace',ha='center')
ax.text(1.65, -0.75, "African \nCountries Represented",color='dimgrey',fontsize=17, fontweight='light', fontfamily='monospace',ha='center')




ax.set_yticklabels('')
ax.tick_params(axis='y',length=0)
ax.tick_params(axis='x',length=0)
ax.set_xticklabels('')

for direction in ['top','right','left','bottom']:
    ax.spines[direction].set_visible(False)

In [None]:
sns.set_theme()

In [None]:
color_lst = ['darkgray'] * 9
color_lst.insert(6, 'darkcyan') 

In [None]:
plt.title('Top 10 Countries with the highest number of Data Scientists', weight = 'bold', fontsize = 15)
plots = uniq[:10].plot.bar(figsize=(16,10), color = color_lst)
annot_bar(plots)

In [None]:
color_lst2 = ['darkgray'] * 9
color_lst2.insert(0, 'darkcyan') 

In [None]:
data = [[i[0], i[1]] for i in african.items()]

plots = pd.DataFrame(data, columns =['Country', 'Number of Respondents']).plot.bar(x = 'Country', y ='Number of Respondents', label = 'Highest Representation', figsize=(16,10), color = color_lst2)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Country', weight = 'bold', fontsize = 13)
plt.title('African Countries with the Highest Representation in Data Science\n', weight = 'bold', fontsize = 15)
annot_bar(plots)

In [None]:
plt.figure(figsize = (20,10))
plt.title('Proportion of Global Respondent Data Scientists', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plots = sns.barplot(x = ['Nigeria', 'Rest of Africa' ,'Rest of the World'], y = [numNGA, nonNGA, numRES - (numNGA + nonNGA)], palette = ['darkcyan', 'darkgrey', 'darkgrey'])
                   
annot_bar(plots)

<h4> &nbsp;  &nbsp;  &nbsp; <strong> Based on Recent Growth Pattern, Nigeria is projected to be Top 3 by 2032 </strong> </h4>
<img src='https://user-images.githubusercontent.com/70070334/143162578-9c1ac436-5a46-4e47-908c-d465518acafd.jpeg' align="middle"/>

In [None]:
class suppress_stdout_stderr(object):
    '''
    A context manager for doing a "deep suppression" of stdout and stderr in
    Python, i.e. will suppress all print, even if the print originates in a
    compiled C/Fortran sub-function.
       This will not suppress raised exceptions, since exceptions are printed
    to stderr just before a script exits, and after the context manager has
    exited (at least, I think that is why it lets exceptions through).

    '''
    def __init__(self):
        # Open a pair of null files
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for x in range(2)]
        # Save the actual stdout (1) and stderr (2) file descriptors.
        self.save_fds = (os.dup(1), os.dup(2))

    def __enter__(self):
        # Assign the null pointers to stdout and stderr.
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)

    def __exit__(self, *_):
        # Re-assign the real stdout/stderr back to (1) and (2)
        os.dup2(self.save_fds[0], 1)
        os.dup2(self.save_fds[1], 2)
        # Close the null files and the saved duplicates so no descriptors leak
        for fd in self.null_fds + list(self.save_fds):
            os.close(fd)

In [None]:
Nigerians = {}

for date in [2017, 2018, 2019, 2020]:
    dpath = f'../input/kaggle-survey-{date}/'
    
    if date == 2019: MCQ = os.path.join(dpath, 'multiple_choice_responses.csv')
    elif date>2019: MCQ = os.path.join(dpath, 'kaggle_survey_2020_responses.csv')
    else: MCQ = os.path.join(dpath, 'multipleChoiceResponses.csv')
    

    if dpath.endswith('2019/'): MCQ = MCQ.lower()
        
    surv = pd.read_csv(MCQ, error_bad_lines=False, warn_bad_lines=True, encoding='latin-1')
    
    try: Nigerians.update({date : len(surv[surv.Q3 == 'Nigeria'])})
    except: Nigerians.update({date : len(surv[surv['Country'] == 'Nigeria'])})
        
        
        
    
Nigerians.update({2021 : numNGA})
begin, end = 2022, 2033


X_test = [[i] for i in range(begin, end, )]
#y_pred = LinearRegression().fit(np.asanyarray(list(Nigerians.keys())).reshape(-1,1), np.asanyarray(list(Nigerians.values()))).predict(X_test)

with suppress_stdout_stderr():
    y_pred = Prophet().fit(pd.DataFrame([Nigerians.keys(), Nigerians.values()],).T.rename( columns = {0 : 'ds', 1 : 'y'}), verbose = 1).predict(pd.DataFrame(X_test).rename(columns = {0 : 'ds'})).yhat                                                                                              



    
[Nigerians.update({k:v}) for k, v in zip(list(range(begin,end)), y_pred)]

years = pd.DataFrame(Nigerians, index = [0]).T
years.rename(columns = {0:'Number of Respondents'}, inplace=True)


plt.figure(figsize = (20,10))
plots = sns.barplot(x = years.index, y = years.values.reshape(1,-1)[0], palette = (['darkgray']*5 + ['darkcyan'] * (len(Nigerians)-6) + ['brown']))
plt.legend(['Darkgray - Recent Years', 'Darkcyan - Predicted Years before Top Three Entry', 'Brown - Year of Top Three Entry'])
label_plot('Number of Nigerian Respondents by Year', x = 'Years', y = 'Number of Respondents')
annot_bar(plots)

In [None]:
Japan = {}

for date in [2017, 2018, 2019, 2020, 2021]:
    dpath = f'../input/kaggle-survey-{date}/'
    
    if dpath.endswith('2019/'): MCQ = os.path.join(dpath, 'multiple_choice_responses.csv')
    elif dpath.endswith('2020/') or dpath.endswith('2021/'): MCQ = os.path.join(dpath, f'kaggle_survey_{date}_responses.csv')
    else: MCQ = os.path.join(dpath, 'multipleChoiceResponses.csv')
    

    if dpath.endswith('2019/'): MCQ = MCQ.lower()
        
    surv = pd.read_csv(MCQ, error_bad_lines=False, warn_bad_lines=True, encoding='latin-1')
    
    try: Japan.update({date : len(surv[surv.Q3 == 'Japan'])})
    except: Japan.update({date : len(surv[surv['Country'] == 'Japan'])})

        
        
        

        
        
        



X_test = [[i] for i in range(begin, end)]
#y_pred = LinearRegression().fit(np.asanyarray(list(Japan.keys())).reshape(-1,1), np.asanyarray(list(Japan.values()))).predict(X_test)

with suppress_stdout_stderr():
    y_pred = Prophet().fit(pd.DataFrame([Japan.keys(), Japan.values()],).T.rename( columns = {0 : 'ds', 1 : 'y'}), verbose = 1).predict(pd.DataFrame(X_test).rename(columns = {0 : 'ds'})).yhat                                                                                              



[Japan.update({k:v}) for k, v in zip(list(range(begin,end)), y_pred)]

jyears = pd.DataFrame(Japan, index = [0]).T
jyears.rename(columns = {0:'Number of Respondents'}, inplace=True)


plt.figure(figsize = (20,10))
plots = sns.barplot(x = jyears.index, y = jyears.values.reshape(1,-1)[0], palette = (['darkgray']*5 + ['darkcyan'] * (len(Japan)-6) + ['darkorange']))
plt.legend(['Darkgray - Recent Years', 'Darkcyan - Predicted Years before Top Three Departure', 'Darkorange - Year of Top Three Departure'])
label_plot('Number of Japanese Respondents by Year', x = 'Years', y = 'Number of Respondents')
annot_bar(plots)

In [None]:
plt.figure(figsize=(16,6))
plt.plot(years.index, years.values.reshape(1,-1)[0])
plt.plot(years.index, jyears.values.reshape(1,-1)[0])
label_plot('Expected Growth of Data Scientist Population between Nigeria and Japan', x = 'Years', y = 'Number of Respondents')
plt.legend(['NIGERIA','JAPAN (3rd highest Number of Respondents)'], loc = 7)
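
The comparison above comes from Prophet forecasts. As a simpler cross-check of the same idea, the sketch below fits a straight line to each series and finds the first year in which one line overtakes the other. The counts are made up for illustration and are not the survey figures; `country_a`, `country_b` and `first_crossover` are hypothetical names, and this is not the method that produced the charts.

```python
import numpy as np

# Made-up yearly counts, for illustration only (not the survey figures)
years_seen = np.array([2017, 2018, 2019, 2020, 2021])
country_a = np.array([100, 150, 200, 250, 300])   # hypothetical fast-growing series
country_b = np.array([500, 520, 540, 560, 580])   # hypothetical slow-growing series

# Fit against an offset from the first year to keep the fit well conditioned
base_year = years_seen[0]
t = years_seen - base_year
slope_a, icpt_a = np.polyfit(t, country_a, 1)
slope_b, icpt_b = np.polyfit(t, country_b, 1)

def first_crossover(start, stop):
    """Return the first year in [start, stop) where line A reaches line B, else None."""
    for year in range(start, stop):
        offset = year - base_year
        if slope_a * offset + icpt_a >= slope_b * offset + icpt_b:
            return year
    return None

crossover_year = first_crossover(2022, 2033)
print(crossover_year)
```

A straight line ignores any curvature in the growth, so the crossover year from a sketch like this is best read as a rough sanity check rather than a forecast.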

#### **Facts about Nigeria's Data Scientists**

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Age Distribution', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents\n', weight = 'bold', fontsize = 13)
plt.xlabel('\nAge Bins (years)', weight = 'bold', fontsize = 13)
plots = NGA['Q1'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:
plot_donut(NGA, 'Q2', 'Gender Distribution')

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Employment Status', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Role', weight = 'bold', fontsize = 13)
plots = NGA['Q5'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Years of experience in coding', weight = 'bold', fontsize = 17)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 15)
plt.xlabel('Duration (years)', weight = 'bold', fontsize = 15)
plots = NGA['Q6'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:

plt.title('Highest Educational Level', weight = 'bold', fontsize = 17)
plt.ylabel('Educational Level', weight = 'bold', fontsize = 15)
plt.xlabel('Number of Respondents', weight = 'bold', fontsize = 15)
NGA['Q4'].value_counts().plot.barh(figsize=(20, 10), color = 'darkcyan')
plt.gca().invert_yaxis()

In [None]:
Q18col = [f'Q18_Part_{i}' for i in range(1,7)] + ['Q18_OTHER'] 
Q18 = NGA[Q18col]
Q18agg = Q18[~ Q18.isna()].stack()



plot_donut(Q18agg, title = "Most Utilised Computer Vision Method\n")
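
The multi-select questions in this notebook store one column per answer option, so each is stacked into a single Series before counting. A minimal sketch of that pattern on a made-up frame (`toy`, `selections` and `counts` are illustrative names, and the answer labels are invented):

```python
import numpy as np
import pandas as pd

# Made-up multi-select answers: one column per option, NaN where not ticked
toy = pd.DataFrame({
    'Q_Part_1': ['Image classification', np.nan, 'Image classification'],
    'Q_Part_2': ['Object detection', 'Object detection', np.nan],
    'Q_OTHER':  [np.nan, np.nan, 'Other'],
})

# stack() flattens the frame into one Series; dropna() discards unticked options
selections = toy.stack().dropna()

# Each ticked option counts once, so totals can exceed the number of respondents
counts = selections.value_counts()
print(counts)
```

Because a respondent can tick several options, the counts describe selections rather than people, which is worth keeping in mind when reading the donut and bar charts for these questions.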

In [None]:
Q19col = [f'Q19_Part_{i}' for i in range(1,6)] + ['Q19_OTHER'] 
Q19 = NGA[Q19col]
Q19agg = Q19[~ Q19.isna()].stack()



plot_donut(Q19agg, title = "Most Utilised Natural Language Processing (NLP) Methods\n", figsize = (8,8))

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Distribution of yearly Compensation', weight = 'bold', fontsize = 17)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 15)
plt.xlabel('Amount (USD - $)', weight = 'bold', fontsize = 15)
plots = NGA['Q25'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('First Language to learn for Data Science', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Programming Language', weight = 'bold', fontsize = 13)
plots = NGA['Q8'].value_counts().plot.bar(color = 'darkcyan')

annot_bar(plots)

In [None]:
#plt.figure(figsize=(20, 10))
#plt.title('Computing device used for programming', weight = 'bold', fontsize = 17)
#plt.xlabel('Number of Respondents', weight = 'bold', fontsize = 15)
#plt.ylabel('Device', weight = 'bold', fontsize = 15)
#NGA['Q11'].value_counts().plot.barh(color = 'darkcyan')
#plt.gca().invert_yaxis()

plot_donut(NGA, 'Q11', 'Computing Device Used for Programming', explode = [0, 0, 0.05, 0.04, 0.5, 0.75])

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Amount spent on Machine Learning or Cloud Computing services', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Amount (USD - $)', weight = 'bold', fontsize = 13)
plots = NGA['Q26'].value_counts().plot.bar(color = 'darkcyan')    # Q26: money spent on ML / cloud services (Q25 is yearly compensation)

annot_bar(plots)

In [None]:
Q9col = [f'Q9_Part_{i}' for i in range(1,13)] + ['Q9_OTHER']
Q9 = NGA[Q9col]
Q9agg = Q9[~ Q9.isna()].stack()
#Q9agg = Q9agg.drop(0, 0)



label_plot("Most Utilised Integrated Development Environment (IDE)\n\n\n", "IDEs", "Number of Respondents")
plots = plot_bar(Q9agg)
plt.text(-0.5, 460, 'Note: Some respondents chose more than one category', fontsize = 20, color = 'green')
annot_bar(plots)

In [None]:
Q10col = [f'Q10_Part_{i}' for i in range(1,17)] + ['Q10_OTHER']
Q10 = NGA[Q10col]
Q10agg = Q10[~ Q10.isna()].stack()



label_plot("Most Utilised Hosted Notebook Products\n\n\n", "Notebook Product", "Number of Respondents")
plots = plot_bar(Q10agg)
#plt.text(-0.5, 275, 'Note: Some respondents chose more than one category', fontsize = 20, color = 'green')
annot_bar(plots)

In [None]:
#Data preprocessing 
Q12col = [f'Q12_Part_{i}' for i in range(1,6)] + ['Q12_OTHER'] 
Q12 = NGA[Q12col]
Q12agg = Q12[~ Q12.isna()].stack()



#Donut plot
plot_donut(Q12agg, title = "Most Utilised Specialized Hardware\n")

In [None]:
Q17col = [f'Q17_Part_{i}' for i in range(1,12)] + ['Q17_OTHER'] 
Q17 = NGA[Q17col]
Q17agg = Q17[~ Q17.isna()].stack()



label_plot("Most Utilised Machine Learning Algorithms\n", "Number of Respondents", "Algorithm")
plots = plot_bar(Q17agg, horizontal = True)

In [None]:
Q14col = [f'Q14_Part_{i}' for i in range(1,12)] + ['Q14_OTHER'] 
Q14 = NGA[Q14col]
Q14agg = Q14[~ Q14.isna()].stack()



label_plot("Most Utilised Data Visualization Libraries or Tools\n", "Library / Tool", "Number of Respondents")
plots = plot_bar(Q14agg)
annot_bar(plots)

In [None]:
Q16col = [f'Q16_Part_{i}' for i in range(1,18)] + ['Q16_OTHER'] 
Q16 = NGA[Q16col]
Q16agg = Q16[~ Q16.isna()].stack()



label_plot("Most Utilised Machine Learning Frameworks\n", "Framework", "Number of Respondents")
plots = plot_bar(Q16agg)
annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Cloud Platform with the Best Developer Experience', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Cloud Platform', weight = 'bold', fontsize = 13)
plots = NGA['Q28'].value_counts()[1:-1].plot.bar(color = 'darkcyan')

annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Distribution of most often used Big Data Products', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Product', weight = 'bold', fontsize = 13)
plots = NGA['Q33'].value_counts().plot.bar(color = 'darkcyan')

annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Most used Business Intelligence tool', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Business Intelligence tool', weight = 'bold', fontsize = 13)
plots = NGA['Q35'].value_counts().plot.bar(color = 'darkcyan')

annot_bar(plots)

In [None]:
#plt.figure(figsize=(20, 10))
#plt.title('Primary tool for analysis', weight = 'bold', fontsize = 17)
#plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
#plt.xlabel('Business Intelligence tool', weight = 'bold', fontsize = 13)
#plots = NGA['Q41'].value_counts().plot.barh(color = 'darkcyan')
#plt.gca().invert_yaxis()
#annot_bar(plots)

plot_donut(NGA, 'Q41', 'Primary Tool for Analysis', explode = [0, 0, 0.05, 0.04, 0.03, 0.01])

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Number of times a TPU has been used', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Number of Times', weight = 'bold', fontsize = 13)
plots = NGA['Q13'].value_counts().plot.bar(color = 'darkcyan')

annot_bar(plots)

In [None]:
Q42col = [f'Q42_Part_{i}' for i in range(1,12)] + ['Q42_OTHER'] 
Q42 = NGA[Q42col]
Q42agg = Q42[~ Q42.isna()].stack()



label_plot("Favourite Media Sources for Data Science Content", 'Number of Respondents', 'Platforms')
plots = plot_bar(Q42agg, horizontal = True)

In [None]:
Q39col = [f'Q39_Part_{i}' for i in range(1,10)] + ['Q39_OTHER'] 
Q39 = NGA[Q39col]
Q39agg = Q39[~ Q39.isna()].stack()



label_plot("Most Utilised Platforms for Publicly Sharing\nData Analysis or Machine Learning Applications", 'Number of Respondents', 'Platforms')
plots = plot_bar(Q39agg, horizontal = True)

In [None]:
Q40col = [f'Q40_Part_{i}' for i in range(1,12)] + ['Q40_OTHER'] 
Q40 = NGA[Q40col]
Q40agg = Q40[~ Q40.isna()].stack()



label_plot("Platforms Where Data Science Courses Were Begun or Completed",
           'Number of Respondents', 'Platforms')
plots = plot_bar(Q40agg, horizontal = True)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Number of years spent using Machine Learning methods', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Duration (years)', weight = 'bold', fontsize = 13)
plots = NGA['Q15'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Industry of current employment', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Industry', weight = 'bold', fontsize = 13)
plots = NGA['Q20'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Size of company of employment', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Number of employees', weight = 'bold', fontsize = 13)
plots = NGA['Q21'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

In [None]:
plt.figure(figsize=(20, 10))
plt.title('Number of Individuals responsible for Data Workload in businesses', weight = 'bold', fontsize = 15)
plt.ylabel('Number of Respondents', weight = 'bold', fontsize = 13)
plt.xlabel('Number of Individuals', weight = 'bold', fontsize = 13)
plots = NGA['Q22'].value_counts().plot.bar(color = 'darkcyan')
annot_bar(plots)

## **Conclusion**

#### **Important things to check out**
[<img src="https://user-images.githubusercontent.com/70070334/142493947-b935b4d6-f044-4686-985d-f692ecd2de52.jpg"/> <br/> <quote>  *Source: African Economic Merit Awards* <quote/>](http://www.africaneconomicmeritawards.org/)
- [Tech. in Nigeria](https://edition.cnn.com/2021/01/08/africa/nigeria-techpreneurs-african-startups-spc-intl/index.html)

- [AI Readiness in Nigeria](https://ai4da.com/ai-readiness-in-nigeria/)

- [Keyword Trends on Data Science in Nigeria](https://trends.google.com/trends/explore?date=today%205-y&geo=NG&q=%2Fm%2F0jt3_q3)

- [Open Source Contribution in Nigeria]( https://technext.ng/2020/12/04/number-of-nigerian-developers-on-git-hub-grew-by-65-in-2020/)

Note:
> One more parameter I would have loved to explore is the traditional educational discipline of respondents, which would help in assessing the proportion of those who pivoted from non-tech backgrounds.  
  **I'm sure it would be a significant percentage; there's a tech revolution coming in Nigeria.**

### Let's Connect
> Author: [Alao David I.](https://github.com/invest41/)

| | | | | | |
|:--|:--|:--|:--|:--|:--|
|[Project Portfolio](https://invest41.github.io/AlaoDavid.github.io/) | [GitHub](https://github.com/invest41)| [Twitter](https://mobile.twitter.com/Wilder_Maxim)| [Kaggle](https://www.kaggle.com/welcomehere)|[Linkedin](https://www.linkedin.com/in/david-alao-72362113b/)| [Tableau](https://public.tableau.com/app/profile/alao.david)|