## The world’s top trending skills in AI and related fields of Data Science benchmarked across 60 countries.
![](https://cdn-images-1.medium.com/max/800/1*vu0cAstnYfhBntpmwfnISw.jpeg)

"Image by Gerd Altmann from Pixabay"

[Coursera](https://www.coursera.org/) is an online platform for higher education that serves over 45 million learners around the world by providing access to high-quality content from leading universities and companies. The platform currently includes **3,700+** courses, **400+ specializations**, and **16 degrees**. This creates one of the largest skills databases, as millions of learners take graded assessments, ranging from multiple-choice exams to programming assignments to peer-reviewed projects, that measure their skill proficiency. [Source](https://hai.stanford.edu/sites/g/files/sbiybj10986/f/ai_index_2019_report.pdf)

The [Coursera Global Skills Index (GSI)](https://www.coursera.org/gsi) draws upon this rich data to benchmark 60 countries and 10 industries across Business, Technology, and Data Science skills to reveal skills development trends around the world.

Coursera measures the skill proficiency of countries in AI overall and in the related skills of:
* **Math**: the study of numbers and their relationships, as well as applying these principles to models of real phenomena. (Sample skills: calculus, linear algebra)

* **Machine learning**: creates algorithms and statistical models that computer systems can use to perform a specific task without explicit instructions. (Sample skills: neural networks, natural language processing)

* **Statistics**: deals with all aspects of data collection, organization, analysis, interpretation, and presentation. (Sample skills: linear regression, A/B testing)

* **Statistical programming**: the set of programming languages and tools used to create statistical models and algorithms. (Sample skills: R, Python)

* **Software engineering**: involves the design, development, maintenance, testing, and evaluation of computer software. (Sample skills: software development, algorithms)

These related skills cover the breadth of knowledge needed to build and deploy AI-powered technologies within organizations and society.


### [Benchmarking](https://www.coursera.org/gsi/methodology/)
Skills in this taxonomy are mapped to the courses that teach them using a machine learning model. For every competency in the GSI, this tagging makes it possible to extract assessments in courses teaching relevant skills. These serve as the pool of assessments used to measure individual learners’ skill proficiencies.

With the set of assessments for each competency defined, grades were considered for all learners taking each assessment in the relevant pool. Machine learning models were then trained to estimate individual learners’ skill proficiencies, adjusting for item difficulty.

The average proficiency for a country or industry is calculated by averaging across the proficiencies of learners in that entity, weighting by the inverse of the standard error on the proficiency estimates and trimming these weights to avoid undue influence by any one learner.
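The weighting step can be illustrated with a small sketch. The source does not say how the weights are trimmed, so the cap used here (a multiple of the median weight), the function name `trimmed_weighted_mean`, and the sample numbers are all assumptions made for illustration, not Coursera's actual procedure.

```python
from statistics import median


def trimmed_weighted_mean(proficiencies, std_errors, cap_multiple=3.0):
    """Average learner proficiencies, weighting each by 1 / standard error.

    Assumption: weights are trimmed by capping them at `cap_multiple`
    times the median weight. The real GSI trimming rule is not given.
    """
    if len(proficiencies) != len(std_errors) or not proficiencies:
        raise ValueError("inputs must be non-empty and of equal length")

    # Inverse-standard-error weights: more precise estimates count for more.
    weights = [1.0 / se for se in std_errors]

    # Trim the weights so that no single learner dominates the average.
    cap = cap_multiple * median(weights)
    weights = [min(w, cap) for w in weights]

    weighted_sum = sum(w * p for w, p in zip(weights, proficiencies))
    return weighted_sum / sum(weights)


# Hypothetical learners: the third has a very small standard error.
learner_proficiency = [0.2, 0.5, 0.9]
learner_std_error = [0.5, 0.25, 0.01]

estimate = trimmed_weighted_mean(learner_proficiency, learner_std_error)
untrimmed = trimmed_weighted_mean(
    learner_proficiency, learner_std_error, cap_multiple=float("inf")
)
print(round(estimate, 3), round(untrimmed, 3))
```

Without trimming, the third learner's weight of 100 pulls the average close to that learner's own proficiency. With the cap, the estimate moves back towards the other learners.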

This weighted average for each domain or competency is the GSI estimate of an entity’s skill proficiency. Performance bands are computed by segmenting skill proficiencies into quartiles:

* **Cutting-Edge** for 76th percentile or above
* **Competitive** for 51st to 75th percentile
* **Emerging** for 26th to 50th percentile
* **Lagging** for 25th percentile or below
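The banding rule above can be sketched as a small function. The way percentiles are derived from raw proficiencies here (share of entities scoring at or below a given entity) and the handling of fractional percentiles are assumptions. The helper names and the four sample entities are hypothetical.

```python
def percentile_ranks(scores):
    """Map each entity to the percentage of entities scoring at or below it.

    Assumption: this simple definition stands in for whatever percentile
    computation the GSI actually uses.
    """
    values = list(scores.values())
    n = len(values)
    return {
        name: 100.0 * sum(v <= score for v in values) / n
        for name, score in scores.items()
    }


def performance_band(percentile):
    """Translate a percentile (0 to 100) into a GSI performance band."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile > 75:
        return "Cutting-Edge"
    if percentile > 50:
        return "Competitive"
    if percentile > 25:
        return "Emerging"
    return "Lagging"


# Hypothetical entities and proficiency estimates.
proficiency = {"A": 0.1, "B": 0.4, "C": 0.7, "D": 0.9}
bands = {
    name: performance_band(rank)
    for name, rank in percentile_ranks(proficiency).items()
}
print(bands)
```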

The GSI reflects the average skill proficiency of learners within each entity on the Coursera platform. Note that the GSI estimate does not necessarily reflect the average skill proficiency of all entity members because Coursera learners may differ from the average resident of a country or employee in an industry.

In [None]:
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt # plotting
import numpy as np # linear algebra
import os # accessing directory structure
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from plotly.offline import init_notebook_mode, iplot 
import plotly.graph_objs as go
import plotly.offline as py
import plotly.express as px
import pycountry
py.init_notebook_mode(connected=True)

# Graphics in retina format 
%config InlineBackend.figure_format = 'retina' 

# Increase the default plot size and set the color scheme
plt.rcParams['figure.figsize'] = 8, 5

# Disable warnings in Anaconda
import warnings
warnings.filterwarnings('ignore')

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

## Loading the dataset

In [None]:
df = pd.read_csv('/kaggle/input/coursera-ai-global-skills-index-2019-data/Coursera AI GSI Percentile and Category.csv')
df.head()

## Getting to know our dataset

In [None]:
df.info()

# Overview of the dataset
## 1. Different regions included in the survey

Let's look at the regions covered by the index.

In [None]:

counts = df.region.value_counts(sort=True)
labels = counts.index
values = counts.values

pie = go.Pie(labels=labels, values=values,pull=[0.05, 0], marker=dict(line=dict(color='#000000', width=1)))
layout = go.Layout(title='Region wise Distribution in 2019')

fig = go.Figure(data=[pie], layout=layout)
py.iplot(fig)

> The data covers 7 regions, and the largest number of countries belongs to the Europe & Central Asia region.

## 2. Division by Income Level
The `incomegroup` column in the dataset records the income group of each country.

In [None]:
counts = df.incomegroup.value_counts(sort=True)
labels = counts.index
values = counts.values

pie = go.Pie(labels=labels, values=values,pull=[0.05, 0],
             marker=dict(colors=['dodgerblue', 'plum', '#F0A30A'],line=dict(color='#000000', width=1)))
layout = go.Layout(title='Income wise distribution in 2019')

fig = go.Figure(data=[pie], layout=layout)
py.iplot(fig)

The results show that most of the countries included in the index belong to the high-income group, while around 17% belong to the lower-middle-income group. One plausible explanation is that countries with higher income levels can spend more on AI and ML technologies.
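Percentage shares like the one quoted above can be computed directly instead of being read off the pie chart. The sketch below uses a toy frame with made-up countries and assumed income-group labels. It also assumes the dataset has several rows per country (one per competency), so it deduplicates by `country` before counting.

```python
import pandas as pd

# Toy data: two competency rows per country, hypothetical labels.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "competency_id": ["statistics", "machine-learning"] * 4,
    "incomegroup": [
        "High income", "High income",
        "High income", "High income",
        "Upper middle income", "Upper middle income",
        "Lower middle income", "Lower middle income",
    ],
})

# Keep one row per country so each country is counted once.
countries = toy.drop_duplicates(subset="country")

# normalize=True returns fractions; scale them to percentages.
share = countries["incomegroup"].value_counts(normalize=True).mul(100).round(1)
print(share)
```

If every country has the same number of rows, counting rows and counting countries give the same shares. Deduplicating first keeps the result correct when that is not the case.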

## 3. A look at different competency parameters 

In [None]:

df.competency_id.value_counts()

There are 6 competency parameters against which all the countries have been marked. It will be interesting to see how the countries fare against one another.
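One way to compare countries across competencies is to reshape the long table into one row per country and one column per competency. This is a minimal sketch on toy data. The column names match the dataset used in this notebook, but the countries and ranks are invented.

```python
import pandas as pd

# Toy long-format data: one row per (country, competency) pair.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "competency_id": [
        "statistics", "machine-learning",
        "statistics", "machine-learning",
    ],
    "percentile_rank": [0.9, 0.4, 0.2, 0.7],
})

# Wide format: countries as rows, competencies as columns.
wide = toy.pivot(index="country", columns="competency_id", values="percentile_rank")

# Strongest competency of each country.
best = wide.idxmax(axis=1)
print(wide)
print(best)
```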

# Global & Regional Results

In [None]:
# creating different dataframes based on the competency Ids
df_AI = df[df['competency_id'] == 'artificial-intelligence']
df_Stats_prog = df[df['competency_id'] == 'statistical-programming']
df_Stats = df[df['competency_id'] == 'statistics']
df_SE = df[df['competency_id'] == 'software-engineering']
df_Math = df[df['competency_id'] == 'fields-of-mathematics']
df_ML = df[df['competency_id'] == 'machine-learning']

Let's write a function that draws a world heat map showing where each country stands on a given competency.

In [None]:
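def map(data):
    """
    Plot a world choropleth of percentile ranks for a single competency.

    `data` should hold the rows of one competency_id, one row per country.
    Note: the name shadows Python's built-in `map`; it is kept because later cells call it.
    """
    fig = go.Figure(data=go.Choropleth(
        locations=data['iso3'],             # three-letter country codes
        z=data['percentile_rank'],          # value that drives the colour
        text=data['percentile_category'],   # hover text: performance band
        colorscale="Rainbow",
        autocolorscale=False,
        reversescale=True,
        marker_line_color='darkgray',
        marker_line_width=0.5,
        colorbar_title='Skill Index (1 is highest)'))

    # Build the title from the competency id, e.g. 'machine-learning' -> 'Machine Learning'
    competency = data['competency_id'].iloc[0].replace('-', ' ').title()
    fig.update_layout(
        title_text=competency + ' Skill Index in 2019',
        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type='equirectangular'))

    fig.show()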

## Artificial Intelligence Skill Index in 2019
A look at the world on the AI index.

In [None]:
map(df_AI)

> * Countries leading in the AI category are:
>
> **Finland**, **Japan**, **Hong Kong**, **Germany** and **Singapore**. The USA and China also fall in the **Cutting-Edge** category.
> * Countries lagging in the AI category are:
>
> **Nigeria**, **Dominican Republic**, **Mexico**, **Kenya** and **Peru** lie at the lowermost level when it comes to AI.

### Top 5 countries in the AI category in each of the four performance bands
This can be done easily with pandas' `groupby` function.

In [None]:
df1 = df_AI.groupby(['percentile_category', 'country'])['percentile_rank'].min().to_frame()
df1.sort_values('percentile_rank',ascending=False).groupby(level=0).head(5)
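The same "sort, then take the first rows of each group" pattern is repeated for every competency below, so it can be wrapped in a helper. This is a sketch on toy data. The function name `top_n_per_band` and the sample rows are hypothetical, and it works on the flat frame rather than on the grouped frame used above.

```python
import pandas as pd


def top_n_per_band(frame, n=5):
    """Return the n highest-ranked rows within each performance band."""
    # Sort once, best first; head(n) then keeps the first n rows per group.
    ordered = frame.sort_values("percentile_rank", ascending=False)
    return ordered.groupby("percentile_category").head(n)


toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "percentile_category": [
        "Cutting-Edge", "Cutting-Edge", "Cutting-Edge", "Lagging", "Lagging",
    ],
    "percentile_rank": [0.95, 0.99, 0.90, 0.10, 0.05],
})

top2 = top_n_per_band(toy, n=2)
print(top2)
```

`GroupBy.head` keeps the rows in the order of the sorted frame, so the output stays ordered by rank across bands.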

## Machine Learning Skill Index in 2019
A look at the world on the ML index.

In [None]:
map(df_ML)

> * Countries leading in the ML category are:
>
> **Russia**, **Switzerland**, **Belarus**, **Belgium** and **Finland**. The USA also falls in this category, but China falls in the **Emerging** category.
> * Countries lagging in the ML category are:
>
> **Nigeria**, **Dominican Republic**, **Venezuela**, **Ecuador** and **Kenya** lie at the lowermost level when it comes to ML.

### Top 5 countries in the ML category in each of the four performance bands

In [None]:
df2 = df_ML.groupby(['percentile_category', 'country'])['percentile_rank'].min().to_frame()
df2.sort_values('percentile_rank',ascending=False).groupby(level=0).head(5)

## Statistical Programming Skill Index in 2019
A look at the world on the Statistical Programming index.

In [None]:
map(df_Stats_prog)

> * Countries leading in the Statistical Programming category are:
>
> **Russia**, **Belarus**, **Ukraine**, **Finland** and **Germany**.
> * Countries lagging in the Statistical Programming category are:
>
> **China**, **Taiwan**, **Korea**, **Egypt** and **Bangladesh** lie at the lowermost level when it comes to Statistical Programming.

### Top 5 countries in the Statistical Programming category in each of the four performance bands

In [None]:
df3 = df_Stats_prog.groupby(['percentile_category', 'country'])['percentile_rank'].min().to_frame()
df3.sort_values('percentile_rank',ascending=False).groupby(level=0).head(5)

## Statistics Skill Index in 2019
A look at the world on the Statistics index.

In [None]:
map(df_Stats) 

> * Countries leading in the Statistics category are:
>
> **Russia**, **Switzerland**, **Belarus**, **Finland** and **Romania**.
> * Countries lagging in the Statistics category are:
>
> **Nigeria**, **Saudi Arabia**, **Kenya**, **Venezuela** and **Pakistan** lie at the lowermost level when it comes to Statistics.

### Top 5 countries in the Statistics category in each of the four performance bands

In [None]:
df4 = df_Stats.groupby(['percentile_category', 'country'])['percentile_rank'].min().to_frame()
df4.sort_values('percentile_rank',ascending=False).groupby(level=0).head(5)

## Maths Skill Index in 2019
A look at the world on the math index.

In [None]:
map(df_Math)

> * Countries leading in the Maths category are:
>
> **Japan**, **Hong Kong**, **Switzerland**, **China** and **Venezuela**.
> * Countries lagging in the Maths category are:
>
> **Nigeria**, **Kenya**, **Pakistan**, **Turkey** and **Colombia** lie at the lowermost level when it comes to Maths.

### Top 5 countries in the Maths category in each of the four performance bands

In [None]:
df5 = df_Math.groupby(['percentile_category', 'country'])['percentile_rank'].min().to_frame()
df5.sort_values('percentile_rank',ascending=False).groupby(level=0).head(5)

## Software Engineering Skill Index in 2019
A look at the world on the software engineering index.

In [None]:
map(df_SE)

> * Countries leading in the Software Engineering category are:
>
> **Russia**, **Brazil**, **Belarus**, **Canada** and **Hungary**.
> * Countries lagging in the Software Engineering category are:
>
> **Nigeria**, **Kenya**, **Pakistan**, **Turkey** and **Colombia** lie at the lowermost level when it comes to Software Engineering.

### Top 5 countries in the Software Engineering category in each of the four performance bands

In [None]:
df6 = df_SE.groupby(['percentile_category', 'country'])['percentile_rank'].min().to_frame()
df6.sort_values('percentile_rank',ascending=False).groupby(level=0).head(5)

# Countrywise Results
For each major geographic region, we can also look at every country's percentile rank in AI and the five related competencies. Let's create a separate DataFrame for each region.

In [None]:
Eur_Central_Asia = df[(df['region'] == 'Europe & Central Asia')]
East_Asia_Pacific = df[(df['region'] == 'East Asia & Pacific')]
Latin_America_Caribbean = df[(df['region'] == 'Latin America & Caribbean')]
Middle_East_North_Africa = df[(df['region'] == 'Middle East & North Africa')]
Sub_Saharan_Africa = df[(df['region'] == 'Sub-Saharan Africa')]
South_Asia = df[(df['region'] == 'South Asia')]
North_America = df[(df['region'] == 'North America')]
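As an alternative to one variable per region, the split can be done in a single pass with `groupby`. The sketch below runs on toy data with made-up countries. The dictionary name `by_region` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "region": ["South Asia", "North America", "South Asia"],
    "percentile_rank": [0.3, 0.8, 0.5],
})

# One independent DataFrame per region, keyed by the region name.
# .copy() keeps later edits to a regional frame from touching `toy`.
by_region = {region: group.copy() for region, group in toy.groupby("region")}

print(sorted(by_region))
print(by_region["South Asia"])
```

A dictionary makes it easy to loop over all regions, for example to call the same plotting function on each one.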

In [None]:
def facetplot(data):
    """
    Plot horizontal bars of percentile rank per country, with one facet per
    competency, for all countries in the given geographical region.
    """
    # Shorten the competency labels so the facet titles stay readable.
    short_names = {'statistics': 'Stats',
                   'statistical-programming': 'StatsProg',
                   'artificial-intelligence': 'AI',
                   'fields-of-mathematics': 'Maths',
                   'software-engineering': 'SE',
                   'machine-learning': 'ML'}
    # assign() returns a new DataFrame, so the caller's data is left untouched
    # and no in-place edit is made on a slice of the original frame.
    data = data.assign(competency_id=data['competency_id'].replace(short_names))

    fig = px.bar(data,
                 y='country',
                 x='percentile_rank',
                 orientation='h',
                 facet_col="competency_id",
                 color='percentile_category',
                 width=1100, height=600,
                 title=data['region'].iloc[0])

    fig.update_xaxes(title_text='Rank', title_font=dict(size=10))
    fig.update_yaxes(title_text=None)
    fig.update_layout(legend={'x': 0.1, 'y': -0.5})

    fig.show()

## Europe & Central Asia
As can be inferred from the results above, this region (especially Russia) has been doing considerably well on the skill index. Let's look at the performance of every country in the Europe & Central Asia region.

In [None]:
facetplot(Eur_Central_Asia)

In [None]:
%%HTML
<iframe title="Europe &amp; Central Asia" aria-label="Table" src="//datawrapper.dwcdn.net/khfZm/1/" scrolling="no" frameborder="0" style="border: none;" width="600" height="376"></iframe>

## East Asia & Pacific
Let's look at the performance of every country in the East Asia & Pacific Region.

In [None]:
facetplot(East_Asia_Pacific)

In [None]:
%%HTML
<iframe title="East Asia &amp; Pacific" aria-label="Table" src="//datawrapper.dwcdn.net/lDVBs/1/" scrolling="no" frameborder="0" style="border: none;" width="600" height="377"></iframe>

## Latin America & Caribbean
Let's look at the performance of every country in the Latin America & Caribbean Region.

In [None]:
facetplot(Latin_America_Caribbean)

In [None]:
%%HTML
<iframe title="Latin America &amp; Caribbean" aria-label="Table" src="//datawrapper.dwcdn.net/C62K6/1/" scrolling="no" frameborder="0" style="border: none;" width="600" height="377"></iframe>

## Middle East & North Africa
Let's look at the performance of every country in the Middle East & North Africa Region.

In [None]:
facetplot(Middle_East_North_Africa)

In [None]:
%%HTML
<iframe title="Middle East &amp; North Africa" aria-label="Table" src="//datawrapper.dwcdn.net/x0hT6/2/" scrolling="no" frameborder="0" style="border: none;" width="600" height="376"></iframe>

## Sub-Saharan Africa
Let's look at the performance of every country in the Sub-Saharan Africa Region.

In [None]:
facetplot(Sub_Saharan_Africa)

In [None]:
%%HTML
<iframe title="Sub-Saharan Africa" aria-label="Table" src="//datawrapper.dwcdn.net/tE9Wv/1/" scrolling="no" frameborder="0" style="border: none;" width="600" height="376"></iframe>

## South Asia
Let's look at the performance of every country in the South Asia Region.

In [None]:
facetplot(South_Asia)

In [None]:
%%HTML
<iframe title="South Asia" aria-label="Table" src="//datawrapper.dwcdn.net/ZHkeJ/1/" scrolling="no" frameborder="0" style="border: none;" width="600" height="377"></iframe>

## North America
Let's look at the performance of every country in the North American Region.

In [None]:
facetplot(North_America)

In [None]:
%%HTML
<iframe title="North America" aria-label="Table" src="//datawrapper.dwcdn.net/AyFdV/12/" scrolling="no" frameborder="0" style="border: none;" width="600" height="309"></iframe>