# Quick analysis of international flows of 'AI skills'

We use LinkedIn data available [here](https://datacatalog.worldbank.org/dataset/world-bank-group-linkedin-dashboard-dataset#tab2) to analyse inward and outward flows of 'artificial intelligence'-related skills.

## Preamble

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

plt.rc('font', family='serif', serif='Times New Roman')


## Download the data

In [None]:
# Read the 'Skill Migration' sheet of the public talent migration workbook
url = 'https://development-data-hub-s3-public.s3.amazonaws.com/ddhfiles/144635/public_use-talent-migration.xlsx'
li = pd.read_excel(url, sheet_name='Skill Migration')

## Process and explore

In [None]:
li.shape

In [None]:
li.columns

In [None]:
len(set(li.country_name))

In [None]:
li.skill_group_name.value_counts().head()

These skills appear for all countries.

According to the methodology, data are only available for skill groups with more than 50 'transitions' in a country.
This means that countries with more activity will have more skill groups. They seem to have applied a threshold that focuses on the top 250 skill a
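
As a minimal sketch of why a reporting threshold gives more active countries more skill groups, the toy example below applies a cut-off of 50 to made-up transition counts. The column `n_transitions`, the country labels and all the numbers are assumptions for illustration; the public dataset does not expose raw transition counts in this form.

```python
import pandas as pd

# Hypothetical transition counts; not taken from the LinkedIn dataset
transitions = pd.DataFrame({
    'country_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'skill_group_name': ['Robotics', 'Accounting', 'Welding'] * 2,
    'n_transitions': [400, 120, 60, 70, 30, 10],
})

# Keep only country / skill group pairs above the reporting threshold
THRESHOLD = 50
reported = transitions.loc[transitions['n_transitions'] > THRESHOLD]

# Number of skill groups that survive the threshold in each country
groups_per_country = reported.groupby('country_name')['skill_group_name'].size()
print(groups_per_country)
```

In this toy case the more active country A keeps all three skill groups, while country B keeps only one.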

In [None]:
# Number of skill group records available for each country
country_freqs = li.groupby('country_name')['skill_group_name'].size()

plt.hist(country_freqs)

We will focus the analysis on the top 30 countries by activity, measured as the number of skill group records available for each country. They are:

In [None]:
big_countries = country_freqs.sort_values(ascending=False)[:30].index

big_countries

It's interesting to note the presence of Arab countries such as the UAE and Saudi Arabia. Perhaps this is picking up worker flows?

In [None]:
# Keep only the rows for the most active countries
li_big = li.loc[li['country_name'].isin(big_countries)].reset_index(drop=True)

li_big.shape

#### Let's find AI skills

A visual inspection of the data suggests these are the most 'AI-related' skills. It would be interesting to identify them in a more data-driven way.

In [None]:
ai_skills = ['Data Science','Artificial Intelligence','Natural Language Processing','Data-driven Decision Making','Robotics']
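
One possible step towards a more data-driven selection is to match skill group names against a short list of keywords. The sketch below is only an illustration: the skill names are a made-up sample, the keywords are assumptions, and the result is stored in a separate, hypothetical variable `ai_candidates` so that the hand-picked `ai_skills` list is left untouched.

```python
import pandas as pd

# Hypothetical skill group names, for illustration only
skill_names = pd.Series([
    'Artificial Intelligence', 'Data Science', 'Robotics',
    'Accounting', 'Natural Language Processing', 'Welding',
])

# Hypothetical keywords; choosing them is still a manual judgement
keywords = ['intelligence', 'data', 'language processing', 'robot']
pattern = '|'.join(keywords)

# Case-insensitive match of each name against any of the keywords
matched = skill_names[skill_names.str.contains(pattern, case=False, regex=True)]
ai_candidates = sorted(matched)
print(ai_candidates)
```

On the real data a broad keyword such as 'data' would probably also match skill groups that are not AI-related, so the matches would still need a manual check.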

In [None]:
# Flag the rows whose skill group is in the AI list
li_big['ai_skills'] = li_big['skill_group_name'].isin(ai_skills)

In [None]:
# Average net flow per 10,000 across the AI skill groups, by country and year
year_cols = ['net_per_10K_2015', 'net_per_10K_2018']
ai_trends = li_big.loc[li_big['ai_skills']].groupby('country_name')[year_cols].mean()

ai_trends
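
To make the aggregation concrete, the toy example below applies the same group-and-average step to made-up numbers. The country labels and values are assumptions for illustration only. Note that the result is an unweighted mean across skill groups, so a small skill group counts as much as a large one.

```python
import pandas as pd

# Hypothetical values; not taken from the LinkedIn dataset
toy = pd.DataFrame({
    'country_name': ['A', 'A', 'B', 'B'],
    'skill_group_name': ['Data Science', 'Robotics'] * 2,
    'net_per_10K_2015': [10.0, 30.0, -5.0, -15.0],
    'net_per_10K_2018': [20.0, 40.0, 0.0, -10.0],
})

toy_year_cols = ['net_per_10K_2015', 'net_per_10K_2018']

# Unweighted mean across skill groups, one row per country
toy_trends = toy.groupby('country_name')[toy_year_cols].mean()
print(toy_trends)
```

Here country A averages a net gain in both years, while country B averages a net loss that shrinks between 2015 and 2018.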

In [None]:
fig,ax = plt.subplots(figsize=(10,5))

ai_trends.sort_values('net_per_10K_2018',ascending=False).plot.bar(ax=ax)
ax.grid(linestyle=':',color='grey')

# Draw the grid lines behind the bars rather than on top of them
ax.set_axisbelow(True)

ax.set_xticklabels(ax.get_xticklabels(),rotation=45,ha='right')
ax.set_xlabel('')
ax.set_ylabel('Net gain / loss of professionals \n with AI skills per 10,000')

ax.legend(['2015','2018'],title='Year')
ax.set_title('Global flows of AI skills 2015-2018 (LinkedIn & World Bank, 2019)')

plt.tight_layout()

plt.savefig('ai_figure.pdf')