# Abubakar Yagoub Ibrahim
# 1625897

## Questions

The focus is to obtain specific data on skill migration among low-income countries and to predict the skill migration trend in each country for 2020.

- list of countries classified as low income by the World Bank
- which skill group categories had positive migration in 2019
- which industry and country had the most positive migration in 2019
- list of countries with positive skill migration in 2019
- skill migration in countries with more than 1k per 10k in 2019 vs 2015
- predict skill migration per 10k for 2020
- which country will have the most skill migration in 2020 in skills that had positive migration in 2019

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.api import Holt

# suppress the convergence warning raised while fitting the model
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)

# set the size for all figures
plt.rcParams['figure.figsize'] = [14, 7]

In [None]:
# load in the dataset
talent_mg_skill = pd.read_excel("../input/linkedin-digital-data/public_use-talent-migration.xlsx", sheet_name="Skill Migration")

In [None]:
# get countries with low income
countries_low = talent_mg_skill[talent_mg_skill["wb_income"] == "Low income"]

In [None]:
print("list of countries classified as low income:")
for country in countries_low["country_name"].unique():
    print(country)

In [None]:
# get industries which had positive migration in 2019
pos_2019 = countries_low[countries_low["net_per_10K_2019"] > 0]

In [None]:
print("list of skill group categories with positive migration in 2019:")
for group in pos_2019["skill_group_category"].unique():
    print(group)
    

In [None]:
pos_2019[pos_2019["net_per_10K_2019"] == pos_2019["net_per_10K_2019"].max()]

The automotive industry in Afghanistan had the highest net migration in 2019 of any skill group. This suggests that the Specialized Industry Skills category had the most growth of all categories in 2019, although a single row is not proof; confirming it would require comparing the category totals.
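
The row with the largest value and the category with the largest total can differ. Below is a minimal sketch of how the category claim could be checked, assuming the same column names as the dataset. The rows are made-up illustration values, not figures from the LinkedIn data.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "skill_group_category": ["Specialized Industry Skills", "Tech Skills",
                             "Tech Skills", "Business Skills"],
    "skill_group_name": ["Automotive", "Web Development",
                         "Data Storage", "Accounting"],
    "net_per_10K_2019": [900.0, 500.0, 600.0, 300.0],
})

# Row with the single largest value (what the cell above looks at)
top_row = toy.loc[toy["net_per_10K_2019"].idxmax()]

# Category totals, which can point to a different category
category_totals = toy.groupby("skill_group_category")["net_per_10K_2019"].sum()
top_category = category_totals.idxmax()

print("largest single row:", top_row["skill_group_category"])
print("largest category total:", top_category)
```

In this toy data the largest row is in one category while the largest total is in another, which is why running the same `groupby` on `pos_2019` would be a safer basis for the category statement.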

In [None]:
# group rows by country
# numeric_only=True keeps the text columns (region, skill names) out of the sum
country_mg_2019 = pos_2019.groupby("country_name").sum(numeric_only=True)

In [None]:
# let's take a look at each country in numbers
country_mg_2019

In [None]:
country_mg_2019["net_per_10K_2019"]

Let's plot the countries with a net migration of more than 1000 per 10k.

In [None]:
country_mg_2019[country_mg_2019["net_per_10K_2019"] > 1000].plot(y=["net_per_10K_2019"], style=".")

We can see that Senegal had the most skill migration in 2019 compared to the other countries.
Now let's compare these numbers to 2015, for example.

In [None]:
country_mg_2019[country_mg_2019["net_per_10K_2019"] > 1000].plot(y=["net_per_10K_2019", "net_per_10K_2015"], style=".")

Skill migration in countries with more than 1000 per 10k has changed drastically since 2015. In the Congo and Mali, for example, people were far less likely to migrate between industries in 2019 than in 2015, which could indicate a more stable job market in which people are settling into a specific field. In Senegal, by contrast, more people migrated to other industries in 2019 than in 2015, indicating a shift in the job market.
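
The comparison read off the plot can also be expressed as a number per country. This is a minimal sketch, assuming the same column names as `country_mg_2019`. The country labels and values are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical per-country totals for illustration only
toy = pd.DataFrame(
    {"net_per_10K_2015": [2000.0, 1500.0, 800.0],
     "net_per_10K_2019": [1200.0, 1100.0, 2500.0]},
    index=["Country A", "Country B", "Country C"],
)

# Positive values mean more net migration in 2019 than in 2015
toy["change_2015_2019"] = toy["net_per_10K_2019"] - toy["net_per_10K_2015"]

# Sort so the largest drops come first and the largest rises last
print(toy.sort_values("change_2015_2019"))
```

Sorting by the difference makes it easier to name the countries that moved most in either direction than reading two sets of dots off one plot.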

In [None]:
def hes_predict(train):
    '''
    Holt's exponential smoothing (linear trend method): forecast the next
    value from the previous years' values.
    '''
    model = Holt(train)
    model_fit = model.fit()
    # forecast() defaults to one step ahead; return that single value
    fcast = model_fit.forecast()
    return fcast[0]
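
To show what the model does with the five yearly values, here is a rough sketch of the recursion behind Holt's linear trend method. It is not how statsmodels is implemented: `holt_one_step` is a hypothetical helper, the `alpha` and `beta` values are fixed assumptions (statsmodels estimates the smoothing parameters when fitting), and the initialisation is one simple choice among several.

```python
def holt_one_step(series, alpha=0.8, beta=0.2):
    """One-step-ahead forecast with Holt's linear trend recursion.

    alpha and beta are fixed, hypothetical values chosen for illustration.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # simple initialisation: first value as level, first difference as trend
    level = series[0]
    trend = series[1] - series[0]

    for y in series[1:]:
        prev_level = level
        # level: blend the new observation with the previous one-step forecast
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # trend: blend the latest level change with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # forecast one step ahead: last level plus last trend
    return level + trend


# a perfectly linear series is continued along the same line (about 60 here)
forecast = holt_one_step([10.0, 20.0, 30.0, 40.0, 50.0])
print(forecast)
```

With only five observations per country the forecast is essentially an extrapolation of the most recent trend, so the 2020 values should be read as rough indications.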

In [None]:
# float dtype so the forecasts can be compared and plotted later
countries_2020 = pd.DataFrame(columns=["net_per_10k_2020"], index=country_mg_2019.index, dtype=float)

# yearly net migration columns only; skill_group_id is not part of the series
yearly = country_mg_2019.drop("skill_group_id", axis=1)

for country in country_mg_2019.index:

    # take previous years' numbers for the country as model input
    train = yearly.loc[country].to_numpy(dtype=float)

    # get model prediction and round to 2 decimal places
    result = round(hes_predict(train), 2)

    # save model prediction to dataframe (.loc avoids chained assignment)
    countries_2020.loc[country, "net_per_10k_2020"] = result

    # print prediction results
    print(f"{country} skill migration per 10k for 2020 = {result}\n")
    

In [None]:
# again, plot countries where the forecast is more than 1k
countries_2020[countries_2020["net_per_10k_2020"] > 1000].plot(style=".")

From the model predictions we can see that the job market in Mozambique is forecast to shift sharply in 2020, with almost all 10k shifting between skills, which points to a very unstable job market. Last year's highest migration was in Senegal, with over 5k migrants per 10k. For 2020 the market there seems to be stabilizing, with people settling into jobs and fewer than 3k migrants forecast.
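
The country with the highest forecast can also be read directly from the table instead of from the plot. This is a minimal sketch, assuming a frame shaped like `countries_2020`. The labels and numbers are hypothetical and are not model output.

```python
import pandas as pd

# Hypothetical forecasts for illustration only; not model output
toy_2020 = pd.DataFrame(
    {"net_per_10k_2020": [2900.0, 9800.0, 450.0]},
    index=["Country A", "Country B", "Country C"],
)

# index label of the largest forecast, then its value
top_country = toy_2020["net_per_10k_2020"].idxmax()
top_value = toy_2020.loc[top_country, "net_per_10k_2020"]

print(f"{top_country}: {top_value}")
```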

In conclusion, the market in low-income countries has changed significantly over the past years. Some countries had positive skill migration, meaning an unstable market where workers do not settle in a specific field. Other countries had negative migration compared to previous years, pointing to a more stable market, which can be read as a good indicator of the quality of work for those workers.