# Remittance Patterns and Economic Development

<a id='Introduction'></a>
# Description of dataset

This notebook analyzes worldwide migrant remittance data. The data comes from the website of [the Global Knowledge Partnership on Migration and Development (KNOMAD)](https://www.knomad.org/data/remittances), which publishes it as part of a broader effort to fill knowledge gaps in monitoring and analyzing migration and remittances. It records remittance flows between countries, covering both inflows and outflows.

Summary content:
* Number of countries involved: 214
* Time period: 1990 to 2022 (the 2022 column is labeled `2022e`, an estimate)
* Unit of measurement: millions of US dollars (the inflows file also contains a `% of GDP in 2022` column, which is a percentage)
* This dataset contains three files:
    * `bilateral-remittance.csv` - Estimated remittance flows between pairs of countries in 2021.
    * `remittance-inflows.csv` - Historical remittance inflows into each country since 1990.
    * `remittance-outflows.csv` - Historical remittance outflows from each country since 1990.
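
A bilateral matrix relates to the per-country files through simple sums: adding up everything a country receives gives its estimated inflow, and adding up everything it sends gives its estimated outflow. The sketch below shows this on a made-up 3×3 matrix. It assumes rows are sending countries and columns are receiving countries; that orientation, the country labels, and the values are illustrative assumptions, so check the real KNOMAD workbook before reusing it. The bilateral estimates may also not match the inflow and outflow files exactly.

```python
import pandas as pd

# Made-up bilateral matrix in US$ millions.
# ASSUMPTION: rows = sending countries, columns = receiving countries.
bilateral = pd.DataFrame(
    {"A": [0.0, 50.0, 20.0], "B": [30.0, 0.0, 10.0], "C": [5.0, 15.0, 0.0]},
    index=["A", "B", "C"],
)
bilateral.index.name = "Sending"
bilateral.columns.name = "Receiving"

# Sum down each column: total estimated amount received by each country
inflows = bilateral.sum(axis=0)

# Sum across each row: total estimated amount sent by each country
outflows = bilateral.sum(axis=1)

print(inflows)
print(outflows)
```

Every transfer appears once as an inflow and once as an outflow, so both totals add up to the same world figure (130 in this toy matrix).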


In [6]:
# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

### Preparing the datasets

In [2]:
# File paths (KNOMAD Excel workbooks)
inward_path = 'https://www.knomad.org/sites/default/files/2022-12/inward_remittance_flows_as_of_dec._2_2022_0.xlsx'
outward_path = 'https://www.knomad.org/sites/default/files/2023-01/outward_remittance_flows_brief_37_as_of_nov28_2022_2_1.xlsx'
bilateral_path = 'https://www.knomad.org/sites/default/files/2022-12/bilateral_remittance_matrix_2021_0.xlsx'

In [7]:
# Load the inward remittance flows dataset
df = pd.read_excel(inward_path)
print(df.shape)
# Display the first few rows of the dataset
df.head()

(226, 35)


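| | Migrant remittance inflows (US$ million) | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022e | % of GDP in 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 253.367822 | 348.624717 | 627.710802 | 822.73163 | 803.546454 | 828.571904 | 788.917115 | 300.0 | 350.0 | 2.058824 |
| 1 | Albania | 0.0 | 0.0 | 151.8 | 332.0 | 307.1 | 427.3 | 550.9 | 300.3 | 504.14 | ... | 1421.007454 | 1290.863508 | 1306.009167 | 1311.822432 | 1458.210056 | 1472.812242 | 1465.987212 | 1718.320554 | 1800.0 | 9.859772 |
| 2 | Algeria | 352.44176 | 232.990263 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2452.442617 | 1997.393458 | 1989.023597 | 1791.887073 | 1984.998399 | 1785.838683 | 1699.608935 | 1759.095247 | 1829.459057 | 0.97751 |
| 3 | American Samoa | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Andorra | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | 21.1 | 47.416324 | NaN | NaN | NaN |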


In [8]:
# Display the dataset information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 226 entries, 0 to 225
Data columns (total 35 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   Migrant remittance inflows (US$ million)  224 non-null    object 
 1   1990                                      202 non-null    float64
 2   1991                                      202 non-null    float64
 3   1992                                      202 non-null    float64
 4   1993                                      202 non-null    float64
 5   1994                                      202 non-null    float64
 6   1995                                      202 non-null    float64
 7   1996                                      202 non-null    float64
 8   1997                                      202 non-null    float64
 9   1998                                      202 non-null    float64
 10  1999                                  

### Preprocessing the data

In [9]:
# Rename columns
df = df.rename(columns={'Migrant remittance inflows (US$ million)': 'Country', 
                        '% of GDP in 2022': '%GDP_2022'})

# Keep only the first 214 rows, which hold the individual countries
df = df.iloc[:214, :]

# Set `Country` as the index
df.set_index('Country', inplace=True)
df

| Country | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022e | %GDP_2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 253.367822 | 348.624717 | 627.710802 | 822.731630 | 803.546454 | 828.571904 | 788.917115 | 300 | 350.000000 | 2.058824 |
| Albania | 0.000000 | 0.000000 | 151.800000 | 332.000000 | 307.100000 | 427.3 | 550.9 | 300.3 | 504.140000 | 407.200000 | ... | 1421.007454 | 1290.863508 | 1306.009167 | 1311.822432 | 1458.210056 | 1472.812242 | 1465.987212 | 1718.320554 | 1800.000000 | 9.859772 |
| Algeria | 352.441760 | 232.990263 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 2452.442617 | 1997.393458 | 1989.023597 | 1791.887073 | 1984.998399 | 1785.838683 | 1699.608935 | 1759.095247 | 1829.459057 | 0.977510 |
| American Samoa | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Andorra | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | 21.100000 | 47.416324 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Virgin Islands (U.S.) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| West Bank and Gaza | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 582.1 | 542.3 | 623.3 | 1058.245498 | 1096.285162 | ... | 1804.542445 | 1817.412109 | 2086.576176 | 2378.923437 | 2833.912788 | 3152.859814 | 2559.660846 | 3393.3649 | 3495.165847 | 18.573525 |
| Yemen, Rep. | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 3350.500000 | 3350.500000 | 3770.584000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | NaN |
| Zambia | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 58.300302 | 47.046538 | 38.464441 | 93.644095 | 106.965626 | 98.259121 | 134.864832 | 241.688413 | 260.150000 | 0.962627 |
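
After renaming, the frame mixes year columns (US$ millions), an estimate column (`2022e`) and a percentage column (`%GDP_2022`). Summing across all columns would add percentages to dollar amounts, so it helps to split them first. A minimal sketch on a made-up frame: whether the real year labels are integers or strings after `read_excel` is not shown above, so the filter converts each label with `str()` to handle both.

```python
import pandas as pd

# Made-up frame mimicking the renamed layout (values are illustrative only)
df = pd.DataFrame(
    {1990: [10.0, 20.0], 2021: [30.0, 40.0], "2022e": [35.0, 45.0], "%GDP_2022": [1.5, 2.5]},
    index=pd.Index(["A", "B"], name="Country"),
)

# Year columns: labels whose first four characters are digits ('2022e' included)
year_cols = [c for c in df.columns if str(c)[:4].isdigit()]

# Reported years only: drop the '2022e' estimate column
actual_cols = [c for c in year_cols if str(c).isdigit()]

money = df[year_cols]       # US$ millions
share = df["%GDP_2022"]     # percent of GDP, not US$ millions
print(year_cols, actual_cols)
```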


In [10]:
# Check for missing values
df.isna().sum()

1990         14
1991         14
1992         14
1993         14
1994         14
1995         14
1996         14
1997         14
1998         14
1999         14
2000         14
2001         14
2002         14
2003         14
2004         14
2005         21
2006         20
2007         20
2008         20
2009         20
2010         20
2011         20
2012         20
2013         20
2014         20
2015         20
2016         20
2017         20
2018         20
2019         19
2020         19
2021         36
2022e        37
%GDP_2022    42
dtype: int64
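
A per-column count cannot distinguish a completely empty row from a partly filled one. The uniform count of 14 for 1990 to 2004 is consistent with 14 rows that are empty everywhere (214 rows before cleaning, 200 after `dropna(how='all')`). A minimal sketch on a made-up frame showing how to list such rows before dropping them:

```python
import numpy as np
import pandas as pd

# Made-up frame: 'X' is empty in every column, 'Y' only in some
toy = pd.DataFrame(
    {"1990": [1.0, np.nan, np.nan], "1991": [2.0, np.nan, 3.0]},
    index=pd.Index(["A", "X", "Y"], name="Country"),
)

# Number of missing values in each column (same idea as df.isna().sum())
missing_per_column = toy.isna().sum()

# Rows in which every value is missing
all_empty_rows = toy.loc[toy.isna().all(axis=1)].index.tolist()

print(missing_per_column)
print(all_empty_rows)
```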

In [11]:
# Remove rows in which every value is missing
df.dropna(how='all', inplace=True)
df

| Country | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022e | %GDP_2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000 | 0.0 | 0.000000 | 0.000000 | ... | 253.367822 | 348.624717 | 627.710802 | 822.731630 | 803.546454 | 828.571904 | 788.917115 | 300 | 350.000000 | 2.058824 |
| Albania | 0.000000 | 0.000000 | 151.800000 | 332.000000 | 307.100000 | 427.3 | 550.900 | 300.3 | 504.140000 | 407.200000 | ... | 1421.007454 | 1290.863508 | 1306.009167 | 1311.822432 | 1458.210056 | 1472.812242 | 1465.987212 | 1718.320554 | 1800.000000 | 9.859772 |
| Algeria | 352.441760 | 232.990263 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000 | 0.0 | 0.000000 | 0.000000 | ... | 2452.442617 | 1997.393458 | 1989.023597 | 1791.887073 | 1984.998399 | 1785.838683 | 1699.608935 | 1759.095247 | 1829.459057 | 0.977510 |
| Andorra | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000 | 0.0 | 0.000000 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | 21.100000 | 47.416324 | NaN | NaN | NaN |
| Angola | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 5.142 | 0.0 | 0.000000 | 0.000000 | ... | 30.971119 | 11.114712 | 3.988048 | 1.418196 | 1.579247 | 3.445473 | 8.053051 | 12.631149 | 16.420494 | 0.013158 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Vietnam | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000 | 0.0 | 0.000000 | 0.000000 | ... | 12000.000000 | 13000.000000 | 14000.000000 | 15000.000000 | 16000.000000 | 17000.000000 | 17200 | 18060 | 19000.000000 | 4.591501 |
| West Bank and Gaza | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 582.1 | 542.300 | 623.3 | 1058.245498 | 1096.285162 | ... | 1804.542445 | 1817.412109 | 2086.576176 | 2378.923437 | 2833.912788 | 3152.859814 | 2559.660846 | 3393.3649 | 3495.165847 | 18.573525 |
| Yemen, Rep. | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000 | 0.0 | 0.000000 | 0.000000 | ... | 3350.500000 | 3350.500000 | 3770.584000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | NaN |
| Zambia | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000 | 0.0 | 0.000000 | 0.000000 | ... | 58.300302 | 47.046538 | 38.464441 | 93.644095 | 106.965626 | 98.259121 | 134.864832 | 241.688413 | 260.150000 | 0.962627 |
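
The `how` argument decides which rows survive. With `how='all'` a row is dropped only when every value is missing, so partly reported countries such as Andorra stay; `how='any'` would discard any row with even one gap, which would remove most of this dataset. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame: 'X' is fully empty, 'Y' is missing only its 2020 value
toy = pd.DataFrame(
    {"2020": [1.0, np.nan, np.nan], "2021": [2.0, np.nan, 3.0]},
    index=["A", "X", "Y"],
)

# how='all': drop a row only if all of its values are NaN (removes 'X')
kept_all = toy.dropna(how="all")

# how='any': drop a row if any value is NaN (removes 'X' and 'Y')
kept_any = toy.dropna(how="any")

print(kept_all.index.tolist(), kept_any.index.tolist())
```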


In [12]:
# Replace the remaining missing values with 0 (this treats "no data" as "no remittances")
df.fillna(0, inplace=True)
df.head()

| Country | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022e | %GDP_2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 253.367822 | 348.624717 | 627.710802 | 822.73163 | 803.546454 | 828.571904 | 788.917115 | 300.0 | 350.0 | 2.058824 |
| Albania | 0.0 | 0.0 | 151.8 | 332.0 | 307.1 | 427.3 | 550.9 | 300.3 | 504.14 | 407.2 | ... | 1421.007454 | 1290.863508 | 1306.009167 | 1311.822432 | 1458.210056 | 1472.812242 | 1465.987212 | 1718.320554 | 1800.0 | 9.859772 |
| Algeria | 352.44176 | 232.990263 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2452.442617 | 1997.393458 | 1989.023597 | 1791.887073 | 1984.998399 | 1785.838683 | 1699.608935 | 1759.095247 | 1829.459057 | 0.97751 |
| Andorra | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 21.1 | 47.416324 | 0.0 | 0.0 | 0.0 |
| Angola | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.142 | 0.0 | 0.0 | 0.0 | ... | 30.971119 | 11.114712 | 3.988048 | 1.418196 | 1.579247 | 3.445473 | 8.053051 | 12.631149 | 16.420494 | 0.013158 |
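
Filling with 0 makes every cell numeric, but afterwards an unreported year is indistinguishable from a year with zero remittances, which pulls averages and growth rates down. One option, shown here as a sketch on a made-up frame rather than the approach used in this notebook, is to record a missingness mask before filling and use it to exclude imputed cells when needed:

```python
import numpy as np
import pandas as pd

# Made-up frame: country B has no 2019 value
toy = pd.DataFrame({"2019": [10.0, np.nan], "2020": [30.0, 50.0]}, index=["A", "B"])

# Remember which cells were missing before imputation
was_missing = toy.isna()
filled = toy.fillna(0)

# Column means with the imputed zeros counted
mean_with_zeros = filled.mean()

# Column means over reported values only: mask() turns imputed cells back into NaN,
# and mean() skips NaN by default
mean_reported = filled.mask(was_missing).mean()

print(mean_with_zeros, mean_reported, sep="\n")
```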


In [13]:
# Check data type of each column
df.dtypes

1990         float64
1991         float64
1992         float64
1993         float64
1994         float64
1995         float64
1996         float64
1997         float64
1998         float64
1999         float64
2000         float64
2001         float64
2002         float64
2003         float64
2004         float64
2005         float64
2006          object
2007          object
2008          object
2009          object
2010         float64
2011         float64
2012         float64
2013         float64
2014         float64
2015         float64
2016         float64
2017         float64
2018         float64
2019         float64
2020          object
2021          object
2022e        float64
%GDP_2022    float64
dtype: object
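
An `object` column means pandas could not store every cell as a float, typically because some cells hold text. Before converting, it can be worth listing the cells that will not parse, since `errors='coerce'` silently turns them into NaN. A minimal sketch on a made-up frame (the text value `"17,200"` is invented to show a thousands separator that `pd.to_numeric` does not accept):

```python
import pandas as pd

# Made-up frame: one text cell forces the '2020' column to object dtype
toy = pd.DataFrame({
    "2019": [1.5, 2.0],
    "2020": [1.0, "17,200"],
})

# Columns that are not purely numeric
object_cols = toy.select_dtypes(include="object").columns.tolist()

# For each such column, the non-empty cells that pd.to_numeric cannot parse
bad = {
    col: toy.loc[pd.to_numeric(toy[col], errors="coerce").isna() & toy[col].notna(), col].tolist()
    for col in object_cols
}

print(object_cols)
print(bad)
```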

In [14]:
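# Convert object (string) columns to float64; entries that cannot be parsed become NaN,
# so fill those with 0 as well to stay consistent with the earlier imputation
df = df.apply(pd.to_numeric, errors='coerce').fillna(0)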
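
`DataFrame.apply` calls the function once per column and forwards extra keyword arguments such as `errors='coerce'` to it. Numeric strings become floats and anything unparseable becomes NaN. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame: '2020' holds numbers stored as text plus one unparseable entry
toy = pd.DataFrame({"2019": [1.5, 2.0], "2020": ["17200", "n/a"]})

# pd.to_numeric is applied column by column; errors="coerce" turns "n/a" into NaN
converted = toy.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
print(converted)
```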

In [15]:
# Confirm that every column is now float64
df.dtypes

1990         float64
1991         float64
1992         float64
1993         float64
1994         float64
1995         float64
1996         float64
1997         float64
1998         float64
1999         float64
2000         float64
2001         float64
2002         float64
2003         float64
2004         float64
2005         float64
2006         float64
2007         float64
2008         float64
2009         float64
2010         float64
2011         float64
2012         float64
2013         float64
2014         float64
2015         float64
2016         float64
2017         float64
2018         float64
2019         float64
2020         float64
2021         float64
2022e        float64
%GDP_2022    float64
dtype: object

In [16]:
# 200 countries remain after removing the fully empty rows
df.shape

(200, 34)
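
The preprocessing steps in this section can be bundled into one function so the same cleaning can be re-run on a newer release of the file. The sketch below uses a hypothetical helper `clean_inflows` and a made-up input frame; the row cutoff is a parameter because the 214-row position only applies to this particular workbook. It converts to numbers before filling, so any text that fails to parse is also set to 0.

```python
import numpy as np
import pandas as pd


def clean_inflows(raw: pd.DataFrame, n_countries: int = 214) -> pd.DataFrame:
    """Hypothetical helper bundling the cleaning steps of this notebook."""
    df = raw.rename(columns={
        "Migrant remittance inflows (US$ million)": "Country",
        "% of GDP in 2022": "%GDP_2022",
    })
    # Keep the country rows (by position) and index them by name
    df = df.iloc[:n_countries, :].set_index("Country")
    # Drop rows with no data at all
    df = df.dropna(how="all")
    # Force numeric dtypes, then fill gaps (including unparseable text) with 0
    df = df.apply(pd.to_numeric, errors="coerce")
    return df.fillna(0)


# Made-up raw input: 'B' is fully empty, 'C' stores a number as text,
# and the last row stands in for a non-country row below the cutoff
raw = pd.DataFrame({
    "Migrant remittance inflows (US$ million)": ["A", "B", "C", "World"],
    1990: [1.0, np.nan, "2.5", 100.0],
    "% of GDP in 2022": [0.5, np.nan, np.nan, 1.0],
})

clean = clean_inflows(raw, n_countries=3)
print(clean)
```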

In [17]:
# Save cleaned data to CSV file
df.to_csv('remittance_inflows_clean.csv', index=True)
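
When the cleaned file is reloaded later, `index_col` restores `Country` as the index. Note that `read_csv` parses every header as a string: if the year labels were integers in memory (not confirmed above), code written as `df[1990]` would need `df['1990']` after reloading. A minimal round-trip sketch on a made-up frame, writing to a temporary directory:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame with an integer year label and a string label
df = pd.DataFrame(
    {1990: [0.0, 10.0], "2022e": [5.0, 20.0]},
    index=pd.Index(["A", "B"], name="Country"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "remittance_inflows_clean.csv"
    df.to_csv(path, index=True)
    # index_col restores 'Country' as the index instead of a regular column
    back = pd.read_csv(path, index_col="Country")

# Headers come back as strings: 1990 becomes '1990'
print(back.columns.tolist())
```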