## Net Migration Project: Pakistan vs India
Prepared By: Ejaz-ur-Rehman\
Date: 23-04-2025\
Email ID: ijazfinance@gmail.com

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

In [3]:
# load the dataset
df = pd.read_csv('./data_set/data_net_migration.csv')
df.head()

,Years,Pakistan,India
0,1960,0,146955
1,1961,-66991,295177
2,1962,-65780,-70514
3,1963,-64552,21640
4,1964,-63298,-141916


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65 entries, 0 to 64
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Years     65 non-null     int64
 1   Pakistan  65 non-null     int64
 2   India     65 non-null     int64
dtypes: int64(3)
memory usage: 1.7 KB


## Data Span
- The Net Migration data covers 1960 to 2024 for both Pakistan and India.
  - The figures are annual, which allows a year-by-year analysis of migration trends.
  - The data is sourced from the World Bank, which supports its reliability.
  
Net Migration:
>
> Net Migration is the difference between the number of people entering a country (immigrants) and the number of people leaving the country (emigrants) over a specific period of time.

Formula:
$$
\text{Net Migration} = \text{Immigrants} - \text{Emigrants}
$$

Explanation:
- If the result is positive, more people are entering the country than leaving (net in-migration).
- If the result is negative, more people are leaving the country than entering (net out-migration).

Net Migration Rate:
$$
\text{Net Migration Rate} = \left( \frac{\text{Immigrants} - \text{Emigrants}}{\text{Total Population}} \right) \times 1000
$$
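
As a quick check of the rate formula, the sketch below applies it to one year. Because the dataset already stores net migration (immigrants minus emigrants), the helper takes that value directly. The function name `net_migration_rate` is a hypothetical helper, not part of the notebook; the inputs are Pakistan's 1961 net migration and total population as they appear in this notebook's tables.

```python
def net_migration_rate(net_migration, total_population):
    """Return net migration per 1,000 people (hypothetical helper)."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return net_migration / total_population * 1000


# Pakistan, 1961: net migration and total population from the notebook's data
rate_pak_1961 = net_migration_rate(-66_991, 46_921_277)
print(f"{rate_pak_1961:.2f} per 1,000 people")  # -1.43 per 1,000 people
```

A negative rate means net out-migration: roughly 1.4 more people left than arrived for every 1,000 residents that year.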



In [5]:
pip install wbdata pandas



- We don't have population data for Pakistan and India, so we will fetch it from the World Bank API.
- We will then combine it with the Net Migration dataset to analyze demographic trends.
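
One possible way to combine the two datasets, sketched here under assumptions, is to reshape the wide migration table (`Years`, `Pakistan`, `India`) into long format and merge it with a long population table on year and country. The frame names `migration_wide` and `population_long` are hypothetical, and the rows are limited to 1960 and 1961 values from this notebook. The notebook's own next cell takes a different route and downloads both indicators from the API in one call.

```python
import pandas as pd

# Wide migration table, shaped like the CSV loaded earlier (two years only)
migration_wide = pd.DataFrame(
    {"Years": [1960, 1961], "Pakistan": [0, -66991], "India": [146955, 295177]}
)

# Reshape to one row per (year, country)
migration_long = migration_wide.melt(
    id_vars="Years", var_name="country", value_name="net_migration"
).rename(columns={"Years": "year"})

# Population in long format (assumed layout; values from the notebook's output)
population_long = pd.DataFrame(
    {
        "year": [1960, 1961, 1960, 1961],
        "country": ["Pakistan", "Pakistan", "India", "India"],
        "total_population": [45709310, 46921277, 435990338, 446564729],
    }
)

# Left merge keeps every migration row; validate guards against duplicate keys
combined = migration_long.merge(
    population_long, on=["year", "country"], how="left", validate="one_to_one"
)
print(combined)
```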

In [17]:
import wbdata
import pandas as pd

# Country codes for Pakistan and India
countries = ["PAK", "IND"]

# Indicators: Population and Net Migration
indicators = {
    "SP.POP.TOTL": "total_population",
    "SM.POP.NETM": "net_migration",
}

# Download data
df = wbdata.get_dataframe(indicators, country=countries)

# Reset index to work with columns
df = df.reset_index()

# Convert 'date' to datetime format
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Extract only the year (no need for full date)
df['year'] = df['date'].dt.year

# Filter data from 1960 to 2024
df = df[(df['year'] >= 1960) & (df['year'] <= 2024)]

# Pivot to make data more readable
df_pivot = df.pivot_table(index="year", columns=["country"], values=["total_population", "net_migration"])
df_pivot = df_pivot.sort_index()

# Save the combined data
df_pivot.to_csv("./data_set/indo_pak_total_pop_net_migration.csv")
df_pivot.head()



year,net_migration (India),net_migration (Pakistan),total_population (India),total_population (Pakistan)
1960,146955.0,0.0,435990338.0,45709310.0
1961,295177.0,-66991.0,446564729.0,46921277.0
1962,-70514.0,-65780.0,457283090.0,48156128.0
1963,21640.0,-64552.0,468138575.0,49447776.0
1964,-141916.0,-63298.0,479229598.0,50799999.0


In [19]:
df = df_pivot.copy()

In [20]:
df.info()

<class 'wbdata.client.DataFrame'>
Index: 65 entries, 1960 to 2024
Data columns (total 4 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   (net_migration, India)        65 non-null     float64
 1   (net_migration, Pakistan)     65 non-null     float64
 2   (total_population, India)     64 non-null     float64
 3   (total_population, Pakistan)  64 non-null     float64
dtypes: float64(4)
memory usage: 2.3 KB
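
The `df.info()` output above reports 64 non-null population values against 65 rows for each country, so one year has no population figure and its rate would come out as NaN. The sketch below shows one way to compute the net migration rate from a frame with the same two-level columns and to list the years with missing population. It uses a small stand-in frame named `sample` (hypothetical) built from the first rows shown earlier; the third row's population is deliberately blanked out to mimic the gap and is not real missing data.

```python
import numpy as np
import pandas as pd

# Stand-in for the pivoted frame: same two-level column layout
columns = pd.MultiIndex.from_product(
    [["net_migration", "total_population"], ["India", "Pakistan"]],
    names=[None, "country"],
)
sample = pd.DataFrame(
    [
        [146955.0, 0.0, 435990338.0, 45709310.0],
        [295177.0, -66991.0, 446564729.0, 46921277.0],
        [-70514.0, -65780.0, np.nan, np.nan],  # population blanked for illustration
    ],
    index=pd.Index([1960, 1961, 1962], name="year"),
    columns=columns,
)

# Selecting a top-level key yields one column per country, so the division
# aligns India with India and Pakistan with Pakistan
rate = sample["net_migration"] / sample["total_population"] * 1000

# Years where any country's population is missing (rate is NaN there)
missing_mask = sample["total_population"].isna().any(axis=1).to_numpy()
missing_years = sample.index[missing_mask].tolist()

print(rate.round(2))
print("Years with missing population:", missing_years)
```

Applied to the real frame, the same two expressions would work on `df` in place of `sample`, assuming its columns keep the `(indicator, country)` layout shown in the `df.info()` output.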
