### migration_rate
In this project, I apply and sharpen my data preprocessing and analysis skills by analyzing migration rates across countries from 1990 to 2020, reported at five-year intervals. The project has two parts: the first covers data preprocessing, and the second answers analytical questions about the dataset.


### Formula for Calculating Migration Rate

$$\text{migration rate} = \frac{(\text{migration in} - \text{migration out}) \times 1000}{\text{population}}$$
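
As a quick worked example of the formula, the sketch below computes the net migration rate per 1,000 inhabitants. The helper name `net_migration_rate` and the input figures are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
def net_migration_rate(migration_in, migration_out, population):
    """Net migration per 1,000 inhabitants (hypothetical helper)."""
    return (migration_in - migration_out) * 1000 / population


# Illustrative numbers only: 50,000 arrivals and 20,000 departures
# in a population of 10,000,000.
rate = net_migration_rate(50_000, 20_000, 10_000_000)
print(rate)  # 3.0
```

A positive rate means more people arrived than left; a negative rate means the opposite.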


In [294]:
import numpy as np
import pandas as pd 

In [295]:

df = pd.read_csv("migration_rate.csv")
df

Unnamed: 0,Country,1990,1995,2000,2005,2010,2015,2020
0,United States,2.7,3.5,6.5,3.7,3.6,3.2,2.9
1,India,0.0,,-0.1,-0.3,-0.4,-0.4,-0.4
2,Russian Federation,1.2,3.4,3.2,2.5,3.2,2.5,1.3
3,Germany,4.3,6.6,1.8,,0.1,4.8,6.6
4,Algeria,-0.8,-0.9,-1.1,-1.3,,-0.8,-0.2
...,...,...,...,...,...,...,...,...
186,Mali,-12.1,-3.8,-2.8,-1.1,-1.4,-3.7,-2.1
187,Costa Rica,1.2,4.2,4.8,2.0,1.4,0.8,0.8
188,Norway,1.8,2.2,2.7,3.0,6.9,8.8,5.3
189,Tajikistan,-1.3,,-7.9,-4.5,-4.1,-3.4,-2.2






### Preprocessing
Before analysis, missing values in the dataset must be handled and unusable rows removed.

In this step, countries whose migration rate is missing for all 7 years are removed. For countries missing data for only some years, each missing value is filled with that country's average rate over the years that are available.

In [None]:

# Keep rows with at least 2 non-missing values: the country name plus at least one yearly rate
df.dropna(thresh=2,inplace=True)
df

Unnamed: 0,Country,1990,1995,2000,2005,2010,2015,2020
0,United States,2.7,3.5,6.5,3.7,3.6,3.2,2.9
1,India,0.0,,-0.1,-0.3,-0.4,-0.4,-0.4
2,Russian Federation,1.2,3.4,3.2,2.5,3.2,2.5,1.3
3,Germany,4.3,6.6,1.8,,0.1,4.8,6.6
4,Algeria,-0.8,-0.9,-1.1,-1.3,,-0.8,-0.2
...,...,...,...,...,...,...,...,...
186,Mali,-12.1,-3.8,-2.8,-1.1,-1.4,-3.7,-2.1
187,Costa Rica,1.2,4.2,4.8,2.0,1.4,0.8,0.8
188,Norway,1.8,2.2,2.7,3.0,6.9,8.8,5.3
189,Tajikistan,-1.3,,-7.9,-4.5,-4.1,-3.4,-2.2


In [None]:

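# Fill each country's missing years with the mean of its available rates (iloc[1:] skips the Country column)
df = df.apply(lambda row: row.fillna(row.iloc[1:].mean()), axis=1)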
df

Unnamed: 0,Country,1990,1995,2000,2005,2010,2015,2020
0,United States,2.7,3.500000,6.5,3.700000,3.60,3.2,2.9
1,India,0.0,-0.266667,-0.1,-0.300000,-0.40,-0.4,-0.4
2,Russian Federation,1.2,3.400000,3.2,2.500000,3.20,2.5,1.3
3,Germany,4.3,6.600000,1.8,4.033333,0.10,4.8,6.6
4,Algeria,-0.8,-0.900000,-1.1,-1.300000,-0.85,-0.8,-0.2
...,...,...,...,...,...,...,...,...
186,Mali,-12.1,-3.800000,-2.8,-1.100000,-1.40,-3.7,-2.1
187,Costa Rica,1.2,4.200000,4.8,2.000000,1.40,0.8,0.8
188,Norway,1.8,2.200000,2.7,3.000000,6.90,8.8,5.3
189,Tajikistan,-1.3,-3.900000,-7.9,-4.500000,-4.10,-3.4,-2.2
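
The same two preprocessing steps can also be written without a row-wise `apply`. The sketch below is one possible alternative, not the notebook's method. It assumes every column other than `Country` holds a yearly rate, and it uses a small made-up frame (`toy`) rather than `migration_rate.csv`.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not from migration_rate.csv)
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "1990": [1.0, np.nan, np.nan],
    "1995": [np.nan, 2.0, np.nan],
    "2000": [3.0, 4.0, np.nan],
})

# Assumption: every column except Country is a yearly rate
year_cols = toy.columns.drop("Country")

# Drop rows whose yearly values are all missing (country C here)
toy = toy.dropna(subset=year_cols, how="all").copy()

# Fill each remaining gap with that row's mean over the year columns;
# fillna aligns the row means to each column by row index
row_means = toy[year_cols].mean(axis=1)
toy[year_cols] = toy[year_cols].apply(lambda col: col.fillna(row_means))

print(toy)
```

Selecting the year columns by name avoids relying on `Country` being the first column.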


### Find the Top 3 Countries
Find the names of the three countries with the highest migration rates in 2020 and store them in the list `top_countries`.

In [None]:

# Sort by the 2020 rate, highest first (sort_values returns a new DataFrame)
df_top_countries=df.sort_values(by=['2020'],ascending=False)

# Take the first three country names as a plain Python list
top_countries=df_top_countries['Country'].head(3).tolist()
print(top_countries)

['Bahrain', 'Maldives', 'Oman']
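
For comparison, pandas' `DataFrame.nlargest` selects the top rows without sorting the whole frame. The sketch below uses made-up data (`toy`) and is only an alternative to the sort-based cell above.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "2020": [1.5, 7.0, -2.0, 3.2],
})

# The three rows with the largest 2020 values, in descending order
top_toy = toy.nlargest(3, "2020")["Country"].tolist()
print(top_toy)  # ['B', 'D', 'A']
```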


### Calculate the Average Migration Rate for Iran
Calculate the average migration rate for Iran over the 30-year period and store the result in the variable `iran_mean`.

In [None]:

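# Mean over the seven yearly columns; the result is a one-element Series indexed by Iran's row label
iran_mean =df[df['Country']=='Iran (Islamic Republic of)'].apply(lambda row: row.iloc[1:].mean(), axis=1)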

print(iran_mean)

39   -0.228571
dtype: float64
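
The printed result is a one-element Series rather than a single number. If a scalar is wanted, one option is to take the first element with `.iloc[0]`. The sketch below shows the idea on made-up data; the frame `toy` and the name `b_mean` are hypothetical.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "1990": [1.0, -2.0],
    "1995": [2.0, -4.0],
    "2000": [3.0, -6.0],
})

# Assumption: every column except Country is a yearly rate
year_cols = toy.columns.drop("Country")

# Select country B's yearly values, average across them, and unwrap the scalar
b_mean = toy.loc[toy["Country"] == "B", year_cols].mean(axis=1).iloc[0]
print(b_mean)  # -4.0
```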


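### Find the Country with the Highest Growth Over 30 Years

Growth is measured here as the 2020 rate minus the 1990 rate. The country with the largest difference is stored in `highest_growth`.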

In [None]:

df_highest_growth=df.copy()
# Growth = 2020 rate minus 1990 rate; label-based access avoids relying on column positions
df_highest_growth['growth']=df_highest_growth['2020']-df_highest_growth['1990']
df_highest_growth=df_highest_growth.sort_values(['growth'],ascending=False)
df_highest_growth=df_highest_growth.reset_index()
highest_growth=df_highest_growth['Country'][0]
print(highest_growth)
df_highest_growth

Liberia


Unnamed: 0,index,Country,1990,1995,2000,2005,2010,2015,2020,growth
0,132,Liberia,-34.1,-28.2,37.8,-3.30,9.9,1.200000,-1.0,33.1
1,47,Grenada,-31.0,-8.6,-7.8,-6.80,-6.7,-1.900000,-1.8,29.2
2,73,Bahrain,5.6,24.5,12.8,40.00,51.1,6.400000,31.1,25.5
3,107,Maldives,-2.5,-2.6,-0.8,11.60,10.5,28.400000,22.8,25.3
4,183,Afghanistan,-25.1,40.3,-8.9,6.40,-7.6,3.300000,-1.7,23.4
...,...,...,...,...,...,...,...,...,...,...
180,142,Lithuania,2.0,-5.5,-5.3,-5.80,-9.3,-5.916667,-11.6,-13.6
181,172,Malawi,21.0,-17.9,-1.1,-1.00,-0.9,-1.000000,-0.9,-21.9
182,176,Venezuela (Bolivarian Republic of),0.0,-0.0,-0.0,-4.45,-1.5,-2.900000,-22.3,-22.3
183,62,United Arab Emirates,32.1,34.7,35.9,61.60,109.4,6.100000,4.2,-27.9
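
If only the name of the top country is needed, the full sort can be skipped by using `Series.idxmax`. The sketch below uses made-up data (`toy`) and defines growth the same way, as the last year's rate minus the first year's rate.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "1990": [-5.0, 2.0, 0.0],
    "2020": [1.0, 3.0, -4.0],
})

# Growth per country: 6.0, 1.0, -4.0
growth = toy["2020"] - toy["1990"]

# idxmax returns the row label of the largest growth value
top_growth_country = toy.loc[growth.idxmax(), "Country"]
print(top_growth_country)  # A
```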
