## Welcome

This notebook is an exploratory data analysis (EDA) of the migration of Indians to other countries.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Packages Used

In [None]:
#For reading data
import pandas as pd

#For visualizations
import plotly.express as px
import plotly.graph_objects as go

#To ignore warnings
import warnings
warnings.filterwarnings("ignore")

# DataFrame Analysis

In [None]:
df = pd.read_csv('../input/indian-migration-history/IndianMigrationHistory.csv')
df.head()

A more readable way to show the shape of the DataFrame :)

In [None]:
rows, cols = df.shape
print("Rows:", rows, '\nColumns:', cols)

# Analysis & EDA 

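This dataset deals only with Indians migrating to other countries, so the origin columns carry no information, and the code columns repeat what the name columns already say. We can drop the following columns: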

* Country Origin Name

* Country Origin Code

* Migration by Gender Code

* Country Dest Code

In [None]:
df.drop(labels=['Country Origin Name', 'Country Origin Code', 'Migration by Gender Code', 'Country Dest Code'], axis=1, inplace=True)
df.head()
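
Before dropping columns, it can help to confirm that they really are constant. The following is a minimal sketch on a toy frame with made-up values (only the column names follow this dataset). It covers the origin columns only; the code columns are dropped for a different reason, namely that they duplicate the name columns.

```python
import pandas as pd

# Toy frame with made-up values; column names follow the notebook's dataset.
toy = pd.DataFrame({
    "Country Origin Name": ["India", "India", "India"],
    "Country Origin Code": ["IND", "IND", "IND"],
    "Country Dest Name": ["Canada", "France", "Canada"],
    "1960 [1960]": [10, 20, 30],
})

# A column with a single distinct value adds nothing to the analysis.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) == 1]
print(constant_cols)

# Drop the constant columns by name.
trimmed = toy.drop(columns=constant_cols)
print(trimmed.columns.tolist())
```

Running the same list comprehension on `df` (before the drop) would show whether the origin columns hold a single value in the real data.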

Let us see which destination countries are available:

In [None]:
df['Country Dest Name'].value_counts()

Interesting. Before visualizing these countries, let's check which gender categories are present:

In [None]:
df['Migration by Gender Name'].value_counts()

There's an extra value called Total, which combines Male and Female. To compare the two genders we need to separate the Total rows from the rest:

In [None]:
tot = df[df['Migration by Gender Name'] == 'Total']
# .copy() gives an independent DataFrame, so new columns can be added to it safely later
not_tot = df[df['Migration by Gender Name'] != 'Total'].copy()
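
The claim that Total is simply Male plus Female can be checked rather than assumed. Here is a rough sketch of one way to do that, using a toy frame with made-up numbers and placeholder column values that mirror this dataset's layout. Whether the real data passes this check is not verified here.

```python
import pandas as pd

# Toy data with made-up numbers; layout mirrors the notebook's columns.
toy = pd.DataFrame({
    "Country Dest Name": ["Canada", "Canada", "Canada", "France", "France", "France"],
    "Migration by Gender Name": ["Female", "Male", "Total"] * 2,
    "1960 [1960]": [40, 60, 100, 15, 25, 40],
})

year = "1960 [1960]"

# One row per country, one column per gender category.
wide = toy.pivot(index="Country Dest Name",
                 columns="Migration by Gender Name",
                 values=year)

# Rows where Female + Male does not add up to Total.
mismatch = wide[wide["Female"] + wide["Male"] != wide["Total"]]
print("All totals consistent:", mismatch.empty)
```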

In [None]:
fig = px.sunburst(not_tot, path=['Migration by Gender Name', 'Country Dest Name'], values='1960 [1960]', title='Migration to countries based on Gender(1960)')
fig.show()

Click a gender in the chart to expand it and see the countries that gender migrated to the most.

## In the first year (1960), the top 3 countries were:

**Male**

* United Kingdom

* Singapore

* France

**Female**

* United Kingdom

* France

* Singapore

The same three countries lead for both genders. The only differences are the values and the order of France and Singapore, which swap second and third place.
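
The top 3 per gender can also be computed instead of read off the chart. This is a small sketch with placeholder countries (A to D) and made-up numbers; on the real data the same steps would be applied to `not_tot`.

```python
import pandas as pd

# Placeholder countries and made-up values, for illustration only.
toy = pd.DataFrame({
    "Country Dest Name": ["A", "B", "C", "D"] * 2,
    "Migration by Gender Name": ["Male"] * 4 + ["Female"] * 4,
    "1960 [1960]": [50, 30, 20, 5, 45, 10, 25, 3],
})

year = "1960 [1960]"

# Sort once, then keep the first three rows of each gender group.
top3 = (
    toy.sort_values(year, ascending=False)
       .groupby("Migration by Gender Name")
       .head(3)
)

# Collect the ranked country names per gender.
top3_by_gender = {
    gender: group["Country Dest Name"].tolist()
    for gender, group in top3.groupby("Migration by Gender Name")
}
print(top3_by_gender)
```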



Let's have a look at the task at hand:

**Find the top 3 countries which attracted Indians the most from 1980 to 2000**

# Country-wise migration over the years

We are creating 2 columns in the DataFrame:

* Total Migration[1960-2000]

* Final Migration[1980-2000]

In [None]:
req_cols = ['1960 [1960]', '1970 [1970]', '1980 [1980]', '1990 [1990]', '2000 [2000]']
t_cols = ['1980 [1980]', '1990 [1990]', '2000 [2000]']
not_tot['Total Migration[1960-2000]'] = not_tot[req_cols].sum(axis=1)
not_tot['Final Migration[1980-2000]'] = not_tot[t_cols].sum(axis=1)
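
The year columns follow a regular `"YYYY [YYYY]"` pattern, so the lists can be generated instead of typed out. The sketch below also coerces the columns to numbers before summing, under the assumption (not checked against the real file) that some cells might hold a non-numeric placeholder such as `".."`. The helper `year_cols` and the toy values are made up for illustration.

```python
import pandas as pd

# Toy frame with made-up values; ".." is a hypothetical missing-value marker.
toy = pd.DataFrame({
    "Country Dest Name": ["A", "B"],
    "1960 [1960]": [1, 2],
    "1970 [1970]": [3, 4],
    "1980 [1980]": [5, ".."],
    "1990 [1990]": [7, 8],
    "2000 [2000]": [9, 10],
})

def year_cols(start, end, step=10):
    """Hypothetical helper: build column names like '1980 [1980]'."""
    return [f"{y} [{y}]" for y in range(start, end + 1, step)]

all_years = year_cols(1960, 2000)

# Non-numeric cells become NaN, which sum() skips by default.
toy[all_years] = toy[all_years].apply(pd.to_numeric, errors="coerce")

toy["Total Migration[1960-2000]"] = toy[all_years].sum(axis=1)
toy["Final Migration[1980-2000]"] = toy[year_cols(1980, 2000)].sum(axis=1)
print(toy[["Country Dest Name", "Total Migration[1960-2000]", "Final Migration[1980-2000]"]])
```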

## Total Migration in every country

This is done using the first column that we created 

In [None]:
mig_df = not_tot.sort_values("Total Migration[1960-2000]", ascending=False)
m = mig_df[mig_df['Migration by Gender Name'] == 'Male']
f = mig_df[mig_df['Migration by Gender Name'] == 'Female']
cols_used = m['Country Dest Name'].tolist()

In [None]:
fig = go.Figure(data=[
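    go.Bar(name='Male', x=cols_used, y=m['Total Migration[1960-2000]'].tolist()),
    # Female bars must use the female rows (with their own country order), not the male ones
    go.Bar(name='Female', x=f['Country Dest Name'].tolist(), y=f['Total Migration[1960-2000]'].tolist())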
])
fig.update_layout(title="Total Migration from 1960-2000")
fig.update_yaxes(type='log')  # Log scale keeps countries with small counts visible :)
fig.show()

## Total Migration Top 3 Overall

* United States

* United Kingdom

* Canada

# Final Migration for Task

This now deals with the second column that we created

In [None]:
fin_df = not_tot.sort_values('Final Migration[1980-2000]', ascending=False)

In [None]:
# Grouped bars: stacked segments are hard to compare on a log axis
fig = px.bar(fin_df, x="Country Dest Name", y="Final Migration[1980-2000]", color="Migration by Gender Name", barmode="group", title="Migration from 1980-2000")
fig.update_yaxes(type='log')
fig.show()

## Top 3 Countries that Indians were attracted to:

* United States (Was there any surprise?)

* United Kingdom

* Canada

The first two aren't really a surprise, and Canada has attracted many Indians in recent decades.
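
To answer the task with a number rather than by reading the chart, the male and female rows can be summed per country before ranking. A minimal sketch with placeholder countries and made-up values follows; on the real data the same two lines would run on `not_tot`.

```python
import pandas as pd

# Placeholder countries and made-up values, for illustration only.
toy = pd.DataFrame({
    "Country Dest Name": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Migration by Gender Name": ["Male", "Female"] * 4,
    "Final Migration[1980-2000]": [100, 90, 70, 80, 20, 10, 60, 65],
})

col = "Final Migration[1980-2000]"

# Combine both genders per country, then keep the three largest totals.
top3 = toy.groupby("Country Dest Name")[col].sum().nlargest(3)
print(top3)
```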

## That is all

Thank you for reading this notebook. The dataset can be found [here](https://www.kaggle.com/rajacsp/indian-migration-history) if you would like to analyze it yourself. Upvote if you found the notebook useful, and check out my other notebooks [here](https://www.kaggle.com/charlessamuel/notebooks).