<font size="+4" color=purple><u><center>Migration Analysis (5 decades 1960 - 2000)</center></u></font>

<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list"  role="tab" aria-controls="home">Table of contents</h3>

* [Introduction](#intro)
* [Data](#data)
* [1. Cleaning ](#1)
* [2. Rearranging Dataframe ](#2)
* [3. Countries and Continents](#3)
* [4. Gender](#4)
* [5. Violin Graphs](#5)
* [6. Tree Maps](#6)
* [7. Sunburst Charts](#7)
* [8. Strip Charts](#8)
* [9. Density Contour Maps](#9)
* [10. Bar and Scatter maps](#10)

</div>


<a id="intro"></a>
<font size="+2" color="blue"><b>Introduction and Imports</b></font><br>

**Quote**
> **Migration is an expression of the human aspiration for dignity, safety and a better future. It is part of the social fabric, part of our very make-up as a human family** - 
*Ban Ki-moon*

In [None]:
!pip install pycountry_convert

In [None]:
# Core data libraries
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Plotting libraries
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Country name -> ISO country code -> continent lookups
import pycountry_convert as pc

# List all files in the read-only Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<a id="data"></a>
<font size="+2" color="blue"><b>Data</b></font><br>

The data records the number of male and female migrants by destination country for each decade from 1960 to 2000 (five snapshots: 1960, 1970, 1980, 1990 and 2000). Only a limited set of destination countries is included; continents such as South America and Africa are not covered.

In [None]:
migration = pd.read_csv("/kaggle/input/indian-migration-history/IndianMigrationHistory.csv")
migration.info()  # info() prints its summary itself and returns None
migration.head()

<a id="1"></a>
<font size="+2" color="blue"><b>Cleaning</b></font><br>

In [None]:
# Rename columns to short names; '1960 [1960]' etc. become plain year labels
migration = migration.rename(columns={
    'Country Origin Name': 'Origin',
    'Country Origin Code': 'OriCode',
    'Country Dest Name': 'Country',
    'Country Dest Code': 'DestCode',
    'Migration by Gender Name': 'Gender',
    'Migration by Gender Code': 'GenCode',
    '1960 [1960]': '1960',
    '1970 [1970]': '1970',
    '1980 [1980]': '1980',
    '1990 [1990]': '1990',
    '2000 [2000]': '2000',
})

# Drop the code columns, which are redundant with the name columns
migration = migration.drop(columns=['GenCode', 'DestCode', 'OriCode'])

# Placeholder columns, filled in when the data is reshaped by year
migration['Year'] = '0'
migration['Population'] = 0

print(migration.columns)
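The rename above lists every year column by hand. As a minimal sketch, assuming every year header follows the `YYYY [YYYY]` pattern used by the five year columns in this CSV, the year labels could instead be derived from the header itself; `strip_year_suffix` is a hypothetical helper, and the non-year columns would still need the explicit mapping.

```python
import re

import pandas as pd


def strip_year_suffix(col: str) -> str:
    """Turn a header like '1960 [1960]' into '1960'.

    Headers that do not match the 'YYYY [YYYY]' pattern are returned unchanged.
    """
    match = re.fullmatch(r"(\d{4}) \[\1\]", col)
    return match.group(1) if match else col


# Hypothetical empty frame with the same header layout as the notebook's CSV
raw = pd.DataFrame(columns=["Country Dest Name", "1960 [1960]", "2000 [2000]"])
renamed = raw.rename(columns=strip_year_suffix)
print(list(renamed.columns))
```

The backreference `\1` only accepts headers whose bracketed year repeats the leading year, so an unexpected header is left alone rather than silently renamed.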

In [None]:
# Drop the 'Total' rows so that only the Male and Female rows remain
mig = migration[migration['Gender'] != 'Total']
mig.shape

<a id="2"></a>
<font size="+2" color="blue"><b>Rearranging Data Frame - Population by Year</b></font><br>

In [None]:
# The decades are stored as separate columns; we want one row per decade, with the
# decade in the 'Year' column and that decade's migrant count in 'Population'.
# mig_year (built below) stacks one copy of the rows per decade, so each block of
# consecutive rows is assigned one decade.

def populate_years(df, years=('1960', '1970', '1980', '1990', '2000')):
    # Rows per decade block (26 in this dataset); assumes the default 0..n-1
    # index that pd.concat(..., ignore_index=True) produces
    block_size = len(df) // len(years)
    for ind in df.index:
        yr = years[ind // block_size]
        df.at[ind, 'Year'] = yr
        df.at[ind, 'Population'] = df.at[ind, yr]
    return df

# Stack 5 copies of the data, one per decade; ignore_index gives a fresh 0..n-1 index
mig_year = pd.concat([mig] * 5, ignore_index=True)
print(mig_year.shape)

populate_years(mig_year)
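Copying the rows once per decade and filling `Year`/`Population` by position is a hand-rolled wide-to-long reshape. pandas provides this directly as `DataFrame.melt`. The sketch below uses a small made-up table; applied to this notebook it would presumably be `mig.melt(id_vars=['Origin', 'Gender', 'Country'], value_vars=['1960', '1970', '1980', '1990', '2000'], var_name='Year', value_name='Population')`, an assumption that has not been run against the real CSV.

```python
import pandas as pd

# Hypothetical wide table: one row per destination/gender, one column per census year
wide = pd.DataFrame({
    "Country": ["Canada", "Canada", "Finland", "Finland"],
    "Gender": ["Male", "Female", "Male", "Female"],
    "1960": [10, 8, 1, 0],
    "1970": [20, 18, 1, 1],
})

# melt turns each year column into rows: one row per (Country, Gender, Year)
long = wide.melt(
    id_vars=["Country", "Gender"],
    value_vars=["1960", "1970"],
    var_name="Year",
    value_name="Population",
)
print(long)
```

Unlike `populate_years`, this does not depend on the row order or a fixed block size. The main behavioural difference is that the original year columns are dropped from the result.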

<a id="3"></a>
<font size="+2" color="blue"><b>Countries and Continents</b></font><br>

In [None]:
print("Destination countries:", mig_year['Country'].unique())
print("Number of destination countries:", mig_year['Country'].nunique())


In [None]:
# Map continent codes to continent names
continent_dict = {"AS": "Asia", "AF": "Africa", "OC": "Oceania", "NA": "North America",
                  "SA": "South America", "EU": "Europe"}

# Given a country name, fetch its ISO alpha-2 country code
def find_country_code(country):
    return pc.country_name_to_country_alpha2(country, cn_name_format="default")

# Given an alpha-2 country code, fetch the continent code
def find_continent_code(country_code):
    return pc.country_alpha2_to_continent_code(country_code)

# Given a continent code, fetch the continent name
def find_continent(continent_code):
    return continent_dict[continent_code]

mig_year['Country_code'] = mig_year['Country'].map(find_country_code)
mig_year['Continent_code'] = mig_year['Country_code'].map(find_continent_code)
mig_year['Continent'] = mig_year['Continent_code'].map(find_continent)

mig_year.tail(10)
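Each `.map` call above converts all 130 rows, although there are only a handful of distinct countries, and a single unrecognised name would stop the whole column. A minimal sketch of an alternative: convert each distinct value once and map failures to a missing value. `map_unique` and `fake_country_code` are hypothetical names. The sketch catches `LookupError` (the base class of `KeyError`), which covers the dict lookup in `find_continent`; that `pycountry_convert` raises a `KeyError` for unknown names is an assumption worth checking before relying on it.

```python
import pandas as pd


def map_unique(series: pd.Series, convert) -> pd.Series:
    """Call `convert` once per distinct value, then broadcast the results.

    Values for which `convert` raises a LookupError (KeyError is a subclass)
    are mapped to None instead of stopping the whole column.
    """
    lookup = {}
    for value in series.dropna().unique():
        try:
            lookup[value] = convert(value)
        except LookupError:
            lookup[value] = None
    return series.map(lookup)


# Hypothetical stand-in for find_country_code that records every call it receives
codes = {"Canada": "CA", "Finland": "FI"}
calls = []


def fake_country_code(name):
    calls.append(name)
    return codes[name]  # raises KeyError for unknown names


countries = pd.Series(["Canada", "Finland", "Canada", "Atlantis"])
result = map_unique(countries, fake_country_code)
print(result.tolist())
print(calls)  # each distinct name is converted only once
```

In the notebook this would read `mig_year['Country_code'] = map_unique(mig_year['Country'], find_country_code)`; missing codes are skipped by `dropna()` in the next step of the chain.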


<font size="+1" color="purple"><b>Pie Chart of Countries migrated to</b></font><br>

In [None]:
fig = px.pie(mig_year, values='Population', names='Country', title='Migration')
fig.show()

**Inference:**
* The UK received the largest share of migrants (35.2%), closely followed by the USA (34.6%).
* Finland was the least preferred destination, with only about 2k migrants.
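The pie slices sum `Population` over all rows with the same country name, so, because `mig_year` holds male and female rows for every decade, each share is a country's total across both genders and all five decades. The same percentages can be checked with a `groupby`; the sketch below uses made-up numbers.

```python
import pandas as pd

# Hypothetical long-format rows shaped like mig_year (numbers are made up)
rows = pd.DataFrame({
    "Country": ["UK", "UK", "USA", "USA", "Finland"],
    "Population": [60, 40, 50, 45, 5],
})

# Sum the values per slice, then divide by the grand total
totals = rows.groupby("Country")["Population"].sum()
shares = (100 * totals / totals.sum()).round(1).sort_values(ascending=False)
print(shares)
```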


<font size="+1" color="purple"><b>Area Chart of Continents migrated to</b></font><br>

In [None]:
fig = px.area(mig_year, x="Year", y="Population", color="Continent", line_group="Country")
fig.show()

**Inference:**
* Singapore has been the preferred destination in Asia.
* The UK has been the preferred destination in Europe.
* Australia has been the preferred destination in Oceania.
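The "preferred destination per continent" readings can also be checked numerically. A minimal sketch, assuming a frame with the notebook's `Continent`, `Country` and `Population` columns: `top_destinations` is a hypothetical helper that returns the `n` largest destinations in each continent, and the demo data is made up.

```python
import pandas as pd


def top_destinations(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n destinations with the most migrants in each continent."""
    per_country = df.groupby(["Continent", "Country"], as_index=False)["Population"].sum()
    return (
        per_country.sort_values(["Continent", "Population"], ascending=[True, False])
        .groupby("Continent")
        .head(n)
        .reset_index(drop=True)
    )


# Hypothetical rows; repeated countries stand in for several gender/decade rows
demo = pd.DataFrame({
    "Continent": ["Europe", "Europe", "Europe", "Europe", "Europe",
                  "Asia", "Oceania", "Oceania"],
    "Country": ["UK", "UK", "France", "Germany", "Finland",
                "Singapore", "Australia", "New Zealand"],
    "Population": [60, 40, 30, 25, 2, 40, 50, 10],
})
top = top_destinations(demo, n=2)
print(top)
```

With `n=3` on `mig_year`, the same call would also show the runners-up in each continent.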


<font size="+1" color="purple"><b>Box Chart of Countries migrated to (year-wise)</b></font><br>

In [None]:
fig = px.box(mig_year, x="Country", y="Population", color="Year", notched=True)
fig.show()

**Inference:**
* Migration to Denmark, Finland, the Netherlands, Sweden and Switzerland has been low.
* Migration to Australia was higher in 1960 and 2000 than in the other decades.

<font size="+1" color="purple"><b>Bubble Scatter plot of Countries migrated to (year-wise)</b></font><br>

In [None]:
fig = px.scatter(mig_year, x="Population", y="Year", size="Population", color="Continent", hover_name="Country", log_x=True, size_max=100)
fig.show()

**Inference:**
* Migration to North America and Europe has been increasing over the decades.
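Whether a continent's total "has been increasing over the decades" can be tested directly rather than read off the chart. A minimal sketch, assuming the notebook's `Continent`, `Year` (string) and `Population` columns: pivot to one row per continent and one column per census year, then check that each row never falls. `continent_trend` and `rising_every_decade` are hypothetical helpers, and the demo data is made up.

```python
import pandas as pd


def continent_trend(df: pd.DataFrame) -> pd.DataFrame:
    """Total migrants per continent (rows) and census year (columns)."""
    return df.pivot_table(index="Continent", columns="Year",
                          values="Population", aggfunc="sum", fill_value=0)


def rising_every_decade(trend: pd.DataFrame) -> pd.Series:
    """True for continents whose total never falls from one year column to the next."""
    return trend.apply(lambda row: row.is_monotonic_increasing, axis=1)


demo = pd.DataFrame({
    "Continent": ["Europe", "Europe", "Asia", "Asia"],
    "Year": ["1960", "1970", "1960", "1970"],
    "Population": [10, 25, 30, 20],
})
trend = continent_trend(demo)
rising = rising_every_decade(trend)
print(trend)
print(rising)
```

Because the years are four-digit strings, sorting the pivoted columns alphabetically also orders them chronologically.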

<a id="4"></a>
<font size="+2" color="blue"><b>Gender</b></font><br>

<font size="+1" color="purple"><b>Bar chart of Continents migrated to (gender-wise)</b></font><br>

In [None]:
fig = px.bar(mig_year, x="Gender", y="Population", color="Continent", barmode="group")
fig.show()

**Inference:**
* More male than female migrants headed to North America, Europe and Asia.
* Migration to Oceania is roughly equal between male and female migrants.
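The male/female comparison can be tabulated as well. A minimal sketch, assuming the notebook's `Gender` values are exactly `Male` and `Female`: `gender_gap` is a hypothetical helper that totals each gender per group and adds a male-to-female ratio, so a ratio near 1 means a balanced split. Passing `by="Country"` gives the per-country view used in the treemap section; the demo data is made up.

```python
import pandas as pd


def gender_gap(df: pd.DataFrame, by: str = "Continent") -> pd.DataFrame:
    """Male and female totals per group, plus the male-to-female ratio."""
    table = df.pivot_table(index=by, columns="Gender", values="Population",
                           aggfunc="sum", fill_value=0)
    table["M/F ratio"] = (table["Male"] / table["Female"]).round(2)
    return table


demo = pd.DataFrame({
    "Continent": ["Oceania", "Oceania", "Asia", "Asia"],
    "Country": ["Australia", "Australia", "Singapore", "Singapore"],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Population": [50, 50, 60, 40],
})
gap = gender_gap(demo)
by_country = gender_gap(demo, by="Country")
print(gap)
```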

<font size="+1" color="purple"><b>Histogram of Population migrated (gender-wise)</b></font><br>

In [None]:
fig = px.histogram(mig_year, x="Year", y="Population", log_y=True,color="Gender", marginal="rug", hover_data=mig_year.columns)
fig.show()

**Inference:**
* Migration of both male and female migrants has been increasing over the decades.

<font size="+1" color="purple"><b>Cumulative Histogram of Population migrated </b></font><br>

In [None]:
fig = go.Figure(data=[go.Histogram(x=mig_year['Population'], cumulative_enabled=True)])
fig.show()

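**Inference:**
* The x-axis here is the migrant count of each country/gender/decade record, not the year, so this chart shows how record sizes accumulate rather than a trend over the decades; the increase over time is visible in the histogram above.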
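A cumulative histogram bins the values and then adds up the bin counts from left to right, so each bar is the number of records in that bin or any earlier one. The sketch reproduces this with `numpy.histogram` and `numpy.cumsum` on a few made-up record sizes and explicit bin edges. Plotly chooses its own bin edges, so its bars will not match these numbers; `cumulative_histogram` is a hypothetical helper.

```python
import numpy as np


def cumulative_histogram(values, bins):
    """Bin the values, then add up the bin counts from left to right."""
    counts, edges = np.histogram(values, bins=bins)
    return np.cumsum(counts), edges


# Hypothetical per-record migrant counts (not taken from the dataset)
record_sizes = [5, 1, 3, 3, 10]
cumulative, edges = cumulative_histogram(record_sizes, bins=[0, 2, 4, 6, 10])
print(cumulative)
```

`numpy.histogram` treats every bin as half-open except the last, which also includes its right edge, so the value 10 is counted in the final bin.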

<a id="5"></a>
<font size="+2" color="blue"><b>Violin charts</b></font><br>

In [None]:
fig = px.violin(mig_year, y="Population", x="Continent", color="Gender", box=True, points="all", hover_data=mig_year.columns)
fig.show()

**Inference:**
* Migration to Oceania: Male = 44.3K, Female = 49.7K.
* Migration to Europe: Male = 259K, Female = 266K.
* Migration to North America: Male = 561K, Female = 480K.
* Migration to Asia: Male = 57.2K, Female = 47.9K.

<a id="6"></a>
<font size="+2" color="blue"><b>Treemap charts</b></font><br>

In [None]:
fig = px.treemap(mig_year, path=[px.Constant('world'), 'Continent', 'Country','Gender'], values='Population', color='Population', hover_data=['Continent'])
fig.show()

**Inference**
* Singapore and Germany have more male than female migrants.
 

<a id="7"></a>
<font size="+2" color="blue"><b>Sunburst charts</b></font><br>

In [None]:
fig = px.sunburst(mig_year, path=['Continent', 'Country'], values='Population', color='Population', hover_data=['Continent'])
fig.show()

**Inference**
* The UK has more migrants than the USA, yet North America, which also includes Canada, has more migrants than Europe as a whole.
* Australia has more migrants than New Zealand in Oceania.

<a id="8"></a>
<font size="+2" color="blue"><b>Strip charts</b></font><br>

In [None]:
fig = px.strip(mig_year, x="Population", y="Country", orientation="h", color="Gender")
fig.show()

**Inference**
* France and Germany are the next most popular destinations in Europe after the UK.
 

<a id="9"></a>
<font size="+2" color="blue"><b>Density Contour Maps</b></font><br>

<font size="+1" color="purple"><b>A density contour plot is a 2-dimensional generalization of a histogram. It resembles a contour plot, but is computed by grouping points specified by their x and y coordinates into bins and applying an aggregation function, such as count, or sum (if z is provided), to obtain the values from which the contours are drawn.</b></font><br>
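The binning-and-aggregation step described above can be sketched with `numpy.histogram2d`: without weights it counts the points in each (x bin, y bin) cell, and with `weights` it sums them instead. This only illustrates the count-versus-sum idea on made-up points with hand-picked bin edges; Plotly computes its own bins and then draws contours over the resulting grid.

```python
import numpy as np

# Hypothetical points: x = census year, y = migrant count, z = a value to aggregate
x = np.array([1960, 1960, 1970, 1970, 1970])
y = np.array([100, 300, 150, 160, 900])
z = np.array([1, 2, 3, 4, 5])

# Explicit bin edges: two year bins and two migrant-count bins
bins = [[1955, 1965, 1975], [0, 500, 1000]]

# "count" aggregation: number of points in each (x bin, y bin) cell
counts, xedges, yedges = np.histogram2d(x, y, bins=bins)

# "sum" aggregation: add up z within each cell instead of counting points
sums, _, _ = np.histogram2d(x, y, bins=bins, weights=z)
print(counts)
print(sums)
```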

In [None]:
fig = px.density_contour(mig_year, x="Year", y="Population", color="Country", marginal_x="rug", marginal_y="histogram")
fig.show()

**Inference**
* Many European destinations have fewer migrants than the UK.

<a id="10"></a>
<font size="+2" color="blue"><b>Bar and Scatter maps</b></font><br>

In [None]:
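# .copy() makes an independent frame so adding 'Total' does not raise SettingWithCopyWarning
female = migration[migration['Gender'] == 'Female'].copy()
female['Total'] = female['1960'] + female['1970'] + female['1980'] + female['1990'] + female['2000']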
y_2000 = female['2000']
y_total = female['Total']
x = female['Country']
    
# Creating two subplots
fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True,
                    shared_yaxes=False, vertical_spacing=0.001)

# add_trace(..., row=, col=) replaces the deprecated append_trace(trace, row, col)
fig.add_trace(go.Bar(
    x=y_2000,
    y=x,
    marker=dict(
        color='rgba(250, 190, 160, 0.6)', line=dict(color='rgba(150, 90, 16, 1.0)', width=1),
    ),
    name='Migration of females in year 2000', orientation='h',), row=1, col=1)

fig.add_trace(go.Scatter(
    x=y_total, y=x,
    mode='lines+markers', line_color='rgb(80, 140, 80)', name='Migration of females between years 1960 - 2000',
), row=1, col=2)

fig.update_layout(
    title='Migration of females in year 2000 and from years 1960 - 2000',
    yaxis=dict(showgrid=False, showline=False, showticklabels=True, domain=[0, 0.85]),
    yaxis2=dict(showgrid=False, showline=False, showticklabels=True,
                linecolor='rgba(200, 252, 20, 0.8)', linewidth=2, domain=[0, 0.85]),
    xaxis=dict(zeroline=False, showline=False, showticklabels=True, showgrid=False, domain=[0, 0.42]),
    xaxis2=dict(zeroline=False, showline=False, showticklabels=True, showgrid=False,
                domain=[0.47, 1], side='top', dtick=25000),
    legend=dict(x=0.029, y=1.038, font_size=10),
    margin=dict(l=100, r=20, t=70, b=70),
    paper_bgcolor='rgb(228, 200, 208)',
    plot_bgcolor='rgb(200, 80, 10)',
)

annotations = []

# Adding labels
for ydn, yd, xd in zip(y_total, y_2000, x):
    # labeling the scatter 
    annotations.append(dict(xref='x2', yref='y2',
                            y=xd, x=ydn, text='{:,}'.format(ydn), font=dict(family='Arial', size=17, color='rgb(13, 5, 4)'), showarrow=True))
    
    # labeling the bar 
    annotations.append(dict(xref='x1', yref='y1',
                            y=xd, x=yd, text=str(yd) , font=dict(family='Arial', size=17,color='rgb(13, 5, 4)'), showarrow=False))
# Source
annotations.append(dict(xref='paper', yref='paper',
                        x=0.2, y=-0.109, text='World Bank ' + 'Migration ' + 'across Continents ',
                        font=dict(family='Arial', size=12, color='rgb(150,150,150)'), showarrow=False))

fig.update_layout(annotations=annotations)
fig.show()

**Conclusion**
* Female migrants preferred the US, UK, Canada, France, Singapore, Australia, Germany, New Zealand, Sweden and the Netherlands.

In [None]:
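# .copy() makes an independent frame so adding 'Total' does not raise SettingWithCopyWarning
male = migration[migration['Gender'] == 'Male'].copy()
male['Total'] = male['1960'] + male['1970'] + male['1980'] + male['1990'] + male['2000']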
y_2000 = male['2000']
y_total = male['Total']
x = male['Country']
    
# Creating two subplots
fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True, shared_yaxes=False, vertical_spacing=0.001)

# add_trace(..., row=, col=) replaces the deprecated append_trace(trace, row, col)
fig.add_trace(go.Bar(
    x=y_2000, y=x,
    marker=dict(
        color='rgba(250, 190, 160, 0.6)', line=dict(color='rgba(150, 90, 16, 1.0)', width=1),
    ),
    name='Migration of Males in year 2000', orientation='h',), row=1, col=1)

fig.add_trace(go.Scatter(
    x=y_total, y=x,
    mode='lines+markers', line_color='rgb(80, 140, 80)', name='Migration of Males between years 1960 - 2000',
), row=1, col=2)

fig.update_layout(
    title='Migration of Males in year 2000 and from years 1960 - 2000',
    yaxis=dict(showgrid=False, showline=False, showticklabels=True, domain=[0, 0.85]),
    yaxis2=dict(showgrid=False, showline=False, showticklabels=True,
                linecolor='rgba(200, 252, 20, 0.8)', linewidth=2, domain=[0, 0.85]),
    xaxis=dict(zeroline=False, showline=False, showticklabels=True, showgrid=False, domain=[0, 0.42]),
    xaxis2=dict(zeroline=False, showline=False, showticklabels=True, showgrid=False,
                domain=[0.47, 1], side='top', dtick=25000),
    legend=dict(x=0.029, y=1.038, font_size=10),
    margin=dict(l=100, r=20, t=70, b=70),
    paper_bgcolor='rgb(228, 200, 208)',
    plot_bgcolor='rgb(200, 80, 10)',
)

annotations = []

# Adding labels
for ydn, yd, xd in zip(y_total, y_2000, x):
    # labeling the scatter 
    annotations.append(dict(xref='x2', yref='y2',
                            y=xd, x=ydn, text='{:,}'.format(ydn), font=dict(family='Arial', size=17, color='rgb(13, 5, 4)'), showarrow=True))
    
    # labeling the bar 
    annotations.append(dict(xref='x1', yref='y1',
                            y=xd, x=yd, text=str(yd) , font=dict(family='Arial', size=17,color='rgb(13, 5, 4)'), showarrow=False))
# Source
annotations.append(dict(xref='paper', yref='paper',
                        x=0.2, y=-0.109, text='World Bank ' + 'Migration ' + 'across Continents ',
                        font=dict(family='Arial', size=12, color='rgb(150,150,150)'), showarrow=False))

fig.update_layout(annotations=annotations)
fig.show()

**Conclusion**
* Male migrants preferred the US, UK, Canada, France, Singapore, Australia, Germany, New Zealand, the Netherlands and Sweden.
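The female and male cells repeat the same data preparation and differ only in the gender they filter on and in their label text. A minimal sketch of factoring the preparation into one function, assuming the renamed year columns `'1960'` to `'2000'`: `gender_summary` is a hypothetical helper whose `2000`, `Total` and `Country` columns could feed `y_2000`, `y_total` and `x`; the plotting code could be wrapped the same way. The demo data is made up.

```python
import pandas as pd

YEARS = ["1960", "1970", "1980", "1990", "2000"]


def gender_summary(df: pd.DataFrame, gender: str) -> pd.DataFrame:
    """Per-country migrant count in 2000 and total over 1960-2000 for one gender."""
    # .copy() gives an independent frame, so adding 'Total' is safe
    subset = df.loc[df["Gender"] == gender, ["Country", *YEARS]].copy()
    # sum(axis=1) skips missing values, whereas chaining '+' would give NaN
    subset["Total"] = subset[YEARS].sum(axis=1)
    return subset[["Country", "2000", "Total"]].reset_index(drop=True)


demo = pd.DataFrame({
    "Country": ["Canada", "Canada", "Finland", "Finland"],
    "Gender": ["Male", "Female", "Male", "Female"],
    **{year: [2, 3, 1, 0] for year in YEARS},
})
female = gender_summary(demo, "Female")
male = gender_summary(demo, "Male")
print(female)
```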
