<center> <img src="https://i1.wp.com/www.middleeastmonitor.com/wp-content/uploads/2018/01/2016_8-30-Mediterranean-sea-migrantsCrHJ_O_WEAAFoSO.jpg?resize=1200%2C800&quality=75&strip=all&ssl=1"  height="600px" width="900px"> </center>

# Background of the dataset <br> 

> Missing Migrants Project tracks deaths of migrants, including refugees and asylum-seekers, who have gone missing along mixed migration routes worldwide. The research behind this project began with the October 2013 tragedies, when at least 368 individuals died in two shipwrecks near the Italian island of Lampedusa. Since then, Missing Migrants Project has developed into an important hub and advocacy source of information that media, researchers, and the general public access for the latest information. 
With a count surpassing 60,000 over the last two decades, IOM calls on all the world’s governments to address what it describes as “an epidemic of crime and abuse.”
Missing Migrants Project is made possible by funding by UK Aid from the Government of the United Kingdom; however, the views expressed do not necessarily reflect the Government of the United Kingdom’s official policies. [(source)](https://missingmigrants.iom.int/about)
 
 <br>

As the dataset description notes, these figures are minimum estimates, because many deaths during migration go unrecorded. Thanks to **@Stefano Nocco** for sharing this dataset and raising awareness of the issues affecting refugees, asylum seekers and stateless persons. [His kernel](https://www.kaggle.com/snocco/dead-on-mediterranean-routes) has good EDA, so check it out as well. 


In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
%matplotlib inline 
import seaborn as sns 
import warnings
warnings.filterwarnings('ignore') 
from scipy import stats, linalg
from matplotlib import rcParams
import scipy.stats as st
import folium 
from folium import plugins
from folium.plugins import HeatMap
# plotly.plotly moved to the separate chart_studio package in plotly 4; it is not used in this notebook
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff
import plotly.tools as tls
import datetime
import re

pd.set_option('display.max_columns', None)
sns.set_style('whitegrid')

In [None]:
raw_data = pd.read_csv('../input/MissingMigrants-Global-2019-03-29T18-36-07.csv')

In [None]:
raw_data.head(5)


# First, let's take a look at the map <br>

In [None]:
print(raw_data[raw_data['Location Coordinates'].isin([np.nan, np.inf, -np.inf])])  # rows with missing (NaN) or infinite coordinates
df = raw_data.dropna(subset=['Location Coordinates']).copy()  # drop rows without coordinates; copy so new columns can be added safely

In [None]:
# formatting (split by comma and map it as a column of df)

df['lat']=df['Location Coordinates'].astype('str').apply(lambda x: x.split(',')).map(lambda x: x[0]).astype('float64') 
df['long']=df['Location Coordinates'].astype('str').apply(lambda x: x.split(',')).map(lambda x: x[-1]).astype('float64')
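The split-and-cast approach above raises an error if a coordinate string is malformed. Below is a minimal sketch of a more tolerant variant, assuming the column holds "lat, long" strings as in this dataset. The `sample` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mimicking the 'Location Coordinates' column ("lat, long" strings).
sample = pd.DataFrame({
    "Location Coordinates": ["35.5, 12.6", "31.7,-106.4", None, "not available"]
})

# Split once on the comma into two columns, then coerce to numbers;
# anything that cannot be parsed becomes NaN instead of raising.
parts = sample["Location Coordinates"].str.split(",", n=1, expand=True)
sample["lat"] = pd.to_numeric(parts[0].str.strip(), errors="coerce")
sample["long"] = pd.to_numeric(parts[1].str.strip(), errors="coerce")

# Keep only rows where both coordinates parsed successfully.
clean = sample.dropna(subset=["lat", "long"]).reset_index(drop=True)
print(clean)
```

With `errors="coerce"`, bad rows are dropped together with the missing ones rather than stopping the notebook.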

In [None]:
incident_map = folium.Map(location = [df['lat'].mean(), df['long'].mean()], zoom_start = 2)
lat_long_data = df[['lat', 'long']].values.tolist()
cluster_map = folium.plugins.FastMarkerCluster(lat_long_data).add_to(incident_map)

With this map, we can see where all the reported incidents in our dataset happened. The majority of reported incidents are from the Mediterranean, US-Mexico Border, and North Africa regions. 

In [None]:
incident_map 

In [None]:
# This pie chart confirms that the majority of reported incidents in the dataset are from the Mediterranean, US-Mexico Border, and North Africa regions.

labels = list(raw_data['Region of Incident'].value_counts().index.values)
values = list(raw_data['Region of Incident'].value_counts().values)

trace = go.Pie(labels=labels, values=values)

iplot([trace], filename='basic_pie_chart')

The Mediterranean region has the highest total number of dead and missing. There is an especially noticeable color difference on the northwest side of Libya **(Tripoli).**
I searched Google for this city and found many news articles about attacks and incidents involving missing refugees. <br><br>

**Here are some sources to take a look at:** <br>

* https://www.aljazeera.com/news/2019/07/1000-killed-battle-libya-tripoli-190708191029535.html
* https://www.bbc.com/news/world-africa-48849595



In [None]:
# heatmap to see the map by total dead and missing 

base_map =  folium.Map(location = [df['lat'].mean(), df['long'].mean()], zoom_start = 3)
HeatMap(data=df[['lat', 'long', 'Total Dead and Missing']].groupby(['lat', 'long']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)

In [None]:
base_map

# Timeline <br>

Next, let's check the frequency of reported incidents over time. Most of the reported incidents in the dataset are from 2016-2018. 

In [None]:
raw_data['datetime']=pd.to_datetime(raw_data['Reported Month'].str.cat(raw_data['Reported Year'].astype('str'), sep=' '),format="%b %Y")

d1 = raw_data['datetime'].value_counts().sort_index()
data = [go.Scatter(x=d1.index, y=d1.values, name='total incident')]
layout = go.Layout(dict(title = "Counts of Incident by Reported Date",
                  xaxis = dict(title = 'month year'),
                  yaxis = dict(title = 'Incident Count'),
                  ),legend=dict(
                orientation="v"))
iplot(dict(data=data, layout=layout))
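The monthly plot can be complemented with a per-year tally to check how the incidents are spread across years. This is a minimal sketch on made-up rows, assuming the same `Reported Month` and `Reported Year` columns. The `sample` frame is hypothetical and its numbers are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows using the dataset's 'Reported Month' / 'Reported Year' columns.
sample = pd.DataFrame({
    "Reported Month": ["Jan", "Feb", "Feb", "Mar", "Dec"],
    "Reported Year": [2016, 2016, 2017, 2017, 2017],
})

# Same month-year parsing as in the cell above.
sample["datetime"] = pd.to_datetime(
    sample["Reported Month"].str.cat(sample["Reported Year"].astype(str), sep=" "),
    format="%b %Y",
)

# Count incidents per calendar year, then express each year as a share of the total.
per_year = sample["datetime"].dt.year.value_counts().sort_index()
share = (per_year / per_year.sum()).round(2)
print(per_year.to_dict())
print(share.to_dict())
```

Running the same two lines on `raw_data['datetime']` would give the yearly counts and shares for the real data.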

**There are a couple of things to notice**: 

* Most reported incidents are from the Mediterranean, North Africa, and US-Mexico Border regions
* Starting from April 2017, the number of reported incidents from the Sub-Saharan Africa region increased dramatically 
* The highest number of incidents in a single month is from North Africa (in Feb 2016 alone, there were 62 reported incidents in North Africa) <br>

**Based on my research, in Feb 2016**:

* Abnormally dry weather across North Africa [(source)](https://www.reuters.com/article/ozatp-uk-africa-drought-morocco-idAFKCN0VD0Z1)
* A 200-kilometer (125-mile) trench was completed in February 2016, which runs along the Tunisian–Libyan border [(source)](https://carnegieendowment.org/sada/77053)

In [None]:
time=pd.crosstab(raw_data['Region of Incident'],raw_data['datetime']).columns
values=pd.crosstab(raw_data['Region of Incident'],raw_data['datetime']).values

In [None]:

data = [go.Scatter(x=time, y=values[1], name='Central America'), go.Scatter(x=time, y=values[5], name='Horn of Africa'),
       go.Scatter(x=time, y=values[6], name='Mediterranean'),go.Scatter(x=time, y=values[8], name='North Africa'),
     go.Scatter(x=time, y=values[13], name='Sub-Saharan Africa'),go.Scatter(x=time, y=values[14], name='US-Mexico Border')]
layout = go.Layout(dict(title = "Counts of Incident by Reported Date by Region (Top 6)",
                  xaxis = dict(title = 'month year'),
                  yaxis = dict(title = 'Incident Count'),
                  ),legend=dict(
                orientation="v"))
iplot(dict(data=data, layout=layout))
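The cell above picks regions by row position (`values[1]`, `values[5]`, and so on), which silently breaks if a newer export adds or removes a region. One possible alternative, sketched here on made-up rows, is to select rows of the crosstab by region name. The `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows; region names follow those used in the plot above.
sample = pd.DataFrame({
    "Region of Incident": ["Mediterranean", "North Africa", "Mediterranean",
                           "US-Mexico Border", "North Africa", "Mediterranean"],
    "datetime": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-02-01",
                                "2016-02-01", "2016-02-01", "2016-02-01"]),
})

# Build the region-by-month table once and reuse it.
counts = pd.crosstab(sample["Region of Incident"], sample["datetime"])

# Select rows by region name, so each series stays attached to the right
# label even if the set of regions (and the row order) changes.
regions = ["Mediterranean", "North Africa", "US-Mexico Border"]
series_by_region = {name: counts.loc[name].tolist() for name in regions}

# Alternatively, pick the top-N regions by total incident count.
top_regions = counts.sum(axis=1).nlargest(2).index.tolist()
print(series_by_region)
print(top_regions)
```

Each entry of `series_by_region` could then feed one `go.Scatter(x=counts.columns, y=..., name=...)` trace.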

# Migration flow chart <br>

The migration flow chart shows the routes that people in each region tended to take, and on which they suffered incidents before reaching their destination. <br>


In [None]:
migration_flow=pd.crosstab(raw_data['Region of Incident'],raw_data['Migration Route']) 

In [None]:

data = dict(
    type='sankey',
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(
        color = "black",
        width = 0.5
      ),
      label = list(migration_flow.index) + list(migration_flow.columns),
      color = ["blue"] * (len(migration_flow.index) + len(migration_flow.columns))  # one color per node
    ),
    link = dict(
      source = [0,0,0,0,1,1,2,2,2,2,3,4,4,4,5,6,6,7,8,9  ],
      target = [11,15,17,21,12,14,10,19,23,18,13,16,24,22,11,14,20,22,12 ],
      value = [2,2,1,1,248,1,51,9,64,15,499,230,255,14,1,6,1,1,1259  ]
  ))

layout =  dict(
    title = "Migration Route",
    font = dict(
      size = 12.5
    )
    
)

fig = dict(data=[data], layout=layout)
iplot(fig, validate=False)
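Hand-typed link lists are easy to get out of sync (the `source` list above has 20 entries while `target` and `value` have 19). The links can instead be derived from the crosstab itself. This is a sketch under the assumption that every non-zero cell of `migration_flow` should become one link. The `sample` rows and route names are made up for illustration.

```python
import pandas as pd

# Hypothetical region/route pairs; the real column names are
# 'Region of Incident' and 'Migration Route'.
sample = pd.DataFrame({
    "Region of Incident": ["Mediterranean", "Mediterranean",
                           "Mediterranean", "North Africa"],
    "Migration Route": ["Central Mediterranean", "Central Mediterranean",
                        "Western Mediterranean", "Central Mediterranean"],
})
flow = pd.crosstab(sample["Region of Incident"], sample["Migration Route"])

# Node labels: regions first, then routes (same order as in the cell above).
labels = list(flow.index) + list(flow.columns)
n_regions = len(flow.index)

# One link per non-zero cell; route node indices are offset by the number of regions.
source, target, value = [], [], []
for i, region in enumerate(flow.index):
    for j, route in enumerate(flow.columns):
        count = int(flow.loc[region, route])
        if count > 0:
            source.append(i)
            target.append(n_regions + j)
            value.append(count)

print(labels)
print(source, target, value)
```

The three lists always have equal length by construction and can be dropped into the `link` dict in place of the literals.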


# Reasons for incidents <br>

Here are the 19 most common causes of death in our dataset. 

In [None]:
reason_df=pd.DataFrame()
reason_df['reason']=list(raw_data['Cause of Death'].value_counts().index.values)[:19]
reason_df['num']=list(raw_data['Cause of Death'].value_counts().values)[:19]

plt.figure(figsize=(15,10))
sns.barplot(x="num", y="reason", label='small', data=reason_df.sort_values(by="num",ascending=False))
plt.title('Cause of Death', fontsize=20)
plt.tight_layout()
plt.tick_params(labelsize=20)
plt.show()


# Cause of death by region <br>

The interactive plot of cause of death by region lets us analyze the reasons for incidents in more depth. 
<br>
 
* the majority of people in the Mediterranean region drowned
* the majority of people in the North Africa region died from sickness and lack of access to medicines 
* the majority of people in the US-Mexico Border region were missing (cause unknown) 

In [None]:
reason_region=pd.crosstab(raw_data['Region of Incident'],raw_data['Cause of Death']).loc[raw_data['Region of Incident'].value_counts().index.values[:6]][reason_df['reason'].values]
reason_region=reason_region.iloc[:,:6] #top 6 reason

In [None]:

trace1 = go.Bar(
    x=list(reason_df['reason'].values[:6]),
    y=reason_region.iloc[0].values,
    name=reason_region.iloc[0].name
)

trace2 = go.Bar(
    x=list(reason_df['reason'].values[:6]),
    y=reason_region.iloc[1].values,
    name=reason_region.iloc[1].name
)

trace3 = go.Bar(
    x=list(reason_df['reason'].values[:6]),
    y=reason_region.iloc[2].values,
    name=reason_region.iloc[2].name
)

trace4 = go.Bar(
    x=list(reason_df['reason'].values[:6]),
    y=reason_region.iloc[3].values,
    name=reason_region.iloc[3].name

)

trace5 = go.Bar(
    x=list(reason_df['reason'].values[:6]),
    y=reason_region.iloc[4].values,
    name=reason_region.iloc[4].name

)

trace6 = go.Bar(
    x=list(reason_df['reason'].values[:6]),
    y=reason_region.iloc[5].values,
    name=reason_region.iloc[5].name

)


data = [trace1, trace2, trace3, trace4, trace5, trace6]

layout = go.Layout(
    barmode='stack'
)


fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='stack-bar')
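The six near-identical trace definitions can also be generated in a loop, which avoids copy-paste slips such as reusing a variable name. This is a minimal sketch, assuming a region-by-cause table shaped like `reason_region`. The small frame below is hypothetical, and plain dicts are used instead of `go.Bar` so that the sketch runs without plotly.

```python
import pandas as pd

# Hypothetical region-by-cause table standing in for `reason_region`.
reason_region = pd.DataFrame(
    {"Drowning": [10, 1], "Sickness": [2, 7]},
    index=["Mediterranean", "North Africa"],
)

# One bar trace per region (row of the table).
traces = [
    dict(type="bar", x=list(reason_region.columns), y=row.tolist(), name=region)
    for region, row in reason_region.iterrows()
]

# Figure as a plain dict, in the same style as the Sankey cell earlier.
fig = dict(data=traces, layout=dict(barmode="stack"))
print(len(fig["data"]))
```

The resulting `fig` dict can be passed to `iplot` in the same way as the Sankey figure.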

# End

That's all for now. I will add more analysis and possibly create an NLP project with refugee news articles. 

## <center> Thank you! </center>

