#  Human Trafficking
-------------------------------------------------------------------------------------------------
## Abstract

Human trafficking is, by definition, the trade in human beings for sexual exploitation, sexual slavery, or forced labor, and it has given rise to a global epidemic. Victims are held in captivity and are powerless to escape. The trafficking of persons is believed to be the third-largest organized crime worldwide, encompassing many demographics. Internationally, "Modern-Day Slavery," or the smuggling of individuals, is an underground business thought to generate billions of dollars annually. In 2000, the Trafficking Victims Protection Act provided a framework for combating human trafficking on a domestic and global front. Awareness of this atrocity was brought to the forefront of discussion among law enforcement personnel, healthcare providers, human rights organizations, and politicians worldwide.


## Introduction

Human trafficking is defined as "the recruitment, harboring, transportation, provision, or obtaining of a person for labor or services, through the use of force, fraud, or coercion for the purpose of subjection to involuntary servitude, peonage, debt bondage, or slavery".

The most challenging aspect of addressing human trafficking is identifying the victims, which is problematic for several reasons. Victims may not perceive themselves as trafficked individuals. They may fear retaliation by the trafficker; lack knowledge of resources to help themselves; and face many cultural, social, and language barriers.

The Human Trafficking Center (HTC) is a nonprofit research and advocacy organization committed to using academic rigor and transparency, sound methodology, and reliable data to understand forced labor and human trafficking. Its aim is to provide research that improves inter-organizational cooperation and accountability, influences anti-trafficking policy, and raises public awareness about the problem. 


## Objective 
The main objectives of this project are:

> To create an interactive data visualization dashboard showing the inflow and outflow of different types of human trafficking from source to destination countries.

> To plot the overall statistics of the different types of trafficking from 2008 to 2016.

> To try clustering the data using k-means.

Please refer to the notebook "Data_Cleaning.ipynb" for the dataset description and basic cleaning. 
This notebook contains only the final code.

We make extensive use of GeoViews and Bokeh.

### What is GeoViews?

A Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets.

> GIS extension for Holoviews.

> Declarative API: annotate your data and let it visualize itself.

> Uses Cartopy for geographic projections.

> Allows creating plots from multidimensional (gridded) datasets

> Makes it easy to overlay layers in a visualization

> Leverages many Python libraries: Pandas, GeoPandas, Xarray, Datashader, Matplotlib/Bokeh

There are five main parts in this notebook.

<font color = blue > Section 1 </font>: Loading the data and fetching the longitude and latitude for origin and destination.
<font color = blue > Section 2 </font>: Another tab showing only the tier information.
<font color = blue > Section 3 </font>: Plotting the main start-to-end graph on top of the tier ranking.
<font color = blue > Section 4 </font>: Statistical plot of type of trafficking vs. year.
<font color = blue > Section 5 </font>: A combined statistical summary plot of the number of each type of trafficking per year.


Let's first load all the Python libraries required for this project.

In [None]:
import numpy as np
import pandas as pd
import geoviews as gv  # explore and visualize geographical datasets
import holoviews as hv  # GeoViews is built on HoloViews, a library for flexible visualization of multidimensional data
#import geoviews.feature as gf  # visualize different feature types (ocean, land, coastline, borders, etc.)
import cartopy  # geospatial data processing for producing maps and other geospatial analyses
from bokeh.tile_providers import STAMEN_TONER_LABELS  # provides a variety of map tiles
from bokeh.models.widgets import Select
#from bokeh.models import BasicTicker, ColorBar, ColumnDataSource, LinearColorMapper 
from bokeh.transform import transform
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6, Pastel1, Purples
from bokeh.layouts import layout, gridplot
import json
from bokeh.io import show, output_notebook, output_file
from bokeh.models import (
    GeoJSONDataSource,BasicTicker,
    HoverTool,ColorBar,
    CategoricalColorMapper,ColumnDataSource, LinearColorMapper,
    LogColorMapper,
    BoxSelectTool, BoxZoomTool, LassoSelectTool
)
from bokeh.plotting import figure
from bokeh.palettes import Viridis9 as palette

from bokeh.io import curdoc, show, output_notebook
from bokeh.layouts import Column
from bokeh.models import (Plot, Range1d, GraphRenderer, ColumnDataSource, GlyphRenderer,
                          Circle, MultiLine, StaticLayoutProvider, HoverTool, LinearAxis,
                          DataTable, TableColumn, TapTool, BoxSelectTool, BoxZoomTool,
                          ResetTool, NodesAndLinkedEdges, GeoJSONDataSource, Patches,
                          WheelZoomTool, Arrow, OpenHead, NormalHead, VeeHead)
from bokeh.palettes import Set3_12

hv.notebook_extension('bokeh')

# <font color = green>Section 1: Data Loading and Cleaning </font>
--------------------------------------------------------------------------------------------------

We have data from 2008 to 2016, so let's load all of it.

In [None]:
import pandas as pd
data1_2008 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2008")
data1_2009 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2009 RECODE")
data1_2010 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2010")
data1_2011 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2011")
data1_2012 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2012")
data1_2013 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2013")
data1_2014 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2014")
data1_2015 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2015")
data1_2016 = pd.read_excel("Directionality Coding 2008-2016.xlsx", sheet_name="2016")

Give the year column a generic name.

In [None]:
data1_2008.rename(columns={2008: 'Years'}, inplace=True)
data1_2009.rename(columns={2009: 'Years'}, inplace=True)
data1_2010.rename(columns={2010: 'Years'}, inplace=True)
data1_2011.rename(columns={2011: 'Years'}, inplace=True)
data1_2012.rename(columns={2012: 'Years'}, inplace=True)
data1_2013.rename(columns={2013: 'Years'}, inplace=True)
data1_2014.rename(columns={2014: 'Years'}, inplace=True)
data1_2015.rename(columns={2015: 'Years'}, inplace=True)
data1_2016.rename(columns={2016: 'Years'}, inplace=True)

A few records have a blank **Year** value (for example, in the 2010 sheet). We first count the missing values in each sheet and then fill the blank years with the sheet's own year.

In [None]:
print(data1_2008.isnull().sum())
print(data1_2009.isnull().sum())
print(data1_2010.isnull().sum())
print(data1_2011.isnull().sum())
print(data1_2012.isnull().sum())
print(data1_2013.isnull().sum())
print(data1_2014.isnull().sum())
print(data1_2015.isnull().sum())
print(data1_2016.isnull().sum())

In [None]:
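# Assign the filled column back rather than filling in place through attribute access
data1_2010['Years'] = data1_2010['Years'].fillna(2010)
data1_2012['Years'] = data1_2012['Years'].fillna(2012)
data1_2013['Years'] = data1_2013['Years'].fillna(2013)
data1_2014['Years'] = data1_2014['Years'].fillna(2014)
data1_2015['Years'] = data1_2015['Years'].fillna(2015)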

Concatenate all the years' data into a single frame and rename the columns.
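As a compact illustration of the steps so far (rename the year-named column, fill blank years, concatenate), here is a minimal sketch on toy data. The helper name `combine_year_frames` and the toy frames are assumptions made for illustration and are not part of the notebook; unlike the cell below, it also resets the row index.

```python
import pandas as pd

def combine_year_frames(frames_by_year):
    """Rename each frame's year-named column to 'Years', fill blanks, and concatenate."""
    cleaned = []
    for year, frame in frames_by_year.items():
        # In each sheet the year column is headed by the year itself (an integer)
        frame = frame.rename(columns={year: "Years"})
        # Blank year cells take the year of the sheet they came from
        frame["Years"] = frame["Years"].fillna(year)
        cleaned.append(frame)
    return pd.concat(cleaned, ignore_index=True)

# Toy stand-ins for two sheets of the workbook (hypothetical values)
toy = {
    2008: pd.DataFrame({2008: [2008, None], "Origin Country": ["A", "B"]}),
    2009: pd.DataFrame({2009: [2009.0], "Origin Country": ["C"]}),
}
combined = combine_year_frames(toy)
```

With the real workbook, the dictionary could be filled from `pd.read_excel` calls keyed by year, keeping in mind that the 2009 sheet is named "2009 RECODE".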

In [None]:
frame=[data1_2008,data1_2009,data1_2010,data1_2011,data1_2012,data1_2013,data1_2014,data1_2015,data1_2016]
data=pd.concat(frame)

data = data.rename(columns={'Origin Country': 'Origin_Country', 'Destination Country': 'Destination_Country',
                           '(Transit Country)':'Transit_Country','type of trafficking':'Type_Of_Trafficking',
                           'Victim Profile':'Victim_Profile'})

To plot the directed graph, let's first find the longitude and latitude of each country and save them in new columns.

Getting the longitude and latitude of each country

Note: to use the GeoNames geocoder:

> We need to sign up at http://www.geonames.org/

> Use that username when connecting through geopy

> The user account needs to be enabled

In [None]:
from geopy import geocoders
from geopy.geocoders import Nominatim
gn = geocoders.GeoNames(username='rose0037')
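# Recent geopy releases require an explicit user agent; the name below is a placeholder
geolocator = Nominatim(user_agent="human-trafficking-notebook")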

Getting the longitude and latitude for each source and destination.

### <font color = red>Data Issue: </font>

We encountered some issues with the data while getting the longitude and latitude.

The source and destination fields contain a mix of countries and regions, so it was difficult to fetch the longitude and latitude.
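One way to handle the mixed country and region labels is a small alias table in front of the geocoder, plus a cache so that each name is looked up only once. The sketch below assumes a geocoder exposed as a plain callable returning `(latitude, longitude)` or `None`; `make_resolver`, `ALIASES`, `SKIP` and the stand-in `fake_geocode` with its made-up coordinates are hypothetical names used only for illustration.

```python
from functools import lru_cache

# Region labels mapped to a query a geocoder can resolve (choices mirror the notebook)
ALIASES = {
    "sub-saharan africa": "Africa",
    "african countries": "Africa",
    "former soviet states": "Soviet union",
    "former soviet union": "Soviet union",
}
SKIP = {"neighboring countries"}  # too vague to place on a map

def make_resolver(geocode):
    """Wrap a `geocode(query) -> (lat, lon) or None` callable with aliases and a cache."""
    @lru_cache(maxsize=None)
    def resolve(name):
        key = name.strip().lower()
        if key in SKIP:
            return None
        return geocode(ALIASES.get(key, name))
    return resolve

# Stand-in geocoder with made-up coordinates so the sketch runs offline
calls = []
def fake_geocode(query):
    calls.append(query)
    return {"Africa": (0.0, 20.0), "Nepal": (28.0, 84.0)}.get(query)

resolve = make_resolver(fake_geocode)
first = resolve("Nepal")
second = resolve("Nepal")  # served from the cache, no second lookup
```

Because the cache keys on the original label, repeated countries cost one request each, which matters when the geocoding service is slow or rate limited.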

###  <font color = red> Please note this block of code will take a long time to run, so you can skip this block: </font>


In [None]:
for country in data['Origin_Country'].unique():
    print(country)
    location = None  # reset so a skipped country never reuses the previous result
    if country in ("Sub-Saharan Africa", "African countries"):  ## pick the center one
        location = geolocator.geocode("Africa")
    elif country == 'Great Lakes Region':
        # Note: this query resolves to the North American Great Lakes; adjust it if the
        # source data means the African Great Lakes region
        location = geolocator.geocode("Great Lakes, US")
    elif country in ("Former Soviet states", "Former Soviet Union", "former Soviet states", "former Soviet Union"):
        location = geolocator.geocode("Soviet union")
    elif country != "Neighboring countries":
        location = geolocator.geocode(country)
    if location is not None:
        data.loc[(data['Origin_Country']==country), 'Source_Latitude'] = location.latitude
        data.loc[(data['Origin_Country']==country), 'Source_Longitude'] = location.longitude

In [None]:
for country in data['Destination_Country'].unique():
    print(country)
    location = None  # reset so a skipped country never reuses the previous result
    if country == "Sub-Saharan Africa":
        location = geolocator.geocode("Africa")
    elif country in ("Former Soviet states", "Former Soviet Union", "former Soviet states", "former Soviet Union"):
        location = geolocator.geocode("Soviet union")
    elif country != "Neighboring countries":
        location = geolocator.geocode(country)
    if location is not None:
        data.loc[(data['Destination_Country']==country), 'Destination_Latitude'] = location.latitude
        data.loc[(data['Destination_Country']==country), 'Destination_Longitude'] = location.longitude

In [None]:
del data['Unnamed: 11']
data.head()

Save the data to a CSV file.

In [None]:
data.to_csv("Human_Traffic_data_new.csv",sep=',')

<font color = purple> Fetching the longitude and latitude is time-consuming, so we update the data frame with the longitude and latitude once and save it as a CSV file.</font>

# <font color = green>Section 2 : Plotting Tier Information </font>

We have tier information per country from 2008 to 2016 in **TIP Tier Rankings.xlsx**. Below is a brief explanation of each tier category.  

**Tier 1**: Countries whose governments fully meet the Trafficking Victims Protection Act’s (TVPA) minimum standards.

**Tier 2**: Countries whose governments do not fully meet the TVPA’s minimum standards, but are making significant efforts to bring themselves into compliance with those standards.

**Tier 2 Watch List**: Countries whose governments do not fully meet the TVPA’s minimum standards but are making significant efforts to bring themselves into compliance with those standards, and where, typically, the number of trafficking victims is very significant or significantly increasing, or there is a failure to provide evidence of increasing efforts to combat trafficking. These countries are coded as `2W` in the plots.

**Tier 3**: Countries whose governments do not fully meet the minimum standards and are not making significant efforts to do so.


We add a new property, "Tier", to the properties of each feature in world.geojson.

The function **tier_info** takes a year as input and returns a new GeoJSON object with the additional **Tier** attribute.
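The idea of tagging each GeoJSON feature with a tier can be sketched without any files. The example below is an illustration under stated assumptions: `add_tier`, the toy `world` collection and the country names and rankings are hypothetical. The default label keeps the notebook's spelling, because the colour mapper matches on that exact string.

```python
import copy

def add_tier(geojson, tier_by_country, default="No Tier Infomation"):
    """Return a copy of a GeoJSON FeatureCollection with a 'Tier' property on every feature."""
    tagged = copy.deepcopy(geojson)  # leave the input untouched so it can be reused per year
    for feature in tagged["features"]:
        name = str(feature["properties"]["name"])
        feature["properties"]["Tier"] = str(tier_by_country.get(name, default))
    return tagged

# Toy inputs (hypothetical countries and rankings)
world = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "Aland"}, "geometry": None},
    {"type": "Feature", "properties": {"name": "Borduria"}, "geometry": None},
]}
rankings = {2008: {"Aland": "2W"}, 2009: {"Aland": 2, "Borduria": 3}}

# One tagged collection per year, built in a single loop
tagged_by_year = {year: add_tier(world, tiers) for year, tiers in rankings.items()}
```

Keeping the per-year results in a dictionary is one way to avoid a separate variable for every year.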

In [None]:
#step1 : Load the data for all countries 
tier_df = pd.read_excel("TIP Tier Rankings.xlsx")

# Loading data for list of longitude and latitude for the countries from world.geojson. 
with open(r'world.geojson', 'r') as f:
    geoSource_data = json.load(f)

# Step 2: Add a "Tier" field to the properties of each feature in world.geojson. The function "tier_info"
# takes a year as input and returns the GeoJSON data with the additional "Tier" attribute.

def tier_info(year):
    country_t = tier_df[tier_df.Year == year]
    
    # Reload the file on every call so each year starts from an unmodified copy
    with open(r'world.geojson', 'r') as f:
        geoSource_data = json.load(f)

    for i in geoSource_data["features"]:
        country = str(i["properties"]["name"] )
        if country in country_t['Country'].unique():
            # The local name differs from the function name to avoid shadowing it
            ranking =  country_t[country_t["Country"] == country ]['Tier Ranking']
            i["properties"]["Tier"] = str(ranking.values[0])
        else: 
            i["properties"]["Tier"] = 'No Tier Infomation'
            
    return geoSource_data

# Step3: Call the above function for all the years and dump the respective file into a new geojson file.

geoSource_data_2008 = tier_info(2008)
with open("geoSource_final_2008.geojson", 'w') as json_file:
        json.dump(geoSource_data_2008, json_file)
        
geoSource_data_2009 = tier_info(2009)
with open("geoSource_final_2009.geojson", 'w') as json_file:
        json.dump(geoSource_data_2009, json_file)
        
geoSource_data_2010 = tier_info(2010)
with open("geoSource_final_2010.geojson", 'w') as json_file:
        json.dump(geoSource_data_2010, json_file)
        
geoSource_data_2011 = tier_info(2011)
with open("geoSource_final_2011.geojson", 'w') as json_file:
        json.dump(geoSource_data_2011, json_file)
        
geoSource_data_2012 = tier_info(2012)
with open("geoSource_final_2012.geojson", 'w') as json_file:
        json.dump(geoSource_data_2012, json_file)
        
geoSource_data_2013 = tier_info(2013)
with open("geoSource_final_2013.geojson", 'w') as json_file:
        json.dump(geoSource_data_2013, json_file)

geoSource_data_2014 = tier_info(2014)
with open("geoSource_final_2014.geojson", 'w') as json_file:
        json.dump(geoSource_data_2014, json_file)  

geoSource_data_2015 = tier_info(2015)
with open("geoSource_final_2015.geojson", 'w') as json_file:
        json.dump(geoSource_data_2015, json_file)
        
geoSource_data_2016 = tier_info(2016)
with open("geoSource_final_2016.geojson", 'w') as json_file:
        json.dump(geoSource_data_2016, json_file)

# Step 4: reading back the new updated json data.

with open(r'geoSource_final_2008.geojson', 'r') as f:
    geoSource_new_2008 = GeoJSONDataSource(geojson=f.read())

with open(r'geoSource_final_2009.geojson', 'r') as f:
    geoSource_new_2009 = GeoJSONDataSource(geojson=f.read())
    
with open(r'geoSource_final_2010.geojson', 'r') as f:
    geoSource_new_2010 = GeoJSONDataSource(geojson=f.read())

with open(r'geoSource_final_2011.geojson', 'r') as f:
    geoSource_new_2011 = GeoJSONDataSource(geojson=f.read())
    
with open(r'geoSource_final_2012.geojson', 'r') as f:
    geoSource_new_2012 = GeoJSONDataSource(geojson=f.read())

with open(r'geoSource_final_2013.geojson', 'r') as f:
    geoSource_new_2013 = GeoJSONDataSource(geojson=f.read())
    
with open(r'geoSource_final_2014.geojson', 'r') as f:
    geoSource_new_2014 = GeoJSONDataSource(geojson=f.read())

with open(r'geoSource_final_2015.geojson', 'r') as f:
    geoSource_new_2015 = GeoJSONDataSource(geojson=f.read())
    
with open(r'geoSource_final_2016.geojson', 'r') as f:
    geoSource_new_2016 = GeoJSONDataSource(geojson=f.read())
    
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"


After preparing the data, it's time to plot the final figure.

The function below plots the tier information map. We only need to pass it the updated data for a particular year.

We make a separate plot for each year and add a Select widget that fetches the plot for the year chosen in the dropdown.

The hover tool shows the country name and tier ranking.
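Choosing the plot for a selected year is essentially a lookup, so a dictionary keyed by the widget value can stand in for one branch per year. The sketch below uses strings as stand-ins for the Bokeh figures; `plots_by_year` and `plot_for` are hypothetical names, and this is only one possible arrangement.

```python
# Stand-ins for the per-year figures; in the notebook these would be Bokeh plots
plots_by_year = {str(year): f"plot for {year}" for year in range(2008, 2017)}

def plot_for(selected_value, default="2008"):
    """Return the plot registered for the selected year, falling back to a default year."""
    return plots_by_year.get(selected_value, plots_by_year[default])

chosen = plot_for("2012")
```

A fallback year keeps the callback well defined even if the widget holds a value that is not one of the listed years.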

In [None]:
def tier_graph(geoSource_new, year):
    
    p = figure(title="Tier Information for the Year: "+str(year), 
          tools=TOOLS, 
          x_axis_location=None, 
          y_axis_location=None, 
          width=800, 
          height=700)
    p.grid.grid_line_color = None

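    # Map each tier label to a fill color; countries without a ranking are drawn in white.
    # (The unused palette.reverse() call and label list were dropped: the imported palette
    # is never used here, and reversing it on every call changed shared state.)
    mapper2 = CategoricalColorMapper(palette=["#4ca64c","yellow","orange","#ff3232","white"], 
                                     factors=["1", "2","2W","3",'No Tier Infomation'])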

    for factor, color in zip(mapper2.factors, mapper2.palette):
        p.circle(x=[], y=[], size = 5, fill_color=color, legend=factor)

    hover1 = HoverTool( tooltips=[("Country","@name"), ("Tier", "@Tier")] )
    p.add_tools(hover1)
    p.legend.border_line_width = 3
    p.legend.border_line_color = "navy"
    p.legend.border_line_alpha = 0.2
    p.legend.location = "center_left"
    p.patches('xs', 'ys', fill_alpha=1, 
              fill_color={'field': 'Tier','transform': mapper2},
              line_color='Red', line_width=0.5, source=geoSource_new)
   
    p.legend.location = "center_left"
    return p

##### Plot the graphs
with open('world.geojson', 'r') as f:
            geo_source = f.read()
        
geoSource_data = GeoJSONDataSource(geojson=geo_source)

plot2008 = tier_graph(geoSource_new_2008, 2008)
plot2009 = tier_graph(geoSource_new_2009, 2009)
plot2010 = tier_graph(geoSource_new_2010, 2010)
plot2011 = tier_graph(geoSource_new_2011, 2011)
plot2012 = tier_graph(geoSource_new_2012, 2012)
plot2013 = tier_graph(geoSource_new_2013, 2013)
plot2014 = tier_graph(geoSource_new_2014, 2014)
plot2015 = tier_graph(geoSource_new_2015, 2015)
plot2016 = tier_graph(geoSource_new_2016, 2016)

def plot_year(year):
    curdoc().clear()
    curdoc().add_root(myLayout)
    curdoc().add_root(year)

def update_plot(attr,old,new):
    if (select.value == "2008"):
        plot_year(plot2008)
    if (select.value == "2009"):
        plot_year(plot2009)
    if (select.value == "2010"):
        plot_year(plot2010)
    if (select.value == "2011"):
        plot_year(plot2011)
    if (select.value == "2012"):
        plot_year(plot2012)
    if (select.value == "2013"):
        plot_year(plot2013)
    if (select.value == "2014"):
        plot_year(plot2014)
    if (select.value == "2015"):
        plot_year(plot2015)
    if (select.value == "2016"):
        plot_year(plot2016)

# Create the select widget; the initial value matches the plot added to the document below
select = Select(title="Please Select Year:", value="2008", options=[("2008","2008"), 
                                                               ("2009","2009"), 
                                                               ("2010","2010"), 
                                                               ("2011","2011"),
                                                               ("2012","2012"),
                                                               ("2013","2013"),
                                                               ("2014","2014"),
                                                               ("2015","2015"),
                                                               ("2016","2016"),
                                                              ])

# Register the callback and build the initial document

select.on_change("value",update_plot)
curdoc().clear()
myLayout = layout([[select]])
curdoc().add_root(myLayout)   # add the select widget
curdoc().add_root(plot2008)
show(plot2016)

# <font color = green >Section 3: Trafficking from Source to Destination On Top of Tier Ranking Patch.</font>
------------------------------------------------------------------------------------------------

Let's first load the data for all the years, i.e. from 2008 to 2016, and extract the columns required for the visualization and statistics.

In [None]:
data=pd.read_csv('Human_Traffic_data.csv', encoding= 'ISO-8859-1')

# Extract the required columns, such as "Destination_Country" and "Origin_Country" and their longitude and latitude.
data=data[["Transit_Country","Years","Destination_Country", "Destination_Latitude",
           "Destination_Longitude", "Flagged","Means","Notes","Origin_Country",
           "Sector","Victim_Profile","trafficker","Type_Of_Trafficking",
          "Source_Latitude","Source_Longitude"]]

# Remove commas and spaces from the trafficker values (so "9,10" becomes "910") and replace nulls with 0.
# The trafficker information is shown in the hover tool.

data['trafficker'] = data['trafficker'].str.replace(',', '')
data['trafficker'] = data['trafficker'].str.replace(' ', '')
data['trafficker'] = data['trafficker'].fillna(0)
data.Years.unique()

### Segregating Type of Trafficking

In the hover tool, we need to show the **Type of Trafficking** between source and destination. 
The type of trafficking takes values between 1 and 11, and each number represents a different type of trafficking. Some records have more than one type, represented as comma-separated values such as (1,3), so we first split this field into one row per type and then replace each number with its label. 



<font color = red> 
 Note: We were not able to find out what the number 11 represents.</font> 
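The split-then-relabel step can also be sketched with `Series.str.split` followed by `DataFrame.explode` (available in pandas 0.25 and later). This is an alternative illustration on toy data, not the notebook's own helper; `split_types`, `TYPE_LABELS` and the toy frame are assumptions, while the code-to-label pairs are taken from the notebook.

```python
import pandas as pd

# Code-to-label table as used in the notebook (code 11 is left unmapped)
TYPE_LABELS = {
    "1": "Sexual exploitation", "2": "Forced labor", "3": "Domestic servitude",
    "4": "Child sex tourism", "5": "Forced drug trafficking", "6": "Forced begging",
    "7": "Child soldiers", "8": "Forced criminal activity",
    "9": "Child sexual exploitation", "10": "Forced marriage",
    "nan": "",  # missing values become empty strings, as in the notebook
}

def split_types(df, col="Type_Of_Trafficking"):
    """Return one row per trafficking code, with codes replaced by labels."""
    out = df.copy()
    out[col] = out[col].astype(str).str.split(",")
    out = out.explode(col).reset_index(drop=True)
    # Strip spaces and a trailing '.0' left over from float-typed cells
    codes = out[col].str.strip().str.replace(r"\.0$", "", regex=True)
    # Codes without a label (such as 11) are kept as they are
    out[col] = codes.map(TYPE_LABELS).fillna(codes)
    return out

toy = pd.DataFrame({"Origin_Country": ["A", "B", "C"],
                    "Type_Of_Trafficking": ["1, 3", 2.0, "11"]})
result = split_types(toy)
```

Stripping whitespace before the lookup means a value such as "1, 3" is handled the same way as "1,3".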

In [None]:
## Handling the type of trafficking
## Function to split a string such as "1,2,3" into 3 rows 

def change_column_order(df, col_name, index):
    cols = df.columns.tolist()
    cols.remove(col_name)
    cols.insert(index, col_name)
    return df[cols]

def split_df(dataframe, col_name, sep):
    orig_col_index = dataframe.columns.tolist().index(col_name)
    orig_index_name = dataframe.index.name
    orig_columns = dataframe.columns
    dataframe = dataframe.reset_index()  # we need a natural 0-based index for proper merge
    index_col_name = (set(dataframe.columns) - set(orig_columns)).pop()
    df_split = pd.DataFrame(
        pd.DataFrame(dataframe[col_name].str.split(sep).tolist())
        .stack().reset_index(level=1, drop=1), columns=[col_name])
    df = dataframe.drop(col_name, axis=1)
    df = pd.merge(df, df_split, left_index=True, right_index=True, how='inner')
    df = df.set_index(index_col_name)
    df.index.name = orig_index_name
    # merge adds the column to the last place, so we need to move it back
    return change_column_order(df, col_name, orig_col_index)

data['Type_Of_Trafficking']= data['Type_Of_Trafficking'].astype(str)
data = split_df(data, 'Type_Of_Trafficking', ',')

data['Type_Of_Trafficking'] = data['Type_Of_Trafficking'].replace(['1.0','2.0','3.0','4.0','5.0','6.0','7.0','8.0','9.0','10.0','1','2','3','4','5','6','7','8','9','10','nan'], 
                                                                  [ 'Sexual exploitation',
                                                                     'Forced labor',
                                                                     'Domestic servitude',
                                                                     'Child sex tourism',
                                                                     'Forced drug trafficking',
                                                                     'Forced begging',
                                                                     'Child soldiers',
                                                                     'Forced criminal activity',
                                                                     'Child sexual exploitation',
                                                                     'Forced marriage',
                                                                     'Sexual exploitation',
                                                                     'Forced labor',
                                                                     'Domestic servitude',
                                                                     'Child sex tourism',
                                                                     'Forced drug trafficking',
                                                                     'Forced begging',
                                                                     'Child soldiers',
                                                                     'Forced criminal activity',
                                                                     'Child sexual exploitation',
                                                                     'Forced marriage',''
                                                                     ])





A few more sanity checks on the data.

In [None]:
# As part of the cleaning, convert the Years column to integers.
data.Years=data.Years.astype(int)

# Convert the trafficker values to numeric for validation purposes (nulls were already replaced with 0).
data.trafficker=pd.to_numeric(data.trafficker)

# Count the records per origin country; the result is exposed as the "Connections" column.
counts = data.groupby('Origin_Country')[['Destination_Country']].count().reset_index().rename(columns={'Destination_Country': 'Connections'})
# Join the counts with the original data frame so that "Connections" can be shown in the hover tool.
data_df = pd.merge(data, counts, left_on='Origin_Country', right_on='Origin_Country', how='left')

# Drop the rows whose source or destination longitude or latitude is null.
data_df.dropna(subset = ['Destination_Latitude', 'Destination_Longitude',
                         'Source_Latitude', 'Source_Longitude'], inplace = True)

The function below returns two data frames for each year:

1) A data frame with the unique country names (source or destination) and their longitude and latitude.

2) A data frame with the connections between source and destination. 
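To make the two outputs concrete, here is a minimal sketch on toy data that builds a node table with integer ids and an edge table that refers to those ids. The function name `nodes_and_edges` and the toy coordinates are assumptions for illustration; the column names follow the notebook.

```python
import pandas as pd

def nodes_and_edges(flows):
    """Build a node table (one row per country) and an edge table of integer ids."""
    src = flows[["Origin_Country", "Source_Latitude", "Source_Longitude"]].rename(
        columns={"Origin_Country": "Country", "Source_Latitude": "Latitude",
                 "Source_Longitude": "Longitude"})
    dst = flows[["Destination_Country", "Destination_Latitude", "Destination_Longitude"]].rename(
        columns={"Destination_Country": "Country", "Destination_Latitude": "Latitude",
                 "Destination_Longitude": "Longitude"})
    nodes = (pd.concat([src, dst])
               .dropna()
               .drop_duplicates(subset="Country")
               .reset_index(drop=True))
    nodes["country_id"] = nodes.index  # the row position doubles as the node id
    ids = nodes.set_index("Country")["country_id"]
    edges = pd.DataFrame({
        "start": flows["Origin_Country"].map(ids),
        "end": flows["Destination_Country"].map(ids),
    }).dropna().astype(int).drop_duplicates().reset_index(drop=True)
    return nodes, edges

# Toy flows with made-up coordinates
flows = pd.DataFrame({
    "Origin_Country": ["A", "A", "B"],
    "Destination_Country": ["B", "C", "C"],
    "Source_Latitude": [1.0, 1.0, 2.0], "Source_Longitude": [10.0, 10.0, 20.0],
    "Destination_Latitude": [2.0, 3.0, 3.0], "Destination_Longitude": [20.0, 30.0, 30.0],
})
nodes, edges = nodes_and_edges(flows)
```

Mapping names to ids through a Series lookup replaces the per-country loops, and connections whose endpoint has no coordinates simply drop out.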

In [None]:
def source_connection(inputyear):
    data_df_year = data_df[data_df.Years == inputyear].reset_index(drop=True)
    data_df_year.index.name = "index"
    connections_df_year = data_df_year[['Origin_Country','Destination_Country']]
    connections_df_year = connections_df_year.rename(columns={"Origin_Country": "start", 
                                                    "Destination_Country": "end"})
    ## two df 
    data_df_year_src = data_df_year[['Origin_Country','Source_Latitude','Source_Longitude']]
    data_df_year_dest= data_df_year[['Destination_Country','Destination_Latitude','Destination_Longitude']]

    ## rename
    data_df_year_src = data_df_year_src.rename(columns={"Origin_Country": "Country", "Source_Latitude": "Latitude", "Source_Longitude":"Longitude"})
    data_df_year_dest = data_df_year_dest.rename(columns={"Destination_Country": "Country", "Destination_Latitude": "Latitude", "Destination_Longitude":"Longitude"})

    ## merge (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    data_df_countries_year = pd.concat([data_df_year_src, data_df_year_dest])

    data_df_countries_year.dropna(inplace=True)
    
    # Assign an integer id to every country
    unique_country = set(data_df_countries_year.Country)
    dict_country_to_id = {c:i for (i,c) in enumerate(unique_country)}
    data_df_countries_year['country_id'] = data_df_countries_year['Country'].map(dict_country_to_id)
    
    connections_df_year.fillna(0,inplace = True)
    connections_df_year.drop_duplicates(subset=None, keep='first', inplace=True)

    # Replace the country names in the connections with their ids
    y = data_df_countries_year.sort_values("country_id").groupby("Country", as_index=False).first()
    for cnty in connections_df_year['start'].unique():
        if (type(cnty)  == str):
            t = y.loc[y["Country"] == cnty ,'country_id']
            connections_df_year.loc[connections_df_year['start'] == cnty ,'start'] = int(t.values[0])

    for cnty in connections_df_year['end'].unique():
        if (type(cnty)  == str):
            t = y.loc[y["Country"] == cnty ,'country_id']
            connections_df_year.loc[connections_df_year['end'] == cnty ,'end'] = int(t.values[0])
            
    return connections_df_year, data_df_countries_year

# Let's run the above function to get the node and connection information for all the years (2008 to 2016)

connections_df_2008, data_df_countries_2008 = source_connection(2008)
connections_df_2009, data_df_countries_2009 = source_connection(2009)
connections_df_2010, data_df_countries_2010 = source_connection(2010)
connections_df_2011, data_df_countries_2011 = source_connection(2011)
connections_df_2012, data_df_countries_2012 = source_connection(2012)
connections_df_2013, data_df_countries_2013 = source_connection(2013)
connections_df_2014, data_df_countries_2014 = source_connection(2014)
connections_df_2015, data_df_countries_2015 = source_connection(2015)
connections_df_2016, data_df_countries_2016 = source_connection(2016)

### Create Node and Edges
Create the glyphs to be used for nodes (circles) and edges (multi-lines); the node renderer and the edge renderer will use these glyphs to draw the nodes and edges. To make selections easier to see, the selection glyph for nodes is given a different color and a larger line width, which makes the point appear larger.

Also, when a node is not selected, it becomes invisible to the user.

In [None]:
## prior node info
node_glyph = Circle( size=7,  fill_color=Set3_12[3])
node_nonselection = Circle( size=2, fill_color=Set3_12[3] , fill_alpha=0.0, line_alpha=0.0)
node_selection = Circle(fill_color=Set3_12[10], fill_alpha=0.8, line_alpha=0.3,line_width=10, line_color='green')

## prior edge information 
edge_glyph = MultiLine(line_alpha=0.01)
edge_hover = MultiLine(line_alpha=0.6, line_color="Blue", line_dash="4 4")
edge_selection = MultiLine(line_alpha=1, line_width=2 , line_color= "Red", line_dash='dashed')
edge_nonselection = MultiLine(line_width=0.02 , line_color= "grey")

### Function to create the Final Interactive graph 

Create the graph object from the node and edge renderers. A StaticLayoutProvider is used and is given the latitude and longitude of the countries to use when plotting. This layout, along with the two renderers, is used to create a graph renderer. A main difference between graphs and other Bokeh plot types is that a graph is composed of two renderers that work together, while every other plot type is a single renderer.

To give a visual reference, the outline of the countries is added to the plot. The GeoJSON data source created earlier is added, and patch glyphs are used because the shapes are irregular. The patches are created by Bokeh from the GeoJSON data source and filled with a color according to the tier information of each country.

A hover tool is created to display the trafficking type and destination country information when hovering over the associated node. A second hover tooltip is created that does not show any hover information, so that the hover actions can be triggered without always “popping up” the information on the country.

For hover coloring, a log color mapper maps numbers in a range [low, high] onto a sequence of colors (a palette) on a natural logarithm scale.
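To show what mapping a range onto a palette on a logarithmic scale means, here is a small stand-alone sketch. It is a simplified illustration rather than Bokeh's exact implementation; `log_color` and the example palette are assumptions, and `low` must be greater than zero.

```python
import math

def log_color(value, low, high, palette):
    """Pick a palette entry for `value`, with bins evenly spaced in log(value)."""
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    # Fraction of the way from low to high on a natural-log scale, in [0, 1)
    fraction = math.log(value / low) / math.log(high / low)
    return palette[int(fraction * len(palette))]

palette = ["#f7fcf5", "#a1d99b", "#31a354", "#006d2c"]  # light to dark, arbitrary colors
colors = [log_color(v, 1, 10000, palette) for v in (5, 50, 150, 5000)]
```

On this scale the values 5, 50, 150 and 5000 fall into four different bins, whereas a linear mapping over the same range would place the first three in the lowest bin.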

In [None]:
def plot_data(data_df,connections,year,geoSource_new,df):

    connections_df = connections
    
    
    data_df_countries = data_df.merge(df, how='inner',left_on='Country', right_on='Origin_Country').reset_index()
    data_df_countries.drop_duplicates(subset=None, keep='first', inplace=True)
    #data_df_countries = data_df.merge(df[df.Years == year], how='inner',left_on='Country', right_on='Origin_Country')

    node_source = ColumnDataSource(data_df_countries)
    edge_source = ColumnDataSource(connections_df[["start", "end"]])
    
    mapper2 = CategoricalColorMapper(palette=["#4ca64c","yellow","orange","#ff3232","white"], 
                                     factors=["1", "2","2W","3",'No Tier Infomation'])

    node_renderer = GlyphRenderer(data_source=node_source, 
                                  glyph=node_glyph,
                                  selection_glyph=node_selection, 
                                  nonselection_glyph=node_nonselection)

    ## Create edge_renderer
    edge_renderer = GlyphRenderer(data_source=edge_source, glyph=edge_glyph,
                                  hover_glyph=edge_hover, selection_glyph=edge_selection, 
                                  nonselection_glyph=edge_nonselection
                                 )
    ## Create layout_provider
    graph_layout = dict(zip(data_df_countries.country_id.astype(str), 
                            zip(data_df_countries.Longitude, data_df_countries.Latitude)))
    layout_provider = StaticLayoutProvider(graph_layout=graph_layout)

    ## Create graph renderer
    graph = GraphRenderer(edge_renderer=edge_renderer, 
                          node_renderer=node_renderer, 
                          layout_provider=layout_provider, 
                          inspection_policy=NodesAndLinkedEdges(),
                          selection_policy=NodesAndLinkedEdges())

    plot = Plot(x_range=Range1d(-150, 150), y_range=Range1d(15, 75), plot_width=800, plot_height=600, background_fill_color=Set3_12[4],background_fill_alpha=0.2)

    plot.title.text = "Human Trafficking Visualization for " + str(year)

    plot.add_glyph(geoSource_new, Patches(xs='xs', ys='ys', line_color='grey'
                        , line_width=.2,  fill_color={'field': 'Tier','transform': mapper2}, 
                                          fill_alpha=0.40))
    # Add the graph to the plot
    plot.renderers.append(graph)
    
    # Label the axes: x is longitude and y is latitude (see graph_layout above)
    plot.add_layout(LinearAxis(axis_label="Longitude"), "below")
    plot.add_layout(LinearAxis(axis_label="Latitude"), "left")
    
    # Add tools to the graph
    hover = HoverTool(show_arrow=True,
                      tooltips="""
                                <div>
                                    <div>
                                        <span style="font-size: 13px;">Country Info</span>
                                        <span style="font-size: 12px; color: #696;">@Country, @Type_Of_Trafficking</span>
                                    </div>
                                </div>
                                """, 
                      renderers=[node_renderer], name = 'Test')
    hover_no_tooltips = HoverTool(tooltips=None, renderers=[graph])
    box_zoom = BoxZoomTool()

    plot.add_tools(hover, 
                   hover_no_tooltips, 
                   box_zoom, TapTool(), 
                   BoxSelectTool(), 
                   ResetTool(), 
                   WheelZoomTool()
                  )
    plot.toolbar.active_inspect = [hover, hover_no_tooltips]
    plot.toolbar.active_drag = box_zoom
    plot.outline_line_color = "navy"
    plot.outline_line_alpha = 0.3
    plot.outline_line_width = 1
    
    select_overlay = plot.select_one(BoxSelectTool).overlay
    select_overlay.fill_color = "firebrick"
    select_overlay.line_color = None

    zoom_overlay = plot.select_one(BoxZoomTool).overlay
    zoom_overlay.line_color = "olive"
    zoom_overlay.line_width = 3
    zoom_overlay.line_dash = "solid"
    zoom_overlay.fill_color = None
    
    plot.add_tile(STAMEN_TONER_LABELS)
    
    return plot

A separate plot is created for each year from 2008 to 2016. A layout with a "Select" dropdown is added, with values from 2008 to 2016.

In [None]:
plot2008 = plot_data(data_df_countries_2008,connections_df_2008,2008,geoSource_new_2008,data_df)
plot2009 = plot_data(data_df_countries_2009,connections_df_2009,2009,geoSource_new_2009,data_df)
plot2010 = plot_data(data_df_countries_2010,connections_df_2010,2010,geoSource_new_2010,data_df)
plot2011 = plot_data(data_df_countries_2011,connections_df_2011,2011,geoSource_new_2011,data_df)
plot2012 = plot_data(data_df_countries_2012,connections_df_2012,2012,geoSource_new_2012,data_df)
plot2013 = plot_data(data_df_countries_2013,connections_df_2013,2013,geoSource_new_2013,data_df)
plot2014 = plot_data(data_df_countries_2014,connections_df_2014,2014,geoSource_new_2014,data_df)
plot2015 = plot_data(data_df_countries_2015,connections_df_2015,2015,geoSource_new_2015,data_df)
plot2016 = plot_data(data_df_countries_2016,connections_df_2016,2016,geoSource_new_2016,data_df)

# Map each year label to its pre-built plot
plots_by_year = {"2008": plot2008, "2009": plot2009, "2010": plot2010,
                 "2011": plot2011, "2012": plot2012, "2013": plot2013,
                 "2014": plot2014, "2015": plot2015, "2016": plot2016}

def plot_fun(plot):
    # Rebuild the document with the Select widget and the chosen plot
    curdoc().clear()
    curdoc().add_root(myLayout)
    curdoc().add_root(plot)

def update_plot(attr, old, new):
    # Swap in the plot for the newly selected year
    if new in plots_by_year:
        plot_fun(plots_by_year[new])

# Create the select widget; the initial value matches the plot added to the document below
select = Select(title="Please Select Year:", value="2008",
                options=[str(y) for y in range(2008, 2017)])
select.on_change("value", update_plot)
curdoc().clear()
myLayout = layout([[select]])
curdoc().add_root(plot2008)
curdoc().add_root(myLayout)
show(plot2010)

# <font color = green >Section 3: Statistics for Human Trafficking </font> 
--------------------------------------------------------------------------------------------------

Two statistical graphs are created for the type of trafficking across the years 2008 to 2016. The data frame is sliced by year to plot the graphs.

**First Plot**: The number of destination countries for each type of trafficking, plotted for each year.


In [None]:
## Visualization of the type of trafficking per destination country
## Step 1: get the trafficking types per year and destination country
## Step 2: remove duplicate rows, keeping the first entry
## (chained on a fresh frame to avoid pandas' SettingWithCopyWarning)
traffic = data_df[['Years', 'Destination_Country', 'Type_Of_Trafficking']].drop_duplicates(keep='first')

## Step 3: count the destination countries per year and type of trafficking
traffic_counts = (traffic.groupby(['Years', 'Type_Of_Trafficking'])[['Destination_Country']]
                  .count()
                  .reset_index()
                  .rename(columns={'Destination_Country': 'Counts'}))
## Keep only trafficking types whose name is longer than 6 characters
traffic_counts = traffic_counts[traffic_counts['Type_Of_Trafficking'].str.len() > 6]

traffic_2008 = traffic_counts[traffic_counts['Years'] == 2008]
traffic_2009 = traffic_counts[traffic_counts['Years'] == 2009]
traffic_2010 = traffic_counts[traffic_counts['Years'] == 2010]
traffic_2011 = traffic_counts[traffic_counts['Years'] == 2011]
traffic_2012 = traffic_counts[traffic_counts['Years'] == 2012]
traffic_2013 = traffic_counts[traffic_counts['Years'] == 2013]
traffic_2014 = traffic_counts[traffic_counts['Years'] == 2014]
traffic_2015 = traffic_counts[traffic_counts['Years'] == 2015]
traffic_2016 = traffic_counts[traffic_counts['Years'] == 2016]
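
To make the deduplicate-then-count logic easier to follow, here is a minimal sketch of the same two steps in plain Python, without pandas. The rows are made up for illustration and do not come from the project data.

```python
from collections import Counter

# Hypothetical rows: (year, destination country, trafficking type)
rows = [
    (2008, "A", "Forced Labor"),
    (2008, "A", "Forced Labor"),         # exact duplicate, dropped below
    (2008, "B", "Forced Labor"),
    (2008, "A", "Sexual Exploitation"),
    (2009, "B", "Forced Labor"),
]

# Drop exact duplicates while keeping the first occurrence (dicts preserve order)
unique_rows = list(dict.fromkeys(rows))

# Count destination countries per (year, type) pair
pair_counts = Counter((year, kind) for year, _country, kind in unique_rows)
```

Because duplicates are removed first, each destination country contributes at most once to the count for a given year and trafficking type.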

Create a **color mapper** that maps the counts to colors. Here we use different shades of <font color = purple >Purples </font>.

In [None]:
colors1 = Purples[8]
# Map the counts linearly onto the Purples shades defined above
mapper = LinearColorMapper(palette=colors1, low=traffic_2008.Counts.min(), high=traffic_2008.Counts.max())

#### Function to plot the statistics for each year

The color bar is added at the end of each row. For example, the first row of the grid contains three plots but only one color bar, to avoid redundancy.

In [None]:
from bokeh.models import Div
from bokeh.layouts import column
def plot_stat(df_year, year):
    
    traffic =  df_year['Type_Of_Trafficking']
    counts = df_year['Counts']

    source = ColumnDataSource(data=dict(traffic=traffic, counts=counts))

    p = figure(x_range=traffic, plot_width = 250, plot_height=350, title=str(year),
               toolbar_location=None, tools="")

    colorbar = ColorBar(color_mapper = mapper, location = (0,0), width= 6)
    p.vbar(x="traffic", top="counts", width=0.4, source=source,
           line_color=None, fill_color={'field': 'counts', 'transform' :mapper})

    p.legend.orientation = "horizontal"
    p.legend.location = "top_center"
   
    p.xaxis.major_label_orientation = 1.0
    p.background_fill_alpha = 0.5
    #p.title_text_font_style = "italic"
    p.background_fill_color = "beige"
    p.ygrid.grid_line_alpha = 0.5
    p.ygrid.grid_line_dash = [6, 4]
    
    if year in(2010,2013,2016):
        p.add_layout(colorbar, 'right')
    return p

output_file("stat.html")

p8 = plot_stat(traffic_2008, 2008)
p9 = plot_stat(traffic_2009, 2009)
p10 = plot_stat(traffic_2010, 2010)
p11 = plot_stat(traffic_2011, 2011)
p12 = plot_stat(traffic_2012, 2012)
p13 = plot_stat(traffic_2013, 2013)
p14 = plot_stat(traffic_2014, 2014)
p15 = plot_stat(traffic_2015, 2015)
p16 = plot_stat(traffic_2016, 2016)

# gridplot takes a single nested list, with one inner list per row
f = gridplot([[p8, p9, p10], [p11, p12, p13], [p14, p15, p16]])
show(column(Div(text="TRAFFICKING TYPE STATS FOR ALL YEARS"),f))
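
The grid layout and the "one color bar per row" rule can be derived from the column count instead of hard-coding the years. The sketch below is only an illustration in plain Python; the helper names `grid_rows` and `ends_row` are hypothetical and not part of Bokeh.

```python
def grid_rows(items, ncols):
    """Split a flat list into rows of at most ncols items."""
    return [items[i:i + ncols] for i in range(0, len(items), ncols)]

def ends_row(position, ncols):
    """True when the item at this 0-based position is the last one in its row."""
    return (position + 1) % ncols == 0

all_years = list(range(2008, 2017))
year_rows = grid_rows(all_years, 3)
# Years whose plot would carry the color bar (last plot of each row)
years_with_colorbar = [y for i, y in enumerate(all_years) if ends_row(i, 3)]
```

With three columns this gives the same years as the `year in (2010, 2013, 2016)` check in `plot_stat`, and it would adapt if the number of columns or the range of years changed.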

**Second Plot**: Overall statistical visualization of the count for each type of trafficking in each year.

In [None]:
from bokeh.models import ColumnDataSource, FactorRange, LabelSet
from bokeh.transform import factor_cmap
years = ['2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016'] 
traffics = traffic_2008['Type_Of_Trafficking']
data = {'traffic' : traffics,
        '2008'   : traffic_2008.Counts.values, 
        '2009'   : traffic_2009.Counts.values, 
        '2010'   : traffic_2010.Counts.values,
        '2011'   : traffic_2011.Counts.values, 
        '2012'   : traffic_2012.Counts.values, 
        '2013'   : traffic_2013.Counts.values,
        '2014'   : traffic_2014.Counts.values, 
        '2015'   : traffic_2015.Counts.values, 
        '2016'   : traffic_2016.Counts.values}

# Order the factors by type first, then year, so they match the flattened counts below
x = [ (t, year) for t in traffics for year in years ]
counts = sum(zip(data['2008'], data['2009'], data['2010'], data['2011'], 
                 data['2012'], data['2013'], data['2014'], data['2015'], data['2016']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), plot_height=500, plot_width = 1000, title="Statistics of Type of Trafficking for Each Year (2008-2016)",
           toolbar_location=None,tools="hover", tooltips="@counts")

labels = LabelSet(x='x', y='counts', text='counts', level='glyph',text_font_size="7pt",
        x_offset= -5, y_offset= 0, source=source, render_mode='canvas')

p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",
       fill_color=factor_cmap('x', palette=palette, factors=years, start=1, end=2))

p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xaxis.major_label_orientation = 1.2
p.xgrid.grid_line_color = None
p.outline_line_color = None
p.ygrid.grid_line_alpha = 0.5
p.ygrid.grid_line_dash = [6, 4]
p.yaxis.major_label_text_color = "orange"
p.xaxis.major_label_text_color = "orange"
p.xaxis.axis_label_text_color = "#aa6666"
p.xaxis.axis_label_standoff = 30
p.add_layout(labels)
show(p)
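
For a grouped bar chart like this one, the nested `x` factors and the flattened `counts` need to share the same order, otherwise bars end up under the wrong label. The following small sketch checks that alignment; the type names and counts are invented for illustration.

```python
sample_years = ["2008", "2009"]
sample_kinds = ["Forced Labor", "Sexual Exploitation"]    # hypothetical types
sample_data = {"2008": [5, 7], "2009": [6, 9]}            # made-up counts, one per type

# Type-major order: all years of the first type, then all years of the next
sample_x = [(kind, year) for kind in sample_kinds for year in sample_years]

# zip pairs up the per-year lists by type; sum(..., ()) flattens the tuples
sample_counts = sum(zip(*(sample_data[y] for y in sample_years)), ())

# Each factor should line up with its own count
for (kind, year), count in zip(sample_x, sample_counts):
    assert count == sample_data[year][sample_kinds.index(kind)]
```

The check also relies on every year listing the same trafficking types in the same order, which is an assumption worth verifying on the real data.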

## <font color = red >Error Analysis </font>
Although we have completed all the requirements given by the clients, some parts are still incomplete.

1) Another level of filtering in the dashboard. Currently, the only filter is on the year: the user can select a year from 2008 to 2016. There should be another layer of filtering where the user can select the source country and the destination country. Also, the year "Select" dropdown could be replaced with a slider.

2) The hover part is still incomplete. It should display all the destination countries and the type of trafficking. Currently, it shows the type of trafficking, but the destination country information is inconsistent.

3) We tried to do clustering using K-means but were not able to complete it.


## More Ideas to Improve the Interactivity:

The dashboard could offer an option to upload a file, so that the range of years can be controlled by the users or the admin.

