## What I was Thinking

```2011-2018_monthly_refugee.csv``` records how many refugees took shelter in each country, by month and year. The idea is to display a world map and color each country according to the number of refugees it received in a given year. Later on, we should add a slider that lets the user change the year and watch the data change chronologically.

## What I Think Needs to be Done

Here is a tutorial for the plotly library: https://plot.ly/python/choropleth-maps/<br>
Here is a tutorial for creating a slider using plotly: https://plot.ly/python/gapminder-example/

If we do a world map, Plotly is the better choice because it comes with a world map already set up. However, we need a country code to connect the map to the data in the CSV, and I am not sure yet how to get one. Geopandas might actually turn out to be easier.
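As a rough illustration of how a country code could tie the map to the data, here is a minimal sketch of a choropleth figure keyed on ISO-3 codes. It is written as a plain dictionary in the shape Plotly figures take, so it runs without a Plotly account. `build_choropleth` is a hypothetical helper, and the two sample rows are the 2013 totals for Albania and Australia that are computed further down in this notebook.

```python
# Two rows standing in for the per-country totals of one year.
sample_rows = [
    {'Country': 'Albania', 'CODE': 'ALB', 'Value': 24},
    {'Country': 'Australia', 'CODE': 'AUS', 'Value': 139},
]


def build_choropleth(rows, title):
    """Return a Plotly-style figure dict that colors countries by 'Value'."""
    trace = {
        'type': 'choropleth',
        'locations': [row['CODE'] for row in rows],  # ISO-3 country codes
        'z': [row['Value'] for row in rows],         # drives the color scale
        'text': [row['Country'] for row in rows],    # hover label
        'colorscale': 'Reds',
    }
    layout = {'title': title, 'geo': {'showframe': False}}
    return {'data': [trace], 'layout': layout}


choropleth_fig = build_choropleth(sample_rows, 'Refugees by country of asylum, 2013')
```

The resulting dictionary could then be handed to whichever Plotly plotting call the notebook ends up using.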

If we do decide on Plotly, we need to:
1. find the country codes and assign each row the code for its country
    1. https://www.kaggle.com/shep312/plotlycountrycodes
2. do some data wrangling and change some country names
    1. some country names have extra notes at the end, which we should delete
3. set values shown as <b>*</b> (which means the data is confidential) to 0
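The three steps above can be sketched with pandas roughly as follows. This is only an illustration under assumptions: the rows are made up, `name_to_code` is a hypothetical stand-in for the Kaggle country-code file, and `strip_trailing_note` is a helper invented here that only handles notes in parentheses.

```python
import re

import pandas as pd

# Made-up rows: the column names mirror the CSV, the values are illustrative.
raw_df = pd.DataFrame({
    'Country / territory of asylum/residence': ['USA (EOIR)', 'Czech Rep.', 'Albania'],
    'Value': ['12', '*', '3'],
})

# Hypothetical lookup; the real one would come from the country-code file.
name_to_code = {'USA': 'USA', 'Czech Rep.': 'CZE', 'Albania': 'ALB'}


def strip_trailing_note(name):
    """Drop a parenthesised note at the end of a country name."""
    return re.sub(r'\s*\([^)]*\)\s*$', '', name)


clean_df = raw_df.copy()
# Step 2: remove the extra notes from the country names.
clean_df['Country'] = clean_df['Country / territory of asylum/residence'].map(strip_trailing_note)
# Step 3: '*' marks confidential data and is counted as 0.
clean_df['Value'] = clean_df['Value'].map(lambda v: 0 if v == '*' else int(v))
# Step 1: attach the country code; names without a match end up as NaN.
clean_df['CODE'] = clean_df['Country'].map(name_to_code)
```

Names whose note is not in trailing parentheses, such as the Serbia and Kosovo entry, would still need an explicit mapping.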

## Relevant Libraries

In [84]:
# import plotly.plotly as py
# from plotly.grid_objs import Grid, Column
# from plotly.tools import FigureFactory as FF 

import pandas as pd
import time
import re
import numpy as np

In [85]:
refugee_df = pd.read_csv('2011-2018_monthly_refugee.csv')
# refugee_df

In [86]:
# displays list of unique country names in the data frame
# refugee_df['Country / territory of asylum/residence'].unique()

Replace:
1. Czech Rep. --> Czech Republic
2. United Kingdom of Great Britain and Northern Ireland --> ??? (currently United Kingdom)
3. Rep. of Korea --> Korea, South
4. The former Yugoslav Rep. of Macedonia --> Macedonia
5. USA (INS/DHS) --> United States
6. USA (EOIR) --> United States
7. Serbia and Kosovo: S/RES/1244 (1999) --> Serbia

In [87]:
code_df = pd.read_csv('plotly_countries_and_codes.csv')
# code_df

In [88]:
# displays list of unique country names in the data frame
# code_df['COUNTRY'].unique()

In [89]:
# replacing values in refugee_df dataframe
new_refugee_df = refugee_df.replace({
    'Czech Rep.': 'Czech Republic',
    'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
    'Rep. of Korea': 'Korea, South',
    'The former Yugoslav Rep. of Macedonia': 'Macedonia',
    'USA (INS/DHS)': 'United States',
    'USA (EOIR)': 'United States',
    'Serbia and Kosovo: S/RES/1244 (1999)': 'Serbia'
})
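Before joining, it may help to check which asylum-country names still have no match in the code table. The sketch below uses small made-up frames with the same column names as this notebook and relies on the `indicator=True` option of `DataFrame.merge`.

```python
import pandas as pd

# Illustrative frames; the column names mirror the ones used in this notebook.
asylum_names = pd.DataFrame({
    'Country / territory of asylum/residence': ['Albania', 'Rep. of Korea', 'Albania'],
})
codes = pd.DataFrame({
    'COUNTRY': ['Albania', 'Korea, South'],
    'CODE': ['ALB', 'KOR'],
})

check = asylum_names.drop_duplicates().merge(
    codes,
    how='left',
    left_on='Country / territory of asylum/residence',
    right_on='COUNTRY',
    indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
)

# Names that appear in the refugee data but not in the code table.
unmatched = check.loc[
    check['_merge'] == 'left_only',
    'Country / territory of asylum/residence',
].tolist()
print(unmatched)
```

Any name printed here would need an entry in the replacement dictionary before the join.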

In [90]:
# joining two tables together
merged_df = new_refugee_df.set_index('Country / territory of asylum/residence').join(code_df.set_index('COUNTRY'))
merged_df = merged_df.reset_index()
merged_df

| | index | Origin | Year | Month | Value | GDP (BILLIONS) | CODE |
|---|---|---|---|---|---|---|---|
| 0 | Albania | Syrian Arab Rep. | 2012 | December | 3 | 13.4 | ALB |
| 1 | Albania | Syrian Arab Rep. | 2013 | March | 8 | 13.4 | ALB |
| 2 | Albania | Syrian Arab Rep. | 2013 | August | 4 | 13.4 | ALB |
| 3 | Albania | Syrian Arab Rep. | 2013 | September | 4 | 13.4 | ALB |
| 4 | Albania | Syrian Arab Rep. | 2013 | December | 8 | 13.4 | ALB |
| 5 | Albania | Syrian Arab Rep. | 2014 | May | 1 | 13.4 | ALB |
| 6 | Albania | Syrian Arab Rep. | 2014 | June | 1 | 13.4 | ALB |
| 7 | Albania | Syrian Arab Rep. | 2014 | July | 10 | 13.4 | ALB |
| 8 | Albania | Syrian Arab Rep. | 2014 | August | 6 | 13.4 | ALB |
| 9 | Albania | Syrian Arab Rep. | 2014 | September | 4 | 13.4 | ALB |


In [91]:
# rename index column to 'Country'
merged_df.rename(columns={'index': 'Country'}, inplace=True)

# remove * and convert values to int values
merged_df['Value'] = merged_df['Value'].apply(lambda x : 0 if x == '*' else int(x))

# group by country and year, then sum the values
total_df = merged_df.groupby(['Country', 'Year', 'CODE'], as_index=False)['Value'].sum()
total_df

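| | Country | Year | CODE | Value |
|---|---|---|---|---|
| 0 | Albania | 2012 | ALB | 3 |
| 1 | Albania | 2013 | ALB | 24 |
| 2 | Albania | 2014 | ALB | 92 |
| 3 | Albania | 2015 | ALB | 71 |
| 4 | Albania | 2016 | ALB | 132 |
| 5 | Albania | 2017 | ALB | 105 |
| 6 | Albania | 2018 | ALB | 1676 |
| 7 | Australia | 2011 | AUS | 93 |
| 8 | Australia | 2012 | AUS | 143 |
| 9 | Australia | 2013 | AUS | 139 |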
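The `'*'` check in the cell above would raise an error if the column ever held some other non-numeric marker. A more tolerant variant, sketched here on made-up rows, uses `pd.to_numeric` with `errors='coerce'` and counts how many values were zeroed so that nothing disappears silently.

```python
import pandas as pd

# Illustrative rows only; 'Value' arrives as strings because of the '*' marker.
monthly = pd.DataFrame({
    'Country': ['Albania', 'Albania', 'Albania', 'Australia'],
    'Year': [2013, 2013, 2014, 2013],
    'CODE': ['ALB', 'ALB', 'ALB', 'AUS'],
    'Value': ['8', '*', '1', '139'],
})

# errors='coerce' turns anything non-numeric (including '*') into NaN.
numeric = pd.to_numeric(monthly['Value'], errors='coerce')
n_confidential = int(numeric.isna().sum())  # how many values will become 0

monthly['Value'] = numeric.fillna(0).astype(int)
yearly = monthly.groupby(['Country', 'Year', 'CODE'], as_index=False)['Value'].sum()
```

Whether confidential values should count as 0 or be left out of the sum is a separate decision; this sketch keeps the notebook's choice of 0.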


## Plotly

This step requires a Plotly account and a generated API key.

In [92]:
import plotly
plotly.__version__

# keep the real API key out of the notebook; substitute your own here
plotly.tools.set_credentials_file(username='ksuhr1', api_key='YOUR_API_KEY')

Create a bubble chart using Plotly in which each bubble represents a country and its size represents the number of Syrian refugees. The data is currently in total_df.

In [93]:
import plotly.plotly as py
from plotly.grid_objs import Grid, Column
from plotly.tools import FigureFactory as FF 

import pandas as pd
import time

In [94]:
# df_sample = total_df[0:100]
table = FF.create_table(total_df)
py.iplot(table, filename='animations-gapminder-data-preview')


plotly.tools.FigureFactory.create_table is deprecated. Use plotly.figure_factory.create_table



## Make the Grid

1) Define a list of year strings to represent the values the slider will have

2) Also take all unique countries from the column 'Country' and store them

In [95]:
# Grab years from 2011-2018
years_from_col = set(total_df['Year'])

# Sort years and store them in a list
years_ints = sorted(list(years_from_col))

# Convert years into strings
years = [str(year) for year in years_ints]

# make list of countries
countries = []
for country in total_df['Country']:
    if country not in countries:
        countries.append(country)
        
columns = []
# make grid
for year in years:
    for country in countries:
        # filter the dataset to the current year
        dataset_by_year = total_df[total_df['Year'] == int(year)]

        # then narrow it to the current country
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['Country'] == country]
        
        for col_name in dataset_by_year_and_cont:
            
            # each column name is unique
            column_name = '{year}_{country}_{header}_gapminder_grid'.format(year=year, country=country, header=col_name)
            a_column = Column(list(dataset_by_year_and_cont[col_name]), column_name)
            columns.append(a_column)
            
# upload grid
grid = Grid(columns)
url = py.grid_ops.upload(grid,'gapminder_grid'+str(time.time()), auto_open=False)
url

'https://plot.ly/~ksuhr1/8/'
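The column naming scheme can be checked locally before anything is uploaded. In the sketch below a plain dictionary stands in for the Grid (an assumption made for illustration, not how the Plotly `Grid` object works internally), and the two rows are the Albania totals for 2012 and 2013 from the table above.

```python
# Two rows of yearly totals (Albania 2012 and 2013).
rows = [
    {'Country': 'Albania', 'Year': 2012, 'CODE': 'ALB', 'Value': 3},
    {'Country': 'Albania', 'Year': 2013, 'CODE': 'ALB', 'Value': 24},
]
template = '{year}_{country}_{header}_gapminder_grid'

# Plain dict standing in for the Grid: column name -> list of values.
local_columns = {}
for row in rows:
    for header, value in row.items():
        name = template.format(year=row['Year'], country=row['Country'], header=header)
        local_columns.setdefault(name, []).append(value)

print(len(local_columns))
print(local_columns['2013_Albania_Value_gapminder_grid'])
```

One difference from the loop above: a (year, country) pair with no data gets no entry here, whereas the grid loop creates an empty column for it.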

In [96]:
countries

['Albania',
 'Australia',
 'Austria',
 'Belgium',
 'Bosnia and Herzegovina',
 'Bulgaria',
 'Canada',
 'Croatia',
 'Cyprus',
 'Czech Republic',
 'Denmark',
 'Estonia',
 'Finland',
 'France',
 'Germany',
 'Greece',
 'Hungary',
 'Iceland',
 'Ireland',
 'Italy',
 'Japan',
 'Korea, South',
 'Latvia',
 'Liechtenstein',
 'Lithuania',
 'Luxembourg',
 'Macedonia',
 'Malta',
 'Montenegro',
 'Netherlands',
 'New Zealand',
 'Norway',
 'Poland',
 'Portugal',
 'Romania',
 'Serbia',
 'Slovakia',
 'Slovenia',
 'Spain',
 'Sweden',
 'Switzerland',
 'Turkey',
 'United Kingdom',
 'United States']

## Make the Figure

In [97]:
figure = {
    'data': [],
    'layout':{},
    'frames': [],
    'config': {'scrollzoom': True}
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [30, 85], 'title': 'Number of Refugees', 'gridcolor': '#FFFFFF'}
figure['layout']['yaxis'] = {'title': 'IDK', 'type': 'log', 'gridcolor': '#FFFFFF'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['plot_bgcolor'] = 'rgb(223, 232, 243)'

## Add Slider

For the slider to appear, we need to add a sliders dictionary to the layout. The sliders dictionary is set up in the following way:

figure['layout']['sliders'] = {
    'active': 0,
    # determines where the slider is on chart page
    'yanchor': 'top',
    # determines if slider is on left or right of chart page
    'xanchor': 'left',
    # sets the display of the current value that the slider
    # is hovering on
    'currentvalue': {
        'font': {'size': 20},
        # sets the text that appears before the value
        'prefix': 'text-before-value-on-display',
        'visible': True,
        'xanchor':'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    # list of dictionaries, each of which corresponds to a frame
    # in the figure. Should be ordered in the sequence
    # in which the frames occur in the animation
    'steps':[...]
}

Each dictionary in steps has the following form:

{
    'method': 'animate',
    # the text that appears next to the prefix arg
    # mentioned in the slider section
    'label': 'label-for-frame',
    'value': 'value-for-frame(defaults to label)',
    # first item in list args is a list containing
    # the slider-value of that frame
    'args': [['slider-value-of-frame'],
             {'frame': {'duration': 300, 'redraw': False}, 'mode': 'immediate'}
            ],
}
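A step of this form can be produced by a small helper, which keeps the frame name and the label in sync. `make_slider_step` is a hypothetical name used only for this sketch; the keys are the ones described above.

```python
def make_slider_step(frame_name, duration=300):
    """Build one entry for the slider's 'steps' list that jumps to the named frame."""
    return {
        'method': 'animate',
        'label': str(frame_name),
        'args': [
            [str(frame_name)],  # names of the frames to animate to
            {'frame': {'duration': duration, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': duration}},
        ],
    }


example_steps = [make_slider_step(year) for year in (2011, 2012, 2013)]
```

Converting the name with `str` matters because frame names are strings, while the 'Year' column holds integers.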
    

In [98]:
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

## Add Play and Pause Buttons

In [99]:
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]

# custom_colors = {
#  'Albania':,
#  'Australia':,
#  'Austria':,
#  'Belgium':,
#  'Bosnia and Herzegovina':,
#  'Bulgaria':,
#  'Canada':,
#  'Croatia':,
#  'Cyprus':,
#  'Czech Republic':,
#  'Denmark':,
#  'Estonia':,
#  'Finland':,
#  'France':,
#  'Germany':,
#  'Greece':,
#  'Hungary':,
#  'Iceland':,
#  'Ireland':,
#  'Italy':,
#  'Japan':,
#  'Korea, South':,
#  'Latvia':,
#  'Liechtenstein':,
#  'Lithuania':,
#  'Luxembourg':,
#  'Macedonia':,
#  'Malta':,
#  'Montenegro':,
#  'Netherlands':,
#  'New Zealand':,
#  'Norway':,
#  'Poland':,
#  'Portugal':,
#  'Romania':,
#  'Serbia':,
#  'Slovakia':,
#  'Slovenia':,
#  'Spain':,
#  'Sweden':,
#  'Switzerland':,
#  'Turkey':,
#  'United Kingdom':,
#  'United States':
    
# }

## Fill in Figure with Data and Frames

We can put data from our grid into the figure by calling the .get_column_reference() method on the grid and supplying the name of the column we want, looping through all the years and countries.

Note: If you are using referenced data for a particular parameter, you MUST change the parameter name from name to namesrc to indicate that you are using referenced data from a grid. For instance, x becomes xsrc, text becomes textsrc, etc.
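To make the `src` naming rule concrete, and to avoid writing the same trace dictionary twice, the trace could be built by a helper that takes the column lookup as an argument. This is a sketch only: `make_trace` and `TEMPLATE` are names invented here, and the lambda is a stand-in so the code runs without a grid.

```python
TEMPLATE = '{year}_{country}_{header}_gapminder_grid'


def make_trace(year, country, column_ref):
    """Build one bubble trace; column_ref maps a grid column name to its reference."""
    def ref(header):
        return column_ref(TEMPLATE.format(year=year, country=country, header=header))

    return {
        'xsrc': ref('Value'),       # referenced data, so 'x' becomes 'xsrc'
        'ysrc': ref('Country'),
        'textsrc': ref('Country'),
        'mode': 'markers',          # literal values keep their plain names
        'marker': {'sizemode': 'area', 'sizesrc': ref('Value')},
        'name': country,
    }


# Stand-in lookup for local testing; with a real grid one would pass
# grid.get_column_reference instead.
trace = make_trace(2011, 'Albania', lambda name: 'ref:' + name)
```

Only the parameters that point at grid columns get the `src` suffix; `mode` and `name` are passed as ordinary values.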

In [103]:
col_name_template = '{year}_{country}_{header}_gapminder_grid'
year = 2011
for country in countries:
    data_dict = {
        'xsrc': grid.get_column_reference(col_name_template.format(
            year=year, country=country, header='Value'
        )),
        # same y and hover-text columns as the frames, so the first
        # view matches what the animation shows
        'ysrc': grid.get_column_reference(col_name_template.format(
            year=year, country=country, header='Country'
        )),
        'mode': 'markers',
        'textsrc': grid.get_column_reference(col_name_template.format(
            year=year, country=country, header='Country'
        )),
        'marker': {
            'sizemode': 'area',
            'sizeref': 200000,
            # bubble size tracks the refugee count, so reference 'Value'
            'sizesrc': grid.get_column_reference(col_name_template.format(
                 year=year, country=country, header='Value'
            )),
            #'color': custom_colors[country]
        },
        'name': country
    }
    figure['data'].append(data_dict)
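The `sizeref` of 200000 looks like it was carried over from the gapminder example, where bubble sizes are populations. For refugee counts in the hundreds or thousands it would probably make the bubbles too small to see. Plotly's bubble chart documentation suggests, for `sizemode='area'`, setting `sizeref` to twice the largest size divided by the square of the desired maximum marker size in pixels. A minimal sketch, with `area_sizeref` as a hypothetical helper and 60 px as an arbitrary choice:

```python
def area_sizeref(sizes, max_marker_px=60):
    """sizeref for sizemode='area' so the largest value gets max_marker_px pixels."""
    return 2.0 * max(sizes) / (max_marker_px ** 2)


# Yearly totals for Albania from the totals table earlier in this notebook.
albania_totals = [3, 24, 92, 71, 132, 105, 1676]
sizeref = area_sizeref(albania_totals)
print(round(sizeref, 3))
```

In the notebook the maximum would be taken over `total_df['Value']` for all countries and years, so that bubble sizes stay comparable across frames.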

## Create Frames

Running again through the years and countries, for each year we instantiate a frame dictionary of the form:
    frame = {'data': [], 'name': value-name}

We add a dictionary of data to this list for each country, and at the end of each year's loop we make sure to add the slider step dictionary to the steps list. At the end, we attach the sliders dictionary to the figure via:
    figure['layout']['sliders'] = [sliders_dict]

In [104]:
for year in years:
    frame = {'data': [], 'name': str(year)}
    for country in countries:
        data_dict = {
            'xsrc': grid.get_column_reference(col_name_template.format(
                year=year, country=country, header='Value'
            )),
            'ysrc': grid.get_column_reference(col_name_template.format(
                year=year, country=country, header='Country'
            )),
            'mode': 'markers',
            'textsrc': grid.get_column_reference(col_name_template.format(
                year=year, country=country, header='Country'
            )),
            'marker': {
                'sizemode': 'area',
                'sizeref': 200000,
                # bubble size tracks the refugee count, so reference 'Value'
                'sizesrc': grid.get_column_reference(col_name_template.format(
                     year=year, country=country, header='Value'
                )),
                #'color': custom_colors[country]
            },
            'name': country
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    slider_step = {'args': [
        [year],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
       'transition': {'duration': 300}}
     ],
     'label': year,
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)

figure['layout']['sliders'] = [sliders_dict]
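The same frame and slider structure can be assembled and inspected offline by using inline data instead of grid references. The sketch below is an illustration under that assumption: it uses plain `x` and `y` lists rather than `xsrc` and `ysrc`, and the totals are the 2012 and 2013 values for Albania and Australia from the totals table earlier in this notebook.

```python
# Yearly totals as {year: {country: value}}.
totals = {
    '2012': {'Albania': 3, 'Australia': 143},
    '2013': {'Albania': 24, 'Australia': 139},
}

demo_figure = {'data': [], 'layout': {}, 'frames': []}
demo_slider = {'steps': []}

for year, by_country in totals.items():
    frame = {'name': year, 'data': []}
    for country, value in by_country.items():
        frame['data'].append({
            'x': [value],     # inline data, so plain 'x' rather than 'xsrc'
            'y': [country],
            'mode': 'markers',
            'name': country,
        })
    demo_figure['frames'].append(frame)
    # one slider step per frame, in the same order as the frames
    demo_slider['steps'].append({
        'method': 'animate',
        'label': year,
        'args': [[year], {'frame': {'duration': 300, 'redraw': False},
                          'mode': 'immediate'}],
    })

# The first frame doubles as the initial state of the chart.
demo_figure['data'] = demo_figure['frames'][0]['data']
demo_figure['layout']['sliders'] = [demo_slider]
```

Checking that the frame names and the slider step values line up in this small version can save an upload when something in the real figure does not animate.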

## Plot Animation

In [105]:
py.icreate_animations(figure, 'gapminder_example'+str(time.time()))