# Casualties and Migration in the Syrian Civil War

## Introduction

***

In 2011, five weeks into the civil demonstrations against the Syrian government, secret police forces detained and tortured fifteen students who had spray painted an anti-government statement on the walls of their school. They would be released weeks later in an effort to quell the rising civil unrest in the province. In the wake of the hundreds of other demonstrators who were killed or disappeared, this action was too little and too late to stop the tide of the civil war. Demonstrations turned to protest turned to armed conflict and the rest is history.

The war would go on to spawn both the largest refugee crisis and one of the deadliest conflicts in modern history. As of 2019, there are over 6 million Syrian refugees and another 6 million internally displaced people in a country with a pre-war population of around 24 million (UNHCR, 2018). The regime's efforts to prevent accurate information from leaving the country have made it nearly impossible to estimate the number of casualties that have occurred in that time. Current estimates range from 300,000 to 600,000 killed depending on the source.

The link between the flow of violence within the country and the flow of asylum seekers out of the country should be apparent to anyone who is aware of the war. Yet a growing sentiment among residents in host countries is that a large portion of asylum seekers from Syria are actually economic migrants, who are using the conflict as a means of gaining entry into the European Union and access to generous social programs.

We believe that violence is the most important predictor of migration of Syrian refugees; however, while this argument may be generally accepted, it is difficult to demonstrate the relationship conclusively. We hope to answer this question using reported casualty data to see whether there is a correlation between violence in a given province and a subsequent increase in the number of asylum seekers across all host countries.

## Project
***

Our project can be organized into three distinct portions:

1. Data Scraping
2. Data Wrangling
3. Data Visualization

Our goal is to create a dataset of casualty information and refugee data, clean and structure the dataset for easier querying, and visualize the data to provide more insight into the questions we pose above.

## Data Scraping
***

There are multiple sources that could be used for casualty information (list here). We will set the other datasets aside for now and focus on the VDC and CSR datasets, because they provide their data in table elements that make it easy for us to scrape and organize into dataframes for analysis.

We will now go through the process of scraping and creating the initial forms of these datasets.

### Casualty Data
***

#### VDC

The [Violations Documentation Center](http://www.vdc-sy.info/) has been recording casualty data since June 2011. It is likely the most detailed and complete (in terms of metadata) data source of casualties that is publicly accessible.

They provide their data through a user interface that queries their database using parameters the user defines. This interface provides the following information:

- `Name                  - Full name in English`
- `Status                - Civilian, non-civilian, or military status of deceased`
- `Sex                   - Whether deceased is an Adult or Minor and Male or Female`
- `Province              - One of the 14 Provinces of Syria`
- `Area \ Place of Birth - Various locations that can be Provinces/Subdistricts/Towns`
- `Date of death         - self explanatory`
- `Cause of death        - self explanatory`
- `Actors                - groups involved in the casualty`

Each entry is associated with a unique identifier, which is an integer between 0 and 250,000. Clicking on the name of the entry will lead the user to another page that provides the unique identifier number and other data that is not displayed on the main page. We will avoid describing this detail for now, since most of this data is not used in the final product.

Below, we describe the full process we used to scrape the unique identifiers from this website, as well as the detailed information for each entry.

In [None]:
import glob
import os
import re

import pandas as pd
from bs4 import BeautifulSoup as bs
from fake_useragent import UserAgent
from torrequest import TorRequest

def scrape_recent():
    first_page = 'http://www.vdc-sy.info/index.php/en/martyrs/1/c29ydGJ5PWEua2lsbGVkX2RhdGV8c29ydGRpcj1ERVNDfGFwcHJvdmVkPXZpc2libGV8ZXh0cmFkaXNwbGF5PTB8'
    
    # This is the format of the links that give us the unique identifiers
    pattern    = re.compile(r'/index\.php/en/details/martyrs/.')

    # We want to establish a randomized user agent and Tor node to avoid detection
    ua         = UserAgent()
    headers    = {'User-Agent': ua.random}
    tor        = TorRequest(password = 'commonhorse')

    # Start with an empty set so that we still return something if the request fails
    links      = set()
    
    try:
        response = tor.get(first_page, headers=headers)
        content  = bs(response.text, 'html.parser')
        
        # This set comprehension grabs all unique identifiers in string format for all links that match
        # our regex pattern from above (the first 30 characters of each href are the fixed path prefix)
        links    = {link['href'][30:] for link in content.find_all('a', href = True) if pattern.match(link['href'])} 

    except Exception as e:
        print(e)

    return links
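
To make the link filtering concrete, here is a minimal sketch of how the regex and the prefix slice pick out identifiers. The `extract_uids` helper and the sample hrefs are made up for illustration; only the pattern and the path prefix come from the cell above.

```python
import re

# Same link shape as the pattern used in scrape_recent().
DETAILS_PATTERN = re.compile(r'/index\.php/en/details/martyrs/.')
PREFIX = '/index.php/en/details/martyrs/'  # 30 characters long


def extract_uids(hrefs):
    """Return the set of identifier strings found in detail-page links."""
    return {href[len(PREFIX):] for href in hrefs if DETAILS_PATTERN.match(href)}


# Hypothetical hrefs, not taken from the real site.
sample_hrefs = [
    '/index.php/en/details/martyrs/12345',  # detail link: kept
    '/index.php/en/details/martyrs/678',    # detail link: kept
    '/index.php/en/martyrs/1/abc',          # listing link: skipped
    '/index.php/en/details/martyrs/',       # no identifier after the prefix: skipped
]

uids = extract_uids(sample_hrefs)
print(sorted(uids))
```

Slicing by `len(PREFIX)` rather than a literal 30 keeps the slice in sync with the prefix if the path ever changes.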

In [None]:
def scrape_details(uid, tor, headers):
    '''
    Given one unique identifier in string format, scrapes the details page and saves
    the entry as an individual dataframe that represents one person.
    '''
    cols = []
    vals = []

    url  = 'http://www.vdc-sy.info/index.php/en/details/martyrs/' + uid
    
    # Headers will provide the UserAgent to use when getting response
    # Makes the request using a TorRequest object passed in
    page = tor.get(url, headers = headers).text
    page = bs(page, 'html.parser')
    
    # Grabs the relevant table info and all rows in it
    table = page.find('table', attrs = {'class':'peopleListing'})
    rows  = table.find_all('tr')

    for row in rows:
        data = row.find_all('td')

        # Rows that do not have exactly 2 cells (label and value)
        # are not data we are looking for
        if len(data) != 2:
            continue

        # data[0] corresponds to the row label/column
        cols.append(data[0].text)
        
        # Values need to be appended differently for image rows 
        if data[1].find('img') is not None:
            vals.append(data[1].find('img')['src'])
        else:
            vals.append(data[1].text)

    # Adds the uid to the dataframe
    cols.append('uid')
    vals.append(uid)

    # Creates and saves dataframe
    person = pd.DataFrame([vals], columns = cols, dtype=str)

    save(person, os.path.join('person_dfs', uid))
    
    
    

Each detailed page has a different number of columns depending on the metadata associated with that entry, so we will now have to combine all the dataframes. Some pages repeat a row label, and Pandas cannot align dataframes whose column names are duplicated, so we first have to rename all duplicate columns using this code.

In [None]:
def rename_dup_cols(dataframe):
    cols = pd.Series(dataframe.columns)
  
    # Index.get_duplicates() was removed from pandas, so we find the duplicated names ourselves
    for dup in cols[cols.duplicated()].unique():
        mask = cols == dup

        # The first occurrence keeps its name; later ones get the suffixes _1, _2, ...
        cols[mask] = [dup + '_' + str(d_idx) if d_idx != 0 else dup for d_idx in range(mask.sum())]
   
    dataframe.columns = cols

    return dataframe
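
The renaming scheme itself does not depend on Pandas. As a sketch, the hypothetical `dedupe_names` helper below applies the same rule to a plain list of labels: the first occurrence keeps its name and later occurrences get `_1`, `_2`, and so on. The labels are made up.

```python
from collections import Counter


def dedupe_names(names):
    """Suffix repeated names with _1, _2, ... and leave first occurrences untouched."""
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        result.append(name if count == 0 else f'{name}_{count}')
        seen[name] += 1
    return result


print(dedupe_names(['Name', 'Notes', 'Notes', 'Image', 'Notes']))
```

One caveat of this scheme is that a generated name such as `Notes_1` could collide with a label that already exists on the page; we assume that does not happen here.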




Now, given a list of dataframes, we can build a combined dataframe that retains all column data. The function saves the combined dataframe as vdc_df and saves any dataframes that failed to combine as failed_vdc_df.

In [None]:
def combine_dataframes(dataframes):
    failed_dataframes = []
    combined          = pd.DataFrame()

    counter = 0
    num     = len(dataframes)

    for df in dataframes:
        counter += 1

        try:
            # Columns missing from either side are kept and filled with NaN
            combined = pd.concat([combined, df], axis = 0)
            print(f'{counter} / {num} people processed in combine_dataframes().')
        
        except Exception as e:
            failed_dataframes.append(df)
            print(f'Failed on dataframe {counter}: {e}')

    save(combined, 'vdc_df')
    save(failed_dataframes, 'failed_vdc_df')

    print('\n\nSuccess: ', len(dataframes) - len(failed_dataframes))
    print('Failed: ', len(failed_dataframes))
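
To illustrate what "retains all column data" means, the sketch below concatenates two one-row dataframes with different columns. The names and values are placeholders, not real records. It also passes the whole list to `pd.concat` in one call, which is an alternative to growing the dataframe inside a loop.

```python
import pandas as pd

# Two placeholder "person" dataframes with partly different columns.
person_a = pd.DataFrame([['Person A', 'Aleppo', '1']],
                        columns=['Name', 'Province', 'uid'], dtype=str)
person_b = pd.DataFrame([['Person B', 'Shooting', '2']],
                        columns=['Name', 'Cause of Death', 'uid'], dtype=str)

# Columns are unioned; cells with no value in the source frame become NaN.
combined = pd.concat([person_a, person_b], axis=0, ignore_index=True, sort=False)

print(combined.shape)
print(combined.isna().sum().to_dict())
```

A single `pd.concat` over the full list avoids re-copying the growing dataframe on every iteration, at the cost of losing the per-dataframe failure handling used above.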
    
    
    

Putting this all together, we will now:

1. Build a list of unique identifiers by scraping the query page for the VDC database using scrape_recent()

2. Scrape the detailed information provided the list of unique ids from scrape_recent() using scrape_details, which gives us dataframes for each person.

3. Combine those dataframes into one large dataset using combine_dataframes()


In [None]:
uids_to_scrape = scrape_recent()
uids_scraped   = set()

while len(uids_to_scrape) > 0:
    uid = uids_to_scrape.pop()
    
    try:
        ua         = UserAgent()
        headers    = {'User-Agent': ua.random}
        tor        = TorRequest(password = 'cmps184')
        scrape_details(uid, tor, headers)

    except Exception as e:
        print(e)
        # put the uid back into the set so that it is retried later
        uids_to_scrape.add(uid)

        ua         = UserAgent()
        headers    = {'User-Agent': ua.random}
        tor        = TorRequest(password = 'cmps184')
        tor.reset_identity()

        continue
        
    uids_scraped.add(uid)

    save(uids_to_scrape, 'uids_to_scrape')
    save(uids_scraped  , 'uids_scraped')
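
One thing to keep in mind with this loop is that an identifier that always fails would be retried forever. The sketch below shows one possible way to bound the retries; `drain_queue`, `max_attempts`, and `flaky_fetch` are hypothetical names, and the fetch function is a stand-in that makes no network requests.

```python
from collections import deque


def drain_queue(uids, fetch, max_attempts=3):
    """Call fetch(uid) for every uid, retrying failures up to max_attempts times."""
    pending = deque((uid, 1) for uid in uids)
    done, gave_up = set(), set()

    while pending:
        uid, attempt = pending.popleft()
        try:
            fetch(uid)
        except Exception:
            if attempt < max_attempts:
                pending.append((uid, attempt + 1))  # retry later
            else:
                gave_up.add(uid)                    # stop retrying this uid
            continue
        done.add(uid)

    return done, gave_up


# Stand-in for scrape_details(): fails in a controlled way.
calls = {}

def flaky_fetch(uid):
    calls[uid] = calls.get(uid, 0) + 1
    if uid == 'always-fails' or (uid == 'fails-once' and calls[uid] == 1):
        raise RuntimeError(f'could not fetch {uid}')


done, gave_up = drain_queue(['ok', 'fails-once', 'always-fails'], flaky_fetch)
print(sorted(done), sorted(gave_up))
```

In the real scraper, the `except` branch is also where the Tor identity would be reset before the retry.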

In [None]:
list_of_dataframes = []

for person_df in glob.glob(os.path.join('person_dfs', '*.pickle')):
    list_of_dataframes.append(load(person_df))
    
combine_dataframes(list_of_dataframes)

#### CSR

The [Syrian Center for Statistics and Research](https://csr-sy.org/) has been recording casualty data since March 2011. It has less information than the VDC dataset, but the location of death is more precise.

They provide their data through a user interface that queries their database using parameters the user defines. This interface provides the following information:

- `ID Number             - Arbitrary ID number`
- `First Name            - First name in Arabic`
- `Father Name           - Father's last name in Arabic`
- `Last Name             - Last name in Arabic`
- `Province              - One of the 14 Provinces of Syria`
- `Town                  - Town where they died`
- `Date of death         - self explanatory`

As in the VDC scraper, the code below routes its requests through Tor and randomizes the user agent. This particular website was cautious about who was looking at its data: it blocked our repeated attempts to scrape it without cycling through IP addresses and user agents.

The following cell contains all the libraries that must be imported in order to run the web scraping. 

In [None]:
# for requesting content of a web page
from bs4 import BeautifulSoup
from requests import get

# for regex
# import re

# for dataframe creation
import pandas as pd

# for not overwhelming server when scraping pages
from time import sleep
from random import randint

# For a progress bar while scraping
from tqdm import tqdm

# For saving files
import pickle

# For shuffling list
import random

# For Tor Requests
from torrequest     import TorRequest

# To cycle through useragents
from fake_useragent import UserAgent

In order to not have to rescrape everything if one page fails, it is essential to have a pickle file that you can store your successfully scraped data and failed attempts in. Below is the code to create and load a pickle file.

In [None]:
# Save and Load functions for a pickle file
def save(obj, name):
    # the with-statement closes the file even if pickling fails
    with open(name + '.pickle', 'wb') as f:
        pickle.dump(obj, f)

def load(name):
    with open(name + '.pickle', 'rb') as f:
        return pickle.load(f)
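
Note that `load()` assumes the pickle file already exists, which is not the case on the very first run. A minimal sketch of one way to handle that is shown below; `load_or_default` is a hypothetical helper, and the seven empty lists correspond to the seven CSR columns we keep.

```python
import os
import pickle
import tempfile


def load_or_default(name, default_factory):
    """Load name + '.pickle' if it exists, otherwise build a fresh default."""
    path = name + '.pickle'
    if not os.path.exists(path):
        return default_factory()
    with open(path, 'rb') as handle:
        return pickle.load(handle)


def empty_columns():
    # One list per scraped column.
    return [[] for _ in range(7)]


# Demonstration in a temporary directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, 'csr_data_list')

    data_list = load_or_default(name, empty_columns)  # first run: no file yet
    data_list[0].append('1')

    with open(name + '.pickle', 'wb') as handle:
        pickle.dump(data_list, handle)

    reloaded = load_or_default(name, empty_columns)   # later run: file exists

print(len(reloaded), reloaded[0])
```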

In [None]:
# the url to start scraping CSR
url = 'https://csr-sy.org/?l=1&sons=redirect&sequence=&name=&father_name=&surname=&age_from=0&age_to=120&gender=&born_state=&born_town=&career=&society_status=&sons_no=&medical_status=&incident_state=&incident_town=&incident_desc=3&incident_date_from=&incident_date_to=&incident_details=&trial=&trial_date_from=&trial_date_to=&id=182&ddate_from=&ddate_to=&rec=0'

response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

The website was odd in the sense that some pages at the end didn't contain any information about Syrian casualties. We found the last page that had any information and hard-coded its offset. As you can see from the code below, 91900 was the last offset (the `rec` parameter in the URL, which advances in steps of 50) to contain any useful information.

In [None]:
# create a list of offsets used to step through all the URLs
numbers_url = [str(i) for i in range(0, 91901, 50)]

# shuffling the offsets so that the website doesn't become suspicious of us
# (kept as a list, because converting it to a set would discard the shuffled order)
random.shuffle(numbers_url)
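
As a quick sanity check on the paging scheme, the sketch below counts the offsets and builds a URL for one of them. The `page_url` helper is hypothetical and only includes a few of the query parameters from the full URL above; whether the site accepts this shortened form is an assumption.

```python
from urllib.parse import urlencode

PAGE_SIZE = 50        # the rec parameter advances in steps of 50
LAST_OFFSET = 91900   # last offset that still returned data

offsets = list(range(0, LAST_OFFSET + 1, PAGE_SIZE))


def page_url(offset):
    """Build a results-page URL for one offset (shortened parameter list)."""
    params = {'l': 1, 'sons': 'redirect', 'id': 182, 'rec': offset}
    return 'https://csr-sy.org/?' + urlencode(params)


print(len(offsets), offsets[0], offsets[-1])
print(page_url(offsets[1]))
```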

In [None]:
# all the pickle files: the good, the bad, and the ugly
data_list     = load('csr_data_list')
failed_urls   = load('failed_urls')
finished_urls = load('finished_urls')

# removes the URL number if that corresponding page has been scraped successfully
for url in finished_urls:
    try:
        numbers_url.remove(url)
    except (KeyError, ValueError):
        # the number was not in the collection, so there is nothing to remove
        continue

# tqdm() creates a progress bar to see how far you are from finishing your task
# scrapes the website in a random order while cycling through IP addresses and user agents
for number_url in tqdm(numbers_url):
    try:
        # being cautious
        sleep(randint(30,60))
        ua         = UserAgent()
        user_agent = ua.random
        headers    = {'User-Agent': user_agent}
        tor = TorRequest(password = 'commonhorse')
        tor.reset_identity()
        
        url = 'https://csr-sy.org/?l=1&sons=redirect&sequence=&name=&father_name=&surname='        + \
                '&age_from=0&age_to=120&gender=&born_state=&born_town=&career=&society_status='    + \
                '&sons_no=&medical_status=&incident_state=&incident_town=&incident_desc=3'         + \
                '&incident_date_from=&incident_date_to=&incident_details=&trial=&trial_date_from=' + \
                '&trial_date_to=&id=182&ddate_from=&ddate_to=&rec=' + number_url


        response = tor.get(url, headers = headers)

        html_soup = BeautifulSoup(response.text, 'html.parser')
        rows = html_soup.findAll('tr', {'title':'victim'})

        # start scraping one page
        for row in rows:
            columns = row.findAll('td')

            # saves the column data
            for i in range(len(columns)):
                if i == 0:
                    continue
                else:
                    data_list[i - 1].append(columns[i].text)
                    
        finished_urls.append(number_url)
        
        # stores the successfully scraped data
        save(data_list    , 'csr_data_list')
        # stores the finished URL number
        save(finished_urls, 'finished_urls')
    
    # pages that failed are stored in the pickle file failed_urls
    except Exception as e:
        print(e)
        print('\nFailed on: ', number_url)
        failed_urls.append(number_url)
        save(failed_urls, 'failed_urls')
        continue

When we're done with all the pages, we want to convert our saved pickle file into a CSV file (to make loading in the data easier).

In [None]:
# creates a data frame
victim_info = pd.DataFrame({
    'victim_id'  : data_list[0],
    'first_name' : data_list[1],
    'father_name': data_list[2],
    'last_name'  : data_list[3],
    'province'   : data_list[4],
    'town'       : data_list[5],
    'date'       : data_list[6]    
})

# saves our data frame as a CSV file (file name chosen to mirror vdc_data.csv; adjust as needed)
victim_info.to_csv('csr_data.csv', index=False)

### Refugee Data

#### Monthly Inflows

#### Yearly Refugee Status

## Data Wrangling

### Casualty Data

#### VDC

You can run the cell below to see what the dataset looks like without any modification.

In [None]:
vdc_df = load('vdc_df')
vdc_df

While the added details we got from scraping everything from the website are valuable for more detailed analysis, these are the columns we will focus on in this project:

- `Name                  `
- `Status                `
- `Sex                   `
- `Province              `
- `Area \ Place of Birth `
- `Date of death         `
- `Cause of death        `
- `Actors                `

We create a scratch copy of the dataframe to do all of our wrangling on, so that we are not modifying the original dataset.

In [None]:
scratch = vdc_df[['Province',
                  'Sex',
                  'Status',
                  'Date of death',
                  'Cause of Death']].copy()

If we look at the `Sex` column, we can see that it actually encodes two things: the person's sex and whether they were an adult or a minor. We will create new columns to capture each piece of information separately.

In [None]:
# We'll first want to drop any rows that don't have this information
scratch = scratch.dropna(subset=['Sex'])

def check_age(row):
    if 'Adult' in row['Sex']:
        val = 'adult'
    else:
        val = 'minor'
    return val

scratch['age_cat'] = scratch.apply(check_age, axis=1)

def check_sex(row):
    if 'Male' in row['Sex']:
        val = 'male'
    else:
        val = 'female'
    return val

scratch['sex'] = scratch.apply(check_sex, axis=1)
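
The same two columns can also be derived without a row-wise `apply`. The sketch below is one vectorized alternative using `str.contains`; the sample values are assumptions about how the `Sex` strings look, based on the field description above.

```python
import numpy as np
import pandas as pd

# Assumed examples of the combined age/sex strings.
sample = pd.DataFrame({'Sex': ['Adult - Male', 'Adult - Female',
                               'Child - Male', 'Child - Female']})

# str.contains is case-sensitive by default, so 'Male' does not match 'Female'.
is_adult = sample['Sex'].str.contains('Adult', regex=False)
is_male = sample['Sex'].str.contains('Male', regex=False)

sample['age_cat'] = np.where(is_adult, 'adult', 'minor')
sample['sex'] = np.where(is_male, 'male', 'female')

print(sample)
```

As in the cell above, anything that is not explicitly marked `Adult` or `Male` falls into the other category, so unexpected values would be silently labelled `minor` or `female`.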

If we look at the `Cause of Death` column, we'll see that there are some redundant categories, so we'll simplify them by remapping the values with the dictionary shown below.

In [None]:
cause_of_death_map = {'Chemical and toxic gases'         : 'Chemical Weapon',
                      'Detention - Execution'            : 'Detention',
                      'Detention - Torture'              : 'Detention',
                      'Detention - Torture - Execution'  : 'Detention',
                      'Explosion'                        : 'Explosion',
                      'Field Execution'                  : 'Execution',
                      'Kidnapping - Execution'           : 'Execution',
                      'Kidnapping - Torture'             : 'Execution',
                      'Kidnapping - Torture - Execution' : 'Execution',
                      'Other'                            : 'Unknown'  ,
                      'Shelling'                         : 'Shelling' ,
                      'Shooting'                         : 'Shooting' ,
                      'Siege'                            : 'Siege'    ,
                      'Un-allowed to seek Medical help'  : 'Lack of Medical Access',
                      'Unknown'                          : 'Unknown'  ,
                      'Warplane shelling'                : 'Shelling' 
}

def check_cause_of_death(row, mapping):
    return mapping[row['Cause of Death']]

scratch['cause_of_death'] = scratch.apply(check_cause_of_death,
                                        args = (cause_of_death_map, ),
                                        axis = 1)
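
A dictionary lookup per row raises a `KeyError` as soon as a category is missing from the mapping. As a sketch of a more forgiving variant, `Series.map` leaves unmapped values as NaN, which lets us list them before deciding what to do. The shortened mapping and the `Drowning` value below are made up for illustration.

```python
import pandas as pd

# Shortened version of the mapping, for illustration only.
mapping = {'Shelling': 'Shelling',
           'Warplane shelling': 'Shelling',
           'Other': 'Unknown'}

# 'Drowning' is a hypothetical category that the mapping does not cover.
causes = pd.Series(['Shelling', 'Warplane shelling', 'Other', 'Drowning'])

mapped = causes.map(mapping)                         # unmapped values become NaN
unmapped = sorted(causes[mapped.isna()].unique())    # categories we forgot
mapped = mapped.fillna('Unknown')                    # fall back to 'Unknown'

print(unmapped)
print(mapped.tolist())
```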

For convenience, we also normalize the values in the status column.

In [None]:
def check_status(row):
    if row['Status'] == 'Non-Civilian':
        val = 'non_civilian'
    elif row['Status'] == 'Civilian':
        val = 'civilian'
    else:
        val = 'regime'
    return val

scratch['status'] = scratch.apply(check_status, axis=1)

Now that the dataset is cleaner, we can drop the columns that are irrelevant to us and rename the remaining columns for convenience.

In [None]:
scratch = scratch[['Province',
                  'sex',
                  'status',
                  'age_cat',
                  'Date of death',
                  'cause_of_death']].copy()

scratch.columns = ['province', 'sex', 'status', 'age_cat','date_of_death', 'cause_of_death']



We will now keep only the provinces listed in `picked`, drop any entries with unrecorded or incorrect dates of death, and convert the date strings to datetime objects.

With all of those modifications we can finally save this dataset as complete.

In [None]:
# `picked` is the list of province names to keep; it must be defined before this cell runs
scratch = scratch[scratch['province'].isin(picked)]

# '0000-00-00' and '1970-01-01' are placeholder dates, not real dates of death
scratch = scratch[scratch['date_of_death'] != '0000-00-00']
scratch = scratch[scratch['date_of_death'] != '1970-01-01']
scratch['date_of_death'] = pd.to_datetime(scratch['date_of_death'])

save(scratch, 'clean_vdc_df')
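
If other malformed date strings turn up, filtering out known placeholders one by one may not be enough. The sketch below shows one possible alternative: `pd.to_datetime` with `errors='coerce'` turns anything unparseable into `NaT`, which can then be dropped together with the 1970 placeholder. The sample strings are made up.

```python
import pandas as pd

# Made-up date strings: one real date, two placeholders, one piece of garbage.
dates = pd.Series(['2013-08-21', '0000-00-00', '1970-01-01', 'not a date'])

# Unparseable strings become NaT instead of raising an error.
parsed = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')

# '1970-01-01' parses fine, so it still has to be excluded explicitly.
placeholder = pd.Timestamp('1970-01-01')
valid = parsed[parsed.notna() & (parsed != placeholder)]

print(valid)
```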

#### CSR

### Refugee Data

#### Monthly Inflows

#### Yearly Refugee Status

## Data Visualization

### Casualty Data

#### VDC

#### Year-by-Year

In [None]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas_bokeh

import os
import json

# call this so that running plot_bokeh won't create 
# a new window and results will be shown in notebook
pandas_bokeh.output_notebook()

Since the data is provincial for VDC, we decided to use the provincial shapefile to map the data.

In [None]:
# read the shape file and save it as a geo data frame
shp_file = os.path.join('syr_admin_shp_utf8_18219', 'syr_admin1.shp')
map_df   = gpd.read_file(shp_file)

In [None]:
# read the VDC csv file and save it as a pandas data frame
dataset = pd.read_csv('vdc_data.csv', encoding='latin-1', dtype=str)

The VDC data set's "Date of death" column contains the month, day, and year the person died, but we only wanted to see the yearly fluctuations. Therefore, we took a substring of the date.

In [None]:
# data has month and day, so we took a substring of the year of death
dataset['Year of Death'] = dataset['Date of death'].str[:4]

# counts the number of times a province is in the dataset for a certain year
province_count = dataset.groupby(['Province', 'Year of Death']).count()

In [None]:
# remove unnecessary columns to make data frame smaller
simplified_df = province_count.drop(province_count.columns[1:], axis=1)
simplified_df = simplified_df.reset_index()

The slider for Bokeh maps works with column titles, not row values, so we had to pivot the table. The former "Year of Death" values became column titles, and the count of Syrian casualties for that particular year and province became the cell value.

In [None]:
# make it so years are columns rather than values
year_as_column = simplified_df.pivot_table('Unnamed: 0', 'Province', 'Year of Death')
year_as_column.reset_index(inplace=True)
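
The count-then-pivot steps can be seen end to end on a toy table. The sketch below uses made-up records and `groupby(...).size()` followed by `unstack`, which is an alternative route to the same wide layout rather than the exact calls used above.

```python
import pandas as pd

# Made-up records: one row per casualty.
records = pd.DataFrame({
    'Province': ['Aleppo', 'Aleppo', 'Aleppo', 'Homs', 'Homs'],
    'Date of death': ['2012-03-01', '2012-07-15', '2013-01-02',
                      '2012-05-05', '2014-09-09'],
})
records['Year of Death'] = records['Date of death'].str[:4]

# Count rows per (province, year), then turn the years into columns.
wide = (records.groupby(['Province', 'Year of Death'])
               .size()
               .unstack(fill_value=0)
               .reset_index())

print(wide)
```

Using `fill_value=0` means a province with no recorded deaths in a year shows 0 rather than NaN, which is usually what a choropleth needs.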

In [None]:
# dropping irrelevant years
year_as_column = year_as_column.drop(['0000', '1970'], axis=1)

Some of the province names in the VDC data set and the shapefile for Syria were not the same, so we had to go through the names manually and see which ones were different so that we could match the different names.

In [None]:
# changing province names by hand
name_change = {
    'Damascus Suburbs': 'Rural Damascus',
    'Daraa': 'Dar\'a',
    'Deir Ezzor': 'Deir-ez-Zor',
    'Hasakeh': 'Al-Hasakeh',
    'Idlib': 'Idleb',
    'Raqqa': 'Ar-Raqqa',
    'Sweida': 'As-Sweida'
}

# renames the provinces using name_change
year_as_column.replace(name_change, inplace=True)
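
Rather than comparing the two lists of names by eye, the mismatches can be found with a set difference. The sketch below uses small made-up subsets of the names and a deliberately incomplete mapping to show how a forgotten province is reported.

```python
# Small illustrative subsets, not the full lists of provinces.
vdc_names = {'Aleppo', 'Daraa', 'Idlib', 'Homs'}
shapefile_names = {'Aleppo', "Dar'a", 'Idleb', 'Homs'}

# Deliberately incomplete mapping: 'Idlib' is missing.
partial_name_change = {'Daraa': "Dar'a"}

# Apply the mapping, then see which names still have no match in the shapefile.
renamed = {partial_name_change.get(name, name) for name in vdc_names}
still_unmatched = sorted(renamed - shapefile_names)

print(still_unmatched)
```

Running a check like this before the join makes it obvious when a row would end up without geometry.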

In [None]:
# joining data from casualties (VDC) and geo data frame (shape file)
merged = year_as_column.set_index('Province').join(map_df.set_index('NAME_EN'))
merged.reset_index(inplace=True)

In [None]:
# dropping irrelevant information
# row where there were no data for geo data
merged = merged.drop([10, 15], axis=0)
# columns with information not pertaining to creating choropleth map
merged.drop(merged.columns[9:16], axis=1, inplace=True)

The result of the join above is a plain Pandas dataframe, but the map version of plot_bokeh() with a slider is only available on a GeoDataFrame. Therefore, we convert the Pandas dataframe into a GeoDataFrame. The following code describes that process.

In [None]:
# Pandas dataframe to GeoDataFrame
from geopandas import GeoDataFrame
from shapely.geometry import Point

geometry = merged['geometry']
merged_gdf = merged.drop(['geometry'], axis=1)
crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(merged_gdf, crs=crs, geometry=geometry)

In [None]:
# specify slider columns:
slider_columns = ["201%d"%i for i in range(1, 8)]
slider_range = range(2011, 2018)

# make slider plot:
gdf.plot_bokeh(
    figsize=(900, 600),
    slider=slider_columns,
    slider_range=slider_range,
    slider_name="Year", 
    colormap='Inferno',
    hovertool_columns=["Province"],
    title="Deaths in Syria",
)

#### Year-by-Year with plotly and mapbox

In [2]:
# import pickle and read the file
import pickle
final = pickle.load(open('./death_by_province_by_year.pickle', 'rb'))

In [3]:
# reference: https://plot.ly/python/scattermapbox/
# reference: https://community.periscopedata.com/t/36nz2s/plotly-choropleth-with-slider-map-charts-over-time
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

# this is a public mapbox token
mapbox_access_token = 'pk.eyJ1IjoibWF0dGhld2lydHoiLCJhIjoiY2p3ZTNpNXlnMHYxcjQ5bzdwMjc0anlpeSJ9.bSLA-SSqEomk0hC52rNliQ'

In [13]:
# first year
year = 2011

# for every year we need the latitude, longitude, and the # of casualties
# we use a for loop to store the data for each year of the Syrian War
data_slider = []
for year in final['year'].unique():
    # .copy() so that the type conversion below does not write into a slice of `final`
    sect = final[final['year'] == year].copy()

    for col in sect.columns:
        sect[col] = sect[col].astype(str)
        
    data_each_yr = go.Scattermapbox(
        name=str(year),
        lat=['36.2021',
             '33.5138',
             '33.5167',
             '32.6264',
             '35.3297',
             '35.1409',
             '36.5079',
             '34.7324',
             '35.9310',
             '33.1219',
             '35.5407',  
             '35.9594',  
             '32.7129', 
             '34.8959'],
        lon=['37.1343',
             '36.2765',
             '36.9541',
             '36.1033',
             '40.1350',
             '36.7552',
             '40.7463',
             '36.7137',
             '36.6418',
             '35.8209',
             '35.7953', 
             '38.9981',  
             '36.5663',
             '35.8867'],
        text=["Aleppo",
              "Damascus",
              "Rural Damascus",
              "Daraa",
              "Deir Ezzor",
              "Hama",
              "Hasakeh",
              "Homs",
              "Idlib",
              "Lattakia",
              "Quneitra", 
              "Ar Raqqah", 
              "As Suwayda",
              "Tartus"],
        mode='markers',
        marker = go.scattermapbox.Marker(
            size = sect['casualties'].astype(int),
            color = 'rgb(255, 0, 0)',
            sizemode = 'area',
        ), visible=False
    )
    data_slider.append(data_each_yr)



In [14]:
# the steps are for each of the possible places our slider can rest
steps = []
for i in range(len(data_slider)):
    step = dict(method='restyle',
                args=['visible', [False] * len(data_slider)],
                label='Year {}'.format(i + 2011))
    step['args'][1][i] = True
    steps.append(step)
    
sliders = [dict(active=0, pad={"t": 1}, steps=steps)]
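
The slider logic is independent of plotly: each step is just a list of booleans with exactly one `True`, telling plotly which trace to show. The sketch below rebuilds that structure with plain dictionaries; `make_steps` is a hypothetical helper.

```python
def make_steps(n_traces, first_year=2011):
    """Build one slider step per trace; step i shows trace i and hides the rest."""
    steps = []
    for i in range(n_traces):
        visible = [False] * n_traces
        visible[i] = True
        steps.append({
            'method': 'restyle',
            'args': ['visible', visible],
            'label': f'Year {first_year + i}',
        })
    return steps


steps_demo = make_steps(3)
for step in steps_demo:
    print(step['label'], step['args'][1])
```

Since every trace above is created with `visible=False`, the map may start out empty until the slider is moved; setting the first trace to visible would be one way around that.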

In [15]:
# sets up our frame with the mapbox
layout = go.Layout(
    showlegend=False,
    title = go.layout.Title(
            text = '2011-2018 Syrian Casualties'
        ),
    autosize=True,
    hovermode='closest',
    sliders=sliders,
    mapbox=go.layout.Mapbox(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=go.layout.mapbox.Center(
            lat=34.7,
            lon=37.2
        ),
        pitch=0,
        zoom=5.4
    ),
)

In [16]:
# visualize
fig = go.Figure(data=data_slider, layout=layout)
py.iplot(fig, filename='Multiple-Mapbox.html')
# plot(fig, filename='Multiple-Mapbox.html')
# use ^ to have the graph open in a separate tab

#### Day-by-Day

The <b>Day-by-Day</b> code is similar to the <b>Year-by-Year</b> code, so we hope the viewer doesn't mind if we have fewer comments in this one.

In [None]:
import pickle

In [None]:
# read the shape file and save it as a geo data frame
shp_file = os.path.join('syr_admin_shp_utf8_18219', 'syr_admin1.shp')
map_df   = gpd.read_file(shp_file)

In [None]:
# load the pickle objects as a pandas data frame
day_df = pickle.load(open('./death_by_province_by_day.pickle', 'rb'))

# read the VDC csv file specifically for column "geometry"
dataset = pd.read_csv('vdc_data.csv', encoding='latin-1', dtype=str)

In [None]:
# changing province names by hand
name_change = {
    'Damascus Suburbs': 'Rural Damascus',
    'Daraa': 'Dar\'a',
    'Deir Ezzor': 'Deir-ez-Zor',
    'Raqqa': 'Ar-Raqqa',
    'Sweida': 'As-Sweida',
    'Idlib': 'Idleb',
    'Hasakeh': 'Al-Hasakeh',
}

# renames the provinces using name_change
day_df.replace(name_change, inplace=True)

In [None]:
# make it so days are columns rather than values
pivoted_df = day_df.pivot_table('casualties','province','day').fillna(0)

In [None]:
# joining the data frames in order to obtain the geo data
use = pivoted_df.join(map_df.set_index('NAME_EN'))

In [None]:
# removing any unnecessary columns
ready = use.drop(columns=['NAM_EN_REF','NAME_AR','PCODE','ADM0_EN','ADM0_AR','ADM0_PCODE','UPDATE_DAT'])

The total number of days from the first date of death to the last date of death is 2,686 days. Therefore, we created one column for each day, numbered from 0 to 2,686. (Note: this may take a while to finish running.)

In [None]:
# remove the "year-month-day time" and replace it with the "day"
for i in range(0, 2687):
    ready = ready.rename(index=str, columns={ready.columns[i]: str(i)})
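
Most of the running time here comes from calling `rename` once per column. As a sketch of a faster alternative, all the columns can be renamed in a single call by building the mapping first. The toy dataframe below stands in for `ready`, with three day columns instead of 2,687 and strings in place of real geometries.

```python
import pandas as pd

# Toy stand-in for `ready`: three date columns plus a geometry column.
ready_demo = pd.DataFrame(
    [[1, 0, 3, 'polygon-a'], [0, 2, 0, 'polygon-b']],
    index=['Aleppo', 'Homs'],
    columns=['2011-03-18', '2011-03-19', '2011-03-20', 'geometry'],
)

n_days = 3  # 2687 in the real dataset

# Map each of the first n_days column labels to its position, as a string.
mapping = {old: str(i) for i, old in enumerate(ready_demo.columns[:n_days])}
ready_demo = ready_demo.rename(columns=mapping)

print(ready_demo.columns.tolist())
```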

In [None]:
# Pandas dataframe to GeoDataFrame
from geopandas import GeoDataFrame
from shapely.geometry import Point

geometry = ready['geometry']
crs = {'init': 'epsg:4326'}
day_gdf = GeoDataFrame(ready, crs=crs, geometry=geometry)

In [None]:
# make 'province' a column
day_gdf = day_gdf.reset_index()

The problem with our visualization is that most of the daily death counts are in the single to double digits. Very rarely does the number of deaths on a single day reach the triple digits. However, the color bar is split uniformly, and we weren't able to find documentation on how to split the Bokeh color bar unevenly, so we brute-forced the colors in the color bar.<br><br>
What we wanted was for the hue to change more quickly between 0 and 100 deaths than between 101 and 600, so that there would be a bigger variety of colors on the map at any given time.

In [None]:
# specify slider columns
slider_columns = []
for i in range (0, 2687):
    slider_columns.append(str(i))

slider_range = range(0, 2687)

# make slider plot
day_gdf.plot_bokeh(
    figsize=(900, 600),
    slider=slider_columns,
    slider_range=slider_range,
    slider_name="Day",
    # brute force color bar for map
    colormap=['#edf8f3', '#dcf2e8', '#cbebdd', '#b9e5d2', '#a8dfc7', '#97d8bc', '#85d2b1', '#74cba6', '#63c59b', '#52bf90',
              '#52bf90', '#49ab81', '#419873', '#398564', '#317256', '#295f48', '#204c39', '#18392b', '#18392b', '#18392b', 
              '#10261c', '#10261c', '#10261c', '#10261c', '#10261c', '#10261c', '#10261c', '#10261c', '#10261c', '#10261c', 
              '#0a1812', '#0a1812', '#0a1812', '#0a1812', '#0a1812', '#0a1812', '#0a1812', '#0a1812', '#0a1812', '#0a1812',
              '#08140f', '#08140f', '#08140f', '#08140f', '#08140f', '#08140f', '#08140f', '#08140f', '#08140f', '#08140f', 
              '#07110c', '#07110c', '#07110c', '#07110c', '#07110c', '#07110c', '#07110c', '#07110c', '#07110c', '#07110c', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', 
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
              '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e', '#08130e',
    ],
    hovertool_columns=["province"],
    title="Deaths in Syria",
)
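
A palette like the one above does not have to be typed by hand. The sketch below generates a list of the same length (260 colors) by interpolating between a light and a dark hex color over the first few bins and then repeating the darkest color. The helper names and the choice of 30 gradient bins are assumptions, so the resulting colors are similar to, but not identical to, the hand-written list.

```python
def hex_to_rgb(value):
    """'#edf8f3' -> (237, 248, 243)"""
    value = value.lstrip('#')
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return '#{:02x}{:02x}{:02x}'.format(*rgb)


def front_loaded_palette(light, dark, n_gradient, n_total):
    """Spend n_gradient bins going from light to dark, then hold the dark color."""
    start, end = hex_to_rgb(light), hex_to_rgb(dark)
    gradient = []
    for step in range(n_gradient):
        t = step / (n_gradient - 1)
        mixed = tuple(round(s + (e - s) * t) for s, e in zip(start, end))
        gradient.append(rgb_to_hex(mixed))
    return gradient + [gradient[-1]] * (n_total - n_gradient)


palette = front_loaded_palette('#edf8f3', '#08130e', n_gradient=30, n_total=260)
print(len(palette), palette[0], palette[29], palette[-1])
```

The resulting list can be passed as the `colormap` argument in the same way as the hand-written one.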

#### CSR

### Refugee Data

#### Monthly Inflows

#### Yearly Refugee Status

## Future Work

## Conclusion

## Appendix

## References