# Results Processing

This notebook processes the results of the Ensemble Model from `Ensemble-Attraction-Routing.ipynb` to estimate how many refugees from the conflict in Ukraine are predicted to travel to each of Ukraine's neighbors. Belarus and Russia are excluded from this analysis.

The results are calculated by splitting the total number of refugees who had left Ukraine (as of early April 2022) across the conflict locations provided by Brunel University. Each conflict location receives a share of that total proportional to its population. A location's allocated refugees equal its population multiplied by the ratio of total refugees to the combined population of all conflict locations that have a route. As a result, larger conflict locations (by population) account for more refugees in this model.
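Below is a minimal sketch of this proportional split. It uses hypothetical locations and a hypothetical refugee total, not the notebook's data.

```python
import pandas as pd

# Hypothetical conflict locations (illustration only, not the Brunel data)
locations = pd.DataFrame({
    "name": ["A", "B", "C"],
    "population": [500_000, 300_000, 200_000],
})
total_refugees = 100_000  # hypothetical total outflow

# Scale every population by the same factor (total refugees / combined population),
# so each location's allocation is proportional to its population and the
# allocations add up to the total.
locations["refugees"] = locations["population"] * (total_refugees / locations["population"].sum())

print(locations)
```

Location A holds half of the combined population, so it receives half of the refugees.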

In [1]:
import json
import pandas as pd
from IPython.display import display

## Read in locations and results data

First we load in the locations data from Brunel and the various results that were calculated by the ensemble model.

Note that the ensemble model was actually run 3 times:

1. Driving routes only
2. Transit routes only
3. Hybrid (transit, then falling back on driving)

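This notebook shows the results of all three approaches and demonstrates that the third, the hybrid approach, works best for Ukraine. The hybrid run is stored as two route files: one with the transit routes and one with the driving routes used as a fallback. These two files are combined below.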

In [2]:
df = pd.read_csv('inputs/locations.csv')
conflicts = df[df['location_type']=='conflict_zone']
camps = df[df['location_type']=='camp']
df.head()

|   | #name | region | country | latitude | longitude | location_type | conflict_date | population |
|---|---|---|---|---|---|---|---|---|
| 0 | Donetsk | Donetsk | Ukraine | 48.023 | 37.80224 | conflict_zone | 0.0 | 1024700.0 |
| 1 | Kadiyivka | Luhansk | Ukraine | 48.56818 | 38.64352 | conflict_zone | 1.0 | 84425.0 |
| 2 | Mariupol | Donetsk | Ukraine | 47.09514 | 37.54131 | conflict_zone | 3.0 | 481626.0 |
| 3 | Schastia | Luhansk | Ukraine | 48.7412 | 39.2354 | conflict_zone | 3.0 | 11743.0 |
| 4 | Uman | Cherkasy | Ukraine | 48.7484 | 30.2218 | conflict_zone | 5.0 | 87658.0 |


Now we read in our route results:

In [3]:
with open('outputs/ukraine_border_crossing_directions.json', 'r') as f:
    conflict_exit_routes = json.load(f)

with open('outputs/ukraine_border_crossing_directions_transit.json', 'r') as f:
    conflict_exit_routes_transit = json.load(f)

with open('outputs/ukraine_border_crossing_directions_transit_hybrid.json', 'r') as f:
    conflict_exit_routes_hybrid_transit = json.load(f)

with open('outputs/ukraine_border_crossing_directions_driving_hybrid.json', 'r') as f:
    conflict_exit_routes_hybrid_driving = json.load(f)

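Below we define two helper functions:

- `get_exit_route` selects the routes file for a given mode and takes the destination camp of the first route listed for each conflict location. It stores that camp's country in a new `{mode}_destination` column, or `None` when the location has no route.
- `transit_mixed` builds the hybrid result. It takes the hybrid transit destination when one exists and otherwise falls back to the hybrid driving destination.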
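The lookup `routes[row['#name']][0]['name']` implies a particular shape for the route JSON files. Each conflict-location name maps to a list of routes, and each route has a `"name"` field holding the destination camp. This shape is inferred from the code, not documented.

The sketch below uses small hypothetical route and camp tables to show that lookup. It dispatches through a dict keyed by mode, so an unknown mode fails loudly instead of silently yielding `None`. It also catches only the errors a missing route can raise.

```python
import pandas as pd

# Hypothetical route data in the shape implied by the lookup above:
# location name -> list of routes, first route's "name" is the destination camp.
routes_by_mode = {
    "driving": {"Donetsk": [{"name": "Camp X"}]},
    "transit": {"Donetsk": []},  # no transit route found for this location
}

# Hypothetical camps table mapping camp names to countries
camps = pd.DataFrame({"#name": ["Camp X"], "country": ["Moldova"]})

def destination_country(location, mode):
    """Return the country of the first route's camp, or None if there is no route."""
    if mode not in routes_by_mode:
        raise ValueError(f"unknown mode: {mode!r}")
    try:
        camp = routes_by_mode[mode][location][0]["name"]
        return camps.loc[camps["#name"] == camp, "country"].iloc[0]
    except (KeyError, IndexError):
        # KeyError: location or "name" missing; IndexError: empty route list or unknown camp
        return None

print(destination_country("Donetsk", "driving"))  # Moldova
print(destination_country("Donetsk", "transit"))  # None
```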

In [4]:
# Route files keyed by routing mode
ROUTES_BY_MODE = {
    'driving': conflict_exit_routes,
    'transit': conflict_exit_routes_transit,
    'hybrid_transit': conflict_exit_routes_hybrid_transit,
    'hybrid_driving': conflict_exit_routes_hybrid_driving,
}

def get_exit_route(row, mode):
    '''
    Attach the destination country of the first route for this conflict location,
    or None if the location has no route in the selected routes file.
    '''
    routes = ROUTES_BY_MODE[mode]  # raises KeyError for an unknown mode
    try:
        camp = routes[row['#name']][0]['name']
        dest = camps[camps['#name'] == camp].country.values[0]
    except (KeyError, IndexError, TypeError):
        dest = None
    row[f'{mode}_destination'] = dest
    return row

def transit_mixed(row):
    '''
    Pick transit when available, otherwise pick driving (from hybrid route options)
    '''
    if pd.isna(row.hybrid_transit_destination):
        dest = row.hybrid_driving_destination
    else:
        dest = row.hybrid_transit_destination
    row['hybrid_mixed_destination'] = dest
    return row


for mode in ['driving', 'transit', 'hybrid_transit', 'hybrid_driving']:
    conflicts = conflicts.apply(lambda row: get_exit_route(row, mode), axis=1)
conflicts = conflicts.apply(transit_mixed, axis=1)


display(pd.DataFrame(conflicts.groupby(['transit_destination']).country.count()))
display(pd.DataFrame(conflicts.groupby(['hybrid_driving_destination']).country.count()))
display(pd.DataFrame(conflicts.groupby(['hybrid_transit_destination']).country.count()))
display(pd.DataFrame(conflicts.groupby(['hybrid_mixed_destination']).country.count()))

| transit_destination | conflict locations |
|---|---|
| Moldova | 13 |
| Poland | 16 |


| hybrid_driving_destination | conflict locations |
|---|---|
| Moldova | 9 |
| Poland | 21 |


| hybrid_transit_destination | conflict locations |
|---|---|
| Moldova | 10 |
| Poland | 19 |


| hybrid_mixed_destination | conflict locations |
|---|---|
| Moldova | 19 |
| Poland | 40 |


The final results in the `hybrid_mixed_destination` table perform best. Most conflict locations (40 of the 59 routed locations) are sent to Poland, which received the most refugees according to the UNHCR counts used below. Note that these tables count conflict locations, not refugees.

`hybrid_mixed_destination` is the routing from our actual ensemble. It prefers transit routes and falls back on driving directions when no transit route is available.
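The row-wise `transit_mixed` fallback can also be written as one vectorized step with `Series.fillna`. The sketch below is an alternative, not the notebook's code. Its rows mirror a few entries of the `conflicts.head()` output.

```python
import pandas as pd

# A few rows mirroring conflicts.head(); None marks a location the run could not route
conflicts = pd.DataFrame({
    "#name": ["Donetsk", "Kadiyivka", "Schastia", "Uman"],
    "hybrid_transit_destination": [None, "Poland", None, "Moldova"],
    "hybrid_driving_destination": ["Moldova", None, "Poland", None],
})

# Prefer the transit destination; fill its gaps from the driving destination (index-aligned)
conflicts["hybrid_mixed_destination"] = conflicts["hybrid_transit_destination"].fillna(
    conflicts["hybrid_driving_destination"]
)

# Count conflict locations (not refugees) per destination country
print(conflicts["hybrid_mixed_destination"].value_counts())
```

The two hybrid columns fill in each other's gaps, so each location ends up with exactly one destination.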

The `conflicts` table now has new columns recording the destination country under each approach:

In [5]:
conflicts.head()

|   | #name | region | country | latitude | longitude | location_type | conflict_date | population | driving_destination | transit_destination | hybrid_transit_destination | hybrid_driving_destination | hybrid_mixed_destination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Donetsk | Donetsk | Ukraine | 48.023 | 37.80224 | conflict_zone | 0.0 | 1024700.0 | Moldova | | | Moldova | Moldova |
| 1 | Kadiyivka | Luhansk | Ukraine | 48.56818 | 38.64352 | conflict_zone | 1.0 | 84425.0 | Moldova | Poland | Poland | | Poland |
| 2 | Mariupol | Donetsk | Ukraine | 47.09514 | 37.54131 | conflict_zone | 3.0 | 481626.0 | Moldova | | | Moldova | Moldova |
| 3 | Schastia | Luhansk | Ukraine | 48.7412 | 39.2354 | conflict_zone | 3.0 | 11743.0 | Moldova | | | Poland | Poland |
| 4 | Uman | Cherkasy | Ukraine | 48.7484 | 30.2218 | conflict_zone | 5.0 | 87658.0 | Moldova | Moldova | Moldova | | Moldova |


## Results Calculations

We are now ready to estimate the number of refugees using each route. For this we rely on the UNHCR refugee counts file from April 11, 2022. The counts themselves are dated 10 April 2022:

In [6]:
df = pd.read_csv('inputs/unhcr_refugee_counts_4.11.22.csv')

# Basic data cleanup: population counts are stored as strings such as "2,622,117"
df.Population = df.Population.str.replace(',', '').astype(int)

# Remove Russia and Belarus (assign the result so the filter is kept)
df = df[~df['Location name'].isin(['Russian Federation', 'Belarus'])]
df

|   | Location name | Source | Data date | Population |
|---|---|---|---|---|
| 0 | Poland | Government | 10 Apr 2022 | 2622117 |
| 1 | Romania | Government | 10 Apr 2022 | 692501 |
| 2 | Hungary | Government | 10 Apr 2022 | 424367 |
| 3 | Republic of Moldova | Government | 10 Apr 2022 | 411365 |
| 5 | Slovakia | Government | 10 Apr 2022 | 317781 |

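As an alternative to the string cleanup in the cell above, `pd.read_csv` accepts a `thousands` argument that parses values like `"2,622,117"` as integers at read time. The sketch below uses a small hypothetical extract in the same column layout as the UNHCR file; the Belarus value is a placeholder. It also assigns the filtered frame so the exclusion of Russia and Belarus persists.

```python
import io

import pandas as pd

# Hypothetical extract in the UNHCR file's layout (Belarus value is a placeholder)
csv_text = '''Location name,Source,Data date,Population
Poland,Government,10 Apr 2022,"2,622,117"
Belarus,Government,10 Apr 2022,"1,234"
Slovakia,Government,10 Apr 2022,"317,781"
'''

# thousands=',' strips the separators, so Population is parsed as an integer column
unhcr = pd.read_csv(io.StringIO(csv_text), thousands=",")

# Assign the filtered frame so later steps never see Russia or Belarus
unhcr = unhcr[~unhcr["Location name"].isin(["Russian Federation", "Belarus"])]
print(unhcr["Population"].sum())
```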

Now we convert these routes into refugee counts for each routing option. For each option we keep only the conflict locations that were assigned a destination. We then scale each location's population by the ratio of total refugees to the combined population of those locations. This spreads the total refugee outflow across the conflict locations in proportion to their populations.

In every case, the refugees assigned to the destination countries must sum to the UNHCR total (excluding Russia and Belarus) as of 4/11/2022: `4468131`.
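The sketch below shows this renormalization with hypothetical populations and the UNHCR total above. Only routed locations enter the denominator. The unrouted location still receives a scaled value, but `groupby` drops missing keys by default, so the per-country totals still add up to the UNHCR total.

```python
import pandas as pd

# Hypothetical conflict locations; the third one has no route (destination is None)
conflicts = pd.DataFrame({
    "population": [600, 300, 100],
    "destination": ["Poland", "Moldova", None],
})
ref_total = 4_468_131  # UNHCR total excluding Russia and Belarus

# Denominator: combined population of the locations that have a destination
routed_total = conflicts.loc[conflicts["destination"].notna(), "population"].sum()
conflicts["pop_adjusted"] = conflicts["population"] * (ref_total / routed_total)

# groupby() drops the None group, so the per-country totals sum to ref_total
by_country = conflicts.groupby("destination")["pop_adjusted"].sum()
print(by_country.round())
```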

In [7]:
# Total refugees, excluding Russia and Belarus
ref_total = df[~df['Location name'].isin(['Russian Federation','Belarus'])].Population.sum()

# Combined population of the conflict locations that received a destination under each mode
conflict_total_driving = int(conflicts[conflicts['driving_destination'].notnull()].population.sum())
conflict_total_transit = int(conflicts[conflicts['transit_destination'].notnull()].population.sum())
conflict_total_hybrid_mixed = int(conflicts[conflicts['hybrid_mixed_destination'].notnull()].population.sum())

# Scale every population so the routed locations sum to ref_total. Unrouted locations
# also get a value, but groupby() drops their missing destination below.
conflicts['pop_adjusted_driving'] = conflicts.population * (ref_total/conflict_total_driving)
conflicts['pop_adjusted_transit'] = conflicts.population * (ref_total/conflict_total_transit)
conflicts['pop_adjusted_hybrid_mixed'] = conflicts.population * (ref_total/conflict_total_hybrid_mixed)

display(pd.DataFrame(conflicts.groupby(['driving_destination'])['pop_adjusted_driving'].sum()).round())
display(pd.DataFrame(conflicts.groupby(['transit_destination'])['pop_adjusted_transit'].sum()).round())
display(pd.DataFrame(conflicts.groupby(['hybrid_mixed_destination'])['pop_adjusted_hybrid_mixed'].sum()).round())

| driving_destination | pop_adjusted_driving |
|---|---|
| Moldova | 4264139.0 |
| Poland | 178004.0 |
| Romania | 25988.0 |


| transit_destination | pop_adjusted_transit |
|---|---|
| Moldova | 1625772.0 |
| Poland | 2842359.0 |


| hybrid_mixed_destination | pop_adjusted_hybrid_mixed |
|---|---|
| Moldova | 1286577.0 |
| Poland | 3181554.0 |


## Producing Final Results

Now we can look at the refugee shares produced by our ensemble model. Dividing each country's predicted refugee count from above by the total gives the share (fraction) of refugees each neighboring country is predicted to receive:
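For the hybrid counts in the table above, the calculation reduces to a single division. The sketch below also expresses the result as percentages.

```python
import pandas as pd

# Predicted refugee counts from the hybrid_mixed table above
counts = pd.Series({"Moldova": 1286577.0, "Poland": 3181554.0})

# Each country's share is its count divided by the total across countries
shares = counts / counts.sum()
percentages = (shares * 100).round(1)

print(shares.round(6))
print(percentages)
```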

In [8]:
transit_res = pd.DataFrame(conflicts.groupby(['transit_destination'])['pop_adjusted_transit'].sum()).round().reset_index()
transit_res = transit_res.rename(columns={'transit_destination': 'country'})

hybrid_res = pd.DataFrame(conflicts.groupby(['hybrid_mixed_destination'])['pop_adjusted_hybrid_mixed'].sum()).round().reset_index()
hybrid_res = hybrid_res.rename(columns={'hybrid_mixed_destination': 'country'})

transit_res['transit_predicted_shares'] = transit_res.pop_adjusted_transit/transit_res.pop_adjusted_transit.sum()
display(transit_res)

hybrid_res['hybrid_mixed_predicted_shares'] = hybrid_res.pop_adjusted_hybrid_mixed/hybrid_res.pop_adjusted_hybrid_mixed.sum()
display(hybrid_res)

|   | country | pop_adjusted_transit | transit_predicted_shares |
|---|---|---|---|
| 0 | Moldova | 1625772.0 | 0.36386 |
| 1 | Poland | 2842359.0 | 0.63614 |


|   | country | pop_adjusted_hybrid_mixed | hybrid_mixed_predicted_shares |
|---|---|---|---|
| 0 | Moldova | 1286577.0 | 0.287945 |
| 1 | Poland | 3181554.0 | 0.712055 |


Now we read in the results of our `Attraction Model` so that we can compare the refugee shares predicted by the ensemble approach with those of the pure `Attraction Model`:
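The comparison relies on a left merge. Every country in the Attraction Model results is kept, and countries the ensemble never routes to end up with `NaN`, which `fillna(0)` turns into a zero share. The sketch below uses small hypothetical tables.

```python
import pandas as pd

# Hypothetical Attraction Model shares covering every neighbor
attraction = pd.DataFrame({
    "country": ["Hungary", "Moldova", "Poland"],
    "predicted_shares": [0.2, 0.3, 0.5],
})

# Hypothetical ensemble shares, covering only countries some route reached
ensemble = pd.DataFrame({
    "country": ["Moldova", "Poland"],
    "hybrid_mixed_predicted_shares": [0.29, 0.71],
})

# how="left" keeps all Attraction Model countries; unmatched ones get NaN -> 0
compare = attraction.merge(ensemble, on="country", how="left").fillna(0)
print(compare)
```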

In [9]:
# Load in the Attraction Model results
ukr_model_results = pd.read_csv('outputs/ukraine_model_results.csv')
results = ukr_model_results[['country', 'predicted_shares']]

# Merge them with the results of our various ensemble models;
# countries the ensemble never routes to get a share of 0
results = pd.merge(results, hybrid_res, on='country', how='left')
results = pd.merge(results, transit_res, on='country', how='left')
results = results.fillna(0)

# Load in the actual refugee counts from UNHCR and merge them with our results
df_ = df[~df['Location name'].isin(['Russian Federation', 'Belarus'])]
df_ = df_.replace('Republic of Moldova', 'Moldova')\
    .rename(columns={'Location name': 'country', 'Population': 'refugees_actual'})[['country', 'refugees_actual']]

results = pd.merge(results, df_, how='left', on='country')

display(results)

|   | country | predicted_shares | pop_adjusted_hybrid_mixed | hybrid_mixed_predicted_shares | pop_adjusted_transit | transit_predicted_shares | refugees_actual |
|---|---|---|---|---|---|---|---|
| 0 | Hungary | 0.20307 | 0.0 | 0.0 | 0.0 | 0.0 | 424367 |
| 1 | Moldova | 0.233816 | 1286577.0 | 0.287945 | 1625772.0 | 0.36386 | 411365 |
| 2 | Poland | 0.422941 | 3181554.0 | 0.712055 | 2842359.0 | 0.63614 | 2622117 |
| 3 | Romania | 0.352537 | 0.0 | 0.0 | 0.0 | 0.0 | 692501 |
| 4 | Slovakia | 0.311319 | 0.0 | 0.0 | 0.0 | 0.0 | 317781 |


### Interpretation

The `pop_adjusted_hybrid_mixed` column gives the total refugees going to Poland (`3.2M`) and Moldova (`1.3M`) under our best model. That model combines transit routes (when possible) with driving routes (when needed), weighted by a decay function that accounts for the "attractiveness" of those countries to refugees.

## Optional: Testing other Weighting Functions

We conclude by presenting the refugee migration flow estimates for two models:

1. `pop_adjusted_hybrid_mixed` is the total refugee flow predicted by the hybrid model, where the min(duration) decision function is parameterized by the inverse of country-level attraction.
2. `predicted_final_naive_weighted` is the total refugee flow predicted by blending two sets of shares with equal weights of 0.5: the raw Attraction Model shares and the min(transit duration) shares. The blended shares are then renormalized to sum to 1.
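Below is a minimal sketch of this naive blend, using hypothetical shares and a hypothetical refugee total. Because the raw attraction shares here sum to 1.5 while the transit shares sum to 1, the attraction component contributes 0.75 of the 1.25 blended mass (60%) before renormalization. Equal weights are therefore only "even" when both inputs sum to 1.

```python
import pandas as pd

# Hypothetical shares; raw attraction shares need not sum to 1
shares = pd.DataFrame({
    "country": ["Moldova", "Poland", "Romania"],
    "attraction": [0.4, 0.6, 0.5],  # sums to 1.5
    "transit": [0.3, 0.7, 0.0],     # sums to 1.0
})
total_refugees = 1_000_000  # hypothetical total

# Blend with equal 0.5 weights, then renormalize so the blended shares sum to 1
blend = 0.5 * shares["attraction"] + 0.5 * shares["transit"]
shares["blended_share"] = blend / blend.sum()
shares["predicted_refugees"] = shares["blended_share"] * total_refugees

print(shares.round(3))
```

In the results table above, the Attraction Model's `predicted_shares` sum to about 1.52. The same effect therefore applies there, with the attraction component carrying roughly 60% of the blend.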

In [10]:
# Normalize the Attraction Model shares so they sum to 1
results['refugees_predicted_share_norm'] = results['predicted_shares']/results['predicted_shares'].sum()

total_refugees = results.refugees_actual.sum()

# Naive blend: weight the raw Attraction Model shares and the transit shares 0.5 each
results['refugees_predicted_share_weighted'] = 0.5*results['predicted_shares'] + 0.5*results['transit_predicted_shares']

# Attraction-only refugee counts (computed from the raw, un-normalized shares)
results['refugees_predicted_attractions'] = (results['predicted_shares']*total_refugees).round().astype(int)

# Renormalize the blended shares so they sum to 1
results['refugees_predicted_share_weighted'] = results['refugees_predicted_share_weighted']/results['refugees_predicted_share_weighted'].sum()

# Convert shares into refugee counts (predicted_final_hybrid uses the normalized Attraction Model shares)
results['predicted_final_hybrid'] = results['refugees_predicted_share_norm'] * total_refugees
results['predicted_final_naive_weighted'] = results['refugees_predicted_share_weighted'] * total_refugees

display(results[['country','pop_adjusted_hybrid_mixed','predicted_final_naive_weighted']].round(0))

|   | country | pop_adjusted_hybrid_mixed | predicted_final_naive_weighted |
|---|---|---|---|
| 0 | Hungary | 0.0 | 359532.0 |
| 1 | Moldova | 1286577.0 | 1058173.0 |
| 2 | Poland | 3181554.0 | 1875082.0 |
| 3 | Romania | 0.0 | 624160.0 |
| 4 | Slovakia | 0.0 | 551184.0 |
