## Introduction ## 

This is a small notebook that prepares the data I will use in the summary introduction notebook as well as in the interactive Dash app.

It produces two dataframes:

- First, it attaches region metadata to the group-years data.
- Second, it creates a dataframe that counts the number of changes by group and by year.


In [1]:
import pandas as pd
import numpy as np


group_years = pd.read_csv("./data/group_years.csv")
print(group_years.shape)  ## expect 2180 x 20
# print(group_years.head())
print(list(group_years.columns))  ## includes readable group name and dataset ID



(2180, 20)
['Unnamed: 0', 'ucdp_name', 'ucdp_dset_id', 'year', 'modeledarticles', 'countT1', 'countT2', 'propT1', 'propT2', 'propdiff', 'propdif.L1', 'propdif.L2', 'delta1', 'delta1.5', 'delta2', 'delta1_L2', 'gap25', 'gap50', 'counter', 'frexWords']


In [2]:
## Load helper df that maps groups to regions of activity
## (Pulled out of UCDP GED)
region_key = pd.read_csv("./data/region_key.csv")

## ID groups associated with more than one region
## Call that "multiple"

duplicates = region_key[region_key.duplicated(subset=["side_b_dset_id"])]

region_key.loc[region_key['side_b_dset_id'].isin(
    duplicates["side_b_dset_id"].values), "region"] = "Multiple"

region_key = region_key[['side_b_dset_id', "region"]].drop_duplicates()

## Verify: should be 260 x 2, i.e. one row per unique group ID with its region

print(region_key.shape) ## 260 x 2

## Quick summary table:
print(region_key["region"].value_counts())


(260, 2)
region
Africa         111
Asia            75
Middle East     25
Europe          22
Multiple        14
Americas        13
Name: count, dtype: int64
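
One caveat with the `duplicated` approach above: it flags any group ID that appears on more than one row, even if every row lists the same region. If `region_key` could contain repeated identical (ID, region) pairs, a variant that counts distinct regions per group avoids mislabeling them. The sketch below runs on an invented toy frame (`toy_key` and its values are assumptions, not the real data):

```python
import pandas as pd

# Toy stand-in for region_key; IDs and rows are invented for illustration.
toy_key = pd.DataFrame({
    "side_b_dset_id": [1, 1, 2, 2, 3],
    "region": ["Africa", "Asia", "Europe", "Europe", "Americas"],
})

# Number of distinct regions per group, aligned back to every row.
n_regions = toy_key.groupby("side_b_dset_id")["region"].transform("nunique")

# Only groups seen in more than one distinct region become "Multiple".
toy_key.loc[n_regions > 1, "region"] = "Multiple"

# Collapse to one row per group.
toy_key = (
    toy_key[["side_b_dset_id", "region"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(toy_key)
```

Here group 2 stays "Europe" because both of its rows agree, whereas flagging on repeated IDs alone would relabel it "Multiple".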


In [3]:
## Merge in regions (default inner join: group IDs without a region match would be dropped)
group_years = group_years.merge(region_key,
                                left_on = "ucdp_dset_id",
                                right_on = "side_b_dset_id")

print(group_years.shape) ## 2180 x 22

(2180, 22)
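
The row count is unchanged at 2180, so every group-year found a region. To make that check explicit instead of relying on the shape, one option is a left join with pandas' `validate` and `indicator` arguments. This is a sketch on invented toy frames (`toy_years`, `toy_key`, and their values are assumptions), not the notebook's actual code:

```python
import pandas as pd

# Toy frames; values are invented, only the column names follow the notebook.
toy_years = pd.DataFrame({
    "ucdp_dset_id": [1, 1, 2, 3],
    "year": [2000, 2001, 2000, 2000],
})
toy_key = pd.DataFrame({
    "side_b_dset_id": [1, 2],
    "region": ["Africa", "Asia"],
})

merged = toy_years.merge(
    toy_key,
    how="left",               # keep every group-year, matched or not
    left_on="ucdp_dset_id",
    right_on="side_b_dset_id",
    validate="many_to_one",   # raises MergeError if a group ID repeats in toy_key
    indicator=True,           # adds a _merge column: "both" or "left_only"
)

# Group IDs that found no region in the key.
unmatched = merged.loc[merged["_merge"] == "left_only", "ucdp_dset_id"].unique()
print(len(merged), unmatched.tolist())
```

In the toy data, group 3 has no entry in the key, so it shows up in `unmatched` rather than silently disappearing from the result.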


## Save ##

Save the merged data so it can be imported into the analysis notebook and copied to the Dash app directory.

In [5]:
group_years.to_csv("./data/group_years_regions.csv", index = False)
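
The introduction also mentions a dataframe that counts changes by group and by year; the cells above do not show that step. A minimal sketch of one possible reading follows, computing per-group and per-year totals. It rests on a loud assumption: that `delta1` is a 0/1 flag marking a change in that group-year. The toy frame and the `n_changes` column name are likewise invented for illustration.

```python
import pandas as pd

# Toy stand-in for group_years; values are invented.
# ASSUMPTION: delta1 is a 0/1 flag marking a change in that group-year.
toy = pd.DataFrame({
    "ucdp_name": ["A", "A", "A", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001],
    "delta1": [0, 1, 1, 0, 1],
})

# Number of flagged changes per group.
changes_by_group = (
    toy.groupby("ucdp_name", as_index=False)["delta1"].sum()
    .rename(columns={"delta1": "n_changes"})
)

# Number of flagged changes per year.
changes_by_year = (
    toy.groupby("year", as_index=False)["delta1"].sum()
    .rename(columns={"delta1": "n_changes"})
)

print(changes_by_group)
print(changes_by_year)
```

If a different column (or a threshold on a continuous one) defines a change in the real data, only the flag column in the two `groupby` calls would need to change.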