## Demographic Shares

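Adds each demographic group's share of the total population to the demographics data. The shares are stored as fractions between 0 and 1, not as percentages.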

In [1]:
import numpy as np
import pandas as pd

We read the data from the CSV file and replace each asterisk, which masks values between 1 and 4, with 2. We then add the column 'Total', which we need to calculate the share of each demographic group.

In [2]:
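file = r'../data/unhcr_popstats_export_demographics_all_data.csv'
# header=3 takes the column names from the fourth line; the lines above it are skipped.
df = pd.read_csv(file, header=3)
# Asterisks mask values between 1 and 4; substitute 2 for them.
df = df.replace(to_replace='*', value='2')
# Convert the 16 demographic columns (positions 3 to 18) to numeric dtypes.
# Assigning by column label replaces the columns, so the converted dtypes are kept.
count_columns = df.columns[3:19]
df[count_columns] = df[count_columns].apply(pd.to_numeric)
df['Total'] = df['F: Total'] + df['M: Total']
df['Total'] = pd.to_numeric(df['Total']).astype(np.int64)
df.dtypes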

Year                                         int64
Country / territory of asylum/residence     object
Location Name                               object
Female 0-4                                 float64
Female 5-11                                float64
Female 5-17                                float64
Female 12-17                               float64
Female 18-59                               float64
Female 60+                                 float64
F: Unknown                                 float64
F: Total                                     int64
Male 0-4                                   float64
Male 5-11                                  float64
Male 5-17                                  float64
Male 12-17                                 float64
Male 18-59                                 float64
Male 60+                                   float64
M: Unknown                                 float64
M: Total                                     int64
Total                          
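
To illustrate the cleaning step on a small scale, the sketch below applies the same replace-then-convert pattern to a toy frame. The data and the names `toy`, `count_cols`, and `cleaned` are invented for illustration and are not part of the UNHCR export.

```python
import pandas as pd

# Toy data standing in for the real export; all values are made up.
toy = pd.DataFrame({
    "Female 0-4": ["*", "10", "4"],
    "F: Total": [2, 10, 5],
    "Male 0-4": ["3", "*", "7"],
    "M: Total": [3, 2, 7],
})

count_cols = ["Female 0-4", "Male 0-4"]

cleaned = toy.copy()
# Swap the masking asterisk for the string '2', then parse the strings as numbers.
cleaned[count_cols] = cleaned[count_cols].replace("*", "2").apply(pd.to_numeric)
# The row total is the sum of the female and male totals.
cleaned["Total"] = cleaned["F: Total"] + cleaned["M: Total"]

print(cleaned.dtypes)
```

Substituting 2 for every masked value is an approximation, so shares computed from small counts carry some error.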

We calculate the share of each demographic group and add a column for it (`<original name>_share`).

In [3]:
def calculate_share(demographic):
    """Add a '<demographic>_share' column to df: the group's count divided by the row total."""
    df[demographic + '_share'] = df[demographic] / df['Total']

In [4]:
for column in df.iloc[:, 3:19]:
    calculate_share(column)
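
The loop above can also be written as a single vectorized division. The sketch below is one possible alternative, not the notebook's method. It runs on a toy frame with invented values, and the names `toy`, `group_cols`, `safe_total`, `shares`, and `result` are hypothetical. It also treats a total of 0 as missing, on the assumption that such rows could occur. Whether the real data contains any is not checked here.

```python
import pandas as pd

# Toy data with invented values; the last row has a total of 0.
toy = pd.DataFrame({
    "F: Total": [2, 10, 0],
    "M: Total": [3, 2, 0],
})
toy["Total"] = toy["F: Total"] + toy["M: Total"]

group_cols = ["F: Total", "M: Total"]

# Keep only positive totals; zero totals become NaN so the share is NaN, not inf.
safe_total = toy["Total"].where(toy["Total"] > 0)

# Divide every group column by the row total and name the results '<column>_share'.
shares = toy[group_cols].div(safe_total, axis=0).add_suffix("_share")
result = pd.concat([toy, shares], axis=1)

print(result)
```

Because `div` with `axis=0` aligns on the row index, each row is divided by its own total, which matches what `calculate_share` does one column at a time.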

In [5]:
print(df)

       Year Country / territory of asylum/residence  \
0      2001                             Afghanistan   
1      2001                             Afghanistan   
2      2001                             Afghanistan   
3      2001                                  Angola   
4      2001                                  Angola   
5      2001                                  Angola   
6      2001                                  Angola   
7      2001                                  Angola   
8      2001                                  Angola   
9      2001                                  Angola   
10     2001                                  Angola   
11     2001                                  Angola   
12     2001                                 Albania   
13     2001                                 Albania   
14     2001                                 Albania   
15     2001                                 Albania   
16     2001                                 Albania   
17     200

We save the DataFrame as a CSV file.

In [6]:
df.to_csv('../data/unhcr_demographics_share.csv', encoding='utf-8', index=False)
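
A quick way to confirm the export is to read it back and check that the female and male total shares add up to 1 in every row. The sketch below assumes that check is wanted and uses an in-memory buffer and a toy frame instead of the real file. The values and the names `toy`, `buffer`, `restored`, and `share_sum` are invented for illustration.

```python
import io

import pandas as pd

# Toy frame standing in for the saved result; values are made up.
toy = pd.DataFrame({
    "Total": [5, 12],
    "F: Total_share": [0.4, 0.75],
    "M: Total_share": [0.6, 0.25],
})

# Write to an in-memory buffer instead of a file on disk, then read it back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The female and male total shares should sum to 1 for every row.
share_sum = restored["F: Total_share"] + restored["M: Total_share"]
print((share_sum - 1).abs().max())
```

With `index=False`, the row index is not written, so the restored frame has the same columns as the one that was saved.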