# Outreach Data

This notebook looks at the raw outreach data: downloading it and getting it into the right shape.

#### Requirements

*Note-to-self:* Here's a list of things to keep in mind when going through the data/analysis.

- Gaza, Tel Aviv, Israel, and Palestine need segmentation. (To be done by Natasa)
- Some projects/classifications will need links added.

In [2]:
# Project variables
data_dir = '../data/'

outreach_f = data_dir + 'IWP_map.csv'

In [3]:
# Libraries
import pandas as pd
import json

%matplotlib inline

#### Extract Outreach data from CSV

In [6]:
df = pd.read_csv(outreach_f)

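# len(df) counts rows. len(outreach_f) counts the 19 characters of the path
# string instead, which is what the recorded "19 rows" output reflects.
print("{} rows".format(len(df)))
print(df.columns.tolist())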
df.head(20)

19 rows
['Category', 'Country', 'Year', 'link']


Unnamed: 0,Category,Country,Year,link
0,International Conference,Greece,2006,https://iwp.uiowa.edu/programs/international-c...
1,Reading tour,Syria - Aleppo,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
2,Reading tour,Syria - Damascus,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
3,Reading tour,Jordan,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
4,Reading tour,Jerusalem,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
5,Reading tour,Palestinian Territories,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
6,Reading tour,Turkey,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
7,International Conference,Greece,2007,https://iwp.uiowa.edu/programs/international-c...
8,Reading tour,Cyprus,2008,http://iwp.uiowa.edu/programs/reading-abroad/2...
9,Reading tour,Oman,2008,http://iwp.uiowa.edu/programs/reading-abroad/2...
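
Some `Country` values in the preview combine a country and a city (`Syria - Aleppo`, `Syria - Damascus`). One possible way to separate them is sketched below. It assumes the ` - ` separator is used consistently, and the `country_name` and `city` column names are hypothetical, not part of the dataset.

```python
import pandas as pd

# Toy rows copied from the preview above; not the full dataset.
sample = pd.DataFrame(
    {'Country': ['Greece', 'Syria - Aleppo', 'Syria - Damascus', 'Jordan']}
)

# Split on the first ' - ' only; rows without a separator get a missing city.
parts = sample['Country'].str.split(' - ', n=1, expand=True)

# Hypothetical column names, chosen for illustration.
sample['country_name'] = parts[0].str.strip()
sample['city'] = parts[1].str.strip()

print(sample)
```

Whether the places listed in the requirements (Gaza, Tel Aviv, Israel, Palestine) follow the same pattern is not clear from the preview, so this may only cover part of the segmentation work.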


In [10]:
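# NOTE: `features` is not defined in any cell of this notebook as shown; it must
# already hold a list of dicts that each have an 'attributes' entry.
filt_features = [x['attributes'] for x in features]
df = pd.DataFrame(filt_features)

# Drop the feature ID column; it is not needed for the analysis
df = df.drop(columns='FID')
df.head()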

Unnamed: 0,Category,Country,Year,link
0,International Conference,Greece,2006,https://iwp.uiowa.edu/programs/international-c...
1,Reading tour,Jordan,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
2,Reading tour,Palestinian Territories,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
3,Reading tour,Turkey,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
4,International Conference,Greece,2007,https://iwp.uiowa.edu/programs/international-c...


#### Rename Countries to match participant naming.

In [11]:
df.Country.unique().tolist()

['Greece',
 'Jordan ',
 'Palestinian Territories',
 'Turkey',
 'Cyprus',
 'Oman',
 'Yemen',
 'Kenya',
 'China',
 'Tunisia',
 'Nepal',
 'Pakistan',
 'United Arab Emirates',
 'Afghanistan',
 'Uruguay',
 'Bolivia',
 'Mozambique',
 'Sudan',
 'South Sudan',
 'Uzbekistan',
 'Turkmenistan',
 'Burma',
 'Maldives',
 'Haiti',
 'Cuba',
 'Armenia',
 'Venezuela',
 'Colombia',
 'Ukraine',
 'Bahrain',
 'South Africa',
 'India',
 'Ecuador']
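
The list above contains `'Jordan '` with a trailing space, which is easy to miss by eye. A minimal sketch for flagging such values, run here on a toy Series rather than on `df`:

```python
import pandas as pd

# Toy values; 'Jordan ' mirrors the padded entry in the output above.
countries = pd.Series(['Greece', 'Jordan ', 'Turkey', 'Jordan '])

# A value is padded when stripping whitespace changes it.
has_padding = countries != countries.str.strip()
padded = countries[has_padding].unique().tolist()

# repr() makes the stray whitespace visible when printed.
print([repr(name) for name in padded])
```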

In [12]:
# name_sort aliases
name_sort = {
    'Russia': 'Russian Federation',
    #
    'USA': 'United States of America',
    'U.S.': 'United States of America',
    'US': 'United States of America',
    #
    'Burma': 'Myanmar',
    #
    'Kyrgyzstan': 'Kyrgyz Republic',
    #
    'Kuwait (but she is Lebanese)': 'Kuwait',
    #
    'Syria': 'Syrian Arab Republic',
    'Syria (Palestinian Syrian)': 'Syrian Arab Republic',
    #
    'Egypt': 'Egypt, Arab Rep.',
    #
    'Palestine': 'Palestine (West Bank and Gaza)',
    'West Bank': 'Palestine (West Bank and Gaza)',
    'Gaza': 'Palestine (West Bank and Gaza)',
    'Palestinian Territories': 'Palestine (West Bank and Gaza)',
    'Palestinian Territories (Gaza)': 'Palestine (West Bank and Gaza)',
    'Palestinian Territories (West Bank)': 'Palestine (West Bank and Gaza)',
    #
    'Venezuela': 'Venezuela, RB',
    #
    'Yemen': 'Yemen, Rep.'
    
}

# Strip surrounding whitespace (e.g. 'Jordan ') before the dict lookup
df['Country'] = df['Country'].str.strip()

# Replace aliased names; names without an alias are kept unchanged
df['Country'] = df['Country'].map(lambda x: name_sort.get(x, x))
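
To see which names the strip-then-lookup step actually changes, the before and after values can be put side by side. This is a sketch on toy data: `aliases` is a two-entry subset of `name_sort`, and `raw`, `mapped`, and `changed` are illustrative names.

```python
import pandas as pd

# Two entries taken from name_sort; the full mapping is defined in the cell above.
aliases = {'Burma': 'Myanmar', 'Yemen': 'Yemen, Rep.'}
raw = pd.Series(['Burma', 'Jordan ', 'Yemen', 'Greece'])

# Same two steps as the cell above: strip, then look up with a fallback.
cleaned = raw.str.strip()
mapped = cleaned.map(lambda name: aliases.get(name, name))

# Keep only the rows where cleaning or aliasing changed the value.
changed = pd.DataFrame({'before': raw, 'after': mapped})[raw != mapped]
print(changed)
```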

In [13]:
# Earlier alias table, kept for reference only and not executed.
# The `name_sort` mapping above is the one applied to df['Country'].
# country_aliases = {
#     'USA': 'United States of America',
#     'U.S.': 'United States of America',
#     'US': 'United States of America',
#     'Kuwait (but she is Lebanese)': 'Kuwait',
#     'Syria ': 'Syria',
#     'Palestinian Territories': 'Palestine',
#     'Palestinian Territories ': 'Palestine',
#     'Jordan ': 'Jordan',
#     'Burma': 'Myanmar'
# }
#
# df['Country'] = df['Country'].map(lambda x: country_aliases[x] if x in country_aliases else x)

In [14]:
df.Country.tolist()

['Greece',
 'Jordan',
 'Palestine (West Bank and Gaza)',
 'Turkey',
 'Greece',
 'Cyprus',
 'Oman',
 'Yemen, Rep.',
 'Greece',
 'Kenya',
 'China',
 'Tunisia',
 'China',
 'China',
 'Nepal',
 'Pakistan',
 'United Arab Emirates',
 'Afghanistan',
 'Uruguay',
 'Bolivia',
 'Kenya',
 'China',
 'Mozambique',
 'Turkey',
 'Sudan',
 'South Sudan',
 'Uzbekistan',
 'Turkmenistan',
 'Myanmar',
 'Maldives',
 'Haiti',
 'Cuba',
 'China',
 'United Arab Emirates',
 'Turkey',
 'Armenia',
 'Venezuela, RB',
 'Colombia',
 'Ukraine',
 'Jordan',
 'Bahrain',
 'Jordan',
 'South Africa',
 'India',
 'Ecuador']
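
Since the goal of this step is to match the participant naming, a set difference is one quick way to spot names that still disagree. The participant names below are hypothetical placeholders; in practice they would come from the participant dataset, which is not shown in this notebook.

```python
# A few names from the renamed list above.
outreach_countries = {'Greece', 'Jordan', 'Myanmar', 'Venezuela, RB'}

# HYPOTHETICAL participant names, for illustration only.
participant_countries = {'Greece', 'Jordan', 'Myanmar', 'Venezuela'}

# Names used in the outreach data that have no exact match among participants.
unmatched = sorted(outreach_countries - participant_countries)
print(unmatched)
```

Any name that shows up in `unmatched` would need either a new alias or a correction on the participant side.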

In [15]:
df.head(2)

Unnamed: 0,Category,Country,Year,link
0,International Conference,Greece,2006,https://iwp.uiowa.edu/programs/international-c...
1,Reading tour,Jordan,2007,https://iwp.uiowa.edu/programs/reading-abroad/...


#### Rename Columns

Something human-readable, preferably.

In [16]:
new_cols = ['category', 'country', 'year', 'link']
df.columns = new_cols

df.head(2)

Unnamed: 0,category,country,year,link
0,International Conference,Greece,2006,https://iwp.uiowa.edu/programs/international-c...
1,Reading tour,Jordan,2007,https://iwp.uiowa.edu/programs/reading-abroad/...
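
Assigning `df.columns` positionally depends on the column order staying the same. An alternative sketch using `DataFrame.rename` with an explicit mapping, shown on a one-row toy frame (the `link` value is a placeholder, not a real URL from the data):

```python
import pandas as pd

# Toy frame with the original column names; values are illustrative.
sample = pd.DataFrame({
    'Category': ['Reading tour'],
    'Country': ['Jordan'],
    'Year': [2007],
    'link': ['placeholder'],
})

# Map old names to new ones; columns not listed (here 'link') keep their name.
column_names = {'Category': 'category', 'Country': 'country', 'Year': 'year'}
renamed = sample.rename(columns=column_names)

print(renamed.columns.tolist())
```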


---

In [17]:
# Write the cleaned table to the canonical data folder, without the index column
outreach_canon_f = data_dir + 'canonical/outreach-programs.csv'
df.to_csv(outreach_canon_f, index=False)
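
A quick round trip can confirm that the file written with `index=False` reads back with the same columns and row count. The sketch below writes to an in-memory buffer instead of the real path, and uses a toy frame in place of `df`.

```python
import io

import pandas as pd

# Toy frame standing in for the cleaned data.
sample = pd.DataFrame({
    'category': ['Reading tour', 'International Conference'],
    'country': ['Jordan', 'Greece'],
    'year': [2007, 2006],
})

# Write without the index, as in the cell above, then read the text back.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

same_columns = reloaded.columns.tolist() == sample.columns.tolist()
same_length = len(reloaded) == len(sample)
print(same_columns, same_length)
```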