# Outreach Data

This notebook only looks at the raw data: loading it and getting it into the right shape.

#### Requirements

*Note-to-self:* Here's a list of things to keep in mind when going through the data/analysis.

- Gaza, Tel Aviv, Israel, and Palestine need segmentation. (To be done by Natasa)
- Some projects/classifications will need links added.

In [2]:
# Project variables
data_dir = '../data/'

outreach_f = data_dir + 'BTL-outreach.json'

In [3]:
# Libraries
import pandas as pd
import json

%matplotlib inline

#### Extract Outreach data from ArcGIS data.

In [4]:
# Read in the ArcGIS data
with open(outreach_f, encoding='utf-8') as data_file:
    data = json.load(data_file)

In [5]:
data.keys()

dict_keys(['operationalLayers', 'baseMap', 'spatialReference', 'authoringApp', 'authoringAppVersion', 'version'])

In [17]:
oper_layers = data['operationalLayers'][0]

In [21]:
print("Layer keys: {}".format(oper_layers.keys()))

oper_layers['featureCollection']['layers']

Layer keys: dict_keys(['layerType', 'id', 'title', 'featureCollection', 'visibility', 'opacity'])


[{'featureSet': {'features': [{'attributes': {'Category': 'International Conference',
      'Country': 'Greece',
      'FID': 1,
      'Year': 2006,
      'link': 'https://iwp.uiowa.edu/programs/international-conferences/the-new-symposium/2006'},
     'geometry': {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
      'x': 2449028.4976999983,
      'y': 4721671.4454}},
    {'attributes': {'Category': 'Reading tour',
      'Country': 'Jordan ',
      'FID': 2,
      'Year': 2007,
      'link': 'https://iwp.uiowa.edu/programs/reading-abroad/2007-middle-east'},
     'geometry': {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
      'x': 4007501.269099999,
      'y': 3632748.677199997}},
    {'attributes': {'Category': 'Reading tour',
      'Country': 'Palestinian Territories',
      'FID': 3,
      'Year': 2007,
      'link': 'https://iwp.uiowa.edu/programs/reading-abroad/2007-middle-east'},
     'geometry': {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
   ...

In [23]:
layers = oper_layers['featureCollection']['layers']
len(layers)

1
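The cells above index `[0]` at two levels. That is safe for the feature layers here (`len(layers)` is 1), but anything in a second operational layer or feature layer would be silently skipped. A minimal sketch of a walker that collects features from every layer instead, assuming the key names seen in the output above (`iter_features` and the toy input are hypothetical, not part of the ArcGIS export):

```python
from typing import Any, Dict, Iterator


def iter_features(webmap: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every feature in an ArcGIS web map JSON, across all layers.

    Assumes the nesting seen in this export:
    operationalLayers -> featureCollection -> layers -> featureSet -> features.
    Operational layers without a featureCollection are skipped.
    """
    for op_layer in webmap.get("operationalLayers", []):
        collection = op_layer.get("featureCollection")
        if collection is None:
            continue
        for layer in collection.get("layers", []):
            yield from layer.get("featureSet", {}).get("features", [])


# Hypothetical toy input with the same shape as BTL-outreach.json
toy = {
    "operationalLayers": [
        {"featureCollection": {"layers": [
            {"featureSet": {"features": [{"attributes": {"FID": 1}},
                                         {"attributes": {"FID": 2}}]}},
        ]}},
        {"title": "a layer without a featureCollection"},
    ]
}
toy_features = list(iter_features(toy))
print(len(toy_features))
```

On the real file, `list(iter_features(data))` should return the same 45 features as `layers[0]['featureSet']['features']` as long as there is only one layer.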

In [31]:
features = layers[0]['featureSet']['features']

print("{} features included.".format(len(features)))

45 features included.


In [32]:
features[0]

{'attributes': {'Category': 'International Conference',
  'Country': 'Greece',
  'FID': 1,
  'Year': 2006,
  'link': 'https://iwp.uiowa.edu/programs/international-conferences/the-new-symposium/2006'},
 'geometry': {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
  'x': 2449028.4976999983,
  'y': 4721671.4454}}
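Each feature's geometry is stored in Web Mercator (`wkid` 102100, `latestWkid` 3857), so `x` and `y` are metres, not degrees. If the coordinates are ever needed for mapping, the standard spherical Web Mercator inverse converts them to longitude/latitude. A sketch (the helper name is hypothetical):

```python
import math
from typing import Tuple

# Web Mercator treats the Earth as a sphere with the WGS 84 semi-major axis as radius.
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Convert Web Mercator (wkid 102100 / EPSG:3857) metres to (lon, lat) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Geometry of features[0] (Greece, 2006) from the output above
lon, lat = web_mercator_to_lonlat(2449028.4976999983, 4721671.4454)
print(round(lon, 2), round(lat, 2))
```

For `features[0]` this gives roughly 22° E, 39° N, which falls in Greece as expected.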

In [34]:
filt_features = [x['attributes'] for x in features]
df = pd.DataFrame(filt_features)

df.drop('FID', axis=1, inplace=True)
df.head()

|   | Category | Country | Year | link |
|---|---|---|---|---|
| 0 | International Conference | Greece | 2006 | https://iwp.uiowa.edu/programs/international-c... |
| 1 | Reading tour | Jordan | 2007 | https://iwp.uiowa.edu/programs/reading-abroad/... |
| 2 | Reading tour | Palestinian Territories | 2007 | https://iwp.uiowa.edu/programs/reading-abroad/... |
| 3 | Reading tour | Turkey | 2007 | https://iwp.uiowa.edu/programs/reading-abroad/... |
| 4 | International Conference | Greece | 2007 | https://iwp.uiowa.edu/programs/international-c... |
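The list comprehension above keeps only `attributes` and discards `geometry`. If the coordinates should travel with the table, `pd.json_normalize` (pandas 1.0 or later) can flatten both in one step. A sketch on two features copied from the output above (`sample_features`, `keep` and `df_geo` are illustrative names):

```python
import pandas as pd

# Two features copied from the output above (FID and geometry included)
sample_features = [
    {"attributes": {"Category": "International Conference", "Country": "Greece", "FID": 1, "Year": 2006,
                    "link": "https://iwp.uiowa.edu/programs/international-conferences/the-new-symposium/2006"},
     "geometry": {"spatialReference": {"latestWkid": 3857, "wkid": 102100},
                  "x": 2449028.4976999983, "y": 4721671.4454}},
    {"attributes": {"Category": "Reading tour", "Country": "Jordan ", "FID": 2, "Year": 2007,
                    "link": "https://iwp.uiowa.edu/programs/reading-abroad/2007-middle-east"},
     "geometry": {"spatialReference": {"latestWkid": 3857, "wkid": 102100},
                  "x": 4007501.269099999, "y": 3632748.677199997}},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. "attributes.Country" and "geometry.x".
flat = pd.json_normalize(sample_features)

# Keep the same columns as `df` (FID dropped), plus the Web Mercator coordinates.
keep = {
    "attributes.Category": "Category",
    "attributes.Country": "Country",
    "attributes.Year": "Year",
    "attributes.link": "link",
    "geometry.x": "x",
    "geometry.y": "y",
}
df_geo = flat[list(keep)].rename(columns=keep)
print(df_geo[["Country", "Year", "x", "y"]])
```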


#### Rename Countries to match participant naming.

In [36]:
df.Country.unique().tolist()

['Greece',
 'Jordan ',
 'Palestinian Territories',
 'Turkey',
 'Cyprus',
 'Oman',
 'Yemen',
 'Kenya',
 'China',
 'Tunisia',
 'Nepal',
 'Pakistan',
 'United Arab Emirates',
 'Afghanistan',
 'Uruguay',
 'Bolivia',
 'Mozambique',
 'Sudan',
 'South Sudan',
 'Uzbekistan',
 'Turkmenistan',
 'Burma',
 'Maldives',
 'Haiti',
 'Cuba',
 'Armenia',
 'Venezuela',
 'Colombia',
 'Ukraine',
 'Bahrain',
 'South Africa',
 'India',
 'Ecuador']
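The list above contains `'Jordan '` with a trailing space, which is why the alias table below needs entries that differ only by whitespace. A quick check for such values, as a sketch (`whitespace_variants` is a hypothetical helper and the sample Series is illustrative):

```python
import pandas as pd


def whitespace_variants(values):
    """Return the distinct string values that differ from their stripped form."""
    return sorted({v for v in values if isinstance(v, str) and v != v.strip()})


countries = pd.Series(["Greece", "Jordan ", "Palestinian Territories", "Jordan"])
print(whitespace_variants(countries))
```

Run on `df.Country`, this would flag `'Jordan '` before any alias mapping is written.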

In [39]:
country_aliases = {
    'USA': 'United States of America',
    'U.S.': 'United States of America',
    'US': 'United States of America',
    'Kuwait (but she is Lebanese)': 'Kuwait',
    'Syria ': 'Syria',
    'Palestinian Territories': 'Palestine',
    'Palestinian Territories ': 'Palestine',
    'Jordan ': 'Jordan',
    'Burma': 'Myanmar'
}

# Replace known aliases; leave every other name unchanged
df['Country'] = df['Country'].map(lambda x: country_aliases.get(x, x))
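The alias map above lists both `'Palestinian Territories'` and `'Palestinian Territories '`. Stripping whitespace before the lookup would make those duplicate keys unnecessary. A sketch of that variant, assuming surrounding whitespace is never meaningful in a country name (`normalize_country` and `ALIASES` are hypothetical names; `ALIASES` is the table above minus its whitespace-only entries):

```python
import pandas as pd

# country_aliases without 'Syria ', 'Palestinian Territories ' and 'Jordan ',
# which the strip step below already covers.
ALIASES = {
    "USA": "United States of America",
    "U.S.": "United States of America",
    "US": "United States of America",
    "Kuwait (but she is Lebanese)": "Kuwait",
    "Palestinian Territories": "Palestine",
    "Burma": "Myanmar",
}


def normalize_country(series: pd.Series) -> pd.Series:
    """Strip surrounding whitespace, then map known aliases to canonical names."""
    stripped = series.str.strip()
    return stripped.map(lambda name: ALIASES.get(name, name))


raw = pd.Series(["Jordan ", "Palestinian Territories ", "Burma", "Greece"])
print(normalize_country(raw).tolist())
```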

In [40]:
df.Country.tolist()

['Greece',
 'Jordan',
 'Palestine',
 'Turkey',
 'Greece',
 'Cyprus',
 'Oman',
 'Yemen',
 'Greece',
 'Kenya',
 'China',
 'Tunisia',
 'China',
 'China',
 'Nepal',
 'Pakistan',
 'United Arab Emirates',
 'Afghanistan',
 'Uruguay',
 'Bolivia',
 'Kenya',
 'China',
 'Mozambique',
 'Turkey',
 'Sudan',
 'South Sudan',
 'Uzbekistan',
 'Turkmenistan',
 'Myanmar',
 'Maldives',
 'Haiti',
 'Cuba',
 'China',
 'United Arab Emirates',
 'Turkey',
 'Armenia',
 'Venezuela',
 'Colombia',
 'Ukraine',
 'Jordan',
 'Bahrain',
 'Jordan',
 'South Africa',
 'India',
 'Ecuador']
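Since the point of renaming is to match the participant data, it may help to list outreach countries that still have no counterpart there. A sketch, where `participants` stands in for the country column of the participant dataset, which this notebook does not load (both sample lists below are hypothetical):

```python
def unmatched_countries(outreach_countries, participant_countries):
    """Return outreach country names that never appear in the participant data."""
    known = set(participant_countries)
    return sorted(set(outreach_countries) - known)


# Hypothetical sample names; the real lists come from df.Country and the participant data.
participants = ["Greece", "Jordan", "Palestine", "Myanmar"]
outreach = ["Greece", "Jordan", "Palestine", "Myanmar", "Burma"]
print(unmatched_countries(outreach, participants))
```

An empty result means every outreach country already uses a name the participant data knows about; anything listed is a candidate for `country_aliases`.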


#### Rename Columns

Something human-readable, preferably.

In [42]:
# Lowercase every column name; unlike assigning a list, this doesn't depend on column order
df = df.rename(columns=str.lower)

df.head(2)

|   | category | country | year | link |
|---|---|---|---|---|
| 0 | International Conference | Greece | 2006 | https://iwp.uiowa.edu/programs/international-c... |
| 1 | Reading tour | Jordan | 2007 | https://iwp.uiowa.edu/programs/reading-abroad/... |


---

In [44]:
outreach_canon_f = data_dir + 'canonical/outreach-programs.csv'
df.to_csv(outreach_canon_f, index=False)
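`to_csv` fails if the `canonical/` folder does not exist yet. A sketch of a writer that creates the folder first and reads the file back as a sanity check (`write_canonical` is a hypothetical helper; the round-trip comparison assumes the frame holds only string and integer columns, since floats or missing values can change dtypes on re-read):

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_canonical(df: pd.DataFrame, path) -> pd.DataFrame:
    """Write df to CSV (creating parent folders), then read it back and compare."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path)
    # Raises AssertionError if values, column names or dtypes changed on the way out
    pd.testing.assert_frame_equal(df.reset_index(drop=True), round_trip)
    return round_trip


# Illustrative one-row frame with the canonical column names
sample = pd.DataFrame({
    "category": ["Reading tour"],
    "country": ["Jordan"],
    "year": [2007],
    "link": ["https://iwp.uiowa.edu/programs/reading-abroad/2007-middle-east"],
})
with tempfile.TemporaryDirectory() as tmp:
    out = write_canonical(sample, Path(tmp) / "canonical" / "outreach-programs.csv")
print(out.shape)
```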