# Importing from Humanitarian Data Exchange (HUM) 

This script pulls data from the ArcGIS REST API. The data sits behind this dashboard: https://data.humdata.org/dataset/covid-19-global-travel-restrictions-and-airline-information

The data is split into two datasets:
- COVID-19 restrictions by country: this dataset shows current travel restrictions. Information is collected from various sources: IATA, media, national sources, WFP internal sources, or any other.
- COVID-19 airline restrictions information: this dataset shows restrictions taken by individual airlines or countries. Information is again collected from various sources, including WFP internal and public sources.

In [None]:
import requests
import json
import pandas as pd
import datetime

In [None]:
# papermill parameters
output_folder = '../output/'

In [None]:
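def get_df_from_arcgis_api(url):
    """Query an ArcGIS feature service and return the feature attributes as a DataFrame."""
    res = requests.get(url, timeout=60)
    res.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    json_response = res.json()
    # Each feature carries its column values in the "attributes" dict
    data = [feature["attributes"] for feature in json_response["features"]]
    return pd.DataFrame(data)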
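
To make the flattening step concrete, here is a minimal offline sketch. The payload below is hypothetical: its field values are invented, and only the `features` / `attributes` nesting mirrors what the function above reads. The `exceededTransferLimit` check is an optional addition, assuming the service sets that flag when it truncates a result; `attributes_to_df` is an illustrative helper name, not part of the notebook.

```python
import pandas as pd

# Hypothetical payload shaped like an ArcGIS feature-service query response.
# Field values are invented for illustration only.
sample_response = {
    "features": [
        {"attributes": {"adm0_name": "Exampleland", "iso3": "EXL", "info": "Borders closed"}},
        {"attributes": {"adm0_name": "Samplestan", "iso3": "SMP", "info": None}},
    ],
    "exceededTransferLimit": False,
}


def attributes_to_df(payload):
    """Collect each feature's "attributes" dict into one DataFrame row."""
    if payload.get("exceededTransferLimit"):
        # Assumption: the server sets this flag when it returns only part of the result.
        print("warning: result truncated by the server")
    return pd.DataFrame([feature["attributes"] for feature in payload["features"]])


sample_df = attributes_to_df(sample_response)
print(sample_df.columns.tolist())
```

Each key of an `attributes` dict becomes a column, so the raw column names (`adm0_name`, `iso3`, and so on) are exactly the field names the service returns.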
    

### COUNTRY RESTRICTIONS

In [None]:
url = "https://services3.arcgis.com/t6lYS2Pmd8iVx1fy/ArcGIS/rest/services/COVID_Travel_Restrictions_V2/FeatureServer/0/query?where=1%3D1&outFields=*&f=pjson"
countryDf = get_df_from_arcgis_api(url)

In [None]:
countryDf

### Data Quality
1. Rename columns
2. Filter rows on the "Sources" and "Info Date" columns, because the dataset contains many empty country rows
3. Drop unnecessary columns
4. Convert date strings to datetime format
5. Add a Last Update Date column

In [None]:
reNamedCountryDf = countryDf.rename(columns = {
                            'adm0_name': 'COUNTRY',
                            'iso3':'ISO3_COUNTRY_CODE',
                            'X': 'LON',
                            'Y': 'LAN',
                            'published':'PUBLISHED',
                            'sources': 'SOURCES',
                            'info': 'RESTRICTION_TEXT',
                            'optional1': 'INFO_DATE',
                            'optional2': 'QUARANTINE_TEXT'})

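# .copy() makes the filtered frame independent of reNamedCountryDf before columns are reassigned
cleanCountryDf = reNamedCountryDf[reNamedCountryDf['SOURCES'].notnull() & reNamedCountryDf['INFO_DATE'].notnull()].copy()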
cleanCountryDf = cleanCountryDf.drop(['optional3', 'ObjectId'], axis=1)
cleanCountryDf['PUBLISHED'] = pd.to_datetime(cleanCountryDf['PUBLISHED'].astype(str),format='%d.%m.%Y')
cleanCountryDf['INFO_DATE'] = pd.to_datetime(cleanCountryDf['INFO_DATE'].astype(str),format='%Y%m%d')
cleanCountryDf['LAST_UPDATE_DATE'] = datetime.datetime.utcnow()
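
The two `format` strings in the cell above can be checked on a toy example. This is a sketch with invented sample values, not data from the service. The `errors="coerce"` argument is an optional safeguard that the original cell does not use: it turns unparseable strings into `NaT` instead of raising.

```python
import pandas as pd

# Invented sample values in the two formats the cleaning cell expects.
published = pd.Series(["01.04.2020", "15.05.2020"])  # day.month.year
info_date = pd.Series(["20200401", "not a date"])    # yearmonthday, one bad value

published_parsed = pd.to_datetime(published, format="%d.%m.%Y")
# errors="coerce" is an assumption added for illustration:
# values that do not match the format become NaT.
info_parsed = pd.to_datetime(info_date, format="%Y%m%d", errors="coerce")

print(published_parsed.dt.date.tolist())
print(info_parsed.isna().tolist())
```

Without `errors="coerce"`, a single malformed value makes `pd.to_datetime` raise, which may be the preferred behaviour if bad dates should stop the import.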

In [None]:
cleanCountryDf.dtypes

In [None]:
cleanCountryDf.to_csv(output_folder + "HUM_RESTRICTIONS_COUNTRY.csv", index=False)
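
The export cell builds the file path by string concatenation, which relies on `output_folder` ending in a slash. Since the folder is a papermill parameter, a caller could pass it without one. A minimal sketch of a more tolerant join, assuming a hypothetical helper named `output_path` that is not part of the notebook:

```python
from pathlib import Path


def output_path(folder, filename):
    """Join folder and file name; works with or without a trailing slash."""
    return Path(folder) / filename


with_slash = output_path("../output/", "HUM_RESTRICTIONS_COUNTRY.csv")
without_slash = output_path("../output", "HUM_RESTRICTIONS_COUNTRY.csv")
print(with_slash == without_slash)
```

`DataFrame.to_csv` accepts a `Path` object, so the result can be passed to it directly.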

### AIRLINE RESTRICTIONS

In [None]:
url = "https://services3.arcgis.com/t6lYS2Pmd8iVx1fy/ArcGIS/rest/services/COVID_Airline_Information_V2/FeatureServer/0/query?where=1%3D1&outFields=*&f=pjson"
airlineDf = get_df_from_arcgis_api(url)

### Data Quality
1. Rename columns
2. Filter rows on the restriction text (the "info" column), because the dataset contains many empty rows
3. Drop unnecessary columns
4. Convert the published date string to datetime format
5. Add a Last Update Date column

In [None]:
reNamedAirlineDf = airlineDf.rename(columns = {
                            'adm0_name': 'COUNTRY',
                            'iso3':'ISO3_COUNTRY_CODE',
                            'X': 'LON',
                            'Y': 'LAN',
                            'published':'PUBLISHED',
                            'source': 'SOURCE',
                            'airline': 'AIRLINE',
                            'info': 'RESTRICTION_TEXT'})

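# .copy() makes the filtered frame independent of reNamedAirlineDf before columns are reassigned
cleanAirlineDf = reNamedAirlineDf[reNamedAirlineDf['RESTRICTION_TEXT'].notnull()].copy()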
cleanAirlineDf = cleanAirlineDf.drop(['optional1', 'optional2', 'optional3', 'ObjectId'], axis=1)
cleanAirlineDf['PUBLISHED'] = pd.to_datetime(cleanAirlineDf['PUBLISHED'].astype(str),format='%d.%m.%Y')
cleanAirlineDf['LAST_UPDATE_DATE'] = datetime.datetime.utcnow()

In [None]:
cleanAirlineDf.dtypes

In [None]:
cleanAirlineDf.to_csv(output_folder + "HUM_RESTRICTIONS_AIRLINE.csv", index=False)