<a href="https://colab.research.google.com/github/iragca/DS313/blob/main/notebooks/Phase1-Group3-hdx.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Phase 1: API Selection
**DMA / ADBMS Final Project**

Members:


*   Chris Andrei Irag*
*   Hernel Juanico
*   Keith Laspoña
*   Airyll Sanchez
*   Kobe Marco Olaguir
*   Ruszed Jy Ayad

University of Science and Technology of Southern Philippines

*irag.chrisandrei@mailbox.org

## Introduction

The API we chose comes from the Humanitarian Data Exchange (HDX), run by the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). It provides live indicator data from their databases, and each request can return at most 10,000 rows. The APIs presented in this notebook give us workable data for answering specific humanitarian questions.

For example, we can analyze living conditions across countries and look for disparities between developed and developing countries in humanitarian responses to diseases, disasters, and related issues.

The image below is a one-page introduction to HDX.

![Human](https://github.com/iragca/DS313/blob/main/hdx.png?raw=true)

[Click this FAQ link for more information.](https://data.humdata.org/faq)

## Code

Before starting, we need to import the following libraries:

*   `pandas`: Used for transforming and viewing tabular data.
*   `requests`: Enables us to perform API requests to servers.
*   `json`:  Helps parse and work with JSON-structured data from server responses.
*   `hdx-python-api`: A Python library that provides access to published datasets on the Humanitarian Data Exchange (HDX).

In [None]:
import pandas as pd
import requests
import json

try:
  from hdx.api.configuration import Configuration
  from hdx.data.dataset import Dataset
except ModuleNotFoundError:
  !pip install hdx-python-api
  from hdx.api.configuration import Configuration
  from hdx.data.dataset import Dataset

Reading the documentation is essential when using their APIs. It provides informative and easy-to-understand content.

Documentation:
*   https://data.humdata.org/faqs/devs
*   https://hdx-python-api.readthedocs.io/en/latest/
*   https://github.com/OCHA-DAP/hdx-python-api

HAPI Documentation:
*   https://hdx-hapi.readthedocs.io/en/latest/
*   https://hapi.humdata.org/docs



### API Identifier

This serves as a formal identifier for the user running this script when accessing their database. Please ensure this Jupyter Notebook is used ethically and without any malicious intent.

In [None]:
url = 'https://hapi.humdata.org/api/v1/encode_app_identifier?application=dmaadbms&email=irag.chrisandrei%40mailbox.org'
response = requests.get(url)
key = json.loads(response.text)  # the response body is JSON; parse it into a Python dictionary
api_identifier = key['encoded_app_identifier']
api_identifier

'ZG1hYWRibXM6aXJhZy5jaHJpc2FuZHJlaUBtYWlsYm94Lm9yZw=='
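The identifier printed above looks like Base64 text. As a minimal sketch, decoding it with the standard library suggests that it packs the `application` and `email` values from the request into one string separated by a colon. This is an observation from this single value, not documented behavior of the service.

```python
import base64

# Identifier returned by the encode_app_identifier call above.
api_identifier = "ZG1hYWRibXM6aXJhZy5jaHJpc2FuZHJlaUBtYWlsYm94Lm9yZw=="

# Decode the Base64 text back into a readable string.
decoded = base64.b64decode(api_identifier).decode("utf-8")

# Assumption: the two parts are separated by the first colon.
application, email = decoded.split(":", 1)

print(decoded)  # dmaadbms:irag.chrisandrei@mailbox.org
```

Because the identifier can be decoded this easily, it is better treated as a label for the caller than as a secret.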

In [None]:
api_endpoint = f"https://hapi.humdata.org/api/v1/themes/3w?app_identifier={api_identifier}" # this variable is never used in this notebook

### Indicator APIs

These APIs give access to each table in the indicator database, which presumably holds live data.


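Run the following cell to query the database for a specific table using default parameters.
Note that the URL targets the `affected-people/returnees` endpoint, so the data returned is the returnees table, even though the variables carry the `aphn` prefix of the "Affected People: Humanitarian Needs" table.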

In [None]:
aphn_url = "https://hapi.humdata.org/api/v1/affected-people/returnees?output_format=csv&app_identifier=ZG1hYWRibXM6aXJhZy5jaHJpc2FuZHJlaUBtYWlsYm94Lm9yZw%3D%3D&limit=10000&offset=0"

aphn_response = requests.get(aphn_url)
aphn_response

<Response [200]>

To run a specific query instead, edit and run the following cell.

[Reference material for each parameter](https://hapi.humdata.org/docs#/Affected%20People/get_humanitarian_needs_api_v1_affected_people_humanitarian_needs_get)

In [None]:
try:
  params = {
      # "application": "dmaadbms",
      # "email": "irag.chrisandrei@mailbox.org",
      # "category": None,
      # "sector_code": None,
      # "population_status": None,
      # ...
      "output_format": "csv",
      "app_identifier": api_identifier,
      "limit": 3,
      "offset": 0
  }

  # Uncomment the next line to run a specific query. While it stays commented out,
  # aphn_base_url is undefined and the response from the previous cell is kept.
  # aphn_base_url = "https://hapi.humdata.org/api/v1/affected-people/humanitarian-needs"

  aphn_response = requests.get(aphn_base_url, params=params)
  print(aphn_response)
except NameError:
  # Only the missing variable is caught, so network errors are not hidden.
  print("Uncomment aphn_base_url if you want to run specific queries.")

Uncomment aphn_base_url if you want to run specific queries.
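Since each request returns at most 10,000 rows, a larger table has to be collected over several requests by raising `offset` in steps of `limit`. The sketch below shows one way to structure that loop. The names `fetch_all_rows`, `fetch_page`, `fake_fetch_page`, and `FAKE_TABLE` are hypothetical, and stopping when a page comes back shorter than `limit` is an assumption about how the end of a table can be detected, not documented API behavior.

```python
from typing import Callable, List


def fetch_all_rows(
    fetch_page: Callable[[int, int], List[dict]], limit: int = 10000
) -> List[dict]:
    """Collect rows page by page until a page comes back short."""
    rows: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        rows.extend(page)
        if len(page) < limit:  # assumed signal that this was the last page
            break
        offset += limit
    return rows


# Stand-in for a real request: serves slices of a fake 25-row table.
FAKE_TABLE = [{"row": i} for i in range(25)]


def fake_fetch_page(limit: int, offset: int) -> List[dict]:
    return FAKE_TABLE[offset:offset + limit]


all_rows = fetch_all_rows(fake_fetch_page, limit=10)
print(len(all_rows))  # 25
```

In this notebook, a real `fetch_page` would wrap `requests.get` with the `limit` and `offset` parameters used in the cell above and parse the response body into rows.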


The response body is CSV text. We can write it to a *.csv* file in the current working directory.

In [None]:
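# An explicit encoding keeps non-ASCII location names intact; newline='' avoids doubled line endings on Windows.
with open('aphn_response.csv', 'w', encoding='utf-8', newline='') as f:
  f.write(aphn_response.text)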

Read the saved *.csv* file as a pandas dataframe.

In [None]:
data = pd.read_csv('aphn_response.csv')
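Saving to disk is optional. As a minimal sketch, the CSV text can also be read straight from memory with `io.StringIO`, and date columns can be parsed at the same time. The `csv_text` string below is a hypothetical stand-in for `aphn_response.text`, reusing a few column names and values from the returnees table.

```python
import io

import pandas as pd

# Hypothetical stand-in for aphn_response.text.
csv_text = (
    "population_group,gender,age_range,population,reference_period_start\n"
    "RET,f,0-4,40227,2003-01-01 00:00:00\n"
    "RET,f,0-4,80027,2004-01-01 00:00:00\n"
)

# Wrap the text in a file-like object and parse the date column while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["reference_period_start"])

print(sample.dtypes)
```

Parsing the dates on load avoids leaving the reference period columns as plain `object` strings.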

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   resource_hdx_id         10000 non-null  object 
 1   origin_location_ref     10000 non-null  int64  
 2   asylum_location_ref     10000 non-null  int64  
 3   population_group        10000 non-null  object 
 4   gender                  10000 non-null  object 
 5   age_range               10000 non-null  object 
 6   min_age                 7699 non-null   float64
 7   max_age                 6165 non-null   float64
 8   population              10000 non-null  int64  
 9   reference_period_start  10000 non-null  object 
 10  reference_period_end    10000 non-null  object 
 11  origin_location_code    10000 non-null  object 
 12  origin_location_name    10000 non-null  object 
 13  asylum_location_code    10000 non-null  object 
 14  asylum_location_name    10000 non-null 

In [None]:
data

Unnamed: 0,resource_hdx_id,origin_location_ref,asylum_location_ref,population_group,gender,age_range,min_age,max_age,population,reference_period_start,reference_period_end,origin_location_code,origin_location_name,asylum_location_code,asylum_location_name
0,295cd9e4-8464-43ee-ad17-47196991a1f7,1,1,RET,f,0-4,0.0,4.0,0,2001-01-01 00:00:00,2001-12-31 00:00:00,AFG,Afghanistan,AFG,Afghanistan
1,295cd9e4-8464-43ee-ad17-47196991a1f7,1,1,RET,f,0-4,0.0,4.0,0,2002-01-01 00:00:00,2002-12-31 00:00:00,AFG,Afghanistan,AFG,Afghanistan
2,295cd9e4-8464-43ee-ad17-47196991a1f7,1,1,RET,f,0-4,0.0,4.0,40227,2003-01-01 00:00:00,2003-12-31 00:00:00,AFG,Afghanistan,AFG,Afghanistan
3,295cd9e4-8464-43ee-ad17-47196991a1f7,1,1,RET,f,0-4,0.0,4.0,80027,2004-01-01 00:00:00,2004-12-31 00:00:00,AFG,Afghanistan,AFG,Afghanistan
4,295cd9e4-8464-43ee-ad17-47196991a1f7,1,1,RET,f,0-4,0.0,4.0,71692,2005-01-01 00:00:00,2005-12-31 00:00:00,AFG,Afghanistan,AFG,Afghanistan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,295cd9e4-8464-43ee-ad17-47196991a1f7,196,196,RDP,f,12-17,12.0,17.0,0,2004-01-01 00:00:00,2004-12-31 00:00:00,SRB,Serbia,SRB,Serbia
9996,295cd9e4-8464-43ee-ad17-47196991a1f7,196,196,RDP,f,12-17,12.0,17.0,0,2005-01-01 00:00:00,2005-12-31 00:00:00,SRB,Serbia,SRB,Serbia
9997,295cd9e4-8464-43ee-ad17-47196991a1f7,196,196,RDP,f,12-17,12.0,17.0,98,2006-01-01 00:00:00,2006-12-31 00:00:00,SRB,Serbia,SRB,Serbia
9998,295cd9e4-8464-43ee-ad17-47196991a1f7,196,196,RDP,f,12-17,12.0,17.0,102,2007-01-01 00:00:00,2007-12-31 00:00:00,SRB,Serbia,SRB,Serbia


### Datasets acquisition

This API is for acquiring published datasets. These datasets are separate from the HAPI indicator database and come as *.csv* files.

In [None]:
try:
  Configuration.create(hdx_site="prod", user_agent="A_Quick_Example", hdx_read_only=True) #default config
except Exception as e:
  print(e)

dataset = Dataset.read_from_hdx("world-bank-combined-indicators-for-philippines") #indicator data
resources = dataset.get_resources()

Configuration already created!


In [None]:
print(
    type(resources),
    len(resources)
    )

<class 'list'> 2


What we get is a list whose length varies from dataset to dataset.

Each element behaves like a Python dictionary.

In [None]:
for i in resources[0]:
  print(i, '---', resources[0][i])

alt_url --- https://data.humdata.org/dataset/bca4e35b-ac20-4d64-b6f7-34a63257ae8a/resource/e1c1e4f9-3026-46d5-b810-52fdac6811df/download/
cache_last_updated --- None
cache_url --- None
created --- 2019-11-22T13:56:45.248693
dataset_preview_enabled --- False
datastore_active --- False
description --- HXLated csv containing Economic, Social, Environmental, Health, Education, Development and Energy indicators
download_url --- https://data.humdata.org/dataset/bca4e35b-ac20-4d64-b6f7-34a63257ae8a/resource/e1c1e4f9-3026-46d5-b810-52fdac6811df/download/indicators_phl.csv
format --- CSV
fs_check_info --- [{"state": "processing", "message": "The processing of the file structure check has started", "timestamp": "2024-03-27T20:45:08.789553"}, {"state": "success", "message": "File structure check completed", "timestamp": "2024-03-27T20:45:13.465770", "sheet_changes": [{"name": "__DEFAULT__", "event_type": "spreadsheet-sheet-changed", "changed_fields": [{"field": "nrows", "new_value": 85429, "new_d
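Because each resource exposes keys such as `format` and `download_url`, the list can be filtered before anything is downloaded. The sketch below uses plain dictionaries as hypothetical stand-ins for the resources; the example URLs are placeholders, not real HDX links.

```python
# Hypothetical stand-ins for the resource dictionaries; only the keys used here are included.
sample_resources = [
    {"format": "CSV", "download_url": "https://example.org/indicators.csv"},
    {"format": "XLSX", "download_url": "https://example.org/indicators.xlsx"},
]

# Keep only the download links of resources published as CSV.
csv_urls = [
    resource["download_url"]
    for resource in sample_resources
    if resource["format"].upper() == "CSV"
]

print(csv_urls)  # ['https://example.org/indicators.csv']
```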

We read each resource into a DataFrame and collect the DataFrames in one list for easy access.

In [None]:
# One DataFrame per resource, in the same order as the resources list.
dfs = [pd.read_csv(resource['url']) for resource in resources]

In [None]:
dfs[0]

Unnamed: 0,Country Name,Country ISO3,Year,Indicator Name,Indicator Code,Value
0,#country+name,#country+code,#date+year,#indicator+name,#indicator+code,#indicator+value+num
1,Philippines,PHL,2021,Fertilizer consumption (% of fertilizer produc...,AG.CON.FERT.PT.ZS,329.546723904379
2,Philippines,PHL,2020,Fertilizer consumption (% of fertilizer produc...,AG.CON.FERT.PT.ZS,634.156607196609
3,Philippines,PHL,2019,Fertilizer consumption (% of fertilizer produc...,AG.CON.FERT.PT.ZS,1403.63254077089
4,Philippines,PHL,2018,Fertilizer consumption (% of fertilizer produc...,AG.CON.FERT.PT.ZS,366.21123428811
...,...,...,...,...,...,...
86207,Philippines,PHL,1981,Travel services (% of commercial service exports),TX.VAL.TRVL.ZS.WT,22.4543080939948
86208,Philippines,PHL,1980,Travel services (% of commercial service exports),TX.VAL.TRVL.ZS.WT,26.3591433278418
86209,Philippines,PHL,1979,Travel services (% of commercial service exports),TX.VAL.TRVL.ZS.WT,26.9535673839185
86210,Philippines,PHL,1978,Travel services (% of commercial service exports),TX.VAL.TRVL.ZS.WT,25.990099009901
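Row 0 of the output above holds HXL hashtags (`#country+name`, `#date+year`, and so on) rather than data, which forces every column to be read as text. A minimal sketch of one way to handle this is to skip that line while reading. The `csv_text` below is a hypothetical sample in the same layout, and `Sample indicator` and `SAMPLE.CODE` are made-up values.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout: header, HXL hashtag row, then data.
csv_text = (
    "Country Name,Country ISO3,Year,Indicator Name,Indicator Code,Value\n"
    "#country+name,#country+code,#date+year,#indicator+name,#indicator+code,#indicator+value+num\n"
    "Philippines,PHL,2021,Sample indicator,SAMPLE.CODE,1.5\n"
    "Philippines,PHL,2020,Sample indicator,SAMPLE.CODE,2.5\n"
)

# skiprows=[1] drops the second line of the file (the hashtag row) and keeps the header.
clean = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

print(clean.dtypes)
```

With the hashtag row gone, `Year` and `Value` are inferred as numeric columns instead of strings.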
