This notebook explores how the `hdx-python-api` library can be used to download the national and subnational administrative boundaries published by the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) on [https://data.humdata.org/](https://data.humdata.org/).

The library documentation is available at [HDX Python API](https://hdx-python-api.readthedocs.io/en/latest/).

In [1]:
import pandas as pd

In [2]:
from pprint import pprint

According to the documentation, we first need to create a configuration object.

In [3]:
from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset
from hdx.utilities.easy_logging import setup_logging

Configuration.create(hdx_site="prod", user_agent="MyOrg_MyProject", hdx_read_only=True)



'https://data.humdata.org'

We're interested in the subnational boundaries, a product of the Common Operational Datasets (COD) initiative.

In [4]:
datasets = Dataset.search_in_hdx("subnational boundaries")

The names of COD administrative boundary datasets start with `cod-ab`.

In [5]:
ochas = [
    dataset for dataset in datasets if (dataset['name'].startswith('cod-ab'))
]

ocha_names = sorted([ocha['name'] for ocha in ochas])

In [6]:
print('Total OCHA Administrative Boundaries:', len(ochas))

Total OCHA Administrative Boundaries: 165
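
In the names seen in this notebook, the `cod-ab` prefix is followed by a lowercase three-letter country code, as in `cod-ab-abw` for Aruba. The following is a minimal sketch for pulling that code out of a dataset name. It assumes the pattern `cod-ab-<three letters>` with an optional suffix such as `-geoservices`, and `country_code` is a hypothetical helper, not part of `hdx-python-api`:

```python
import re

# Assumed naming pattern: "cod-ab-" + three-letter country code + optional suffix.
_COD_AB_NAME = re.compile(r"^cod-ab-([a-z]{3})(?:-(.+))?$")


def country_code(dataset_name):
    """Return the upper-case country code from a COD-AB dataset name, or None."""
    match = _COD_AB_NAME.match(dataset_name)
    return match.group(1).upper() if match else None


for name in ["cod-ab-abw", "cod-ab-tur-geoservices", "some-other-dataset"]:
    print(name, "->", country_code(name))
```

Names that do not follow the assumed pattern return `None`, which makes unexpected entries easy to spot.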


We'll create a dataframe with the information of the products, including the API's `Dataset` objects. Datasets can include many `Resources`, so we need to find which ones correspond to the administrative boundaries.

See the documentation for [Dataset](https://hdx-python-api.readthedocs.io/en/latest/api-documentation/dataset/) and [Resource](https://hdx-python-api.readthedocs.io/en/latest/api-documentation/resource/).

In [7]:
n_ocha_resources = []
for ocha in ochas: # Loop through dataset objects
    ocha_name = ocha['name']
    resources = ocha.get_resources()
    for r in resources:
        resource_name = r['name']
        description = r['description']
        format_type = r['format']
        url = r['url']
        n_ocha_resources.append((ocha_name, resource_name, r, description, format_type, url))

resources_df = pd.DataFrame(
    n_ocha_resources,
    columns = [
        'ocha_name','resource_name', 'resource_obj', 'description', 'format', 'url'
    ]
).sort_values('ocha_name')

In [8]:
resources_df.head()

| | ocha_name | resource_name | resource_obj | description | format | url |
|---|---|---|---|---|---|---|
| 548 | cod-ab-abw | abw_admbnda_adm0_2020.emf | [alt_url, cache_last_updated, cache_url, creat... | Aruba administrative level 0 boundary EMF file | EMF | https://data.humdata.org/dataset/5abb5b0c-f3e1... |
| 547 | cod-ab-abw | abw_admbnda_adm0_2020.zip | [alt_url, cache_last_updated, cache_url, creat... | Aruba administrative level 0 boundary shapefile | SHP | https://data.humdata.org/dataset/5abb5b0c-f3e1... |
| 957 | cod-ab-afg | COD_External/AFG_pcode (FeatureServer) | [cache_last_updated, cache_url, created, datas... | This service is intended as a labelling layer ... | Geoservice | https://codgis.itos.uga.edu/arcgis/rest/servic... |
| 956 | cod-ab-afg | COD_External/AFG_pcode (MapServer) | [cache_last_updated, cache_url, created, datas... | This service is intended as a labelling layer ... | Geoservice | https://codgis.itos.uga.edu/arcgis/rest/servic... |
| 955 | cod-ab-afg | COD_External/AFG_EN (MapServer) | [cache_last_updated, cache_url, created, datas... | AFG_EN - Common Operational Dataset - Administ... | Geoservice | https://codgis.itos.uga.edu/arcgis/rest/servic... |


We filter the data to keep only the shapefiles, so we can extract the boundary geometries for every administrative level.

In [9]:
resources_shps = resources_df[resources_df.format == 'SHP']

After filtering for shapefiles, there are three dataset names missing from our subset:

In [10]:
[name for name in ocha_names if name not in resources_shps.ocha_name.unique()]

['cod-ab-afg', 'cod-ab-tjk', 'cod-ab-tur-geoservices']

These correspond to Afghanistan, Tajikistan, and Turkey (GEOSERVICES). Afghanistan and Tajikistan only have Excel files listing their administrative boundaries, with no geometries associated with them.

Turkey's administrative boundaries are already present under `cod-ab-tur`, so we are not interested in `cod-ab-tur-geoservices`.

In [11]:
'cod-ab-tur' in resources_shps.ocha_name.unique()

True

Some resources are shapefiles but do not contain administrative boundaries: for example, capital city locations, population counts, or places of interest.

After visually checking the resource names and descriptions, we list the resources we don't want to download.

In [12]:
not_interesting_resources = [
    'ago_admbndp_admALL_gadm_ine_ocha_itos_20180904.zip',
    'ago_admbndl_admALL_gadm_ine_ocha_itos_20180904.zip',
    'Populated Places',
    'Capitals',
    'civ_admbndp_admALL_cntig_ocha_itos_20180706.zip',
    'egy_admbndp_admALL_capmas_itos_20170421.zip',
    'hnd_pplp_sinit_20160125.zip',
    'pplp_15m_esri.zip',
    'pak_admbndl_adm_LOC_wfp_itos_20220909.zip',
    'SAU_Places.zip',
    'syr_admin_shp_utf8_20230124.zip',
    'syr_admbnda_uncs_unocha_20201217.zip',
    'syr_admin_shp_utf8_20240711.zip',
    'turkey_centeralpoints_1_2.zip',
    'vut_admbndp_admALL_spc_itos_20180824.zip'
]
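
One way to apply such an exclusion list is a boolean mask on the resource name. The sketch below uses a small stand-in dataframe so it runs without contacting HDX. `demo_df`, `excluded_names`, and every row in them are illustrative rather than real HDX results, and the sketch assumes the list entries are compared against the `resource_name` column:

```python
import pandas as pd

# Stand-in for `resources_shps`; rows are illustrative, not real HDX output.
demo_df = pd.DataFrame(
    {
        "ocha_name": ["cod-ab-xxx", "cod-ab-xxx", "cod-ab-yyy"],
        "resource_name": ["xxx_admbnda_adm0.zip", "Capitals", "yyy_admbnda_adm1.zip"],
        "format": ["SHP", "SHP", "SHP"],
    }
)

# Shortened stand-in for `not_interesting_resources`.
excluded_names = ["Capitals", "Populated Places"]

# Keep only the rows whose resource name is NOT in the exclusion list.
boundaries_df = demo_df[~demo_df["resource_name"].isin(excluded_names)]
print(boundaries_df["resource_name"].tolist())
```

With the real data, the same mask would presumably be applied to `resources_shps` using `not_interesting_resources`.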

Finally, to download a resource we call its `download` method with a destination folder. It returns the source URL and the local file path:

In [13]:
res = resources_shps.resource_obj.values[1]

print(type(res))

<class 'hdx.data.resource.Resource'>


In [14]:
res.download('./tests')

('https://data.humdata.org/dataset/a62b2816-7d2e-4e07-a794-0a1dcb507092/resource/2c36fa8f-8349-4686-8a3b-bff57e6234e2/download/ago_adm_gadm_ine_ocha_20180904_shp.zip',
 './tests/ago_adm_gadm_ine_ocha_20180904_SHP.zip1.shp')
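
The download URL above points to a zip archive, so a possible next step is to list the shapefiles inside it and read the administrative level from each file name. The sketch below uses only the standard library and builds a tiny placeholder archive in memory, with empty files and made-up names, so it runs offline. It assumes member names contain `adm<digit>`, as in `abw_admbnda_adm0_2020`, and `list_admin_shapefiles` is a hypothetical helper:

```python
import io
import re
import zipfile

# Assumed convention: the administrative level appears as "adm<digit>" in the file name.
_ADMIN_LEVEL = re.compile(r"adm(\d)", re.IGNORECASE)


def list_admin_shapefiles(archive):
    """Map each .shp member of a zip archive to its admin level (None if absent)."""
    levels = {}
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            if member.lower().endswith(".shp"):
                match = _ADMIN_LEVEL.search(member)
                levels[member] = int(match.group(1)) if match else None
    return levels


# Build a placeholder archive in memory (empty files, made-up names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in [
        "xxx_admbnda_adm0_2020.shp",
        "xxx_admbnda_adm1_2020.shp",
        "xxx_admbnda_adm1_2020.dbf",
    ]:
        zf.writestr(name, b"")
buffer.seek(0)

print(list_admin_shapefiles(buffer))
```

For a real download, the local path returned by `res.download` could be passed in place of `buffer`, provided the file is a valid zip archive.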