4/7: this is the current code to download WorldPop population density data for 2020 (the most recent year available at 1 km resolution).

This should not need frequent updates, but any update will require changing several of the code cells below (see the comments in the code and the Markdown cells below).
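One way to reduce the number of cells that need editing for a new release is to keep every release-specific setting in one place. The sketch below is a hypothetical refactor, not what this notebook currently does: the `WorldPopConfig` class and its field names are assumptions, and its defaults are simply the values hard-coded in the cells below.

```python
from dataclasses import dataclass


# Hypothetical container for the settings that change between WorldPop releases.
# Defaults mirror the values currently hard-coded in this notebook.
@dataclass(frozen=True)
class WorldPopConfig:
    alias: str = 'wpic1km'      # dataset alias from the REST API listing
    year: str = '2020'          # year to download (stored as a string in the API response)
    year_key: str = 'popyear'   # key holding the year in each API entry
    files_key: str = 'files'    # key holding the list of download URLs
    base_url: str = 'https://www.worldpop.org/rest/data/pop'

    def query_url(self, iso3: str) -> str:
        """Build the per-country REST query URL in the same format as the download loop."""
        return f'{self.base_url}/{self.alias}?iso3={iso3}'


config = WorldPopConfig()
print(config.query_url('NGA'))
```

With this layout, a future release would ideally only need a new instance such as `WorldPopConfig(year='2025')`, plus any renamed keys, instead of edits scattered across several cells.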

In [2]:
import requests
import json
import pandas as pd
from urllib.request import urlretrieve

In [19]:
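# view available aliases (also serves as a test that the API endpoint is working correctly)
url = "https://www.worldpop.org/rest/data/pop/"

response = requests.get(url)
response.raise_for_status()  # stop here with an HTTPError if the endpoint is not responding normally

text = json.loads(response.text)
text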

{'data': [{'alias': 'pic', 'name': 'Individual countries'},
  {'alias': 'pop_continent', 'name': 'Whole Continent'},
  {'alias': 'wpgp',
   'name': 'Unconstrained individual countries 2000-2020 ( 100m resolution )'},
  {'alias': 'wpgp1km',
   'name': 'Unconstrained global mosaics 2000-2020 ( 1km resolution )'},
  {'alias': '', 'name': 'WP00643'},
  {'alias': 'wpgpunadj',
   'name': 'Unconstrained individual countries 2000-2020 UN adjusted ( 100m resolution )'},
  {'alias': 'wpic1km',
   'name': 'Unconstrained individual countries 2000-2020  ( 1km resolution )'},
  {'alias': 'wpicuadj1km',
   'name': 'Unconstrained individual countries 2000-2020 UN adjusted ( 1km resolution )'},
  {'alias': 'cic2020_100m',
   'name': 'Constrained Individual countries 2020 ( 100m resolution )'},
  {'alias': 'cic2020_UNadj_100m',
   'name': 'Constrained Individual countries 2020 UN adjusted  (100m resolution)'},
  {'alias': 'G2_UC_POP_2024_100m',
   'name': 'Unconstrained individual countries 2024 ( 100m 

The output above lists the available datasets, with their resolutions and time periods. We currently use `wpic1km` (unconstrained individual countries, 2000-2020, 1 km resolution); if a preferred version becomes available, change the alias in the cell below. The download loop reads the `alias` variable, so the API call picks up the new value automatically.
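To shortlist candidate aliases without reading the whole listing by eye, the parsed response can be filtered by keywords in the `name` field. A minimal sketch: `find_aliases` is a hypothetical helper, `sample` copies a few entries from the output above, and matching on substrings such as `'1km'` assumes the naming pattern of those entries continues in future releases.

```python
def find_aliases(listing, *keywords):
    """Return (alias, name) pairs whose name contains every keyword (case-insensitive).

    `listing` is the parsed JSON from https://www.worldpop.org/rest/data/pop/,
    i.e. a dict with a 'data' list of {'alias': ..., 'name': ...} entries.
    Entries with an empty alias are skipped.
    """
    matches = []
    for entry in listing['data']:
        name = entry.get('name', '')
        if entry.get('alias') and all(k.lower() in name.lower() for k in keywords):
            matches.append((entry['alias'], name))
    return matches


# a few entries copied from the API output above
sample = {'data': [
    {'alias': 'wpgp1km', 'name': 'Unconstrained global mosaics 2000-2020 ( 1km resolution )'},
    {'alias': '', 'name': 'WP00643'},
    {'alias': 'wpic1km', 'name': 'Unconstrained individual countries 2000-2020  ( 1km resolution )'},
    {'alias': 'wpicuadj1km', 'name': 'Unconstrained individual countries 2000-2020 UN adjusted ( 1km resolution )'},
]}

# loop variable is named alias_name so it does not overwrite the notebook's `alias`
for alias_name, name in find_aliases(sample, 'individual countries', '1km'):
    print(alias_name, '->', name)
```

In the notebook itself, the same call can be made on the `text` variable from the listing cell, e.g. `find_aliases(text, '1km')`.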

In [20]:
alias = 'wpic1km'

In [15]:
# read in country metadata
metadata = pd.read_excel('../Plan-EO_Country_meta-data.xlsx')
metadata = metadata[['Name', 'ISO2', 'ISO3']].copy()  # keep only the columns we need (.copy() avoids SettingWithCopyWarning)
metadata['Name'] = metadata['Name'].str.replace(' ', '_', regex=False)  # match Name to the folder names in Individual_country_data
metadata.head()

          Name ISO2 ISO3
0  Afghanistan   AF  AFG
1      Algeria   DZ  DZA
2       Angola   AO  AGO
3    Argentina   AR  ARG
4      Armenia   AM  ARM
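The download loop builds each destination path from the `ISO3` code, the underscored country name, and the file name at the end of the download URL. The sketch below reproduces that construction with `pathlib` and `urllib.parse`; `destination_path` is a hypothetical helper, the folder layout is copied from the loop, and the relative root `../Individual_country_data` is assumed to exist. Parsing the URL, rather than splitting on `/`, also drops any query string from the file name.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def destination_path(root, iso3, country, tif_url):
    """Return the local path where a country's WorldPop TIFF is stored.

    Mirrors the layout used in the download loop:
    <root>/<ISO3>_<Name>/04_Population_density_data/World_pop/<file name>
    """
    # take the file name from the URL path only, ignoring any query string
    file_name = PurePosixPath(urlparse(tif_url).path).name
    name = country.replace(' ', '_')  # same normalization applied to metadata['Name']
    return Path(root) / f'{iso3}_{name}' / '04_Population_density_data' / 'World_pop' / file_name


url = 'https://data.worldpop.org/GIS/Population/Global_2000_2020_1km/2020/NGA/nga_ppp_2020_1km_Aggregated.tif'
print(destination_path('../Individual_country_data', 'NGA', 'Nigeria', url).as_posix())
```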


This loop may need to be updated if the alias changes, specifically the lines that extract the download URL (starting with `tiff_urls = ...`), because a different dataset may use a different response structure. The troubleshooting section at the end of this notebook walks through how to determine the necessary changes.
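Because the `tiff_urls = ...` lines are the part most likely to change, they can be wrapped in a function whose year and key names are parameters, so that a new release only changes arguments. A sketch under that assumption: `extract_tif_url` is a hypothetical helper whose defaults (`'2020'`, `'popyear'`, `'files'`) are the values the loop currently uses. Unlike the list-indexing version, it raises a descriptive `ValueError` instead of a bare `IndexError` when nothing matches.

```python
def extract_tif_url(data, year='2020', year_key='popyear', files_key='files'):
    """Return the first .tif download URL for `year` from a WorldPop API 'data' list."""
    entries = [item for item in data if item.get(year_key) == year]
    if not entries:
        raise ValueError(f'no entry with {year_key}={year!r}')
    tif_urls = [url for url in entries[0].get(files_key, []) if url.endswith('.tif')]
    if not tif_urls:
        raise ValueError(f'no .tif file listed under {files_key!r} for {year}')
    return tif_urls[0]


# entry shaped like the Nigeria response shown in the troubleshooting section
sample_data = [{
    'popyear': '2020',
    'files': [
        'https://data.worldpop.org/GIS/Population/Global_2000_2020_1km/2020/NGA/nga_ppp_2020_1km_Aggregated.tif',
        'https://data.worldpop.org/GIS/Population/Global_2000_2020_1km/2020/NGA/nga_ppp_2020_1km_ASCII_XYZ.zip',
    ],
}]
print(extract_tif_url(sample_data))
```

For the hypothetical 2025 release described in the troubleshooting section, the call would become `extract_tif_url(data, year='2025', year_key='year', files_key='outputs')`.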

In [14]:
# loop through every country
for i in range(len(metadata)):

    # extract ISO3 code and country name
    iso3 = metadata.loc[i, 'ISO3']
    country = metadata.loc[i, 'Name']

    # set query URL (we are using WorldPop's REST API endpoint)
    url = f'https://www.worldpop.org/rest/data/pop/{alias}?iso3={iso3}'

    # load response and extract data; raise_for_status stops the loop on an HTTP error
    # instead of failing later with a less informative JSON or KeyError message
    response = requests.get(url)
    response.raise_for_status()
    text = json.loads(response.text)
    data = text['data']

    # extract the list of download URLs for 2020, then keep the GeoTIFF
    # (this may change if the alias changes, since a new dataset may use a different
    # response structure, but it is stable for now)
    tiff_urls = [item for item in data if item.get('popyear') == '2020'][0]['files']
    tiff_url = [item for item in tiff_urls if item.endswith('.tif')][0]

    # get end path (file name under which the downloaded TIFF will be stored)
    end_path = tiff_url.split('/')[-1]
    # option to print end_path to track progress
    # print(end_path)

    # build the destination path and download the TIFF into it
    path = f'../Individual_country_data/{iso3}_{country}/04_Population_density_data/World_pop/{end_path}'
    urlretrieve(tiff_url, path)

Burundi
Cambodia
Cameroon
Cape_Verde
Central_African_Republic
Chad
China
Colombia
Comoros
Congo
Costa_Rica
Cote_d'Ivoire
Cuba
Democratic_Republic_of_Congo
Djibouti
Dominica
Dominican_Republic
Ecuador
Egypt
El_Salvador
Equatorial_Guinea
Eritrea
Eswatini
Ethiopia
Fiji
Gabon
Gambia
Georgia
Ghana
Grenada
Guatemala
Guinea
Guinea-Bissau
Guyana
Haiti
Honduras
India
Indonesia
Iran
Iraq
Jamaica
Jordan
Kazakhstan
Kenya
Kiribati
Kyrgyzstan
Laos
Lebanon
Lesotho
Liberia
Libya
Madagascar
Malawi
Malaysia
Maldives
Mali
Marshall_Islands
Mauritania
Mauritius
Mexico
Micronesia
Mongolia
Morocco
Mozambique
Myanmar
Namibia
Nepal
Nicaragua
Niger
Nigeria
North_Korea
Pakistan
Palestine
Panama
Papua_New_Guinea
Paraguay
Peru
Philippines
Rwanda
Saint_Lucia
Saint_Vincent_and_the_Grenadines
Samoa
Sao_Tome_and_Principe
Senegal
Sierra_Leone
Solomon_Islands
Somalia
South_Africa
South_Sudan
Sri_Lanka
Sudan
Suriname
Syria
Tajikistan
Tanzania
Thailand
Timor-Leste
Togo
Tonga
Tunisia
Turkey
Turkmenistan
Tuvalu
Uganda
Uzbek
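A full run downloads one file per country, so if it is interrupted it helps to be able to rerun it without starting over. One option, sketched here as an assumption rather than the notebook's current behavior, is to skip files that already exist and to download into a temporary file first, so an interrupted download never leaves a truncated TIFF at the final path. `download_if_missing` is a hypothetical helper built on the same `urlretrieve` the loop uses; note that files already written by the original loop are not checked for completeness.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, path):
    """Download `url` to `path` unless a file already exists at `path`.

    Returns True if a download happened and False if the file was already present.
    Data is written to '<path>.part' first and renamed only after urlretrieve
    returns, so an interrupted run never leaves a partial file at `path`.
    Unlike the original loop, this also creates missing destination folders.
    """
    path = os.fspath(path)
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    tmp_path = path + '.part'
    urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)  # on POSIX, atomic when both paths are on the same file system
    return True
```

In the loop, `urlretrieve(tiff_url, path)` would be replaced by `download_if_missing(tiff_url, path)`; rerunning the cell then only fetches the countries that are still missing.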

## Troubleshooting Code: see below

Nothing below this line needs to be run unless the code above fails. Whenever a more recent release of population density data becomes available, this script will need to be modified; if it breaks after the alias is changed, the steps below should help locate and fix the problem.

In [4]:
# Step 1: call a single example from the API

# using example country Nigeria
ts_alias = 'wpic1km'
ts_iso3 = 'NGA'

ts_url = f'https://www.worldpop.org/rest/data/pop/{ts_alias}?iso3={ts_iso3}'

ts_response = requests.get(ts_url)
ts_text = json.loads(ts_response.text)
print(ts_text.keys())

dict_keys(['data'])


In [6]:
# the output above shows that the response is a dictionary with a single key, 'data',
# whose value is a list of entries (one per year); now we can look at the first entry

ts_data = ts_text['data']
print(ts_data[0])

{'id': '32948', 'title': 'The spatial distribution of population in 2000 Nigeria', 'desc': 'Estimated total number of people per grid-cell. The dataset is available to download in Geotiff and ASCII XYZ format at a resolution of 30 arc (approximately 1km at the equator). The projection is Geographic Coordinate System, WGS84. The units are number of people per pixel. The mapping approach is Random Forest-based dasymetric redistribution.', 'doi': '10.5258/SOTON/WP00670', 'date': '2020-06-22', 'popyear': '2000', 'citation': 'WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP

In [10]:
# each entry has a string value under the key 'popyear' that tells us which year the data is for,
# and a list value under the key 'files' that holds the available download links

# list all available years, then the most recent one
print([x['popyear'] for x in ts_data], '\n')
print(max([x['popyear'] for x in ts_data]))

['2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020'] 

2020
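`max` gives the right answer here only because every year is a four-digit string, where string and numeric ordering agree (as strings, `'999'` would sort after `'2000'`). Comparing the years as integers makes the choice explicit and also skips entries without a usable year. A sketch; `latest_year` is a hypothetical helper, and `year_key` defaults to the `'popyear'` key seen in this response.

```python
def latest_year(entries, year_key='popyear'):
    """Return the most recent year (as the original string) found under `year_key`.

    Entries whose value is missing or not purely numeric are ignored.
    """
    years = [item[year_key] for item in entries
             if str(item.get(year_key, '')).isdigit()]
    if not years:
        raise ValueError(f'no numeric {year_key!r} values found')
    return max(years, key=int)  # compare numerically, return the original string


sample_entries = [{'popyear': str(y)} for y in range(2000, 2021)] + [{'popyear': None}]
print(latest_year(sample_entries))
```

The result could then replace the hard-coded `'2020'` in the extraction step, e.g. `item.get('popyear') == latest_year(ts_data)`.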


In [12]:
# This is a good way to see all the available years, and the most recent year available

# Now, extract files for most recent available year:
ts_tiff_urls = [item for item in ts_data if item.get('popyear') == '2020'][0]['files']
print(ts_tiff_urls)

['https://data.worldpop.org/GIS/Population/Global_2000_2020_1km/2020/NGA/nga_ppp_2020_1km_Aggregated.tif', 'https://data.worldpop.org/GIS/Population/Global_2000_2020_1km/2020/NGA/nga_ppp_2020_1km_ASCII_XYZ.zip']


In [14]:
# here, we see that there is a ZIP file and a TIF file. We want the TIF file, so we further limit our search

ts_tiff_url = [item for item in ts_tiff_urls if item.endswith('.tif')][0]
print(ts_tiff_url)

https://data.worldpop.org/GIS/Population/Global_2000_2020_1km/2020/NGA/nga_ppp_2020_1km_Aggregated.tif


The step-by-step walkthrough above mirrors the logic of the main loop. If the main loop breaks during a data redownload, run these cells one at a time to find exactly where it fails. The same kinds of information (a year field and a list of file links) will likely still be present even if the response format changes, so inspect a single entry of `data` to find the new key names. For example, if `'popyear'` were renamed to `'year'`, `'files'` to `'outputs'`, and the newest available year in an updated version were 2025, the extraction lines would become:

`tiff_urls = [item for item in data if item.get('year') == '2025'][0]['outputs']`

`tiff_url = [item for item in tiff_urls if item.endswith('.tif')][0]`
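To find renamed keys without reading a long entry by hand, the structure of a single entry can also be inspected programmatically. The sketch below guesses which key holds the year and which holds the list of download links; `guess_keys` is hypothetical, and its heuristics (a four-digit numeric string for the year, a list containing a `.tif` URL for the files) are assumptions based on the current response format, not a documented schema. The sample entry uses the hypothetical `'year'`/`'outputs'` names from the example above, with placeholder `example.org` URLs.

```python
def guess_keys(entry):
    """Guess (year_key, files_key) for one WorldPop API entry.

    Heuristics (assumptions, not a documented schema):
    - the year key holds a four-digit numeric string, e.g. '2020'
    - the files key holds a list containing at least one URL ending in '.tif'
    Returns None in place of any key that cannot be identified.
    """
    year_key = files_key = None
    for key, value in entry.items():
        if year_key is None and isinstance(value, str) and len(value) == 4 and value.isdigit():
            year_key = key
        if files_key is None and isinstance(value, list) and any(
                isinstance(v, str) and v.endswith('.tif') for v in value):
            files_key = key
    return year_key, files_key


# hypothetical entry from a future release with renamed keys
sample_entry = {
    'id': '32948',
    'title': 'The spatial distribution of population in 2025 Nigeria',
    'year': '2025',
    'outputs': ['https://example.org/nga_2025.tif', 'https://example.org/nga_2025.zip'],
}
print(guess_keys(sample_entry))
```

For a quick manual check, `print(ts_data[0].keys())` lists every key in the first entry; the guessed names can then be confirmed by eye before editing the loop.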