**Downloading**

## Pre-requisites
Register your Google account for Earth Engine access at [https://code.earthengine.google.com](https://code.earthengine.google.com). Approval may take a couple of days. Without registration, the `ee.Initialize()` call below will raise an error.

## Instructions

This notebook exports Landsat satellite image composites of DHS and LSMS clusters from Google Earth Engine.

The images are saved in gzipped TFRecord format.
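
Once the exported files have been downloaded, a quick sanity check is to count the records in each file and compare the count against the chunk size. The sketch below is not part of the original notebook. It assumes the standard TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and reads it with the standard library only. The function names are hypothetical, and the CRCs are skipped rather than verified.

```python
import gzip
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of each record in an uncompressed TFRecord stream.

    CRC fields are read and discarded, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte CRC of the length
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise ValueError('truncated record header')
        (length,) = struct.unpack('<Q', header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # 4-byte CRC of the payload
        if len(payload) < length or len(footer) < 4:
            raise ValueError('truncated record')
        yield payload


def count_records(path: str) -> int:
    """Count the records in a gzipped TFRecord file."""
    with gzip.open(path, 'rb') as f:
        return sum(1 for _ in iter_tfrecord_payloads(f))
```

Calling `count_records(path)` on a downloaded file should return at most `CHUNK_SIZE`. Decoding the payloads into image bands would still require a TFRecord parser such as TensorFlow's.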

## Imports and Constants

In [11]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [12]:
import math
from typing import Any, Dict, Optional, Tuple
import ee
import pandas as pd
import ee_utils
from tqdm import tqdm

Before using the Earth Engine API, you must perform a one-time authentication that authorizes access to Earth Engine on behalf of the Google account you registered at [https://code.earthengine.google.com](https://code.earthengine.google.com). The authentication process saves a credentials file to `$HOME/.config/earthengine/credentials` for future use.

`ee.Authenticate()` runs the authentication process. The cell below calls `ee.Initialize()` first and falls back to `ee.Authenticate()` only if initialization fails, so you should not need to authenticate again unless you delete the credentials file. If you have never authenticated, `ee.Initialize()` on its own will fail.

For more information, see [https://developers.google.com/earth-engine/python_install-conda.html](https://developers.google.com/earth-engine/python_install-conda.html).
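
To see whether a credentials file is already in place before running the authentication flow, a check along the following lines may help. It is a minimal sketch that assumes the credentials path quoted above; the helper names are hypothetical, and other platforms or client versions may store the file elsewhere.

```python
from pathlib import Path


def credentials_path() -> Path:
    # Path taken from the text above; treat it as an assumption on other setups.
    return Path.home() / '.config' / 'earthengine' / 'credentials'


def has_saved_credentials() -> bool:
    """Return True if the Earth Engine credentials file exists."""
    return credentials_path().is_file()


print(f'{credentials_path()} exists: {has_saved_credentials()}')
```

A file at that path does not guarantee that the stored credentials are still valid, so `ee.Initialize()` remains the real test.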

In [13]:
try:
    ee.Initialize()
except Exception:
    # no usable credentials yet: run the one-time authentication, then retry
    ee.Authenticate()
    ee.Initialize()

## Constants

In [14]:
# ========== ADAPT THESE PARAMETERS ==========
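# Export destination settings. Note: export_images() below hard-codes
# export='drive' and bucket=None, so EXPORT and BUCKET are not read anywhere.
EXPORT = ''
BUCKET = None
# name of the destination folder for the exported files
DHS_EXPORT_FOLDER = ''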

In [15]:
# input data paths
DHS_CSV_PATH = '../data/dataset.csv'
# band names
MS_BANDS = ['BLUE', 'GREEN', 'RED', 'NIR', 'SWIR1', 'SWIR2', 'TEMP1']

# image export parameters
PROJECTION = 'EPSG:3857'  # see https://epsg.io/3857
SCALE = 30                # export resolution: 30m/px
EXPORT_TILE_RADIUS = 127  # image dimension = (2*EXPORT_TILE_RADIUS) + 1 = 255px
CHUNK_SIZE = 200          # max images per TFRecord file; set to a small number (<= 50) if Google Earth Engine reports memory errors
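
The tile-size comment above can be checked with a little arithmetic. The sketch below repeats the two constants and uses hypothetical helper names. The extent it prints is nominal, in the projected units of `EPSG:3857`, which only approximate ground metres away from the equator.

```python
SCALE = 30                # metres per pixel, as in the cell above
EXPORT_TILE_RADIUS = 127  # pixels on each side of the centre pixel


def tile_size_px(radius: int) -> int:
    """Side length in pixels: the centre pixel plus `radius` pixels each way."""
    return 2 * radius + 1


def tile_extent_m(radius: int, scale_m: float) -> float:
    """Nominal side length in projected metres."""
    return tile_size_px(radius) * scale_m


print(tile_size_px(EXPORT_TILE_RADIUS))          # 255
print(tile_extent_m(EXPORT_TILE_RADIUS, SCALE))  # 7650
```

Each exported image is therefore 255 x 255 pixels, covering a nominal 7.65 km on each side around the cluster location.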

In [16]:
# Drop surveys from 2007 and 2008, for which the Earth Engine nightlights provider has issues.
csv = pd.read_csv(DHS_CSV_PATH)
csv = csv[~csv['year'].isin([2007, 2008])]
csv = csv.rename(columns={'LATNUM': 'lat', 'LONGNUM': 'lon'})
# NOTE: this overwrites the input file in place
csv.to_csv(DHS_CSV_PATH, index=False)

In [17]:
csv = pd.read_csv(DHS_CSV_PATH)
csv.head()

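| Unnamed: 0 | country | year | cluster | lat | lon | households | wealthpooled |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | angola | 2011 | 1 | -12.350257 | 13.534922 | 36 | 2.312757 |
| 1 | angola | 2011 | 2 | -12.360865 | 13.551494 | 32 | 2.010293 |
| 2 | angola | 2011 | 3 | -12.613421 | 13.413085 | 36 | 0.877744 |
| 3 | angola | 2011 | 4 | -12.581454 | 13.397711 | 35 | 1.066994 |
| 4 | angola | 2011 | 5 | -12.578135 | 13.418748 | 37 | 1.750153 |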


## Export Images

In [18]:
def export_images(
        df: pd.DataFrame,
        country: str,
        year: int,
        export_folder: str,
        chunk_size: Optional[int] = 1,
        ) -> Dict[Tuple[Any, ...], ee.batch.Task]:
    '''
    Args
    - df: pd.DataFrame, contains columns ['lat', 'lon', 'country', 'year']
    - country: str, together with `year` determines the survey to export
    - year: int, together with `country` determines the survey to export
    - export_folder: str, name of folder for export
    - chunk_size: int, optionally set a limit to the # of images exported per TFRecord file
        - set to a small number (<= 50) if Google Earth Engine reports memory errors

    Returns: dict, maps task name tuple (export_folder, country, year, chunk) to ee.batch.Task
    '''
    subset_df = df[(df['country'] == country) & (df['year'] == year)].reset_index(drop=True)
    if chunk_size is None:
        chunk_size = len(subset_df)
    num_chunks = int(math.ceil(len(subset_df) / chunk_size))
    tasks = {}

    for i in range(num_chunks):
        chunk_slice = slice(i * chunk_size, (i+1) * chunk_size - 1)  # df.loc[] is inclusive
        fc = ee_utils.df_to_fc(subset_df.loc[chunk_slice, :])
        start_date, end_date = ee_utils.predictionyear_to_range(year)

        # create 3-year Landsat composite image
        roi = fc.geometry()
        imgcol = ee_utils.LandsatSR(roi, start_date=start_date, end_date=end_date).merged
        imgcol = imgcol.map(ee_utils.mask_qaclear).select(MS_BANDS)
        img = imgcol.median()

        # add nightlights, latitude, and longitude bands
        img = ee_utils.add_latlon(img)
        img = img.addBands(ee_utils.composite_nl(year))
        fname = f'{country}_{year}_{i:02d}'
        tasks[(export_folder, country, year, i)] = ee_utils.get_array_patches(
            img=img, scale=SCALE, ksize=EXPORT_TILE_RADIUS,
            points=fc, export='drive',
            prefix=export_folder, fname=fname,
            bucket=None)
    return tasks
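
The chunking logic in `export_images` can be looked at in isolation. The sketch below uses a hypothetical helper, `chunk_bounds`, to compute the inclusive first and last row labels of each chunk, plus the file names that the `{i:02d}` format produces. It clips the last chunk explicitly, whereas the notebook relies on `df.loc[]` tolerating an end label past the last row. It also guards the empty case, which cannot occur in the notebook because every survey comes from a `groupby` key.

```python
import math
from typing import List, Optional, Tuple


def chunk_bounds(n_rows: int, chunk_size: Optional[int]) -> List[Tuple[int, int]]:
    """Return (first_label, last_label) pairs, both inclusive, one per chunk."""
    if n_rows == 0:
        return []
    if chunk_size is None:
        chunk_size = n_rows  # a single chunk holding every row
    num_chunks = math.ceil(n_rows / chunk_size)
    return [
        (i * chunk_size, min((i + 1) * chunk_size, n_rows) - 1)
        for i in range(num_chunks)
    ]


bounds = chunk_bounds(450, 200)
print(bounds)  # [(0, 199), (200, 399), (400, 449)]
print([f'angola_2015_{i:02d}' for i in range(len(bounds))])
# ['angola_2015_00', 'angola_2015_01', 'angola_2015_02']
```

A survey with 450 clusters and `chunk_size=200` therefore yields three export tasks, the last one holding the remaining 50 clusters.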

In [19]:
dhs_df = pd.read_csv(DHS_CSV_PATH, float_precision='high', index_col=False)
dhs_surveys = list(dhs_df.groupby(['country', 'year']).groups.keys())
tasks = {}

In [21]:
for country, year in tqdm(dhs_surveys):
    if year >= 2014:
        print(country, year)
        new_tasks = export_images(
            df=dhs_df, country=country, year=year,
            export_folder=DHS_EXPORT_FOLDER, chunk_size=CHUNK_SIZE)
        tasks.update(new_tasks)

angola 2015
benin 2017
burkina_faso 2014
burkina_faso 2017
cameroon 2018
ethiopia 2016
ethiopia 2019
ghana 2014
ghana 2016
ghana 2019
guinea 2018
kenya 2014
kenya 2015
lesotho 2014
madagascar 2016
malawi 2014
malawi 2015
malawi 2017
mali 2015
mali 2018
mozambique 2015
mozambique 2018
nigeria 2015
nigeria 2018
rwanda 2015
rwanda 2019
senegal 2014
senegal 2015
senegal 2016
senegal 2017
senegal 2018
sierra_leone 2016
sierra_leone 2019
tanzania 2015
tanzania 2017
togo 2017
uganda 2014
uganda 2016
uganda 2018
zambia 2018
zimbabwe 2015
100%|██████████| 85/85 [02:20<00:00,  1.65s/it]
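
The loop above iterates over all 85 surveys and applies the `year >= 2014` test inside the loop body, so the `tqdm` total includes surveys that are never exported. One alternative is to filter the survey list first. The sketch below assumes a hypothetical helper, `surveys_from`, and uses a three-survey sample taken from the outputs in this notebook.

```python
from typing import List, Tuple

Survey = Tuple[str, int]  # (country, year)


def surveys_from(surveys: List[Survey], min_year: int) -> List[Survey]:
    """Keep only the surveys conducted in `min_year` or later."""
    return [(country, year) for country, year in surveys if year >= min_year]


sample = [('angola', 2011), ('angola', 2015), ('benin', 2017)]
print(surveys_from(sample, 2014))  # [('angola', 2015), ('benin', 2017)]
```

Passing the filtered list to `tqdm` would make the progress bar count only the surveys that produce export tasks.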


Check on the status of each export task at [https://code.earthengine.google.com/](https://code.earthengine.google.com/), or run the following cell, which polls the tasks at the interval set by `poll_interval` (180 here). Once all tasks have completed, download the DHS TFRecord files.

In [None]:
ee_utils.wait_on_tasks(tasks, poll_interval=180)
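
The source of `ee_utils.wait_on_tasks` is not shown here, so the following is only a sketch of what such a polling loop might look like, not its actual implementation. It assumes each task exposes a `status()` method returning a dict with a `'state'` key, as `ee.batch.Task` does, and it treats `COMPLETED`, `FAILED`, and `CANCELLED` as terminal states. The name `poll_until_done` and the injectable `sleep` argument are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable

TERMINAL_STATES = {'COMPLETED', 'FAILED', 'CANCELLED'}


def poll_until_done(
        tasks: Dict[Hashable, Any],
        poll_interval: float = 180,
        sleep: Callable[[float], None] = time.sleep,
        ) -> Dict[Hashable, str]:
    """Poll every task until each reaches a terminal state.

    Returns a dict mapping each task name to its final state.
    """
    final_states: Dict[Hashable, str] = {}
    pending = dict(tasks)
    while pending:
        for name in list(pending):
            state = pending[name].status()['state']
            if state in TERMINAL_STATES:
                final_states[name] = state
                del pending[name]
        if pending:
            sleep(poll_interval)  # wait before the next round of status checks
    return final_states
```

With the `tasks` dict built above, `poll_until_done(tasks)` would return once every export has finished, and any entry whose final state is not `COMPLETED` would need to be re-submitted.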