**Downloading**

## Pre-requisites
Register a Google account for Earth Engine at [https://code.earthengine.google.com](https://code.earthengine.google.com). Approval may take a couple of days. Without registration, the `ee.Initialize()` call below raises an error.

## Instructions

This notebook exports Landsat satellite image composites of DHS and LSMS clusters from Google Earth Engine.

The images are saved in gzipped TFRecord format.
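
To make the file format concrete, here is a minimal sketch of reading a gzipped TFRecord file with only the standard library. It relies on the TFRecord framing (8-byte little-endian length, 4-byte CRC of the length, payload, 4-byte CRC of the payload). The helper name `iter_tfrecord_gz` is hypothetical, and the CRC fields are skipped rather than verified. In practice you would more likely read these files with `tf.data.TFRecordDataset(path, compression_type='GZIP')`.

```python
import gzip
import struct
from typing import Iterator


def iter_tfrecord_gz(path: str) -> Iterator[bytes]:
    """Yield the raw payload of each record in a gzipped TFRecord file.

    Hypothetical helper: CRC fields are read but NOT verified.
    """
    with gzip.open(path, 'rb') as f:
        while True:
            header = f.read(12)  # 8-byte record length + 4-byte CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError('truncated record header')
            (length,) = struct.unpack('<Q', header[:8])
            payload = f.read(length)
            footer = f.read(4)  # 4-byte CRC of the payload
            if len(payload) < length or len(footer) < 4:
                raise ValueError('truncated record')
            yield payload
```

Each yielded payload is a serialized record that still has to be parsed before the image bands can be used.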

## Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import math
from typing import Any, Dict, Optional, Tuple

import ee
import pandas as pd
from tqdm import tqdm

import ee_utils

Before using the Earth Engine API, you must perform a one-time authentication that authorizes access to Earth Engine on behalf of the Google account you registered at [https://code.earthengine.google.com](https://code.earthengine.google.com). The authentication process saves a credentials file to `$HOME/.config/earthengine/credentials` for future use.

The cell below first tries `ee.Initialize()` and runs the `ee.Authenticate()` flow only if initialization fails. Once you have authenticated successfully, you should not need to do so again unless you delete the credentials file. Without valid credentials, `ee.Initialize()` fails.

For more information, see [https://developers.google.com/earth-engine/python_install-conda.html](https://developers.google.com/earth-engine/python_install-conda.html).
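
As a quick way to see whether the one-time authentication has already been done, you can check for the credentials file at the path mentioned above. This is only a sketch: `has_ee_credentials` is a hypothetical helper, not part of the Earth Engine API, and the presence of the file does not guarantee that the stored credentials are still valid.

```python
from pathlib import Path
from typing import Optional


def has_ee_credentials(home: Optional[Path] = None) -> bool:
    """Return True if the Earth Engine credentials file exists under `home`.

    Hypothetical helper; `home` defaults to the current user's home directory.
    """
    home = Path.home() if home is None else Path(home)
    return (home / '.config' / 'earthengine' / 'credentials').is_file()


print('credentials file found:', has_ee_credentials())
```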

In [3]:
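try:
    # succeeds if a valid credentials file already exists
    ee.Initialize()
except Exception:
    # no usable credentials: run the one-time authentication, then retry
    ee.Authenticate()
    ee.Initialize()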

## Constants

In [4]:
# ========== ADAPT THESE PARAMETERS ==========
# Export destination. Note: export_images() below hard-codes export='drive'
# and bucket=None, so these two values are currently not used.
EXPORT = ''
BUCKET = None
# export location parameters
DHS_EXPORT_FOLDER = ''

In [5]:
# input data paths
DHS_CSV_PATH = '../data/dataset_tMM_00.csv'
# band names
MS_BANDS = ['BLUE', 'GREEN', 'RED', 'NIR', 'SWIR1', 'SWIR2', 'TEMP1']

# image export parameters
PROJECTION = 'EPSG:3857'  # see https://epsg.io/3857
SCALE = 30                # export resolution: 30m/px
EXPORT_TILE_RADIUS = 127  # image dimension = (2*EXPORT_TILE_RADIUS) + 1 = 255px
CHUNK_SIZE = 150          # images per TFRecord file; set to a small number (<= 50) if Google Earth Engine reports memory errors
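
The export parameters above imply a fixed tile footprint. The following sketch just spells out the arithmetic, reusing the two constants from the cell above. The names `tile_px` and `tile_m` are introduced here for illustration. Because `EPSG:3857` is a Web Mercator projection, the result is a nominal size in projected metres rather than an exact ground distance.

```python
SCALE = 30                # metres per pixel, as in the cell above
EXPORT_TILE_RADIUS = 127  # pixels on each side of the centre pixel

tile_px = 2 * EXPORT_TILE_RADIUS + 1   # pixels per side
tile_m = tile_px * SCALE               # nominal metres per side

print(f'{tile_px} px x {tile_px} px, about {tile_m / 1000:.2f} km per side')
```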

In [15]:
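csv = pd.read_csv(DHS_CSV_PATH, sep=';')
# Add cluster IDs: row index offset by 1000
csv['cluster'] = csv.index + 1000

csv.rename(columns={'LATNUM': 'lat', 'LONGNUM': 'lon'}, inplace=True)
# Note: this overwrites the input file in place, comma-separated,
# so re-running this cell would no longer find ';' separators.
csv.to_csv(DHS_CSV_PATH, index=False)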
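
For illustration, the same preparation can be written without modifying the input in place. This is a sketch on a small made-up DataFrame (the coordinate values are placeholders, not rows from the dataset), and the variable names `raw` and `prepared` are assumptions.

```python
import pandas as pd

# Made-up rows that mimic the input columns
raw = pd.DataFrame({
    'country': ['madagascar', 'madagascar', 'tanzania'],
    'year': [1997, 1997, 1998],
    'LATNUM': [-25.27, -25.08, -5.17],
    'LONGNUM': [44.57, 44.11, 39.78],
})

# rename() and assign() return new DataFrames, so `raw` stays untouched
prepared = (
    raw.rename(columns={'LATNUM': 'lat', 'LONGNUM': 'lon'})
       .assign(cluster=lambda d: d.index + 1000)
)
print(prepared)
```

Writing `prepared` to a different path than the one it was read from would keep the cell safe to re-run.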

In [16]:
csv

Unnamed: 0,country,year,lat,lon,cluster
0,madagascar,1997,-25.274814,44.571543,1000
1,madagascar,1997,-25.075023,44.112889,1001
2,madagascar,1997,-25.048568,44.164201,1002
3,madagascar,1997,-25.032407,44.146667,1003
4,madagascar,1997,-25.027735,44.100572,1004
...,...,...,...,...,...
181,tanzania,1998,-5.171397,39.775306,1181
182,tanzania,1998,-5.090718,39.767043,1182
183,tanzania,1998,-5.022767,39.703181,1183
184,tanzania,1998,-4.989954,39.693864,1184


## Export Images

In [17]:
def export_images(
        df: pd.DataFrame,
        country: str,
        year: int,
        export_folder: str,
        chunk_size: Optional[int] = 1,
) -> Dict[Tuple[Any, ...], ee.batch.Task]:
    '''
    Args
    - df: pd.DataFrame, contains columns ['lat', 'lon', 'country', 'year']
    - country: str, together with `year` determines the survey to export
    - year: int, together with `country` determines the survey to export
    - export_folder: str, name of folder for export
    - chunk_size: int, optionally set a limit to the # of images exported per TFRecord file
        - set to a small number (<= 50) if Google Earth Engine reports memory errors

    Returns: dict, maps task name tuple (export_folder, country, year, chunk) to ee.batch.Task
    '''
    subset_df = df[(df['country'] == country) & (df['year'] == year)].reset_index(drop=True)
    if chunk_size is None:
        chunk_size = len(subset_df)
    num_chunks = int(math.ceil(len(subset_df) / chunk_size))
    tasks = {}

    for i in range(num_chunks):
        chunk_slice = slice(i * chunk_size, (i+1) * chunk_size - 1)  # df.loc[] is inclusive
        fc = ee_utils.df_to_fc(subset_df.loc[chunk_slice, :])
        if year == 2012:  # 2012 predates Landsat 8 imagery, so use 2013 instead
            year += 1     # note: the file name and task key below then also use 2013
        start_date, end_date = ee_utils.predictionyear_to_range(year)
        # create 3-year Landsat composite image
        roi = fc.geometry()
        imgcol = ee_utils.LandsatSR(roi, start_date=start_date, end_date=end_date).merged
        imgcol = imgcol.map(ee_utils.mask_qaclear).select(MS_BANDS)
        img = imgcol.median()
        # add nightlights, latitude, and longitude bands
        img = ee_utils.add_latlon(img)
        img = img.addBands(ee_utils.composite_nl(year))
        fname = f'{country}_{year}_{i:02d}'
        tasks[(export_folder, country, year, i)] = ee_utils.get_array_patches(
            img=img, scale=SCALE, ksize=EXPORT_TILE_RADIUS,
            points=fc, export='drive',
            prefix=export_folder, fname=fname,
            bucket=None)
    return tasks
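
The chunking in `export_images` depends on `df.loc[]` including the end label of a slice, which is why the slice ends at `(i+1) * chunk_size - 1`. A small sketch on a toy DataFrame (7 made-up rows, chunk size 3) shows how the rows are split:

```python
import math

import pandas as pd

df = pd.DataFrame({'cluster': range(1000, 1007)})  # 7 rows, index 0..6
chunk_size = 3
num_chunks = math.ceil(len(df) / chunk_size)

chunks = []
for i in range(num_chunks):
    # .loc slices by label and includes the end label, hence the "- 1"
    chunk = df.loc[i * chunk_size:(i + 1) * chunk_size - 1]
    chunks.append(chunk['cluster'].tolist())

print(chunks)  # [[1000, 1001, 1002], [1003, 1004, 1005], [1006]]
```

The last chunk is simply shorter when the number of rows is not a multiple of `chunk_size`.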

In [18]:
dhs_df = pd.read_csv(DHS_CSV_PATH, float_precision='high', index_col=False)
dhs_surveys = list(dhs_df.groupby(['country', 'year']).groups.keys())
tasks = {}
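
The `groupby(...).groups.keys()` call above yields one `(country, year)` tuple per distinct survey. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows: three distinct (country, year) pairs, one of them repeated
df = pd.DataFrame({
    'country': ['tanzania', 'madagascar', 'madagascar', 'tanzania'],
    'year': [1998, 1997, 1997, 1997],
})

surveys = list(df.groupby(['country', 'year']).groups.keys())
print(len(surveys), 'surveys:', surveys)
```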

In [None]:
import pickle
with open('../data/additional_data/precipitations.pickle', 'rb') as f:
    precipitations = pickle.load(f)
precipitations

In [23]:
for country, year in tqdm(dhs_surveys):
    print(country, year)
    new_tasks = export_images(
        df=dhs_df, country=country, year=year,
        export_folder=DHS_EXPORT_FOLDER, chunk_size=CHUNK_SIZE)
    tasks.update(new_tasks)


  0%|          | 0/6 [00:00<?, ?it/s]
madagascar 1997
 17%|█▋        | 1/6 [00:01<00:06,  1.20s/it]
madagascar 1998
 33%|███▎      | 2/6 [00:02<00:05,  1.25s/it]
mozambique 1997
 50%|█████     | 3/6 [00:03<00:03,  1.12s/it]
mozambique 1998
 67%|██████▋   | 4/6 [00:06<00:04,  2.05s/it]
tanzania 1997
 83%|████████▎ | 5/6 [00:07<00:01,  1.67s/it]
tanzania 1998
100%|██████████| 6/6 [00:09<00:00,  1.50s/it]


Check on the status of each export task at [https://code.earthengine.google.com/](https://code.earthengine.google.com/), or run the following cell, which polls the tasks at the interval given by `poll_interval` (180 here, which is every three minutes if the unit is seconds). Once all tasks have completed, download the DHS TFRecord files.

In [54]:
ee_utils.wait_on_tasks(tasks, poll_interval=180)

0it [00:00, ?it/s]
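
For readers without access to `ee_utils`, a polling loop of this kind can be sketched as follows. This is not the implementation of `ee_utils.wait_on_tasks`. It is a hypothetical `poll_until_done` helper that assumes only that each task has a `status()` method returning a dict with a `'state'` key, as `ee.batch.Task.status()` does.

```python
import time
from typing import Any, Dict

# Terminal values of ee.batch.Task.status()['state']
DONE_STATES = {'COMPLETED', 'FAILED', 'CANCELLED'}


def poll_until_done(tasks: Dict[Any, Any], poll_interval: float = 60.0) -> Dict[Any, str]:
    """Block until every task reaches a terminal state; return the final states.

    Hypothetical helper: `poll_interval` is the pause in seconds between checks.
    """
    final: Dict[Any, str] = {}
    pending = dict(tasks)
    while pending:
        for key, task in list(pending.items()):
            state = task.status()['state']
            if state in DONE_STATES:
                final[key] = state
                del pending[key]
        if pending:
            time.sleep(poll_interval)  # wait before the next round of checks
    return final
```

With an empty `tasks` dict the function returns immediately, so make sure the export cell has populated `tasks` before waiting on it.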
