# Run script - Google Earth Engine 

In this script we gather catchment data from satellite products (e.g. tree cover, NDVI, elevation) using Google Earth Engine (GEE). GEE lets us work with satellite data directly, without having to download it first. Before using it, you need to create an account: https://signup.earthengine.google.com/#!/


### 1. Install python GEE
We install GEE in a new Python environment. Take the following steps to create and activate this environment with GEE installed:
1. Open your Anaconda shell
2. Create a new python environment: *conda create -n ee_env*
3. Activate your new environment: *conda activate ee_env* 
4. Install GEE: *conda install -c conda-forge earthengine-api*
5. We also need to install jupyter lab: *conda install -c conda-forge jupyterlab*
6. And Geopandas: *conda install -c conda-forge geopandas*

A guide to this installation can be found here: https://developers.google.com/earth-engine/guides/python_install-conda

Now that we have created a new Python environment with GEE installed, we need to make sure our notebook runs in this environment. To do so, close your notebook, open the Anaconda shell once more and run the following commands:
- *conda activate ee_env*
- *jupyter lab*

Now you can work in JupyterLab inside your *ee_env* environment.
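
To confirm that the notebook really runs inside *ee_env*, a quick check along the following lines can help. This is a minimal sketch: the helper name `missing_packages` is hypothetical, and the list uses import names rather than conda package names (the `earthengine-api` package is imported as `ee`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not conda package names: earthengine-api is imported as "ee".
required = ["ee", "geopandas", "jupyterlab"]
missing = missing_packages(required)
if missing:
    print("Missing in this environment:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If a package is reported as missing, the notebook was most likely started from a different environment than *ee_env*.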

### 2. Getting started
First, import all the required packages.

In [1]:
# import packages
import ee
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import os
import glob
from pathlib import Path

Before using the Earth Engine API or the earthengine command line tool, you must perform a one-time authentication that authorizes access to Earth Engine on behalf of your Google account. The cell below runs the authentication command. It prints a URL; open it, grant access, and copy the authorization code that is generated. Then paste the code into the input box that appears under the cell.

In [2]:
# Trigger the authentication flow.
ee.Authenticate()

# Initialize the library.
ee.Initialize()

Enter verification code: <your authorization code>

Successfully saved authorization token.
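
Because the authentication is a one-time step, later sessions may only need the initialization. The sketch below tries to initialize first and falls back to the authentication flow only if that fails. The helper name `initialize_earth_engine` is hypothetical, and the module is passed in as an argument purely so the pattern can be shown without importing `ee` here. Depending on the installed version of the API, `ee.Initialize()` may also need a Cloud project argument; that is not covered in this sketch.

```python
def initialize_earth_engine(ee_module):
    """Initialize Earth Engine, authenticating only if initialization fails."""
    try:
        ee_module.Initialize()
    except Exception:
        # No stored credentials (or they are invalid): run the interactive flow once.
        ee_module.Authenticate()
        ee_module.Initialize()


# Usage in the notebook (assumes `import ee` has already run):
# initialize_earth_engine(ee)
```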


After authentication we can import all the Python functions defined in the script *f_earth_engine.py*.

In [3]:
from f_earth_engine import *

### 3. Define working directory
Here we define the working directory, where all scripts and data are stored. Make sure the working directory contains the following subdirectories, filled with the corresponding data:\
/work_dir/data/forcing/*netcdf forcing files*\
/work_dir/data/shapes/*catchment shapefiles*\
/work_dir/data/gsim_discharge/*gsim discharge timeseries*

In [4]:
# define your working directory
work_dir=Path("/work/users/vanoorschot/fransje/scripts/GLOBAL_SR/global_sr_module")
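
Before continuing, it can be useful to check that the subdirectories listed above actually exist. The following is a minimal sketch; the helper name `missing_subdirs` and the constant `REQUIRED_SUBDIRS` are hypothetical and not part of *f_earth_engine.py*.

```python
from pathlib import Path

# Subdirectories listed above, relative to the working directory.
REQUIRED_SUBDIRS = ["data/forcing", "data/shapes", "data/gsim_discharge"]


def missing_subdirs(work_dir, subdirs=REQUIRED_SUBDIRS):
    """Return the required subdirectories that do not exist under work_dir."""
    return [sub for sub in subdirs if not (Path(work_dir) / sub).is_dir()]


# Usage in the notebook (work_dir is defined in the cell above):
# print(missing_subdirs(work_dir))
```

An empty list means all three data directories are in place.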

### 4. Load your list of catchment IDs
Here we load the list of catchment IDs that was generated in *run_script_main*.

In [5]:
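# np.atleast_1d keeps the result iterable even if the file holds a single catchment ID
catch_id_list = np.atleast_1d(np.genfromtxt(f'{work_dir}/output/catch_id_list.txt', dtype='str'))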

### 5. Earth Engine treecover
We are interested in the tree cover of each catchment. For this we use the MODIS tree cover data (https://modis.gsfc.nasa.gov/data/dataprod/mod44.php). This product gives the percentage of tree cover, non-tree cover, and bare soil on a 250x250 m grid. Here we regrid the tree cover to a 1x1 km grid (to reduce computational cost), average the values over the time period of interest, and extract the catchment statistics (mean, max, min and standard deviation).
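
The order of operations matters: values are first averaged over time per grid cell, and only then summarized over the catchment. The toy example below illustrates that order in plain Python. It is not the Earth Engine implementation in *f_earth_engine.py*; the function name `catchment_statistics`, the made-up numbers, and the use of the population standard deviation are all assumptions for illustration.

```python
import statistics


def catchment_statistics(pixel_series):
    """Average each pixel over time, then summarize the pixels of one catchment.

    pixel_series: list of per-pixel lists, each holding one value per year.
    """
    # Step 1: temporal mean per pixel (one value per grid cell).
    pixel_means = [statistics.fmean(series) for series in pixel_series]
    # Step 2: spatial statistics over all grid cells in the catchment.
    return {
        "mean": statistics.fmean(pixel_means),
        "max": max(pixel_means),
        "min": min(pixel_means),
        "std": statistics.pstdev(pixel_means),  # population std (an assumption)
    }


# Toy tree cover percentages: 3 grid cells, 2 years each (made-up numbers).
toy_catchment = [[10.0, 30.0], [40.0, 60.0], [70.0, 90.0]]
print(catchment_statistics(toy_catchment))
```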

First we create the output directory:

In [6]:
# make output directory
os.makedirs(f'{work_dir}/output/earth_engine_timeseries/treecover', exist_ok=True)

Now we run the *preprocess_treecover_data* and *catchment_treecover* functions from the *f_earth_engine.py* script. The output is a dataframe with the treecover statistics for each catchment.

In [8]:
# define your time period
start_date = '2000-01-01'
end_date = '2020-12-31'

# define your directories
shape_dir = Path(f'{work_dir}/data/shapes/')
out_dir = Path(f'{work_dir}/output/earth_engine_timeseries/treecover')

# preprocess your modis satellite data for your time period (interpolation and averaging)
(MOD44B_tree_res, MOD44B_nontree_res) = preprocess_treecover_data(start_date,end_date)

# loop over catch ids
for catch_id in catch_id_list:
    # extract catchment values and store in dataframe
    catchment_treecover(MOD44B_tree_res, MOD44B_nontree_res, catch_id, shape_dir, out_dir)
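
With many catchments, a single failure (for example a missing shapefile) would stop the loop above. One possible way to keep going and inspect the failures afterwards is sketched below. The helper name `run_for_catchments` is hypothetical and not part of *f_earth_engine.py*; it simply wraps whatever per-catchment function it is given.

```python
def run_for_catchments(catch_ids, process_catchment):
    """Call process_catchment(catch_id) for every ID and collect failures."""
    failed = {}
    for catch_id in catch_ids:
        try:
            process_catchment(catch_id)
        except Exception as err:
            # Keep going; remember which catchment failed and why.
            failed[str(catch_id)] = repr(err)
    return failed


# Usage in the notebook (names from the cell above):
# failed = run_for_catchments(
#     catch_id_list,
#     lambda cid: catchment_treecover(MOD44B_tree_res, MOD44B_nontree_res, cid, shape_dir, out_dir),
# )
# print(failed)
```

The returned dictionary maps each failed catchment ID to its error message, so those catchments can be rerun separately.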