# Geographical and Ecological Earth Observation (GEEO) - Introduction

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leonsnill/geeo/blob/master/docs/tutorial_0_introducing-geeo.ipynb)

This introductory tutorial provides an overview of the basic structure and usage of **geeo**.

**geeo** is a processing pipeline and collection of algorithms for obtaining Analysis-Ready-Data (ARD) from multispectral image archives, including Landsat and Sentinel-2, using the Google Earth Engine Python API. 
The processing modules are organized along different hierarchical levels:

- LEVEL-2 (`geeo/level2`): Preprocessed, harmonized, spatio-temporally subsetted surface reflectance time series stacks (TSS) and mosaics (TSM)   
- LEVEL-3 (`geeo/level3`): Advanced spectral features including Spectral-Temporal-Metrics (STM) and Pixel-Based Composites (PBC), as well as time series interpolation (TSI)
- LEVEL-4 (`geeo/level4`): Semantic features, i.e. information derived from spectral signals. Land Surface Phenology (LSP) metrics are currently implemented in the workflow. An auxiliary machine learning module allows local, compatible scikit-learn models to be uploaded as an ee.Classifier.
- EXPORT (`geeo/export`): Export module handling metadata and projection settings, and constructing export tasks.

Typically, the user does not interact with the submodules directly, but provides processing instructions via either a text file (.yml) or a Python dictionary.

The typical (but not the only) way to use **geeo** is to:

1. Create a .yml parameter file from an existing blueprint using the `create_parameter_file()` function
2. Adjust the parameter settings in the .yml file to your requirements
3. Run the adjusted parameter file using `run_param()` to trigger the processing
4. Either export the data to Drive/Asset (if specified in parameter file) or use the resulting ee objects in your interactive python session

## Import (and installation)
Prior to importing (or installing and importing) **geeo**, import the Earth Engine module. Authenticate **ee** using your GEE-eligible Google account and initialize your Google Cloud project with the Earth Engine API enabled. For details on how to access Google Earth Engine, [click here](https://developers.google.com/earth-engine/guides/access).

In [2]:
my_project_name = ''  # ID of your Google Cloud project with the Earth Engine API enabled

# imports
import ee
ee.Authenticate()
ee.Initialize(project=my_project_name)
try:
    import geeo
except ImportError:
    !pip install git+https://github.com/leonsnill/geeo.git
    import geeo


Successfully saved authorization token.


## The parameter file
**The parameter file is a .yml file that contains all available settings for level-2 to level-4 processing and exporting.** 

### Creating a parameter file
It can be created from an existing blueprint using the `create_parameter_file` function:

In [3]:
# create new parameter .yml file from blueprint into current working directory
geeo.create_parameter_file('introduction', overwrite=True)

Parameter file created: introduction.yml


Inspect the newly created parameter file. **The parameter file follows the structure of the main modules (LEVEL-2, LEVEL-3, LEVEL-4, EXPORT) and is subdivided into basic categories.** The LEVEL-2 section, for example, starts with basic settings regarding the spatial and temporal extent (SPACE AND TIME), the desired sensors and associated quality mask settings (SENSOR AND DATA QUALITY SETTINGS), as well as which features/bands to include (BANDS | INDICES | FEATURES) or additionally calculate. 

For now, let us only take a look at the *SPACE AND TIME* subcategory of the LEVEL-2 instruction block, which contains the following variables and default values:

YEAR_MIN: 2023  
YEAR_MAX: 2023  
MONTH_MIN: 1  
MONTH_MAX: 12  
DOY_MIN: 1  
DOY_MAX: 366  
DATE_MIN: null  
DATE_MAX: null  
ROI: [12.9, 52.2, 13.9, 52.7]  
ROI_SIMPLIFY_GEOM_TO_BBOX: true  

As the names suggest, this block specifies the spatial extent and temporal window for which to process the data. Edited values must match the expected format of each variable. Most variable names and values are self-explanatory (e.g. the integer format for the YEAR, MONTH, and DOY variables), but some variables accept multiple data formats (e.g. the `ROI` variable accepts a *list* of numeric coordinates, a *string* path to a file, a *GeoPackage* object, an *ee.Geometry*, or an *ee.FeatureCollection*).

In general, the comments in the parameter file describe the allowed formats and settings; a more detailed description of each variable is available in the [documentation](documentation.md).
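Before handing edited values to geeo, it can help to sanity-check them locally. The following is a minimal sketch of such a check, not part of geeo: `check_space_time` is a hypothetical helper, the value ranges are taken from the defaults listed above, and the `[xmin, ymin, xmax, ymax]` ordering of the ROI list is an assumption inferred from the Berlin default.

```python
def check_space_time(prm):
    """Return a list of problems found in the SPACE AND TIME settings (empty if none)."""
    problems = []
    # Inclusive integer ranges, taken from the blueprint defaults shown above.
    for name, (low, high) in {'MONTH': (1, 12), 'DOY': (1, 366)}.items():
        for suffix in ('MIN', 'MAX'):
            key = f'{name}_{suffix}'
            value = prm.get(key)
            if value is not None and not (isinstance(value, int) and low <= value <= high):
                problems.append(f'{key} must be an integer in [{low}, {high}]')
    year_min, year_max = prm.get('YEAR_MIN'), prm.get('YEAR_MAX')
    if isinstance(year_min, int) and isinstance(year_max, int) and year_min > year_max:
        problems.append('YEAR_MIN must not exceed YEAR_MAX')
    roi = prm.get('ROI')
    # Only the four-number list form is checked (assumed order: xmin, ymin, xmax, ymax);
    # file paths and ee objects are skipped.
    if isinstance(roi, list) and len(roi) == 4:
        if not all(isinstance(v, (int, float)) for v in roi):
            problems.append('ROI list must contain only numbers')
        elif not (roi[0] < roi[2] and roi[1] < roi[3]):
            problems.append('ROI must satisfy xmin < xmax and ymin < ymax')
    return problems


print(check_space_time({'MONTH_MIN': 1, 'MONTH_MAX': 13}))
# ['MONTH_MAX must be an integer in [1, 12]']
```

The helper returns a list of messages instead of raising, so all problems in an edited file can be reported at once.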

### Loading parameters (without directly running the instructions)
If we only want to load parameter settings from an existing .yml file into Python, we can use the `load_parameters` function, which converts the .yml file into a Python dictionary containing all defined variables:

In [5]:
# load the newly created parameter file into a python dictionary
prm_dict = geeo.load_parameters('introduction.yml')
prm_dict

{'YEAR_MIN': 2023,
 'YEAR_MAX': 2023,
 'MONTH_MIN': 1,
 'MONTH_MAX': 12,
 'DOY_MIN': 1,
 'DOY_MAX': 366,
 'DATE_MIN': None,
 'DATE_MAX': None,
 'ROI': [12.9, 52.2, 13.9, 52.7],
 'ROI_SIMPLIFY_GEOM_TO_BBOX': True,
 'SENSORS': ['L9', 'L8', 'L7', 'L5', 'L4'],
 'MAX_CLOUD': 75,
 'EXCLUDE_SLCOFF': False,
 'GCP_MIN_LANDSAT': 1,
 'MASKS_LANDSAT': ['cloud', 'cshadow', 'snow', 'fill', 'dilated'],
 'MASKS_LANDSAT_CONF': 'Medium',
 'MASKS_S2': 'CPLUS',
 'MASKS_S2_CPLUS': 0.6,
 'MASKS_S2_PROB': 30,
 'MASKS_S2_NIR_THRESH_SHADOW': 0.2,
 'MASKS_HLS': ['cloud', 'cshadow', 'snow'],
 'ERODE_DILATE': False,
 'ERODE_RADIUS': 60,
 'DILATE_RADIUS': 120,
 'ERODE_DILATE_SCALE': 60,
 'BLUE_MAX_MASKING': None,
 'FEATURES': ['BLU', 'GRN', 'RED', 'NIR', 'SW1', 'SW2', 'NDVI'],
 'CUSTOM_FORMULAS': {'MY_FORMULA1': {'formula': '(G-SW1)/(G+SW1)',
   'variable_map': {'G': 'GRN', 'SW1': 'SW1'}},
  'MY_FORMULA2': {'formula': 'MY_FORMULA/2',
   'variable_map': {'MY_FORMULA': 'MY_FORMULA1'}}},
 'UMX': None,
 'UMX_SUM_TO_ON

As you can see, printing the resulting dictionary shows all parameters (keys) and associated values found in the .yml file.

## The parameter dictionary

Under the hood, geeo loaded the .yml file and converted it into a Python dictionary. **The dictionary is the central data structure used to store input and output variables in geeo**. All core processing routines (from level-2 to export) rely on the dictionary structure and also return a dictionary if run individually (`run_level2()` -> `run_level3()` -> `run_level4()` -> `run_export()`).

As such, **geeo also accepts processing instructions given directly as a Python dictionary** (instead of an explicit .yml file). Using a dictionary as direct input can be seen as an 'interactive mode' that makes it easy to include geeo in your existing or extended Earth Engine workflow. The keys of the dictionary must have the same names as the variables in the parameter file. Keys that are not defined by the user simply receive the default values from the blueprint (knowing the correct names naturally requires getting familiar with the settings by inspecting the parameter file and/or documentation).
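The default-filling behaviour can be pictured as a dictionary merge in which user-supplied keys override blueprint defaults. The sketch below is illustrative only and is not geeo's internal code: `with_defaults` is a hypothetical helper, the `defaults` dictionary is a three-key excerpt of the values shown earlier, and rejecting unknown key names is a design choice of this sketch (the tutorial does not state how geeo treats them).

```python
def with_defaults(user_prm, defaults):
    """Return defaults overridden by user_prm; reject keys that are not in defaults."""
    unknown = sorted(set(user_prm) - set(defaults))
    if unknown:
        # A misspelled key would otherwise be silently ignored.
        raise KeyError(f'Unknown parameter name(s): {unknown}')
    # Later entries win, so user values replace the defaults.
    return {**defaults, **user_prm}


# Excerpt of the blueprint defaults shown above (not the full parameter set).
defaults = {'YEAR_MIN': 2023, 'YEAR_MAX': 2023, 'MAX_CLOUD': 75}
merged = with_defaults({'YEAR_MIN': 2020}, defaults)
print(merged)
# {'YEAR_MIN': 2020, 'YEAR_MAX': 2023, 'MAX_CLOUD': 75}
```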

As mentioned, direct interaction using a dictionary is not required to use geeo, but it can prove very useful due to the increased flexibility. For example, if we had a recurring processing chain where the only changing variable is the ROI, we could simply loop over our different geometries, iteratively add them to our dictionary, and run the processing.
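A minimal sketch of such a loop is shown below. It only builds one parameter dictionary per ROI; `params_per_roi` is a hypothetical helper, and the second bounding box is an invented example value. The actual processing call is left as a comment because it requires an authenticated Earth Engine session.

```python
def params_per_roi(base_prm, rois):
    """Yield (name, parameter dict) pairs, one per ROI, without mutating base_prm."""
    for name, roi in rois.items():
        prm = dict(base_prm)  # shallow copy so every run gets its own dictionary
        prm['ROI'] = roi
        yield name, prm


base = {'YEAR_MIN': 2020, 'YEAR_MAX': 2022}
rois = {
    'berlin': [12.9, 52.2, 13.9, 52.7],   # default ROI from the blueprint
    'second': [11.0, 51.0, 11.5, 51.4],   # hypothetical second bounding box
}
runs = dict(params_per_roi(base, rois))

# In an authenticated session, each dictionary could then be processed with:
# results = {name: geeo.run_param(prm) for name, prm in runs.items()}
```

Copying the base dictionary for each ROI keeps the runs independent, so a result dictionary from one ROI cannot leak into the next.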

---

## Running a parameter file

Once we have adjusted the settings to our needs, **all we need to do to execute the instructions is to call the `run_param()` function on the .yml file or dictionary.**

In [34]:
prm_run = geeo.run_param('introduction.yml')
prm_run

{'YEAR_MIN': 2023,
 'YEAR_MAX': 2023,
 'MONTH_MIN': 1,
 'MONTH_MAX': 12,
 'DOY_MIN': 1,
 'DOY_MAX': 366,
 'DATE_MIN': None,
 'DATE_MAX': None,
 'ROI': [12.9, 52.2, 13.9, 52.7],
 'ROI_SIMPLIFY_GEOM_TO_BBOX': True,
 'SENSORS': ['L9', 'L8', 'L7', 'L5', 'L4'],
 'MAX_CLOUD': 75,
 'EXCLUDE_SLCOFF': False,
 'GCP_MIN_LANDSAT': 1,
 'MASKS_LANDSAT': ['cloud', 'cshadow', 'snow', 'fill', 'dilated'],
 'MASKS_LANDSAT_CONF': 'Medium',
 'MASKS_S2': 'CPLUS',
 'MASKS_S2_CPLUS': 0.6,
 'MASKS_S2_PROB': 30,
 'MASKS_S2_NIR_THRESH_SHADOW': 0.2,
 'MASKS_HLS': ['cloud', 'cshadow', 'snow'],
 'ERODE_DILATE': False,
 'ERODE_RADIUS': 60,
 'DILATE_RADIUS': 120,
 'ERODE_DILATE_SCALE': 60,
 'BLUE_MAX_MASKING': None,
 'FEATURES': ['BLU', 'GRN', 'RED', 'NIR', 'SW1', 'SW2', 'NDVI'],
 'CUSTOM_FORMULAS': {'MY_FORMULA1': {'formula': '(G-SW1)/(G+SW1)',
   'variable_map': {'G': 'GRN', 'SW1': 'SW1'}},
  'MY_FORMULA2': {'formula': 'MY_FORMULA/2',
   'variable_map': {'MY_FORMULA': 'MY_FORMULA1'}}},
 'UMX': None,
 'UMX_SUM_TO_ON

`run_param()` is a wrapper function that executes the chain `run_level2()` -> `run_level3()` -> `run_level4()` -> `run_export()`, where the output of each module is fed into the subsequent one. Each module expects the dictionary as input and returns the dictionary plus added variables (e.g. processed ee.ImageCollections). If we only want a subset of the processing, we can also run a specific module (plus its preceding modules). Note that all level-3 and level-4 processing is disabled by default.
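Conceptually, this chaining is a fold over a list of stages that each take and return the dictionary. The sketch below illustrates the pattern only; the stage functions are stand-ins and not geeo's `run_level*` functions, and the `LEVEL3_DONE` key and placeholder values are hypothetical.

```python
from functools import reduce


def run_chain(prm, stages):
    """Pass the dictionary through each stage in order and return the final dictionary."""
    return reduce(lambda state, stage: stage(state), stages, dict(prm))


# Stand-in stages (NOT geeo functions): each returns its input plus one placeholder entry.
def stage_level2(prm):
    return {**prm, 'TSS': '<level-2 collection placeholder>'}


def stage_level3(prm):
    return {**prm, 'LEVEL3_DONE': True}  # hypothetical key, for illustration only


full = run_chain({'YEAR_MIN': 2020}, [stage_level2, stage_level3])
partial = run_chain({'YEAR_MIN': 2020}, [stage_level2])  # subset: first stage only

print(sorted(full))     # ['LEVEL3_DONE', 'TSS', 'YEAR_MIN']
print(sorted(partial))  # ['TSS', 'YEAR_MIN']
```

Because every stage has the same dictionary-in, dictionary-out shape, running only part of the chain amounts to passing a shorter list of stages.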

To run the default settings with only a few adjustments, we can simply create a dictionary ...

In [35]:
# create dictionary whose key names match the variable names in the parameter file
prm_dict = {
    'YEAR_MIN': 2020,
    'YEAR_MAX': 2022
}

... and only adjust the global year range by setting YEAR_MIN and YEAR_MAX, and feed the dict into the `run_param` function:

In [36]:
run_prm = geeo.run_param(prm_dict)
run_prm

{'YEAR_MIN': 2020,
 'YEAR_MAX': 2022,
 'MONTH_MIN': 1,
 'MONTH_MAX': 12,
 'DOY_MIN': 1,
 'DOY_MAX': 366,
 'DATE_MIN': None,
 'DATE_MAX': None,
 'ROI': [12.9, 52.2, 13.9, 52.7],
 'ROI_SIMPLIFY_GEOM_TO_BBOX': True,
 'SENSORS': ['L9', 'L8', 'L7', 'L5', 'L4'],
 'MAX_CLOUD': 75,
 'EXCLUDE_SLCOFF': False,
 'GCP_MIN_LANDSAT': 1,
 'MASKS_LANDSAT': ['cloud', 'cshadow', 'snow', 'fill', 'dilated'],
 'MASKS_LANDSAT_CONF': 'Medium',
 'MASKS_S2': 'CPLUS',
 'MASKS_S2_CPLUS': 0.6,
 'MASKS_S2_PROB': 30,
 'MASKS_S2_NIR_THRESH_SHADOW': 0.2,
 'MASKS_HLS': ['cloud', 'cshadow', 'snow'],
 'ERODE_DILATE': False,
 'ERODE_RADIUS': 60,
 'DILATE_RADIUS': 120,
 'ERODE_DILATE_SCALE': 60,
 'BLUE_MAX_MASKING': None,
 'FEATURES': ['BLU', 'GRN', 'RED', 'NIR', 'SW1', 'SW2', 'NDVI'],
 'CUSTOM_FORMULAS': {'MY_FORMULA1': {'formula': '(G-SW1)/(G+SW1)',
   'variable_map': {'G': 'GRN', 'SW1': 'SW1'}},
  'MY_FORMULA2': {'formula': 'MY_FORMULA/2',
   'variable_map': {'MY_FORMULA': 'MY_FORMULA1'}}},
 'UMX': None,
 'UMX_SUM_TO_ON

As mentioned above, all non-specified parameters (keys in dict) will be set to the default values of the [parameter blueprint file](../geeo/config/parameter_blueprint.yml) used when calling `create_parameter_file()`. 

### Inspecting the output
Let us now inspect the processing output after having run the settings in more detail.

By default, only level-2 processing is enabled. As such, the only sections that have an impact on our current output are:

- SPACE AND TIME
- TIME SERIES STACK (TSS) / SENSOR AND DATA QUALITY SETTINGS
- BANDS | INDICES | FEATURES

Inspect these sections in your `introduction.yml` file in order to comprehend the current settings. You can also take a look at the [documentation](documentation.md) for a more detailed description of each parameter. 

In summary, we are requesting all available Landsat-4, -5, -7, -8, and -9 scenes from 2020 to 2022 for the bounding box [12.9, 52.2, 13.9, 52.7] (i.e., Berlin). We restrict the valid scenes to a maximum cloud cover of 75% and mask (dilated) clouds, cloud shadows, snow/ice, and fill values with medium cloud detection confidence (conservative masking). The masks are not further eroded and dilated. The following bands/features are requested: blue (BLU), green (GRN), red (RED), near-infrared (NIR), shortwave-infrared 1 (SW1), shortwave-infrared 2 (SW2), as well as the Normalized Difference Vegetation Index (NDVI). No user-defined functions are applied to retrieve additional features. No unmixing is conducted. No custom ee.ImageCollection is provided, and the TSS is not converted into a Time Series Mosaic (TSM), an ee.ImageCollection in which ee.Images of the same date are mosaicked to remove duplicate observations (mostly resulting from the product tiling schemes of NASA and ESA). None of the level-3 and level-4 products are calculated, and no export is requested.

Our first variable of interest for now is the Time-Series-Stack `TSS` variable, the underlying ee.ImageCollection for subsequent level-3 and level-4 processing (unless a custom collection is provided).
We can retrieve the TSS ee.ImageCollection from the dictionary now as follows:

In [6]:
TSS = run_prm.get('TSS')
TSS


Note: Using the Python Earth Engine API, we can get an interactive rendering of ee objects similar to the web-based JavaScript version by using the [eerepr](https://github.com/aazuspan/eerepr) Python package. For the rendered version of this tutorial notebook, we commented this part out, but take a look for yourself: this is a highly convenient functionality for interactive sessions like this one.

In [None]:
#import eerepr
#eerepr.initialize()
#TSS # This will print the ee.ImageCollection as interactive object where you can inspect the images and their metadata

In [39]:
print(TSS.size().getInfo())

424


As you can see, our TSS variable is an `ee.ImageCollection` containing 424 `ee.Image` objects that satisfied our filter criteria above. Each image contains the requested bands/features plus the mask as a separate band (internally required for some higher-level processing later on).

In essence, geeo always returns an `ee.Image` or `ee.ImageCollection` object for each of the main processing products:

- Time Series Stack (TSS) -> ee.ImageCollection
- CUSTOM IMAGE COLLECTION (CIC) -> ee.ImageCollection
- Time Series Mosaic (TSM) -> ee.ImageCollection
- NUMBER of VALID OBSERVATIONS (NVO) -> ee.Image
- TIME SERIES INTERPOLATION (TSI) -> ee.ImageCollection
- SPECTRAL TEMPORAL METRICS (STM) -> ee.Image (no fold) / ee.ImageCollection (folding)
- PIXEL-BASED COMPOSITING (PBC) -> ee.ImageCollection
- LAND SURFACE PHENOLOGY (LSP) -> ee.ImageCollection

## Visualizing Images

For illustration purposes, let us visualize one of the images in a map view. **geeo** has a built-in class for basic visualization purposes.

We can add `ee.Image`s to our map object using the `add()` method of the `VisMap` class. Here, we visualize the image at index 27 of the ee.ImageCollection (the 28th image, since list indices are zero-based). First, we have to get this specific image from the collection:

In [6]:
import ee
img = ee.Image(TSS.toList(TSS.size()).get(27))
img

<ee.image.Image at 0x2b7303c9050>

In [7]:
from geeo import VisMap

# Create map
M = VisMap()
M.add(img.select(['NIR', 'SW1', 'RED']), roi=run_prm.get('ROI'), name='TSS_image')
M.show()

Map(center=[0.0, 0.0], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_t…