# Geographical and Ecological Earth Observation (GEEO) - Introduction

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leonsnill/geeo/blob/master/docs/tutorial_0_introducing-geeo.ipynb)

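```python
# Google Cloud project ID used to initialize Earth Engine (fill in your own)
my_project_name = ''

# imports
import os
import ee
ee.Authenticate()
ee.Initialize(project=my_project_name)

# install geeo from GitHub if it is not yet available (IPython/Colab shell syntax)
try:
    import geeo
except ImportError:
    !pip install git+https://github.com/leonsnill/geeo.git
    import geeo
```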

**GEEO** is a processing pipeline and collection of algorithms for obtaining Analysis-Ready Data (ARD) from multispectral image archives, including Landsat and Sentinel-2, using the Google Earth Engine Python API.
The processing modules are organized into hierarchical levels:

- LEVEL-2 (`geeo/level2`)
- LEVEL-3 (`geeo/level3`)
- LEVEL-4 (`geeo/level4`)
- EXPORT (`geeo/export`)

Typically, the user does not interact with the submodules directly, but simply provides processing instructions via either a text file (.yml) or a Python dictionary.

## Creating a parameter file
The parameter file is a .yml text file containing all settings for processing and exporting data with geeo. It can be created from an existing blueprint using the `create_parameter_file` function:

```python
# create new parameter .yml file from blueprint
geeo.create_parameter_file('introduction', overwrite=True)
#print("File created in current working directory: ", os.getcwd())
```

```
Parameter file created: introduction.yml
```


Inspect the newly created parameter file. It contains the set of variables that define whether and how processing happens. It follows the structure of the main modules (LEVEL-2, LEVEL-3, EXPORT) and is further subdivided into categories. The LEVEL-2 section starts off with basic settings for the study area and overall time window (SPACE AND TIME), the desired sensors and masking settings (SENSOR AND DATA QUALITY SETTINGS), and the features/bands to include (BANDS | INDICES | FEATURES).

For now, only take a look at the *SPACE AND TIME* settings block under LEVEL-2, which contains the following variables and default values:

```yaml
YEAR_MIN: 2023
YEAR_MAX: 2023
MONTH_MIN: 1
MONTH_MAX: 12
DOY_MIN: 1
DOY_MAX: 366
DATE_MIN: null
DATE_MAX: null
ROI: [12.9, 52.2, 13.9, 52.7]
ROI_SIMPLIFY_GEOM_TO_BBOX: true
```

As you might expect, this block specifies the spatial extent and temporal window for which data are processed. Edited values must match the expected format of each variable. Mostly this is self-explanatory (e.g. the integer format of the YEAR, MONTH, and DOY variables), but some variables accept several formats (e.g. ROI accepts a list of numeric coordinates, a string path to a file, a GeoPackage object, or an ee.Geometry or ee.FeatureCollection). In general, the specification of each variable in the parameter file is described in the [documentation](documentation.md).
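To make the expected formats concrete, below is a minimal sketch of a hypothetical `check_space_time` helper (it is not part of geeo) that sanity-checks the numeric SPACE AND TIME variables. It assumes that a list-valued ROI with four numbers is a bounding box `[xmin, ymin, xmax, ymax]` in longitude/latitude degrees, as in the default, and it deliberately skips other ROI formats (file paths, ee objects) and does not assume anything about month/DOY ordering.

```python
def check_space_time(prm):
    """Return a list of problems found in SPACE AND TIME settings (hypothetical helper)."""
    problems = []
    if prm["YEAR_MIN"] > prm["YEAR_MAX"]:
        problems.append("YEAR_MIN must not exceed YEAR_MAX")
    for key in ("MONTH_MIN", "MONTH_MAX"):
        if not 1 <= prm[key] <= 12:
            problems.append(f"{key} must be between 1 and 12")
    for key in ("DOY_MIN", "DOY_MAX"):
        if not 1 <= prm[key] <= 366:
            problems.append(f"{key} must be between 1 and 366")
    roi = prm["ROI"]
    # only a 4-number bounding box is checked; other ROI formats are passed through
    if (isinstance(roi, (list, tuple)) and len(roi) == 4
            and all(isinstance(v, (int, float)) for v in roi)):
        xmin, ymin, xmax, ymax = roi
        if not (-180 <= xmin < xmax <= 180 and -90 <= ymin < ymax <= 90):
            problems.append("ROI must be [xmin, ymin, xmax, ymax] in lon/lat degrees")
    return problems


# default values from the SPACE AND TIME block
defaults = {"YEAR_MIN": 2023, "YEAR_MAX": 2023, "MONTH_MIN": 1, "MONTH_MAX": 12,
            "DOY_MIN": 1, "DOY_MAX": 366, "ROI": [12.9, 52.2, 13.9, 52.7]}
print(check_space_time(defaults))                       # []
print(check_space_time({**defaults, "MONTH_MAX": 13}))  # ['MONTH_MAX must be between 1 and 12']
```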

## Loading parameters (without directly running them)
If we only want to load the parameter settings from an existing .yml file, we can use the `load_parameters` function, which converts the .yml file into a Python dictionary containing all defined variables:

```python
prm_dict = geeo.load_parameters('introduction.yml')
#prm_dict
```

## Running a parameter file
If we had edited the settings to our needs (or simply wanted to run the default settings), all we would have to do now is call the `run_param()` function on the .yml file, and geeo would execute the instructions.

```python
prm_run = geeo.run_param('introduction.yml')
prm_run
```

```
{'YEAR_MIN': 2023,
 'YEAR_MAX': 2023,
 'MONTH_MIN': 1,
 'MONTH_MAX': 12,
 'DOY_MIN': 1,
 'DOY_MAX': 366,
 'DATE_MIN': None,
 'DATE_MAX': None,
 'ROI': [12.9, 52.2, 13.9, 52.7],
 'ROI_SIMPLIFY_GEOM_TO_BBOX': True,
 'SENSORS': ['L9', 'L8', 'L7', 'L5', 'L4'],
 'MAX_CLOUD': 75,
 'EXCLUDE_SLCOFF': False,
 'GCP_MIN_LANDSAT': 1,
 'MASKS_LANDSAT': ['cloud', 'cshadow', 'snow', 'fill', 'dilated'],
 'MASKS_LANDSAT_CONF': 'Medium',
 'MASKS_S2': 'CPLUS',
 'MASKS_S2_CPLUS': 0.6,
 'MASKS_S2_PROB': 30,
 'MASKS_S2_NIR_THRESH_SHADOW': 0.2,
 'MASKS_HLS': ['cloud', 'cshadow', 'snow'],
 'ERODE_DILATE': False,
 'ERODE_RADIUS': 60,
 'DILATE_RADIUS': 120,
 'ERODE_DILATE_SCALE': 60,
 'BLUE_MAX_MASKING': None,
 'FEATURES': ['BLU', 'GRN', 'RED', 'NIR', 'SW1', 'SW2', 'NDVI'],
 'CUSTOM_FORMULAS': {'MY_FORMULA1': {'formula': '(G-SW1)/(G+SW1)',
   'variable_map': {'G': 'GRN', 'SW1': 'SW1'}},
  'MY_FORMULA2': {'formula': 'MY_FORMULA/2',
   'variable_map': {'MY_FORMULA': 'MY_FORMULA1'}}},
 'UMX': None,
 'UMX_SUM_TO_ON
```

This dictionary now contains all variables from the parameter file and associated values. 

## The parameter dictionary

Under the hood, geeo loads the .yml file and converts it into a Python dictionary. **The dictionary is the central data structure used to store input and output variables in geeo**. All core processing routines rely on this dictionary for their instructions and also return a dictionary when run individually (`run_level2()` -> `run_level3()` -> `run_export()`) (more on this below).
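The "dictionary in, dictionary out" design can be illustrated with plain Python. The sketch below uses three hypothetical stand-in functions (`level2`, `level3`, `export`); they are not geeo's `run_level2()`, `run_level3()`, or `run_export()`, and the keys `L3_PRODUCT` and `EXPORTED` are invented for illustration. The point is only that each stage receives the full dictionary, adds its outputs, and hands everything on, so inputs and outputs travel together.

```python
def level2(prm):
    """Stand-in for a LEVEL-2 step: adds a placeholder time series stack."""
    out = dict(prm)  # shallow copy: the caller's dictionary stays unchanged
    out["TSS"] = f"stack {out['YEAR_MIN']}-{out['YEAR_MAX']}"
    return out


def level3(prm):
    """Stand-in for a LEVEL-3 step: derives a placeholder product from the stack."""
    out = dict(prm)
    out["L3_PRODUCT"] = f"metrics from {out['TSS']}"
    return out


def export(prm):
    """Stand-in for an EXPORT step: records which outputs would be exported."""
    out = dict(prm)
    out["EXPORTED"] = [key for key in ("TSS", "L3_PRODUCT") if key in out]
    return out


params = {"YEAR_MIN": 2020, "YEAR_MAX": 2022}
result = export(level3(level2(params)))
print(result["EXPORTED"])  # ['TSS', 'L3_PRODUCT']
print("TSS" in params)     # False: the input dictionary was not modified
```

Copying the dictionary in each stage is a design choice of this sketch: it keeps the caller's settings reusable for another run.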

This means we can also pass instructions directly as a Python dictionary.

### Using a dictionary for parameter settings
The dictionary is a more interactive alternative that makes it easy to include geeo in an existing or extended Earth Engine workflow. To use this approach, simply create a Python dictionary whose keys are the variable names from the parameter file and whose values are, well, the values. The keys must be named exactly as in the parameter file. Keys that are not defined simply receive the default value from the blueprint (naturally, knowing the correct names means getting familiar with the settings by inspecting the parameter file and/or the documentation).
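The fallback behaviour described above can be sketched with a plain dictionary merge. `fill_defaults` below is a hypothetical helper, not a geeo function, and the unknown-key check is an addition of this sketch: the text does not say whether geeo itself rejects misspelled keys.

```python
def fill_defaults(user_prm, defaults):
    """Overlay user settings on the defaults; keys not given by the user keep their default."""
    unknown = set(user_prm) - set(defaults)
    if unknown:
        # a misspelled key (e.g. 'YEAR_MINN') would otherwise go unnoticed
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **user_prm}


# a few defaults as they appear in the blueprint's SPACE AND TIME block
defaults = {"YEAR_MIN": 2023, "YEAR_MAX": 2023, "MONTH_MIN": 1, "MONTH_MAX": 12}
prm = fill_defaults({"YEAR_MIN": 2020, "YEAR_MAX": 2022}, defaults)
print(prm)  # {'YEAR_MIN': 2020, 'YEAR_MAX': 2022, 'MONTH_MIN': 1, 'MONTH_MAX': 12}
```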

In general, interacting with geeo through a dictionary is not required, but it can be very useful. The advantage might not be entirely obvious yet: many settings in geeo have default values that you do not need to adjust (or that already fit your specific workflow), so you only specify what differs. More importantly, a dictionary lets you quickly and interactively adjust settings within your existing workflow (for example, changing only the ROI variable iteratively as you request the same processing routine for different study areas).
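The "same routine, different study areas" idea could look like the loop below: shared settings are kept in one dictionary and only `ROI` changes per run. The area names are hypothetical labels; the two bounding boxes are the ones used in this tutorial. The actual `geeo.run_param(prm)` call is left as a comment so the sketch runs without Earth Engine.

```python
# two study areas as [xmin, ymin, xmax, ymax] bounding boxes (lon/lat)
study_areas = {
    "berlin": [12.9, 52.2, 13.9, 52.7],
    "second_area": [11.212921, 47.543627, 11.491699, 47.692663],
}
base = {"YEAR_MIN": 2020, "YEAR_MAX": 2022}  # settings shared by all runs

runs = {}
for name, roi in study_areas.items():
    prm = {**base, "ROI": roi}  # identical settings, only ROI changes
    runs[name] = prm
    # with geeo installed and Earth Engine initialized, one would now call:
    # runs[name] = geeo.run_param(prm)

print(sorted(runs))  # ['berlin', 'second_area']
```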

Let's focus only on the variables we have already gotten to know from the *SPACE AND TIME* sub-category. If we only wanted to overwrite the default year range of 2023-2023 with 2020-2022, all we would need to create is:

```python
# create dictionary whose key names match the variable names in the parameter file
prm_dict = {
    'YEAR_MIN': 2020,
    'YEAR_MAX': 2022
}
```

If we now call `run_param()` with `prm_dict` as input, it runs the default settings except for our modified year range.

```python
run_prm = geeo.run_param(prm_dict)
run_prm
```

```
{'YEAR_MIN': 2020,
 'YEAR_MAX': 2022,
 'MONTH_MIN': 1,
 'MONTH_MAX': 12,
 'DOY_MIN': 1,
 'DOY_MAX': 366,
 'DATE_MIN': None,
 'DATE_MAX': None,
 'ROI': [12.9, 52.2, 13.9, 52.7],
 'ROI_SIMPLIFY_GEOM_TO_BBOX': True,
 'SENSORS': ['L9', 'L8', 'L7', 'L5', 'L4'],
 'MAX_CLOUD': 75,
 'EXCLUDE_SLCOFF': False,
 'GCP_MIN_LANDSAT': 1,
 'MASKS_LANDSAT': ['cloud', 'cshadow', 'snow', 'fill', 'dilated'],
 'MASKS_LANDSAT_CONF': 'Medium',
 'MASKS_S2': 'CPLUS',
 'MASKS_S2_CPLUS': 0.6,
 'MASKS_S2_PROB': 30,
 'MASKS_S2_NIR_THRESH_SHADOW': 0.2,
 'MASKS_HLS': ['cloud', 'cshadow', 'snow'],
 'ERODE_DILATE': False,
 'ERODE_RADIUS': 60,
 'DILATE_RADIUS': 120,
 'ERODE_DILATE_SCALE': 60,
 'BLUE_MAX_MASKING': None,
 'FEATURES': ['BLU', 'GRN', 'RED', 'NIR', 'SW1', 'SW2', 'NDVI'],
 'CUSTOM_FORMULAS': {'MY_FORMULA1': {'formula': '(G-SW1)/(G+SW1)',
   'variable_map': {'G': 'GRN', 'SW1': 'SW1'}},
  'MY_FORMULA2': {'formula': 'MY_FORMULA/2',
   'variable_map': {'MY_FORMULA': 'MY_FORMULA1'}}},
 'UMX': None,
 'UMX_SUM_TO_ON
```

### Inspecting the output
Let us now inspect the processing output in more detail.

The only sections which have an impact on our output are currently:

- SPACE AND TIME
- TIME SERIES STACK (TSS) / SENSOR AND DATA QUALITY SETTINGS
- BANDS | INDICES | FEATURES

Inspect these sections to understand the current settings. For a detailed description of the valid options for each variable, please read the comments in the .yml file and/or the [documentation](documentation.md).

In summary, we are requesting all available Landsat-4, -5, -7, -8, and -9 scenes from 2020-2022 for the bounding box [12.9, 52.2, 13.9, 52.7], which covers Berlin. We restrict valid scenes to a maximum cloud cover of 75% and mask (dilated) clouds, cloud shadows, snow/ice, and fill values at medium cloud detection confidence (conservative masking). The following bands/features are requested: blue (BLU), green (GRN), red (RED), near-infrared (NIR), shortwave-infrared 1 (SW1), shortwave-infrared 2 (SW2), and the Normalized Difference Vegetation Index (NDVI).
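For reference, NDVI is the normalized difference of near-infrared and red reflectance, (NIR - RED) / (NIR + RED). The per-pixel sketch below is plain Python for illustration only (geeo computes the index on the Earth Engine side) and assumes reflectance values scaled to 0-1.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED)."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / denom


# dense vegetation reflects strongly in NIR and absorbs red light
print(round(ndvi(0.45, 0.05), 2))  # 0.8
```

Values range from -1 to 1; healthy vegetation typically yields clearly positive values, while water often yields negative ones.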

By default, neither export nor level-3 processing is requested.

The first variable of interest is the Time Series Stack `TSS`, the fundamental ee.ImageCollection processed according to our settings, which can serve as input for export and level-3 processing.
We can retrieve the ee.ImageCollection from the dictionary as follows:

```python
TSS = run_prm.get('TSS')
TSS
```

```
<ee.imagecollection.ImageCollection at 0x7f15387ff440>
```

With the Python Earth Engine API, we can get an interactive rendering of ee objects, similar to the web-based JavaScript Code Editor, by using the [eerepr](https://github.com/aazuspan/eerepr) Python package. In the rendered version of this tutorial notebook this part is commented out, but take a look for yourself.

```python
#import eerepr
#TSS
```

```python
print(TSS.size().getInfo())
```

```
424
```


As you can see, our TSS variable is an `ee.ImageCollection` containing 424 `ee.Image` objects that met the filter criteria above. Each image contains the requested bands/features plus the mask as a separate band (required internally for some higher-level processing later on).

In essence, geeo always returns an `ee.Image` or `ee.ImageCollection` object for each of the main processing products:

- Time Series Stack (TSS) -> `ee.ImageCollection`
- Custom Image Collection (CIC) -> `ee.ImageCollection`
- Time Series Mosaic (TSM) -> `ee.ImageCollection`
- Number of Valid Observations (NVO) -> `ee.Image`
- Time Series Interpolation (TSI) -> `ee.ImageCollection`
- Spectral Temporal Metrics (STM) -> `ee.Image` (no folding) / `ee.ImageCollection` (folding)
- Pixel-Based Compositing (PBC) -> `ee.ImageCollection`
- Land Surface Phenology (LSP) -> `ee.ImageCollection`

## Visualizing Images

For illustration, let us visualize one of the images in a map view. geeo has a built-in class, `VisMap`, for basic visualization.

We can add `ee.Image`s to the map object using the `add()` method of the `VisMap` class. Here, we visualize the image at index 27 of the ee.ImageCollection (i.e., the 28th image, since indexing starts at 0). First, we get this specific image from the collection:

```python
import ee

# convert the collection to a list and pick the image at (zero-based) index 27
img = ee.Image(TSS.toList(TSS.size()).get(27))
img
```

```
<ee.image.Image at 0x2b7303c9050>
```

```python
from geeo import VisMap

# create a map and add a NIR/SW1/RED false-color composite of the selected image
M = VisMap()
M.add(img.select(['NIR', 'SW1', 'RED']), roi=run_prm.get('ROI'), name='TSS_image')
M.show()
```


## Updating a parameter dictionary

We can also update a parameter dictionary (or the parameters loaded from a .yml file) directly in code.

Let's say we wanted to switch to a new study area and a new time window (2021-2024). Instead of modifying the `introduction.yml` file, we can modify the dictionary.

Let's extract the bounding box of our new, hypothetical study area using the map window: we draw a rectangle with the drawing tools on the left, then select and copy the coordinates shown in the bottom-right corner.
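The copied corner coordinates still have to be arranged into the `[xmin, ymin, xmax, ymax]` list used for `ROI`. A minimal sketch with a hypothetical `corners_to_bbox` helper, assuming the corners are available as (longitude, latitude) pairs; check the coordinate order your map widget displays, since some show latitude first.

```python
def corners_to_bbox(corners):
    """Convert rectangle corner points [(lon, lat), ...] into [xmin, ymin, xmax, ymax]."""
    lons = [lon for lon, _ in corners]
    lats = [lat for _, lat in corners]
    return [min(lons), min(lats), max(lons), max(lats)]


# corners as they might be read off the map (order does not matter)
corners = [(11.491699, 47.692663), (11.212921, 47.543627),
           (11.212921, 47.692663), (11.491699, 47.543627)]
print(corners_to_bbox(corners))  # [11.212921, 47.543627, 11.491699, 47.692663]
```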

```python
VisMap().show()
```


In general, if we specify standard processing settings in a .yml file but want to change certain variables interactively, we can use the `merge_parameters` function.
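Conceptually, merging overlays a few new values onto a complete parameter dictionary. Some settings are themselves dictionaries (e.g. `FOLD_CUSTOM` in the outputs of this tutorial), so a plain `{**base, **updates}` would replace such a nested dictionary wholesale. The hypothetical `deep_merge` below merges nested dictionaries key by key instead. Whether geeo's `merge_parameters` merges shallowly or deeply is an assumption not confirmed here, and the value `[6, 7, 8]` is purely illustrative, not a documented `FOLD_CUSTOM` format.

```python
import copy


def deep_merge(base, updates):
    """Return a copy of `base` with `updates` applied; nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested settings
        else:
            merged[key] = copy.deepcopy(value)  # plain values simply overwrite
    return merged


base = {"YEAR_MIN": 2023, "YEAR_MAX": 2023,
        "FOLD_CUSTOM": {"year": None, "month": None, "doy": None, "date": None}}
updated = deep_merge(base, {"YEAR_MIN": 2021, "FOLD_CUSTOM": {"month": [6, 7, 8]}})
print(updated["FOLD_CUSTOM"])  # {'year': None, 'month': [6, 7, 8], 'doy': None, 'date': None}
```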

```python
from geeo import merge_parameters, load_parameters, run_param

# new study area
new_roi = [11.212921, 47.543627, 11.491699, 47.692663]

# load the defaults from the .yml file and overwrite ROI and year range in the dictionary
prm = merge_parameters(load_parameters('introduction.yml'), {'ROI': new_roi, 'YEAR_MIN': 2021, 'YEAR_MAX': 2024})

prm_processed = run_param(prm)

prm_processed
```

```
{'YEAR_MIN': 2021,
 'YEAR_MAX': 2024,
 'MONTH_MIN': 1,
 'MONTH_MAX': 12,
 'DOY_MIN': 1,
 'DOY_MAX': 366,
 'DATE_MIN': None,
 'DATE_MAX': None,
 'ROI': [11.212921, 47.543627, 11.491699, 47.692663],
 'ROI_SIMPLIFY_GEOM_TO_BBOX': False,
 'SENSORS': ['L9', 'L8', 'L7', 'L5', 'L4'],
 'MAX_CLOUD': 75,
 'EXCLUDE_SLCOFF': False,
 'GCP_MIN_LANDSAT': 1,
 'MASKS_LANDSAT': ['cloud', 'cshadow', 'snow', 'fill', 'dilated'],
 'MASKS_LANDSAT_CONF': 'Medium',
 'MASKS_S2': 'CPLUS',
 'MASKS_S2_CPLUS': 0.6,
 'MASKS_S2_PROB': 30,
 'MASKS_S2_NIR_THRESH_SHADOW': 0.2,
 'ERODE_DILATE': False,
 'ERODE_RADIUS': 60,
 'DILATE_RADIUS': 120,
 'ERODE_DILATE_SCALE': 90,
 'BLUE_MAX_MASKING': None,
 'FEATURES': ['BLU', 'GRN', 'RED', 'NIR', 'SW1', 'SW2'],
 'DEM': False,
 'UMX': None,
 'UMX_SUM_TO_ONE': True,
 'UMX_NON_NEGATIVE': True,
 'UMX_REMOVE_INPUT_FEATURES': True,
 'TSM': False,
 'FOLD_YEAR': False,
 'FOLD_MONTH': False,
 'FOLD_CUSTOM': {'year': None, 'month': None, 'doy': None, 'date': None},
 'TSI': None,
 'TSI_BAS
```

```python
TSS = prm_processed.get('TSS')

# create a map showing the image at (zero-based) index 10 as a NIR/SW1/RED composite
M = VisMap()
M.add(ee.Image(TSS.toList(TSS.size()).get(10)).select(['NIR', 'SW1', 'RED']), roi=prm_processed.get('ROI'), name='TSS_image')
M.show()
```
