# Downloading and Analyzing Band Values and Timeseries Statistics

This notebook extracts time series of band values from Landsat imagery for each catchment using the Earth Engine (EE) Python API. The bands are reduced to yearly means, medians, standard deviations, and percentiles and written to Excel files for further analysis. The code also extracts the pixel areas represented by each class of the classified image and training image and calculates the area of each class in each year, which is likewise written to Excel files.

In [5]:
import ee
import geemap
import geemap.ml as ml
from ipygee import chart as chart
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime as dt
import datetime
import os
import pytz

today = dt.today()
print("Today is: ", today)

Today is:  2023-04-19 19:57:09.063825


## GEE Authentication

Before using the Earth Engine Python API, we need to authenticate our account. Authentication is required the first time you use Earth Engine in a new session and roughly every week thereafter.

To authenticate, run the following cell and follow the prompts to log into your Earth Engine account. You will then be prompted to copy and paste the authentication code into the box provided. Once you have pasted the code, press enter to save the token.


In [6]:
# ee.Authenticate()
geemap.ee_initialize()

## Hydroclimatic Information

The code reads two Excel files (BE.xlsx and FR.xlsx), each containing a column called "catchment" that lists the catchment names, and combines them into a single list. The path and version of the input files are defined as variables at the beginning of the code. The list is then replaced by a hard-coded list of catchment names, any trailing space is removed from each name, and the number of catchments is printed to the console, followed by the list of names.

In [7]:
p = '..'

version = 'Version_3_20230303'

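# Build the paths with os.path.join so they work on any OS and avoid invalid '\' escape sequences
l_BE = pd.read_excel(os.path.join(p, 'Inputs', version, 'BE.xlsx')).catchment.tolist()
l_FR = pd.read_excel(os.path.join(p, 'Inputs', version, 'FR.xlsx')).catchment.tolist()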

ls = l_BE+l_FR

def drop_space(i: str) -> str:
    '''
    Since the FR and BE data is given with indices using catchment names,
    it is necessary to check for and drop the space at the end of each name where applicable.

    Check the last character for a space and delete it if necessary.
    '''
    # endswith also handles an empty string, which indexing with [-1] would not
    return i[:-1] if i.endswith(' ') else i

ls = ['Membre Pont', 'Straimont', 'Treignes', 'Chooz', 'Daverdisse', 'Jemelle', 'Hastiere', 'Warnant', 'Ortho',
      'Wiheries', 'Salzinnes', 'Huccorgne', 'Amay', 'La Meuse Goncourt', 'Le Mouzon Circourt-sur-Mouzon [Villars]',
      'Le Vair Soulosse-sous-Saint-Élophe', 'La Meuse Saint-Mihiel', 'La Meuse Stenay', 'La Chiers Longlaville',
      'La Chiers Carignan', 'La Bar Cheveuges', 'La Vence la Francheville',
      'La Sormonne Belval', 'Le Loison Han-lés-Juvigny', 'La Crusnes Pierrepont', 'Le Ton Écouviez', 'Sainte-Marie']

names = [drop_space(i) for i in ls]

print(f'{len(names)} catchments passed for classification:\n \n{names}')

27 catchments passed for classification:
 
['Membre Pont', 'Straimont', 'Treignes', 'Chooz', 'Daverdisse', 'Jemelle', 'Hastiere', 'Warnant', 'Ortho', 'Wiheries', 'Salzinnes', 'Huccorgne', 'Amay', 'La Meuse Goncourt', 'Le Mouzon Circourt-sur-Mouzon [Villars]', 'Le Vair Soulosse-sous-Saint-Élophe', 'La Meuse Saint-Mihiel', 'La Meuse Stenay', 'La Chiers Longlaville', 'La Chiers Carignan', 'La Bar Cheveuges', 'La Vence la Francheville', 'La Sormonne Belval', 'Le Loison Han-lés-Juvigny', 'La Crusnes Pierrepont', 'Le Ton Écouviez', 'Sainte-Marie']
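
The same cleanup can be applied to a whole column at once. The following is a minimal sketch, assuming a DataFrame with a `catchment` column like the one read from the Excel files; the sample values are illustrative only. Note that `str.rstrip()` removes all trailing whitespace, whereas `drop_space` removes a single trailing space.

```python
import pandas as pd

# Illustrative stand-in for the table read from BE.xlsx / FR.xlsx
df = pd.DataFrame({"catchment": ["Chooz ", "Treignes", "La Meuse Stenay  "]})

# Strip trailing whitespace from every name in one vectorised call
clean_names = df["catchment"].str.rstrip().tolist()
print(clean_names)
```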


## Load the EE package

This notebook uses an adapted LandTrendr package to construct time series of Landsat imagery for land cover detection. The package is optimized for deforestation event detection and can be used with the latest version of LandTrendr available as a GEE asset, which has an Apache license and is free to use.

To load the package, we use geemap.requireJS; the ltgee.buildSRcollection method of the loaded module is used later to build the image collections. Note that if the JavaScript module is faulty, the cell below will not load.

In [8]:
oeel = geemap.requireJS()

Map = geemap.Map()

ltgee = geemap.requireJS(r'../JS_module/Adapted_LT_v7.3.js')

#ltgee.availability  #all functions within the javascript module

IMPORTANT! Please be advised:
- This version of the Adapted_LT.js modules
  uses some code adapted from the aut/or: @author Justin Braaten (Google) * @author Zhiqiang Yang (USDA Forest Service) * @author Robert Kennedy (Oregon State University)
The latest edits to this code occur: 08/03/2023 for the adaptation efforts by @Mike OHanrahan (TU DELFT MSc research)


## Initiate With a Shapefile

This notebook assumes the user has a shapefile saved as an asset in their GEE account. The assets used in the CATAPUCII project will be made publicly available in the @mohanrahan repository.

Assigning some useful variables for later classification

Catchment Assets are available at this address:

https://code.earthengine.google.com/?asset=projects/mohanrahan/assets/CATAPUCII_Catchments


## Assigning useful variables

- asset_dir points to the directory holding the shapefile loaded as a GEE table asset.
- crs is important for reprojection and scaling (which will affect area calculations).
- RGB_VIS holds the Landsat RGB visualization parameters.
- startDay and endDay define the beginning and end of the seasonal composite period.
- maskThese lists the pixel classes (cloud, shadow, snow) that are masked (rendered Null/NA/transparent) in the Landsat imagery.
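
To make the role of the year and day variables concrete, the sketch below builds one seasonal date window per year in plain Python. It only illustrates how the values combine; the helper name `seasonal_windows` is hypothetical, and this is not necessarily how the adapted LandTrendr module builds its composites internally.

```python
def seasonal_windows(start_year, end_year, start_day, end_day):
    """Return (start_date, end_date) ISO strings for each year, end year inclusive."""
    return [
        (f"{year}-{start_day}", f"{year}-{end_day}")
        for year in range(start_year, end_year + 1)
    ]

# Values matching the settings used in this notebook
windows = seasonal_windows(1984, 2022, "06-20", "08-31")
print(len(windows))   # one window per year
print(windows[0])
```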

In [9]:
# Directory where assets are stored
asset_dir = 'projects/mohanrahan/assets'

# Asset ID for catchment boundaries
catchment_asset = 'CATAPUCII_Catchments/Meuse_Catchments_4326_WFLOW'

# Name of the dataset
dataset = 'Meuse'

# Column string to identify catchments
col_string  = 'station_re'

# Coordinate reference system (EPSG:4326 is WGS 84 latitude/longitude)
crs = 'EPSG:4326'

# Figure number for plotting
fignum = 0

# RGB visualization settings for Landsat imagery
RGB_VIS = {'bands':['B3','B2','B1'], 'min':0, 'max':1.5e3}

#Classified image visualisation
lc_vis = {'bands':['landcover'], 'min':1, 'max':5, 'palette':['#E6004D', '#FFFFA8', '#80FF00', '#A6A6FF', '#00CCF2']}

# Start and end years for Landsat data collection
startYear = 1984
endYear = 2022

# Start and end days for Landsat data collection
startDay = '06-20'
endDay = '08-31'

# List of pixel classes to be masked from the Landsat collection
maskThese = ['cloud', 'shadow', 'snow',]

# List of bands to include in Landsat collection
bandList = ["B1", "B2", "B3", "B4", "B5", "B7", 
           'NBR', 'NDMI', 'NDVI', 'NDSI', 'EVI','GNDVI', 
           'TCB', 'TCG', 'TCW', 'TCA', 'NDFI',] 

## The Table Data

### Code:
The code imports a feature collection of catchment boundaries from the Google Earth Engine (GEE) asset directory and calculates the area of each catchment in square kilometers in two ways: from the polygon geometry (area_km2) and by summing 30 m pixel areas (pixel_area). It copies the system index of each catchment into a column as a unique identifier, drops catchments with zero area, and sorts the collection by area from largest to smallest. It then converts the collection to a pandas dataframe, selects the catchments named in the list of catchment names for display, and saves the full filtered and sorted table to an Excel file.

### Summary:
Catchment boundaries are fetched from a GEE asset, given area and identifier columns, filtered and sorted by area, subset to the listed catchments for later processing, and exported to an Excel file.

In [10]:
# Define the feature collection from asset directory and catchment asset name
table = ee.FeatureCollection(f"{asset_dir}/{catchment_asset}")

# Define a function to calculate the area of each geometry in square kilometer
def set_area_km2(feature):
    '''
    Calculate the area of each geometry in square kilometer
    '''
    area = feature.geometry().area().divide(1000*1000)
    setting = feature.set('area_km2', area)
    return setting

# Define a function to calculate the area of each geometry by summing pixel areas
def set_area_pixel(feature):
    '''
    Calculate the area of each geometry in square kilometers by summing 30 m pixel areas
    '''
    aoi = feature.geometry()
    area = ee.Image.pixelArea().divide(1e6).clip(aoi).select('area').reduceRegion(**{
        'reducer':ee.Reducer.sum(),
        'geometry':aoi,
        'scale':30,
        'crs':crs,
        'maxPixels':1e13,
        'bestEffort':True,
        }).get('area')
    setting = feature.set('pixel_area', area)
    return setting

# Define a function to set the system ID as a column
def set_id(feature):
    '''
    Set the system ID as a column
    '''
    getting_name = ee.String(feature.get('system:index'))
    setting_id = feature.set({'system_index':getting_name,})
    return setting_id

# Calculate the area of each geometry and set ID column and pixel area column
table_area = table.map(set_area_km2).map(set_id).map(set_area_pixel)

# Filter out geometries with area_km2 equal to zero and sort by area from largest to smallest
Filtered_Sorted = table_area.filter(ee.Filter.gt('area_km2', 0)).sort('area_km2', False)

# Convert the sorted table to a Pandas dataframe and set the index to 'system_index'
down = geemap.ee_to_pandas(Filtered_Sorted).set_index(['system_index'])

# Select the rows of 'down' where the column specified by 'col_string' is in the list 'names'
df1 = down.loc[down[col_string].isin(names)]

# Get the system_index values as a list
sys_index = df1.index.to_list()

# Display the filtered table
display(df1)

# Print the number of features in the filtered table
print(len(df1))

# Convert the filtered and sorted table to a Pandas dataframe
gdf = geemap.ee_to_pandas(Filtered_Sorted)

# Create a directory for the output if it doesn't exist
os.makedirs(f'../Outputs/{dataset}/', exist_ok=True)

# Export the filtered and sorted table (all catchments) to an Excel file
gdf.to_excel(f'../Outputs/{dataset}/{dataset}_catchment_table.xlsx', index=False)

| system_index | pixel_area | area_km2 | station_Y_ | wflowID | station_X_ | station_re | station_na |
|---|---|---|---|---|---|---|---|
| 00000000000000000004 | 2681.514071 | 2672.267778 | 50.466667 | 9.0 | 4.835833 | Salzinnes | Salzinnes Ronet |
| 00000000000000000001 | 1471.943902 | 1467.021734 | 50.091667 | 4.0 | 4.810833 | Chooz | Chooz |
| 00000000000000000000 | 1368.127798 | 1363.749108 | 49.491667 | 3.0 | 5.1775 | La Meuse Stenay | La Meuse Stenay |
| 0000000000000000000d | 1349.324088 | 1345.191904 | 48.866667 | 101.0 | 5.5275 | La Meuse Saint-Mihiel | La Meuse Saint-Mihiel |
| 00000000000000000022 | 1247.982599 | 1243.674956 | 50.533333 | 1401.0 | 5.319167 | Amay | Amay |
| 0000000000000000000e | 1067.196654 | 1063.732496 | 49.633333 | 201.0 | 5.144167 | La Chiers Carignan | La Chiers Carignan |
| 00000000000000000002 | 913.747728 | 910.70003 | 49.866667 | 5.0 | 4.9025 | Membre Pont | Membre Pont |
| 00000000000000000003 | 550.486854 | 548.630959 | 50.091667 | 6.0 | 4.6775 | Treignes | Treignes |
| 00000000000000000021 | 440.093432 | 438.791674 | 48.4 | 1016.0 | 5.735833 | Le Vair Soulosse-sous-Saint-Élophe | Le Vair Soulosse-sous-Saint-Élophe |
| 0000000000000000001b | 417.977178 | 416.552196 | 50.158333 | 803.0 | 5.260833 | Jemelle | Jemelle |


27
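
The two area columns come from different calculations (polygon geometry versus summed 30 m pixel areas), so they differ slightly. As a quick sanity check, the sketch below computes the relative difference for three rows copied from the table above; it is an illustration only and does not query Earth Engine.

```python
import pandas as pd

# Values copied from the displayed table (both columns in km²)
areas = pd.DataFrame(
    {
        "station_re": ["Salzinnes", "Chooz", "Jemelle"],
        "pixel_area": [2681.514071, 1471.943902, 417.977178],
        "area_km2": [2672.267778, 1467.021734, 416.552196],
    }
)

# Relative difference of the pixel-based area with respect to the polygon area, in percent
areas["rel_diff_pct"] = 100 * (areas["pixel_area"] - areas["area_km2"]) / areas["area_km2"]
print(areas.round(3))
```

For these three rows the pixel-based area is larger by roughly a third of a percent.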


In [11]:
catchments = Filtered_Sorted.filter(ee.Filter.inList(col_string, ee.List(names)))

Map = geemap.Map()

Map.setOptions('TERRAIN')
Map.addLayer(catchments, {'color': 'green'}, 'green: Included')
Map.centerObject(Filtered_Sorted, 7)
Map

Map(center=[50.15762767624681, 5.442982293509325], controls=(ZoomControl(options=['position', 'zoom_in_text', …

## Functions for calculating Surface Reflectance Derived Annual Catchment Metrics

The functions are written to be used in the loop that outputs data per catchment and per year.

Ultimately the functions accomplish three things:

1. Return the surface reflectance zonal statistics
2. Return the area per catchment exceeding critical surface reflectance values relevant to vegetation 
 - {'NBR': 0.1, 'NDMI': 0.2, 'NDVI': 0.2, 'NDSI': 0.4, 'EVI': 0.2, 'GNDVI': 0.2, 'TCB': 0.15, 'TCG': 0.1, 'TCW': 0.15, 'TCA': 0.2, 'NDFI': -0.2}
3. Combine 1 and 2: when the loop succeeds for all years of catchment n, the combined dataframe is saved as '{ind}_area_stats.xlsx', e.g. '00000000000000000088_area_stats.xlsx'
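
The area-above-threshold idea in step 2 can be illustrated locally with NumPy. This is a sketch on a made-up 3 × 3 array, assuming index values scaled by 1000 and square 30 m pixels of equal area; Earth Engine instead sums the actual per-pixel area from ee.Image.pixelArea().

```python
import numpy as np

# Hypothetical NDVI values scaled by 1000; NaN marks masked pixels
ndvi = np.array([
    [150.0, 250.0, 600.0],
    [np.nan, 199.0, 201.0],
    [800.0, 50.0, 300.0],
])

crit_val = 0.2 * 1000            # critical NDVI value on the scaled range
pixel_area_km2 = 30 * 30 / 1e6   # one 30 m pixel expressed in km²

# Comparisons with NaN evaluate to False, so masked pixels are excluded
n_above = int(np.sum(ndvi > crit_val))
area_above_km2 = n_above * pixel_area_km2
print(n_above, area_above_km2)
```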

In [12]:
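def area_above_critical_values(image: ee.Image, aoi:ee.Geometry, img_date, band_crit_vals: dict, scale: int, name:str, aoi_area:float) -> dict:
    """
    Calculates the area (km²) within an image that is above the critical value for each band in band_crit_vals
    and saves the result for the image year to an Excel file.
    image: ee.Image - The image to calculate the area for
    aoi: ee.Geometry - The region to calculate the area over
    img_date: datetime - The date of the image
    band_crit_vals: dict - A dictionary with each band as a key and its corresponding critical value as the value
    scale: int - The scale of the reduction (in meters)
    name: str - The name of the geometry, used in the output file name
    aoi_area: float - The total area of the region, stored alongside the results
    Returns: dict - The area above the critical value for each band
    """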
    area_img = ee.Image.pixelArea()

    total_areas = ee.Dictionary()

    for band, crit_val in band_crit_vals.items():
        mask = image.select(band).gt(crit_val)

        masked_area_img = area_img.mask(mask)

        total_area = masked_area_img.reduceRegion(
            reducer=ee.Reducer.sum(),
            geometry=aoi,
            scale=scale,
            maxPixels=1e13,
            bestEffort=True).get("area")

        total_areas = total_areas.set(band, ee.Number(total_area).divide(1e6))

    total_areas_dict = total_areas.getInfo()
    

    df_dictionary = {'image_date':img_date, 'name':name,'scale_m':scale,'tot_area':aoi_area, **total_areas_dict}
    pd.DataFrame(df_dictionary, index=[0]).to_excel(f'../Outputs/{dataset}/SR_timeseries/{name}_band_areas_{img_date.year}.xlsx')
    print(f'catchment {name} critical areas calculated {img_date.year} saved')
    return total_areas_dict

    
    

def msToDate(milliseconds: int) -> datetime.datetime:
    '''
    Convert a timestamp in milliseconds to a datetime object.

    Input:
        milliseconds (int): a timestamp in milliseconds
    
    Output:
        datetime.datetime: a datetime object corresponding to the timestamp
    '''
    
    base_datetime = datetime.datetime(1970, 1, 1)
    delta = datetime.timedelta(0, 0, 0, milliseconds)
    target_datetime = base_datetime + delta
    return target_datetime




def summarize_image(image: ee.Image, region: ee.Geometry, img_date, name:str, scale:int) -> dict:
    """
    Computes statistics for an image within a given region.
    image: ee.Image - The image to compute statistics for
    region: ee.Geometry - The region to compute statistics over
    Returns: dict - A dictionary with statistics for the image
    """
    stats = image.reduceRegion(
        reducer=ee.Reducer.percentile([10, 90], ['p10', 'p90']).combine(
            reducer2=ee.Reducer.mean().combine(
                reducer2=ee.Reducer.stdDev(),
                sharedInputs=True
            ),
            sharedInputs=True
        ),
        geometry=region,
        scale=scale,
        maxPixels=1e13
    )
    
    # Fetch all statistics in a single request instead of one request per key
    dictionary = stats.getInfo()
    df_dictionary = {'image_date':img_date, 'name':name,'scale_m':scale, **dictionary}
    pd.DataFrame(df_dictionary, index=[0]).to_excel(f'../Outputs/{dataset}/SR_timeseries/{name}_band_statistics_{img_date.year}.xlsx')
    print(f'catchment {name} surface reflectance zonal statistics {img_date.year} saved')
    return dictionary

    
def df_band_areas(imcol: ee.ImageCollection, aoi: ee.Geometry, scale: int, bands: list, crit_vals: list, name: str, aoi_area: float) -> pd.DataFrame:
    """
    Creates a Pandas DataFrame with the name of the geometry, the date of the image, and the area of each image in the
    collection above the critical value for each band, as well as the summary statistics for each
    image returned by the summarize_image function.
    imcol: ee.ImageCollection - The image collection to calculate the area for
    aoi: ee.Geometry - The region to calculate the areas and statistics over
    scale: int - The scale of the image (in meters)
    bands: list - The band names to calculate the area above the critical value for
    crit_vals: list - The critical value for each band in bands, in the same order
    name: str - The name of the geometry
    aoi_area: float - The total area of the region
    Returns: pd.DataFrame - A Pandas DataFrame with the geometry name, image date, area above critical values, and summary statistics for each image in the collection
    """

    # Use the crit_vals argument rather than the global 'critical' list
    if len(bands) != len(crit_vals):
        raise ValueError("The length of the band list does not match the length of the critical values list")

    # Critical values are multiplied by 1000 before being compared with the band values
    bands_crit_vals = {band: crit_val * 1000 for band, crit_val in zip(bands, crit_vals)}

    areas = []
    ids = imcol.aggregate_array('system:index').getInfo()
    for idt in ids:
        img = ee.Image(imcol.filterMetadata('system:index', 'equals', idt).first())
        img_date = msToDate(img.get("system:time_start").getInfo())
        area = area_above_critical_values(img, aoi, img_date, bands_crit_vals, scale, name, aoi_area)
        summary_stats = summarize_image(img, aoi, img_date, name, scale)
        # Look up each area by band name; the key order of the returned dictionary is not guaranteed to match 'bands'
        area_dict = {f'{crit_val/1000}_{band}': area[band] for band, crit_val in bands_crit_vals.items()}
        areas.append({
            "geometry": name,
            "image_date": img_date,
            'tot_area_km':aoi_area,
            **area_dict,
            **summary_stats
        })

    df = pd.DataFrame(areas)

    df.to_excel(f'../Outputs/{dataset}/SR_timeseries/{name}_area_stats.xlsx')

    print(f'catchment {name} calculated and returned combined')

    # Return the combined table, as the signature promises
    return df
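
Earth Engine reports system:time_start as milliseconds since the Unix epoch. The sketch below repeats the conversion with the keyword form of timedelta, which is equivalent to the positional call in msToDate, and checks it on a timestamp computed by hand for 20 June 1984 (an illustrative value, not one read from the collection). The helper name `ms_to_date` is hypothetical, and the result is a naive datetime in UTC.

```python
import datetime

def ms_to_date(milliseconds: int) -> datetime.datetime:
    """Convert milliseconds since 1970-01-01 (UTC) to a naive datetime."""
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(milliseconds=milliseconds)

# 5284 days after the epoch, i.e. 1984-06-20 00:00 UTC
example_ms = 5284 * 24 * 60 * 60 * 1000
print(ms_to_date(example_ms))
```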


In [13]:
SR_t = 'SR_timeseries'

folder_list = [SR_t]

for folder in folder_list:
    
    var = f'../Outputs/{dataset}/{folder}'
    
    if not os.path.exists(var):
        print('created')
        os.makedirs(var)


## This loop generates the surface reflectance time series for all catchments and writes them to Excel files
- The loop iterates over the list of system indices of the selected catchments.
- For each index, the corresponding feature geometry is retrieved and stored in the 'aoi' variable.
- 'nd_bandlist' is a list of band names for which the area above the critical value is calculated.
- 'critical' is a list of critical values, one for each band in nd_bandlist. The critical value is the threshold above which the area is calculated.
- The loop retrieves the pixel area of the feature and stores it in the area variable.
- A Landsat surface reflectance collection is built using the buildSRcollection function from the ltgee module for a specified range of years and dates, mask criteria, and band list.
- The collection is then transformed using the transformSRcollection function from the ltgee module.
- The df_band_areas function is then called to create a Pandas DataFrame with the name of the geometry, the date of the image, and the area of each image in the collection above the critical values for each band in nd_bandlist.
- The time taken for each iteration and the total time taken for the entire loop are printed to the console.
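
The pairing of the band list with the critical values can be sketched in isolation. The helper name `pair_bands` is hypothetical; the factor of 1000 mirrors the scaling applied to the critical values in df_band_areas, and the length check guards against the two lists drifting apart when one of them is edited.

```python
def pair_bands(bands, critical, scale_factor=1000):
    """Map each band name to its critical value on the scaled range."""
    if len(bands) != len(critical):
        raise ValueError(
            f"{len(bands)} bands but {len(critical)} critical values were given"
        )
    return {band: value * scale_factor for band, value in zip(bands, critical)}

bands = ['NBR', 'NDMI', 'NDVI', 'NDSI', 'EVI', 'GNDVI', 'TCB', 'TCG', 'TCW', 'TCA', 'NDFI']
critical = [0.1, 0.2, 0.2, 0.4, 0.2, 0.2, 0.15, 0.1, 0.15, 0.2, -0.2]

bands_crit_vals = pair_bands(bands, critical)
print(bands_crit_vals['NDVI'], bands_crit_vals['NDFI'])
```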

In [None]:
t0 = dt.today()

nd_bandlist = ['NBR','NDMI', 'NDVI', 'NDSI', 'EVI','GNDVI', 'TCB', 'TCG', 'TCW', 'TCA', 'NDFI',]
critical = [  0.1,  0.2,     0.2,   0.4,   0.2,    0.2,   0.15,   0.1,  0.15,   0.2,    -0.2]


print(f'begin loop: {t0}')


for i, ind in enumerate(sys_index):
    
    name = ind
    
    feat = Filtered_Sorted.filterMetadata('system:index', 'equals', ind).first()
    
    aoi = feat.geometry()
    
    area = feat.get('pixel_area').getInfo()
    
    t1 = dt.today()
    
    print(f'{t1}\nDataset: {dataset}, \ncatchment: {name}, \nSurface Reflectance Processing ...\n')
    
    annual_med = ltgee.buildSRcollection(startYear, endYear, startDay, endDay, aoi, maskThese, [''])#.map(clip_collection) #much slower when clipped
    
    annual_med_calc = ltgee.transformSRcollection(annual_med, bandList)
    
    df_band_areas(annual_med_calc, aoi, 30, nd_bandlist, critical,  ind, area)
    
    t4 = dt.today()
    print(f'\nCatchment: {name}, total time: {t4-t1}\n---------------')
    
    # if ind == sys_index[0]:
    #     break


tfinal = dt.today()

print(f'END LOOP: Full routine finished: {tfinal} \nTime taken: {tfinal-t0}')

begin loop: 2023-04-19 19:57:36.759478
2023-04-19 19:57:37.124700
Dataset: Meuse, 
catchment: 00000000000000000004, 
Surface Reflectance Processing ...

catchment 00000000000000000004 critical areas calculated 1984 saved
catchment 00000000000000000004 surface reflectance zonal statistics 1984 saved
catchment 00000000000000000004 critical areas calculated 1985 saved
catchment 00000000000000000004 surface reflectance zonal statistics 1985 saved
catchment 00000000000000000004 critical areas calculated 1986 saved
catchment 00000000000000000004 surface reflectance zonal statistics 1986 saved
catchment 00000000000000000004 critical areas calculated 1987 saved
