<h2>Retrieve satellite data using Google Earth Engine (GEE) corresponding to all resolvable ponds</h2>

<h4>Code accompanying "Satellite imagery as a management tool for monitoring water clarity across freshwater ponds in Cape Cod, Massachusetts" (Coffer et al., 2024, <em>Journal of Environmental Management</em>). 
Python code written by co-author Nikolay Nezlin.</h4>

* Select several ponds
* Select one satellite (GEE product)
* For each pond:
    * Calculate the pond center - the most offshore location  
    * Build a DataFrame (```df```) of observation times for images with cloud cover ```<threshold```  
    * Extract time series of Rrs in the pond center  
    * Export ```df``` with Rrs as CSV file
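
As a rough illustration of the "most offshore location" idea in the list above, the sketch below searches a regular grid for the interior point farthest from a polygon's boundary, using only the standard library. It is a simplified stand-in and not the notebook's own implementation, which relies on ```shapely``` and ```geopandas```. The function names and the square test "pond" are assumptions made for this example.

```python
import math

def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges crossed by a horizontal ray running from (x, y) to +infinity
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_segment(px, py, x1, y1, x2, y2):
    """Shortest distance from point (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(px - x1, py - y1)
    t = ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))

def most_offshore_point(ring, n_grid=51):
    """Return ((x, y), distance) for the interior grid point farthest from shore."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    n = len(ring)
    best_pt, best_d = None, -1.0
    for i in range(n_grid):
        x = xmin + (xmax - xmin) * i / (n_grid - 1)
        for j in range(n_grid):
            y = ymin + (ymax - ymin) * j / (n_grid - 1)
            if not point_in_polygon(x, y, ring):
                continue
            # Distance to shore = distance to the nearest polygon edge
            d = min(dist_to_segment(x, y, *ring[k], *ring[(k + 1) % n]) for k in range(n))
            if d > best_d:
                best_pt, best_d = (x, y), d
    return best_pt, best_d

# Hypothetical 10 x 10 square "pond"
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
center, dist = most_offshore_point(square)
print(center, dist)
```

The grid resolution limits the accuracy of the result: a finer grid gives a better estimate at a higher cost. Distances are in the units of the input coordinates.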

**Step 1: Load all required packages. If a package has not yet been installed, run "conda install [package name]" from Anaconda Prompt.**
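
Before running the import cell, it may help to check which packages are missing from the current environment. The following is a minimal sketch using the standard library; the helper name ```missing_packages``` is an assumption made for this example. Note that the import name ```ee``` is provided by the ```earthengine-api``` package.

```python
import importlib.util

# Top-level import names used by this notebook
REQUIRED = ["ee", "geopandas", "numpy", "matplotlib", "cartopy",
            "pandas", "geemap", "pytz", "shapely"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```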

In [None]:
import os
import ee
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pandas as pd
import geemap
import datetime
import pytz
from shapely.geometry import Polygon, Point

**Step 2: Initialize GEE and set the project directory to the folder where the "CCC_Ponds_Export_2020.gdb" geodatabase is stored. To initialize GEE, replace the project name passed to ```ee.Initialize``` ('capecod-sdd' in the cell below) with the name of your own GEE project. Running the cell opens a browser window in which you must grant all requested permissions and copy the code generated on the last screen. Paste that code wherever your code editor requests it (in Visual Studio Code, the box is at the top of the window). This only needs to be done once; afterwards the project should initialize without prompting.<br /><br />Here you can also change the date range of interest. Research efforts initially focused on the monitoring season, which spans 1 April through 31 October of each year. The code can process the full record of each sensor, but doing so is computationally intensive and can take many days. A single monitoring season, as in the example below, takes about a full day to run. The following date ranges apply to each satellite sensor of interest (imagery in the corresponding GEE collection may begin somewhat later than the first date listed):<br /><li>Landsat 5 (1 March 1984 - 5 June 2013)<br /><li>Landsat 7 (15 April 1999 - 6 April 2022)<br /><li>Landsat 8 (11 February 2013 - present)<br /><li>Landsat 9 (27 September 2021 - present)<br /><li>Sentinel-2 (23 June 2015 - present)<br /><br />This is the only section of the code that should need to be updated by the user.**
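
To help choose a date range, the sketch below builds the monitoring-season dates for a given year and lists the sensors whose date ranges (as given above) overlap a requested period. The names ```SENSOR_RANGES```, ```monitoring_season```, and ```sensors_overlapping``` are assumptions made for this example and are not part of the notebook.

```python
from datetime import date

# Date ranges as listed above; None means "present"
SENSOR_RANGES = {
    "Landsat 5": (date(1984, 3, 1), date(2013, 6, 5)),
    "Landsat 7": (date(1999, 4, 15), date(2022, 4, 6)),
    "Landsat 8": (date(2013, 2, 11), None),
    "Landsat 9": (date(2021, 9, 27), None),
    "Sentinel-2": (date(2015, 6, 23), None),
}

def monitoring_season(year):
    """Return (date_start, date_end) strings for 1 April to 31 October."""
    return f"{year}-04-01", f"{year}-10-31"

def sensors_overlapping(date_start, date_end):
    """List the sensors whose date range overlaps [date_start, date_end]."""
    start = date.fromisoformat(date_start)
    end = date.fromisoformat(date_end)
    return [name for name, (first, last) in SENSOR_RANGES.items()
            if first <= end and (last is None or last >= start)]

date_start, date_end = monitoring_season(2023)
print(date_start, date_end, sensors_overlapping(date_start, date_end))
```

Running only the sensors returned for the chosen period avoids spending time on collections that contain no imagery for those dates.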

In [None]:
# Change the following paths  
proj_dir = '...'
ccc_ponds_fn = 'Input_Data/Pond_Geodatabase/CCC_Ponds_Export_2020.gdb'
out_dir = 'Input_Data/Reflectance_Timeseries/'
# Create the output directory if it does not already exist
os.makedirs(os.path.join(proj_dir, out_dir), exist_ok=True)

# Set the range of dates to be queried (YYYY-MM-DD)
# Note: Earth Engine's filterDate() treats the end date as exclusive
date_start = '2023-04-01'
date_end = '2023-10-31'

# Authenticate and initialize the Earth Engine package
try:
    # Trigger the authentication flow.
    ee.Authenticate()
    ee.Initialize(project='capecod-sdd')
    print('The Earth Engine package initialized successfully!')
except ee.EEException as e:
    print('The Earth Engine package failed to initialize: %s' % e)
except Exception as e:
    # Report any other error and re-raise it
    print("Unexpected error:", type(e).__name__)
    raise

**Step 3: Run this cell to define all needed functions for the remainder of the script. Note, running this cell won't produce any output, but will set the script up to be able to run the remainder of the code.**
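
The Landsat cloud-masking function defined in the cell below tests individual bits of the ```QA_PIXEL``` band with expressions such as ```bitwiseAnd(1<<3).eq(0)```. The sketch below reproduces the same bit arithmetic on plain Python integers. The bit positions are taken from the comments in that function; the helper names are assumptions made for this example.

```python
# Bit positions used by the Landsat mask in this notebook
QA_PIXEL_BITS = {"dilated_cloud": 1, "cirrus": 2, "cloud": 3, "cloud_shadow": 4}

def flagged(qa_value, bit):
    """True if the given bit is set in the QA value."""
    return (qa_value & (1 << bit)) != 0

def is_clear(qa_value):
    """True if none of the cloud-related bits are set."""
    return not any(flagged(qa_value, bit) for bit in QA_PIXEL_BITS.values())

def describe(qa_value):
    """List the names of the cloud-related flags set in the QA value."""
    return [name for name, bit in QA_PIXEL_BITS.items() if flagged(qa_value, bit)]

for qa in (0, 1 << 3, 0b11000):
    print(qa, is_clear(qa), describe(qa))
```

A pixel is kept only when every tested bit is zero, which is what chaining the individual masks with ```.And()``` achieves on the Earth Engine side.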

In [None]:
# Find the point at the maximum distance from shore
def findPmaxOffshore(pond_geom):
    # Bounding box of the pond: (minx, miny, maxx, maxy)
    pond_bounds = pond_geom.bounds
    # Regular XY grid over the bounding box (np.linspace defaults to 50 points per axis)
    x1d = np.linspace(pond_bounds[0], pond_bounds[2])
    y1d = np.linspace(pond_bounds[1], pond_bounds[3])
    xx, yy = np.meshgrid(x1d, y1d)
    # Create geodataframe of grid points; pond geometries are passed in EPSG:4326 (see Step 4)
    grid_gdf = gpd.GeoDataFrame(geometry=gpd.points_from_xy(x=xx.reshape(-1), y=yy.reshape(-1)), crs='EPSG:4326')
    # Keep only the grid points that fall inside the pond
    grid_gdf = grid_gdf.loc[[pond_geom.contains(g) for g in grid_gdf.geometry]].copy()
    # Create geometry of land around the pond (bounding box minus the pond)
    lat_point_list = np.asarray([pond_bounds[1], pond_bounds[1], pond_bounds[3], pond_bounds[3], pond_bounds[1]])
    lon_point_list = np.asarray([pond_bounds[0], pond_bounds[2], pond_bounds[2], pond_bounds[0], pond_bounds[0]])
    land_geom = Polygon(zip(lon_point_list, lat_point_list)).difference(pond_geom)
    # Distance from land, in the units of the input coordinates
    grid_gdf['dist'] = [land_geom.distance(gg) for gg in grid_gdf.geometry]
    # Return the single grid point farthest from land (the first one if there are ties)
    max_offshore_point = grid_gdf.loc[[grid_gdf['dist'].idxmax()], 'geometry']
    return max_offshore_point

# Set parameters for the given satellite 
def set_satellite(sat_code):
    if (sat_code == 'L5'):
        # Landsat-5 L1 (TOA)
        # 1984-03-16T16:18:01Z–2012-05-05T17:54:06
        ee_product = "LANDSAT/LT05/C02/T1_TOA"
        bands_list = ['B1', 'B2', 'B3', 'B4', 'B5','B7']  # Optical bands
        qa_bands_list = ['QA_PIXEL','QA_RADSAT'] # Do not use 'SR_CLOUD_QA'! LEDAPS quality conditions are wrong! 
        img_scale = 1
        img_offset = 0
        scale_m = 30  # Resolution for B1-B7 from "LANDSAT_LC05_C02_T1_TOA"
        bands4rgb_dict = {'Red':'B3', 'Green':'B2', 'Blue':'B1'}
        save_dir = os.path.join(proj_dir, out_dir)
    elif (sat_code == 'L7'):
        # Landsat-7 L1 (TOA)
        # 1999-05-28T01:02:17Z–present
        ee_product = "LANDSAT/LE07/C02/T1_TOA"
        bands_list = ['B1', 'B2', 'B3', 'B4', 'B5','B7']  # Optical bands
        qa_bands_list = ['QA_PIXEL','QA_RADSAT']
        img_scale = 1
        img_offset = 0
        scale_m = 30  # Resolution for B1-B5 from "LANDSAT_LC07_C02_T1_TOA"
        bands4rgb_dict = {'Red':'B3', 'Green':'B2', 'Blue':'B1'}  # ETM+ uses the same visible-band numbering as Landsat 5 TM
        save_dir = os.path.join(proj_dir, out_dir)
    elif (sat_code == 'L8'):
        # Landsat-8 L1 (TOA)
        # 2013-03-18T15:58:14Z–present
        ee_product = "LANDSAT/LC08/C02/T1_TOA"
        bands_list = ['B1', 'B2', 'B3', 'B4', 'B5','B6','B7']  # Optical bands
        qa_bands_list = ['QA_PIXEL','QA_RADSAT']
        img_scale = 1
        img_offset = 0
        scale_m = 30  # Resolution for B1-B7 from "LANDSAT_LC08_C02_T1_TOA"
        bands4rgb_dict = {'Red':'B4', 'Green':'B3', 'Blue':'B2'}
        save_dir = os.path.join(proj_dir, out_dir)
    elif (sat_code == 'L9'):
        # Landsat-9 L1 (TOA)
        # 2021-10-31T00:00:00Z–present
        ee_product = "LANDSAT/LC09/C02/T1_TOA"
        bands_list = ['B1', 'B2', 'B3', 'B4', 'B5','B6','B7']  # Optical bands
        qa_bands_list = ['QA_PIXEL','QA_RADSAT']
        img_scale = 1
        img_offset = 0
        scale_m = 30  # Resolution for B1-B7 from "LANDSAT_LC09_C02_T1_TOA"
        bands4rgb_dict = {'Red':'B4', 'Green':'B3', 'Blue':'B2'}
        save_dir = os.path.join(proj_dir, out_dir)
    elif (sat_code == 'S2'):    
        # Sentinel-2 L1 (TOA)
        # 2017-03-28T00:00:00Z - present
        ee_product = "COPERNICUS/S2_HARMONIZED"  # L1C-TOA
        bands_list = ['B1', 'B2', 'B3', 'B4', 'B5','B6','B7','B8','B8A','B9','B10','B11','B12']  # Optical bands
        qa_bands_list = ['QA60'] 
        img_scale = 0.0001
        img_offset = 0
        scale_m = 10  
        bands4rgb_dict = {'Red':'B4', 'Green':'B3', 'Blue':'B2'}
        save_dir = os.path.join(proj_dir, out_dir)
    else:
        print('Wrong satellite code %s' % sat_code)
        ee_product = ""  
        bands_list = []  # Optical bands
        qa_bands_list = [] 
        img_scale = 1
        img_offset = 0
        scale_m = 30  
        bands4rgb_dict = {}
        save_dir = ''
    return ee_product,bands_list,qa_bands_list,img_scale,img_offset,scale_m,bands4rgb_dict,save_dir

# Transform EE date to str ```YYYYMMdd'T'HHmmss```, which will be used in the file name
def date_ee2str(date_ee, date_format="YYYYMMdd'T'HHmmss"):
    date_str = ee.Date(date_ee).format(date_format).getInfo()
    return date_str

# Transform EE date (milliseconds since the Unix epoch) to a timezone-aware python ```datetime``` in UTC
def date_ee2datetime(date_ee):
    date_time = datetime.datetime.fromtimestamp(date_ee/1000, tz=pytz.UTC)
    return date_time

# Mask clouds for Landsat sensors 
def img_mask_L5789(image):
    QA_RADSAT = image.select('QA_RADSAT')
    radsatMask = QA_RADSAT.bitwiseAnd(0b111111101001).eq(0)
    QA_PIXEL = image.select('QA_PIXEL')
    fillMask = QA_PIXEL.bitwiseAnd(1<<0).eq(0)  # Image data (not fill); computed but not included in maskTot below
    dilatedCloudMask = QA_PIXEL.bitwiseAnd(1<<1).eq(0)  # Dilated Cloud
    cirrusMask = QA_PIXEL.bitwiseAnd(1<<2).eq(0)  # Cirrus
    cloudMask = QA_PIXEL.bitwiseAnd(1<<3).eq(0)  # Cloud
    cloudShadowMask = QA_PIXEL.bitwiseAnd(1<<4).eq(0)  # Cloud shadow
    maskTot = radsatMask.And(dilatedCloudMask).And(cirrusMask).And(cloudMask).And(cloudShadowMask).rename('mask')
    img_masked = image.updateMask(maskTot)
    #
    return img_masked

# Mask clouds for Sentinel-2 sensor using opaqueCloudsMask and cirrusCloudsMask
def img_mask_S2(image):
    if ('QA60' in qa_bands_list):
        QA60 = image.select('QA60')
        opaqueCloudsMask = QA60.bitwiseAnd(1<<10).eq(0)
        cirrusCloudsMask = QA60.bitwiseAnd(1<<11).eq(0)
    else:
        opaqueCloudsMask = ee.Image(1)
        cirrusCloudsMask = ee.Image(1)
    #
    maskTot = opaqueCloudsMask.And(cirrusCloudsMask).rename('mask')
    img_masked = image.updateMask(maskTot)
    #
    return img_masked

# Calculate the number of images available for each sample
def calcNimg(feature):
    dateRange4stn = ee.DateRange(ee.Date(feature.get('date')).advance(-delta_days,'day'), \
                    ee.Date(feature.get('date')).advance(delta_days,'day'))
    imgColl4stn = imgCollMasked.filterDate(dateRange4stn.start(), dateRange4stn.end()) \
                .filterBounds(feature.geometry()) \
                .map(detectar_duplicador).filter(ee.Filter.eq("duplicate","no duplicate"))
    feature = feature.set({'nImg': imgColl4stn.size()})
    return feature

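# Remove images with duplicated dates
# From: https://gis.stackexchange.com/questions/336257/filter-out-duplicate-sentinel-2-images-form-earth-engine-image-collection-by-dat
# Identifiers are kept in Spanish as in the source: detectar_duplicador = detect duplicate,
# esduplicado = is duplicate, numero = index, fecha = date, estado = state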
def detectar_duplicador(image):
    esduplicado = ee.String("")
    numero = eeImgCollList.indexOf(image)
    image1 = ee.Image(eeImgCollList.get(numero.add(1)))
    # Compare the date of the current image with the date of the next image in the list
    fecha1 = image.date().format("Y-M-d")
    fecha2 = image1.date().format("Y-M-d")
    estado = ee.Algorithms.IsEqual(fecha1,fecha2)
    esduplicado = ee.String(ee.Algorithms.If(estado, "duplicate", "no duplicate"))
    return image.set({"duplicate": esduplicado})

# Calculate the mean and standard deviation of Rrs for each band in <bands_list> within the circular region around the station location
def calc_RrsMeanStd(feature):
    # Filter the images collection within the date range
    dateRange4stn = ee.DateRange(ee.Date(feature.get('date')).advance(-delta_days,'day'), \
                    ee.Date(feature.get('date')).advance(delta_days,'day'))
    # Add to each image a band 'lagDays' and a property 'lagAbsDays' used to sort the collection
    def add_days_lag(image):
        lagDays = ee.Number(ee.Date(image.get("system:time_start")) \
                     .difference(start=ee.Date(feature.get('date')), unit='day'))
        lagAbsDays = lagDays.abs()
        image = image.set({'lagAbsDays':lagAbsDays}) \
            .addBands(ee.Image.constant(lagDays).toFloat().rename('lagDays'))
        return image
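    # Sort by descending absolute lag so that mosaic(), which puts the last image on top, favors the image closest in time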
    imgColl4stn = imgCollMasked.filterDate(dateRange4stn.start(), dateRange4stn.end()) \
                .filterBounds(feature.geometry()) \
                .map(detectar_duplicador).filter(ee.Filter.eq("duplicate","no duplicate")) \
                .map(add_days_lag).sort('lagAbsDays',False) 
    #
    reducerMeanStDev = ee.Reducer.mean().combine(ee.Reducer.stdDev(),'',True)
    sampleGeom = ee.Geometry.Point([feature.getNumber('longitude'), feature.getNumber('latitude')]) \
                                    .buffer(bufferR, maxError=1) \
                    .intersection(feature.geometry().buffer(bufferS, maxError=1), maxError=1)
    feature_means = imgColl4stn.select(bands_list + ['lagDays']) \
                .mosaic().reduceRegion(**{ \
                      'reducer': reducerMeanStDev, \
                      'geometry': sampleGeom, \
                      'scale': scale_m, \
                      'maxPixels': 1e19 \
                })
    return feature.set(feature_means)

# Flag whether Rrs values were extracted for the feature (i.e., the mean of the first band is present)
def calcRrsIn(feature):
    feature = feature.set({'RrsIn': feature.propertyNames().contains(bands_list[0]+'_mean')})
    return feature

# Add the product ID 
def add_product_id(feature):
    date_range = ee.Date(feature.get('date')).advance(feature.get('lagDays_mean'),'day').getRange('day')
    productIdStr = ee.Algorithms.If(condition=ee.String(sat_code).equals('S2'), \
                                    trueCase='PRODUCT_ID', falseCase='LANDSAT_PRODUCT_ID')
    PRODUCT_ID = eeImgColl.filterDate(date_range.start(),date_range.end()).first().get(productIdStr)
    return feature.set({'PRODUCT_ID': PRODUCT_ID})

**Step 4: Read in input data, including the CCC geodatabase**

In [None]:
# Read in geodatabase 
ccc_ponds_gdf = gpd.read_file(os.path.join(proj_dir,ccc_ponds_fn))
# Drop the objects missing CCC_GIS_ID or Shape_Area
ccc_ponds_gdf.dropna(axis=0, how='any', subset=['CCC_GIS_ID','Shape_Area'], inplace=True)
ccc_ponds_gdf['CCC_GIS_ID'] = [s.strip() for s in ccc_ponds_gdf['CCC_GIS_ID']]
ccc_ponds_gdf = ccc_ponds_gdf.loc[ccc_ponds_gdf['CCC_GIS_ID']!='']
# Sort the objects
ccc_ponds_gdf.sort_values(by='Shape_Area', inplace=True, ascending=False, ignore_index=True)
# Drop the objects with area <1 ha
area_min = 10000  # 1 ha, assuming Shape_Area is in square meters
ccc_ponds_gdf = ccc_ponds_gdf.loc[ccc_ponds_gdf['Shape_Area']>=area_min]
# Transform to EPSG:4326
ccc_ponds_gdf = ccc_ponds_gdf.to_crs('EPSG:4326')
# Split multipolygons into individual polygons
ccc_ponds_gdf = ccc_ponds_gdf.explode(index_parts=True)

**Step 5: For each satellite, extract spectral information at the center of each pond and export as a CSV file. First, set parameters used for each satellite sensor.**
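
Two of the parameters set in the next cell control how requests are organized: ```nStn2process``` is the number of pond-date rows sent to GEE per batch, and ```delta_days``` is the half-width of the date window searched around each date. The sketch below illustrates both ideas in plain Python; the helper names ```batches``` and ```date_window``` are assumptions made for this example.

```python
from datetime import date, timedelta

def batches(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    start = 0
    while start < len(items):
        yield items[start:start + batch_size]
        start += batch_size

def date_window(day_str, delta_days):
    """Return (start, end) ISO date strings delta_days either side of day_str."""
    day = date.fromisoformat(day_str)
    step = timedelta(days=delta_days)
    return (day - step).isoformat(), (day + step).isoformat()

# 23 hypothetical image dates split into batches of 10
print([len(b) for b in batches(list(range(23)), 10)])
print(date_window("2023-04-01", 1))
```

Smaller batches mean more round trips to the server, while larger batches make each request heavier, so the batch size is a trade-off rather than a fixed rule.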

In [None]:
# The buffer around each sampling point (m)
bufferR = 10
# The inward (offshore) buffer applied to the pond polygon (m; negative values shrink the polygon)
bufferS = -10
# The number of stations (pond-date rows) to process per batch
nStn2process = 10 
# The maximum time lag between sample and image (days)
delta_days = 1  
# Cloud cover threshold (%)
cloud_cover_threshold = 90

**<em>Sentinel-2 (S2) was launched 23 June 2015 and is still operational.</em>** 
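
Each per-satellite cell, starting with the Sentinel-2 cell below, writes one CSV file per pond, with a mean and a standard deviation column for every band. The sketch below shows how the column names and the file name are assembled. The helper names and the pond ID ```XX-123``` are hypothetical and used only for illustration.

```python
def column_names(bands_list):
    """Fixed metadata columns followed by <band>_mean, <band>_stdDev pairs."""
    base = ['PRODUCT_ID', 'CCC_GIS_ID', 'day_str', 'lagDays_mean',
            'lagDays_stdDev', 'latitude', 'longitude']
    stats = [band + suffix for band in bands_list for suffix in ('_mean', '_stdDev')]
    return base + stats

def output_filename(k_pond, ccc_gis_id, buffer_r):
    """Zero-padded pond index, pond ID, and buffer radius in meters."""
    return '%03d_%s_R%03d.csv' % (k_pond, ccc_gis_id, buffer_r)

print(column_names(['B1', 'B2']))
print(output_filename(7, 'XX-123', 10))
```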

In [None]:
# Set the satellite code
sat_code = 'S2'

# Loop through each pond in ccc_ponds_gdf and extract satellite data 
for k_pond in range(0,len(ccc_ponds_gdf)):
    ccc_ponds_gdfSel = ccc_ponds_gdf.iloc[k_pond]

    # Calculate the pond center - the most offshore location
    pond_geom = ccc_ponds_gdfSel['geometry']
    max_offshore_point = findPmaxOffshore(pond_geom)
    ee_product,bands_list,qa_bands_list,img_scale,img_offset,scale_m,bands4rgb_dict,save_dir = set_satellite(sat_code)
    colNames = ['PRODUCT_ID','CCC_GIS_ID','day_str','lagDays_mean','lagDays_stdDev','latitude','longitude'] + list(np.array([[s+'_mean',s+'_stdDev'] for s in bands_list]).flatten())
    save_fn = '%03d_%s_R%03d.csv' % (k_pond, ccc_ponds_gdfSel['CCC_GIS_ID'], bufferR)
    roi_ee = ee.Geometry.Point([float(max_offshore_point.x.item()), float(max_offshore_point.y.item())]).buffer(bufferR)
    ee_img_coll = ee.ImageCollection(ee_product).filterDate(date_start, date_end).filterBounds(roi_ee).sort("system:time_start").select(bands_list + qa_bands_list)
    
    # The number of images in the collection
    ee_img_count = ee_img_coll.size().getInfo()
    ee_img_collA = ee_img_coll.filter(ee.Filter.lt('CLOUD_COVERAGE_ASSESSMENT', cloud_cover_threshold))
    img_dates_df = pd.DataFrame()
    img_dates_df['date_ee'] = ee_img_collA.aggregate_array("system:time_start").getInfo()
    img_dates_df['date_time'] = [date_ee2datetime(date_ee) for date_ee in img_dates_df['date_ee']]
    day_str = [t.strftime('%Y-%m-%d') for t in img_dates_df['date_time']]
    stn_dfSat = pd.DataFrame(data={'day_str':day_str})
    stn_dfSat['date'] = pd.to_datetime(stn_dfSat['day_str'])
    stn_dfSat['CCC_GIS_ID'] = ccc_ponds_gdfSel['CCC_GIS_ID']
    stn_dfSat['latitude'] = float(max_offshore_point.y.item())
    stn_dfSat['longitude'] = float(max_offshore_point.x.item())
    stn_dfSat['Pond Name'] = ccc_ponds_gdfSel['NAME']
    nStn = len(stn_dfSat)
    ccc_ponds_gdf1 = ccc_ponds_gdf[['CCC_GIS_ID','geometry']].copy()
    indx_start = 0; indx_end = indx_start+nStn2process
    rrs_df = pd.DataFrame()
    
    # Process the image dates in batches of nStn2process rows
    while (indx_start<nStn):
        stn_df1 = stn_dfSat[indx_start:indx_end].copy()
        stn_gdf = gpd.GeoDataFrame(stn_df1.merge(ccc_ponds_gdf1, how='left', left_on='CCC_GIS_ID', right_on='CCC_GIS_ID').dropna())
        # Create feature collection
        stnFCwPolygons = geemap.geopandas_to_ee(stn_gdf, date='day_str', date_format='YYYY-MM-dd')
        eeImgColl = ee.ImageCollection(ee_product).filterDate(ee.Date(str(np.min(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(-delta_days,'day'),ee.Date(str(np.max(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(delta_days,'day')).filterBounds(stnFCwPolygons).sort("system:time_start").select(bands_list + qa_bands_list)
        # Apply cloud mask 
        imgCollMasked = eeImgColl.map(lambda image: img_mask_S2(image)).select(bands_list).map(lambda image: image.multiply(img_scale).add(img_offset).copyProperties(image, ['system:time_start']))
        # Generate a List to compare dates
        eeImgCollList = eeImgColl.toList(eeImgColl.size())
        image = ee.Image(eeImgCollList.get(0))
        # Append a dummy image to the end of the list
        eeImgCollList = eeImgCollList.add(image)
        stnWithImgFc = stnFCwPolygons.map(calcNimg).filter(ee.Filter.gt('nImg',0)).map(calc_RrsMeanStd).map(calcRrsIn).filter(ee.Filter.eq('RrsIn',True)).map(add_product_id)
        if (stnWithImgFc.size().getInfo() > 0):
            stnWithImgDf = geemap.ee_to_pandas(stnWithImgFc, col_names=colNames)
            stnWithImgDf = stnWithImgDf.dropna().reset_index(drop=True)
            rrs_df = pd.concat([rrs_df,stnWithImgDf])
        indx_start = indx_end; indx_end = indx_start + nStn2process
    print('    Rrs in: %d stations' % len(rrs_df))
    
    # Export to csv
    saveSubDir = os.path.join(save_dir, 'S2/')
    if (not os.path.exists(saveSubDir)):
        os.mkdir(saveSubDir)
    rrs_df.to_csv(os.path.join(saveSubDir,save_fn), index=False)
    print('    Exported to %s' % os.path.join(saveSubDir,save_fn))
print('Done!')

**<em>Landsat 5 (L5) was launched 1 March 1984 and decommissioned 5 June 2013; imagery in the GEE collection spans 16 March 1984 to 5 May 2012.</em>** 

In [None]:
# Set the satellite code
sat_code = 'L5'

# Loop through each pond in ccc_ponds_gdf and extract satellite data 
for k_pond in range(0,len(ccc_ponds_gdf)):
    ccc_ponds_gdfSel = ccc_ponds_gdf.iloc[k_pond]
    
    # Calculate the pond center - the most offshore location
    pond_geom = ccc_ponds_gdfSel['geometry']
    max_offshore_point = findPmaxOffshore(pond_geom)
    ee_product,bands_list,qa_bands_list,img_scale,img_offset,scale_m,bands4rgb_dict,save_dir = set_satellite(sat_code)
    colNames = ['PRODUCT_ID','CCC_GIS_ID','day_str','lagDays_mean','lagDays_stdDev','latitude','longitude'] + list(np.array([[s+'_mean',s+'_stdDev'] for s in bands_list]).flatten())
    save_fn = '%03d_%s_R%03d.csv' % (k_pond, ccc_ponds_gdfSel['CCC_GIS_ID'], bufferR)
    roi_ee = ee.Geometry.Point([float(max_offshore_point.x.item()), float(max_offshore_point.y.item())]).buffer(bufferR)
    ee_img_coll = ee.ImageCollection(ee_product).filterDate(date_start, date_end).filterBounds(roi_ee).sort("system:time_start").select(bands_list + qa_bands_list)
    
    # The number of images in the collection
    ee_img_count = ee_img_coll.size().getInfo()
    ee_img_collA = ee_img_coll.filter(ee.Filter.lt('CLOUD_COVER', cloud_cover_threshold))
    img_dates_df = pd.DataFrame()
    img_dates_df['date_ee'] = ee_img_collA.aggregate_array("system:time_start").getInfo()
    img_dates_df['date_time'] = [date_ee2datetime(date_ee) for date_ee in img_dates_df['date_ee']]
    day_str = [t.strftime('%Y-%m-%d') for t in img_dates_df['date_time']]
    stn_dfSat = pd.DataFrame(data={'day_str':day_str})
    stn_dfSat['date'] = pd.to_datetime(stn_dfSat['day_str'])
    stn_dfSat['CCC_GIS_ID'] = ccc_ponds_gdfSel['CCC_GIS_ID']
    stn_dfSat['latitude'] = float(max_offshore_point.y.item())
    stn_dfSat['longitude'] = float(max_offshore_point.x.item())
    stn_dfSat['Pond Name'] = ccc_ponds_gdfSel['NAME']
    nStn = len(stn_dfSat)
    ccc_ponds_gdf1 = ccc_ponds_gdf[['CCC_GIS_ID','geometry']].copy()
    indx_start = 0; indx_end = indx_start+nStn2process
    rrs_df = pd.DataFrame()
    
    # Process the image dates in batches of nStn2process rows
    while (indx_start<nStn):
        stn_df1 = stn_dfSat[indx_start:indx_end].copy()
        stn_gdf = gpd.GeoDataFrame(stn_df1.merge(ccc_ponds_gdf1, how='left', left_on='CCC_GIS_ID', right_on='CCC_GIS_ID').dropna())
        # Create feature collection
        stnFCwPolygons = geemap.geopandas_to_ee(stn_gdf, date='day_str', date_format='YYYY-MM-dd')
        eeImgColl = ee.ImageCollection(ee_product).filterDate(ee.Date(str(np.min(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(-delta_days,'day'), ee.Date(str(np.max(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(delta_days,'day')).filterBounds(stnFCwPolygons).sort("system:time_start").select(bands_list + qa_bands_list)
        # Apply cloud mask 
        imgCollMasked = eeImgColl.map(lambda image: img_mask_L5789(image)).select(bands_list).map(lambda image: image.multiply(img_scale).add(img_offset).copyProperties(image, ['system:time_start']))
        # Generate a List to compare dates
        eeImgCollList = eeImgColl.toList(eeImgColl.size())
        image = ee.Image(eeImgCollList.get(0))
        # Append a dummy image to the end of the list
        eeImgCollList = eeImgCollList.add(image)
        stnWithImgFc = stnFCwPolygons.map(calcNimg).filter(ee.Filter.gt('nImg',0)).map(calc_RrsMeanStd).map(calcRrsIn).filter(ee.Filter.eq('RrsIn',True)).map(add_product_id)
        if (stnWithImgFc.size().getInfo() > 0):
            stnWithImgDf = geemap.ee_to_pandas(stnWithImgFc, col_names=colNames)
            stnWithImgDf = stnWithImgDf.dropna().reset_index(drop=True)
            rrs_df = pd.concat([rrs_df,stnWithImgDf])
        indx_start = indx_end; indx_end = indx_start + nStn2process
    print('    Rrs in: %d stations' % len(rrs_df))
    
    # Export to csv
    saveSubDir = os.path.join(save_dir, 'L5/')
    if (not os.path.exists(saveSubDir)):
        os.mkdir(saveSubDir)
    rrs_df.to_csv(os.path.join(saveSubDir,save_fn), index=False)
    print('    Exported to %s' % os.path.join(saveSubDir,save_fn))
print('Done!')

**<em>Landsat 7 (L7) was launched 15 April 1999 and operated until 6 April 2022; imagery in the GEE collection begins 28 May 1999.</em>**

In [None]:
# Set the satellite code
sat_code = 'L7'

# Loop through each pond in ccc_ponds_gdf and extract satellite data 
for k_pond in range(0,len(ccc_ponds_gdf)):
    ccc_ponds_gdfSel = ccc_ponds_gdf.iloc[k_pond]
    
    # Calculate the pond center - the most offshore location
    pond_geom = ccc_ponds_gdfSel['geometry']
    max_offshore_point = findPmaxOffshore(pond_geom)
    ee_product,bands_list,qa_bands_list,img_scale,img_offset,scale_m,bands4rgb_dict,save_dir = set_satellite(sat_code)
    colNames = ['PRODUCT_ID','CCC_GIS_ID','day_str','lagDays_mean','lagDays_stdDev','latitude','longitude'] + list(np.array([[s+'_mean',s+'_stdDev'] for s in bands_list]).flatten())
    save_fn = '%03d_%s_R%03d.csv' % (k_pond, ccc_ponds_gdfSel['CCC_GIS_ID'], bufferR)
    roi_ee = ee.Geometry.Point([float(max_offshore_point.x.item()), float(max_offshore_point.y.item())]).buffer(bufferR)
    ee_img_coll = ee.ImageCollection(ee_product).filterDate(date_start, date_end).filterBounds(roi_ee).sort("system:time_start").select(bands_list + qa_bands_list)
    
    # The number of images in the collection
    ee_img_count = ee_img_coll.size().getInfo()
    ee_img_collA = ee_img_coll.filter(ee.Filter.lt('CLOUD_COVER', cloud_cover_threshold))
    img_dates_df = pd.DataFrame()
    img_dates_df['date_ee'] = ee_img_collA.aggregate_array("system:time_start").getInfo()
    img_dates_df['date_time'] = [date_ee2datetime(date_ee) for date_ee in img_dates_df['date_ee']]
    day_str = [t.strftime('%Y-%m-%d') for t in img_dates_df['date_time']]
    stn_dfSat = pd.DataFrame(data={'day_str':day_str})
    stn_dfSat['date'] = pd.to_datetime(stn_dfSat['day_str'])
    stn_dfSat['CCC_GIS_ID'] = ccc_ponds_gdfSel['CCC_GIS_ID']
    stn_dfSat['latitude'] = float(max_offshore_point.y.item())
    stn_dfSat['longitude'] = float(max_offshore_point.x.item())
    stn_dfSat['Pond Name'] = ccc_ponds_gdfSel['NAME']
    nStn = len(stn_dfSat)
    ccc_ponds_gdf1 = ccc_ponds_gdf[['CCC_GIS_ID','geometry']].copy()
    indx_start = 0; indx_end = indx_start+nStn2process
    rrs_df = pd.DataFrame()
    
    # Process the image dates in batches of nStn2process rows
    while (indx_start<nStn):
        stn_df1 = stn_dfSat[indx_start:indx_end].copy()
        stn_gdf = gpd.GeoDataFrame(stn_df1.merge(ccc_ponds_gdf1, how='left', left_on='CCC_GIS_ID', right_on='CCC_GIS_ID').dropna())
        # Create feature collection
        stnFCwPolygons = geemap.geopandas_to_ee(stn_gdf, date='day_str', date_format='YYYY-MM-dd')
        eeImgColl = ee.ImageCollection(ee_product).filterDate(ee.Date(str(np.min(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(-delta_days,'day'), ee.Date(str(np.max(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(delta_days,'day')).filterBounds(stnFCwPolygons).sort("system:time_start").select(bands_list + qa_bands_list)
        # Apply cloud mask 
        imgCollMasked = eeImgColl.map(lambda image: img_mask_L5789(image)).select(bands_list).map(lambda image: image.multiply(img_scale).add(img_offset).copyProperties(image, ['system:time_start']))
        # Generate a List to compare dates
        eeImgCollList = eeImgColl.toList(eeImgColl.size())
        image = ee.Image(eeImgCollList.get(0))
        # Append a dummy image to the end of the list
        eeImgCollList = eeImgCollList.add(image)
        stnWithImgFc = stnFCwPolygons.map(calcNimg).filter(ee.Filter.gt('nImg',0)).map(calc_RrsMeanStd).map(calcRrsIn).filter(ee.Filter.eq('RrsIn',True)).map(add_product_id)
        if (stnWithImgFc.size().getInfo() > 0):
            stnWithImgDf = geemap.ee_to_pandas(stnWithImgFc, col_names=colNames)
            stnWithImgDf = stnWithImgDf.dropna().reset_index(drop=True)
            rrs_df = pd.concat([rrs_df,stnWithImgDf])
        indx_start = indx_end; indx_end = indx_start + nStn2process
    print('    Rrs in: %d stations' % len(rrs_df))
    
    # Export to csv
    saveSubDir = os.path.join(save_dir, 'L7/')
    if (not os.path.exists(saveSubDir)):
        os.mkdir(saveSubDir)
    rrs_df.to_csv(os.path.join(saveSubDir,save_fn), index=False)
    print('    Exported to %s' % os.path.join(saveSubDir,save_fn))
print('Done!')

**<em>Landsat 8 (L8) was launched 11 February 2013 and is still operational; imagery in the GEE collection begins 18 March 2013.</em>**

In [None]:
# Set the satellite code
sat_code = 'L8'

# Loop through each pond in ccc_ponds_gdf and extract satellite data 
for k_pond in range(0,len(ccc_ponds_gdf)):
    ccc_ponds_gdfSel = ccc_ponds_gdf.iloc[k_pond]
    
    # Calculate the pond center - the most offshore location
    pond_geom = ccc_ponds_gdfSel['geometry']
    max_offshore_point = findPmaxOffshore(pond_geom)
    ee_product,bands_list,qa_bands_list,img_scale,img_offset,scale_m,bands4rgb_dict,save_dir = set_satellite(sat_code)
    colNames = ['PRODUCT_ID','CCC_GIS_ID','day_str','lagDays_mean','lagDays_stdDev','latitude','longitude'] + list(np.array([[s+'_mean',s+'_stdDev'] for s in bands_list]).flatten())
    save_fn = '%03d_%s_R%03d.csv' % (k_pond, ccc_ponds_gdfSel['CCC_GIS_ID'], bufferR)
    roi_ee = ee.Geometry.Point([float(max_offshore_point.x.item()), float(max_offshore_point.y.item())]).buffer(bufferR)
    ee_img_coll = ee.ImageCollection(ee_product).filterDate(date_start, date_end).filterBounds(roi_ee).sort("system:time_start").select(bands_list + qa_bands_list)
    
    # The number of images in the collection
    ee_img_count = ee_img_coll.size().getInfo()
    ee_img_collA = ee_img_coll.filter(ee.Filter.lt('CLOUD_COVER', cloud_cover_threshold))
    img_dates_df = pd.DataFrame()
    img_dates_df['date_ee'] = ee_img_collA.aggregate_array("system:time_start").getInfo()
    img_dates_df['date_time'] = [date_ee2datetime(date_ee) for date_ee in img_dates_df['date_ee']]
    day_str = [t.strftime('%Y-%m-%d') for t in img_dates_df['date_time']]
    stn_dfSat = pd.DataFrame(data={'day_str':day_str})
    stn_dfSat['date'] = pd.to_datetime(stn_dfSat['day_str'])
    stn_dfSat['CCC_GIS_ID'] = ccc_ponds_gdfSel['CCC_GIS_ID']
    stn_dfSat['latitude'] = float(max_offshore_point.y.item())
    stn_dfSat['longitude'] = float(max_offshore_point.x.item())
    stn_dfSat['Pond Name'] = ccc_ponds_gdfSel['NAME']
    nStn = len(stn_dfSat)
    ccc_ponds_gdf1 = ccc_ponds_gdf[['CCC_GIS_ID','geometry']].copy()
    indx_start = 0; indx_end = indx_start+nStn2process
    rrs_df = pd.DataFrame()
    
    # Process the image dates in batches of nStn2process rows
    while (indx_start<nStn):
        stn_df1 = stn_dfSat[indx_start:indx_end].copy()
        stn_gdf = gpd.GeoDataFrame(stn_df1.merge(ccc_ponds_gdf1, how='left', left_on='CCC_GIS_ID', right_on='CCC_GIS_ID').dropna())
        # Create feature collection
        stnFCwPolygons = geemap.geopandas_to_ee(stn_gdf, date='day_str', date_format='YYYY-MM-dd')
        eeImgColl = ee.ImageCollection(ee_product).filterDate(ee.Date(str(np.min(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(-delta_days,'day'), ee.Date(str(np.max(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(delta_days,'day')).filterBounds(stnFCwPolygons).sort("system:time_start").select(bands_list + qa_bands_list)
        # Apply cloud mask 
        imgCollMasked = eeImgColl.map(lambda image: img_mask_L5789(image)).select(bands_list).map(lambda image: image.multiply(img_scale).add(img_offset).copyProperties(image, ['system:time_start']))
        # Generate a List to compare dates
        eeImgCollList = eeImgColl.toList(eeImgColl.size())
        image = ee.Image(eeImgCollList.get(0))
        # Append a dummy image to the end of the list
        eeImgCollList = eeImgCollList.add(image)
        stnWithImgFc = stnFCwPolygons.map(calcNimg).filter(ee.Filter.gt('nImg',0)).map(calc_RrsMeanStd).map(calcRrsIn).filter(ee.Filter.eq('RrsIn',True)).map(add_product_id)
        if (stnWithImgFc.size().getInfo() > 0):
            stnWithImgDf = geemap.ee_to_pandas(stnWithImgFc, col_names=colNames)
            stnWithImgDf = stnWithImgDf.dropna().reset_index(drop=True)
            rrs_df = pd.concat([rrs_df,stnWithImgDf])
        indx_start = indx_end; indx_end = indx_start + nStn2process
    print('    Rrs in: %d stations' % len(rrs_df))
    
    # Export to csv
    saveSubDir = os.path.join(save_dir, 'L8/')
    if (not os.path.exists(saveSubDir)):
        os.mkdir(saveSubDir)
    rrs_df.to_csv(os.path.join(saveSubDir,save_fn), index=False)
    print('    Exported to %s' % os.path.join(saveSubDir,save_fn))
print('Done!')

**<em>Landsat 9 (L9) was launched 27 September 2021 and is still operational; imagery in the GEE collection begins 31 October 2021.</em>** 

In [None]:
# Set the satellite code
sat_code = 'L9'

# Loop through each pond in ccc_ponds_gdf and extract satellite data 
for k_pond in range(0,len(ccc_ponds_gdf)):
    ccc_ponds_gdfSel = ccc_ponds_gdf.iloc[k_pond]
    
    # Calculate the pond center - the most offshore location
    pond_geom = ccc_ponds_gdfSel['geometry']
    max_offshore_point = findPmaxOffshore(pond_geom)
    ee_product,bands_list,qa_bands_list,img_scale,img_offset,scale_m,bands4rgb_dict,save_dir = set_satellite(sat_code)
    colNames = ['PRODUCT_ID','CCC_GIS_ID','day_str','lagDays_mean','lagDays_stdDev','latitude','longitude'] + list(np.array([[s+'_mean',s+'_stdDev'] for s in bands_list]).flatten())
    save_fn = '%03d_%s_R%03d.csv' % (k_pond, ccc_ponds_gdfSel['CCC_GIS_ID'], bufferR)
    roi_ee = ee.Geometry.Point([float(max_offshore_point.x.item()), float(max_offshore_point.y.item())]).buffer(bufferR)
    ee_img_coll = ee.ImageCollection(ee_product).filterDate(date_start, date_end).filterBounds(roi_ee).sort("system:time_start").select(bands_list + qa_bands_list)
    
    # The number of images in the collection
    ee_img_count = ee_img_coll.size().getInfo()
    ee_img_collA = ee_img_coll.filter(ee.Filter.lt('CLOUD_COVER', cloud_cover_threshold))
    img_dates_df = pd.DataFrame()
    img_dates_df['date_ee'] = ee_img_collA.aggregate_array("system:time_start").getInfo()
    img_dates_df['date_time'] = [date_ee2datetime(date_ee) for date_ee in img_dates_df['date_ee']]
    day_str = [t.strftime('%Y-%m-%d') for t in img_dates_df['date_time']]
    stn_dfSat = pd.DataFrame(data={'day_str':day_str})
    stn_dfSat['date'] = pd.to_datetime(stn_dfSat['day_str'])
    stn_dfSat['CCC_GIS_ID'] = ccc_ponds_gdfSel['CCC_GIS_ID']
    stn_dfSat['latitude'] = float(max_offshore_point.y.item())
    stn_dfSat['longitude'] = float(max_offshore_point.x.item())
    stn_dfSat['Pond Name'] = ccc_ponds_gdfSel['NAME']
    nStn = len(stn_dfSat)
    ccc_ponds_gdf1 = ccc_ponds_gdf[['CCC_GIS_ID','geometry']].copy()
    indx_start = 0; indx_end = indx_start+nStn2process
    rrs_df = pd.DataFrame()
    
    # Process the image dates in batches of nStn2process rows
    while (indx_start<nStn):
        stn_df1 = stn_dfSat[indx_start:indx_end].copy()
        stn_gdf = gpd.GeoDataFrame(stn_df1.merge(ccc_ponds_gdf1, how='left', left_on='CCC_GIS_ID', right_on='CCC_GIS_ID').dropna())
        # Create feature collection
        stnFCwPolygons = geemap.geopandas_to_ee(stn_gdf, date='day_str', date_format='YYYY-MM-dd')
        eeImgColl = ee.ImageCollection(ee_product).filterDate(ee.Date(str(np.min(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(-delta_days,'day'), ee.Date(str(np.max(stn_gdf['date']).strftime('%Y-%m-%d'))).advance(delta_days,'day')).filterBounds(stnFCwPolygons).sort("system:time_start").select(bands_list + qa_bands_list)
        # Apply cloud mask 
        imgCollMasked = eeImgColl.map(lambda image: img_mask_L5789(image)).select(bands_list).map(lambda image: image.multiply(img_scale).add(img_offset).copyProperties(image, ['system:time_start']))
        # Generate a List to compare dates
        eeImgCollList = eeImgColl.toList(eeImgColl.size())
        image = ee.Image(eeImgCollList.get(0))
        # Append a dummy image to the end of the list
        eeImgCollList = eeImgCollList.add(image)
        stnWithImgFc = stnFCwPolygons.map(calcNimg).filter(ee.Filter.gt('nImg',0)).map(calc_RrsMeanStd).map(calcRrsIn).filter(ee.Filter.eq('RrsIn',True)).map(add_product_id)
        if (stnWithImgFc.size().getInfo() > 0):
            stnWithImgDf = geemap.ee_to_pandas(stnWithImgFc, col_names=colNames)
            stnWithImgDf = stnWithImgDf.dropna().reset_index(drop=True)
            rrs_df = pd.concat([rrs_df,stnWithImgDf])
        indx_start = indx_end; indx_end = indx_start + nStn2process
    print('    Rrs in: %d stations' % len(rrs_df))
    
    # Export to csv
    saveSubDir = os.path.join(save_dir, 'L9/')
    if (not os.path.exists(saveSubDir)):
        os.mkdir(saveSubDir)
    rrs_df.to_csv(os.path.join(saveSubDir,save_fn), index=False)
    print('    Exported to %s' % os.path.join(saveSubDir,save_fn))
print('Done!')