<a href="https://colab.research.google.com/github/rg-smith/remote-sensing-hydro-2025/blob/main/assignments/homework3/homework3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework 3: Watershed Water Balance
In this homework, you will evaluate multiple water fluxes and their effect on soil water storage in the San Juan Mountains, the primary headwaters of the Rio Grande. You will download time series of precipitation, temperature, ET, and soil moisture over this region, then model soil water storage in Excel and compare it with observed soil water storage.

Next, you will repeat the exercise over the watershed you have chosen for your term project.

First, we need to install a couple of packages. If this shows an error after running, try the next code block. If it runs without an error, then you should be OK.

In [None]:
%pip install geopandas geemap pycrs

Now we will clone the git repository, which gives us easy access to the shapefile of the San Juan Mountain HUC8 watershed.

In [None]:
!git clone https://github.com/rg-smith/remote-sensing-hydro-2025.git

In [3]:
import ee
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import requests
from tqdm import tqdm
import zipfile
import os
from glob import glob
import geemap
import folium
import branca.colormap as cm

In [4]:
# you only need to run this once per session
ee.Authenticate()
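# if you do not have access to this Cloud project, replace 'geocode-1322' with your own Earth Engine-enabled project ID
ee.Initialize(project='geocode-1322')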

In [19]:
def add_ee_layer(self, ee_object, name, vis_params=None):
    try:
        # display ee.Image()
        if isinstance(ee_object, ee.image.Image):
            # stretch the color scale between the 1st and 99th percentile of pixel values
            # (names chosen to avoid shadowing the built-ins range, min and max)
            pct_range = ee.Image(ee_object).reduceRegion(ee.Reducer.percentile([1, 99]),scale=10000)
            vals = pct_range.getInfo()
            vmin=list(vals.items())[0][1]
            vmax=list(vals.items())[1][1]
            vis = {'min': vmin, 'max': vmax, 'palette': ['0000FF', 'FFFFFF','FF0000']}

            map_id_dict = ee.Image(ee_object).getMapId(vis)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
            colormap = cm.LinearColormap(vmin=vmin,vmax=vmax,colors=['blue', 'white','red']).to_step(n=10)
            colormap.caption=name
            self.add_child(colormap)
        # display ee.ImageCollection()
        elif isinstance(ee_object, ee.imagecollection.ImageCollection):
            ee_object_new = ee_object.mosaic()
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.Geometry()
        elif isinstance(ee_object, ee.geometry.Geometry):
            folium.GeoJson(
            data = ee_object.getInfo(),
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.FeatureCollection()
        elif isinstance(ee_object, ee.featurecollection.FeatureCollection):
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)

    except Exception as e:
        print("Could not display {}".format(name))
        print(e)

# Attach add_ee_layer to folium.Map so it can be called as a method: my_map.add_ee_layer(...)
folium.Map.add_ee_layer = add_ee_layer

def create_reduce_region_function(geometry,
                                  reducer=ee.Reducer.mean(),
                                  scale=1000,
                                  crs='EPSG:4326',
                                  bestEffort=True,
                                  maxPixels=1e13,
                                  tileScale=4):
  """Creates a region reduction function.

  Creates a region reduction function intended to be used as the input function
  to ee.ImageCollection.map() for reducing pixels intersecting a provided region
  to a statistic for each image in a collection. See ee.Image.reduceRegion()
  documentation for more details.

  Args:
    geometry:
      An ee.Geometry that defines the region over which to reduce data.
    reducer:
      Optional; An ee.Reducer that defines the reduction method.
    scale:
      Optional; A number that defines the nominal scale in meters of the
      projection to work in.
    crs:
      Optional; An ee.Projection or EPSG string ('EPSG:5070') that defines
      the projection to work in.
    bestEffort:
      Optional; A Boolean indicator for whether to use a larger scale if the
      geometry contains too many pixels at the given scale for the operation
      to succeed.
    maxPixels:
      Optional; A number specifying the maximum number of pixels to reduce.
    tileScale:
      Optional; A number representing the scaling factor used to reduce
      aggregation tile size; using a larger tileScale (e.g. 2 or 4) may enable
      computations that run out of memory with the default.

  Returns:
    A function that accepts an ee.Image and reduces it by region, according to
    the provided arguments.
  """

  def reduce_region_function(img):
    """Applies the ee.Image.reduceRegion() method.

    Args:
      img:
        An ee.Image to reduce to a statistic by region.

    Returns:
      An ee.Feature that contains properties representing the image region
      reduction results per band and the image timestamp formatted as
      milliseconds from Unix epoch (included to enable time series plotting).
    """

    stat = img.reduceRegion(
        reducer=reducer,
        geometry=geometry,
        scale=scale,
        crs=crs,
        bestEffort=bestEffort,
        maxPixels=maxPixels,
        tileScale=tileScale)

    return ee.Feature(geometry, stat).set({'millis': img.date().millis()})
  return reduce_region_function

def gee_zonal_mean_img_coll(imageCollection,geometry,scale=1000):
    reduce_iC = create_reduce_region_function(geometry = geometry, scale=scale)
    stat_fc = ee.FeatureCollection(imageCollection.map(reduce_iC)).filter(ee.Filter.notNull(imageCollection.first().bandNames()))
    fc_dict = fc_to_dict(stat_fc).getInfo()

    df = pd.DataFrame(fc_dict)
    df['date'] = pd.to_datetime(df['millis'],unit='ms')
    return(df)

def gee_zonal_mean(date1,date2,geometry,collection_name,band_name,scale=1000):
    imcol = ee.ImageCollection(collection_name).select(band_name).filterDate(date1,date2)
    df = gee_zonal_mean_img_coll(imcol,geometry,scale=scale)
    return(df)


# Define a function to transfer feature properties to a dictionary.
def fc_to_dict(fc):
  prop_names = fc.first().propertyNames()
  prop_lists = fc.reduceColumns(
      reducer=ee.Reducer.toList().repeat(prop_names.size()),
      selectors=prop_names).get('list')

  return ee.Dictionary.fromLists(prop_names, prop_lists)

def ee_imgcoll_to_df_point(imagecollection, lat,lon):
    """Transforms client-side ee.Image.getRegion array to pandas.DataFrame."""
    poi = ee.Geometry.Point(lon, lat)
    arr = imagecollection.getRegion(poi,1000).getInfo()

    list_of_bands = imagecollection.first().bandNames().getInfo()

    df = pd.DataFrame(arr)

    # Rearrange the header.
    headers = df.iloc[0]
    df = pd.DataFrame(df.values[1:], columns=headers)

    # Remove rows without data inside.
    df = df[['longitude', 'latitude', 'time', *list_of_bands]].dropna()

    # Convert the data to numeric values.
    for band in list_of_bands:
        df[band] = pd.to_numeric(df[band], errors='coerce')

    # Convert the time field into a datetime.
    df['datetime'] = pd.to_datetime(df['time'], unit='ms')

    # Keep the columns of interest.
    df = df[['time','datetime',  *list_of_bands]]

    return df

# to get the link to download an earth engine image
def getLink(image,fname,aoi):
  link = image.getDownloadURL({
    'scale': 1000,
    'crs': 'EPSG:4326',
    'fileFormat': 'GeoTIFF',
    'region': aoi,
    'name': fname})
  # print(link)
  return(link)

# create an earth engine geometry polygon
def addGeometry(min_lon,max_lon,min_lat,max_lat):
  geom = ee.Geometry.Polygon(
      [[[min_lon, max_lat],
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat]]])
  return(geom)

def get_imgcollection(date1,date2,geometry,collection_name,band_name,function='mean'):
  collection = ee.ImageCollection(collection_name).filterDate(date1,date2).select(band_name)
  # reduce the collection over time with the requested statistic
  if function=='mean':
      img = collection.mean().clip(geometry)
  elif function=='sum':
      img = collection.sum().clip(geometry)
  else:
      raise ValueError("function must be 'mean' or 'sum'")
  # stretch the color scale between the 1st and 99th percentile of pixel values
  pct_range = img.reduceRegion(ee.Reducer.percentile([1, 99]),scale=10000)
  vals = pct_range.getInfo()
  vmin=list(vals.items())[0][1]
  vmax=list(vals.items())[1][1]
  visParams = {'min': vmin, 'max': vmax, 'palette': ['0000FF', 'FFFFFF','FF0000']}
  return(img,visParams)

def download_img(img,geom,fname):
    linkname = getLink(img,fname,geom)
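    response = requests.get(linkname, stream=True)
    response.raise_for_status()  # stop early if the download request failed
    zipped = fname+'.zip'
    with open(zipped, "wb") as handle:
        # iter_content defaults to 1-byte chunks, so request larger chunks for speed
        for data in tqdm(response.iter_content(chunk_size=8192)):
            handle.write(data)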

    with zipfile.ZipFile(zipped, 'r') as zip_ref:
        zip_ref.extractall('')
    os.remove(zipped)


def aggregate_by_water_year(df,date_col,agg_column,agg_fun='sum'):
    df['water_year'] = df[date_col].dt.year.where(df[date_col].dt.month < 10, df[date_col].dt.year + 1)
    df_agg = df.groupby('water_year').agg({agg_column:[agg_fun]})
    return(df_agg)
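
To see how the water-year rule in aggregate_by_water_year behaves, here is a minimal sketch on made-up data. The data frame demo_df and its constant 10 mm of monthly precipitation are hypothetical, used only to show that October through December are assigned to the following water year.

```python
import pandas as pd

# Synthetic monthly precipitation (mm): 10 mm every month, Sep 2020 through Oct 2021.
dates = pd.date_range('2020-09-01', '2021-10-01', freq='MS')
demo_df = pd.DataFrame({'date': dates, 'precip': 10.0})

# Same rule as aggregate_by_water_year: months before October keep the calendar
# year, while October-December are assigned to the next water year.
demo_df['water_year'] = demo_df['date'].dt.year.where(
    demo_df['date'].dt.month < 10, demo_df['date'].dt.year + 1)

# Total precipitation per water year.
water_year_totals = demo_df.groupby('water_year')['precip'].sum()
print(water_year_totals)
```

Only water year 2021 is complete in this example (October 2020 through September 2021); the first and last groups each contain a single month, which is worth checking for before comparing annual totals.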

Now we will load the HUC8 shapefile of the San Juan headwaters watershed into Google Earth Engine.

In [23]:
start='2020-10-01'
end='2023-09-30' # filterDate treats the end date as exclusive, so daily datasets stop at 2023-09-29
path_to_watershed='/content/remote-sensing-hydro-2025/data/watersheds/san-juan-mtns/san-juan-huc8.shp'
center_coordinates_of_area=[37.74,-106.92]

In [8]:
# upload watershed to GEE----------------------------
watershed_gee = geemap.shp_to_ee(path_to_watershed)

In [24]:
# take the spatial mean to get area-averaged time series over watershed
gpm_df = gee_zonal_mean(start,end,watershed_gee.geometry(),'NASA/GPM_L3/IMERG_MONTHLY_V07','precipitation')
gpm_df['precip-gpm'] = gpm_df['precipitation']*24*365/12 # convert mean rate (mm/hr) to a monthly total (mm) for an average-length month
gpm_df = gpm_df[['precip-gpm','date']].set_index('date')

prism_df = gee_zonal_mean(start,end,watershed_gee.geometry(),'OREGONSTATE/PRISM/AN81m',['ppt','tmean','tmin','tmax'])
prism_df['precip-prism'] = prism_df['ppt']
prism_df = prism_df[['precip-prism','tmean','tmin','tmax','date']].set_index('date')

soil_moisture_df = gee_zonal_mean(start,end,watershed_gee.geometry(),'NASA/SMAP/SPL3SMP_E/006','soil_moisture_am',scale=9000)
soil_moisture_df = soil_moisture_df[['soil_moisture_am','date']].set_index('date')

openET_df = gee_zonal_mean(start,end,watershed_gee.geometry(),'OpenET/ENSEMBLE/CONUS/GRIDMET/MONTHLY/v2_0','et_ensemble_mad')
openET_df['ET-openET'] = openET_df['et_ensemble_mad']
openET_df = openET_df[['ET-openET','date']].set_index('date')

mod16_df = gee_zonal_mean(start,end,watershed_gee.geometry(),'MODIS/006/MOD16A2','ET')
mod16_df['ET'] = mod16_df['ET']*365/8/12/10 # convert 8-day totals to average monthly ET (mm); dividing by 10 applies the 0.1 scale factor
mod16_df['ET-MOD16'] = mod16_df['ET']
mod16_df = mod16_df[['ET-MOD16','date']].set_index('date')

modis_snow_df = gee_zonal_mean(start,end,watershed_gee.geometry(),'MODIS/061/MOD10A1','NDSI_Snow_Cover')
modis_snow_df = modis_snow_df[['NDSI_Snow_Cover','date']].set_index('date')
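
The two unit conversions in the cell above can be checked by hand. The sketch below assumes that the IMERG monthly precipitation band is a mean rate in mm/hr and that raw MOD16A2 ET values are 8-day totals stored in units of 0.1 mm; the function names are hypothetical and are not used elsewhere in this notebook.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12

def imerg_rate_to_monthly_mm(rate_mm_per_hr):
    """Convert a mean precipitation rate (mm/hr) to a depth (mm) for an average-length month."""
    return rate_mm_per_hr * HOURS_PER_DAY * DAYS_PER_YEAR / MONTHS_PER_YEAR

def mod16_to_monthly_mm(raw_et_8day):
    """Convert a raw MOD16A2 value (assumed 0.1 mm per 8 days) to mm per average-length month."""
    return raw_et_8day * DAYS_PER_YEAR / 8 / MONTHS_PER_YEAR / 10

# A sustained rate of 0.1 mm/hr over an average month of 730 hours.
print(imerg_rate_to_monthly_mm(0.1))
# A raw 8-day value of 80, i.e. 8 mm per 8 days under the assumed scale factor.
print(mod16_to_monthly_mm(80))
```

Both conversions use an average month of 365/12 days, so the result for any single month can be off by a few percent depending on its actual length.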

In [None]:
# take monthly average of daily (SMAP, MODIS snow) and 8-day (MOD16) data

soil_moisture_df['year-month'] = soil_moisture_df.index.to_period('M')
soil_moisture_df_monthly = soil_moisture_df.groupby('year-month').mean()

mod16_df['year-month'] = mod16_df.index.to_period('M')
mod16_df_monthly = mod16_df.groupby('year-month').mean()

modis_snow_df['year-month'] = modis_snow_df.index.to_period('M')
modis_snow_df_monthly = modis_snow_df.groupby('year-month').mean()
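
The monthly averaging pattern used above (label each row with to_period('M'), then group and average) can be tried on a small synthetic series. The data frame demo_daily and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: January values are all 1.0, February values are all 3.0.
daily_index = pd.date_range('2021-01-01', '2021-02-28', freq='D')
daily_values = np.where(daily_index.month == 1, 1.0, 3.0)
demo_daily = pd.DataFrame({'value': daily_values}, index=daily_index)

# Label each row with its calendar month, then average within each month.
demo_daily['year-month'] = demo_daily.index.to_period('M')
demo_monthly = demo_daily.groupby('year-month').mean()
print(demo_monthly)
```

The result is indexed by monthly periods rather than timestamps, which is why the merged data frame in the next cell is also converted with to_period('M') before the monthly tables are joined.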

In [None]:
# merge data frames and save .csv
merged_df = gpm_df.join(prism_df).join(openET_df)
merged_df_monthly = merged_df.groupby(merged_df.index.to_period('M')).mean()

merged_df_monthly = merged_df_monthly.join(modis_snow_df_monthly).join(mod16_df_monthly).join(soil_moisture_df_monthly)
merged_df_monthly.to_csv('merged_data_monthly.csv')


In [27]:
# take temporal average to produce rasters for a map
openET = ee.ImageCollection('OpenET/ENSEMBLE/CONUS/GRIDMET/MONTHLY/v2_0')
openET_img = openET.filterDate(start,end).select('et_ensemble_mad').mean().clip(watershed_gee.geometry())

soil_moisture = ee.ImageCollection('NASA/SMAP/SPL3SMP_E/006')
soil_moisture_img = soil_moisture.filterDate(start,end).select('soil_moisture_am').mean().clip(watershed_gee.geometry())

gpm = ee.ImageCollection('NASA/GPM_L3/IMERG_MONTHLY_V07')
gpm_img = gpm.filterDate(start,end).select('precipitation').mean().clip(watershed_gee.geometry())

prism_ppt = ee.ImageCollection('OREGONSTATE/PRISM/AN81m')
prism_ppt_img = prism_ppt.filterDate(start,end).select('ppt').mean().clip(watershed_gee.geometry())

prism_temp = ee.ImageCollection('OREGONSTATE/PRISM/AN81m')
prism_temp_img = prism_temp.filterDate(start,end).select('tmean').mean().clip(watershed_gee.geometry())

modis_snow = ee.ImageCollection('MODIS/061/MOD10A1')
modis_snow_img = modis_snow.filterDate(start,end).select('NDSI_Snow_Cover').mean().clip(watershed_gee.geometry())

In [28]:
# plot the map
my_map = folium.Map(location=center_coordinates_of_area, zoom_start=10)
my_map.add_ee_layer(openET_img,'ET (mm)')
my_map.add_ee_layer(soil_moisture_img,'Soil Moisture (volume fraction)')
my_map.add_ee_layer(gpm_img,'GPM Precip (mm/hr)')
my_map.add_ee_layer(prism_ppt_img,'PRISM Precip (mm)')
my_map.add_ee_layer(prism_temp_img,'PRISM Temp (C)')
my_map.add_ee_layer(modis_snow_img,'MODIS Snow (fraction of time)')

# Add a layer control panel to the map.
my_map.add_child(folium.LayerControl())

# Display the map.
display(my_map)

In [None]:
# download rasters--------------------------
download_img(gpm_img,watershed_gee.geometry(),'GPM_P')
download_img(openET_img,watershed_gee.geometry(),'OpenET')
download_img(soil_moisture_img,watershed_gee.geometry(),'SMAP_SM')
download_img(prism_ppt_img,watershed_gee.geometry(),'PRISM_P')
download_img(prism_temp_img,watershed_gee.geometry(),'PRISM_T')
download_img(modis_snow_img,watershed_gee.geometry(),'MODIS_SNOW')

Now, download the .csv file (merged_data_monthly.csv) to your personal computer. Then, use the Excel file included in the assignment to calculate the water budget for the watershed. The Excel file uses temperature, precipitation, and ET to estimate how snow water equivalent and soil water accumulate over time. You will model soil water depth and adjust the model parameters until the modeled values match the soil water content estimated by SMAP.
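
If it helps to see the bookkeeping in code, below is a rough sketch of one way a monthly snow and soil-water budget could be written. It is not the formula set used in the Excel file: the temperature threshold of 0 °C, the melt_factor, the soil_capacity, the initial_soil value, and the input numbers are all assumptions made up for illustration.

```python
def simulate_water_balance(precip, tmean, et, melt_factor=50.0,
                           soil_capacity=150.0, initial_soil=75.0):
    """Monthly snow and soil-water bucket model (illustrative assumption only).

    precip, et: monthly depths in mm; tmean: monthly mean temperature in deg C.
    melt_factor (mm per deg C per month), soil_capacity (mm) and initial_soil (mm)
    are hypothetical tuning parameters.
    Returns two lists: snow water equivalent and soil water storage (mm).
    """
    swe, soil = 0.0, initial_soil
    swe_series, soil_series = [], []
    for p, t, e in zip(precip, tmean, et):
        if t <= 0.0:
            # Cold month: precipitation accumulates as snow and nothing melts.
            swe += p
            liquid_input = 0.0
        else:
            # Warm month: precipitation falls as rain and snow melts in
            # proportion to temperature, limited by the snow available.
            melt = min(swe, melt_factor * t)
            swe -= melt
            liquid_input = p + melt
        # Update soil storage; water above capacity is assumed to leave the
        # soil (runoff or recharge), and storage cannot drop below zero.
        soil = soil + liquid_input - e
        soil = max(0.0, min(soil, soil_capacity))
        swe_series.append(swe)
        soil_series.append(soil)
    return swe_series, soil_series


# Made-up forcing for four months (not real data).
swe_mm, soil_mm = simulate_water_balance(
    precip=[60.0, 40.0, 20.0, 10.0],
    tmean=[-5.0, -2.0, 1.0, 8.0],
    et=[5.0, 5.0, 30.0, 60.0])
print(swe_mm)
print(soil_mm)
```

In this sketch, calibrating against SMAP would mean changing melt_factor, soil_capacity, and initial_soil until the modeled soil storage follows the observed seasonal pattern.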

# Part 2: Repeat this analysis over your own study area
First, upload your watershed shapefile to the files directory. Be sure to include ALL FILES WITH THE SAME STEM (.shp, .dbf, .prj, .shx, etc.). Make a note of the path to your watershed. If you put it in the main directory, the path should just be 'watershed.shp' or whatever you named it.
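
A quick way to confirm that the upload is complete is to check for the companion files before loading the shapefile. The helper below is a hypothetical convenience, not part of the assignment code, and the list of extensions it checks is an assumption based on the files named above.

```python
from pathlib import Path

# Companion files checked here (assumed list, taken from the extensions named above).
REQUIRED_EXTENSIONS = ('.shp', '.shx', '.dbf', '.prj')

def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    shp_path = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not shp_path.with_suffix(ext).exists()]

# 'watershed.shp' is a placeholder; use the path to your own shapefile.
print(missing_shapefile_parts('watershed.shp'))
```

An empty list means all four files were found with the same stem.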

Then, fill in the code below to repeat the analysis performed in part 1 over your study area. In most cases, you can just copy and paste code from the appropriate section.

In [None]:
start='yyyy-mm-dd'
end='yyyy-mm-dd'
path_to_watershed='<path to shapefile>.shp'
center_coordinates_of_area=[<latitude>,<longitude>]

In [None]:
# upload watershed to GEE----------------------------
watershed_gee = geemap.shp_to_ee(path_to_watershed)

In [None]:
# take the spatial mean to get area-averaged time series over watershed

In [None]:
# take monthly average of daily datasets

In [None]:
# merge data frames and save .csv

In [None]:
# take temporal average to produce rasters for a map


In [None]:
# plot the map

In [None]:
# download rasters