In [None]:
# Data Preprocessing and Harmonization

This notebook documents preprocessing steps used to harmonize GNSS-R, SAR, optical,
and terrain datasets prior to machine learning–based soil moisture downscaling.

The emphasis is on spatial consistency, temporal alignment, and producing model-ready
features for GeoAI workflows.


In [None]:
## Input Datasets

- GNSS-R / CYGNSS L3 Soil Moisture (coarse resolution)
- SMAP L3 Soil Moisture (reference)
- Sentinel-1 GRD (VV, VH backscatter)
- Sentinel-2 L2A surface reflectance (used to derive vegetation indices)
- SRTM DEM (elevation and derived terrain variables)
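The notebook does not say which vegetation indices or backscatter features are used. As an illustration only, the sketch below computes NDVI from Sentinel-2 red and near-infrared reflectance (bands B04 and B08) and a VH/VV cross-polarization ratio in dB from Sentinel-1 backscatter. It assumes the reflectance has already been converted to physical surface reflectance (not raw scaled integers) and that backscatter is supplied in linear power units. The function names `ndvi` and `cross_pol_ratio_db` are hypothetical.

```python
import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0.

    Assumes red/nir are surface reflectance in physical units (e.g. 0..1).
    """
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def cross_pol_ratio_db(vv_linear, vh_linear):
    """VH/VV ratio in dB, assuming both inputs are linear power backscatter.

    Cells where either channel is non-positive (no valid log) are set to NaN.
    """
    vv = np.asarray(vv_linear, dtype=np.float64)
    vh = np.asarray(vh_linear, dtype=np.float64)
    valid = (vv > 0) & (vh > 0)
    out = np.full(vv.shape, np.nan)
    out[valid] = 10.0 * np.log10(vh[valid] / vv[valid])
    return out
```

If the backscatter is already in dB (as in some preprocessed Sentinel-1 archives), the ratio reduces to the difference `VH_dB - VV_dB`, and no log conversion is needed.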

All datasets are reprojected and resampled to a common analysis grid,
and temporally aligned to the GNSS-R acquisition windows.


In [None]:
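import numpy as np
import pandas as pd

# Raster I/O and warping (reprojection and resampling onto the analysis grid)
import rasterio
from rasterio.warp import reproject, Resampling

# Vector data handling
import geopandas as gpd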

In [None]:
## Preprocessing Steps

1. Reprojection of all datasets to a common CRS
2. Spatial resampling to a target grid resolution
3. Temporal aggregation to match GNSS-R acquisition windows
4. Masking of invalid or low-quality observations
5. Feature normalization for machine learning
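Steps 3 to 5 can be sketched with pandas on a per-pixel table of observations. The column names (`pixel_id`, `time`, `qc_ok`, `vv_db`) and the daily window are hypothetical. The window should follow the actual temporal sampling of the GNSS-R product. In this sketch, the quality mask is applied before aggregation so that flagged observations do not enter the window means. Normalization statistics are fitted on training data only and then reused for other splits to avoid leakage.

```python
import numpy as np
import pandas as pd


def mask_invalid(df, qc_col="qc_ok"):
    """Step 4: keep only rows whose (hypothetical) boolean QC flag is True."""
    return df.loc[df[qc_col]].drop(columns=qc_col)


def aggregate_to_windows(df, window="1D"):
    """Step 3: mean of each pixel's observations within fixed time windows.

    The daily window is an assumption; match it to the GNSS-R cadence.
    """
    return (
        df.groupby(["pixel_id", pd.Grouper(key="time", freq=window)])
        .mean(numeric_only=True)
        .reset_index()
    )


def zscore_fit(train, cols):
    """Step 5: per-feature mean and (population) std from training data only."""
    return train[cols].mean(), train[cols].std(ddof=0)


def zscore_apply(df, cols, mean, std):
    """Apply stored statistics; zero-variance features become NaN, not inf."""
    out = df.copy()
    out[cols] = (df[cols] - mean) / std.replace(0, np.nan)
    return out


if __name__ == "__main__":
    obs = pd.DataFrame({
        "pixel_id": [1, 1, 1, 2],
        "time": pd.to_datetime(["2020-06-01 03:00", "2020-06-01 15:00",
                                "2020-06-02 03:00", "2020-06-01 09:00"]),
        "vv_db": [-12.0, -10.0, -11.0, -14.0],
        "qc_ok": [True, True, False, True],
    })
    daily = aggregate_to_windows(mask_invalid(obs))
    mean, std = zscore_fit(daily, ["vv_db"])
    print(zscore_apply(daily, ["vv_db"], mean, std))
```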

In [None]:
## Notes on Scalability

These preprocessing steps are designed to be portable to cloud-native platforms
such as Google Earth Engine and to HPC-based GeoAI workflows.