# Training and Testing Dataset Overview

# Introduction
In environmental research and forecasting, access to reliable and comprehensive datasets is paramount. Four such sources are gridMET, AMSR, MODIS, and SNOTEL. Each offers unique insight into a different aspect of Earth's climate and terrain, and together they underpin applications such as weather forecasting, ecological modeling, and climate change research. In our **snowcast_wormhole** workflow, we use these datasets to improve the accuracy and robustness of the model's predictions. Let's explore each data source in detail.


# gridMET Dataset

gridMET is like having a weather wizard at your fingertips! It's a dataset of daily, high-spatial-resolution (~4-km, 1/24th degree) surface meteorological data covering the contiguous US from 1979 up to yesterday. Its real-time products also extend to southern British Columbia. This dataset provides vital insights into weather patterns, aiding various industries and scientific endeavors.


<img src="../img/gridMET/tmmn-map.png" alt="air_temperature_tmmn Map" height="250" width="250"/>

<img src="../img/gridMET/tmmx_map.png" alt="air_temperature_tmmx Map" height="250" width="250"/>

<img src="../img/gridMET/evapotranspiration_map.png" alt="potential_evapotranspiration Map" height="250" width="250"/>

<img src="../img/gridMET/mean_vapor_pressure_deficit_map.png" alt="mean_vapor_pressure_deficit"  height="250" width="250"/>

<img src="../img/gridMET/relative_humidity_rmax_map.png" alt="relative_humidity_rmax map" height="250" width="250"/>

<img src="../img/gridMET/relative_humidity_rmin_map.png" alt="relative_humidity_rmin Map" height="250" width="250"/>

<img src="../img/gridMET/precipitation_amount.png" alt="precipitation_amount Map" height="250" width="250"/>

<img src="../img/gridMET/wind_speed_map.png" alt="wind_speed Map" height="250" width="250"/>


## Characteristics
> Domain: Climate
> 
> Variables:
> - Temperature, maximum
> - Temperature, minimum
> - Precipitation accumulation
> - Downward surface shortwave radiation
> - Reference evapotranspiration (ASCE Penman-Monteith)
> - Energy Release Component (fuel model G (conifer forest))
> - Burning Index (fuel model G (conifer forest))
> - 100-hour and 1000-hour dead fuel moisture
> - Mean vapor pressure deficit
> - 10-day Palmer Drought Severity Index
> - Humidity, maximum
> - Humidity, minimum
> - Relative humidity
> - Specific humidity
> - Wind velocity
> 
> Spatial Resolution: 4 kilometers
> 
> Temporal Frequency: Daily
> 
> Temporal Coverage: 1979 - Present
> 
> Spatial Extent: CONUS

For more details and updates on gridMET, visit https://www.climatologylab.org/gridmet.html.
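
The "~4 km" figure follows from the 1/24th-degree grid spacing. The sketch below is a rough check of that conversion, assuming a spherical Earth with a mean radius of 6371 km; the function name `cell_size_km` and the radius constant are illustrative and not part of gridMET.

```python
import math

EARTH_RADIUS_KM = 6371.0        # assumed mean Earth radius (spherical approximation)
GRID_SPACING_DEG = 1.0 / 24.0   # gridMET grid spacing in degrees


def cell_size_km(latitude_deg):
    """Return the approximate (north-south, east-west) cell size in km."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = GRID_SPACING_DEG * km_per_degree
    # Meridians converge toward the poles, so east-west size shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = cell_size_km(40.0)
print(f"{ns_km:.2f} km north-south, {ew_km:.2f} km east-west at 40 degrees N")
```

The north-south size stays near 4.6 km everywhere, while the east-west size narrows with latitude, which is why the resolution is usually quoted as approximately 4 km.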

# AMSR Dataset

## Introduction
### What is AMSR?
AMSR, short for Advanced Microwave Scanning Radiometer, is a satellite-based instrument designed to measure microwave emissions from the Earth's surface and atmosphere. The AMSR dataset comprises satellite-derived microwave observations used in weather forecasting, climate monitoring, and environmental studies. From these observations, geophysical parameters such as sea surface temperature, soil moisture, sea ice concentration, precipitation, and wind speed over oceans are derived. Together they help scientists understand how water moves and behaves across the planet, and they inform studies of Earth's climate dynamics and hydrological processes.


<p align="center">
  <img src="https://developers.google.com/earth-engine/datasets/images/TRMM/TRMM_3B43V7_sample.png" alt="Sample Image from TRMM_3B43V7 dataset" width="400">
</p>
<p align="center">AMSR from TRMM 3B43V7 dataset</p>


## AMSR Data Collections
### AMSR/ADEOS-II
AMSR/ADEOS-II, operating on the Advanced Earth Observing Satellite-II (ADEOS-II) platform, was the initial version of the AMSR instrument, providing passive microwave measurements from early 2003 to October 24, 2003. Offering Level-1A and Level-2A products, it served as a crucial tool for studying changes in polar ice caps, global precipitation patterns, and oceanic circulation. This data has been instrumental in enhancing our understanding of Earth's climate dynamics and informing research across various scientific disciplines.

### AMSR2
AMSR2, launched in 2012 onboard the Global Change Observation Mission-Water (GCOM-W1) satellite, represents the next generation of AMSR instruments. It continues the legacy of AMSR-E, capturing observations with improved spatial resolution and enhanced capabilities. AMSR2 data has been instrumental in monitoring sea ice extent, ocean surface winds, and soil moisture dynamics at global scales.

### Hosted by NSIDC DAAC
The NSIDC DAAC serves as the primary repository for AMSR-related data, ensuring that these valuable datasets are freely accessible to the scientific community and the public. The center provides data discovery, access, and user support services, facilitating research in cryospheric and hydrological sciences, climate modeling, and environmental monitoring.

## Characteristics

> Domain: Climate, Weather, Oceanography, Hydrology
> 
> Variables:
> - Sea Surface Temperature (SST)
> - Wind Speed over the Ocean
> - Water Vapor
> - Cloud Water
> - Rain Rate
> - Sea Ice Concentration
> - Snow Depth
> - Soil Moisture
> 
> Spatial Resolution: 5 km to 56 km (varies by product)
> 
> Temporal Resolution: Daily, with finer resolutions available (e.g., 3-hourly, 6-hourly)
> 
> Temporal Coverage: 2002 - Present (varies by instrument: AMSR-E, AMSR2)
> 
> Spatial Extent: Global
> 
> Data Format: HDF-EOS, NetCDF, GeoTIFF
> 
> Projection: Polar stereographic (for polar regions), cylindrical/geographic (global)
> 
> Link: [AMSR-E and AMSR2 Data at NASA Earthdata](https://earthdata.nasa.gov/earth-observation-data/near-real-time/download-nrt-data/amsr2)

# MODIS Dataset

## Introduction
### What is MODIS?
MODIS, which stands for MODerate Resolution Imaging Spectroradiometer, is an advanced instrument operating aboard both the Terra and Aqua spacecraft. It captures a comprehensive view of Earth's surface, oceans, and atmosphere, and the MODIS dataset is the collection of Earth observation data derived from these instruments. Scientists use MODIS data to track changes in land cover, weather patterns, ice and snow, and ocean color. MODIS has a viewing swath width of 2,330 km and covers the entire Earth surface every one to two days. With 36 spectral bands ranging from 0.405 to 14.385 µm, it provides detailed data at three spatial resolutions: 250 m, 500 m, and 1,000 m.

## Characteristics
- **Spatial Resolution**: MODIS provides data at moderate spatial resolutions ranging from 250 meters to 1 kilometer, depending on the specific product and spectral band.

- **Temporal Resolution**: MODIS offers daily global coverage, providing data at a high temporal resolution suitable for monitoring dynamic environmental processes.

- **Variables**: MODIS measures various Earth surface parameters including land cover, land surface temperature, vegetation indices, fire occurrence, ocean color, and atmospheric properties.

- **Coverage**: MODIS provides global coverage, capturing data over land, ocean, and atmosphere, facilitating multi-disciplinary Earth observation studies.

- **Quality**: MODIS data undergoes extensive calibration and validation processes to ensure accuracy and reliability, with quality flags provided to identify potential data anomalies.

<p align="center">
  <img src="https://developers.google.com/earth-engine/datasets/images/YALE/YALE_YCEO_UHI_Summer_UHI_yearly_pixel_v4_sample.png" alt="Sample Image from YALE_YCEO_UHI dataset" width="400">
</p>
<p align="center">MODIS from YALE YCEO UHI dataset</p>


## Data Format
MODIS data are typically available in Hierarchical Data Format (HDF) or NetCDF formats, which are widely used for storing and distributing Earth observation data. These formats facilitate efficient data access, manipulation, and analysis using various software tools and programming languages commonly employed in the Earth sciences community.

## MODIS Direct Broadcast
Users with x-band receiving systems can capture regional data directly from the spacecraft using the MODIS Direct Broadcast signal, enhancing real-time monitoring capabilities.

# fSCA

Fractional Snow Covered Area (fSCA) is a metric used in the field of snow science and environmental studies to quantify the proportion of a given area that is covered by snow. It is derived from remote sensing data, particularly from sensors like Landsat, which capture images of the Earth's surface in various spectral bands. By analyzing these images, researchers can differentiate between snow-covered and snow-free areas, allowing them to calculate the percentage of the landscape covered by snow at a particular point in time. fSCA is valuable for understanding snow distribution patterns, monitoring changes in snow cover over time, and aiding in snowmelt and water resource management. It plays a crucial role in snowpack modeling, avalanche forecasting, and climate change research, providing essential data for informing decision-making processes related to snow-dependent ecosystems and human activities.
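
As a minimal sketch of the percentage calculation described above, the example below computes fSCA from a binary snow classification. It assumes each pixel has already been classified as snow or snow-free and that cloudy or missing pixels are excluded through a validity mask; the function name and the toy arrays are hypothetical. Operational fSCA products typically estimate a snow fraction within each pixel, which this simplification does not attempt.

```python
import numpy as np


def fractional_snow_covered_area(snow_mask, valid_mask=None):
    """Return the fraction (0.0 to 1.0) of valid pixels classified as snow."""
    snow_mask = np.asarray(snow_mask, dtype=bool)
    if valid_mask is None:
        valid_mask = np.ones_like(snow_mask, dtype=bool)
    else:
        valid_mask = np.asarray(valid_mask, dtype=bool)

    n_valid = valid_mask.sum()
    if n_valid == 0:
        return float("nan")  # nothing observable, so the fraction is undefined
    return float((snow_mask & valid_mask).sum() / n_valid)


# Toy 3 x 4 scene: 1 = snow, 0 = snow-free
snow = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0],
                 [1, 1, 1, 0]])
# 1 = usable observation, 0 = cloud or missing data
cloud_free = np.array([[1, 1, 1, 1],
                       [1, 1, 1, 0],
                       [1, 1, 1, 1]])

fsca = fractional_snow_covered_area(snow, cloud_free)
print(f"fSCA = {fsca:.1%}")
```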



# SNOTEL Dataset

## Introduction
### What is SNOTEL?
The SNOwpack TELemetry (SNOTEL) Network is an automated system of snowpack and climate sensors managed by the Natural Resources Conservation Service (NRCS) in the western United States, offering critical data for water supply forecasting, flood prediction, and climate research. SNOTEL provides real-time data on snow water equivalent, snow depth, precipitation, and temperature from more than 900 sites in remote mountainous regions, aiding in understanding hydroclimatic conditions. The SNOTEL dataset serves as a valuable resource for a wide range of stakeholders, contributing to informed decision-making in sectors affected by snowpack and climate conditions.

## SNOTEL Network Overview
### Composition of SNOTEL
- Comprises over 900 automated sites in remote, high-elevation mountain watersheds.
- Monitors snowpack, precipitation, temperature, and other climatic parameters.

### Operations and Data Collection
- Sites operate unattended and without maintenance for extended periods.
- Standard sensor configuration includes snow pillow, precipitation gauge, and temperature sensors.

## Telemetry and Data Transmission
### Data Collection and Storage
- Dataloggers installed in equipment shelters collect and store data.
- Various telemetry systems transmit data back to the Water and Climate Information System.

### Enhanced Site Capabilities
- Enhanced sites equipped with soil moisture, soil temperature, solar radiation, wind speed, and relative humidity sensors.
- Tailored configurations based on physical conditions and climate requirements.

## Characteristics
**Spatial Resolution**: SNOTEL provides data at a network of monitoring sites distributed across mountainous regions, typically covering areas with varying spatial resolutions depending on the density of monitoring stations.

**Temporal Resolution**: SNOTEL data is typically collected at hourly intervals, providing high temporal resolution data for monitoring snowpack conditions and related hydrological variables.

**Variables**: SNOTEL measures snow water equivalent, snow depth, temperature, precipitation, and soil moisture at monitoring sites in mountainous regions.

**Coverage**: SNOTEL stations are primarily located in the western United States, covering areas with significant snowpack and water resource management importance.

**Quality**: SNOTEL data undergoes quality control procedures to ensure accuracy and reliability, including calibration checks and validation against manual measurements.

## Data Format
The Snow Telemetry (SNOTEL) data format encompasses structured datasets collected from remote automated stations situated in mountainous regions, monitoring snowpack, weather, and hydrological parameters. Key aspects include recorded parameters such as snow water equivalent (SWE), snow depth, air temperature, and precipitation, timestamped to denote observation times and often stored at varying resolutions like hourly or daily intervals. Quality control flags accompany data points to denote reliability, while metadata provides station details and sensor calibration information. SNOTEL data is commonly stored in formats like CSV, TSV, HDF5, or netCDF, accessible through agency websites, data portals, or APIs. This format facilitates applications spanning water resource management, climate research, agriculture, recreation, hydrological modeling, and ecological studies.
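
To illustrate how timestamps, quality control flags, and varying temporal resolutions fit together, here is a minimal sketch that filters hourly records by flag and aggregates them to daily means with pandas. The column names (`datetime`, `swe_in`, `qc_flag`) and the flag values are hypothetical stand-ins; a real export should be checked against its own metadata.

```python
import io

import pandas as pd

# Hypothetical hourly records; column names and flag values are illustrative only
raw_csv = io.StringIO(
    "datetime,swe_in,qc_flag\n"
    "2024-01-01 00:00,10.0,V\n"
    "2024-01-01 12:00,10.2,V\n"
    "2024-01-01 18:00,99.9,S\n"
    "2024-01-02 00:00,10.4,V\n"
    "2024-01-02 12:00,10.6,V\n"
)

hourly = pd.read_csv(raw_csv, parse_dates=["datetime"], index_col="datetime")

# Keep only records marked valid ("V" here), dropping the suspect 99.9 reading
trusted = hourly[hourly["qc_flag"] == "V"]

# Aggregate the remaining hourly values to one mean value per day
daily_swe = trusted["swe_in"].resample("D").mean()
print(daily_swe)
```

Filtering before resampling matters here: leaving the suspect reading in would pull the first day's mean far above the valid observations.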

<p align="center">
  <img src="https://www.climate.gov/sites/default/files/2021-08/DatasetGallery_Snow-Water-Equivalent-in-Western-Basins-Interactive-Graph_thumb_16x9.png" alt="Snow Water Equivalent in Western Basins" width="600">
</p>
<p align="center">Snow Water Equivalent in Western Basins</p>
<br>

<img src="../img/SNOTEL.jpeg" alt="Snow Water Equivalent Percent NRCS 1991-2020 Median April 6 2024" width="600" align="center">

<p align="center">Snow Water Equivalent Percent NRCS 1991-2020 Median April 6 2024</p>


## Applications

**Water Resource Management**:
   - *Snowpack Monitoring*: Assessing snowpack depth and SWE helps in forecasting water availability for irrigation, hydropower generation, and municipal water supply.
   - *Runoff Forecasting*: Data from SNOTEL stations aids in predicting spring runoff, facilitating reservoir management and flood control.

**Climate Research**:
   - *Long-term Climate Trends*: Historical data enables researchers to study long-term climate patterns, including changes in snowfall, temperature, and precipitation.
   - *Climate Change Studies*: SNOTEL data is utilized to understand the impacts of climate change on snowpack dynamics, water resources, and ecosystems.

**Agriculture and Forestry**:
   - *Crop Planning*: Farmers use snowpack data to anticipate water availability during the growing season, aiding in crop planning and irrigation scheduling.
   - *Forest Management*: Forestry agencies utilize SNOTEL data for assessing wildfire risk, planning timber harvests, and monitoring forest health.

**Recreation and Tourism**:
   - *Winter Sports Planning*: Ski resorts and recreational outfitters rely on snowpack data for planning activities such as skiing, snowboarding, and snowmobiling.
   - *Summer Recreation*: Understanding snowmelt timing and water availability helps in planning summer recreational activities like hiking, fishing, and camping.

# SNOTEL Dataset Download Instructions

Because snow has a higher albedo than most other land cover types, it can cause the seasonal changes in the albedo of a landscape to be quite dramatic. The Soil Climate Analysis Network (SCAN) and the SNOwpack TELemetry (SNOTEL) network provide snow depth and snow water equivalent (the amount of water contained in a snowpack) data for many sites across the United States.

### Step 1: Navigate to the SCAN/SNOTEL website
- Visit the [SCAN/SNOTEL Website](http://www.wcc.nrcs.usda.gov/nwcc/inventory).

### Step 2: Choose Data Product and Location
- Select the data product you are interested in (e.g., Snow Depth or Snow Water Equivalent) from the drop-down menu under *Element*.
- Choose a State/County or Basin using the drop-down menus provided.

### Step 3: View Inventory
- Click on 'View Inventory' to see available stations in your selected area.
  - If no results are returned, consider widening your search.

### Step 4: Select Station and Data
- Click 'View' next to the station of interest to access its page.
- Use the table to select the data you need:
  - Choose the data product (Snow Depth or Snow Water Equivalent).
  - Select ‘Daily’ in the Time Series column.
  - Choose the format ('chart' for visualization or 'csv' for download).
  - View current data by selecting the desired time frame in the yellow column and clicking 'View Current', or view historic data by selecting the year and time in the green column and clicking 'View Historic'.

### Step 5: Download Data
- Save the downloaded CSV file to your computer for further analysis.

**For map visualization of SNOTEL stations, click on ‘SNOTEL data’ under ‘Climate Monitoring’ in the right panel. The maps are clickable for station selection.**

For more information, visit the [NRCS SNOTEL page](https://www.nrcs.usda.gov/wps/portal/wcc/home/aboutUs/monitoringPrograms/automatedSnowMonitoring).
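
Once a CSV file has been saved, it can be loaded for analysis with pandas. The sketch below assumes the file may begin with descriptive lines prefixed by `#` and contains a date column followed by one data column; the header text, column names, and values are made up for illustration, and an in-memory string stands in for the downloaded file.

```python
import io

import pandas as pd

# Stand-in for a downloaded file; the header text and column names are made up
downloaded = io.StringIO(
    "# Hypothetical station export\n"
    "# Units: inches\n"
    "Date,Snow Water Equivalent (in)\n"
    "2024-03-30,18.2\n"
    "2024-03-31,18.6\n"
    "2024-04-01,18.4\n"
)

# comment="#" skips the descriptive lines before the real header row
swe = pd.read_csv(downloaded, comment="#", parse_dates=["Date"], index_col="Date")
swe.columns = ["swe_in"]  # shorter name for convenience

# Find the date and magnitude of peak snow water equivalent
peak_date = swe["swe_in"].idxmax()
peak_value = swe["swe_in"].max()
print(f"Peak SWE of {peak_value} in on {peak_date.date()}")
```

For a real file, replace `downloaded` with the path to the saved CSV.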


```python
# First Python script in Geoweaver
import sys
import urllib.request

from bs4 import BeautifulSoup

print(sys.path)

# NOHRSC "nearest observations" query: a latitude/longitude pair plus a target date
nohrsc_url_format_string = "https://www.nohrsc.noaa.gov/nearest/index.html?city={lat}%2C{lon}&county=&l=5&u=e&y={year}&m={month}&d={day}"

test_noaa_query_url = nohrsc_url_format_string.format(lat=40.05352381745094, lon=-106.04027196859343, year=2022, month=5, day=4)

print(test_noaa_query_url)

# Fetch the page and decode the response body
response = urllib.request.urlopen(test_noaa_query_url)
webContent = response.read().decode('UTF-8')

print(webContent)

# Name the parser explicitly so BeautifulSoup does not have to guess one
parsed_html = BeautifulSoup(webContent, "html.parser")
container_div = parsed_html.body.find('div', attrs={'class':'container'})

if container_div is not None:
    print(container_div.text)
else:
    print("Container div not found")

print(container_div)
```

```text
['/Users/meghana/Documents/swe-workflow-book/book/chapters', '/Users/meghana', '/opt/anaconda3/lib/python311.zip', '/opt/anaconda3/lib/python3.11', '/opt/anaconda3/lib/python3.11/lib-dynload', '', '/Users/meghana/.local/lib/python3.11/site-packages', '/opt/anaconda3/lib/python3.11/site-packages', '/opt/anaconda3/lib/python3.11/site-packages/aeosa']
https://www.nohrsc.noaa.gov/nearest/index.html?city=40.05352381745094%2C-106.04027196859343&county=&l=5&u=e&y=2022&m=5&d=4
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" >
<html lang="en">
<head>
	<meta http-equiv="Content-Type" CONTENT="text/html; charset=utf-8" >
	<link rel="stylesheet" type="text/css" href="/css/main.css" >
	<link href="/favicon.ico" rel="shortcut icon" >
	<meta http-equiv="Cache-Control" content="no-cache" >
	<meta name="DC.creator" content="National Operational Hydrologic Remote Sensing Center" >
	<meta name="DC.publisher" content="NOAA's National Weather Service" >
```

The script above automates access to NOAA's NOHRSC snow data. It builds a query URL from a user-defined location and date, fetches the page, and uses BeautifulSoup to parse the HTML and extract the content of the `container` div. This makes snow observations for a given point and day retrievable without navigating the website by hand.
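
As a variation on the URL construction step, the sketch below builds the same query with `urllib.parse.urlencode` instead of a hand-escaped format string, so special characters such as the comma are encoded automatically. The helper name `build_nohrsc_url` is hypothetical, and the fixed values for `l` and `u` are simply copied from the query string used above, since their meaning is not documented here.

```python
from datetime import date
from urllib.parse import urlencode

NOHRSC_NEAREST_URL = "https://www.nohrsc.noaa.gov/nearest/index.html"


def build_nohrsc_url(lat, lon, day):
    """Build the query URL for a latitude/longitude pair and a calendar date."""
    params = {
        "city": f"{lat},{lon}",  # urlencode escapes the comma as %2C
        "county": "",
        "l": 5,                  # copied from the original query string
        "u": "e",                # copied from the original query string
        "y": day.year,
        "m": day.month,
        "d": day.day,
    }
    return f"{NOHRSC_NEAREST_URL}?{urlencode(params)}"


url = build_nohrsc_url(40.05352381745094, -106.04027196859343, date(2022, 5, 4))
print(url)
```

Passing a `datetime.date` rather than separate integers also rejects impossible dates before any request is made.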

# DEM Dataset

## Introduction to DEM (Digital Elevation Model):
A Digital Elevation Model (DEM) is a digital representation of the topography of a surface, such as the Earth's terrain or the surface of another celestial body. It consists of a grid of elevation values, where each cell in the grid represents the elevation at a specific location. DEMs are widely used in various fields, including geography, geology, hydrology, environmental modeling, urban planning, and 3D visualization.

## Characteristics

- **Spatial Resolution**: DEMs can vary in spatial resolution, ranging from coarse resolution global datasets to high-resolution local datasets. Higher spatial resolution DEMs provide more detailed information about the terrain.
  
- **Accuracy**: The accuracy of DEMs depends on the source data and the methods used for their generation. High-quality DEMs are crucial for accurate analysis and decision-making in applications such as flood modeling, terrain navigation, and infrastructure planning.
  
- **Coverage**: DEMs can cover different geographic extents, from local areas to entire continents or even the entire globe. The coverage of a DEM determines its utility for specific applications.
  
- **Data Format**: DEM data is typically stored in raster formats such as GeoTIFF, ASCII grid, or Esri GRID. Additional information such as coordinate system, spatial resolution, and metadata may also be included in the data file.
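
Because a DEM is a regular grid of elevation values, terrain attributes can be derived from it with simple array operations. The sketch below estimates slope from a small synthetic grid using finite differences (`numpy.gradient`). It assumes square cells with elevation and cell size in the same units; the function name and the synthetic DEM are illustrative, and dedicated libraries may use a different neighborhood scheme and return slightly different values on real terrain.

```python
import numpy as np


def slope_degrees(elevation, cell_size):
    """Estimate slope in degrees for each cell of a regular elevation grid."""
    elevation = np.asarray(elevation, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(elevation, cell_size)
    # Slope is the angle of the steepest ascent: arctan of the gradient magnitude
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


cell_size = 30.0  # metres per cell (assumed for this synthetic example)
# Synthetic DEM: elevation rises 30 m for every 30 m travelled along the x axis
dem = np.tile(np.arange(5) * cell_size, (4, 1))

slope = slope_degrees(dem, cell_size)
print(slope)
```

A rise equal to the run gives a 45-degree slope in every cell, which makes this grid a convenient sanity check.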


<img src="../img/DEM/elevation_map.png" alt="Elevation Map" height="300" width="300"/>
<img src="../img/DEM/aspect_map.png" alt="Aspect Map" height="300" width="300">

<img src="../img/DEM/northness_map.png" alt="Northness Map" height="300" width="300">
<img src="../img/DEM/eastness_map.png" alt="Eastness Map" height="300" width="300">


## Data Sources and Acquisition:

#### Satellite Imagery:
DEMs can be derived from satellite imagery using techniques such as stereo photogrammetry or interferometry.

#### Aerial LiDAR (Light Detection and Ranging):
LiDAR data collected from aircraft can produce high-resolution DEMs with accurate elevation information.

#### Topographic Surveys:
Ground-based surveys using total stations or GPS equipment can also be used to generate DEMs for smaller areas with high precision.

## Applications:

- Terrain Analysis for Infrastructure Development
- Environmental Impact Assessment
- Geological Mapping and Exploration
- Disaster Risk Reduction
- Climate Change Modeling
- Ecological and Habitat Modeling

# DEM Data Download:

# Load dependencies
import geopandas as gpd
import json
import geojson
from pystac_client import Client
import planetary_computer
import xarray
import rioxarray
import xrspatial
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pyproj import Proj, transform
import os
import sys, traceback
import requests

home_dir = os.path.expanduser('~')
work_dir = f"{home_dir}/gridmet_test_run"
snowcast_github_dir = f"{home_dir}/Documents/GitHub/SnowCast/"

#exit() # this process no longer need to execute, we need to make Geoweaver to specify which process doesn't need to run

# user-defined paths for data-access
data_dir = f'{snowcast_github_dir}data/'
gridcells_file = data_dir+'snowcast_provided/grid_cells_eval.geojson'
stations_file = f"{work_dir}/all_snotel_cdec_stations_active_in_westus.csv"
gridcells_outfile = data_dir+'terrain/gridcells_terrainData_eval.csv'
stations_outfile = f"{work_dir}/training_all_active_snotel_station_list_elevation.csv_terrain_4km_grid_shift.csv"


def get_planetary_client():
  #requests.get('https://planetarycomputer.microsoft.com/api/stac/v1')

  # setup client for handshaking and data-access
  print("setup planetary computer client")
  client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1",ignore_conformance=True)
  
  return client

def prepareGridCellTerrain():
  client = get_planetary_client()
  # Load metadata
  gridcellsGPD = gpd.read_file(gridcells_file)
  gridcells = geojson.load(open(gridcells_file))
  stations = pd.read_csv(stations_file)

  # instantiate the output pandas dataframe
  df_gridcells = pd.DataFrame(columns=(
    "Longitude [deg]","Latitude [deg]",
    "Elevation [m]","Aspect [deg]",
    "Curvature [ratio]","Slope [deg]",
    "Eastness [unitCirc.]","Northness [unitCirc.]"))
  # Calculate gridcell characteristics using Copernicus DEM data
  print("Prepare GridCell Terrain data")
  for idx,cell in enumerate(gridcells['features']):
      print("Processing grid ", idx)
      search = client.search(
          collections=["cop-dem-glo-30"],
          intersects={"type":"Polygon", "coordinates":cell['geometry']['coordinates']},
      )
      items = list(search.items())
      print("==> Searched items: ", len(items))

      cropped_data = None
      try:
          # Open the first matching DEM tile and clip it to the grid cell
          signed_asset = planetary_computer.sign(items[0].assets["data"])
          data = (
              rioxarray.open_rasterio(signed_asset.href)
              .squeeze()
              .drop_vars("band")
              .coarsen({"y": 1, "x": 1})
              .mean()
          )
          cropped_data = data.rio.clip(gridcellsGPD['geometry'][idx:idx+1])
      except Exception:
          # Fall back to the second tile if the first cannot be opened or clipped
          signed_asset = planetary_computer.sign(items[1].assets["data"])
          data = (
              rioxarray.open_rasterio(signed_asset.href)
              .squeeze()
              .drop_vars("band")
              .coarsen({"y": 1, "x": 1})
              .mean()
          )
          cropped_data = data.rio.clip(gridcellsGPD['geometry'][idx:idx+1])

      # calculate lat/long of center of gridcell
      longitude = np.unique(np.ravel(cell['geometry']['coordinates'])[0::2]).mean()
      latitude = np.unique(np.ravel(cell['geometry']['coordinates'])[1::2]).mean()

      print("reproject data to EPSG:32612")
      # reproject the cropped dem data
      cropped_data = cropped_data.rio.reproject("EPSG:32612")

      # Mean elevation of gridcell
      mean_elev = cropped_data.mean().values
      print("Elevation: ", mean_elev)

      # Calculate directional components
      aspect = xrspatial.aspect(cropped_data)
      aspect_xcomp = np.nansum(np.cos(aspect.values*(np.pi/180)))
      aspect_ycomp = np.nansum(np.sin(aspect.values*(np.pi/180)))
      mean_aspect = np.arctan2(aspect_ycomp,aspect_xcomp)*(180/np.pi)
      if mean_aspect < 0:
          mean_aspect = 360 + mean_aspect
      print("Aspect: ", mean_aspect)
      mean_eastness = np.cos(mean_aspect*(np.pi/180))
      mean_northness = np.sin(mean_aspect*(np.pi/180))
      print("Eastness: ", mean_eastness)
      print("Northness: ", mean_northness)

      # Positive curvature = upward convex
      curvature = xrspatial.curvature(cropped_data)
      mean_curvature = curvature.mean().values
      print("Curvature: ", mean_curvature)

      # Calculate mean slope
      slope = xrspatial.slope(cropped_data)
      mean_slope = slope.mean().values
      print("Slope: ", mean_slope)

      # Fill pandas dataframe
      df_gridcells.loc[idx] = [longitude,latitude,
                               mean_elev,mean_aspect,
                               mean_curvature,mean_slope,
                               mean_eastness,mean_northness]

  # Save output data into csv format
  df_gridcells.set_index(gridcellsGPD['cell_id'][0:idx+1],inplace=True)
  df_gridcells.to_csv(gridcells_outfile)

def prepareStationTerrain():
  client = get_planetary_client()
  
  df_station = pd.DataFrame(columns=("Longitude [deg]","Latitude [deg]",
                                     "Elevation [m]","Elevation_30 [m]","Elevation_1000 [m]",
                                     "Aspect_30 [deg]","Aspect_1000 [deg]",
                                     "Curvature_30 [ratio]","Curvature_1000 [ratio]",
                                     "Slope_30 [deg]","Slope_1000 [deg]",
                                     "Eastness_30 [unitCirc.]","Northness_30 [unitCirc.]",
                                     "Eastness_1000 [unitCirc.]","Northness_1000 [unitCirc.]"))
  
  stations_df = pd.read_csv(stations_file)
  print(stations_df.head())
  # Calculate terrain characteristics of stations, and surrounding regions using COP 30
  for idx,station in stations_df.iterrows():
      search = client.search(
          collections=["cop-dem-glo-30"],
          intersects={
            "type": "Point", 
            "coordinates": [
              station['lon'],
              station['lat']
            ]
          },
      )
      items = list(search.items())
      print(f"Returned {len(items)} items")

      try:
          # Open the first matching DEM tile
          signed_asset = planetary_computer.sign(items[0].assets["data"])
          data = (
              rioxarray.open_rasterio(signed_asset.href)
              .squeeze()
              .drop_vars("band")
              .coarsen({"y": 1, "x": 1})
              .mean()
          )
          # Locate the pixel nearest to this station (not the whole coordinate column)
          xdiff = np.abs(data.x-station['lon'])
          ydiff = np.abs(data.y-station['lat'])
          xdiff = np.where(xdiff == xdiff.min())[0][0]
          ydiff = np.where(ydiff == ydiff.min())[0][0]
          # Keep a 66 x 66 pixel window around the station and reproject it to EPSG:32612
          data = data[ydiff-33:ydiff+33,xdiff-33:xdiff+33].rio.reproject("EPSG:32612")
      except Exception:
          # Report the failure, then fall back to the second matching tile
          traceback.print_exc(file=sys.stdout)
          signed_asset = planetary_computer.sign(items[1].assets["data"])
          data = (
              rioxarray.open_rasterio(signed_asset.href)
              .squeeze()
              .drop_vars("band")
              .coarsen({"y": 1, "x": 1})
              .mean()
          )
          xdiff = np.abs(data.x-station['lon'])
          ydiff = np.abs(data.y-station['lat'])
          xdiff = np.where(xdiff == xdiff.min())[0][0]
          ydiff = np.where(ydiff == ydiff.min())[0][0]
          data = data[ydiff-33:ydiff+33,xdiff-33:xdiff+33].rio.reproject("EPSG:32612")

      # Reproject the station data to better include only 1000m surrounding area
      inProj = Proj(init='epsg:4326')
      outProj = Proj(init='epsg:32612')
      new_x,new_y = transform(inProj,outProj,
                              station['lon'],
                              station['lat'])

      # Calculate elevation of station and surroundings
      mean_elevation = data.mean().values
      elevation = data.sel(x=new_x,y=new_y,method='nearest')
      print(elevation.values)

      # Calculate directional components
      aspect = xrspatial.aspect(data)
      aspect_xcomp = np.nansum(np.cos(aspect.values*(np.pi/180)))
      aspect_ycomp = np.nansum(np.sin(aspect.values*(np.pi/180)))
      mean_aspect = np.arctan2(aspect_ycomp,aspect_xcomp)*(180/np.pi)
      if mean_aspect < 0:
          mean_aspect = 360 + mean_aspect
      aspect = aspect.sel(x=new_x,y=new_y,method='nearest')
      eastness = np.cos(aspect*(np.pi/180))
      northness = np.sin(aspect*(np.pi/180))
      mean_eastness = np.cos(mean_aspect*(np.pi/180))
      mean_northness = np.sin(mean_aspect*(np.pi/180))

      # Positive curvature = upward convex
      curvature = xrspatial.curvature(data)
      mean_curvature = curvature.mean().values
      curvature = curvature.sel(x=new_x,y=new_y,method='nearest')
      print(curvature.values)

      # Calculate slope
      slope = xrspatial.slope(data)
      mean_slope = slope.mean().values
      slope = slope.sel(x=new_x,y=new_y,method='nearest')
      print(slope.values)

      # Fill pandas dataframe
      df_station.loc[idx] = [station['lon'],
                             station['lat'],
                             station['elevation_m'],
                             elevation.values,mean_elevation,
                             aspect.values,mean_aspect,
                             curvature.values,mean_curvature,
                             slope.values,mean_slope,
                             eastness.values,northness.values,
                             mean_eastness,mean_northness]

  # Save output data into CSV format
  df_station.set_index(stations_df['station_name'][0:idx+1],inplace=True)
  df_station.to_csv(stations_outfile)


def add_more_points_to_the_gridcells():
  # check how many points are in the current grid_cell json
  station_cell_mapping = f"{work_dir}/station_cell_mapping.csv"
  current_grid_df = pd.read_csv(station_cell_mapping)
  
  print(current_grid_df.columns)
  print(current_grid_df.shape)
  
  western_us_coords = f'{work_dir}/dem_file.tif.csv'
  dem_df = pd.read_csv(western_us_coords)
  print(dem_df.head())
  print(dem_df.shape)
  filtered_df = dem_df[dem_df['Elevation'] > 20]  # choose samples from points higher than 20 meters

  # Randomly choose 700 rows from the filtered DataFrame and keep only the coordinates
  random_rows = filtered_df.sample(n=700)[["Latitude", "Longitude"]].rename(columns={
    'Latitude': 'lat',
    'Longitude': 'lon'
  })
  previous_cells = current_grid_df[["lat", "lon"]]
  # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
  result_df = pd.concat([previous_cells, random_rows], ignore_index=True)
  print(result_df.shape)
  result_df.to_csv(f"{work_dir}/new_training_points_with_random_dem_locations.csv")
  print(f"New training points are saved to {work_dir}/new_training_points_with_random_dem_locations.csv")
  
  
  
  # find the random points that are on land from the dem.json
  
  # merge the grid_cell.json with the new dem points into a new grid_cell.json
  
def find_closest_index(target_latitude, target_longitude, lat_grid, lon_grid):
    """
    Find the index of the grid point closest to a target latitude and longitude.

    Closeness is measured as the sum of the absolute latitude and longitude
    differences (Manhattan distance in degrees).

    Parameters:
        target_latitude (float): Target latitude.
        target_longitude (float): Target longitude.
        lat_grid (array-like): Latitude of each grid point.
        lon_grid (array-like): Longitude of each grid point, aligned with lat_grid.

    Returns:
        int: Positional index of the closest grid point.
    """
    lat_diff = np.abs(np.asarray(lat_grid, dtype=np.float64) - target_latitude)
    lon_diff = np.abs(np.asarray(lon_grid, dtype=np.float64) - target_longitude)
    row_idx = np.argmin(lat_diff + lon_diff)
    return row_idx
  
  
def read_terrain_from_dem_csv():
  western_us_coords = f'{work_dir}/dem_all.csv'
  western_df = pd.read_csv(western_us_coords)
  print("western_df.head() = ", western_df.head())
  
  stations_file_df = pd.read_csv(stations_file)
  print("stations_file_df.head() = ", stations_file_df.head())
  
  def find_closest_dem_row(row, western_df):
    #print(row)
    row_idx = find_closest_index(
      row["latitude"],
      row["longitude"],
      western_df["Latitude"], 
      western_df["Longitude"]
    )
    dem_row = western_df.iloc[row_idx]
    new_row = pd.concat([row, dem_row], axis=0)
    return new_row
  
  stations_file_df = stations_file_df.apply(find_closest_dem_row, args=(western_df,), axis=1)
  stations_file_df.to_csv(stations_outfile, index=False)
  

if __name__ == "__main__":
  try:
    read_terrain_from_dem_csv()
  except Exception:
    traceback.print_exc(file=sys.stdout)


western_df.head() =     Latitude  Longitude  x  y   Elevation     Slope     Aspect   Curvature  \
0      49.0   -125.000  0  0   15.124211  0.470627  272.35254  3624.44560   
1      49.0   -124.964  1  0  136.762280  0.454659  284.84660  1261.81840   
2      49.0   -124.928  2  0  258.745100  0.446243  282.81650 -1763.61730   
3      49.0   -124.892  3  0  387.150480  0.281288    6.37222 -2461.73140   
4      49.0   -124.856  4  0  213.531710  0.405632   30.24500  -511.48447   

   Northness  Eastness  
0   0.041025  0.784977  
1   0.250835  0.768424  
2   0.218295  0.772784  
3   0.782300 -0.110535  
4   0.712497 -0.466602  
stations_file_df.head() =      stationTriplet stationId stateCode networkCode                    name  \
0      ABY:CA:SNOW       ABY        CA        SNOW                   Abbey   
1     0010:ID:COOP      0010        ID        COOP  Aberdeen Experimnt Stn   
2     0041:NM:COOP      0041        NM        COOP             Abiquiu Dam   
3  08108010:NM:BOR  0810801

In our geospatial data processing workflow, we use libraries such as xrspatial, NumPy, and pandas to derive terrain characteristics for the SnowCast project. For both grid cells and station locations, we calculate elevation, aspect, curvature, slope, eastness, and northness. The elevation data comes from the Copernicus DEM, accessed through the Planetary Computer service. Together, these attributes characterize the terrain of the geographic region under study.
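
To make the aspect-derived attributes more concrete, here is a minimal NumPy sketch of how northness, eastness, and a mean aspect can be computed from aspect angles. It assumes aspect is given in degrees clockwise from north. The function names `aspect_components` and `circular_mean_aspect` are hypothetical and are not part of the workflow code. The mean is a circular mean, because a plain arithmetic mean gives misleading results for angles that straddle north.

```python
import numpy as np


def aspect_components(aspect_deg):
    """Return (northness, eastness) for aspect in degrees clockwise from north."""
    rad = np.deg2rad(np.asarray(aspect_deg, dtype=float))
    return np.cos(rad), np.sin(rad)


def circular_mean_aspect(aspect_deg):
    """Circular mean of aspect angles in degrees, mapped to [0, 360). NaNs are ignored."""
    northness, eastness = aspect_components(aspect_deg)
    mean_rad = np.arctan2(np.nansum(eastness), np.nansum(northness))
    return float(np.rad2deg(mean_rad) % 360.0)


# Two slopes facing slightly west and slightly east of north, plus a missing value.
# The arithmetic mean of 350 and 20 would be 185 (south-facing), which is wrong.
sample_aspect = np.array([350.0, 20.0, np.nan])
mean_aspect = circular_mean_aspect(sample_aspect)
print(mean_aspect)
```

Summing the sine and cosine components before taking `arctan2` is the same idea the script applies to the full aspect raster. The sketch only isolates it so the effect of angle wrap-around is easy to see.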