# Part 1: Take the full land surface model dataset generated for Trail Valley Creek and create the subsets relevant to this study

Benoit Montpetit, CPS/CRD/ECCC, 2025  
Nicolas Leroux, RPN-E/MRD/ECCC, 2025  
Mike Brady, CPS/CRD/ECCC, 2025

This notebook takes the full time series of multi-layered snowpacks simulated by the Soil Vegetation Snow version 2 land surface model (SVS-2; [Woolley et al. (2024)](https://doi.org/10.5194/tc-18-5685-2024); [Vionnet et al. (2022)](https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021WR031778); [Vionnet et al. (Preprint)](https://doi.org/10.5194/egusphere-2025-3396)) and subsets it to the time period relevant to this study (1 December 2018 to 31 January 2019, as set in the helper function below). A second subset, containing only the top 30 ensemble members identified by [Woolley et al. (2024)](https://doi.org/10.5194/tc-18-5685-2024), is also created.  
  
The dataset used directly with this code can be found here: [TVC SVS-2 (Montpetit et al., Preprint)](ZenodoLink), to avoid duplicating large datasets on Zenodo.  
A different version of the same dataset, originally published by [Woolley et al. (2024)](https://doi.org/10.5194/tc-18-5685-2024), can be found here: [TVC SVS-2 (Woolley et al., Preprint)](link)

In [None]:
from pathlib import Path
import xarray as xr
import pandas as pd

In [None]:
def load_data_subset_time(svs2_netcdf):
    """helper function to load only a specific time range for a given SVS-2 netcdf"""
    ds = xr.open_dataset(svs2_netcdf)
    return ds.sel(time=slice('2018-12-01', '2019-01-31'))

In [2]:
DATA_ROOT = Path('../Data')

Arctic SVS-2 Data from Zenodo: https://doi.org/10.5281/zenodo.15690838

In [None]:
filepaths = sorted((DATA_ROOT / 'SVS-2' / 'Arctic').rglob('*.nc'))
assert len(filepaths) > 0

In [None]:
# load all arctic temporal subsets and write to netcdf
arctic = xr.concat(
    [
        load_data_subset_time(filepath)
        for filepath in filepaths
    ],
    dim='ensemble'
)
arctic.to_netcdf(DATA_ROOT / 'SVS-2_ArcticEnsembles_TVC02.nc')
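
Concatenating along `dim='ensemble'` with a plain string creates the new dimension without coordinate labels, so members can only be addressed by position. The sketch below is one possible way to keep track of which member is which, by passing a named `pandas.Index` as `dim`. It uses tiny synthetic datasets; the variable name `snow_depth` and the labels `member_a`/`member_b` are hypothetical stand-ins, not names taken from the SVS-2 files.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two tiny synthetic "ensemble members" standing in for SVS-2 files (hypothetical data).
time = pd.date_range('2018-11-29', periods=5, freq='D')
members = {
    'member_a': xr.Dataset({'snow_depth': ('time', np.arange(5.0))}, coords={'time': time}),
    'member_b': xr.Dataset({'snow_depth': ('time', np.arange(5.0) * 2)}, coords={'time': time}),
}

# Label-based slicing on a datetime coordinate includes both end points.
subsets = [ds.sel(time=slice('2018-12-01', '2018-12-02')) for ds in members.values()]

# A named pandas Index used as `dim` names the new dimension and labels each member.
labels = pd.Index(list(members), name='ensemble')
combined = xr.concat(subsets, dim=labels)

# Members can now be selected by label instead of by position.
member_b = combined['snow_depth'].sel(ensemble='member_b')
```

In this notebook the labels could, for instance, be derived from each file's `Path.stem`, assuming the file names are unique across members.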

### The Excel spreadsheet below corresponds to Table 2D of [Woolley et al. (2024)](https://doi.org/10.5194/tc-18-5685-2024)

In [None]:
ensembles = pd.read_excel(DATA_ROOT / 'Top30Ensembles_Arctic.xlsx')

In [None]:
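# select the top 30 Arctic members listed in the spreadsheet and write to netcdf
option_columns = ['SD', 'FS', 'TC', 'LWC', 'C', 'TF']
top_members = []
for _, row in ensembles.iterrows():
    model_options = [str(option) for option in row[option_columns]]
    # filepaths holds Path objects, so match the option codes against their string form
    filepath = [s for s in filepaths if all(xs in str(s) for xs in model_options)][0]
    top_members.append(load_data_subset_time(filepath))
arctic_top = xr.concat(top_members, dim='ensemble')
arctic_top.to_netcdf(DATA_ROOT / 'SVS-2_ArcticTop30Ensembles_TVC02.nc')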
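
The selection loop above keeps the first file whose path contains every option code, which silently hides both missing and ambiguous matches. As a hedged sketch, the hypothetical helper `find_member_file` below does the same substring matching but fails loudly unless exactly one file matches. The file names and option codes in the example are invented for illustration; the real SVS-2 naming scheme may differ.

```python
from pathlib import PurePosixPath


def find_member_file(filepaths, model_options):
    """Return the single path whose string form contains every option code."""
    codes = [str(code) for code in model_options]
    matches = [p for p in filepaths if all(code in str(p) for code in codes)]
    if len(matches) != 1:
        raise LookupError(
            f'expected exactly 1 file for {codes}, found {len(matches)}'
        )
    return matches[0]


# Hypothetical file names, used only to exercise the helper.
example_paths = [
    PurePosixPath('Arctic/run_S02_F06_T07.nc'),
    PurePosixPath('Arctic/run_S02_F06_T17.nc'),
]

# All three codes appear only in the second path.
selected = find_member_file(example_paths, ['S02', 'F06', 'T17'])
```

Because this is plain substring matching, a short code that is a prefix of a longer one would match both, and the helper would then raise rather than pick one arbitrarily.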

Default SVS-2 Data from Zenodo: https://doi.org/10.5281/zenodo.15690838

In [None]:
# the Default NetCDFs have a slightly different subdirectory structure than the Arctic NetCDFs so we 
# check for file vs directory when globbing
filepaths = sorted([
    nc for nc in (DATA_ROOT / 'SVS-2' / 'Default').rglob('*.nc')
    if nc.is_file()
])
assert len(filepaths) > 0

In [None]:
# load all default temporal subsets and write to netcdf
default = xr.concat(
    [
        load_data_subset_time(filepath)
        for filepath in filepaths
    ],
    dim='ensemble'
)
default.to_netcdf(DATA_ROOT / 'SVS-2_DefaultEnsembles_TVC02.nc')

### The Excel spreadsheet below corresponds to Table 1D of [Woolley et al. (2024)](https://doi.org/10.5194/tc-18-5685-2024)

In [None]:
ensembles = pd.read_excel(DATA_ROOT / 'Top30Ensembles_Default.xlsx')

In [None]:
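# select the top 30 Default members listed in the spreadsheet and write to netcdf
option_columns = ['SD', 'FS', 'TC', 'LWC', 'C', 'TF']
top_members = []
for _, row in ensembles.iterrows():
    model_options = [str(option) for option in row[option_columns]]
    # filepaths holds Path objects, so match the option codes against their string form
    filepath = [s for s in filepaths if all(xs in str(s) for xs in model_options)][0]
    top_members.append(load_data_subset_time(filepath))
default_top = xr.concat(top_members, dim='ensemble')
default_top.to_netcdf(DATA_ROOT / 'SVS-2_DefaultTop30Ensembles_TVC02.nc')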