# Part 1: Take the full land surface model dataset generated for Trail Valley Creek and create the subsets relevant to this study

Benoit Montpetit, CPS/CRD/ECCC, 2025  
Nicolas Leroux, RPN-E/MRD/ECCC, 2025  
Mike Brady, CPS/CRD/ECCC, 2025

This notebook takes the full time series of multi-layered snowpack simulations from the land surface model Soil Vegetation Snow version 2 (SVS-2; [Woolley et al. (Preprint)](https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1237/); [Vionnet et al. (2022)](https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021WR031778); [Vionnet et al., In Prep]()) and subsets it to the time period relevant to this study (2018-12-01 to 2019-01-31). A second subset, containing only the top 30 ensemble members identified by [Woolley et al. (Preprint)](https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1237/), is also created.  
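
The code cells of this notebook select each top-30 member by checking that every option label of that member appears somewhere in a file path. The sketch below illustrates that matching on made-up inputs. The file names, the labels (`SD1`, `FS1`, `TC1`, `TC12`) and the helper names are hypothetical, not the real SVS-2 naming scheme. It also shows a stricter whole-token variant, which may be worth considering if one label can be contained in another.

```python
from pathlib import PurePosixPath

# Hypothetical file paths; the real SVS-2 file names may be structured differently.
paths = [
    "Data/SVS-2/Arctic/SD1_FS1_TC1.nc",
    "Data/SVS-2/Arctic/SD1_FS1_TC12.nc",
    "Data/SVS-2/Arctic/SD2_FS1_TC1.nc",
]
# Hypothetical option labels for one ranked ensemble member.
options = ["SD1", "FS1", "TC1"]


def substring_matches(filepaths, labels):
    """Paths in which every label occurs anywhere (same test as the notebook)."""
    return [p for p in filepaths if all(str(lab) in p for lab in labels)]


def token_matches(filepaths, labels):
    """Paths whose file stem, split on '_', contains every label as a whole token."""
    matched = []
    for p in filepaths:
        tokens = set(PurePosixPath(p).stem.split("_"))
        if all(str(lab) in tokens for lab in labels):
            matched.append(p)
    return matched


loose = substring_matches(paths, options)
strict = token_matches(paths, options)
print(loose)
print(strict)
```

With these made-up names, the substring test also accepts the `TC12` file because `TC1` is contained in `TC12`, while the token test keeps only the exact match. Whether this matters depends on the actual labels used in the ensemble file names.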
  
The dataset used directly by this code can be found here: [TVC SVS-2 (Montpetit et al., Preprint)](ZenodoLink)  
A different version of the same dataset, originally published by [Woolley et al. (Preprint)](https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1237/), can be found here: [TVC SVS-2 (Woolley et al., Preprint)](link)
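
As a rough illustration of the subsetting done in the cells below, the following sketch builds a few synthetic daily series, keeps only the study period, and stacks them along a new `ensemble` dimension. The variable name `snow_depth`, the helper `make_member` and the random values are assumptions made for the example; the real SVS-2 files contain different variables.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_member(seed):
    """Synthetic stand-in for one SVS-2 output file (hypothetical content)."""
    time = pd.date_range("2018-11-01", "2019-02-28", freq="D")
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        {"snow_depth": ("time", rng.random(time.size))},
        coords={"time": time},
    )


# Label-based slices in xarray include both end points.
study_period = slice("2018-12-01", "2019-01-31")

# Subset each member in time, then concatenate once along a new dimension.
members = [make_member(seed).sel(time=study_period) for seed in range(3)]
stacked = xr.concat(members, dim="ensemble")

print(dict(stacked.sizes))
```

Collecting the subsets in a list and calling `xr.concat` once avoids repeatedly copying a growing dataset inside the loop.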

In [None]:
import xarray as xr
import pandas as pd
import os

In [None]:
filepaths = []
# Recursively collect every NetCDF file of the Arctic SVS-2 ensemble
for root, dirs, fs in os.walk('Data/SVS-2/Arctic'):
    for f in fs:
        if f.endswith('.nc'):
            filepaths.append(os.path.join(root, f))

In [None]:
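arctic_members = []  # one time-subset Dataset per ensemble member

In [None]:
for filepath in filepaths:
    svs = xr.open_dataset(filepath)
    # Keep only the study period (both slice end points are inclusive)
    arctic_members.append(svs.sel(time=slice('2018-12-01','2019-01-31')))

# Concatenate once along a new 'ensemble' dimension instead of growing a Dataset in the loop
arctic = xr.concat(arctic_members, dim='ensemble')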

In [None]:
arctic.to_netcdf('Data/SVS-2_ArcticEnsembles_TVC02.nc')

In [None]:
ensembles = pd.read_excel('Data/Top30Ensembles_Arctic.xlsx')

In [None]:
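arctic_top_members = []  # one time-subset Dataset per top-30 member

for i in range(len(ensembles)):
    # Option labels of the i-th ranked ensemble member
    model_options = list(ensembles.loc[i,['SD','FS','TC','LWC','C','TF']].values)
    # First file whose path contains every option label
    filepath = [s for s in filepaths if all(xs in s for xs in model_options)][0]
    svs = xr.open_dataset(filepath)
    arctic_top_members.append(svs.sel(time=slice('2018-12-01','2019-01-31')))

arctic_top = xr.concat(arctic_top_members, dim='ensemble')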

In [None]:
arctic_top.to_netcdf('Data/SVS-2_ArcticTop30Ensembles_TVC02.nc')

In [None]:
filepaths = []
# Recursively collect every NetCDF file of the Default SVS-2 ensemble
for root, dirs, fs in os.walk('Data/SVS-2/Default'):
    for f in fs:
        if f.endswith('.nc'):
            filepaths.append(os.path.join(root, f))

In [None]:
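default_members = []  # one time-subset Dataset per ensemble member

In [None]:
for filepath in filepaths:
    svs = xr.open_dataset(filepath)
    # Keep only the study period (both slice end points are inclusive)
    default_members.append(svs.sel(time=slice('2018-12-01','2019-01-31')))

# Concatenate once along a new 'ensemble' dimension instead of growing a Dataset in the loop
default = xr.concat(default_members, dim='ensemble')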

In [None]:
default.to_netcdf('Data/SVS-2_DefaultEnsembles_TVC02.nc')

In [None]:
default_top_members = []  # one time-subset Dataset per top-30 member

In [None]:
ensembles = pd.read_excel('Data/Top30Ensembles_Default.xlsx')

In [None]:
for i in range(len(ensembles)):
    # Option labels of the i-th ranked ensemble member
    model_options = list(ensembles.loc[i,['SD','FS','TC','LWC','C','TF']].values)
    # First file whose path contains every option label
    filepath = [s for s in filepaths if all(xs in s for xs in model_options)][0]
    svs = xr.open_dataset(filepath)
    default_top_members.append(svs.sel(time=slice('2018-12-01','2019-01-31')))

default_top = xr.concat(default_top_members, dim='ensemble')

In [None]:
default_top.to_netcdf('Data/SVS-2_DefaultTop30Ensembles_TVC02.nc')