# Step 2: depth-average TF
This notebook tests an efficient implementation of *Step 2* of Verjans's workflow: depth-averaging the ocean thermal forcing (TF) we produced in Step 1.

The full workflow is outlined in Vincent's Readme1.txt in [this Zenodo archive](https://zenodo.org/records/7931326).  We are modifying the workflow to deploy it efficiently for ISMIP7.

14 Nov 2024 | EHU

Edits:
- Applied xarray `sel` and `where` to streamline this computation. Removed unused read-in commands.
- 26 Mar 25: adjust metadata that will be written from CCR run

### Imports and run settings

In [None]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Depth-averaging TF in given DepthRange at all grid points of given Model
Choose scenario of interest

@author: vincent
"""

import os
import sys
import copy
import csv
import numpy as np
import netCDF4 as nc
import xarray as xr
import dask
from datetime import datetime

from verjansFunctions import freezingPoint
from verjansFunctions import calcDpAveraged

In [None]:
## Settings for this run
savingTF         = True
cwd              = os.getcwd()+'/'

SelModel         = 'CESM2'
DepthRange       = [0,500] #depth range of interest, [200:500] is from Slater et al. (2019, 2020)
ShallowThreshold = 100 #bathymetry threshold: if bathymetry is shallower, gridpoint is discarded

DirModNC   = '/home/theghub/ehultee/data/'
DirSave    = '/home/theghub/ehultee/data'


### EHU: I suspect we don't need any of the below from VV, but note we'll need to think through 
###     settings for looping over multiple GCMs in the production run.
To2015hist                 = False
To2100histssp585           = False
To2100histssp126           = True

# if(To2015hist):
#     partname = 'hist'
# elif(To2100histssp585):
#     partname = 'hist2100ssp585'
# elif(To2100histssp126):
#     partname = 'hist2100ssp126'
    
# if(SelModel=='MIROCES2L'):
#     dim2d              = True
#     if(To2015hist):
#         ls_members     = [f'r{id}' for id in range(1,30+1)]
#     elif(To2100histssp585 or To2100histssp126):
#         ls_members     = [f'r{id}' for id in range(1,10+1)]
# elif(SelModel=='IPSLCM6A'):
#     dim2d              = True
#     if(To2015hist):
#         ls_members     = [f'r{id}' for id in range(1,32+1)]
#         ls_members.remove('r2') #no r2 member for IPSLCM6A
#     elif(To2100histssp585 or To2100histssp126):
#         ls_members     = ['r1','r3','r4','r6','r14']

### Read in dataset

In [None]:
path_o = DirModNC + 'tf-CESM2-200001-201412-v4_no_intermed_compute.nc'

ds = xr.open_dataset(path_o)
ds

### Average over a depth slice

What we want to do is simply compute the average TF over the depth range defined in `DepthRange` above, with the condition that bathymetry must be deeper than `ShallowThreshold`. Vincent does this by reading various NC variables into empty arrays, then applying if-else tests to find the range over which to average. We should be able to do the same on dask arrays using xarray's `sel` method.

NOTE: I can easily select the relevant depth range, but I think Vincent's `ShallowThreshold` approach assumes that the depth variable is only defined up to the maximum depth of the grid cell (the bathymetry).  I believe the xarray way to do this is to find cells where the TF is NaN for levels >100 m.  Should double-check.
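
One way to double-check the NaN assumption in the note above is to find, for each column, the deepest level that still holds a valid TF value. The sketch below uses a small synthetic dataset (the `toy` values and the `cell` dimension are made up for illustration); it assumes `lev` is positive downward and that TF is NaN below the sea floor, which is exactly the thing to confirm on the real data.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the TF dataset: 4 depth levels, 3 grid cells.
# NaN marks levels below the sea floor (assumed convention; confirm on real data).
lev = np.array([5.0, 50.0, 125.0, 300.0])
tf = np.array([
    [1.0, 2.0, 3.0],        # 5 m: all cells have data
    [1.5, 2.5, np.nan],     # 50 m: cell 2 has no data
    [2.0, np.nan, np.nan],  # 125 m
    [2.5, np.nan, np.nan],  # 300 m
])
toy = xr.Dataset({"TF": (("lev", "cell"), tf)},
                 coords={"lev": lev, "cell": [0, 1, 2]})

valid = toy.TF.notnull()

# Deepest level with valid TF in each column: a proxy for local bathymetry.
deepest_valid = toy.lev.where(valid).max(dim="lev")

# A column has a "gap" if TF goes from NaN back to valid with increasing depth.
# Gaps would mean NaN is not a clean marker of the sea floor.
has_gap = (valid.astype(int).diff("lev") > 0).any(dim="lev")

print(deepest_valid.values)
print(has_gap.values)
```

If `has_gap` is False everywhere on the real dataset, testing for NaN at a single level near `ShallowThreshold` should be a reasonable stand-in for a bathymetry test.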

In [None]:
depth_slice = ds.sel(lev=slice(DepthRange[0], DepthRange[1]))
depth_slice.mean(dim='lev', skipna=True)

### Apply depth condition

In [None]:
## First set up what to test: TF at a depth level close to ShallowThreshold.
# no_shallow = ds.TF.sel(lev=125.0) ## option 1: the depth level just below ShallowThreshold
no_shallow = ds.TF.sel(lev=ShallowThreshold, method='nearest') ## option 2
## Note for other users: option 1 is hard-coded to work with a ShallowThreshold
## of 100 m (it selects the nearest value of `lev` known in the dataset).  Option 2
## should work more flexibly.  If it doesn't work on your first try, comment out
## option 2 and try option 1.

## Now select TF in the whole dataset wherever it is *not NaN* at the test level.
## `notnull()` flags the columns that still have valid TF at that depth.
deep_only = ds.TF.where(no_shallow.notnull())
deep_only.max() ## reality check: is this a float and not a nan? is it a reasonable value?

For me the value output here is about 19. Seems very high, but this is the max.  At least it's a float!
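
One caveat on the depth condition above: `method='nearest'` can pick a level *shallower* than `ShallowThreshold` if that level happens to be closer. The sketch below shows the difference on made-up levels (the `toy` dataset and its values are hypothetical). It assumes `lev` is in meters, positive downward, and sorted in increasing order; under those assumptions `method='backfill'` selects the first level at or below the threshold instead.

```python
import numpy as np
import xarray as xr

ShallowThreshold = 100  # m, same value as in the run settings

# Hypothetical levels chosen so the level nearest to 100 m is shallower than 100 m.
lev = np.array([5.0, 95.0, 125.0, 300.0])
tf = np.array([
    [1.0, 2.0],
    [1.5, 2.5],     # 95 m: both cells still have data
    [2.0, np.nan],  # 125 m: cell 1 has no data
    [2.5, np.nan],
])
toy = xr.Dataset({"TF": (("lev", "cell"), tf)},
                 coords={"lev": lev, "cell": [0, 1]})

# 'nearest' picks whichever level is closest, shallower or deeper.
nearest = toy.TF.sel(lev=ShallowThreshold, method="nearest")
# 'backfill' picks the next level at or beyond the requested value.
at_or_below = toy.TF.sel(lev=ShallowThreshold, method="backfill")

mask_nearest = toy.TF.where(nearest.notnull())
mask_backfill = toy.TF.where(at_or_below.notnull())

print(float(nearest.lev), float(at_or_below.lev))
print(mask_nearest.sel(cell=1).values)
print(mask_backfill.sel(cell=1).values)
```

In this toy case the two choices disagree about cell 1, so it is worth printing `no_shallow.lev` on the real dataset to see which level was actually tested.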

In [None]:
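## Swap the masked TF back into the dataset so cells shallower than ShallowThreshold stay NaN
deep_sliced = ds.assign(TF=deep_only).sel(lev=slice(DepthRange[0], DepthRange[1]))
dsm = deep_sliced.mean(dim='lev', skipna=True)
dsm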

In [None]:
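## NOTE: limS, limN, limW, limE (the bounding-box limits) are not defined in this
## notebook; set them before running this cell.
## assign_attrs returns a new object, so keep the result.
dsm = dsm.assign_attrs(standard_name='TF', ## 26 Mar: leaving this intact for use with Step 3 renaming
                       long_name='Ocean thermal forcing (depth-avg 0 to 500 m)',
                       # fillvalue=1.1e20,
                       latbounds=[limS, limN],
                       lonbounds=[limW, limE])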

As I understand it, the above is the data Vincent wants: depth-averaged over `DepthRange`, trimmed to include only cells with data deeper than `ShallowThreshold`.  Write out to a NetCDF.

### Write NetCDF out

In [None]:
ds_out = dsm.assign_attrs(title='Depth-avg TF for {}'.format(SelModel),
                          summary='TF computed following Verjans code, in a bounding' +
                                  ' box around Greenland, for ISMIP7 Greenland forcing',
                          institution='NASA Goddard Space Flight Center',
                          creation_date=datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

In [None]:
from datetime import date
out_fn = DirSave + '/tfdpavg-{}-{}.nc'.format(SelModel, date.today())

from dask.diagnostics import ProgressBar

with ProgressBar():
    ds_out.to_netcdf(path=out_fn)  ## write the version that carries the title/summary metadata

In [None]:
## test read-in

ds2 = xr.open_dataset(out_fn)
ds2

This shows that we have successfully computed and written out depth-averaged TF -- note that the `lev` coordinate present in the original TF dataset has now disappeared, because that is the dimension over which we averaged.  

Check that the dataset read in has a variable named 'TF' and not some general `_xarrayvariable_` type name.
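
The name check can be scripted rather than done by eye. This is a minimal sketch on in-memory objects; `has_named_variable` is a hypothetical helper, not part of Verjans's code. The background is that an unnamed `DataArray` has to be given some variable name before it can live in a `Dataset` or a file, so an unexpected placeholder name usually means the array lost its name somewhere upstream.

```python
import numpy as np
import xarray as xr

def has_named_variable(ds, name="TF"):
    """Return True if the Dataset holds a data variable called `name`."""
    return name in ds.data_vars

# An unnamed array, and the same array explicitly named 'TF'.
unnamed = xr.DataArray(np.zeros((2, 2)), dims=("y", "x"))
named = unnamed.rename("TF")

good = named.to_dataset()                    # variable name comes from the array
bad = unnamed.to_dataset(name="placeholder") # a name has to be supplied by hand

print(has_named_variable(good))
print(has_named_variable(bad))
```

On the real output, the equivalent check would be `has_named_variable(ds2)` right after the test read-in.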