# Ocean thermal forcing -- Verjans refactored with xarray
Clean ocean TF workflow for deployment on CCR.

10 Oct 2024 | EHU
- 15 Oct: try with CESM rather than IPSL, for now.  The [IPSL tripolar grid](https://cmc.ipsl.fr/international-projects/cmip5/ipsl-contribution-to-cmip5-faq/) is a complication to deal with in the next revision.
- 16 Oct: CESM2 succeeds (in the 2000-2014 and 1950-1999 examples)! Next, implement a for-loop.
- 18 Oct: Metadata is not being written to the output NetCDF on GHub. This could be a problem with the xarray version, but `xr.show_versions()` crashes the kernel. For now, we accept the metadata problem and attempt a full processing run of the CESM files.
- 23 Jun 25: Test-process CESM-WACCM for the test protocol.
- 7 Jul 25: Reprocess, correcting for depth expressed in cm rather than m in CESM-WACCM.

### Imports and settings

In [None]:
import os
import sys
import glob
import copy
import csv
import numpy as np
import netCDF4 as nc
import xarray as xr
import dask
from datetime import datetime, date

from verjansFunctions import freezingPoint

You may want to change the settings below, or modify them to work on CCR.  `DirThetaoNC` is where the `thetao` files are stored; `DirSoNC` is where `so` files are stored (could be the same directory!); and `out_path` is where to write the output NetCDF of this script.

In [None]:
### Settings for this run
saveBoxGreenlandNC = True
cwd                = os.getcwd()+'/'

SelModel = 'CESM2-WACCM'

# DirThetaoNC = f'/home/theghub/ehultee/projects/cmipdata/files/'
DirThetaoNC = f'/Users/eultee/Library/CloudStorage/OneDrive-NASA/Data/ISMIP/Summer25Test/'
# DirSoNC     = f'/home/theghub/ehultee/projects/cmipdata/files/'
DirSoNC = DirThetaoNC
out_path = '/Users/eultee/Library/CloudStorage/OneDrive-NASA/Data/gris-iceocean-outfiles/Summer25Test/'


### Select experiment ###
To2015hist                 = True
To2100histssp585           = False
To2100histssp126           = False

# date_tags_to_run = ['1850', '1900', '1950', '2000'] ## stored separately in original method?
date_tags_to_run = ['1850',] ## 

## Verjans stuff we shouldn't need
## Could reconfigure to use these in labelling output files, if you're clever
# if(To2015hist):
#     Experiments = ['historical']
#     DatesCut    = [2015]
# elif(To2100histssp585): 
#     Experiments = ['historical','ssp585']
#     DatesCut    = [2015,2100]
# elif(To2100histssp126): 
#     Experiments = ['historical','ssp126']
#     DatesCut    = [2015,2100]
# nExp          = len(Experiments)
# depthUnitConv = 1.0 #initialize depth unit converter

### Limits of Greenland domain ###
limN           = 86.0 ## degrees N latitude
limS           = 57.0 ## degrees N latitude
limE           = 4.0 ## degrees E longitude
limW           = 274.0 ## degrees E longitude
## CHECK: confirm that output shows up within this W-E box and not its E-W complement
limDp          = 1200.0
depthSubSample = 1
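
The CHECK note on the longitude limits can be tested on plain numbers before touching any model output. The sketch below mirrors the mask used later in the processing loop (`lon % 360 <= limE` or `lon % 360 >= limW`). The helper name `in_lon_box` is hypothetical and is not part of the original workflow.

```python
def in_lon_box(lon, lim_w=274.0, lim_e=4.0):
    """Return True if `lon` (degrees E, given as -180..180 or 0..360) lies in
    the box that runs eastward from lim_w, across the prime meridian, to lim_e."""
    lon360 = lon % 360  # map -180..180 onto 0..360
    return lon360 <= lim_e or lon360 >= lim_w


# Longitudes we expect inside the Greenland box, in both conventions
inside = [-45.0, 315.0, 3.0, -86.0]
# Longitudes we expect in the E-W complement (outside the box)
outside = [10.0, 180.0, -90.0]

print([in_lon_box(lon) for lon in inside])
print([in_lon_box(lon) for lon in outside])
```

If any of the `inside` values came back `False`, the mask would be selecting the complement of the intended box.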


In [None]:
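### Experiments to process, derived from the flags set above.
### Without this, `Experiments` is undefined and the cells below raise a NameError.
if To2015hist:
    Experiments = ['historical']
elif To2100histssp585:
    Experiments = ['historical', 'ssp585']
elif To2100histssp126:
    Experiments = ['historical', 'ssp126']
else:
    raise ValueError('Select one experiment flag in the settings cell')
nExp          = len(Experiments)
depthUnitConv = 1.0 #initialize depth unit converter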

if(SelModel=='MIROCES2L'):
    dim2d              = True
    if(To2015hist):
        ls_members     = [f'r{id}' for id in range(1,30+1)]
    elif(To2100histssp585 or To2100histssp126):
        ls_members     = [f'r{id}' for id in range(1,10+1)]
    namelat            = 'latitude'
    namelon            = 'longitude'
    namez              = 'lev'
    datesendhist       = np.array(['201412'])
    if(To2100histssp585):
        datesendssp585     = np.array(['210012'])
    if(To2100histssp126):
        datesendssp126     = np.array(['210012'])
        
elif(SelModel=='IPSL-CM6A-LR'): ## elif, so the error below does not print for MIROCES2L
    dim2d              = True
    if(To2015hist):
        ls_members     = [f'r{id}' for id in range(1,32+1)]
        ls_members.remove('r2') #no r2 member for IPSLCM6A
    elif(To2100histssp585 or To2100histssp126):
        ls_members     = ['r1'] #,'r3','r4','r6','r14']
    namelat            = 'nav_lat'
    namelon            = 'nav_lon'
    namez              = 'olevel'
    datesRef           = [1850.0,2015.0,2040.0] 
    datesendhist       = np.array(['194912','201412'])
    if(To2100histssp585):
        datesendssp585     = np.array(['210012'])
    if(To2100histssp126):
        datesendssp126     = np.array(['210012'])

else:
    print(f'Error script not implemented yet for {SelModel}')

# nMemb           = len(ls_members)

The above is legacy code from V. Verjans. We may eventually want to update some of these hard-coded variable names for other GCMs. For now, with `SelModel = 'CESM2-WACCM'`, the cell prints 'Error script not implemented yet'. That message can be ignored, because the processing loop below does not use the variables set in this cell.
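
One possible way to retire the hard-coded branches is a lookup table keyed by model name. This is only a sketch: `MODEL_SETTINGS` and `get_model_settings` are hypothetical names. The coordinate names for MIROCES2L and IPSL-CM6A-LR are copied from the legacy cell, and the CESM2-WACCM entry reflects what the processing loop below uses (`lat`, `lon`, `lev`, with depth scaled by 0.01). The factor 1.0 for the other two models is an assumption to be verified against the files.

```python
# Hypothetical per-model settings table (a sketch, not the original workflow).
MODEL_SETTINGS = {
    'MIROCES2L': {
        'namelat': 'latitude', 'namelon': 'longitude', 'namez': 'lev',
        'depth_to_m': 1.0,   # ASSUMED metres; check the file's units
    },
    'IPSL-CM6A-LR': {
        'namelat': 'nav_lat', 'namelon': 'nav_lon', 'namez': 'olevel',
        'depth_to_m': 1.0,   # ASSUMED metres; check the file's units
    },
    'CESM2-WACCM': {
        'namelat': 'lat', 'namelon': 'lon', 'namez': 'lev',
        'depth_to_m': 0.01,  # depth is stored in cm (see the 7 Jul 25 note)
    },
}


def get_model_settings(model):
    """Return the settings dict for `model`, or fail loudly if it is unknown."""
    try:
        return MODEL_SETTINGS[model]
    except KeyError:
        raise NotImplementedError(f'Script not implemented yet for {model}')


settings = get_model_settings('CESM2-WACCM')
print(settings['namez'], settings['depth_to_m'])
```

Raising an exception instead of printing makes an unsupported model stop the run, rather than continuing with undefined variables.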

### List the files to be read 

In [None]:
ThetaFiles_test = []
SoFiles_test = []
for expt in Experiments:
    fpath1 = DirThetaoNC+'thetao_Omon_{}_{}_'.format(SelModel, expt)
    print(fpath1)
    fpath2 = DirSoNC+'so_Omon_{}_{}_'.format(SelModel, expt)
    th_temp = glob.glob(f'{fpath1}*.nc')
    s_temp = glob.glob(f'{fpath2}*.nc')
    ThetaFiles_test += th_temp ##concat the glob lists
    SoFiles_test += s_temp

Confirm that the list is not empty.  If it is, something has gone wrong in the directory access or in the generation of names.

In [None]:
ThetaFiles_test
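
The visual check above can also be made automatic. The sketch below raises an error if either list is empty or if a date tag has no matching `thetao`/`so` pair. The function name `pair_input_files` is hypothetical. It matches the tag against the file name only, so a year that happens to appear in a directory name cannot produce a false match.

```python
import os


def pair_input_files(theta_files, so_files, date_tags):
    """Return {tag: (thetao_path, so_path)}; raise if anything is missing."""
    if not theta_files or not so_files:
        raise FileNotFoundError(
            'No thetao or so files found; check the directories and name pattern.')
    pairs = {}
    for tag in date_tags:
        # match on the file name only, not on the directory part of the path
        th = [f for f in theta_files if tag in os.path.basename(f)]
        so = [f for f in so_files if tag in os.path.basename(f)]
        if not th or not so:
            raise FileNotFoundError(f'No thetao/so pair found for date tag {tag!r}')
        pairs[tag] = (th[0], so[0])
    return pairs


# Usage in this notebook would look like:
# pairs = pair_input_files(ThetaFiles_test, SoFiles_test, date_tags_to_run)
```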

### Load, process, write out

This is the main processing step.  The steps were split over several cells in my original script, but I am combining them here to allow a for-loop.  This might run into a Jupyter memory cap.  Let me know if you have problems and we can iterate on the best way to run this on your system.

Specify the paths of the `thetao` and `so` variables -- ensure they come from the same GCM (`SelModel`) and time period.  Use a `with` statement to read in, trim, and close the parent datasets.  This should leave us with the trimmed datasets `gld_ds` and `gld_so` to work with below.

Compute the ocean TF, mask grounded areas, and assign the result to an xarray Dataset.  Set metadata.

Write the dataset out to a NetCDF file.  You may wish to change the `out_path` above, according to what you can access on CCR.
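
To make the TF step concrete, here is a toy version on single numbers. The freezing point function is a placeholder: the linear form and its coefficients are assumptions for illustration only and are not taken from `verjansFunctions.freezingPoint`, which may differ. The masking mirrors the `where(thetao < 1e10, 1.1e20)` call in the loop below.

```python
FILL_VALUE = 1.1e20  # same fill value as in the processing loop


def freezing_point_linear(so, depth_m, lam1=-0.0575, lam2=0.0832, lam3=7.59e-4):
    """PLACEHOLDER freezing point in deg C (assumed linear form and coefficients).

    so      : salinity
    depth_m : depth in metres, positive downward
    """
    return lam1 * so + lam2 - lam3 * depth_m


def thermal_forcing(thetao, so, depth_m):
    """Thermal forcing = temperature minus freezing point, with masked cells filled."""
    if not thetao < 1e10:  # masked (huge or NaN) input gets the fill value
        return FILL_VALUE
    return thetao - freezing_point_linear(so, depth_m)


print(thermal_forcing(2.0, 34.0, 200.0))   # a valid ocean cell
print(thermal_forcing(1e20, 34.0, 200.0))  # a masked cell
```

In the notebook itself the same arithmetic is applied to whole arrays through `xr.apply_ufunc`, so the work stays lazy until the output file is written.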


In [None]:
for yt in date_tags_to_run:
    path0 = [f for f in ThetaFiles_test if yt in f][0] ## thetao
    path1 = [f for f in SoFiles_test if yt in f][0] ## salinity
    ## we have to do a silly list comprehension with year tags, because the 
    ## glob lists are not stored in the same order
    
    ### Progress report
    print('Running time period of {} on files\n thetao: {} \n salinity: {}\n'.format(yt, path0, path1))

    ###----------------
    ### Load files
    ###----------------
    
    ## load in and trim thetao
    with xr.open_dataset(path0, chunks={'lev':10}) as ds:
        ## trim to Greenland bounding box
        include_lat = (ds.lat>=limS) & (ds.lat <=limN)
        include_lon = np.logical_or(((ds.lon%360)<=limE),((ds.lon %360) >=limW)) 
        ## modulo 360 to account for lon going -180 to 180 or 0-360

        with dask.config.set(**{'array.slicing.split_large_chunks': True}): 
            ## mitigate performance problem with slicing
            gld_ds = ds.where((include_lat & include_lon).compute(), drop=True)

    ## load and trim so
    with xr.open_dataset(path1, chunks={'lev':10}) as ds1:
        ## trim to Greenland bounding box
        include_lat = (ds1.lat>=limS) & (ds1.lat <=limN)
        include_lon = np.logical_or(((ds1.lon%360)<=limE),((ds1.lon %360) >=limW))

        with dask.config.set(**{'array.slicing.split_large_chunks': True}): 
            ## mitigate performance problem with slicing
            gld_so = ds1.where((include_lat & include_lon).compute(), drop=True) ## trim to Gld
    
    ###----------------
    ### Compute TF
    ###----------------
    fp = xr.apply_ufunc(freezingPoint, gld_so.so, gld_so.lev*0.01, dask='parallelized',
                       dask_gufunc_kwargs={'allow_rechunk':True})
    fftf = gld_ds.thetao - fp

    ## mask and apply a fill value
    tf_out = fftf.where(gld_ds.thetao<1e10, 1.1e20) ## apply Vincent's fill value of 1.1e20

    tf_out = tf_out.assign_attrs(standard_name='TF',
                        long_name='Ocean thermal forcing',
                        fillvalue=1.1e20,
                        latbounds=[limS, limN],
                        lonbounds=[limW,limE])
    
    
    ###----------------
    ### Format dataset and metadata
    ###----------------
    now = datetime.now()
    ds_temp = tf_out.to_dataset(name='TF')
    ds_out = ds_temp.assign_attrs(title='Ocean thermal forcing for {}'.format(SelModel),
                                 summary='TF computed following Verjans code, in a bounding' + 
                                  ' box around Greenland, for ISMIP7 Greenland forcing',
                                 institution='NASA Goddard Space Flight Center',
                                 creation_date=now.strftime('%Y-%m-%d %H:%M:%S'))

    ###----------------
    ### Write NetCDF out
    ###----------------
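    ## take the year tag from the thetao input file name (only one of the two input datasets, but we have tried to make them match!)
    ## os.path.splitext removes the '.nc' suffix; str.strip('.nc') strips characters, not a suffix
    year_tag = os.path.splitext(os.path.basename(path0))[0].split('_')[-1]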
    out_fn = out_path + 'tf-{}-{}-{}.nc'.format(SelModel, year_tag, date.today())

    from dask.diagnostics import ProgressBar

    with ProgressBar():
        ds_out.to_netcdf(path=out_fn)

print('Success!')
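
The factor `0.01` applied to `lev` in the loop above is specific to models that store depth in cm. A hedged sketch of how that factor could be derived from a units string instead of being hard-coded follows. The helper name `depth_scale_to_m` and the list of accepted spellings are assumptions, and the approach only works if the depth coordinate in the input file carries a `units` attribute.

```python
def depth_scale_to_m(units):
    """Return the factor that converts a depth coordinate to metres.

    `units` is the units string of the depth coordinate. Unknown units raise
    an error rather than silently assuming metres.
    """
    known = {
        'm': 1.0, 'meter': 1.0, 'meters': 1.0, 'metre': 1.0, 'metres': 1.0,
        'cm': 0.01, 'centimeter': 0.01, 'centimeters': 0.01,
        'centimetre': 0.01, 'centimetres': 0.01,
    }
    key = str(units).strip().lower()
    if key not in known:
        raise ValueError(f'Unrecognised depth units: {units!r}')
    return known[key]


# Usage in the loop might look like this (assuming `lev` has a units attribute):
# scale = depth_scale_to_m(gld_so.lev.attrs.get('units', ''))
# fp = xr.apply_ufunc(freezingPoint, gld_so.so, gld_so.lev * scale, ...)
print(depth_scale_to_m('centimeters'), depth_scale_to_m('m'))
```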