<font size = 10> **Regridding velocity variables** </font>  
Meridional and zonal velocities have been identified as environmental variables that could influence crabeater seal distribution. These variables are available as ACCESS-OM2-01 outputs, but their grid differs slightly from the one used by all other model outputs. The resolution is the same, but the grid centres in the velocity outputs are offset by about $0.05^{\circ}$ in latitude and longitude.
  
Here, we will extract velocity values for the ocean surface and along the bottom of the water column. We will then regrid them to match the grid of all other outputs available in ACCESS-OM2-01.

# Setting working directory
To ensure these notebooks work correctly, we will set the working directory. We assume that you have saved a copy of this repository in your home directory (represented by `~` in the code chunk below). If you saved this repository elsewhere on your machine, update this line with the filepath where you saved these notebooks.

In [2]:
import os
os.chdir(os.path.expanduser('~/Chapter2_Crabeaters/Scripts'))

# Loading relevant libraries

In [3]:
import UsefulFunctions as uf
import xarray as xr
# import pandas as pd
import numpy as np
from glob import glob
import matplotlib.pyplot as plt
# import cartopy.crs as ccrs
from dask.distributed import Client
import intake
import xesmf as xe

# Creating dictionary of variables
The values in this dictionary will be used several times throughout this notebook.

In [4]:
varDict = {'model': 'ACCESS-OM2-01',
           #ACCESS-OM2-01 cycle 4 (1958-2018)
           'exp': '01deg_jra55v140_iaf_cycle4',
           #ACCESS-OM2-01 cycle 4 extension (2018-2022)
           'exp_ext': '01deg_jra55v140_iaf_cycle4_jra55v150_extension',
           #Temporal resolution
           'freq': '1 monthly',
           #Name of variable in ACCESS-OM2-01 model
           'var_mod': 'v',
           #Short name
           'short_var': 'vel_lat',
           #Output folder
           'base_folder_out': '/g/data/vf71/la6889/Chapter2_Crabeaters/Velocity_Variables/',
           #File name base
           'base_file_out': 'monthly_lat_'}

# Parallelise work

In [3]:
client = Client()

# Load velocity variable to be regridded

In [10]:
#Load the ACCESS-NRI intake catalog
catalog = intake.cat.access_nri

#Loading data from fourth cycle (1978 to 2020)
var_df = uf.getACCESSdata_SO(varDict['var_mod'], '1978-01', '2020-01', 
                             freq = varDict['freq'], catalog = catalog, 
                             minlat = -80,exp = varDict['exp'], ice_data = False)
#Loading data from fourth cycle extension (2019 to 2022)
var_df_ext = uf.getACCESSdata_SO(varDict['var_mod'], '2019-01', '2023-01', 
                                 freq = varDict['freq'], catalog = catalog, 
                                 minlat = -80, exp = varDict['exp_ext'], 
                                 ice_data = False)
#Concatenating both data arrays into one
var_df = xr.concat([var_df, var_df_ext], dim = 'time')

#Transforming longitudes so their range is +/-180 degrees
var_df = uf.corrlong(var_df)
#Removing duplicate variable
del var_df_ext
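
The `uf.corrlong` function lives in `UsefulFunctions` and is not shown in this notebook. As a rough illustration of the kind of operation it is described as performing (moving longitudes into the +/-180 degree range), here is a minimal NumPy sketch. The function `wrap_longitudes` and the sample values are hypothetical and are not taken from `UsefulFunctions`.

```python
import numpy as np

def wrap_longitudes(lon):
    """Hypothetical stand-in: map longitudes to [-180, 180) and return the sorting order."""
    wrapped = ((np.asarray(lon, dtype = float) + 180) % 360) - 180
    #Order needed so longitudes increase monotonically after wrapping
    order = np.argsort(wrapped)
    return wrapped[order], order

#Illustrative longitudes that fall outside the +/-180 range
lon = np.array([-279.95, -180.05, -0.05, 79.95])
data = np.array([1.0, 2.0, 3.0, 4.0])

new_lon, order = wrap_longitudes(lon)
#The data must be reordered in the same way as the coordinate
data_sorted = data[order]
print(new_lon, data_sorted)
```

The key point is that the data has to be reordered together with the coordinate, otherwise values end up attached to the wrong longitude.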

## Extracting surface data

In [7]:
#Select the surface layer and ensure only data between 1978 and 2022 is kept
var_df_s = var_df.isel(st_ocean = 0).sel(time = slice('1978-01', '2022-12'))
#Remove depth (st_ocean) from coordinates as it is no longer useful
var_df_s = var_df_s.squeeze().drop('st_ocean')
#Rechunk data to facilitate data manipulation
var_df_s = var_df_s.chunk({'time': 12, 'yu_ocean': 513, 'xu_ocean': 200}) 
#Checking results
var_df_s

| | Array | Chunk |
|---|---|---|
| Bytes | 5.17 GiB | 4.70 MiB |
| Shape | (540, 714, 3600) | (12, 513, 200) |
| Dask graph | 1620 chunks in 1091 graph layers | |
| Data type | float32 numpy.ndarray | |

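
The sizes reported for the surface array follow directly from its shape, the chunk sizes and the data type. The short calculation below is only a sanity check of those figures, assuming 4 bytes per `float32` value; it does not touch the data.

```python
import math

shape = (540, 714, 3600)    #(time, yu_ocean, xu_ocean)
chunks = (12, 513, 200)     #chunk sizes passed to .chunk()
itemsize = 4                #bytes per float32 value

#Size of a full chunk in MiB and of the whole array in GiB
chunk_mib = math.prod(chunks) * itemsize / 2**20
array_gib = math.prod(shape) * itemsize / 2**30
#Number of chunks along each dimension, multiplied together
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(f'{chunk_mib:.2f} MiB per chunk, {array_gib:.2f} GiB in total, {n_chunks} chunks')
#4.70 MiB per chunk, 5.17 GiB in total, 1620 chunks
```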


### Saving surface data
We will save files for each year as an intermediary output in case files are needed in the future.

In [40]:
#Ensure base folder exists, if not, create it
os.makedirs(varDict['base_folder_out'], exist_ok = True)

#Looping through each year and saving outputs as netcdf files
for yr, da in var_df_s.groupby('time.year'):
    file_out = os.path.join(varDict['base_folder_out'], varDict['base_file_out']+f'surf_{yr}.nc')
    da.to_netcdf(file_out)

'/g/data/v45/la6889/Chapter2_Crabeaters/Velocity_Variables/monthly_surf_lat_20.nc'

## Extracting bottom data
Bottom data needs more processing than surface data. In this case, we need to identify the deepest bin with data for each grid cell.
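
To make the idea concrete, here is a minimal sketch on a toy array, written with NumPy rather than xarray so it runs without the model data. The array `vel` and its values are made up for illustration. It counts valid cells from the surface down and keeps the value where the running count reaches its maximum.

```python
import numpy as np

#Toy velocity field with dimensions (depth, y, x)
#NaN marks cells below the sea floor or on land
vel = np.array([
    [[0.1, 0.2, np.nan]],     #shallowest level
    [[0.3, 0.4, np.nan]],
    [[0.5, np.nan, np.nan]],  #deepest level
])

#True where there is data
valid = ~np.isnan(vel)
#Running count of valid levels from the surface down
count = valid.cumsum(axis = 0)
#The deepest valid level is where the running count reaches its maximum
deepest = valid & (count == count.max(axis = 0))
#Keep only the value at the deepest valid level
bottom = np.where(deepest, vel, 0).sum(axis = 0)
#Columns without any data (e.g., land) are set to NaN rather than 0
bottom = np.where(valid.any(axis = 0), bottom, np.nan)
print(bottom)
```

Note the last step: a sum over a column with no valid data gives 0, not NaN. In xarray, `sum` also skips NaN by default and returns 0 for all-NaN columns unless `min_count = 1` is passed, so it is worth checking how land cells are represented before building a land mask from bottom data.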

In [11]:
#Create a mask identifying grid cells with values as 1 and cells without data as NaN
mask_2d = xr.where(~np.isnan(var_df.isel(time = 0)), 1, np.nan)
#Calculate cumulative sum across depth in mask
mask_2d = mask_2d.cumsum('st_ocean').where(~np.isnan(var_df.isel(time = 0)))
#Identify the depth for the maximum values of cumulative sums
mask_2d = xr.where(mask_2d == mask_2d.max('st_ocean'), 1, np.nan)
#Apply mask to data
var_2d = (mask_2d*var_df).sum('st_ocean')
#Reorder coordinates and ensure only data between 1978 and 2022 remains
var_df = var_2d.transpose('time', 'yu_ocean', 'xu_ocean').sel(time = slice('1978-01', '2022-12'))
#Rechunk data to facilitate data manipulation
var_df = var_df.chunk({'time': 12, 'yu_ocean': 513, 'xu_ocean': 200}) 
#Checking results
var_df

| | Array | Chunk |
|---|---|---|
| Bytes | 10.34 GiB | 9.39 MiB |
| Shape | (540, 714, 3600) | (12, 513, 200) |
| Dask graph | 1620 chunks in 1112 graph layers | |
| Data type | float64 numpy.ndarray | |



### Saving bottom data
We will save files for each year as an intermediary output in case files are needed in the future.

In [None]:
for yr, da in var_df.groupby('time.year'):
    file_out = os.path.join(varDict['base_folder_out'], varDict['base_file_out']+f'bottom_{yr}.nc')
    da.to_netcdf(file_out)



# Regridding velocity data

## Loading sample target grid
We will load a sample of the grid used by all other outputs in the ACCESS-OM2-01 model. We will use this as our target grid. We will load the `area_t` variable, which contains the area of each grid cell.

In [4]:
#Load grid for the Southern Ocean
grid = (catalog['01deg_jra55v140_iaf_cycle4'].
search(variable = 'area_t').to_dask()['area_t']).sel(yt_ocean = slice(-80, -45))
#Transforming longitudes so their range is +/-180 degrees
grid = uf.corrlong(grid)
#We will now create and attach a mask of land area
grid['mask'] = xr.where(~np.isnan(grid), 1, 0)
#We will rename the coordinates so the regridder can recognise them
grid = grid.rename({'xt_ocean': 'lon', 'yt_ocean': 'lat'})
#Checking results
grid.mask.plot()

| | Array | Chunk |
|---|---|---|
| Bytes | 9.79 MiB | 1.41 MiB |
| Shape | (713, 3600) | (513, 720) |
| Dask graph | 12 chunks in 4 graph layers | |
| Data type | float32 numpy.ndarray | |



## Loading velocity data from disk

In [1]:
#Getting list of yearly surface and bottom files to be regridded (saved in the steps above)
files_to_regrid = sorted(glob(os.path.join(varDict['base_folder_out'], varDict['base_file_out'] + '*.nc')))

## Loading a single file from disk
We just need a single timestep to calculate the regridder that will be applied to all velocity files.

In [None]:
#Loading a single time step
var_df = xr.open_dataarray(files_to_regrid[0]).isel(time = 0)
#Adding a mask for land areas
var_df['mask'] = xr.where(~np.isnan(var_df), 1, 0).squeeze().drop('time')
#We will rename the coordinates so the regridder can recognise them
var_df = var_df.rename({'xu_ocean': 'lon', 'yu_ocean': 'lat'})
#Checking result
var_df

## Calculate regridder
We will apply the bilinear method, which performs well and has a relatively short processing time.

In [7]:
#We provide the file to be regridded followed by our target grid
regridder = xe.Regridder(var_df, grid, method = 'bilinear')
#Saving regridder
regridder.to_netcdf(os.path.join(varDict['base_folder_out'], 'regridder.nc'))

xESMF Regridder 
Regridding algorithm:       bilinear 
Weight filename:            bilinear_714x3600_713x3600.nc 
Reuse pre-computed weights? False 
Input grid shape:           (714, 3600) 
Output grid shape:          (713, 3600) 
Periodic in longitude?      False
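
As a rough illustration of what bilinear interpolation does, the sketch below interpolates within a single flat cell from its four corner values. This is a simplification: xESMF calculates its weights on the sphere, so the function `bilinear` and the corner values here are illustrative assumptions rather than what the regridder runs internally.

```python
import numpy as np

def bilinear(corners, fx, fy):
    """Bilinear interpolation inside one cell.

    corners: 2x2 array of corner values, indexed as [y, x]
    fx, fy: fractional position inside the cell (0 to 1) along x and y
    """
    (v00, v01), (v10, v11) = corners
    #Interpolate along x on the lower and upper edges of the cell
    lower = v00 * (1 - fx) + v01 * fx
    upper = v10 * (1 - fx) + v11 * fx
    #Interpolate along y between the two edges
    return lower * (1 - fy) + upper * fy

#Made-up velocity values at the four corners of a cell
corners = np.array([[0.10, 0.20],
                    [0.30, 0.40]])
#A target point in the middle of the cell, i.e., offset by half a cell
centre = bilinear(corners, 0.5, 0.5)
print(centre)
```

For a target point that sits half a cell away from the source points in both directions, the result is simply the mean of the four surrounding values.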

### **Optional step**: Loading regridder for use later
If the regridder has already been calculated and saved, we can load the regridder file and pass it as `weights` when creating the regridder. This takes less time than the original regridder calculation.  
Remember that to calculate the regridder, we only need a single time step for the data we want to regrid.

In [11]:
#Load regridder
regrid = xr.open_dataset(os.path.join(varDict['base_folder_out'], 'regridder.nc'))
#Once again, we provide a single time step of the original data to be regridded
regridder = xe.Regridder(var_df, grid, method = 'bilinear', weights = regrid)

## Regridding data
Once we have the regridder ready, we can apply it to the velocity dataset. We will do this for the yearly files containing surface and bottom velocity information.

In [50]:
#Define folder for regridded data
folder_out = os.path.join(varDict['base_folder_out'], 'regridded', varDict['short_var'])
#Ensure it exists
os.makedirs(folder_out, exist_ok = True)

#Loop through all the files in the list
for f in files_to_regrid:
    #Load as data array
    var_df = xr.open_dataarray(f)
    #Create land mask before regridding
    var_df['mask'] = xr.where(~np.isnan(var_df.isel(time = 0)), 1, 0).squeeze().drop('time')
    #Renaming coordinates so the regridder can recognise them
    var_df = var_df.rename({'xu_ocean': 'lon', 'yu_ocean': 'lat'})
    #Applying regridder
    reg_var_df = regridder(var_df)
    #Renaming coordinates so they match coordinates for all other ACCESS-OM2-01 outputs
    reg_var_df = reg_var_df.rename({'lon': 'xt_ocean', 'lat': 'yt_ocean'})
    #Get year contained in data array
    yr = np.unique(reg_var_df.time.dt.year)[0]
    #Renaming variable in data array and saving output
    if 'bottom' in f:
        reg_var_df.name = varDict['short_var']+'_bottom_msec'
        file_out = os.path.join(folder_out, 'monthly_bottom_' + varDict['short_var'] + f'_{yr}.nc')
    elif 'surf' in f:
        reg_var_df.name = varDict['short_var']+'_surf_msec'
        file_out = os.path.join(folder_out, 'monthly_surf_' + varDict['short_var'] + f'_{yr}.nc')
    #Saving output as netcdf
    reg_var_df.to_netcdf(file_out)

'/g/data/v45/la6889/Chapter2_Crabeaters/Velocity_Variables/regridded/vel_lat'
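
In the loop above, `file_out` is only defined when the file name contains `bottom` or `surf`. One possible way to make this explicit is sketched below. The helper `build_output_name` is hypothetical and not part of `UsefulFunctions`; it reproduces the naming used in the loop and raises an error for any other file.

```python
import os

def build_output_name(file_in, short_var, year, folder_out):
    """Hypothetical helper: return (variable name, output path) for a regridded file."""
    #Only check the file name, not the folders in the path
    base = os.path.basename(file_in)
    for layer in ('bottom', 'surf'):
        if layer in base:
            var_name = f'{short_var}_{layer}_msec'
            file_out = os.path.join(folder_out, f'monthly_{layer}_{short_var}_{year}.nc')
            return var_name, file_out
    #Stop early instead of reusing a name from a previous iteration
    raise ValueError(f'Cannot tell whether {file_in} holds surface or bottom data')

#Example with an illustrative input file name and output folder
name, path = build_output_name('monthly_lat_surf_1978.nc', 'vel_lat', 1978, 'regridded')
print(name, path)
```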