<font size="8"> **Regridding velocity outputs from ACCESS-OM2-01** </font>  
Meridional and zonal velocities have been identified as environmental variables that may influence crabeater seal distribution. These variables are available as ACCESS-OM2-01 outputs, but their grid differs slightly from the one used by all other model outputs. The resolution is the same, but the grid cell centres in the velocity outputs are offset by about $0.05^{\circ}$ in both latitude and longitude.
  
Here, we will extract velocity values for the ocean surface and along the bottom of the water column. We will then regrid them to match the grid of all other outputs available in ACCESS-OM2-01.
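
The half-cell offset described above can be illustrated with a minimal sketch. The coordinate values below are illustrative assumptions for a regular $0.1^{\circ}$ longitude axis and are not read from the model; `xt` and `xu` are stand-ins for the `xt_ocean` and `xu_ocean` coordinates.

```python
import numpy as np

# Illustrative 0.1-degree longitude axis (assumed values, not model output)
res = 0.1
# Stand-in for the tracer-grid longitudes (`xt_ocean`)
xt = -279.95 + res * np.arange(3600)
# Stand-in for the velocity-grid longitudes (`xu_ocean`), half a cell away
xu = xt + res / 2

# Offset between the two grids at every longitude
offset = xu - xt
print(offset.min(), offset.max())
```

Both grids have the same number of points and the same spacing, so only the position of the points changes. This is why the velocity data can be regridded without changing the resolution.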

# Setting working directory
To ensure these notebooks work correctly, we will set the working directory. We assume that you have saved a copy of this repository in your home directory (represented by `~` in the code chunk below). If you have saved this repository elsewhere on your machine, update this line with the filepath where you saved these notebooks.

In [1]:
import os
os.chdir(os.path.expanduser('~/Chapter2_Crabeaters/Scripts'))

# Loading modules

In [2]:
#Accessing model data
import cosima_cookbook as cc
#Dealing with data
import xarray as xr
import numpy as np
import pandas as pd
#Data visualisation
import matplotlib.pyplot as plt
#Collection of useful functions developed for this project
import UsefulFunctions as uf
#Parallelising work
from dask.distributed import Client
#Reprojection
import rioxarray
import xesmf as xe

## Parallelising work
First, we will start a cluster with multiple cores to make the analysis faster. Remember that the number of CPUs requested cannot exceed the number of CPUs you have access to.

In [3]:
client = Client()
client

| Dask client | Value |
|---|---|
| Connection method | Cluster object |
| Cluster type | distributed.LocalCluster |
| Dashboard | /proxy/8787/status |
| Status | running |
| Using processes | True |
| Workers | 4 (3 threads and 12.00 GiB of memory each) |
| Total threads | 12 |
| Total memory | 48.00 GiB |

# Defining dictionary of useful variables
In this dictionary, we will define variables that will be used multiple times throughout this notebook to avoid repetition. It will mostly contain paths to folders where intermediate or final outputs will be stored.

In [4]:
varDict = {'var_mod': 'v',
           #We will use cycle 4, which has an extension run until Dec 2022
           'exp': '01deg_jra55v140_iaf_cycle4',
           'exp_ext': '01deg_jra55v140_iaf_cycle4_jra55v150_extension',
           #Frequency of data
           'freq': '1 monthly',
           #Folder where regridded data will be stored
           'out_folder': '/g/data/v45/la6889/Chapter2_Crabeaters/Velocity_Fields/Meridional_Vel/'}

# Accessing ACCESS-OM2-01 data

The fourth run of the ACCESS-OM2-01 model has outputs available from 1958 to 2022. However, these outputs are available through two different experiments: `01deg_jra55v140_iaf_cycle4` and `01deg_jra55v140_iaf_cycle4_jra55v150_extension`. 

Another reason for choosing the fourth cycle is that biogeochemical (BGC) data are available in this run only.

Below, we access the velocity data for these experiments and merge them into a single dataset. We will then extract velocity values at the surface and at the bottom of the water column before regridding them and storing them to disk.

## Starting a new cookbook session

In [5]:
session = cc.database.create_session()

## Loading velocity data

In [6]:
#Loading data from fourth cycle (1971 to 2018)
var_vel = uf.getACCESSdata_SO(varDict['var_mod'], '1971-01', '2019-01', 
                              freq = varDict['freq'], ses = session, minlat = -80,
                              exp = varDict['exp'])

#Loading data from fourth cycle extension (2019 to 2022)
var_vel_ext = uf.getACCESSdata_SO(varDict['var_mod'], '2019-01', '2023-01', 
                              freq = varDict['freq'], ses = session, minlat = -80,
                              exp = varDict['exp_ext'])

#Concatenating both data arrays into one
var_vel = xr.concat([var_vel, var_vel_ext], dim = 'time')
var_vel = uf.corrlong(var_vel)

#Removing variable merged (and now duplicated)
del var_vel_ext

#Checking results
var_vel

| | Array | Chunk |
|---|---|---|
| Bytes | 448.85 GiB | 1.76 MiB |
| Shape | (625, 75, 714, 3600) | (1, 19, 135, 180) |
| Dask graph | 315000 chunks in 1256 graph layers | |
| Data type | float32 numpy.ndarray | |
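
Both requests above use `2019-01` as a boundary, so it may be worth checking that no time step appears twice after concatenation. Whether this happens depends on how `uf.getACCESSdata_SO` treats its end date, which is not shown here. The sketch below uses hypothetical monthly time axes and `pandas` only to illustrate the check.

```python
import pandas as pd

# Hypothetical monthly time axes for the two experiments, both containing 2019-01
t_main = pd.date_range("2018-10-01", "2019-01-01", freq="MS")
t_ext = pd.date_range("2019-01-01", "2019-04-01", freq="MS")

# Concatenated series standing in for the merged velocity data
merged = pd.Series(range(len(t_main) + len(t_ext)), index=t_main.append(t_ext))

# Flag time steps that occur more than once after concatenation
dup = merged.index.duplicated(keep="first")
print(merged.index[dup])

# Keep the first occurrence of each time step
merged_unique = merged[~dup]
```

For the data array itself, a similar selection could be written as `var_vel.sel(time = ~var_vel.get_index('time').duplicated())`.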


## Extracting surface layer data
For some ocean variables, we need to subset the data to extract surface or bottom values. Subsetting the surface layer is straightforward: we simply select the first depth bin available. The `st_ocean` dimension contains the depth bins.

In [8]:
#Selecting the first depth available in the model (i.e. surface layer)
var_vel_surf = var_vel.isel(st_ocean = 0).squeeze().drop_vars('st_ocean')

#Checking results - dataset has three dimensions instead of the original four
var_vel_surf

| | Array | Chunk |
|---|---|---|
| Bytes | 5.98 GiB | 94.92 kiB |
| Shape | (625, 714, 3600) | (1, 135, 180) |
| Dask graph | 78750 chunks in 1257 graph layers | |
| Data type | float32 numpy.ndarray | |
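
Extracting bottom values takes one more step than the surface selection above, because the deepest cell with data varies from one grid cell to the next. The sketch below shows one possible approach with `numpy` on a toy array. It assumes that cells below the sea floor are `NaN` and that valid cells are contiguous from the surface down. The function name `bottom_values` is hypothetical.

```python
import numpy as np

def bottom_values(arr):
    """Return the deepest non-NaN value in each column of a (depth, y, x) array."""
    valid = ~np.isnan(arr)
    # Number of valid levels per column (valid cells assumed contiguous from the surface)
    n_valid = valid.sum(axis=0)
    # Index of the deepest valid level, clipped to 0 for columns without data
    idx = np.clip(n_valid - 1, 0, None)
    bottom = np.take_along_axis(arr, idx[np.newaxis, ...], axis=0)[0]
    # Columns without any valid data (e.g. land) remain NaN
    return np.where(n_valid > 0, bottom, np.nan)

# Toy data: 3 depth levels, 1 x 3 horizontal cells
toy = np.array([[[1.0, 4.0, np.nan]],
                [[2.0, np.nan, np.nan]],
                [[3.0, np.nan, np.nan]]])
toy_bottom = bottom_values(toy)
print(toy_bottom)
```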


## Loading ACCESS-OM2-01 sample grid
This is the grid that we want the velocity field to have after regridding is done.

In [9]:
#Accessing the area of grid and keeping data for the Southern Ocean only
grid = cc.querying.getvar(varDict['exp'], 'area_t', session, n = 1).sel(yt_ocean = slice(-80, -45))
#Correcting longitude values to keep them between +/- 180
grid = uf.corrlong(grid)
#Renaming coordinates to lon/lat, the names expected by xesmf
grid = grid.rename({'xt_ocean': 'lon', 'yt_ocean': 'lat'})

#Checking results
grid

| | Array | Chunk |
|---|---|---|
| Bytes | 9.79 MiB | 1.41 MiB |
| Shape | (713, 3600) | (513, 720) |
| Dask graph | 12 chunks in 4 graph layers | |
| Data type | float32 numpy.ndarray | |
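
The longitude correction applied above keeps values between -180 and +180. The internals of `uf.corrlong` are not shown in this notebook, so the sketch below is only an illustration of how such a correction could work with `numpy`. The function name `wrap_longitude` and the sample values are assumptions.

```python
import numpy as np

def wrap_longitude(lon):
    """Map longitudes in degrees to [-180, 180) and sort them in ascending order."""
    wrapped = ((np.asarray(lon, dtype=float) + 180.0) % 360.0) - 180.0
    # Sorting order, which can also be used to reorder the data along longitude
    order = np.argsort(wrapped)
    return wrapped[order], order

# Sample longitudes on an axis that extends west of -180 (assumed values)
lon_model = np.array([-279.95, -180.05, -179.95, 0.05, 79.95])
lon_wrapped, order = wrap_longitude(lon_model)
print(lon_wrapped)
```

The `order` array is returned because the data values must be rearranged along the longitude dimension in the same way as the coordinate.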


# Regridding velocity data
We will calculate the regridder once only, and then we will apply it to the surface and bottom layers. 

We will need to rename the coordinates in the velocity data array before calculating the regridder. Otherwise, coordinates will not be recognised by the `Regridder` function of `xesmf`.

In [10]:
#Renaming lat/lon coordinates
var_vel_surf = var_vel_surf.rename({'xu_ocean': 'lon', 'yu_ocean': 'lat'})
#Checking results
var_vel_surf

| | Array | Chunk |
|---|---|---|
| Bytes | 5.98 GiB | 94.92 kiB |
| Shape | (625, 714, 3600) | (1, 135, 180) |
| Dask graph | 78750 chunks in 1257 graph layers | |
| Data type | float32 numpy.ndarray | |


In [45]:
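#Extracting a single time step at the surface to use as a sample of the velocity grid
vel_sample = var_vel.isel(time = 0, st_ocean = 0).squeeze().drop_vars(['st_ocean', 'time'])
#Creating mask: 1 where data are available, 0 otherwise
vel_sample['mask'] = xr.where(~np.isnan(vel_sample), 1, 0)
#Renaming lat/lon coordinates
vel_sample = vel_sample.rename({'xu_ocean': 'lon', 'yu_ocean': 'lat'})
#Creating a mask for the target grid in the same way
grid['mask'] = xr.where(~np.isnan(grid), 1, 0)
#Checking results
vel_sample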

Data values:

| | Array | Chunk |
|---|---|---|
| Bytes | 9.81 MiB | 94.92 kiB |
| Shape | (714, 3600) | (135, 180) |
| Dask graph | 126 chunks in 1257 graph layers | |
| Data type | float32 numpy.ndarray | |

`mask` coordinate:

| | Array | Chunk |
|---|---|---|
| Bytes | 19.61 MiB | 189.84 kiB |
| Shape | (714, 3600) | (135, 180) |
| Dask graph | 126 chunks in 1260 graph layers | |
| Data type | int64 numpy.ndarray | |


In [46]:
grid

Data values:

| | Array | Chunk |
|---|---|---|
| Bytes | 9.79 MiB | 1.41 MiB |
| Shape | (713, 3600) | (513, 720) |
| Dask graph | 12 chunks in 4 graph layers | |
| Data type | float32 numpy.ndarray | |

`mask` coordinate:

| | Array | Chunk |
|---|---|---|
| Bytes | 19.58 MiB | 2.82 MiB |
| Shape | (713, 3600) | (513, 720) |
| Dask graph | 12 chunks in 7 graph layers | |
| Data type | int64 numpy.ndarray | |


## Loading ACCESS-OM2-01 sample grid
This is the grid that we want the velocity fields to have after regridding is done.

In [14]:
#Creating new COSIMA cookbook session
session = cc.database.create_session()

#Accessing the area of grid and keeping data for the Southern Ocean only
grid = cc.querying.getvar(varDict['exp'], 'area_t', session, n = 1).sel(yt_ocean = slice(-90, -45))
#Correcting longitude values to keep them between +/- 180
grid = uf.corrlong(grid)
#Renaming coordinates to lon/lat, the names expected by xesmf
grid = grid.rename({'xt_ocean': 'lon', 'yt_ocean': 'lat'})

#Checking results
grid

| | Array | Chunk |
|---|---|---|
| Bytes | 10.16 MiB | 1.48 MiB |
| Shape | (740, 3600) | (540, 720) |
| Dask graph | 12 chunks in 4 graph layers | |
| Data type | float32 numpy.ndarray | |


In [59]:
grid_out = xr.Dataset({'lon': (['lon'], grid.lon.values), 'lat': (['lat'], grid.lat.values)})
grid_out
# grid_in = {'lon': var_vel_surf.lon.values, 'lat': var_vel_surf.lat.values}

In [57]:
var_vel_surf

| | Array | Chunk |
|---|---|---|
| Bytes | 5.98 GiB | 94.92 kiB |
| Shape | (625, 714, 3600) | (1, 135, 180) |
| Dask graph | 78750 chunks in 1257 graph layers | |
| Data type | float32 numpy.ndarray | |


In [60]:
#Calculating regridder
reg = xe.Regridder(var_vel_surf, grid_out, method = 'conservative')
#Checking results
reg

ValueError: The truth value of a Array is ambiguous. Use a.any() or a.all().

In [53]:
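#The regridder appears to need each horizontal dimension in a single chunk. Applying it to data
#chunked along lat/lon raised: ValueError: Dimension 1 has 6 blocks, adjust_chunks specified with 1 blocks
reg_vel = reg(var_vel_surf.chunk({'lat': -1, 'lon': -1}))
reg_vel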
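
As a cross-check on the regridding step, the half-cell shift between the velocity grid and the target grid can be reproduced on a toy example with plain `numpy`. This is a sketch of separable linear (bilinear) interpolation on rectilinear axes. It is not the `xesmf` conservative method requested above, it does not handle land masks, and it does not wrap around the dateline. The function name `regrid_linear` and all values are assumptions.

```python
import numpy as np

def regrid_linear(field, lat_in, lon_in, lat_out, lon_out):
    """Linearly interpolate a (lat, lon) field onto new 1-D lat/lon axes."""
    # Interpolate along longitude for every input latitude row
    tmp = np.array([np.interp(lon_out, lon_in, row) for row in field])
    # Interpolate along latitude for every output longitude column
    out = np.array([np.interp(lat_out, lat_in, col) for col in tmp.T]).T
    return out

# Toy source grid (stand-in for the velocity grid)
lat_u = np.array([-60.0, -59.9, -59.8])
lon_u = np.array([10.0, 10.1, 10.2, 10.3])
# Toy target grid, offset by half a cell from the source grid
lat_t = lat_u[:-1] + 0.05
lon_t = lon_u[:-1] + 0.05

# A field that is linear in lat and lon should be recovered exactly
field_u = 2.0 * lat_u[:, None] + 3.0 * lon_u[None, :]
field_t = regrid_linear(field_u, lat_u, lon_u, lat_t, lon_t)
print(field_t.shape)
```

Comparing a few grid cells from a regridder against a simple calculation like this one can help confirm that the output sits on the intended grid.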

## Subsetting data every 7 years

In [9]:
#Defining months in 7 years
months_in_7_yrs = 7*12
#Creating a list of timesteps within our study period
times_interest = pd.period_range('1978-01', '2022-12', freq = 'M')
#Identifying the date when the 7 year period begins
times_begin = [(t-pd.offsets.MonthEnd(months_in_7_yrs)).to_timestamp() for t in times_interest]
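
The number of time steps that falls inside each window depends on where the model time stamps sit within the month. The sketch below checks this with `pandas` under the assumption that time stamps fall mid-month. The `time_axis` used here is hypothetical and should be replaced with the time coordinate of the actual data.

```python
import pandas as pd

months_in_7_yrs = 7 * 12
times_interest = pd.period_range("1978-01", "2022-12", freq="M")
# Subtracting an integer from a monthly Period moves it back by that many months
times_begin = [(t - months_in_7_yrs).to_timestamp() for t in times_interest]

# Hypothetical mid-month time axis (an assumption about the model time stamps)
time_axis = pd.date_range("1971-01-01", "2022-12-01", freq="MS") + pd.Timedelta(days=15)
counts = pd.Series(1, index=time_axis)

# Label-based slicing includes both ends, so we count the steps in the first and last windows
n_first = counts.loc[times_begin[0]:times_interest[0].to_timestamp()].sum()
n_last = counts.loc[times_begin[-1]:times_interest[-1].to_timestamp()].sum()
print(n_first, n_last)
```

Under this assumption, each window holds the 84 months that precede the month of interest and does not include that month itself. If the time stamps sit at the start of the month instead, the slice would include one extra time step.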

In [10]:
#Creating empty list to save results
long_term_pack_ice = []

#Loop through each timestep of our interest
for i, t in enumerate(times_interest):
    #Select 7-year periods and calculate the proportion of time a grid cell was covered by at least 85% SIC
    da = pack_ice.sel(time = slice(times_begin[i], t.to_timestamp())).sum('time')/months_in_7_yrs
    #Assign a date to each timestep - Here we assign the end date of the 7 year period
    da['time'] = t.to_timestamp()
    #Add results to list
    long_term_pack_ice.append(da)

In [11]:
#Concatenate results into a single file
long_term_pack_ice = xr.concat(long_term_pack_ice, dim = 'time')
#Checking results - Note there are fewer time steps than in the original data, as we do not need the initial seven years
long_term_pack_ice

| | Array | Chunk |
|---|---|---|
| Bytes | 10.33 GiB | 759.38 kiB |
| Shape | (540, 713, 3600) | (1, 270, 360) |
| Dask graph | 17820 chunks in 6664 graph layers | |
| Data type | float64 numpy.ndarray | |


# Saving outputs to local machine
Data are saved as yearly files because of limitations in storing a single large file.

In [12]:
#Ensuring output directory exists
os.makedirs(varDict['out_folder'], exist_ok = True)

In [13]:
#Grouping data by year
for yr, da in long_term_pack_ice.groupby('time.year'):
    #Creating name for yearly output file
    file_out = os.path.join(varDict['out_folder'], f'LongTerm_PackIce_Monthly_Jan-Dec_{yr}.nc')
    #Saving yearly output file
    da.to_netcdf(file_out)