<font size="8"> **Transforming netcdf files to georeferenced rasters** </font>  
ACCESS-OM2-01 model outputs are available as NetCDF files that are not georeferenced (they carry no coordinate reference system). In this notebook, we will transform these files into georeferenced rasters (`tif` format) to facilitate the data extraction needed for habitat models.

# Setting working directory
To ensure these notebooks work correctly, we will set the working directory. We assume that you have saved a copy of this repository in your home directory (represented by `~` in the code chunk below). If you saved the repository elsewhere, update the line below with the correct path to these notebooks.

In [1]:
import os
os.chdir(os.path.expanduser('~/Chapter2_Crabeaters/Scripts'))

# Loading modules

In [2]:
#Accessing model data
import cosima_cookbook as cc
#Useful functions
import UsefulFunctions as uf
#Dealing with data
import xarray as xr
import numpy as np
import pandas as pd
#Data visualisation
import matplotlib.pyplot as plt
#Spatial analysis
import geopandas as gp
import rasterio
#Registers the .rio accessor used on xarray objects later in this notebook
import rioxarray
from shapely.geometry import Point
from dask.distributed import Client

# Parallelising work

In [3]:
client = Client()
client

Client: distributed.LocalCluster (status: running, using processes)
Dashboard: /proxy/8787/status
Workers: 4 (3 threads and 12.00 GiB memory each)
Total threads: 12, Total memory: 48.00 GiB


# Creating COSIMA Cookbook session
This allows us to search ACCESS-OM2-01 outputs and load them into our notebook.

In [4]:
session = cc.database.create_session()

# Defining dictionary of useful variables
This dictionary contains variables that will be used multiple times throughout this notebook, including the experiment name, the variable of interest and the path to the folder where outputs will be saved.

In [12]:
varDict = {'model': 'ACCESS-OM2-01',
           #ACCESS-OM2-01 cycle 4 (1958-2018)
           'exp': '01deg_jra55v140_iaf_cycle4',
           #ACCESS-OM2-01 cycle 4 extension (2018-2022)
           'exp_ext': '01deg_jra55v140_iaf_cycle4_jra55v150_extension',
           #Temporal resolution
           'freq': '1 monthly',
           #Output folder
           'base_folder': '/g/data/v45/la6889/Chapter2_Crabeaters/Ocean/BottomMeridionalVelocity/Rasters_tiff'}

# Searching variables available in experiment of interest 
We can load all variables available in the experiment(s) of our choice. We can then search the `long_name` column to find the model name of our variable of interest.

In [6]:
var_acc = cc.querying.get_variables(session, experiment = varDict['exp_ext'], frequency='1 monthly')
var_acc[var_acc.long_name.str.contains('current')]

| | name | long_name | units | frequency | ncfile | cell_methods | # ncfiles | time_start | time_end |
|---|---|---|---|---|---|---|---|---|---|
| 146 | u | i-current | m/sec | 1 monthly | output1008/ocean/ocean-3d-u-1-monthly-mean-ym_... | time: mean | 51 | 2019-01-01 00:00:00 | 2023-04-01 00:00:00 |
| 147 | u | i-current | m/sec | 1 monthly | output1008/ocean/ocean-3d-u-1-monthly-pow02-ym... | time: mean_pow(02) | 51 | 2019-01-01 00:00:00 | 2023-04-01 00:00:00 |
| 150 | v | j-current | m/sec | 1 monthly | output1008/ocean/ocean-3d-v-1-monthly-mean-ym_... | time: mean | 51 | 2019-01-01 00:00:00 | 2023-04-01 00:00:00 |
| 151 | v | j-current | m/sec | 1 monthly | output1008/ocean/ocean-3d-v-1-monthly-pow02-ym... | time: mean_pow(02) | 51 | 2019-01-01 00:00:00 | 2023-04-01 00:00:00 |
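
As a minimal sketch of the search step above, the block below filters a small toy table by `long_name`. The table contents are made up for illustration and only stand in for the data frame returned by `cc.querying.get_variables`. The `case` and `na` arguments are optional additions that make the search case-insensitive and treat missing long names as non-matches.

```python
import pandas as pd

# Toy table standing in for the output of cc.querying.get_variables (illustrative values only)
toy_vars = pd.DataFrame({
    'name': ['u', 'v', 'temp', 'salt'],
    'long_name': ['i-current', 'j-current', 'Conservative temperature', None],
    'units': ['m/sec', 'm/sec', 'K', 'psu'],
})

# Case-insensitive search; na=False treats missing long names as non-matches
is_current = toy_vars['long_name'].str.contains('current', case=False, na=False)
matches = toy_vars[is_current]

# Names of the variables whose long name mentions "current"
print(matches['name'].tolist())
```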


## Adding additional keys to dictionary of variables
The name of the variable of interest in the model is added to the dictionary once we identify it in the step above.

In [13]:
varDict['var_mod'] = 'v'
varDict['var_short_name'] = 'v-velocity'
varDict['var_long_name'] = 'meridional current velocity'
#Checking results
varDict

{'model': 'ACCESS-OM2-01',
 'exp': '01deg_jra55v140_iaf_cycle4',
 'exp_ext': '01deg_jra55v140_iaf_cycle4_jra55v150_extension',
 'freq': '1 monthly',
 'base_folder': '/g/data/v45/la6889/Chapter2_Crabeaters/Ocean/BottomMeridionalVelocity/Rasters_tiff',
 'var_mod': 'v',
 'var_short_name': 'v-velocity',
 'var_long_name': 'meridional current velocity'}

## Creating a single dataset for our study period
Given that we are accessing outputs for two different experiments (usual run and extension), we will merge all data available for our study period (1978 to 2022) into a single variable to ensure all data is processed in the same way.

In [14]:
#Loading data from fourth cycle (temporal range 1958 to 2018)
var_df = uf.getACCESSdata_SO(varDict['var_mod'], '1978-01', '2019-01', 
                              freq = varDict['freq'], ses = session, 
                              exp = varDict['exp'], ice_data = False)
#Loading data from fourth cycle extension (2019 to 2022)
var_df_ext = uf.getACCESSdata_SO(varDict['var_mod'], '2019-01', '2023-01', 
                              freq = varDict['freq'], ses = session, 
                              exp = varDict['exp_ext'], ice_data = False)

We will merge the two datasets into a single variable.

In [15]:
#Concatenating both data arrays into one
var_df = xr.concat([var_df, var_df_ext], dim = 'time')
var_df = uf.corrlong(var_df)

#Removing duplicate variable
del var_df_ext
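
The concatenation above can be sanity-checked on the time axis. The sketch below uses two small toy arrays (not ACCESS-OM2-01 data) built by a hypothetical helper, `toy_run`, and confirms that the merged time axis has no duplicates and is in increasing order. The array shapes shown later in this notebook report 541 time steps, one more than the 540 months between January 1978 and December 2022, so a check like this may help confirm which months were actually loaded.

```python
import numpy as np
import pandas as pd
import xarray as xr

def toy_run(start, n_months):
    """Hypothetical helper: build a small monthly data array for illustration."""
    time = pd.date_range(start, periods=n_months, freq='MS')
    data = np.zeros((n_months, 2, 3), dtype='float32')
    return xr.DataArray(data, dims=('time', 'yu_ocean', 'xu_ocean'),
                        coords={'time': time}, name='v')

main_run = toy_run('2018-10-01', 3)   # October to December 2018
extension = toy_run('2019-01-01', 2)  # January and February 2019

# Concatenating both data arrays along the time dimension
merged = xr.concat([main_run, extension], dim='time')

# Checking the merged time axis: no repeated months and increasing order
time_index = merged.indexes['time']
no_duplicates = time_index.is_unique
in_order = time_index.is_monotonic_increasing
print(merged.sizes['time'], no_duplicates, in_order)
```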

## Optional: Applying transformations
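Apply any unit transformations needed to the original ACCESS-OM2-01 outputs here. For example, the block below transforms temperature from Kelvin to $^{\circ}C$. This step only applies to temperature variables and should be skipped for other variables, such as the current velocities used in the rest of this notebook.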

In [8]:
var_df = var_df-273.15
var_df

| | Array | Chunk |
|---|---|---|
| Bytes | 1.49 GiB | 1.76 MiB |
| Shape | (2, 75, 740, 3600) | (1, 19, 135, 180) |
| Dask graph | 1008 chunks in 8 graph layers | |
| Data type | float32 numpy.ndarray | |
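
One way to avoid applying the temperature conversion to the wrong variable is to check the `units` attribute first. The sketch below is an illustration only: `kelvin_to_celsius` is a hypothetical helper, the list of accepted unit strings is a guess, and it assumes the data array still carries a `units` attribute after loading.

```python
import numpy as np
import xarray as xr

def kelvin_to_celsius(da):
    """Hypothetical helper: convert to degrees Celsius only if units indicate Kelvin."""
    units = str(da.attrs.get('units', '')).lower()
    # Assumed spellings of Kelvin; extend this list to match your data
    if units not in ('k', 'kelvin', 'degrees k', 'deg_k'):
        return da
    converted = da - 273.15
    # Arithmetic drops attributes by default, so they are restored with updated units
    converted.attrs = dict(da.attrs, units='degrees C')
    return converted

# Toy arrays for illustration
temp = xr.DataArray(np.array([273.15, 283.15]), dims='time',
                    name='temp', attrs={'units': 'K'})
vel = xr.DataArray(np.array([0.1, -0.2]), dims='time',
                   name='v', attrs={'units': 'm/sec'})

temp_c = kelvin_to_celsius(temp)   # converted
vel_out = kelvin_to_celsius(vel)   # returned unchanged
```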


## Optional: Simple subsetting of data along depth dimension
If we are interested in a variable at a particular depth (e.g., ocean surface, 10 m, 100 m, etc.), we can extract values fairly easily using the `sel` or `isel` functions from `xarray`.  
  
As a reminder, `sel` allows us to select grid cells based on the value of a dimension. For example, if we want to select information at a depth of 10 m, we will use `var.sel(st_ocean = 10, method = 'nearest')`. The `isel` option allows us to select data based on the index, so if we want data for the ocean surface, we can select the first index along the depth dimension as shown in the code block below.

In [9]:
var_df = var_df.isel(st_ocean = 0)
var_df

| | Array | Chunk |
|---|---|---|
| Bytes | 5.37 GiB | 94.92 kiB |
| Shape | (541, 740, 3600) | (1, 135, 180) |
| Dask graph | 68166 chunks in 1089 graph layers | |
| Data type | float32 numpy.ndarray | |
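
The difference between `sel` and `isel` can be seen on a small toy profile. The depth values below are made up for illustration and are not the ACCESS-OM2-01 depth levels.

```python
import numpy as np
import xarray as xr

# Toy depth levels in metres (illustrative values only)
depths = np.array([0.5, 1.6, 2.9, 10.4, 101.2])
profile = xr.DataArray(np.arange(5.0), dims='st_ocean',
                       coords={'st_ocean': depths})

# isel selects by position: index 0 is the shallowest level
surface = profile.isel(st_ocean=0)

# sel selects by coordinate value: the level closest to 10 m
near_10m = profile.sel(st_ocean=10, method='nearest')

print(float(surface.st_ocean), float(near_10m.st_ocean))
```

Printing the selected `st_ocean` value, as done here, is a quick way to confirm which depth level was actually returned by `method = 'nearest'`.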


## Optional: Extracting data for bottom of the ocean
Extracting values for the bottom of the ocean is not as simple as extracting surface values. Instead, we have to perform the following steps:
1. Create a mask where a value of `1` replaces grid cells that contain data.
2. Perform a cumulative sum along the depth axis (`st_ocean`).
3. Identify the last grid cell along the depth axis (`st_ocean`) that contains data. This will be the grid cell with the largest cumulative sum along the depth dimension. All other grid cells are changed to `NaN`.

In [16]:
mask_2d = xr.where(~np.isnan(var_df.isel(time = 0)), 1, np.nan)
mask_2d = mask_2d.cumsum('st_ocean').where(~np.isnan(var_df.isel(time = 0)))
mask_2d = xr.where(mask_2d == mask_2d.max('st_ocean'), 1, np.nan)
mask_2d

| | Array | Chunk |
|---|---|---|
| Bytes | 1.49 GiB | 3.52 MiB |
| Shape | (75, 740, 3600) | (19, 135, 180) |
| Dask graph | 504 chunks in 1100 graph layers | |
| Data type | float64 numpy.ndarray | |


Applying the three-dimensional mask to the dataset and collapsing the depth dimension.

In [17]:
var_2d = (mask_2d*var_df).sum('st_ocean')
var_2d

| | Array | Chunk |
|---|---|---|
| Bytes | 10.74 GiB | 189.84 kiB |
| Shape | (740, 3600, 541) | (135, 180, 1) |
| Dask graph | 68166 chunks in 1109 graph layers | |
| Data type | float64 numpy.ndarray | |
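
The bottom extraction steps can be checked on a tiny toy array where the expected answer is known. The sketch below is a variation on the approach above, not the notebook's exact code: it uses a boolean mask and passes `min_count = 1` to `sum`, so columns with no data at any depth (e.g., land) stay as `NaN`. With a plain `sum`, `xarray` skips `NaN` values and returns `0` for those columns, which is worth keeping in mind when interpreting zeros in the final rasters.

```python
import numpy as np
import xarray as xr

# Toy array with 4 depth levels and 3 water columns (illustrative values only):
# column 0 has data at every level, column 1 only at the top two levels,
# and column 2 has no data at all (e.g., land)
data = np.array([
    [[1.0, 5.0, np.nan]],
    [[2.0, 6.0, np.nan]],
    [[3.0, np.nan, np.nan]],
    [[4.0, np.nan, np.nan]],
])
toy = xr.DataArray(data, dims=('st_ocean', 'y', 'x'))

# Steps 1 and 2: flag cells with data and count them from the surface down
valid = toy.notnull()
count = valid.cumsum('st_ocean')

# Step 3: the bottom cell is the valid cell where the count reaches its total
is_bottom = valid & (count == count.isel(st_ocean=-1))

# Collapsing depth; min_count=1 keeps columns without any data as NaN
bottom = toy.where(is_bottom).sum('st_ocean', min_count=1)

# For comparison, a plain sum turns columns without any data into zeros
bottom_plain = toy.where(is_bottom).sum('st_ocean')

print(bottom.values, bottom_plain.values)
```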


Re-arranging dimensions to match original dataset: `time` followed by coordinates.

In [19]:
if var_df.name in ['u', 'v']:
    var_df = var_2d.transpose('time', 'yu_ocean', 'xu_ocean')
else:
    var_df = var_2d.transpose('time', 'yt_ocean', 'xt_ocean')
#Checking results
var_df

| | Array | Chunk |
|---|---|---|
| Bytes | 10.74 GiB | 189.84 kiB |
| Shape | (541, 740, 3600) | (1, 135, 180) |
| Dask graph | 68166 chunks in 1110 graph layers | |
| Data type | float64 numpy.ndarray | |


# Assigning Reference System
Before saving the dataset as a georeferenced image, we need to assign a coordinate reference system (CRS). Given that coordinates are in degrees, we assume that this dataset uses WGS84 as its datum and that the reference system is [EPSG 4326](https://epsg.io/4326).

In [20]:
#Adding CRS (WGS84)
var_df.rio.write_crs('epsg:4326', inplace = True)

#Changing latitude and longitude names before saving as tif file
if var_df.name in ['u', 'v']:
    var_df = var_df.rename({'xu_ocean': 'x', 'yu_ocean': 'y'})
else:
    var_df = var_df.rename({'xt_ocean': 'x', 'yt_ocean': 'y'})
var_df

| | Array | Chunk |
|---|---|---|
| Bytes | 10.74 GiB | 189.84 kiB |
| Shape | (541, 740, 3600) | (1, 135, 180) |
| Dask graph | 68166 chunks in 1110 graph layers | |
| Data type | float64 numpy.ndarray | |


# Saving data
We will save each time step as an individual `tif` file for further processing.

In [21]:
os.makedirs(varDict['base_folder'], exist_ok = True)

for t in var_df.time:
    ds_t = var_df.sel(time = t)
    date = np.datetime_as_string(t.values, unit = 'D')
    name_out = os.path.join(varDict['base_folder'], 
                            f'BottomMeridionalVelocity_{date}.tif')
    ds_t.rio.to_raster(name_out)
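
The date stamp in each file name comes from truncating the time step to day precision. The sketch below shows this on two toy time stamps. The dates and the folder name are made up for illustration and nothing is written to disk.

```python
import os
import numpy as np

# Toy monthly time stamps (illustrative values only)
times = np.array(['1978-01-16T12:00:00', '1978-02-15T00:00:00'],
                 dtype='datetime64[ns]')

# Hypothetical output folder used only to build example paths
base_folder = 'Rasters_tiff'

names_out = []
for t in times:
    # unit='D' drops the time of day, leaving YYYY-MM-DD
    date = np.datetime_as_string(t, unit='D')
    names_out.append(os.path.join(base_folder,
                                  f'BottomMeridionalVelocity_{date}.tif'))

print([os.path.basename(n) for n in names_out])
```

Because the date appears in `YYYY-MM-DD` order, the resulting file names sort chronologically when listed alphabetically.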

