### <center> Post-processing of the raw GLORYS12V1 reanalysis and construction of the tropical/coastal dataset </center>

Since GLORYS12V1 is a heavy dataset, only the tropical and coastal zones of western America were downloaded. The data was downloaded in chunks corresponding to the main regions. In this notebook the raw data is loaded and transformed into a Hovmöller-type dataset, using a meridional mean for the tropical zone and a zonal mean for the coastal zones. This reduces the dataset dimensions to (time, distance). The processed variables are sea surface height, ocean heat content (defined as the average temperature in the upper 300 m), and sea surface temperature.
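
The reduction described above can be sketched on synthetic data. This is only an illustration with NumPy, not the notebook's actual processing: the field, the mask, and the names `field`, `mask` and `hovmoller` are made up for the example. It shows how a masked meridional mean collapses a (time, lat, lon) field into a (time, lon) Hovmöller array.

```python
import numpy as np

# Synthetic field with dimensions (time, lat, lon); the values are made up.
rng = np.random.default_rng(0)
field = rng.normal(size=(4, 3, 5))

# Hypothetical band mask on the (lat, lon) grid: 1 inside the band, 0 outside.
mask = np.array([
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

# Keep values inside the mask and set everything else to NaN.
masked = np.where(mask == 1, field, np.nan)

# Drop longitudes that have no masked pixel, then average over latitude
# (axis 1) while ignoring NaNs: this is the meridional mean.
valid = (mask == 1).any(axis=0)
hovmoller = np.nanmean(masked[:, :, valid], axis=1)

print(hovmoller.shape)  # (time, lon) after the reduction
```

The zonal mean for the coastal zones follows the same pattern, averaging over the longitude axis instead.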

In [17]:
# Imports
import xarray as xr
import numpy as np
import pandas as pd
from glob import glob
import dask
dask.config.set({"array.slicing.split_large_chunks": False})

<dask.config.set at 0x7f2a3b493dc0>

In [18]:
# Load glorys12v1 tropical and coastal masks (see create_masks.ipynb)
masks_GLORYS = xr.open_dataset('data/GLORYS_masks.nc')

def load_glorys(path):
    """Open raw GLORYS files and standardize coordinate names and longitudes."""
    data = xr.open_mfdataset(path) # Load raw netCDF files
    data = data.rename({'longitude':'lon', 'latitude':'lat'}) # Shorten coordinate names
    data.coords['lon'] = xr.where(data.lon<0, data.lon+360, data.lon) # Convert longitudes to 0-360
    data = data.sortby('lon', ascending=True)
    return data

# Load raw downloaded data
tropical = load_glorys('data/GLORYS12V1/HOVMOLLERS/data/tropical/*.nc')
camerica = load_glorys('data/GLORYS12V1/HOVMOLLERS/data/camerica/*.nc')
mexico   = load_glorys('data/GLORYS12V1/HOVMOLLERS/data/mexico/*.nc')
usa      = load_glorys('data/GLORYS12V1/HOVMOLLERS/data/usa/*.nc')
peru     = load_glorys('data/GLORYS12V1/HOVMOLLERS/data/peru/*.nc').sortby('lat', ascending=False)
chile    = load_glorys('data/GLORYS12V1/HOVMOLLERS/data/chile/*.nc').sortby('lat', ascending=False)

In [20]:
# Get the tropical hovmoller
tropical = tropical.sel(lon=slice(masks_GLORYS.lon.min(), masks_GLORYS.lon.max()))
tropical = tropical.where(masks_GLORYS.tropicalmask==1).mean('lat')
tropical = tropical.compute().dropna('lon')
tropical

In [21]:
# Build coastal hovmoller
camerica = camerica.where(masks_GLORYS.coastmask_north==1).mean('lon').dropna('lat').compute()
mexico = mexico.where(masks_GLORYS.coastmask_north==1).mean('lon').dropna('lat').compute()
usa = usa.where(masks_GLORYS.coastmask_north==1).mean('lon').dropna('lat').compute()

peru = peru.where(masks_GLORYS.coastmask_south==1).mean('lon').dropna('lat').compute()
chile = chile.where(masks_GLORYS.coastmask_south==1).mean('lon').dropna('lat').compute()

In [22]:
# Concat and define final northern hemisphere hovmoller
coastnorth = xr.concat([camerica, mexico, usa], 'lat').drop_duplicates('lat').sortby('lat')
coastnorth

In [23]:
# Concat and define final southern hemisphere hovmoller
coastsouth = xr.concat([peru, chile], 'lat').drop_duplicates('lat').sortby('lat', ascending=False)
coastsouth

The data representative of the equatorial Kelvin wave properties is now in the "tropical" array, and the coastally trapped Kelvin wave data is in the "coastnorth" and "coastsouth" arrays. Before saving the new data to disk, some attributes and coordinates will be added. Of particular interest is the distance along the Hovmöller path as a new dataset dimension. This distance must be built in spherical coordinates, first for the tropical part (constant latitude) and then along the coast of America, where both latitude and longitude change. In both cases the haversine formula is used, because it gives the distance between two points in spherical coordinates:

The central angle between two points on a sphere is:
$$ \theta = \frac{d}{R}$$

where $d$ is the distance between the two points and $R$ is the sphere radius. The haversine of the angle $\theta$ is defined as:
$$\operatorname{hav}(\theta) = \sin^2\left(\frac{\theta}{2}\right) = \frac{1-\cos(\theta)}{2}$$

Given these definitions, for two points $p_1=(\lambda_1, \phi_1)$ and $p_2=(\lambda_2, \phi_2)$, with longitude $\lambda$ and latitude $\phi$, the haversine formula states that:
$$ \operatorname{hav}(\theta) = \operatorname{hav}(\phi_2-\phi_1)+\cos(\phi_1)\cos(\phi_2)\operatorname{hav}(\lambda_2-\lambda_1) $$

This can be solved for the distance $d$ between the two points:
$$ d = 2R\cdot \arcsin\left(\sqrt{\operatorname{hav}(\theta)}\right)$$

Or, more explicitly:

$$ d = 2R \cdot \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2-\phi_1}{2}\right)+\cos(\phi_1)\cdot \cos(\phi_2)\cdot \sin^2\left(\frac{\lambda_2-\lambda_1}{2}\right)}\right) $$
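
As a quick check of the explicit formula, here is a minimal sketch in plain Python. The function name `haversine`, the argument order and the Earth radius of 6371 km are assumptions made for this illustration, not something defined elsewhere in the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    # Convert degrees to radians: lam is longitude, phi is latitude.
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    # hav(theta) from the haversine formula.
    hav = (math.sin((phi2 - phi1) / 2) ** 2
           + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    # Solve for the distance d.
    return 2 * radius * math.asin(math.sqrt(hav))

# A quarter of the equator should equal (pi / 2) * R.
quarter = haversine(0.0, 0.0, 90.0, 0.0)
print(quarter, math.pi / 2 * EARTH_RADIUS_KM)
```

With the assumed radius, 90 degrees along the equator gives about 10007.5 km, which matches a quarter of the circumference.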


With this last formula, a new dimension/coordinate can be added to the dataset based on the coordinates of each pixel along the Kelvin wave pathway. First, the latitude and longitude of every pixel are added as coordinates, together with an index (the numerical position in the grid). The distance between consecutive points is then computed to obtain the final dataset with the correct coordinate data.

In [27]:
lon,lat = np.meshgrid(masks_GLORYS.lon,masks_GLORYS.lat)
tropicalcoords   = pd.DataFrame((lon[0,:],np.zeros(len(lon[0,:]))), index=['lon','lat']).T
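lonn = []  # last masked longitude at each latitude, northern coast
lons = []  # last masked longitude at each latitude, southern coast
for i in range(len(masks_GLORYS.lat)):
    try:
        x = masks_GLORYS.coastmask_north.where(masks_GLORYS.coastmask_north==1)
        x = x.isel(lat=i).dropna('lon')[-1].lon.item()
        lonn.append(x)
    except IndexError:
        # No coastal pixel at this latitude
        lonn.append(np.nan)

    try:
        # Fixed: the southern mask must be filtered by itself, not by the northern mask
        x = masks_GLORYS.coastmask_south.where(masks_GLORYS.coastmask_south==1)
        x = x.isel(lat=i).dropna('lon')[-1].lon.item()
        lons.append(x)
    except IndexError:
        # No coastal pixel at this latitude
        lons.append(np.nan)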
coastnorthcoords = pd.DataFrame((np.array(lonn),lat[:,0]),index=['lon','lat']).T.dropna()
coastsouthcoords = pd.DataFrame((np.array(lons),lat[:,0]),index=['lon','lat']).T.dropna()

# coastnorthcoords.index = coastnorthcoords.lat
# coastsouthcoords.index = coastsouthcoords.lat
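
The next step described above, turning a table of (lon, lat) points into a distance coordinate, could look roughly like the sketch below. It is an illustration on a made-up path, not the notebook's actual implementation: the helper `path_distance`, the sample points and the radius of 6371 km are all assumptions.

```python
import numpy as np

def path_distance(lon, lat, radius=6371.0):
    """Cumulative great-circle distance (km) along a sequence of points in degrees."""
    lam = np.radians(np.asarray(lon, dtype=float))
    phi = np.radians(np.asarray(lat, dtype=float))
    # Differences between consecutive points along the path.
    dlam = np.diff(lam)
    dphi = np.diff(phi)
    # Haversine formula applied to each pair of neighbours.
    hav = (np.sin(dphi / 2) ** 2
           + np.cos(phi[:-1]) * np.cos(phi[1:]) * np.sin(dlam / 2) ** 2)
    step = 2 * radius * np.arcsin(np.sqrt(hav))
    # The first point is the origin, so its distance is zero.
    return np.concatenate([[0.0], np.cumsum(step)])

# Made-up path: two steps east along the equator, then one step north.
lon = [270.0, 271.0, 272.0, 272.0]
lat = [0.0, 0.0, 0.0, 1.0]
distance = path_distance(lon, lat)
print(distance)
```

Each step here spans one degree of arc, so the distances grow by roughly 111.2 km per point. The same helper could be called with the columns of a coordinate table such as `coastnorthcoords.lon` and `coastnorthcoords.lat`, assuming the rows are ordered along the path.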