Current runtime: ~1.5min

# Working with Chlorophyll Data from the Cloud

This notebook illustrates how to remotely access chlorophyll data from *OceanData.sci*.

In particular, it allows the user to specify
* time range of interest
* variable of interest
* time binning method (daily or 8-day means)
* spatial resolution (4km or 9km)

The data is loaded lazily into xarray. This means that data is only transferred to the local machine when it is needed, which reduces memory requirements but also means that some operations will take a bit longer.

### To begin, load in our packages of choice

In [None]:
import numpy as np
import xarray as xr

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as colors

import cmocean
from pyproj import Proj

from AddParallels_and_Meridians import AddParallels_and_Meridians
from FiniteDiff import FiniteDiff

#### This is only if you use a dark background notebook. Otherwise, comment this out.

In [None]:
plt.style.use('dark_background')

font = {'size' : 16}

matplotlib.rc('font', **font)

## Specify Data Selection

* `start_date`: `np.datetime64` object indicating the beginning of the selection, given in `'YYYY-MM-DD'` format.
* `end_date`: `np.datetime64` object indicating the end of the selection (non-inclusive), given in `'YYYY-MM-DD'` format.
* `VAR`: desired variable. Currently only tested for `'CHL'`.
* `ALG`: associated variable algorithm/method. Currently only tested for `'chl_ocx'`.
* `BIN`: time-binning period. Currently only accepts `'DAY'` and `'8D'` for daily and 8-day averages, respectively.
* `SRES`: spatial resolution. Options are `'4km'` and `'9km'`.

In [None]:
## YYYY-MM-DD
start_date = np.datetime64('2018-01-01')
end_date   = np.datetime64('2018-02-01')
num_days = (end_date - start_date).tolist().days

# variable to load
VAR = 'CHL'

# algorithm
ALG = 'chl_ocx'

# Binning period
BIN = '8D'  # DAY, 8D (other binning periods are not handled below)

# Spatial resolution
SRES = '9km'   # 4km, 9km

## Create a list of URLs and associated times

These URLs will then be used to access the requested netCDF data files.

In [None]:
# Build a list of URLs and datetime objects
dap_urls = []
the_days = []

url_base = "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/"

for ii in range(num_days):
    
    curr_date = start_date + ii
    
    curr_year = curr_date.tolist().year
    ref_date = np.datetime64('{0:d}-01-01'.format(curr_year))
    
    day_num = 1 + (curr_date - ref_date).tolist().days
    
    # We need to change the formatting a bit depending on the binning
    do = True
    if BIN == 'DAY':
        time_str = 'A{0:d}{1:03d}'.format(curr_year, day_num)
    elif BIN == '8D':
        if (day_num - 1) % 8 == 0:
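            targ_day = day_num + 7
            # The last 8-day bin of the year is truncated at the final day
            # of the year (365, or 366 in leap years)
            next_year = np.datetime64('{0:d}-01-01'.format(curr_year + 1))
            days_in_year = (next_year - ref_date).tolist().days
            if targ_day > days_in_year:
                targ_day = days_in_year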
            
            time_str = 'A{0:d}{1:03d}{2:d}{3:03d}'.format(curr_year, day_num, curr_year, targ_day)
        else:
            # There isn't an 8D set starting here
            do = False
    
    if do:
        file_url = url_base + \
                '{0:d}/{1:03d}/{2}'.format(curr_year, day_num, time_str) + \
                '.L3m_{0}_{1}_{2}_{3}'.format(BIN, VAR, ALG, SRES) + \
                '.nc'
    
        dap_urls += [file_url]
        the_days += [curr_date]
    
print('dap_urls contains {0:d} urls for {1} data.'.format(len(dap_urls), VAR))
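
The 8-day file names built above follow a fixed pattern: bins start on day-of-year 1, 9, 17, and so on, and the last bin of the year is cut short. As a rough, standalone illustration of that arithmetic, the bins for a given year can be enumerated as below. The helper name `eight_day_bins` is hypothetical and not part of this notebook, and truncating the final bin at day 366 in leap years is an assumption.

```python
import calendar

def eight_day_bins(year):
    """Return (start_day, end_day) day-of-year pairs for 8-day bins.

    Assumes bins restart on day 1 of every year and that the final bin
    is truncated at the last day of the year.
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    bins = []
    for start in range(1, days_in_year + 1, 8):
        end = min(start + 7, days_in_year)
        bins.append((start, end))
    return bins

bins_2018 = eight_day_bins(2018)
print(len(bins_2018))   # number of 8-day bins in the year
print(bins_2018[:4])    # the four bins that start in January
print(bins_2018[-1])    # truncated final bin
```

With the January 2018 date range selected earlier, only the bins starting on days 1, 9, 17 and 25 fall inside the range, so the URL list holds four entries for `BIN = '8D'`.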

## Now load the datasets

We don't use `xr.open_mfdataset` because the source data files have no time dimension and also contain some extraneous variables that cause merging problems.

Instead, we simply create a list of datasets, one for each URL, in the same order as the URLs.

In [None]:
# Open one (lazily loaded) dataset per URL, preserving the URL order
data_sets = [xr.open_dataset(url) for url in dap_urls]

### Create the time array corresponding to the datasets

In [None]:
time_array = xr.DataArray(the_days, dims='time', name='time')

### Concatenate each separate dataset into one large dataset with a time dimension. 

The values of the time dimension will be taken from `time_array`.

In [None]:
merged = xr.concat(data_sets, time_array)
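
To make the concatenation step concrete without touching the network, here is a minimal sketch with two small synthetic datasets standing in for the remote files. The grid, values and timestamps are made up for illustration; only the `xr.concat` pattern mirrors the cell above.

```python
import numpy as np
import xarray as xr

# Two tiny stand-in datasets with no time dimension (synthetic values)
lat = np.array([10.0, 0.0, -10.0])
lon = np.array([-50.0, -40.0])
snapshots = [
    xr.Dataset(
        {"chl_ocx": (("lat", "lon"), np.full((3, 2), float(k)))},
        coords={"lat": lat, "lon": lon},
    )
    for k in range(2)
]

# One timestamp per snapshot, wrapped in a DataArray named after the new dimension
times = xr.DataArray(
    [np.datetime64("2018-01-01"), np.datetime64("2018-01-09")],
    dims="time",
    name="time",
)

# Concatenating along the DataArray creates a new leading 'time' dimension
stacked = xr.concat(snapshots, dim=times)
print(stacked.chl_ocx.dims)
print(stacked.time.values)
```

Passing a `DataArray` as `dim` both names the new dimension and supplies its coordinate values, which is why the time array has to be built in the same order as the list of datasets.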

## Analysis

We now have the desired dataset 'loaded' into our notebook (recall that it is lazy loading). We can now proceed to analyze the data as we desire!

### Subsetting

Plotting the whole globe takes a while, so let's just plot a small region.

In [None]:
lon_lb = -125
lon_ub = -35

lat_lb = -20
lat_ub =  70

subs_chl = merged.chl_ocx.sel(lon=slice(lon_lb,lon_ub),lat=slice(lat_ub,lat_lb)).data

subs_lon = merged.lon.sel(lon=slice(lon_lb,lon_ub)).data
subs_lat = merged.lat.sel(lat=slice(lat_ub,lat_lb)).data
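
Note that the latitude slice above runs from `lat_ub` down to `lat_lb`. Label-based slices in xarray follow the ordering of the coordinate itself, so this order presumably reflects latitudes stored north to south in the source files (an assumption, not something checked here). A small sketch on a synthetic grid shows the effect of getting the order wrong:

```python
import numpy as np
import xarray as xr

# Synthetic grid with latitude stored north to south (an assumption that
# mirrors the slice order used above)
lat = np.arange(85.0, -90.0, -10.0)   # 85, 75, ..., -85
lon = np.arange(-175.0, 180.0, 10.0)  # -175, -165, ..., 175
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Slice bounds must follow the coordinate's own ordering
north_to_south = field.sel(lat=slice(70, -20), lon=slice(-125, -35))
south_to_north = field.sel(lat=slice(-20, 70), lon=slice(-125, -35))

print(north_to_south.shape)  # non-empty selection
print(south_to_north.shape)  # empty along lat
```

If a subset unexpectedly comes back empty, checking whether the coordinate is ascending or descending is a good first step.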

In [None]:
subs_chl = np.ma.masked_where(np.isnan(subs_chl), subs_chl)
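
Masking the NaNs means that later reductions skip the missing pixels instead of propagating NaN. The following sketch uses a tiny synthetic stack (values invented for illustration) to show how a masked array behaves under a time mean:

```python
import numpy as np

# Synthetic (time, lat, lon) stack with gaps marked as NaN
stack = np.array([
    [[1.0, np.nan], [3.0, np.nan]],
    [[3.0, 2.0], [np.nan, np.nan]],
])

masked = np.ma.masked_where(np.isnan(stack), stack)

# Reductions skip masked entries; a pixel masked at every time stays masked
time_mean = masked.mean(axis=0)
print(time_mean)
print(masked.count(axis=0))  # number of valid samples per pixel
```

The per-pixel count is a cheap way to see how much data actually supports each averaged value.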

In [None]:
subs_chl[0,:,:]

In [None]:
sLON, sLAT = np.meshgrid(subs_lon, subs_lat)

In [None]:
ddlon = FiniteDiff(subs_lon, 2, uniform=False, spb=False)
ddlat = FiniteDiff(subs_lat, 2, uniform=False, spb=False)

In [None]:
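def compute_gradient(field, LAT):
    # Uses the derivative operators ddlat and ddlon defined above.
    # Currently returns only the zonal (longitude) derivative; LAT is
    # needed only for the commented-out gradient magnitude.
    grad_field = np.zeros(field.shape)
    
    for Itime in range(field.shape[0]):
        dfdlat = np.ma.dot(ddlat, field[Itime,:,:])
        dfdlon = np.ma.dot(ddlon, field[Itime,:,:].T).T
        
        grad_field[Itime,:,:] = dfdlon
        #grad_field[Itime,:,:] = np.sqrt( (dfdlon / np.cos(LAT * np.pi / 180))**2 + dfdlat**2 )
    
    return grad_field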
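
The commented-out line in `compute_gradient` combines the two derivatives into a gradient magnitude, with the zonal derivative divided by cos(lat) to account for converging meridians. As a hedged sketch of that formula, the version below swaps in `np.gradient` for the `FiniteDiff` operators (a local helper whose internals are not shown here) and works per degree of arc rather than per metre. The function name `gradient_magnitude_deg` is hypothetical, and NaN or masked pixels are not handled.

```python
import numpy as np

def gradient_magnitude_deg(field, lat, lon):
    """Gradient magnitude of a 2D (lat, lon) field, per degree of arc.

    np.gradient stands in for the notebook's FiniteDiff operators; the
    zonal derivative is divided by cos(lat) to account for converging
    meridians.
    """
    dfdlat = np.gradient(field, lat, axis=0)
    dfdlon = np.gradient(field, lon, axis=1)
    cos_lat = np.cos(np.deg2rad(lat))[:, np.newaxis]
    return np.sqrt((dfdlon / cos_lat) ** 2 + dfdlat ** 2)

# Synthetic check: a field equal to longitude has d/dlon = 1 and d/dlat = 0
lat = np.linspace(60.0, -60.0, 13)
lon = np.linspace(-100.0, -40.0, 25)
LON, LAT = np.meshgrid(lon, lat)
grad = gradient_magnitude_deg(LON, lat, lon)
print(np.allclose(grad, 1.0 / np.cos(np.deg2rad(LAT))))
```

For a field that is linear in longitude, the result should equal 1/cos(lat) everywhere, which makes this a convenient sanity check before applying the formula to real data.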

In [None]:
# subs_chl is a masked array, so use the masked mean (skips masked pixels)
chl_mean = np.ma.mean(subs_chl, axis=0)

In [None]:
grad_chl = compute_gradient(subs_chl, sLAT)

In [None]:
grad_chl = np.ma.masked_where(subs_chl.mask, grad_chl)

In [None]:
plt.pcolormesh(subs_lon, subs_lat, np.mean(grad_chl, axis=0), vmin=-2, vmax=2)
plt.colorbar()

In [None]:
plt.pcolormesh(subs_lon, subs_lat, grad_chl[0,:,:], vmin=-2, vmax=2)
plt.colorbar()

In [None]:
np.sum(np.isnan(grad_chl))

In [None]:
len(np.ravel(grad_chl))