## Option 1: Use ERDDAP DAP endpoint to obtain an OPeNDAP 'Series' output

Use xarray to transform and reduce the DAP Series response into a multi-dimensional netCDF file.

In [48]:
import xarray as xr
import numpy as np


ds = xr.open_dataset("http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265")
ds

`s` is the only dimension, and every variable name is prefixed with `s.`.
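
To make the naming concrete without hitting the server, here is a minimal sketch on a synthetic dataset that imitates the layout described above. The variable names (`s.time`, `s.latitude`, `s.temperature`) and values are assumptions made for illustration, not the contents of the real ERDDAP response.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the DAP response: a single dimension "s"
# and variable names that all carry the "s." prefix.
raw = xr.Dataset(
    {
        "s.time": ("s", np.array(["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[ns]")),
        "s.latitude": ("s", np.full(3, 41.5)),
        "s.temperature": ("s", np.array([4.1, 4.3, 4.0])),
    }
)

# Keep only the text after the last dot to recover the plain names.
plain_names = {name: name.split(".")[-1] for name in raw.data_vars}
renamed = raw.rename(plain_names)

print(list(renamed.data_vars))
print(dict(renamed.sizes))
```

Renaming alone leaves everything on the `s` dimension, which is why the coordinates still have to be rebuilt separately.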

In [49]:
ds["s.backscatter_intensity_2651_a"]

### We need to split the coordinates and variables, rename, re-attach coords/dims, and re-create the dataset.

In [50]:
def fix_erddap(ds, dimension, coordinates):
    """Rebuild an ERDDAP DAP series dataset. Does not handle more than 1 dimension!"""
    new_vars, coords = {}, {}
    for variable in ds.data_vars:
        # strip the "s." prefix
        new_name = variable.split(".")[-1]
        if new_name in coordinates + [dimension]:
            # compare names exactly; `not in dimension` would be a substring test
            if new_name != dimension:
                # avoid repeating values due to erddap filling the time dimension
                da = ds[variable][0:1]
                #da = xr.DataArray(np.unique(ds[variable]))
            else:
                da = ds[variable]
            # pass the raw values so the new index is built on its own dimension
            new_var = xr.IndexVariable(new_name, da.values)
            coords.update({new_name: new_var})
        else:
            new_var = ds[variable].rename(new_name).rename({"s": dimension})
            new_vars.update({new_name: new_var})

    return xr.Dataset(new_vars, coords=coords, attrs=ds.attrs)

In [51]:
dimension = "time"
coordinates = ["station","latitude", "longitude", "z"]

new_ds_dap_series = fix_erddap(ds, dimension, coordinates)
new_ds_dap_series

Write out the resulting reduced file to disk to compare size:

In [52]:
%%time
# write to single netcdf
filename = "output/gov_usgs_cmgp_buzz_bay_265_series_out.nc"
new_ds_dap_series.to_netcdf(filename)

CPU times: user 31.1 ms, sys: 11.5 ms, total: 42.6 ms
Wall time: 39.6 ms


## Option 2: Use ERDDAP .nc endpoint to obtain the flattened netCDF (.cf) representation

Then work directly with the downloaded netCDF file to perform the same transformations. The netCDF file response requires us to download all the data, but the workaround may be simpler.

In [37]:
url = "http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265.nc"

!wget $url

--2020-04-13 13:51:59--  http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265.nc
Resolving erddap.sensors.ioos.us (erddap.sensors.ioos.us)... 216.218.248.42
Connecting to erddap.sensors.ioos.us (erddap.sensors.ioos.us)|216.218.248.42|:80... connected.
HTTP request sent, awaiting response... 200 
Length: unspecified [application/x-netcdf]
Saving to: ‘gov_usgs_cmgp_buzz_bay_265.nc.1’

gov_usgs_cmgp_buzz_     [    <=>             ]   2.91M  34.5KB/s    in 19s     

2020-04-13 13:52:18 (160 KB/s) - ‘gov_usgs_cmgp_buzz_bay_265.nc.1’ saved [3047016]



In [38]:
ds = xr.open_dataset("gov_usgs_cmgp_buzz_bay_265.nc")
ds

Here we reduce the dimensions of the data variables that are expected to be non-varying within the timeseries (latitude, longitude, z, station). These should also match the variables listed in ERDDAP's 'cdm_timeseries_variables' attribute.

NumPy's `unique()` can be used to determine the distinct coordinate values.


In [42]:
coords = {}
for dim in ["latitude", "longitude", "z", "station"]:
    da = xr.IndexVariable(dim, np.unique(ds[dim]))
    coords.update({dim: da})
new_ds_wget = ds.assign_coords(coords)
new_ds_wget
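
The reduction above only makes sense if each of those variables really holds a single distinct value. The sketch below adds that check on a synthetic flattened dataset. The helper name `collapse_constant` and the sample values are hypothetical, and it is a variation on the cell above rather than the same code: it drops the flattened variable first and then attaches the single value as a coordinate.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the flattened .nc response (one "row" dimension).
flat = xr.Dataset(
    {
        "latitude": ("row", np.full(3, 41.5)),
        "z": ("row", np.full(3, -10.0)),
        "temperature": ("row", np.array([4.1, 4.3, 4.0])),
    }
)


def collapse_constant(ds, names):
    """Replace per-row variables with length-1 coordinates (hypothetical helper)."""
    out = ds
    for name in names:
        distinct = np.unique(ds[name].values)
        if distinct.size != 1:
            # Refuse to collapse a variable that actually varies along the series.
            raise ValueError(f"{name!r} has {distinct.size} distinct values")
        out = out.drop_vars(name).assign_coords({name: (name, distinct)})
    return out


reduced = collapse_constant(flat, ["latitude", "z"])
print(reduced["latitude"].dims, reduced["latitude"].size)
```

Calling the helper on a variable that varies, such as `temperature` here, raises a `ValueError` instead of silently producing a longer coordinate.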


Swap the default 'row' dimension with 'time':

In [44]:
new_ds_wget = new_ds_wget.swap_dims({"row": "time"})
new_ds_wget
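
As a small illustration of what the swap buys, the sketch below runs `swap_dims` on a synthetic dataset and then selects by timestamp. The variable names and values are assumptions for the example only.

```python
import numpy as np
import xarray as xr

# Synthetic flattened data: "time" is just another variable along "row".
flat = xr.Dataset(
    {"temperature": ("row", np.array([4.1, 4.3, 4.0]))},
    coords={
        "time": ("row", np.array(["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[ns]"))
    },
)

# After the swap, "time" is the dimension and carries an index.
by_time = flat.swap_dims({"row": "time"})
print(by_time["temperature"].dims)

# Label-based selection now works on timestamps instead of row numbers.
selected = by_time["temperature"].sel(time=np.datetime64("2020-01-02"))
print(selected.item())
```

Before the swap, the same lookup would need a boolean mask or a positional index along `row`.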

Write out the resulting reduced file to disk to compare size:

In [46]:
%%time
# write to single netcdf
filename = "output/gov_usgs_cmgp_buzz_bay_265_nc_out.nc"
new_ds_wget.to_netcdf(filename)

CPU times: user 22.5 ms, sys: 8.66 ms, total: 31.2 ms
Wall time: 30.5 ms
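
Neither option above actually prints a size. As a rough sketch of why the reduced form should be smaller, the example below compares the in-memory footprint (`Dataset.nbytes`) of a synthetic flattened dataset with its reduced counterpart. This is a proxy under stated assumptions: the data are synthetic, and `nbytes` measures memory, not the size of the netCDF file on disk.

```python
import numpy as np
import xarray as xr

n = 1000  # number of synthetic observations

# Flattened form: latitude is repeated for every row.
flat = xr.Dataset(
    {
        "latitude": ("row", np.full(n, 41.5)),
        "temperature": ("row", np.linspace(4.0, 5.0, n)),
    }
)

# Reduced form: latitude is stored once as a length-1 coordinate.
reduced = flat.drop_vars("latitude").assign_coords(latitude=("latitude", [41.5]))

print("flattened bytes:", flat.nbytes)
print("reduced bytes:  ", reduced.nbytes)
```

For the files written above, `os.path.getsize(filename)` on each output path gives the on-disk sizes to compare.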
