## Option 1: Use ERDDAP DAP endpoint to obtain a DAP 'Series' output

Use xarray to transform and reduce the 'series' response into a multi-dimensional netCDF file.

In [73]:
import xarray as xr


ds = xr.open_dataset("http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265")
ds

`s` is the only dimension, and every variable name is prefixed with `s.`.

In [75]:
ds["s.backscatter_intensity_2651_a"]
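The renaming step can be tried in isolation before touching the dataset. The following is a minimal sketch that assumes the prefix is always the literal string `s.`; the helper name `strip_sequence_prefix` and the sample names are illustrative only.

```python
def strip_sequence_prefix(name, prefix="s."):
    """Return `name` without the leading sequence prefix, if present."""
    return name[len(prefix):] if name.startswith(prefix) else name


# Sample names copied from the dataset listing; any other names work the same way.
names = ["s.time", "s.latitude", "s.sea_water_temperature_2651_a"]
print([strip_sequence_prefix(n) for n in names])
# → ['time', 'latitude', 'sea_water_temperature_2651_a']
```

Unlike `variable.split(".")[-1]`, this version leaves a name untouched when it does not start with the prefix, which may be preferable if a variable name could itself contain a dot.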

### We need to split the coordinates and variables, rename, re-attach coords/dims, and re-create the dataset.

In [76]:
def fix_erddap(ds, dimension, coordinates):
    """Rebuild a tabledap sequence dataset around a single real dimension.

    Does not handle more than 1 dimension!
    """
    new_vars, coords = {}, {}
    for variable in ds.data_vars:
        print(variable)
        # drop the "s." sequence prefix from the variable name
        new_name = variable.split(".")[-1]
        if new_name in coordinates + [dimension]:
            if new_name != dimension:
                # avoid repeating values due to erddap filling the time dimension
                da = ds[variable][0:1]
            else:
                da = ds[variable]
            new_var = da.rename(new_name)
            new_var = xr.IndexVariable(new_name, new_var)
            coords.update({new_name: new_var})
        else:
            # data variable: rename it and swap the "s" dimension for the real one
            new_var = ds[variable].rename(new_name).rename({"s": dimension})
            new_vars.update({new_name: new_var})

    return xr.Dataset(new_vars, coords=coords, attrs=ds.attrs)
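Keeping only the first value of each coordinate is safe only if the coordinate really is constant along the sequence. A hedged sketch of a check that could be run first is shown below; `is_constant` and the toy dataset are hypothetical and stand in for the real ERDDAP response.

```python
import numpy as np
import xarray as xr


def is_constant(da):
    """Return True when every value in the DataArray equals the first one."""
    values = np.asarray(da.values).ravel()
    return values.size > 0 and bool((values == values[0]).all())


# Hypothetical toy data mimicking the flattened layout: one row per observation.
toy = xr.Dataset(
    {
        "latitude": ("row", [41.5, 41.5, 41.5]),
        "sea_water_temperature": ("row", [10.1, 10.3, 10.2]),
    }
)

print(is_constant(toy["latitude"]))               # → True
print(is_constant(toy["sea_water_temperature"]))  # → False
```

Note that a column made entirely of NaN is reported as not constant, because NaN never compares equal to itself.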

In [77]:
dimension = "time"
coordinates = ["station", "latitude", "longitude", "z"]

new_ds_series = fix_erddap(ds, dimension, coordinates)
new_ds_series

s.time
s.latitude
s.longitude
s.z
s.backscatter_intensity_2651_a
s.sea_water_velocity_to_direction_2651ds_a
s.sea_water_speed_2651ds_a
s.eastward_sea_water_velocity_2651ds_a
s.northward_sea_water_velocity_2651ds_a
s.sea_water_pressure_cm_time__standard_deviation_2651_a
s.sea_water_pressure_2651_a
s.sea_water_temperature_2651_a
s.station


Write out the resulting reduced file to disk to compare size:

In [78]:
%%time
# write to single netcdf
filename = "output/gov_usgs_cmgp_buzz_bay_265_series_out.nc"
new_ds_series.to_netcdf(filename)

CPU times: user 27.2 ms, sys: 4.31 ms, total: 31.6 ms
Wall time: 29.8 ms


## Option 2: Use ERDDAP .nc endpoint to obtain the flattened netCDF (.cf) representation

Then work directly with the downloaded netCDF file to perform the same transformations. The netCDF response requires downloading all of the data up front, but the workaround may be simpler.

In [30]:
url = "http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265.nc"

!wget $url

--2020-04-13 10:29:14--  http://erddap.sensors.ioos.us/erddap/tabledap/gov_usgs_cmgp_buzz_bay_265.nc
Resolving erddap.sensors.ioos.us (erddap.sensors.ioos.us)... 216.218.248.43
Connecting to erddap.sensors.ioos.us (erddap.sensors.ioos.us)|216.218.248.43|:80... connected.
HTTP request sent, awaiting response... 200 
Length: unspecified [application/x-netcdf]
Saving to: ‘gov_usgs_cmgp_buzz_bay_265.nc’

gov_usgs_cmgp_buzz_     [      <=>           ]   2.91M   361KB/s    in 8.3s    

2020-04-13 10:29:24 (359 KB/s) - ‘gov_usgs_cmgp_buzz_bay_265.nc’ saved [3047016]
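Where `wget` is not available, the same download can be done with the Python standard library. This is a minimal sketch rather than the notebook's method; the function name `download` is an assumption, and no retries or timeouts are handled.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream the response body at `url` into `destination`; return the size in bytes."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        # copy in chunks so the whole file never has to sit in memory
        shutil.copyfileobj(response, out)
    return destination.stat().st_size
```

Calling `download(url, "gov_usgs_cmgp_buzz_bay_265.nc")` with the `url` defined in the cell above would save the file under the same name that `wget` uses.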



In [58]:
ds = xr.open_dataset("gov_usgs_cmgp_buzz_bay_265.nc")
ds

In [70]:
# ds.expand_dims("latitude")

# collapse the coordinates known to hold a single value into length-1 dimensions
coords = {}
for dim in ["latitude", "longitude", "z", "station"]:
    da = xr.IndexVariable(dim, ds[dim][0:1])
    coords.update({dim: da})
new_ds_wget = ds.assign_coords(coords)
new_ds_wget


In [71]:
new_ds_wget = new_ds_wget.swap_dims({"row": "time"})
new_ds_wget
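The effect of `swap_dims` is easier to see on a small dataset. The sketch below uses hypothetical toy data (three rows, one variable) in place of the downloaded file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: "row" is the only dimension, as in the flattened .nc file.
toy = xr.Dataset(
    {
        "time": (
            "row",
            np.array(["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[ns]"),
        ),
        "temperature": ("row", [10.1, 10.3, 10.2]),
    }
)

# Replace the anonymous "row" dimension with the values stored in "time".
swapped = toy.swap_dims({"row": "time"})
print(swapped["temperature"].dims)  # → ('time',)
```

After the swap, `time` becomes an indexed coordinate, so label-based selection on `time` becomes possible.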

Write out the resulting reduced file to disk to compare size:

In [72]:
%%time
# write to single netcdf
filename = "output/gov_usgs_cmgp_buzz_bay_265_nc_out.nc"
new_ds_wget.to_netcdf(filename)

CPU times: user 29.9 ms, sys: 8.49 ms, total: 38.3 ms
Wall time: 35.8 ms
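Both options write a file so that the sizes can be compared, but the comparison itself is not shown. A minimal sketch using only the standard library follows; `report_sizes` is a hypothetical helper, and the paths are the ones used in the cells above.

```python
from pathlib import Path


def report_sizes(paths):
    """Return {path: size in bytes} for each path that exists; missing files are skipped."""
    sizes = {}
    for path in paths:
        path = Path(path)
        if path.exists():
            sizes[str(path)] = path.stat().st_size
    return sizes


candidates = [
    "gov_usgs_cmgp_buzz_bay_265.nc",
    "output/gov_usgs_cmgp_buzz_bay_265_series_out.nc",
    "output/gov_usgs_cmgp_buzz_bay_265_nc_out.nc",
]
for name, size in report_sizes(candidates).items():
    print(f"{name}: {size / 1e6:.2f} MB")
```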
