In [1]:
import xarray as xr
import numpy as np
import netCDF4 as nc4
import pandas as pd
import os
import glob
import uuid

In [2]:
versions = [xr.__version__, np.__version__, nc4.__version__, pd.__version__]
versions

['0.15.1', '1.18.4', '1.5.3', '1.0.3']

## Read data for station_id 100058 from Sensor Map

The data files are in a subdirectory of this repo named after the station_id (100058). The output file is written to the same subdirectory at the end of the notebook.

(SSBN7 / SUN2WAVE / SUN2W) Sunset Nearshore Wave
https://stage.admin.axds.co/#!/sensors/metadata/stations/view?stationId=100058&tab=data

```
ADCP: Current speed and dir at 0,-10m (1000360, 1000356)

device_1000360.nc
	time = 1011 ;
	z = 2 ;
device_1000356.nc
	time = 1011 ;
	z = 2 ;

ADCP: Water temp at -10m (1000361)

device_1000361.nc
	time = 1011 ;
	z = 1 ;
    
Waves at surface: wave period, wave height, wind direction (1000357, 1000359, 1000358)

device_1000357.nc
	time = 1948 ;
	z = 1 ;
device_1000358.nc
	time = 338 ;
	z = 1 ;
device_1000359.nc
	time = 1948 ;
	z = 1 ;
```


In [3]:
station_id='100058'

In [4]:
# inspect all device files
# each one currently has time, z dimensions
device_files = sorted(glob.glob(station_id + '/device*.nc'))
print(device_files)
for f in device_files:
    print('\n'+f)
    # open each file only long enough to print its summary, then close it
    with nc4.Dataset(f) as d:
        print(d)

['100058/device_1000356_current_dir.nc', '100058/device_1000357_wave_period.nc', '100058/device_1000358_wind_dir.nc', '100058/device_1000359_wave_height.nc', '100058/device_1000360_current_speed.nc', '100058/device_1000361_water_temp.nc']

100058/device_1000356_current_dir.nc
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    title: feed_1000312_raw
    dimensions(sizes): time(1011), z(2)
    variables(dimensions): uint8 qc_agg_1000356(time,z), uint64 qc_tests_1000356(time,z), int32 time(time), float64 value_1000356(time,z), float64 z(z)
    groups: 

100058/device_1000357_wave_period.nc
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    title: feed_1000313_raw
    dimensions(sizes): time(1948), z(1)
    variables(dimensions): uint8 qc_agg_1000357(time,z), uint64 qc_tests_1000357(time,z), int32 time(time), float64 value_1000357(time,z), float64 z(z)
    groups: 

100058/device_1000358_wind_dir.nc
<class 'netC

## Create a timeSeries - multiStation file with time, station dimensions

This is similar to Option 2 in the [dsg_timeseries_micah_testing.ipynb notebook](https://github.com/mwengren/notebooks-dev/blob/master/netcdf_cf/dsg_timeseries_micah_testing.ipynb) in this repo: the file has time and station dimensions, where time represents the timeseries and station is the 'instance' dimension in CF DSG: http://cfconventions.org/cf-conventions/cf-conventions.html#discrete-sampling-geometries

All of the instance variables (station, latitude, longitude, z) vary along the 'station' dimension, which makes this a timeSeries - multiStation file according to the IOOS Metadata Profile [guidelines](https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#platform).

In [5]:
%%time
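# combine all device files into a single dataset, aligned on their time and z coordinates
timeseries = xr.open_mfdataset(device_files, combine='by_coords', parallel=True)
# treat each depth (z) as a separate 'station' instance
timeseries = timeseries.rename_dims({"z": "station"})
# turn the remaining non-index coordinates (z) into ordinary data variables
timeseries = timeseries.reset_coords()
# station ids follow the z order: station 1 is z=-10, station 2 is z=0
timeseries['station']=(['station'], [1,2])
# both 'stations' share the same horizontal position
timeseries['latitude']=(['station'], [33.8444]*2)
timeseries['longitude']=(['station'], [-78.4839]*2)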

# add attributes:
timeseries['station'].attrs['cf_role'] = 'timeseries_id'
timeseries.attrs['cdm_data_type'] = 'TimeSeries'
timeseries.attrs['cdm_timeseries_variables'] = 'station,longitude,latitude,z'
timeseries.attrs['title'] = 'DSG TimeSeries'
timeseries


CPU times: user 208 ms, sys: 17.1 ms, total: 225 ms
Wall time: 199 ms


[Output: xarray Dataset HTML repr. Each data variable is a dask-backed array of shape (2621, 2) held in a single chunk, stored as float32 (20.97 kB) or float64 (41.94 kB).]

### Add attributes:

Add a minimal set of variable attributes for partial IOOS Metadata Profile compliance:

In [6]:
# CF standard_name for each observed variable
standard_names = {
    'value_1000356': 'sea_water_velocity_to_direction',
    'value_1000357': 'sea_surface_wave_significant_period',
    'value_1000358': 'wind_from_direction',
    'value_1000359': 'sea_surface_wave_significant_height',
    'value_1000360': 'sea_water_speed',
    'value_1000361': 'sea_water_temperature',
}

for name, standard_name in standard_names.items():
    timeseries[name].attrs['standard_name'] = standard_name
    # every observed variable points at the 'station' variable as its platform
    timeseries[name].attrs['platform'] = 'station'

timeseries['station'].attrs['cf_role'] = 'timeseries_id'
timeseries['value_1000361']


[Output: xarray DataArray HTML repr for value_1000361, a dask-backed float64 array of shape (2621, 2) in a single chunk (41.94 kB).]

### Test some values - waves (z=0 only):

Wave measurements are taken only at the surface, so the z=-10 slice (station=1) has no data:

In [7]:
# z=-10: this is an empty slice:
timeseries.value_1000359.loc['2018-06-30T08:00:00':'2018-06-30T12:00:00',1].compute()

The z=0 slice (station=2) will have wave obs values:

In [8]:
# z=0: this has data:
timeseries.value_1000359.loc['2018-06-30T08:00:00':'2018-06-30T12:00:00',2].compute()
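
The same check can be made for every station at once by counting the non-missing values along the time axis. The sketch below uses a small synthetic (time, station) array rather than the notebook's data; the array contents and the helper name `valid_counts` are assumptions made for illustration.

```python
import numpy as np


def valid_counts(values):
    """Count non-NaN entries per station for a (time, station) array."""
    return np.count_nonzero(~np.isnan(values), axis=0)


# Synthetic stand-in for a wave variable:
# column 0 plays the role of z=-10 (never observed),
# column 1 plays the role of z=0 (observed at three of four time steps).
wave_height = np.array([
    [np.nan, 0.4],
    [np.nan, 0.5],
    [np.nan, np.nan],
    [np.nan, 0.6],
])

counts = valid_counts(wave_height)
print(counts)  # [0 3]
```

On the real dataset, an expression along the lines of `timeseries.value_1000359.notnull().sum('time').compute()` should give the equivalent per-station counts.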

### Test some values - current speed (both z values):

The z=-10 slice (station=1) will have current speed measurements:

In [9]:
# z=-10:
timeseries.value_1000360.loc['2018-06-30T08:00:00':'2018-06-30T12:00:00',1].compute()

The z=0 slice (station=2) will also have current speed measurements:

In [10]:
# z=0:
timeseries.value_1000360.loc['2018-06-30T08:00:00':'2018-06-30T12:00:00',2].compute()

### Write out netCDF:
The resulting file should at least be usable for testing the timeSeries feature type in ERDDAP.

In [11]:
%%time

encoding={
    'latitude': {'dtype': 'float32', '_FillValue': -9999.9},
    'longitude': {'dtype': 'float32', '_FillValue': -9999.9},
    'z': {'dtype': 'float32', '_FillValue': -9999.9},
    'time': {'dtype': 'int32', '_FillValue': -9999},
    'station': {'dtype': 'int16', '_FillValue': -9999}
}

# write to single netcdf
#timeseries_filename = f"{station_id}/station_{station_id}_timeseries_{uuid.uuid4().hex}.nc"
timeseries_filename = f"{station_id}/station_{station_id}_timeseries_multistation.nc"
print(timeseries_filename)
timeseries.to_netcdf(timeseries_filename, encoding=encoding)

100058/station_100058_timeseries_multistation.nc
CPU times: user 165 ms, sys: 36.4 ms, total: 201 ms
Wall time: 199 ms


## To Do:

Not every variable is measured at every depth (or 'station'): wind, waves, and water temperature are each reported at a single depth. The resulting netCDF file therefore has large gaps where a timeSeries is empty. This is probably unavoidable, unless one of the ragged array DSG representations is used: http://cfconventions.org/cf-conventions/cf-conventions.html#representations-features.
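
As a rough illustration of the ragged alternative, the sketch below packs a padded (time, station) array into the two pieces a contiguous ragged array stores: a 1-D run of observations and a per-station count. It uses plain NumPy on synthetic values. `to_contiguous_ragged` is a hypothetical helper, not part of xarray or netCDF4; a real CF file would pack the time values the same way and carry the counts in a variable with a `sample_dimension` attribute.

```python
import numpy as np


def to_contiguous_ragged(values):
    """Pack a (time, station) array into 1-D observations plus per-station counts."""
    valid = ~np.isnan(values)
    # Number of real observations for each station.
    row_size = valid.sum(axis=0)
    # Transpose first so each station's observations end up next to each other.
    obs = values.T[valid.T]
    return obs, row_size


# Synthetic padded array: station 0 has 3 samples, station 1 has 2.
padded = np.array([
    [1.0, np.nan],
    [2.0, 5.0],
    [np.nan, np.nan],
    [3.0, 6.0],
])

obs, row_size = to_contiguous_ragged(padded)
print(obs)       # [1. 2. 3. 5. 6.]
print(row_size)  # [3 2]
```

Here the eight padded cells shrink to five stored observations, at the cost of a layout that is less convenient to slice by time.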

This file uses the Orthogonal Multidimensional Array representation, in which the time variable varies only along the time dimension: time(time). The time dimension is padded, however: its size is the union of all time steps in the source sensor (device_*.nc) files. This means the time dimension is probably longer than it needs to be.
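
The padding can be reproduced on a small scale. The sketch below outer-joins two in-memory datasets with `xr.merge`, as a stand-in for what `open_mfdataset` does with the device files (assuming its default `join='outer'`). The variable names `value_speed` and `value_wave`, the time stamps, and all values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Two synthetic "devices" with different time stamps and depths.
t_adcp = np.array(
    ["2018-06-30T08:00", "2018-06-30T09:00", "2018-06-30T10:00"],
    dtype="datetime64[ns]",
)
t_wave = np.array(["2018-06-30T08:30", "2018-06-30T09:00"], dtype="datetime64[ns]")

speed = xr.Dataset(
    {"value_speed": (("time", "z"), np.ones((3, 2)))},
    coords={"time": t_adcp, "z": [-10.0, 0.0]},
)
wave = xr.Dataset(
    {"value_wave": (("time", "z"), np.full((2, 1), 0.5))},
    coords={"time": t_wave, "z": [0.0]},
)

# The default outer join keeps every time stamp and depth seen in either input
# and fills the cells a device never reported with NaN.
merged = xr.merge([speed, wave])

print(merged.sizes["time"])  # 4
print(int(merged["value_wave"].notnull().sum()))  # 2
```

Only one of the four merged time steps is shared by both inputs, so each variable carries NaN padding at the time steps that belong to the other device.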

Another option would be the Incomplete Multidimensional Array representation, in which the time variable varies by both station and time, time(station, time), and the time dimension is only as long as the longest timeseries in the source sensor (device_*.nc) files. This would save some storage space, but it wasn't obvious how to create it with xarray in the initial file creation step.
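
One possible way to build such arrays, sketched here with plain NumPy rather than xarray, is to left-pack each station's valid samples and pad the remainder with NaT/NaN. This is only a sketch for a single variable on synthetic data; `to_incomplete_array` is a hypothetical helper, and a real file with several variables per station would need a shared rule for which time steps to keep.

```python
import numpy as np


def to_incomplete_array(times, values):
    """Left-pack each station's valid samples into (station, time) arrays.

    times:  1-D datetime64 array shared by all stations (the padded axis).
    values: (time, station) float array, NaN where a station has no sample.
    """
    valid = ~np.isnan(values)
    n_station = values.shape[1]
    # The new time dimension is only as long as the longest series.
    n_time = int(valid.sum(axis=0).max())
    out_time = np.full((n_station, n_time), np.datetime64("NaT"), dtype=times.dtype)
    out_vals = np.full((n_station, n_time), np.nan)
    for s in range(n_station):
        keep = valid[:, s]
        n = int(keep.sum())
        out_time[s, :n] = times[keep]
        out_vals[s, :n] = values[keep, s]
    return out_time, out_vals


times = np.array(
    ["2018-06-30T08:00", "2018-06-30T08:30", "2018-06-30T09:00", "2018-06-30T10:00"],
    dtype="datetime64[m]",
)
values = np.array([
    [1.0, np.nan],
    [np.nan, 0.5],
    [2.0, 0.6],
    [3.0, np.nan],
])

out_time, out_vals = to_incomplete_array(times, values)
print(out_vals.shape)  # (2, 3)
print(out_vals)
print(out_time)
```

The padded axis of length 4 shrinks to 3, the length of the longest series, and the shorter series is padded at the end with NaT and NaN.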


