# Generate sample NetCDF file following OG-1.0
J. Sevadjian, Nov 21, 2022  

## Goals
* Use Spray data to test making OG-1.0 NetCDF files.
* Initial tests focus on the NetCDF format itself.
* There are open questions about how the proposed CDL works with CF-aware software such as Panoply and ERDDAP.


## Summary
* Generated a test NetCDF file using the latest OG-1.0 CDL example template.
* The resulting .nc file did not behave as I had hoped in Panoply or ERDDAP.
* Made some adjustments based on CF guidance and the NCEI NetCDF templates (added the required trajectory dimension and moved the parameter dimension's information into variables) and generated a modified CDL that works better in Panoply and ERDDAP.
* The parameter changes are open for discussion; the intent of the OG-1.0 parameter dimension and variable was not clear.
* This repo contains the modified .cdl and a sample .nc file.
* It also contains a Python notebook that anyone can use to reproduce the .nc file, tweak the format, or try it on another dataset in the DAC ERDDAP.


## References
- [Latest draft CDL (instrument scalar_v2)](https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/sp041_20191205T1757-instrument-scalar_v2.cdl)
- [Latest draft OG-1.0 Format User Manual](https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/OG_Format.adoc)
- [CF 1.8 conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#trajectory-data)  
- [NCEI NetCDF 2.0 templates](https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html) 

## Notes
* The format user manual and the CDL specify different CF convention versions. I started writing this against CF 1.7 (per the user manual?), but the latest example CDL uses CF 1.8.
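Since the two documents disagree, it can help to check which CF version a file actually declares. A minimal sketch, assuming the `Conventions` global attribute is a comma- and/or space-separated list like the value this notebook writes; `cf_version` is a hypothetical helper, not part of any library.

```python
import re


def cf_version(conventions):
    """Return the CF version declared in a Conventions attribute, or None.

    Assumes entries are separated by commas and/or whitespace,
    e.g. "Unidata Dataset Discovery v1.0, COARDS, CF-1.8".
    """
    for entry in re.split(r"[,\s]+", conventions):
        match = re.fullmatch(r"CF-(\d+\.\d+)", entry)
        if match:
            return match.group(1)
    return None


print(cf_version("Unidata Dataset Discovery v1.0, COARDS, CF-1.8"))  # 1.8
```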

## Imports


In [51]:
import numpy as np
import pandas as pd
import xarray as xr
import scipy 
import urllib3
import certifi
from erddapy import ERDDAP
import netCDF4
from netCDF4 import Dataset
from netCDF4 import stringtochar

## Load input data

In [52]:
# Use erddapy to fetch data for one Spray glider trajectory
# Reference: https://ioos.github.io/erddapy/00-quick_intro-output.html

e = ERDDAP(server="https://gliders.ioos.us/erddap")
e.constraints = None
e.protocol = "tabledap"
e.dataset_id = "sp011-20221014T1612"

# Load glider data into xarray
ds = e.to_xarray(decode_times=False)
ds
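Under the hood erddapy is requesting an ERDDAP tabledap URL, which is handy to know for reproducing the download outside Python or trimming it to fewer variables. A sketch of the general URL pattern (`{server}/tabledap/{dataset_id}.{fileType}?{variables}&{constraints}`); `tabledap_url` is a hypothetical helper, and the `ncCF` default is an assumption about a suitable response type, not a statement of what erddapy sends internally.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, response="ncCF", variables=(), constraints=None):
    """Build an ERDDAP tabledap request URL (hypothetical helper).

    server      -- base ERDDAP URL, e.g. "https://gliders.ioos.us/erddap"
    variables   -- variable names to request; empty means all variables
    constraints -- mapping such as {"time>=": "2022-10-15T00:00:00Z"}
    """
    url = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    query = ",".join(variables)
    for op, value in (constraints or {}).items():
        # Each constraint is appended as &<name><op><value>, percent-encoded
        query += "&" + quote(f"{op}{value}", safe="")
    return f"{url}?{query}" if query else url


print(tabledap_url("https://gliders.ioos.us/erddap", "sp011-20221014T1612"))
```

With no variables or constraints, ERDDAP returns every variable in the dataset, which matches the unconstrained request above.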

## Prep the data
- The Spray data retrieved from ERDDAP does not fit the new OG-1.0 format as-is: the precise position and time variables contain NaNs, and the times include repeated values
- Massage it into shape for a sample file
- This section contains Spray-specific adjustments

In [53]:
# Fill gaps in the precise positions: carry the last valid fix forward,
# and back-fill any leading NaNs so the first sample is always defined.
modified_lons = pd.Series(ds["precise_lon"].data).ffill().bfill().to_numpy()
modified_lats = pd.Series(ds["precise_lat"].data).ffill().bfill().to_numpy()

# precise_time has some repeated values and NaNs. Replace any NaN or
# non-increasing value with the previous time + 1 s so TIME is strictly increasing.
modified_times = []
for time in ds["precise_time"].data:
    if modified_times and (np.isnan(time) or time <= modified_times[-1]):
        modified_times.append(modified_times[-1] + 1)
    else:
        modified_times.append(time)

# Check that TIME is strictly increasing (this also fails if the first time is NaN)
is_increasing = all(i < j for i, j in zip(modified_times, modified_times[1:]))
assert is_increasing, "TIME is not strictly increasing"

# Finished with Spray input data tweaks 
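Because this clean-up invents positions and timestamps, it may be worth recording how much of the record is synthetic, for example in a processing note in the file. A minimal sketch; `count_synthesized` is a hypothetical helper, and it treats any sample that was NaN or whose value changed as synthesized.

```python
import math


def count_synthesized(original, modified):
    """Count samples whose value was replaced during the Spray clean-up.

    A sample counts as synthesized if the original was NaN or if the
    modified value differs from the original (e.g. a nudged duplicate time).
    """
    return sum(
        1
        for o, m in zip(original, modified)
        if math.isnan(o) or o != m
    )


# Toy example: one duplicate time nudged, one NaN filled, one later time shifted
print(count_synthesized([10, 10, float("nan"), 12], [10, 11, 12, 13]))
```

Applied to the notebook it would be called as `count_synthesized(ds["precise_time"].data, modified_times)`.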

## Write the NetCDF

Adjustments:
- Added a trajectory dimension (CF 1.8 requirement)
- Removed the N_PARAMS dimension (??)
    - The intent of the params dimension/variable is not clear. Is it one variable that holds a list of parameters?
    - What is the use case?
    - If it relates to instrument metadata, I've added instrument variables following an NCEI NetCDF 2.0 template to show one reasonable way to include instrument information outside the standard variable attributes.
    - Is it for machine-to-machine support or for humans?
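One way to picture the proposed change is as a mechanical mapping from per-parameter sensor rows (what an N_PARAMS-style table might hold) to one instrument variable per sensor, plus an `ancillary_variables` link on each data variable. A minimal sketch under those assumptions: the row layout, the `to_instrument_variables` name and the `_2` suffix for a second sensor of the same parameter are all hypothetical.

```python
from collections import defaultdict


def to_instrument_variables(params):
    """Turn per-parameter sensor rows into instrument variables.

    params -- iterable of dicts with "parameter" and "make_model" keys
              (a hypothetical stand-in for an N_PARAMS-style table).
    Returns (instruments, ancillary):
      instruments -- {instrument variable name: attribute dict}
      ancillary   -- {data variable name: ancillary_variables string}
    """
    instruments = {}
    linked = defaultdict(list)
    for row in params:
        name = f"{row['parameter']}_INSTRUMENT"
        n = len(linked[row["parameter"]])
        if n:
            # Second and later sensors for the same parameter get a numeric suffix
            name = f"{name}_{n + 1}"
        instruments[name] = {"make_model": row["make_model"]}
        linked[row["parameter"]].append(name)
    ancillary = {param: " ".join(names) for param, names in linked.items()}
    return instruments, ancillary


rows = [
    {"parameter": "CHLA", "make_model": "ECO_FL"},
    {"parameter": "DOXY", "make_model": "SEABIRD_SBE43F_IDO"},
]
print(to_instrument_variables(rows))
```

Existing `ancillary_variables` entries such as `DOXY_QC` would still need to be combined with the returned string.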

The NetCDF output file:
1) Conforms to [CF 1.8 conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#trajectory-data)  
2) Follows [NCEI NetCDF 2.0 templates](https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html) where relevant  
3) Loads data variables into Panoply as GeoTraj  
4) Loads well into ERDDAP as EDDTableFromMultidimNcFiles
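A few of the CF 1.8 trajectory requirements relied on here can be checked mechanically before trying Panoply or ERDDAP. A minimal sketch over a plain-dict description of the file (a hypothetical structure; with netCDF4 it could be filled from `Dataset.ncattrs()`, `Dataset.dimensions` and each variable's `dimensions` and `ncattrs()`). It covers only a handful of checks and is not a substitute for a full CF checker such as the IOOS compliance-checker.

```python
def check_trajectory(model):
    """Return a list of CF trajectory problems found in a dict file model."""
    problems = []
    if model["attrs"].get("featureType", "").lower() != "trajectory":
        problems.append("global featureType is not 'trajectory'")
    variables = model["vars"]
    traj_ids = [name for name, var in variables.items()
                if var["attrs"].get("cf_role") == "trajectory_id"]
    if len(traj_ids) != 1:
        problems.append(f"expected one cf_role=trajectory_id variable, found {len(traj_ids)}")
    for name, var in variables.items():
        for dim in var["dims"]:
            if dim not in model["dims"]:
                problems.append(f"{name}: undefined dimension {dim}")
        for coord in var["attrs"].get("coordinates", "").split():
            if coord not in variables:
                problems.append(f"{name}: coordinate {coord} is not a variable")
    return problems


# Hypothetical model mirroring the layout of the file written below
coord = {"dims": ("N_MEASUREMENTS",), "attrs": {}}
example = {
    "attrs": {"featureType": "trajectory"},
    "dims": {"trajectory": 1, "traj_strlen": 19, "N_MEASUREMENTS": 3},
    "vars": {
        "trajectory": {"dims": ("trajectory", "traj_strlen"),
                       "attrs": {"cf_role": "trajectory_id"}},
        "TIME": coord, "LATITUDE": coord, "LONGITUDE": coord, "DEPTH": coord,
        "CHLA": {"dims": ("N_MEASUREMENTS",),
                 "attrs": {"coordinates": "TIME LATITUDE LONGITUDE DEPTH"}},
    },
}
print(check_trajectory(example))  # []
```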


In [59]:
# Output Filename
path = 'og-netcdf-1.nc'

# Write the file
with netCDF4.Dataset(path, mode='w', format='NETCDF4') as ncout:

    # GLOBAL ATTRIBUTES
    
    # Start by populating with global attributes from erddap request
    for v in ds.attrs:
        if ds.attrs[v] is not None and v[:4] != "cdm_":
            setattr(ncout, v, ds.attrs[v])
    
    # Then reset these global attributes
    ncout.featureType = "trajectory"
    ncout.cdm_data_type = "Trajectory"
    ncout.Conventions = "Unidata Dataset Discovery v1.0, COARDS, CF-1.8"
    
    # SETUP DIMENSIONS
    
    dims = {}
    
    # N_PARAMS DIM
    # It is not clear what N_PARAMS is meant to be, so it is left out.
    # Is there a technical use case for the N_PARAMS dimension?
    # With it, I don't think the format is to CF spec, and it is perhaps overly complicated.
    # For this example I'm putting the same information in variable attributes,
    # which brings the file to spec and simplifies it without losing information.
    # Per the NCEI templates, instrument/sensor details should be specified as variables.
    # https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html
    #dims['N_PARAMS'] = ncout.createDimension('N_PARAMS', 3)
    
    # N_MEASUREMENTS DIM
    # Equiv. to 'obs' in CF docs.
    # Using the size of one data variable to specify the obs/measurements dimension
    dims['N_MEASUREMENTS'] = ncout.createDimension('N_MEASUREMENTS', ds.salinity.size)
    
    # TRAJECTORY DIM
    # CF docs require both a trajectory dimension and a trajectory variable.
    # There are options for specifying the trajectory dimension;
    # the simplest is the number of trajectories in the file (one here).
    # The trajectory ID is stored as a char array, so it also needs a string-length dimension.
    traj_id = e.dataset_id  # ERDDAP dataset ID used as the trajectory identifier (an assumption)
    dims['trajectory'] = ncout.createDimension('trajectory', 1)
    dims['traj_strlen'] = ncout.createDimension('traj_strlen', len(traj_id))
    
    # CREATE VARIABLES
    
    variables = {}
    
    # Create Params Variable?
    #my_param_list = ['sst','chla','doxy','salinity']
    #print(len(my_param_list))
    
    # Create Trajectory Variable
    # Required by CF. ERDDAP does not read the file without it because it
    # expects conformance to the CF trajectory format.
    # Shape (trajectory, traj_strlen): one fixed-length char row per trajectory.
    str_out = netCDF4.stringtochar(np.array([traj_id], 'S' + str(len(traj_id))))
    variables['trajectory'] = ncout.createVariable(
        'trajectory', 'S1', ('trajectory', 'traj_strlen'))
    variables['trajectory'][:] = str_out
    variables['trajectory'].cf_role = "trajectory_id"
    
    # CREATE COORDINATE VARIABLES (time, lat, lon, depth)
  
    # Time
    variables['TIME'] = ncout.createVariable(
        'TIME',
        ds['precise_time'].dtype,
        'N_MEASUREMENTS')
    variables['TIME'][:] = modified_times
    variables['TIME'].axis = "T"
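    variables['TIME'].units = "seconds since 1970-01-01 00:00:00 UTC"
    # CF "julian" is the pure Julian calendar; Unix-epoch times use the standard (Gregorian) calendar
    variables['TIME'].calendar = "gregorian"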
    variables['TIME'].standard_name = "time"
    variables['TIME'].long_name = "Time"
    
    # Latitude
    variables['LATITUDE'] = ncout.createVariable(
        'LATITUDE',
        ds['precise_lat'].dtype,
        'N_MEASUREMENTS')
    variables['LATITUDE'][:] = modified_lats
    variables['LATITUDE'].axis = "Y"
    variables['LATITUDE'].units = "degrees_north"
    variables['LATITUDE'].long_name = "Latitude"
    variables['LATITUDE'].standard_name = "latitude"
    
    # Longitude
    variables['LONGITUDE'] = ncout.createVariable(
        'LONGITUDE',
        ds['precise_lon'].dtype,
        'N_MEASUREMENTS')
    variables['LONGITUDE'][:] = modified_lons
    variables['LONGITUDE'].axis = "X"
    variables['LONGITUDE'].units = "degrees_east"
    variables['LONGITUDE'].long_name = "Longitude"
    variables['LONGITUDE'].standard_name = "longitude"
    
    # Depth
    variables['DEPTH'] = ncout.createVariable(
        'DEPTH',
        ds['depth'].dtype,
        'N_MEASUREMENTS')
    variables['DEPTH'][:] = ds['depth'].data
    variables['DEPTH'].axis = "Z"
    variables['DEPTH'].long_name = "Depth"
    variables['DEPTH'].standard_name = "depth"
    variables['DEPTH'].units = "m"
    variables['DEPTH'].positive = "down"
    
    # CREATE DATA VARS
    # Selecting these variables for this quick example
    # chlorophyll_a, density, dissolved_oxygen, dissolved_oxygen_qc
    
    # Chlorophyll Variable
    v = ds["chlorophyll_a"]
    variables['CHLA'] = ncout.createVariable('CHLA', v.dtype, 'N_MEASUREMENTS', fill_value=np.nan)
    variables['CHLA'][:]= v.data
    variables['CHLA'].coordinates = "TIME LATITUDE LONGITUDE DEPTH"
    variables['CHLA'].standard_name = v.standard_name
    variables['CHLA'].long_name = v.long_name
    variables['CHLA'].units = v.units
    variables['CHLA'].vocabulary = "https://vocab.nerc.ac.uk/collection/OG1/current/"
    variables['CHLA'].ancillary_variables = "CHLA_INSTRUMENT"
    
    # Density Variable
    v = ds["density"]
    variables['DENSITY'] = ncout.createVariable('DENSITY', v.dtype, 'N_MEASUREMENTS', fill_value=np.nan)
    variables['DENSITY'][:]= v.data
    variables['DENSITY'].coordinates = "TIME LATITUDE LONGITUDE DEPTH"
    variables['DENSITY'].standard_name = v.standard_name
    variables['DENSITY'].long_name = v.long_name
    variables['DENSITY'].units = v.units
    variables['DENSITY'].vocabulary = "https://vocab.nerc.ac.uk/collection/OG1/current/"
    
    # Dissolved Oxygen Variable
    v = ds["dissolved_oxygen"]
    variables['DOXY'] = ncout.createVariable('DOXY', v.dtype, 'N_MEASUREMENTS', fill_value=np.nan)
    variables['DOXY'][:]= v.data
    variables['DOXY'].coordinates = "TIME LATITUDE LONGITUDE DEPTH"
    variables['DOXY'].standard_name = v.standard_name
    variables['DOXY'].long_name = v.long_name
    variables['DOXY'].units = v.units
    variables['DOXY'].vocabulary = "https://vocab.nerc.ac.uk/collection/OG1/current/"
    variables['DOXY'].ancillary_variables = "DOXY_QC DOXY_INSTRUMENT"
    
    # Dissolved Oxygen QC Variable
    v = ds["dissolved_oxygen_qc"]
    variables['DOXY_QC'] = ncout.createVariable('DOXY_QC', v.dtype, 'N_MEASUREMENTS')
    variables['DOXY_QC'][:]= v.data
    variables['DOXY_QC'].coordinates = "TIME LATITUDE LONGITUDE DEPTH"
    variables['DOXY_QC'].long_name = v.long_name
    
     
    # For sensor information you could use variables OR attributes on the geophysical variables.
    # How to choose?
    # Use variables if a variable in a trajectory file may have more than one instrument, or
    # if these files will be aggregated and information could be lost on aggregation.
    # Otherwise, attributes on the geophysical variables are sufficient and simplify the file structure.
    
    # Demo: create sensor/instrument variables
    # containing the information that would have been in a params dimension.
    # https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html
    # The naming convention could be the geophysical parameter name + _INSTRUMENT.
    # There can be multiple instruments of the same type.
    # Associate instruments with data via ancillary_variables on the geophysical variable.
    
    variables['CHLA_INSTRUMENT'] = ncout.createVariable('CHLA_INSTRUMENT', 'i4', 'trajectory')
    variables['CHLA_INSTRUMENT'][:]= 1
    variables['CHLA_INSTRUMENT'].vocabulary = "https://docs.google.com/document/d/1dN90xkw9oCbLs0sPPhOmszdOjLpwcqxiK5mjeZP7abA/edit"
    variables['CHLA_INSTRUMENT'].make_model = "ECO_FL"
    variables['CHLA_INSTRUMENT'].long_name = 'Chlorophyll Instrument'
    # Optional Attributes:
    # serial_number, calibration_date, factory_calibrated, user_calibrated, calibration_report, accuracy, valid_range, and precision
    
    variables['DOXY_INSTRUMENT'] = ncout.createVariable('DOXY_INSTRUMENT', 'i4', 'trajectory')
    variables['DOXY_INSTRUMENT'][:]= 1
    variables['DOXY_INSTRUMENT'].vocabulary = "https://docs.google.com/document/d/1dN90xkw9oCbLs0sPPhOmszdOjLpwcqxiK5mjeZP7abA/edit"
    variables['DOXY_INSTRUMENT'].make_model = "SEABIRD_SBE43F_IDO"
    variables['DOXY_INSTRUMENT'].long_name = 'Dissolved Oxygen Instrument'
    # Optional Attributes:
    # serial_number, calibration_date, factory_calibrated, user_calibrated, calibration_report, accuracy, valid_range, and precision
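A quick way to sanity-check the TIME values written to the file is to decode a few of them. A minimal sketch using only the standard library; `decode_seconds_since_epoch` is a hypothetical helper. It applies Gregorian date arithmetic, which is what CF-aware tools do when `calendar` is "standard" or "gregorian" (a "julian" calendar would decode the same numbers to different dates).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_seconds_since_epoch(seconds):
    """Decode 'seconds since 1970-01-01 00:00:00 UTC' values to datetimes.

    Python's datetime uses the proleptic Gregorian calendar, matching CF's
    'standard'/'gregorian' calendar for dates after 1582-10-15.
    """
    return [EPOCH + timedelta(seconds=float(s)) for s in seconds]


# Illustrative value matching the timestamp in the dataset ID, not read from the data
print(decode_seconds_since_epoch([1665763920])[0].isoformat())  # 2022-10-14T16:12:00+00:00
```

In the notebook, `decode_seconds_since_epoch(modified_times[:1])` gives the first decoded time, which can be compared by eye with the deployment date in the dataset ID.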
