# Demo: Convert single basestation *.nc file

This notebook converts sg015 data from the Labrador Sea in September 2004 into OG1 format.

- OG1 is a newly agreed format (since June 2024) for glider data sets from various platforms (e.g., Seaglider, Slocum, SeaExplorer). It is maintained on GitHub: https://github.com/OceanGlidersCommunity/OG-format-user-manual
- OG1 manual: https://oceangliderscommunity.github.io/OG-format-user-manual/OG_Format.html

Process:

1. For one basestation dataset, split the dataset by dimension (`split_ds`)
3. Transform into OG1 format: dataset with dims `sg_data_point`
    - Change the dimension to `N_MEASUREMENTS`
    - Rename variables according to `vocabularies.standard_names` 
    - Assign variable attributes according to `vocabularies.vocab_attrs`.  (Note: This *could* go wrong since it makes assumptions about the input variables. May need additional handling.)
4. Add missing mandatory variables: 
    - From `split_ds[(gps_info,)]`, add the `LATITUDE_GPS`, `LONGITUDE_GPS` and `TIME_GPS` (Note: presently `TIME_GPS` is stripped before saving, but `TIME` values contain `TIME_GPS`)
    - Create `PROFILE_NUMBER` and `PHASE`
    - Calculate `DEPTH_Z` which is positive up
5. Update attributes for the file. 
    - Combines `creator` and `contributor` from original attributes into `contributor`
    - Adds `contributing_institutions` based on `institution`
    - Reformats time in `time_coverage_*` and `start_time`, renaming the latter to `start_date`
    - Adds `date_modified`
    - Renames `comments` to `history` and `site` to `summary`
    - Adds `title`, `platform`, `platform_vocabulary`, `featureType`, `Conventions`, `rtqc_method*` according to OceanGliders format
    - Retains `naming_authority`, `institution`, `project`, `geospatial_*` as OG attributes
    - Retains extra attributes: `license`, `keywords`, `keywords_vocabulary`, `file_version`, `acknowledgement`, `date_created`, `disclaimer`
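
As a rough illustration of the attribute handling in step 5, the sketch below reformats the time attributes and applies the renames on a plain dictionary. It is not the `convertOG1` implementation: the helper names (`reformat_time`, `update_attrs_sketch`), the ISO 8601 input format and the compact `YYYYmmddTHHMMSS` output format are assumptions, the last one inferred from the `id` printed later in this notebook.

```python
from datetime import datetime

# Renames described in the list above.
RENAMES = {"comments": "history", "site": "summary", "start_time": "start_date"}

# Only attributes holding a timestamp are reformatted; durations such as
# time_coverage_resolution ('PT1S') are left untouched.
TIME_KEYS = {"time_coverage_start", "time_coverage_end", "start_time"}


def reformat_time(value):
    """Convert '2004-09-24T18:02:06Z' to '20040924T180206' (both formats assumed)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y%m%dT%H%M%S")


def update_attrs_sketch(attrs):
    """Return a new dict with time attributes reformatted and keys renamed."""
    updated = {}
    for key, value in attrs.items():
        if key in TIME_KEYS:
            value = reformat_time(value)
        updated[RENAMES.get(key, key)] = value
    return updated


# Illustrative input only; these are not values read from the sg015 file.
example = {
    "time_coverage_start": "2004-09-24T18:02:06Z",
    "start_time": "2004-09-24T18:02:06Z",
    "time_coverage_resolution": "PT1S",
    "comments": "example comment",
}
result = update_attrs_sketch(example)
print(result)
```

Working on a copy rather than editing the attributes in place keeps the original basestation attributes available for comparison.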

Future behaviour to be added:

6. Retain the variables starting with `sg_cal` and check whether they vary over the mission (shouldn't)
7. Add sensors, using the information in `split_ds` that has no dimensions
    - Needed from `sg_cal_constants`: `sg_cal` plus `volmax`, `vbd_cnts_per_cc`, `therm_expan`, `t_*`, `mass`, `hd_*`, `ctcor`, `cpcor`, `c_*`, `abs_compress`, `a`, `Tcor`, `Soc`, `Pcor`, `Foffset`
    - Maybe also:
        - `reviewed`
        - `magnetic_variation` (which will change with position)
        - `log_D_FLARE`
        - `flight_avg_speed_north` and `flight_avg_speed_east`, also with `_gsm`
        - `depth_avg_curr_north` and `depth_avg_curr_east`, also with `_gsm`
        - `wlbb2f` (denotes a sensor)
        - `sg_cal_mission_title`
        - `sg_cal_id_str`
        - `calibcomm_oxygen`
        - `calibcomm`
        - `sbe41` (meaning ??)
        - `hdm_qc`
        - `glider`
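
The planned check that the `sg_cal` variables do not vary over the mission could look roughly like the sketch below. It assumes the calibration constants of each dive have already been collected into one dictionary per dive; `find_varying_constants` is a hypothetical helper, and the numbers are made up for illustration.

```python
def find_varying_constants(per_dive_constants):
    """Return {name: sorted distinct values} for constants that differ between dives."""
    seen = {}
    for constants in per_dive_constants:
        for name, value in constants.items():
            seen.setdefault(name, set()).add(value)
    return {name: sorted(values) for name, values in seen.items() if len(values) > 1}


# One dictionary per dive; the values are invented for this example.
dives = [
    {"sg_cal_mass": 52.0, "sg_cal_volmax": 51000.0},
    {"sg_cal_mass": 52.0, "sg_cal_volmax": 51000.0},
    {"sg_cal_mass": 52.5, "sg_cal_volmax": 51000.0},
]
varying = find_varying_constants(dives)
print(varying)
```

An empty result would mean every constant is identical across the dives supplied. A constant that is missing from some dives is not flagged by this sketch and would need separate handling.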

In [1]:
import sys
import importlib
sys.path.append('/Users/eddifying/Cloudfree/gitlab-cloudfree/seagliderOG1')
import warnings
warnings.simplefilter("ignore", category=Warning)

In [2]:
import os

import matplotlib.pyplot as plt
import xarray as xr
import yaml

from seagliderOG1 import convertOG1
from seagliderOG1 import plotters
from seagliderOG1 import readers
from seagliderOG1 import vocabularies


In [3]:
# Add these to existing attributes
contrib_to_append = {
    'contributor_name': 'Eleanor Frajka-Williams',
    'contributor_email': 'eleanorfrajka@gmail.com',
    'contributor_role': 'Data scientist',
    'contributor_role_vocabulary': 'http://vocab.nerc.ac.uk/search_nvs/W08',
    'contributing_institutions': 'University of Hamburg - Institute of Oceanography',
    'contributing_institutions_vocabulary': 'https://edmo.seadatanet.org/report/1156',
    'contributing_institutions_role': 'Data scientist',
    'contributing_institutions_role_vocabulary': 'http://vocab.nerc.ac.uk/search_nvs/W08',
}
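
One plausible way for such a dictionary to be folded into existing attributes is to append each value to a comma-separated string. The sketch below shows that idea only; `append_contributors` is a hypothetical helper, the existing attribute values are invented, and this may differ from what `convertOG1.update_dataset_attributes` actually does.

```python
def append_contributors(existing, to_append, sep=", "):
    """Return a copy of `existing` with each value in `to_append` appended.

    Keys that are missing or empty in `existing` simply take the new value.
    """
    merged = dict(existing)
    for key, value in to_append.items():
        if merged.get(key):
            merged[key] = f"{merged[key]}{sep}{value}"
        else:
            merged[key] = value
    return merged


# Invented starting attributes, for illustration only.
existing_attrs = {
    "contributor_name": "Example Person",
    "contributor_role": "Principal investigator",
}
extra = {
    "contributor_name": "Eleanor Frajka-Williams",
    "contributor_role": "Data scientist",
}
merged = append_contributors(existing_attrs, extra)
print(merged["contributor_name"])
print(merged["contributor_role"])
```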

## Load Seaglider data in basestation format

The test case is built on a file that was written in 2013 by basestation v2.8 using the NODC format template v0.9.

In [4]:
# Choose where the data are located: the NCEI server or a local copy
use_remote_server = False
if use_remote_server:
    input_loc = "https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/015/20040924/"
else:
    input_loc = '/Users/eddifying/Dropbox/data/sg015-ncei-download/'

# Load the datasets as a list, optionally restricting the range of dives to load
start_dive = 1
end_dive = 10

list_datasets = readers.read_basestation(input_loc, start_dive, end_dive)

# Pick one basestation xarray dataset to work with
ds1 = list_datasets[0]

plotters.show_attributes(ds1)


information is based on xarray Dataset


| | Attribute | Value | DType |
|---|---|---|---|
| 0 | quality_control_version | 1.1 | float32 |
| 1 | base_station_micro_version | 3705 | int32 |
| 2 | time_coverage_resolution | PT1S | str |
| 3 | geospatial_vertical_max | 184.065441 | float64 |
| 4 | sea_name | Labrador Sea | str |
| 5 | mission | 1 | int32 |
| 6 | geospatial_lat_units | degrees | str |
| 7 | geospatial_lon_units | degrees | str |
| 8 | references | Frajka-Williams, E.F. and P.B.Rhines 2009: Phy... | str |
| 9 | seaglider_software_version | 66.050003 | float32 |
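
The loader above is called with a first and a last dive number. As a sketch of how a directory or server listing could be narrowed to that range, the example below filters file names by dive number. It assumes per-dive basestation files are named `pSSSDDDD.nc` (three-digit glider number, four-digit dive number); `select_dive_files` is a hypothetical helper and not part of `readers`.

```python
import re

# Assumed naming: 'p' + 3-digit glider number + 4-digit dive number + '.nc'
DIVE_FILE = re.compile(r"^p(\d{3})(\d{4})\.nc$")


def select_dive_files(filenames, start_dive=None, end_dive=None):
    """Return per-dive file names within [start_dive, end_dive], sorted by dive number."""
    selected = []
    for name in filenames:
        match = DIVE_FILE.match(name)
        if match is None:
            continue  # skip anything that is not a per-dive file
        dive = int(match.group(2))
        if start_dive is not None and dive < start_dive:
            continue
        if end_dive is not None and dive > end_dive:
            continue
        selected.append((dive, name))
    return [name for _, name in sorted(selected)]


# Invented listing, for illustration only.
listing = ["p0150012.nc", "p0150001.nc", "p0150002.nc", "sg015_timeseries.nc", "index.html"]
files = select_dive_files(listing, start_dive=1, end_dive=10)
print(files)
```

Leaving both bounds as `None` keeps every per-dive file, which mirrors the idea of the range being optional.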


In [29]:
importlib.reload(convertOG1)
# Convert the dataset; also return the variables that were not included in the output
ds_new, attr_warnings, sg_cal, dc_other, dc_log = convertOG1.process_dataset(ds1)

# Create the attributes in order
ordered_attributes = convertOG1.update_dataset_attributes(ds1, contrib_to_append)

for key, value in ordered_attributes.items():
    ds_new.attrs[key] = value

# Construct the platform serial number
PLATFORM_SERIAL_NUMBER = 'sg' + ds_new.attrs['id'][1:4]
print(PLATFORM_SERIAL_NUMBER)
ds_new['PLATFORM_SERIAL_NUMBER'] = PLATFORM_SERIAL_NUMBER
ds_new['PLATFORM_SERIAL_NUMBER'].attrs['long_name'] = "glider serial number"

# Construct the unique identifier attribute (avoid shadowing the built-in `id`)
dataset_id = f"{PLATFORM_SERIAL_NUMBER}_{ds_new.attrs['start_date']}_delayed"
ds_new.attrs['id'] = dataset_id
print(ds_new.attrs['id'])


sg015
sg015_20040924T180206_delayed
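
The identifier logic in the cell above can be wrapped in a small function that also guards against an unexpected input. This is a sketch only: `build_og1_id` is a hypothetical helper, and the example input `p0150001_20040924` is an assumed form of the original basestation `id`, chosen so that characters 1 to 3 hold the glider number as the slice `[1:4]` implies.

```python
def build_og1_id(original_id, start_date, mode="delayed"):
    """Return (serial, dataset_id) built from a basestation-style id.

    Assumes characters 1-3 of `original_id` hold the three-digit glider number.
    """
    digits = original_id[1:4]
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"cannot find a three-digit glider number in {original_id!r}")
    serial = f"sg{digits}"
    return serial, f"{serial}_{start_date}_{mode}"


serial, dataset_id = build_og1_id("p0150001_20040924", "20040924T180206")
print(serial)
print(dataset_id)
```

Raising an error early is one option for making a malformed `id` visible, rather than silently writing a file with an odd name.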


## Save the final dataset

At present, `plotters.show_attributes()` and `plotters.show_variables()` are designed to work with netCDF files, so I write the xarray dataset to netCDF before loading it and checking the attributes and variable names.

In [31]:
plotters.show_attributes(ds_new)

# Delete any existing output file before saving
output_file = os.path.join('../data', ds_new.attrs['id'] + '.nc')
if os.path.exists(output_file):
    os.remove(output_file)

# Save the dataset to a NetCDF file
convertOG1.save_dataset(ds_new, output_file)


information is based on xarray Dataset
TypeError Invalid value for attr 'dtype': dtype('float64'). For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple


True
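
The `TypeError` printed above indicates that an attribute named `dtype` holds a dtype object, which cannot be serialized to netCDF. One possible workaround, sketched here on plain dictionaries, is to convert unsupported attribute values to strings before saving. `sanitize_attrs` is a hypothetical helper and not part of `convertOG1`; a fuller version would also allow NumPy arrays and scalars, which the error message lists as valid types.

```python
from numbers import Number

# Subset of the types named in the error message (NumPy types omitted here).
ALLOWED_TYPES = (str, Number, list, tuple)


def sanitize_attrs(attrs):
    """Return a copy of `attrs` with unsupported values replaced by their string form."""
    clean = {}
    for key, value in attrs.items():
        if isinstance(value, ALLOWED_TYPES):
            clean[key] = value
        else:
            clean[key] = str(value)
    return clean


class FakeDtype:
    """Stand-in for an object such as numpy.dtype('float64')."""

    def __str__(self):
        return "float64"


attrs = {"long_name": "depth", "valid_min": 0.0, "dtype": FakeDtype()}
clean = sanitize_attrs(attrs)
print(clean)
```

Applied to the attributes of each variable and of the dataset itself, a step like this would trade the exception for a string attribute; whether the `dtype` attribute should be kept at all is a separate decision.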