# seagliderOG1 demo

The purpose of this notebook is to demonstrate the functionality of `seagliderOG1` to convert from Seaglider basestation files to OG1 format.

- OG1 is a newly agreed format (since June 2024) for glider data sets from various platforms (e.g., Seaglider, Slocum, SeaExplorer). The format specification is hosted on GitHub: https://github.com/OceanGlidersCommunity/OG-format-user-manual
- OG1 manual: https://oceangliderscommunity.github.io/OG-format-user-manual/OG_Format.html

The test case is to convert sg015 data from the Labrador Sea in September 2004.

The demo is organised to show

- Conversion of a single dive cycle (single `p*.nc` file)

- Conversion for a folder of local dive-cycle files (full mission of `p*.nc` files)

- Download from remote server + conversion (directory with full mission of `p*.nc` files)

Options are provided to load only a subset of files (e.g. 10 files), but note that the OG1 format expects a full mission.


In [1]:
import pathlib
import sys

script_dir = pathlib.Path().parent.absolute()
parent_dir = script_dir.parents[0]
sys.path.append(str(parent_dir))
sys.path.append(str(parent_dir) + '/seagliderOG1')
print(parent_dir)
print(sys.path)

import xarray as xr
import os
import pooch
from seagliderOG1 import readers, writers, plotters, tools
from seagliderOG1 import convertOG1, vocabularies


/home/runner/work/seagliderOG1/seagliderOG1
['/home/runner/micromamba/envs/TEST/lib/python312.zip', '/home/runner/micromamba/envs/TEST/lib/python3.12', '/home/runner/micromamba/envs/TEST/lib/python3.12/lib-dynload', '', '/home/runner/micromamba/envs/TEST/lib/python3.12/site-packages', '/home/runner/work/seagliderOG1/seagliderOG1', '/home/runner/work/seagliderOG1/seagliderOG1/seagliderOG1']




In [2]:
# Specify the path for writing datafiles
data_path = os.path.join(parent_dir, 'data')

## Reading basestation files

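There are three ways to load a glider dataset:

- Load an example dataset using `seagliderOG1.readers.load_sample_dataset`, as in the cell below.
- Load the files of a mission from a local directory.
- Load the files of a mission from a remote directory (URL).

Alternatively, open your own file directly with e.g. `ds = xr.open_dataset('/path/to/yourfile.nc')`.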

### Load single sample dataset

In [3]:
ds = readers.load_sample_dataset()
ds

Downloading file 'p0330015_20100906.nc' from 'https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/033/20100903/p0330015_20100906.nc' to '/home/runner/.cache/seagliderOG1'.


  return xr.open_dataset(file_path)


### Load datasets from a local directory

In [4]:
# Specify the input directory on your local machine
input_dir = "/Users/eddifying/Dropbox/data/sg015-ncei-download"

# Load and concatenate all datasets in the input directory
# Optionally, specify the range of profiles to load (start_profile, end_profile)
list_datasets = readers.read_basestation(input_dir, start_profile=500, end_profile=503)

# list_datasets is a list of xarray datasets; a single dataset can be accessed by index
ds = list_datasets[0]

AttributeError: module 'seagliderOG1.readers' has no attribute 'read_basestation'
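
The profile range in the cell above can be illustrated without the package. The following is a minimal sketch, not the `seagliderOG1` implementation: it assumes that basestation filenames follow the pattern `p` + 3-digit glider serial + 4-digit dive number (optionally followed by `_YYYYMMDD`), as in `p0330015_20100906.nc`, and that `start_profile` and `end_profile` are inclusive. The helper names `parse_dive_number` and `select_profiles` are hypothetical, and all filenames in the list except `p0150500_20050213.nc` are made up for the example.

```python
import re
from pathlib import Path

# Assumed filename layout: p + 3-digit glider serial + 4-digit dive number,
# optionally followed by _YYYYMMDD, e.g. p0330015_20100906.nc
_PATTERN = re.compile(r"^p(\d{3})(\d{4})(?:_\d{8})?\.nc$")


def parse_dive_number(filename):
    """Return the dive number encoded in a basestation filename, or None."""
    match = _PATTERN.match(Path(filename).name)
    return int(match.group(2)) if match else None


def select_profiles(filenames, start_profile=None, end_profile=None):
    """Keep files whose dive number lies in [start_profile, end_profile]."""
    selected = []
    for name in filenames:
        dive = parse_dive_number(name)
        if dive is None:
            continue  # not a p*.nc dive file
        if start_profile is not None and dive < start_profile:
            continue
        if end_profile is not None and dive > end_profile:
            continue
        selected.append((dive, name))
    # Sort by dive number so the datasets are processed in mission order
    return [name for _, name in sorted(selected)]


# Hypothetical directory listing (only the second name appears in this notebook)
files = [
    "p0150499_20050213.nc",
    "p0150500_20050213.nc",
    "p0150503_20050214.nc",
    "p0150504_20050214.nc",
    "notes.txt",
]
print(select_profiles(files, start_profile=500, end_profile=503))
```

For a real directory, the list of names could come from `sorted(p.name for p in Path(input_dir).glob("p*.nc"))`.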

In [5]:
ds

### Load datasets from a remote directory (URL)

In [6]:
# Specify the server where data are located
server = "https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/015/20040924/"

# Load and concatenate all datasets from the server, optionally specifying the range of profiles to load
list_datasets = readers.read_basestation(server, start_profile=500, end_profile=503)

AttributeError: module 'seagliderOG1.readers' has no attribute 'read_basestation'
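
Loading from a URL first requires a list of the `p*.nc` files in the remote directory. The sketch below shows one way this could be done with the standard library, assuming the server returns a plain HTML index page with one link per file. The class `NcLinkParser`, the function `list_nc_urls` and the sample HTML are hypothetical and are not taken from `seagliderOG1`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NcLinkParser(HTMLParser):
    """Collect href targets that look like p*.nc dive files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        name = href.rsplit("/", 1)[-1]
        if name.startswith("p") and name.endswith(".nc"):
            self.links.append(href)


def list_nc_urls(index_html, base_url):
    """Return absolute URLs of the p*.nc files linked from an index page."""
    parser = NcLinkParser()
    parser.feed(index_html)
    return [urljoin(base_url, href) for href in parser.links]


# Hypothetical index page; a real one would be downloaded from the server
sample_index = (
    '<a href="../">Parent Directory</a>'
    '<a href="p0150500_20050213.nc">p0150500_20050213.nc</a>'
    '<a href="readme.txt">readme.txt</a>'
)
server = "https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/015/20040924/"
urls = list_nc_urls(sample_index, server)
print(urls)
```

Each URL in the result could then be passed to a downloader such as `pooch.retrieve(url, known_hash=None)` before opening the file with `xr.open_dataset`.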

## Convert to OG1 format

Process:

1. For one basestation dataset, split the dataset by dimension (`split_ds`).
2. Transform the dataset with dimension `sg_data_point` into OG1 format:
    - Change the dimension to `N_MEASUREMENTS`
    - Rename variables according to `vocabularies.standard_names`
    - Assign variable attributes according to `vocabularies.vocab_attrs`. (Note: This *could* go wrong since it makes assumptions about the input variables. May need additional handling.)
3. Add missing mandatory variables:
    - From `split_ds[(gps_info,)]`, add the `LATITUDE_GPS`, `LONGITUDE_GPS` and `TIME_GPS` (Note: presently `TIME_GPS` is stripped before saving, but `TIME` values contain `TIME_GPS`)
    - Create `PROFILE_NUMBER` and `PHASE`
    - Calculate `DEPTH_Z`, which is positive upward
4. Update attributes for the file:
    - Combines `creator` and `contributor` from original attributes into `contributor`
    - Adds `contributing_institutions` based on `institution`
    - Reformats time in `time_coverage_*` and renames `start_time` --> `start_date`
    - Adds `date_modified`
    - Renames `comments` --> `history`, `site` --> `summary`
    - Adds `title`, `platform`, `platform_vocabulary`, `featureType`, `Conventions`, `rtqc_method*` according to OceanGliders format
    - Retains `naming_authority`, `institution`, `project`, `geospatial_*` as OG attributes
    - Retains extra attributes: `license`, `keywords`, `keywords_vocabulary`, `file_version`, `acknowledgement`, `date_created`, `disclaimer`
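
The attribute update described in the last step can be sketched on a plain dictionary. This is an illustration under stated assumptions, not the code in `convertOG1`: it assumes the original attributes use the keys `creator_name` and `contributor_name`, that time attributes arrive as ISO 8601 strings, and that the target time style is `YYYYmmddTHHMMSS`. The function names and the example values are hypothetical.

```python
from datetime import datetime, timezone

# Renames listed in the process description above
RENAMES = {"comments": "history", "site": "summary", "start_time": "start_date"}
TIME_KEYS = ("time_coverage_start", "time_coverage_end", "start_date")


def to_og1_time(value):
    """Reformat an ISO 8601 string as YYYYmmddTHHMMSS (assumed target style)."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.strftime("%Y%m%dT%H%M%S")


def update_attrs(attrs):
    """Return a new attribute dict with renames, merges and reformatted times."""
    out = {RENAMES.get(key, key): value for key, value in attrs.items()}

    # Combine creator and contributor into a single contributor entry
    names = [out.pop("creator_name", None), out.get("contributor_name")]
    names = [name for name in names if name]
    if names:
        out["contributor_name"] = ", ".join(names)

    # Derive contributing_institutions from institution
    if "institution" in out:
        out["contributing_institutions"] = out["institution"]

    # Reformat the time attributes that are present
    for key in TIME_KEYS:
        if key in out:
            out[key] = to_og1_time(out[key])

    out["date_modified"] = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return out


# Hypothetical input attributes
example = {
    "creator_name": "A. Pilot",
    "contributor_name": "B. Scientist",
    "institution": "Example University",
    "comments": "processed",
    "site": "Labrador Sea",
    "start_time": "2004-09-24T12:00:00Z",
}
updated = update_attrs(example)
print(updated["contributor_name"], updated["start_date"])
```

Returning a new dictionary rather than editing in place keeps the original basestation attributes available for comparison.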

Future behaviour to be added:

1. Retain the variables starting with `sg_cal` and check whether they vary over the mission (they should not).
2. Add sensors, using information in the `split_ds` with no dimensions:
    - Needed from `sg_cal_constants`: `sg_cal` plus `volmax`, `vbd_cnts_per_cc`, `therm_expan`, `t_*`, `mass`, `hd_*`, `ctcor`, `cpcor`, `c_*`, `abs_compress`, `a`, `Tcor`, `Soc`, `Pcor`, `Foffset`
    - Maybe also `reviewed`, `magnetic_variation` (which will change with position), `log_D_FLARE`, `flight_avg_speed_north` and `flight_avg_speed_east` (also with `_gsm`), `depth_avg_curr_north` and `depth_avg_curr_east` (also with `_gsm`)
    - `wlbb2f` (indicates a sensor)
    - `sg_cal_mission_title`
    - `sg_cal_id_str`
    - `calibcomm_oxygen`
    - `calibcomm`
    - `sbe41` (meaning unclear)
    - `hdm_qc`
    - `glider`
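
The planned check on `sg_cal` variables could look something like the sketch below. It is not part of `seagliderOG1`: each dive is represented here as a plain dictionary of scalar values, the function name `varying_calibration_values` is hypothetical, and the numbers are invented for the example. With xarray, one such dictionary per dive could be built from the dimensionless variables of each dataset.

```python
def varying_calibration_values(dives, prefix="sg_cal"):
    """Return {name: distinct values} for prefixed entries that differ between dives."""
    seen = {}
    for dive in dives:
        for name, value in dive.items():
            if not name.startswith(prefix):
                continue
            values = seen.setdefault(name, [])
            if value not in values:
                values.append(value)
    # Only entries with more than one distinct value vary over the mission
    return {name: values for name, values in seen.items() if len(values) > 1}


# Two hypothetical dives; sg_cal_volmax changes, sg_cal_mass does not
dives = [
    {"sg_cal_mass": 52.3, "sg_cal_volmax": 51000.0, "temperature": 3.1},
    {"sg_cal_mass": 52.3, "sg_cal_volmax": 51010.0, "temperature": 3.4},
]
print(varying_calibration_values(dives))
```

An empty result would indicate that the calibration constants are stable and can be stored once per mission.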
    
### Convert a single (sample) dataset

In [7]:
# Loads one sample dataset (p0330015_20100906.nc in the run shown in this notebook)
ds = readers.load_sample_dataset()

ds_OG1 = convertOG1.convert_to_OG1(ds)

# Check the results by generating a simple plot
plotters.plot_profile_depth(ds_OG1)

  return xr.open_dataset(file_path)


Variable 'eng_depth' not in OG1 vocabulary.


Variable 'eng_aa4330_Temp' not in OG1 vocabulary.


Variable 'longitude_gsm' not in OG1 vocabulary.


Variable 'speed_gsm' not in OG1 vocabulary.


Variable 'vert_speed_gsm' not in OG1 vocabulary.


Variable 'eng_aa4330_TCPhase' not in OG1 vocabulary.


Variable 'eng_sbect_tempFreq' not in OG1 vocabulary.


Variable 'eng_sbect_condFreq' not in OG1 vocabulary.


Variable 'eng_rec' not in OG1 vocabulary.


Variable 'north_displacement_gsm' not in OG1 vocabulary.


Variable 'horz_speed_gsm' not in OG1 vocabulary.


No conversion information found for micromoles/kg to micromoles/kg


Variable 'eng_aa4330_O2' not in OG1 vocabulary.


Variable 'eng_aa4330_AirSat' not in OG1 vocabulary.


Variable 'latitude_gsm' not in OG1 vocabulary.


Variable 'aanderaa4330_instrument_dissolved_oxygen' not in OG1 vocabulary.


  dsa[orig_varname] = ([newdim], ds[orig_varname].values, ds[orig_varname].attrs)
Variable 'eng_elaps_t_0000' not in OG1 vocabulary.


Variable 'buoyancy' not in OG1 vocabulary.


Variable 'east_displacement_gsm' not in OG1 vocabulary.


Variable 'sound_velocity' not in OG1 vocabulary.


Variable 'density_insitu' not in OG1 vocabulary.


Variable 'density' not in OG1 vocabulary.


Variable 'eng_aa4330_CalPhase' not in OG1 vocabulary.


Variable 'eng_GC_phase' not in OG1 vocabulary.


No conversion information found for cm s-1 to cm s-1


Variable 'conservative_temperature' not in OG1 vocabulary.


Variable 'glide_angle_gsm' not in OG1 vocabulary.


  dsa[orig_varname] = ([newdim], ds[orig_varname].values, ds[orig_varname].attrs)
Variable 'eng_elaps_t' not in OG1 vocabulary.


Variable 'absolute_salinity' not in OG1 vocabulary.


No conversion information found for micromoles/kg to micromoles/kg


Variable 'ctd_pressure' not in OG1 vocabulary.


Variable 'depth' not in OG1 vocabulary.


Variable 'time' not in OG1 vocabulary.


sg_cal_calibcomm: SBE s/n 0112 calibration 20apr09
sg_cal_calibcomm_optode: Optode 4330F S/N 182 foil batch 2808F calibrated 09may09


  return ds.assign(divenum=('N_MEASUREMENTS', [dive_number] * ds.dims['N_MEASUREMENTS']))
  ds['dive_num_cast'] = (['N_MEASUREMENTS'], np.full(ds.dims['N_MEASUREMENTS'], np.nan))
  ds['PHASE'] = (['N_MEASUREMENTS'], np.full(ds.dims['N_MEASUREMENTS'], np.nan))
  ds['PHASE_QC'] = (['N_MEASUREMENTS'], np.zeros(ds.dims['N_MEASUREMENTS'], dtype=int))
  ds['DEPTH_Z'] = (['N_MEASUREMENTS'], np.full(ds.dims['N_MEASUREMENTS'], np.nan))


TypeError: Input data must be a pandas DataFrame or xarray Dataset

In [8]:
# Print to screen a table of the variables and variable attributes
#plotters.show_attributes(ds_OG1)
ds_OG1

(<xarray.Dataset> Size: 175kB
 Dimensions:                                            (N_MEASUREMENTS: 589)
 Coordinates:
     TIME                                               (N_MEASUREMENTS) datetime64[ns] 5kB ...
     LONGITUDE                                          (N_MEASUREMENTS) float64 5kB ...
     LATITUDE                                           (N_MEASUREMENTS) float64 5kB ...
     DEPTH                                              (N_MEASUREMENTS) float64 5kB ...
 Dimensions without coordinates: N_MEASUREMENTS
 Data variables: (12/61)
     PSAL_RAW_QC                                        (N_MEASUREMENTS) float32 2kB ...
     TEMP_QC                                            (N_MEASUREMENTS) float32 2kB ...
     TEMP_RAW_QC                                        (N_MEASUREMENTS) float32 2kB ...
     CNDC_RAW_QC                                        (N_MEASUREMENTS) float32 2kB ...
     CNDC_QC                                            (N_MEASUREMENTS) float32 2kB .
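
The output above begins with `(<xarray.Dataset>`, which suggests that in this run `convert_to_OG1` returned a tuple whose first element is the dataset. That would also explain the `TypeError` raised by the plotting call. A minimal, hedged sketch of a guard is given below. The helper `first_dataset` is hypothetical, it identifies a dataset only by the presence of a `to_netcdf` method, and `FakeDataset` is a stand-in so the example runs without xarray.

```python
def first_dataset(result):
    """Return result itself, or the first dataset-like item of a tuple or list."""
    if hasattr(result, "to_netcdf"):
        return result
    if isinstance(result, (tuple, list)):
        for item in result:
            if hasattr(item, "to_netcdf"):
                return item
    raise TypeError("No dataset-like object (with a to_netcdf method) found")


class FakeDataset:
    """Stand-in for xarray.Dataset, used only for this illustration."""

    def to_netcdf(self, path):
        return path


fake = FakeDataset()
print(first_dataset((fake, ["extra output"])) is fake)
print(first_dataset(fake) is fake)
```

In the notebook, this could be tried as `ds_OG1 = first_dataset(convertOG1.convert_to_OG1(ds))` before plotting or saving.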

### Convert mission from a local directory of basestation files

- For local data in the directory `input_dir`
- Creates a plot of `ctd_depth` against `ctd_time`.


In [9]:
# Specify the input directory on your local machine
input_dir = "/Users/eddifying/Dropbox/data/sg015-ncei-download"

# Load and concatenate all datasets in the input directory
# Optionally, specify the range of profiles to load (start_profile, end_profile)
list_datasets = readers.read_basestation(input_dir, start_profile=500, end_profile=503)

# Convert the list of datasets to OG1
ds_OG1 = convertOG1.convert_to_OG1(list_datasets)

# Generate a simple plot
plotters.plot_profile_depth(ds_OG1)
#plotters.show_contents(ds_all,'attrs')

AttributeError: module 'seagliderOG1.readers' has no attribute 'read_basestation'

### Convert mission from the NCEI server (with `p*.nc` files)

- Data from the sg015 mission in the Labrador Sea (https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:0111844), dataset identifier gov.noaa.nodc:0111844.


In [10]:
# Specify the server where data are located
server = "https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/015/20040924/"

# Load and concatenate all datasets from the server, optionally specifying the range of profiles to load
list_datasets = readers.read_basestation(server, start_profile=500, end_profile=503)

# Convert the list of datasets to OG1
ds_OG1 = convertOG1.convert_to_OG1(list_datasets)

AttributeError: module 'seagliderOG1.readers' has no attribute 'read_basestation'

## Saving data

Writing an `xarray` dataset to netCDF can fail when an attribute value is not one of the supported types (`str`, `Number`, `np.ndarray`, `np.number`, `list`, `tuple`). The function `save_dataset` was written to work around this problem.

In [11]:
# Write the file
# This writer catches data-type errors raised when calling to_netcdf() on the dataset
# It resolves them by converting the offending values to strings, which may be undesired behaviour
output_file = os.path.join(data_path, 'demo_test.nc')
if os.path.exists(output_file):
    os.remove(output_file) 

writers.save_dataset(ds_OG1, output_file);

AttributeError: 'tuple' object has no attribute 'to_netcdf'
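
The string conversion mentioned in the comments above can be sketched on a plain attribute dictionary. This is an illustration of the idea only, not the code in `writers.save_dataset`. To stay within the standard library, the numpy types from the list of supported types are left out. The function name `sanitize_attrs` and the example attributes are hypothetical.

```python
from numbers import Number

# Subset of the supported types listed above (numpy types omitted in this sketch)
SAFE_TYPES = (str, Number, list, tuple)


def sanitize_attrs(attrs):
    """Return (cleaned attrs, names of attributes that were converted to str)."""
    cleaned = {}
    converted = []
    for key, value in attrs.items():
        if isinstance(value, SAFE_TYPES):
            cleaned[key] = value
        else:
            # Fall back to the string representation, which loses the original type
            cleaned[key] = str(value)
            converted.append(key)
    return cleaned, converted


# Hypothetical attributes: None and dict values are not supported
attrs = {
    "title": "demo",
    "file_version": 2.7,
    "reviewed": None,
    "processing": {"step": 1},
}
cleaned, converted = sanitize_attrs(attrs)
print(converted)
```

Returning the list of converted names makes it possible to report which attributes changed type, since the conversion may be undesired.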

In [12]:
# Load the data saved
ds1 = xr.open_dataset(output_file)

# Generate a simple plot
#plotters.show_contents(ds_all,'attrs')
plotters.plot_depth_colored(ds1, color_by='PROFILE_NUMBER')


FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/seagliderOG1/seagliderOG1/data/demo_test.nc'

## Run multiple missions

In [13]:
# Add these to existing attributes - update to your details
contrib_to_append = vocabularies.contrib_to_append
print(contrib_to_append)

{'contributor_name': 'Eleanor Frajka-Williams', 'contributor_email': 'eleanorfrajka@gmail.com', 'contributor_role': 'Data scientist', 'contributor_role_vocabulary': 'http://vocab.nerc.ac.uk/search_nvs/W08', 'contributing_institutions': 'University of Hamburg - Institute of Oceanography', 'contributing_institutions_vocabulary': 'https://edmo.seadatanet.org/report/1156', 'contributing_institutions_role': 'Data scientist', 'contributing_institutions_role_vocabulary': 'http://vocab.nerc.ac.uk/search_nvs/W08'}
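
One way such a dictionary could be added to existing attributes is sketched below, assuming that contributor attributes are stored as comma-separated strings with one entry per contributor. The function `append_contributor` and the existing attribute values are hypothetical, and this is not necessarily how `seagliderOG1` applies `contrib_to_append`.

```python
def append_contributor(attrs, extra, sep=", "):
    """Return a copy of attrs with each value in extra appended to the matching entry."""
    merged = dict(attrs)
    for key, value in extra.items():
        existing = merged.get(key, "")
        merged[key] = f"{existing}{sep}{value}" if existing else value
    return merged


# Hypothetical existing attributes of a converted dataset
existing_attrs = {"contributor_name": "A. Pilot", "contributor_role": "PI"}

# Two of the entries printed above
extra = {
    "contributor_name": "Eleanor Frajka-Williams",
    "contributor_role": "Data scientist",
}
merged = append_contributor(existing_attrs, extra)
print(merged["contributor_name"])
print(merged["contributor_role"])
```

Note that the comma-separated lists only stay aligned if every contributor has an entry for every key. A key that is missing for earlier contributors would need a placeholder.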


In [14]:
# Specify a list of servers or local directories
input_locations = [
    # Either Iceland, Faroes or RAPID/MOCHA
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/005/20090829/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/005/20080606/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/005/20081106/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/012/20070831/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/014/20080214/",  # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/014/20080222/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/016/20061112/",  # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/016/20090605/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/016/20071113/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/016/20080607/",  # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/033/20100518/", # done  
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/033/20100903/", # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/101/20081108/",     # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/101/20061112/",    # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/101/20070609/",   # done
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/102/20061112/",  # done
    # Labrador Sea
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/015/20040924/",
    "https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/014/20040924/",
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/008/20031002/",
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/004/20031002/",
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/016/20050406/",
    # RAPID/MOCHA
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/033/20100729/",
    #"https://www.ncei.noaa.gov/data/oceans/glider/seaglider/uw/034/20110128/",
]

for input_loc in input_locations:
    # Process the mission at this location and save the result to data_path
    ds_all = convertOG1.process_and_save_data(input_loc, output_dir=data_path, save=True,  run_quietly=True)

KeyError: "Requested sample dataset p0140001_20040924.nc not known. Specify one of the following available datasets: ['p0330001_20100903.nc', 'p0330002_20100903.nc', 'p0330003_20100903.nc', 'p0330004_20100904.nc', 'p0330005_20100904.nc', 'p0330006_20100904.nc', 'p0330007_20100905.nc', 'p0330008_20100905.nc', 'p0330009_20100905.nc', 'p0330010_20100905.nc', 'p0330011_20100905.nc', 'p0330012_20100905.nc', 'p0330013_20100906.nc', 'p0330014_20100906.nc', 'p0330015_20100906.nc', 'p0330016_20100906.nc', 'p0330017_20100906.nc', 'p0330018_20100906.nc', 'p0330019_20100906.nc']"
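
When several missions are processed in one loop, an error such as the `KeyError` above stops the whole batch. The sketch below shows one possible pattern for recording failures and continuing. The function `run_batch` and the stand-in `fake_process` are hypothetical. In the notebook, the callable would wrap `convertOG1.process_and_save_data`.

```python
def run_batch(locations, process):
    """Apply process to each location, collecting results and error messages."""
    results = {}
    failures = {}
    for loc in locations:
        try:
            results[loc] = process(loc)
        except Exception as err:
            # Record the error and continue with the next mission
            failures[loc] = f"{type(err).__name__}: {err}"
    return results, failures


def fake_process(loc):
    """Stand-in for the real conversion; fails for one location."""
    if "014" in loc:
        raise KeyError("requested dataset not known")
    return f"converted {loc}"


# Shortened, hypothetical location strings
locations = ["uw/015/20040924/", "uw/014/20040924/"]
results, failures = run_batch(locations, fake_process)
print(sorted(results))
print(sorted(failures))
```

A possible call in this notebook would be `run_batch(input_locations, lambda loc: convertOG1.process_and_save_data(loc, output_dir=data_path, save=True, run_quietly=True))`, after which `failures` lists the missions that need attention.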