An efficient xarray backend for reading Dinkum Binary Data (DBD) files from Slocum ocean gliders. Slocum gliders are autonomous underwater vehicles widely used in oceanography to collect temperature, salinity, and other water-column measurements along sawtooth profiles.
This package provides native xarray support for DBD files, allowing you to read glider data directly into xarray Datasets without intermediate NetCDF conversion. The C++ binary parser (via pybind11) matches the performance of the original dbd2netCDF tool.
- Native xarray integration: Read DBD files directly with xarray.open_dataset()
- High performance: Efficient binary parsing matching dbd2netCDF performance
- Multiple file support: Easily concatenate multiple DBD files
- Flexible filtering: Select specific sensors and missions
- Automatic repair: Optional corrupted data recovery
- Full metadata: Preserves sensor units and file attributes
Requires Python 3.10+
pip install xarray-dbd

For the CLI tools only:

pipx install xarray-dbd # installs dbd2nc and mkone commands

Or install from source (requires a C++ compiler and CMake):

git clone https://github.com/mousebrains/dbd2netcdf-python
cd dbd2netcdf-python
pip install -e .

import xarray as xr
import xarray_dbd as xdbd
# Method 1: Using xarray's open_dataset with engine parameter
ds = xr.open_dataset('test.sbd', engine='dbd')
# Method 2: Using convenience function
ds = xdbd.open_dbd_dataset('test.sbd')
# Access data
print(ds)
print(ds['m_present_time'])
print(ds['m_depth'])

import xarray_dbd as xdbd
from pathlib import Path
# Get all sbd files
files = sorted(Path('.').glob('*.sbd'))
# Read and concatenate
ds = xdbd.open_multi_dbd_dataset(files)
print(f"Total records: {len(ds.i)}")
print(f"Variables: {list(ds.data_vars)}")

# Only keep specific sensors
ds = xdbd.open_dbd_dataset(
'test.sbd',
to_keep=['m_present_time', 'm_depth', 'm_lat', 'm_lon']
)

# Skip certain missions
ds = xdbd.open_multi_dbd_dataset(
files,
skip_missions=['initial.mi', 'status.mi']
)
# Or keep only specific missions
ds = xdbd.open_multi_dbd_dataset(
files,
keep_missions=['mission1.mi', 'mission2.mi']
)

ds = xdbd.open_dbd_dataset(
'test.sbd',
skip_first_record=True, # Skip first record (default)
repair=True, # Attempt to repair corrupted data
to_keep=['m_*'], # Keep sensors matching pattern (future feature)
criteria=['m_present_time'], # Sensors for record selection
)

DBD (Dinkum Binary Data) files are the native format used by Slocum ocean gliders. The format consists of:
- ASCII Header: Mission metadata and configuration
- Sensor List: Definitions of all sensors with names, units, and data types
- Known Bytes: Endianness detection section
- Compressed Data: Efficiently encoded sensor readings using:
  - Run-length encoding for unchanged values
  - Variable-length records with 2-bit codes per sensor
  - Support for 1, 2, 4, and 8-byte sensor values
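To make the 2-bit-per-sensor idea concrete, here is a minimal sketch of unpacking the state bytes that precede each record. It assumes the commonly described encoding (0 = sensor not updated, 1 = updated with the same value as before, 2 = a new value follows) and that codes are packed from the high bits of each byte downward. The function names `unpack_state_codes` and `state_bytes_needed` are hypothetical and not part of xarray-dbd, whose actual parser is written in C++.

```python
def state_bytes_needed(n_sensors: int) -> int:
    """Four 2-bit codes fit in one byte, so round up to whole bytes."""
    return (n_sensors + 3) // 4


def unpack_state_codes(state_bytes: bytes, n_sensors: int) -> list[int]:
    """Return one 2-bit code per sensor, reading each byte from its high bits down."""
    codes = []
    for byte in state_bytes:
        for shift in (6, 4, 2, 0):
            if len(codes) == n_sensors:
                return codes  # remaining bits in the last byte are padding
            codes.append((byte >> shift) & 0b11)
    if len(codes) < n_sensors:
        raise ValueError("not enough state bytes for the requested sensor count")
    return codes


# One byte 0b10_01_00_10 describes four sensors:
# new value, same value, not updated, new value.
example_codes = unpack_state_codes(bytes([0b10010010]), 4)
```

Only sensors with code 2 contribute bytes to the record body, which is why records are variable-length and unchanged values cost two bits each.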
See docs/performance.md for benchmarks, memory analysis, and methodology.
Open a single DBD file as an xarray Dataset.
Parameters:
- filename (str or Path): Path to DBD file
- skip_first_record (bool): Skip first data record (default: True)
- repair (bool): Attempt to repair corrupted records (default: False)
- to_keep (list of str): Sensor names to keep (default: all)
- criteria (list of str): Sensor names for selection criteria
- drop_variables (list of str): Variables to exclude
Returns: xarray.Dataset
Open multiple DBD files as a single concatenated xarray Dataset.
Parameters:
- filenames (iterable): Paths to DBD files
- skip_first_record (bool): Skip first record in each file (default: True)
- repair (bool): Attempt to repair corrupted records (default: False)
- to_keep (list of str): Sensor names to keep (default: all)
- criteria (list of str): Sensor names for selection criteria
- skip_missions (list of str): Mission names to skip
- keep_missions (list of str): Mission names to keep
Returns: xarray.Dataset
The dbdreader2 API is derived from Lucas Merckelbach's
dbdreader library. xarray-dbd
provides drop-in DBD and MultiDBD classes that mirror the dbdreader API.
For a fully transparent swap, alias the import:
# Before (dbdreader)
import dbdreader
dbd = dbdreader.DBD("file.dcd", cacheDir="cache")
t, depth = dbd.get("m_depth")
mdbd = dbdreader.MultiDBD(filenames=files, cacheDir="cache")
t, temp, sal = mdbd.get_sync("sci_water_temp", "sci_water_cond")
# After (xarray-dbd) — same API
import xarray_dbd.dbdreader2 as dbdreader # drop-in replacement
dbd = dbdreader.DBD("file.dcd", cacheDir="cache")
t, depth = dbd.get("m_depth")
mdbd = dbdreader.MultiDBD(filenames=files, cacheDir="cache")
t, temp, sal = mdbd.get_sync("sci_water_temp", "sci_water_cond")

The top-level xarray_dbd namespace also re-exports DBD and MultiDBD
for convenience:
import xarray_dbd as xdbd
dbd = xdbd.DBD("file.dcd", cacheDir="cache")

| Feature | xarray-dbd dbdreader2 | dbdreader |
|---|---|---|
| get(*params) | Yes | Yes |
| get_sync(*params) | Yes (np.interp) | Yes (C ext) |
| parameterNames | Yes | Yes |
| parameterUnits | Yes | Yes |
| has_parameter() | Yes | Yes |
| get_xy(), get_CTD_sync() | Yes | Yes |
| decimalLatLon | Yes | Yes |
| set_time_limits() | Yes | Yes |
| include_source | Yes | Yes |
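The table notes that get_sync() here is built on np.interp. As a rough illustration of what that means for the returned values, the sketch below does the same kind of linear interpolation in plain Python: samples of a second sensor are resampled onto the first sensor's timestamps, and targets outside the source range take the nearest end value (np.interp's default). The helper `interp_onto` and the sample numbers are hypothetical; this is not the library's code.

```python
from bisect import bisect_right


def interp_onto(t_target, t_source, v_source):
    """Linearly interpolate (t_source, v_source) onto t_target.

    t_source must be sorted ascending. Targets outside the source range
    take the nearest end value.
    """
    out = []
    for t in t_target:
        if t <= t_source[0]:
            out.append(v_source[0])
        elif t >= t_source[-1]:
            out.append(v_source[-1])
        else:
            hi = bisect_right(t_source, t)  # first source time strictly after t
            lo = hi - 1
            frac = (t - t_source[lo]) / (t_source[hi] - t_source[lo])
            out.append(v_source[lo] + frac * (v_source[hi] - v_source[lo]))
    return out


# Made-up samples: depth timestamps every 2 s, pitch reported at other times.
t_depth = [0.0, 2.0, 4.0]
t_pitch = [1.0, 3.0]
pitch = [10.0, 20.0]
pitch_on_depth = interp_onto(t_depth, t_pitch, pitch)
print(pitch_on_depth)  # [10.0, 15.0, 20.0]
```

Because end values are held constant rather than extrapolated, synced values near the start or end of a file can be flat if the second sensor began reporting later than the first.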
Single file — get() one or more parameters:
import xarray_dbd.dbdreader2 as dbdreader
dbd = dbdreader.DBD("unit_123-2024-100-0-0.dcd", cacheDir="cache")
# Single parameter → (time, values)
t, depth = dbd.get("m_depth")
# Multiple parameters → list of (time, values) tuples
results = dbd.get("m_depth", "m_pitch", "m_roll")
for t, v in results:
    print(t.shape, v.shape)
dbd.close()

Synchronized reads - get_sync() and get_xy():
# get_sync: all values interpolated onto the first parameter's time base
t, depth, pitch = dbd.get_sync("m_depth", "m_pitch")
# get_xy: y interpolated onto x's time base (returns x, y arrays)
depth_vals, pitch_vals = dbd.get_xy("m_depth", "m_pitch")

Multi-file - MultiDBD:
# Explicit file list
mdbd = dbdreader.MultiDBD(filenames=["a.dcd", "b.dcd"], cacheDir="cache")
# Or glob pattern
mdbd = dbdreader.MultiDBD(pattern="/data/glider/*.dcd", cacheDir="cache")
t, depth = mdbd.get("m_depth")
print(f"{len(t)} records across {len(mdbd.filenames)} files")
mdbd.close()

CTD synchronization - get_CTD_sync():
mdbd = dbdreader.MultiDBD(filenames=ebd_files, cacheDir="cache")
tctd, C, T, P = mdbd.get_CTD_sync()
# Or with extra parameters synced to the CTD time base:
tctd, C, T, P, depth = mdbd.get_CTD_sync("m_depth")

Time limits:
mdbd = dbdreader.MultiDBD(pattern="*.dcd", cacheDir="cache")
print(mdbd.get_time_range()) # ['01 Jan 2024 00:00', '15 Jan 2024 23:59']
mdbd.set_time_limits("5 Jan", "10 Jan") # filter by file open time
t, depth = mdbd.get("m_depth") # only data from 5–10 Jan

Mission filtering:
# Exclude specific missions
mdbd = dbdreader.MultiDBD(
pattern="*.dcd", cacheDir="cache",
banned_missions=["initial.mi", "status.mi"],
)
# Or include only specific missions
mdbd = dbdreader.MultiDBD(
pattern="*.dcd", cacheDir="cache",
missions=["science_survey.mi"],
)
print(mdbd.mission_list) # unique mission names (sorted)

Complement files - automatic eng/sci pairing:
# Pair each .dcd with its .ecd counterpart (or vice versa)
mdbd = dbdreader.MultiDBD(
pattern="/data/*.dcd", cacheDir="cache",
complement_files=True,
)
# mdbd.parameterNames["eng"] + mdbd.parameterNames["sci"] both populated

- Lazy incremental loading. Construction only scans file headers and sensor metadata; no data records are read. Each get() call loads only the newly requested columns (plus the time variable) and caches them for future calls. This keeps peak RSS proportional to the sensors you actually use, not the total sensor count. Pass preload=["s1", "s2"] to batch additional sensors into the first get() call.
- skip_initial_line semantics. When reading multiple files, the first contributing file keeps all its records; subsequent files skip their first record. dbdreader skips the first record of every file. Multi-file record counts may therefore differ by up to N-1.
- Float64 output. get() always returns float64 arrays, matching dbdreader's behavior. Integer fill values (-127 for int8, -32768 for int16) are filtered out (with return_nans=False) or replaced with NaN (with return_nans=True).
- Time limits are per-file. set_time_limits() filters by file open time, including or excluding entire files. It does not filter individual records within a file. dbdreader also filters by file open time, so this is operationally the same for most use cases.
- Error handling. The same DbdError exception class and numeric error codes (DBD_ERROR_CACHE_NOT_FOUND, etc.) are provided for compatibility.
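The two fill-value behaviors described in the list can be pictured with a small plain-Python sketch. It uses the fill values quoted above and assumes that, when a sample is filtered out, its timestamp is dropped with it. The helper `clean_integer_sensor` is hypothetical and only mimics the described outcome of return_nans; it is not the library's implementation.

```python
import math

# Fill values quoted above for integer sensors.
FILL_VALUES = {"int8": -127, "int16": -32768}


def clean_integer_sensor(times, values, dtype, return_nans=False):
    """Return (times, values) as floats, with fill values dropped or set to NaN."""
    fill = FILL_VALUES[dtype]
    if return_nans:
        # Keep every sample so the output length matches the input length.
        cleaned = [math.nan if v == fill else float(v) for v in values]
        return list(times), cleaned
    # Otherwise drop filled samples together with their timestamps (assumption).
    kept = [(t, float(v)) for t, v in zip(times, values) if v != fill]
    return [t for t, _ in kept], [v for _, v in kept]


t_in = [0.0, 1.0, 2.0]
v_in = [5, -127, 7]
t_drop, v_drop = clean_integer_sensor(t_in, v_in, "int8")
t_nan, v_nan = clean_integer_sensor(t_in, v_in, "int8", return_nans=True)
```

The practical difference is array length: with return_nans=True the result lines up index-for-index with other sensors read the same way, while the default returns shorter arrays with no gaps to handle.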
DBD(filename, cacheDir=None, skip_initial_line=True, preload=None)

| Property | Type | Description |
|---|---|---|
| parameterNames | list[str] | Available sensor names |
| parameterUnits | dict[str, str] | {sensor: unit} mapping |
| timeVariable | str | "m_present_time" or "sci_m_present_time" |
| filename | str | Path to the opened file |
| headerInfo | dict | Header key-value pairs |

| Method | Returns | Description |
|---|---|---|
| get(*params, decimalLatLon=True, return_nans=False) | (t, v) or [(t, v), ...] | Extract parameter data |
| get_sync(*params) | (t, v0, v1, ...) | Interpolated to first param's time |
| get_xy(param_x, param_y) | (x, y) | y interpolated onto x's time |
| has_parameter(name) | bool | Check sensor availability |
| get_mission_name() | str | Mission name (lowercase) |
| get_fileopen_time() | float | File open time (epoch seconds) |
| close() | - | Release stored data |
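The decimalLatLon=True flag on get() refers to converting glider positions to decimal degrees. The sketch below shows that conversion under the assumption, usual for Slocum sensors such as m_lat and m_lon, that raw values are stored as DDMM.MMMM (degrees and decimal minutes). The function name `nmea_to_decimal` is hypothetical and is shown only to illustrate the arithmetic.

```python
import math


def nmea_to_decimal(value: float) -> float:
    """Convert DDMM.MMMM (degrees and decimal minutes) to decimal degrees."""
    if math.isnan(value):
        return value  # pass missing values through unchanged
    sign = -1.0 if value < 0 else 1.0
    magnitude = abs(value)
    degrees = math.floor(magnitude / 100.0)
    minutes = magnitude - degrees * 100.0
    return sign * (degrees + minutes / 60.0)


print(nmea_to_decimal(4430.0))    # 44.5   (44 degrees 30 minutes north)
print(nmea_to_decimal(-12415.0))  # -124.25 (124 degrees 15 minutes west)
```

Plotting raw DDMM.MMMM values directly is a common source of subtly wrong tracks, so it is worth checking which form a given array is in before mapping it.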
MultiDBD(
    filenames=None, pattern=None, cacheDir=None,
    complement_files=False, complemented_files_only=False,
    banned_missions=(), missions=(),
    max_files=None, skip_initial_line=True, preload=None,
)

| Property | Type | Description |
|---|---|---|
| parameterNames | dict[str, list] | {"eng": [...], "sci": [...]} |
| parameterUnits | dict[str, str] | Union of eng + sci units |
| filenames | list[str] | All loaded file paths |
| mission_list | list[str] | Unique mission names (sorted) |
| time_limits_dataset | tuple | (min_time, max_time) for full dataset |

| Method | Returns | Description |
|---|---|---|
| get(*params, decimalLatLon=True, return_nans=False) | (t, v) or [(t, v), ...] | Extract from combined eng+sci data |
| get_sync(*params, interpolating_function_factory=None) | (t, v0, v1, ...) | Synced to first param's time |
| get_xy(x, y, interpolating_function_factory=None) | (x, y) | y interpolated onto x's time |
| get_CTD_sync(*extra, interpolating_function_factory=None) | (t, C, T, P, ...) | CTD-synced data with quality filters |
| has_parameter(name) | bool | Check sensor availability |
| set_time_limits(minTimeUTC=None, maxTimeUTC=None) | - | Filter by file open time; triggers reload |
| get_time_range(fmt=...) | [start, end] | Formatted time range of current selection |
| get_global_time_range(fmt=...) | [start, end] | Formatted time range of entire dataset |
| close() | - | Release all data |
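Since set_time_limits() filters by file open time, its effect is to include or exclude whole files, not individual records. A minimal sketch of that selection logic follows. The `FileInfo` class and `select_files` function are hypothetical stand-ins, and treating both limits as inclusive is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical stand-in for one file's header summary."""
    name: str
    open_time: float  # epoch seconds, like the value get_fileopen_time() reports


def select_files(files, min_time=None, max_time=None):
    """Keep whole files whose open time lies inside the optional limits."""
    selected = []
    for info in files:
        if min_time is not None and info.open_time < min_time:
            continue
        if max_time is not None and info.open_time > max_time:
            continue
        selected.append(info)
    return selected


files = [
    FileInfo("a.dcd", 100.0),
    FileInfo("b.dcd", 200.0),
    FileInfo("c.dcd", 300.0),
]
inside = select_files(files, min_time=150.0, max_time=250.0)
```

One consequence of per-file filtering is that a file opened just before the lower limit is dropped entirely, even if most of its records fall inside the requested window.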
| Feature | xarray-dbd | dbd2netCDF |
|---|---|---|
| Language | C++ via pybind11 | C++ |
| xarray integration | Native | Via NetCDF |
| Installation | pip install | Compile from source |
| Dependencies | numpy, xarray | NetCDF, HDF5 libraries |
| Performance | Comparable | Fast |
| Multi-file | Built-in | Manual |
See examples/Examples.md for standalone scripts with plots and detailed documentation.
import xarray_dbd as xdbd
ds = xdbd.open_dbd_dataset('test.sbd')
# Print dataset info
print(ds)
# Get data dimensions
print(f"Number of records: {len(ds.i)}")
# List all variables
print("Variables:", list(ds.data_vars))
# Access sensor data
depth = ds['m_depth'].values
time = ds['m_present_time'].values
# Get attributes
print(f"Mission: {ds.attrs['mission_name']}")
print(f"Depth units: {ds['m_depth'].attrs['units']}")

from pathlib import Path

import matplotlib.pyplot as plt
import xarray_dbd as xdbd
# Read flight data
files = sorted(Path('.').glob('*.sbd'))
ds = xdbd.open_multi_dbd_dataset(files)
# Plot depth vs time
plt.figure(figsize=(12, 4))
plt.plot(ds['m_present_time'], ds['m_depth'])
plt.gca().invert_yaxis()
plt.xlabel('Time')
plt.ylabel('Depth (m)')
plt.title(f"Mission: {ds.attrs.get('mission_name', 'Unknown')}")
plt.show()

from pathlib import Path

import xarray_dbd as xdbd

# Read full resolution science data
files = sorted(Path('.').glob('*.ebd'))
ds = xdbd.open_multi_dbd_dataset(
files,
to_keep=['m_present_time', 'sci_water_temp', 'sci_water_cond']
)
# Convert to pandas for analysis
df = ds.to_dataframe()
print(df.describe())

- Python 3.10+ required: uses from __future__ import annotations for modern type-hint syntax.
- Free-threaded Python (3.13t): pybind11 extensions may crash under the no-GIL build; this is an upstream pybind11 limitation.
- Timestamps are raw floats: m_present_time values are Unix epoch seconds (float64). Convert with pandas.to_datetime(ds['m_present_time'], unit='s').
- No lazy loading for xarray API: open_dataset() reads all sensor data into memory. For very large deployments, use to_keep to select only needed sensors. The dbdreader2 API (DBD/MultiDBD) uses lazy incremental loading.
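For a quick check of a single timestamp without pandas, the standard library can do the same epoch-seconds conversion. This is only a sketch for individual values; the helper name `epoch_to_utc` is hypothetical, and NaN timestamps would need to be filtered out first because datetime.fromtimestamp() raises ValueError on NaN.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: float) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = epoch_to_utc(1704067200.0)
print(stamp.isoformat())  # 2024-01-01T00:00:00+00:00
```

Passing tz=timezone.utc matters: without it, fromtimestamp() returns local time, which silently shifts glider timestamps by the machine's UTC offset.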
| Problem | Fix |
|---|---|
| ImportError: _dbd_cpp | Reinstall with pip install -e . so that the C++ extension is compiled. |
| RuntimeError: ... cache ... | Pass cache_dir= pointing to the directory containing .cac/.ccc files. |
| Empty dataset (0 records) | Check that the file isn't a header-only stub (0 data records). |
| OSError: Failed to read | The file may be truncated or use an unsupported format version. Try repair=True. |
pip install -e ".[dev]"
pytest

ruff format xarray_dbd/
ruff check xarray_dbd/

See CONTRIBUTING.md for full development setup instructions.
This project is based on dbd2netCDF by Pat Welch and is licensed under the GNU General Public License v3.0.
- Original dbd2netCDF implementation: Pat Welch (pat@mousebrains.com)
- DBD format documentation: The Slocum glider community
- xarray backend interface: xarray developers
Contributions are welcome! Please feel free to submit issues or pull requests.
If you use this software in your research, please cite both this package and the original dbd2netCDF tool.