## Demo: Stage1 processing for mooring data

**Stage 1 Overview**: Stage 1 is the first processing stage. It converts raw instrument files into standardized, CF-compliant NetCDF files, using the `seasenselib` library to read each instrument format and the oceanarray framework to manage metadata.

### What Stage1 Does:
- **File Conversion**: Reads raw instrument files (SeaBird, RBR, Nortek, etc.) and converts to NetCDF
- **Standardization**: Applies CF conventions for variable names, units, and metadata
- **Data Preservation**: Preserves original data values without modification or filtering
- **Metadata Enrichment**: Adds deployment information from YAML configuration files
- **Organization**: Outputs files organized by instrument type in the processed directory

### Input Files:
- Raw instrument data files (various formats: `.cnv`, `.rsk`, `.dat`, `.mat`)
- YAML configuration files specifying mooring and instrument metadata
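
The extensions above correspond to the reader classes exercised later in this notebook. A minimal sketch of that correspondence, keyed purely on file extension, might look like the following. The lookup table and the `reader_name_for` helper are hypothetical illustrations; how `oceanarray.stage1` actually selects a reader (for example from the YAML configuration) is not shown here.

```python
from pathlib import Path

# Extension -> reader class name, matching the reader tests in this notebook.
# Keying on the extension alone is an assumption made for illustration.
READER_BY_EXTENSION = {
    ".cnv": "SbeCnvReader",            # Sea-Bird MicroCAT
    ".rsk": "RbrRskAutoReader",        # RBR Solo
    ".dat": "NortekAsciiReader",       # Nortek Aquadopp (also needs a .hdr file)
    ".mat": "AdcpMatlabRdadcpReader",  # ADCP data exported to MATLAB
}


def reader_name_for(filename):
    """Return the reader class name for a raw file, or None if unknown."""
    return READER_BY_EXTENSION.get(Path(filename).suffix.lower())


for name in ["DSC18_477102.dat", "DSE18_101647_20180827_1551.rsk", "notes.txt"]:
    print(f"{name}: {reader_name_for(name)}")
```

Returning `None` for an unknown extension lets the caller decide whether to skip the file or raise an error.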

### Output Files:
- Standardized NetCDF files: `{mooring}_{serial}_raw.nc`
- Processing log files for debugging and quality assurance
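
As a rough illustration of the naming pattern above, the output path could be assembled with `pathlib`. The helper name `stage1_output_path`, the `<proc_dir>/<mooring>/<instrument>/` layout, and the serial number `1234` are assumptions made for this sketch, not the module's documented behaviour.

```python
from pathlib import Path


def stage1_output_path(proc_dir, mooring, instrument, serial):
    """Build a Stage1 output path following {mooring}_{serial}_raw.nc.

    The <proc_dir>/<mooring>/<instrument>/ layout is assumed for illustration.
    """
    return Path(proc_dir) / mooring / instrument / f"{mooring}_{serial}_raw.nc"


# 1234 is a placeholder serial number, not a real instrument.
path = stage1_output_path("moor/proc", "dsE_1_2018", "microcat", 1234)
print(path.name)
```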

### Processing Flow:
1. Raw files → `seasenselib` readers → `xarray.Dataset`
2. Apply CF conventions and metadata from YAML
3. Preserve all original data values unchanged
4. Save as NetCDF with standardized naming

**Note**: Stage1 focuses purely on format conversion with no data modification. All processing (filtering, clock corrections, quality control, trimming) happens in Stage2 and subsequent stages.
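
Step 2 of the flow touches only metadata. A minimal sketch of that idea, using plain dictionaries in place of a parsed YAML file and a dataset's attributes, is shown below. The function `enrich_attrs` and all keys and values are hypothetical; the real configuration schema is not described in this notebook.

```python
import copy


def enrich_attrs(existing_attrs, mooring_meta, instrument_meta):
    """Return a new attribute dict with deployment metadata merged in.

    Instrument-level keys override mooring-level keys of the same name.
    The input dictionaries are left untouched.
    """
    merged = copy.deepcopy(existing_attrs)
    merged.update(mooring_meta)
    merged.update(instrument_meta)
    return merged


# Hypothetical metadata, shaped like what a parsed YAML file might contain.
reader_attrs = {"source_file": "DSE18_SBE37SM_RS232_03707518_2018_08_26.cnv"}
mooring_meta = {"mooring_name": "dsE_1_2018"}
instrument_meta = {"instrument": "microcat", "serial_number": 1234}

merged = enrich_attrs(reader_attrs, mooring_meta, instrument_meta)
for key, value in merged.items():
    print(f"{key}: {value}")
```

With an `xarray.Dataset`, the merged dictionary could be written back with `dataset.attrs.update(merged)`, which changes attributes only and leaves the data variables as they are.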

This notebook demonstrates the usage of the refactored `oceanarray.stage1` module.

In [None]:
import os
import sys
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from pathlib import Path

# For testing individual readers (optional)
from seasenselib.readers import (
    AdcpMatlabRdadcpReader,
    NortekAsciiReader,
    RbrRskAutoReader,
    SbeCnvReader,
)
from seasenselib.plotters import TimeSeriesPlotter

## Configuration

Set up the base directory and mooring lists for processing.

In [None]:
# Base directory containing the mooring data
basedir = '../examples/'

# Define mooring lists
single_test = ['dsE_1_2018']

# Choose which set to process
moorlist = single_test

print(f"Base directory: {basedir}")
print(f"Processing {len(moorlist)} moorings: {moorlist}")

## Testing Individual Readers

Test reading specific instrument files directly to debug issues.

In [None]:
# Test reading a Nortek Aquadopp file directly
try:
    rawdir = Path(basedir) / 'moor/raw/msm76_2018/'
    instrument = 'aquadopp'
    data_dir = rawdir / instrument
    fname = 'DSC18_477102.dat'
    filename = data_dir / fname
    header_file = filename.with_suffix('.hdr')  # same name, .hdr extension

    print(f"Data file: {filename}")
    print(f"Header file: {header_file}")
    print(f"Files exist: data={filename.exists()}, header={header_file.exists()}")

    if filename.exists() and header_file.exists():
        reader = NortekAsciiReader(str(filename), header_file_path=str(header_file))
        dataset = reader.get_data()

        print("\nDataset loaded successfully!")
        print(f"Variables: {list(dataset.data_vars)}")
        print(f"Time range: {dataset.time.min().values} to {dataset.time.max().values}")

        # Plot if east_velocity exists
        if 'east_velocity' in dataset.data_vars:
            plotter = TimeSeriesPlotter(dataset)
            plotter.plot(parameter_name='east_velocity')
            plt.title(f'East Velocity - {fname}')
            plt.show()

        # Display dataset info
        display(dataset)
    else:
        print("Files not found - skipping test")

except Exception as e:
    print(f"Error testing Nortek reader: {e}")

In [None]:
# Test reading an ADCP MATLAB file directly
try:
    rawdir = Path(basedir) / 'moor/raw/msm76_2018/'
    instrument = 'adcp'
    data_dir = rawdir / instrument
    fname = 'DS0218_RDI_000_24289.mat'
    filename = data_dir / fname

    print(f"ADCP file: {filename}")
    print(f"File exists: {filename.exists()}")

    if filename.exists():
        reader = AdcpMatlabRdadcpReader(str(filename))
        dataset = reader.get_data()

        print("\nADCP Dataset loaded successfully!")
        print(f"Variables: {list(dataset.data_vars)}")
        print(f"Time range: {dataset.time.min().values} to {dataset.time.max().values}")

        # Plot pressure if it exists
        if 'pressure' in dataset.data_vars:
            plt.figure(figsize=(12, 4))
            plt.plot(dataset.time, dataset.pressure)
            plt.title(f'Pressure - {fname}')
            plt.xlabel('Time')
            plt.ylabel('Pressure')
            plt.grid(True)
            plt.show()

        # Display dataset info
        display(dataset)
    else:
        print("File not found - skipping test")

except Exception as e:
    print(f"Error testing ADCP reader: {e}")

## Analyzing Processed Results

Load and visualize the processed NetCDF files.
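
A minimal sketch for locating and opening Stage1 output is given below. It assumes the processed files live under `../examples/moor/proc` and follow the `{mooring}_{serial}_raw.nc` pattern; both the directory and the helper `find_stage1_outputs` are assumptions for illustration. `xr.open_dataset` is the standard xarray call for reading NetCDF.

```python
from pathlib import Path


def find_stage1_outputs(proc_dir, mooring):
    """Return all Stage1 NetCDF files for one mooring, sorted by name."""
    proc_dir = Path(proc_dir)
    if not proc_dir.is_dir():
        return []
    # Search recursively so per-instrument subdirectories are covered.
    return sorted(proc_dir.rglob(f"{mooring}_*_raw.nc"))


proc_dir = Path("../examples") / "moor" / "proc"  # assumed location
files = find_stage1_outputs(proc_dir, "dsE_1_2018")
print(f"Found {len(files)} Stage1 file(s) in {proc_dir}")

if files:
    import xarray as xr  # only needed once there is a file to open

    # Open the first file and print its summary; the file is closed on exit.
    with xr.open_dataset(files[0]) as ds:
        print(ds)
```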

In [None]:
# Test reading a MicroCAT file directly
try:
    rawdir = Path(basedir) / 'moor/raw/msm76_2018/'
    instrument = 'microcat'
    data_dir = rawdir / instrument
    fname = 'DSE18_SBE37SM_RS232_03707518_2018_08_26.cnv'
    filename = data_dir / fname

    print(f"Microcat file: {filename}")
    print(f"File exists: {filename.exists()}")

    if filename.exists():
        reader = SbeCnvReader(str(filename))
        dataset = reader.get_data()

        print("\nMicrocat Dataset loaded successfully!")
        print(f"Variables: {list(dataset.data_vars)}")
        print(f"Time range: {dataset.time.min().values} to {dataset.time.max().values}")

        # Plot pressure if it exists
        if 'pressure' in dataset.data_vars:
            plt.figure(figsize=(12, 4))
            plt.plot(dataset.time, dataset.pressure)
            plt.title(f'Pressure - {fname}')
            plt.xlabel('Time')
            plt.ylabel('Pressure')
            plt.grid(True)
            plt.show()

        # Display dataset info
        display(dataset)
    else:
        print("File not found - skipping test")

except Exception as e:
    print(f"Error testing Microcat reader: {e}")

In [None]:
# Test reading an RBR Solo file directly
try:
    rawdir = Path(basedir) / 'moor/raw/msm76_2018/'
    instrument = 'rbrsolo'
    data_dir = rawdir / instrument
    fname = 'DSE18_101647_20180827_1551.rsk'
    filename = data_dir / fname

    print(f"RBR Solo file: {filename}")
    print(f"File exists: {filename.exists()}")

    if filename.exists():
        reader = RbrRskAutoReader(str(filename))
        dataset = reader.get_data()

        print("\nRBR Solo Dataset loaded successfully!")
        print(f"Variables: {list(dataset.data_vars)}")
        print(f"Time range: {dataset.time.min().values} to {dataset.time.max().values}")

        # Plot temperature if it exists
        if 'temperature' in dataset.data_vars:
            plt.figure(figsize=(12, 4))
            plt.plot(dataset.time, dataset.temperature)
            plt.title(f'Temperature - {fname}')
            plt.xlabel('Time')
            plt.ylabel('Temperature')
            plt.grid(True)
            plt.show()

        # Display dataset info
        display(dataset)
    else:
        print("File not found - skipping test")

except Exception as e:
    print(f"Error testing RBR Solo reader: {e}")
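
The four reader tests above share one shape: check that the files exist, build a reader, call `get_data()`, and report any error. One possible way to factor that out is sketched below. The helper `load_if_present` is hypothetical, and `FakeReader` is a stand-in used only so the sketch runs without instrument files or `seasenselib`.

```python
from pathlib import Path


def load_if_present(reader_factory, filename, *extra_files):
    """Run a reader on `filename` if it and all `extra_files` exist.

    `reader_factory` is any callable that takes the file path as a string
    and returns an object with a `get_data()` method.
    Returns the dataset, or None when a file is missing or reading fails.
    """
    paths = [Path(filename), *(Path(f) for f in extra_files)]
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        print(f"Skipping, files not found: {missing}")
        return None
    try:
        return reader_factory(str(filename)).get_data()
    except Exception as e:  # broad on purpose: this is a debugging aid
        print(f"Error reading {filename}: {e}")
        return None


class FakeReader:
    """Stand-in reader so the sketch runs without real instrument files."""

    def __init__(self, path):
        self.path = path

    def get_data(self):
        return {"source": self.path}


print(load_if_present(FakeReader, "does_not_exist.rsk"))
```

With the real readers, a call such as `load_if_present(RbrRskAutoReader, filename)` would follow the same pattern. For the Nortek reader, which needs a header file, a small wrapper like `lambda f: NortekAsciiReader(f, header_file_path=str(header_file))` could be passed as the factory, with `header_file` given as an extra file to check.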