# Parsing data from the Sonardyne FETCH AZA

The purpose of this notebook is to demonstrate the functionality of the `fetchAZA` Python package.

The demo is organised to show:

- Step 1: Reading the *.csv files into xarray datasets

- Step 2: Writing the xarray datasets into individual netCDF files

- Step 3: Further processing of the pressure and AZA sequence records

- Diagnostics: various plots and basic statistics


Note that when you submit a pull request, you should `clear all outputs` from your python notebook for a cleaner merge.
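
Outputs are usually cleared from the Jupyter menu. As a minimal sketch of what clearing does, the snippet below strips outputs from a notebook held as a plain dictionary in the nbformat v4 JSON layout. The helper name `clear_notebook_outputs` and the in-memory example notebook are hypothetical and are not part of `fetchAZA`.

```python
import json


def clear_notebook_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat v4 notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny in-memory notebook, used only for illustration
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_notebook_outputs(example)
print(cleaned["cells"][1]["outputs"])  # []
```

For a notebook on disk, the same function can be applied between `json.load` and `json.dump`. The command `jupyter nbconvert --clear-output --inplace <notebook>.ipynb` does the equivalent from a shell.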


In [1]:
import pathlib
import sys

script_dir = pathlib.Path().parent.absolute()
parent_dir = script_dir.parents[0]
sys.path.append(str(parent_dir))

import xarray as xr
import os
import numpy as np
import matplotlib.pyplot as plt
import importlib
import datetime
from fetchAZA import convertAZA, readers, writers, plotters, tools, timetools, utilities
import warnings
import re
import glob
import logging
_log = logging.getLogger(__name__)

# Specify the path for writing datafiles
data_path = os.path.join(parent_dir, 'data')
fig_path = os.path.join(parent_dir, 'figures')

warnings.filterwarnings("ignore", message="In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.")
# `message` is a regex matched against the start of the warning text, so it must not include the category name
warnings.filterwarnings("ignore", category=xr.SerializationWarning, message="Can't decode floating point timedelta to 's' without precision loss, decoding to 'ns' instead. To silence this warning use time_unit='ns' in call to decoding function.")


## Steps 1 & 2 combined: convertAZA.convertAZA



In [2]:
fn = 'sample_data.csv'
STN = 'sample'
deploy_date = '2023-02-27'
recovery_date = '2023-03-08T08:00:00'
latitude = 26.5
longitude = -76.75
water_depth = -3800

ds_pressure, ds_AZA = convertAZA.convertAZA(data_path, fn, STN, deploy_date, recovery_date, latitude, longitude, water_depth, True)

/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data*.nc
/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_KLR.nc
/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_AZAseq.nc
/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_INC.nc
/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_TMP.nc
/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_DQZ.nc
/Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_PIES.nc
Dataset AZAseq not included in combined datasets
Deleting file: /Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_KLR.nc
Deleting file: /Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_TMP.nc
Deleting file: /Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_DQZ.nc
Deleting file: /Users/eddifying/Cloudfree/gitlab-cloudfree/fetchAZA/data/sample_data_PIES.nc




## Step 1: Read the *.csv file of logging events

This is done with `readers.read_csv_to_xarray()`. Each type of logging event is read into its own xarray dataset, and the datasets are stored in a dictionary keyed by event type (for example `KLR`, `DQZ` or `PIES`). In addition, the AZA sequences (events following the pattern AZS-AZA-AZA-AZA-AZS) are read into a further dataset with the key `'AZAseq'`. Since this dataset does not contain every individual AZA or AZS event, it does not replace the individual datasets.
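
To make the AZS-AZA-AZA-AZA-AZS pattern concrete, here is a minimal sketch of how such sequences could be located in an ordered list of event labels. It is not the package's implementation: the function name `find_aza_sequences`, the example event list, and the choice to skip past each match (so that a closing AZS is never reused as the opening of the next sequence) are all assumptions made for illustration.

```python
def find_aza_sequences(events, pattern=("AZS", "AZA", "AZA", "AZA", "AZS")):
    """Return the start index of every run of event labels that matches `pattern`."""
    n = len(pattern)
    starts = []
    i = 0
    while i <= len(events) - n:
        if tuple(events[i:i + n]) == pattern:
            starts.append(i)
            i += n  # skip past the matched sequence
        else:
            i += 1
    return starts


# Hypothetical event stream: two complete sequences and one stray AZA (index 7)
events = ["KLR", "AZS", "AZA", "AZA", "AZA", "AZS",
          "KLR", "AZA",
          "AZS", "AZA", "AZA", "AZA", "AZS"]

print(find_aza_sequences(events))  # [1, 8]
```

The stray AZA at index 7 belongs to no complete sequence, which illustrates why a sequence-only dataset cannot replace the individual AZA and AZS datasets.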

A log of the processing is also generated.

Optionally, the deployment and recovery dates can be passed.  If they are, then the datasets will be sliced to these dates.
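
The effect of slicing to the deployment period can be sketched in plain Python. How the package performs the cut internally is not shown in this notebook; the helper `cut_to_window`, the sample records and the inclusive bounds are assumptions made for illustration.

```python
from datetime import datetime


def cut_to_window(records, deploy_date, recovery_date):
    """Keep (time, value) records with deploy_date <= time <= recovery_date."""
    start = datetime.fromisoformat(deploy_date)
    end = datetime.fromisoformat(recovery_date)
    return [(t, v) for t, v in records if start <= t <= end]


# Hypothetical records around the deployment window used in this notebook
records = [
    (datetime(2023, 2, 26, 12), 1.0),  # before deployment, dropped
    (datetime(2023, 2, 27, 6), 2.0),
    (datetime(2023, 3, 8, 7), 3.0),
    (datetime(2023, 3, 8, 9), 4.0),    # after recovery, dropped
]

kept = cut_to_window(records, "2023-02-27", "2023-03-08T08:00:00")
print([v for _, v in kept])  # [2.0, 3.0]
```

A date given without a time, such as `'2023-02-27'`, is interpreted here as midnight at the start of that day.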

In [None]:
fn = 'sample_data.csv'
STN = 'sample'
deploy_date = '2023-02-27'
recovery_date = '2023-03-08T08:00:00'

# Process filename
file_path = os.path.join(data_path, fn)
file_root = fn.split('.')[0]
platform_id = file_root
today = datetime.datetime.now()
start_time = today.strftime("%Y%m%dT%H")

# Create a log file (join the file name with data_path once only)
log_file = f"{platform_id}_{start_time}_read.log"
logf_with_path = os.path.join(data_path, log_file)
logging.basicConfig(
    filename=logf_with_path, 
    encoding='utf-8',
    format="%(asctime)s %(levelname)-8s %(funcName)s %(message)s",
    filemode="w", # 'w' to overwrite, 'a' to append
    level=logging.INFO,
    datefmt="%Y%m%dT%H%M%S",
    force=True,
    )
_log.info('Reading AZA from CSV to netCDF')
_log.info('Processing data from: %s', file_path)


# Process the CSV file and create xarray datasets containing the data
datasets = readers.read_csv_to_xarray(file_path)


## Step 2: Write the data to netCDF

In [None]:
writers.save_datasets(datasets, file_path)

## Step 3: Further processing of pressure records and AZA sequence records

Note that in the steps above, the original data were not changed, with the exception of changes noted in the log file.

This means that each of the newly created *.nc files mirrors the original data almost exactly.

Here we carry out additional steps including:

1. Load netCDF datasets based on provided keys. (`readers.load_netcdf_datasets(data_path, file_root, keys)`)
2. Convert units and adjust time formats. (`timetools.convert_seconds_to_float(ds)`)
3. Assign sampling time for the AZA sequence dataset. (`timetools.assign_sample_time()`)
4. Filter datasets to the deployment period. (`timetools.cut_to_deployment(datasets, deploy_date, recovery_date)`)
5. Reindex datasets on time. (`timetools.reindex_on_time(ds)`)
6. Rename variables in datasets using predefined mappings. (using `vars_to_rename`, a dict)
7. Add dataset-specific attributes to variables.
8. Combine selected datasets into a single dataset. (using `xr.merge()`)
9. Interpolate the combined dataset to an evenly spaced time grid. (after determining the median sampling interval, hourly for this dataset, with `timetools.calculate_sample_rate(ds)`)
10. Clean and organize dataset attributes and variables.
11. Process the AZA sequence dataset, including renaming attributes and cleaning variables.
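
Step 9 in the list above can be illustrated with a small, self-contained sketch: estimate the sampling interval as the median time difference, then linearly interpolate onto a regular grid with that spacing. This is not the implementation of `timetools.calculate_sample_rate`; the function names, the sample data and the choice of linear interpolation are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def median_interval_seconds(times):
    """Median spacing, in seconds, between consecutive timestamps."""
    diffs = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return median(diffs)


def interpolate_to_regular_grid(times, values, step_seconds):
    """Linearly interpolate onto a grid starting at times[0].

    Assumes `times` is strictly increasing and has at least two samples.
    """
    step = timedelta(seconds=step_seconds)
    grid, out = [], []
    t, j = times[0], 0
    while t <= times[-1]:
        # Advance j until times[j] <= t <= times[j + 1]
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0).total_seconds() / (t1 - t0).total_seconds()
        grid.append(t)
        out.append(values[j] + w * (values[j + 1] - values[j]))
        t += step
    return grid, out


# Hypothetical record: nominally hourly, with the third sample 5 minutes late
t_start = datetime(2023, 2, 27)
times = [t_start + timedelta(minutes=m) for m in (0, 60, 125, 180, 240)]
values = [0.0, 1.0, 2.0 + 5 / 60, 3.0, 4.0]

step = median_interval_seconds(times)
grid, regular = interpolate_to_regular_grid(times, values, step)
print(step)       # 3600.0
print(len(grid))  # 5
```

Using the median rather than the mean keeps the estimated interval at one hour even when a few samples arrive late or are missing.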


In [None]:
ds_pressure, ds_AZA = tools.process_datasets(data_path, file_root, deploy_date, recovery_date)
ds_pressure

### Save the data

In [None]:
# Save the datasets
output_file = os.path.join(data_path, f"{STN}_{deploy_date.replace('-','')}_use.nc")
writers.save_dataset(ds_pressure, output_file)

output_file = os.path.join(data_path, f"{STN}_{deploy_date.replace('-','')}_AZA.nc")
writers.save_dataset(ds_AZA, output_file)


### Plot variables in ds_pressure

In [None]:
# Example usage
plotters.plot_temperature_variables(ds_pressure, ['TEMPERATURE', 'PRESSURE'], "sample")

print("From the analysis, we determine that PRESSURE_DQZ and PRESSURE_PIES are identical.")
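
The statement that two pressure records are identical can be checked numerically rather than by eye. The sketch below compares two series element by element and treats a NaN in both series at the same position as a match. The function name `series_identical` and the sample values are hypothetical.

```python
import math


def series_identical(a, b, abs_tol=0.0):
    """True if two series match element-wise, treating NaN == NaN as a match."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if math.isnan(x) and math.isnan(y):
            continue
        if not math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


# Hypothetical pressure values, with a gap (NaN) in the same place in both series
p_dqz = [3801.25, float("nan"), 3801.5]
p_pies = [3801.25, float("nan"), 3801.5]
p_other = [3801.25, float("nan"), 3801.75]

print(series_identical(p_dqz, p_pies))                 # True
print(series_identical(p_dqz, p_other))                # False
print(series_identical(p_dqz, p_other, abs_tol=0.5))   # True
```

Assuming `ds_pressure` contains both variables, the same check could be run on `ds_pressure['PRESSURE_DQZ'].values.tolist()` and `ds_pressure['PRESSURE_PIES'].values.tolist()`.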


# Diagnostics & basic statistics

## 1. Plot variables against time

In [None]:
# Plot all variables in ds_pressure against time
#fig, axs = plotters.plot_all_variables_against_time(ds_pressure, time_var='RECORD_TIME')
fig, axs = plotters.plot_all_variables(ds_pressure, 'sample')

## 2. Histograms

In [None]:
# Example usage
plotters.plot_histograms(ds_pressure, "sample")

## 3. Compare pressure

### Plot the ambient (KLR) and transfer (DQZ) pressure

In [None]:
# Example usage
fig, axs = plotters.compare_pressure(ds_pressure, ['PRESSURE_DQZ', 'PRESSURE_KLR'], "sample")


### Plot the pressure during an AZA sequence (transfer, ambient and low)

In [None]:
variables_to_compare=['TRANSFER_PRESSURE','AMBIENT_PRESSURE','LOW_PRESSURE']

# Example usage
fig, axs = plotters.plot_AZA_pressure(ds_AZA, variables_to_compare)


In [None]:
# Plot all pressure variables in ds_AZA on the same axes with symbols and lines
plt.figure(figsize=(8, 4))
pressure_vars = ['TRANSFER_PRESSURE', 'AMBIENT_PRESSURE', 'LOW_PRESSURE']
time_var = 'SAMPLE_TIME'
# Define marker styles to cycle through
markers = ['o', 'v', 's', 'D', '^']

for idx, var in enumerate(pressure_vars):
    plt.plot(
        ds_AZA[time_var], 
        ds_AZA[var], 
        label=var, 
        marker=markers[idx % len(markers)], 
        linestyle='-', 
        markerfacecolor='none'
    )

plt.title('Pressure Variables in AZA')
plt.xlabel('Time')
plt.ylabel('Pressure (kPa)')
plt.legend()
plt.grid(True)
plt.gca().invert_yaxis()  # Invert the y-axis
plt.tight_layout()
plt.show()
