# SonarAI Technical Notebook

# Notebook Overview

In this notebook, we investigate potential correlations between environmental variables derived from NCAR datasets, such as **Sea Surface Temperature** and **Mean Evaporation Rate** (tbd), and the **abundance of fish observed within the water column**, as recorded by sonar.

The **sonar dataset** was collected by **NOAA Fisheries** during a research cruise conducted aboard the vessel *H.B. Bigelow* between **October and November 2019**. To facilitate analysis, we integrate this dataset with two NCAR environmental datasets (*[tbd]* and *[tbd]*), aligning them both **spatially** and **temporally**.

This integration enables us to examine statistical relationships between physical oceanographic conditions and fish distribution patterns.

## Notebook Structure

1. **Imports**
2. **Data Loading**
3. **Data Preprocessing**
4. **Data Visualization**


---

## Prerequisites
This section was inspired by [this template](https://github.com/alan-turing-institute/the-turing-way/blob/master/book/templates/chapter-template/chapter-landing-page.md) of the wonderful [The Turing Way](https://the-turing-way.netlify.app) Jupyter Book.


| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Cartopy](https://foundations.projectpythia.org/core/cartopy/cartopy) | Necessary | |
| [Understanding of NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf) | Helpful | Familiarity with metadata structure |
| Project management | Helpful | |


---

## Imports

In [1]:
import xarray as xr
import s3fs
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import requests
import io
import os

## Initializing the datasets

All datasets are accessed using the OSDF infrastructure.

In [2]:
bucket_name = 'noaa-wcsd-zarr-pds'
ship_name = "Henry_B._Bigelow"
cruise_name = "HB1906"
sensor_name = "EK60"

# Open the NOAA HB1906 (EK60) Zarr store anonymously via s3fs
s3_file_system = s3fs.S3FileSystem(anon=True)
zarr_store = f'{cruise_name}.zarr'
s3_zarr_store_path = f"{bucket_name}/level_2/{ship_name}/{cruise_name}/{sensor_name}/{zarr_store}"
store = s3fs.S3Map(root=s3_zarr_store_path, s3=s3_file_system, check=False)
cruise = xr.open_zarr(store=store, consolidated=None)

# Restrict the cruise to one time window and to depths between 10 and 250
start_time = "2019-10-16T15:00:00"
end_time = "2019-10-16T23:30:00"
timeslice = slice(start_time, end_time)
depths = slice(10, 250)
cruise = cruise.sel(time=timeslice, depth=depths, drop=False)
# Keep the channel closest to 38 kHz and load it into memory
cruise = cruise.sel(frequency=38000, method='nearest').compute()
# Keep only samples above the detected bottom
cruise = cruise.where(cruise.depth < cruise.bottom, drop=True)
# Ping times as a plain list, used later as the heatmap x axis
hm_timestamps = cruise.time.values.tolist()

# Location of one specific buoy on Georges Bank (longitude in the 0-360 convention)
target_lon = 360 - 66.546
target_lat = 41.088
print(f"Target coordinates: Longitude: {target_lon}, Latitude: {target_lat}")
# Daily sea surface temperature files (OISST AVHRR v2.1) for the day before,
# the day of, and the day after the sonar window
buoy_data_day_before = xr.open_dataset(
    'https://data.rda.ucar.edu/d277007/avhrr_v2.1/2019/oisst-avhrr-v02r01.20191015.nc#mode=bytes',
    engine='netcdf4')
buoy_data_actual_day = xr.open_dataset(
    'https://data.rda.ucar.edu/d277007/avhrr_v2.1/2019/oisst-avhrr-v02r01.20191016.nc#mode=bytes',
    engine='netcdf4')
buoy_data_day_after = xr.open_dataset(
    'https://data.rda.ucar.edu/d277007/avhrr_v2.1/2019/oisst-avhrr-v02r01.20191017.nc#mode=bytes',
    engine='netcdf4')

# SST at the grid point nearest the buoy location
sst_day_before = buoy_data_day_before['sst'].sel(lon=target_lon, lat=target_lat, method='nearest').values[0][0]
sst_actual_day = buoy_data_actual_day['sst'].sel(lon=target_lon, lat=target_lat, method='nearest').values[0][0]
sst_day_after = buoy_data_day_after['sst'].sel(lon=target_lon, lat=target_lat, method='nearest').values[0][0]
# Accessing a second dataset from NCAR, gridded satellite data, sampled hourly
url = 'https://data-osdf.rda.ucar.edu/ncar/rda/d633000/kerchunk/meanflux/Mean_evaporation_rate-osdf.json'
ds = xr.open_dataset(url, engine='kerchunk')

response = requests.get('https://data-osdf.rda.ucar.edu/ncar/rda/pythia_2025/osdf-cookbook/mae_error_map.npy')
response.raise_for_status()
error_map = np.load(io.BytesIO(response.content))

Target coordinates: Longitude: 293.454, Latitude: 41.088
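
The cell above converts the buoy longitude to the 0-360 convention and then relies on `method='nearest'` to pick a grid point. The sketch below reproduces that idea with the standard library only. The helper names `to_lon360` and `nearest_index` and the quarter-degree grid of cell centers are assumptions made for illustration; they are not read from the actual SST files.

```python
def to_lon360(lon_deg):
    """Map a longitude in degrees east (-180..180) onto the 0..360 convention."""
    return lon_deg % 360.0


def nearest_index(coords, value):
    """Return the index of the coordinate closest to `value`."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


# Hypothetical quarter-degree grid of cell centers (assumption, for illustration only)
lons = [0.125 + 0.25 * i for i in range(1440)]
lats = [-89.875 + 0.25 * j for j in range(720)]

target_lon = to_lon360(-66.546)  # equivalent to 360 - 66.546
target_lat = 41.088

i_lon = nearest_index(lons, target_lon)
j_lat = nearest_index(lats, target_lat)
print(f"nearest grid point: lon={lons[i_lon]}, lat={lats[j_lat]}")
```

On such a grid the selected cell center can sit a fraction of a grid step away from the requested coordinates, so the extracted value describes the surrounding grid cell rather than the exact buoy position.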


---

## Preprocessing

In [3]:
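# Mean evaporation rate at the grid point nearest the target location
mer = ds.sel(longitude=target_lon, latitude=target_lat, method='nearest')
# Keep forecasts initialized inside the sonar time window
mer = mer.sel(forecast_initial_time=slice(start_time, end_time))
# Keep at most 9 entries along each dimension, then take the first row as a
# plain list (one value per hourly sonar chunk)
mer = mer.MER.head(9).values.tolist()[0]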
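
The number 9 in `head(9)` lines up with the number of hourly bins in the sonar window. A minimal sketch of that count, using only the standard library, is shown below; `hourly_bins` is a hypothetical helper, not part of the notebook. Whether the nine MER values fall on the same clock hours depends on how the dataset lays out its forecast times, which is worth verifying separately.

```python
from datetime import datetime, timedelta


def hourly_bins(start_iso, end_iso):
    """List the hour-floored timestamps touched by the window [start, end]."""
    start = datetime.fromisoformat(start_iso).replace(minute=0, second=0, microsecond=0)
    end = datetime.fromisoformat(end_iso)
    bins = []
    current = start
    while current <= end:
        bins.append(current)
        current += timedelta(hours=1)
    return bins


# Same window as start_time and end_time in the data loading cell
bins = hourly_bins("2019-10-16T15:00:00", "2019-10-16T23:30:00")
print(len(bins), bins[0].isoformat(), bins[-1].isoformat())
```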

In [4]:
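def calculate_sv_mean(input_sv):
    """Average Sv values given in dB in the linear domain; return the mean in dB."""
    sv = 10. ** (input_sv / 10.)  # dB -> linear
    return 10 * np.log10(np.mean(sv))  # linear mean -> dB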
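
To see why the mean is taken in the linear domain rather than directly on the dB values, the sketch below compares the two on a pair of made-up numbers. It uses `math` instead of NumPy so it runs without extra packages; `mean_sv_db` is a hypothetical stand-in for `calculate_sv_mean`, and the sample values are illustrative, not measurements.

```python
import math


def mean_sv_db(sv_db_values):
    """Convert dB values to linear units, average them, and convert back to dB."""
    linear = [10.0 ** (v / 10.0) for v in sv_db_values]
    return 10.0 * math.log10(sum(linear) / len(linear))


samples_db = [-70.0, -50.0]  # made-up values for illustration
linear_mean = mean_sv_db(samples_db)
naive_mean = sum(samples_db) / len(samples_db)
print(f"linear-domain mean: {linear_mean:.2f} dB, arithmetic dB mean: {naive_mean:.2f} dB")
```

The linear-domain mean comes out near -53 dB while the arithmetic mean of the dB values is -60 dB, because the stronger echo dominates once the values are converted back to linear units.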

In [5]:
# Step 1: Floor each ping time to the start of its hour
cruise['time_hour'] = cruise['time'].dt.floor('1h')

# Step 2: Group by each hour
grouped = cruise.groupby('time_hour')

# Step 3: Extract each 1-hour Dataset as a chunk
chunks = [group.drop_vars('time_hour') for _, group in grouped]
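
The floor-then-group pattern used in this cell can be sketched without xarray as follows. The ping times are invented for illustration and `group_by_hour` is a hypothetical helper; the notebook itself relies on `dt.floor` and `groupby`.

```python
from collections import defaultdict
from datetime import datetime


def group_by_hour(timestamps):
    """Group timestamps by the start of the hour they fall in."""
    groups = defaultdict(list)
    for ts in timestamps:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        groups[hour_start].append(ts)
    return dict(groups)


# Made-up ping times for illustration
pings = [
    datetime(2019, 10, 16, 15, 0, 1),
    datetime(2019, 10, 16, 15, 59, 59),
    datetime(2019, 10, 16, 16, 0, 0),
    datetime(2019, 10, 16, 17, 30, 0),
]
hourly = group_by_hour(pings)
for hour_start, members in sorted(hourly.items()):
    print(hour_start.isoformat(), len(members))
```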

In [6]:
sv_hourly = []
mers = []

for i, chunk in enumerate(chunks):
    # Linear-domain mean of Sv for this hour, expressed in dB
    result = float(calculate_sv_mean(chunk['Sv']).values)
    # Label the hour with the first ping time in the chunk
    ts = pd.to_datetime(chunk['time'].values[0])

    sv_hourly.append([ts, result])
    # Pair the i-th MER value with the same hourly timestamp
    mers.append([ts, mer[i]])
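
Once both hourly series share the same timestamps, one simple way to quantify their relationship is a Pearson correlation coefficient. The sketch below is an assumption about how that could be done and is not part of the original analysis; `pearson_r` is a hypothetical helper and the input numbers are placeholders, not results from the cruise.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (not measured data)
sv_values = [-70.0, -68.0, -66.0, -64.0]
mer_values = [1.0, 2.0, 3.0, 4.0]
r = pearson_r(sv_values, mer_values)
print(f"r = {r:.3f}")
```

With the lists built in the loop, the values could be extracted as `[v for _, v in sv_hourly]` and `[v for _, v in mers]`. With only nine hourly points, any coefficient obtained this way should be read as exploratory.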


## Visualization

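The figure stacks two panels on a shared time axis. The top panel overlays the hourly mean Sv and the mean evaporation rate; the bottom panel shows the error map as a heatmap over time and depth.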

In [7]:
def plot_synchronized_heatmaps(
        heatmap_timestamps: np.ndarray,
        anomalies: np.ndarray = None,
        correlated_variables: list = None,
        depths: np.ndarray = None,
        colorscale: str = 'Reds',
):
    """Plot two time series above a depth-time heatmap on a shared x axis.

    `correlated_variables` holds two lists of [timestamp, value] pairs;
    `anomalies` is a 3-D array whose channel at index 1 is drawn as the heatmap.
    """
    # Report input sizes to help spot shape mismatches
    print("anomalies shape:", anomalies.shape)
    print("len(heatmap_timestamps):", len(heatmap_timestamps))
    print("len(depths):", len(depths))
    heatmap_timestamps = pd.to_datetime(heatmap_timestamps)

    # --- Create Subplots ---
    fig = make_subplots(
        rows=2, cols=1,
        shared_xaxes=True,
        # shared_yaxes=True,
        vertical_spacing=0.03
    )


    # cb_len1, cb_y1 = 0.28, 0.86
    cb_len, cb_y = 0.28, 0.14
    # print(correlated_variables[0])
    # Unpack sparse data
    timestamps_0, values_0 = zip(*correlated_variables[0])
    timestamps_1, values_1 = zip(*correlated_variables[1])

    # Add Scatter plots
    fig.add_trace(
        go.Scatter(
            x=timestamps_0,
            y=values_0,
            name="Sv (1h log-mean)",
            mode='lines+markers'
        ),
        row=1, col=1
    )

    fig.add_trace(
        go.Scatter(
            x=timestamps_1,
            y=values_1,
            name="Mean Evaporation Rate (MER)",
            mode='lines+markers'
        ),
        row=1, col=1
    )

    # --- Add Heatmap ---
    anomalies = anomalies[:,:,1]
    fig.add_trace(
        go.Heatmap(
            z=anomalies.astype(int),
            colorscale=colorscale,
            zmin=anomalies.min(),
            zmax=anomalies.max(),
            colorbar_len=cb_len,
            colorbar_y=cb_y,
            # name="Correlations",
            showscale=True,
            x=heatmap_timestamps,
            y=depths,
        ),
        row=2, col=1
    )
    fig.update_yaxes(autorange='reversed', row=2, col=1)
    # --- Save ---

    try:
        # Write the figure to the parent of the current working directory
        save_dir = os.path.dirname(os.getcwd())
        if save_dir:
            os.makedirs(save_dir, exist_ok=True)
        fig.write_html(os.path.join(save_dir, 'out.html'))
        print("Plot saved")
        fig.show()
    except Exception as e:
        print(f"Error saving or showing plot: {e}")

plot_synchronized_heatmaps(
    heatmap_timestamps=hm_timestamps,
    anomalies=error_map,
    correlated_variables=[sv_hourly, mers],
    depths=cruise.depth.values,
)

anomalies shape: (1088, 28096, 4)
len(heatmap_timestamps): 29211
len(depths): 1094
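
The printed sizes show that the error map (1088 rows by 28096 columns) does not match the axis lengths (1094 depths, 29211 timestamps). In a Plotly heatmap the rows of `z` run along `y` and the columns along `x`, so a small check before plotting can surface such mismatches early. The helper below is a hypothetical sketch, not part of the notebook.

```python
def heatmap_axis_mismatches(z_shape, n_x, n_y):
    """Compare the (rows, cols) of a heatmap array with the y and x axis lengths."""
    rows, cols = z_shape[:2]
    problems = []
    if rows != n_y:
        problems.append(f"y axis has {n_y} values but z has {rows} rows")
    if cols != n_x:
        problems.append(f"x axis has {n_x} values but z has {cols} columns")
    return problems


# Sizes taken from the printed output of the plotting cell
issues = heatmap_axis_mismatches((1088, 28096, 4), n_x=29211, n_y=1094)
for issue in issues:
    print(issue)
```

How to reconcile the sizes (for example by recomputing the error map on the same time and depth selection) depends on how the error map was produced, which this notebook does not show.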

