This repository holds code for a UW MSDS capstone project that analyzes ambient underwater noise levels in historical Orcasound hydrophone data. A hydrophone is an underwater microphone that can be used to monitor ocean noise levels. In the critical habitat of endangered Southern Resident killer whales, the predominant sources of anthropogenic noise pollution are commercial ships, followed by recreational boats.
This open source project has four main components:
- The pipeline that converts historical .ts files into compact Power Spectral Density (PSD) grids saved as parquet files, with the option to save with partitioning. Additionally, the pipeline converts ship tracking data from Marine Monitor (M2) at the Orcasound lab from zip files to parquet files.
- The partitioned_accessor, which reads the partitioned parquet files stored on S3.
- The ship metrics module, which calculates sound metrics for ship passages and generates Polars DataFrames.
- The most recent dashboard version, which displays key results using Taipy. The live dashboard is visible here.
The hydrophone and ship tracking data is currently being processed automatically by scheduled GitHub Actions in the orca-action-workflow.
2023 MSDS project guides to recreate the AWS environments used to process the hydrophone data can be found in the aws_batch directory.
This tutorial is designed to provide a step-by-step guide for utilizing this repository.
In the desired location, perform the following to clone a copy of the repository onto your machine.
git clone https://github.com/orcasound/ambient-sound-analysis.git
If you are creating a new virtual environment, make sure it includes ffmpeg and uses Python 3.11.
cd ambient-sound-analysis
conda env create -f environment.yml
conda activate orca_env
ffmpeg path issues can sometimes cause the following error when pulling the data:
UnboundLocalError: local variable 'clip_start_time' referenced before assignment
For more information, see: https://stackoverflow.com/questions/65836756/python-ffmpeg-wont-accept-path-why
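Before pulling data, a quick self-check can rule out the two setup problems mentioned above: a Python version other than 3.11 and an ffmpeg binary that is not on PATH. This is a minimal sketch using only the standard library; check_environment is a hypothetical helper, not part of the package.

```python
import shutil
import sys


def check_environment(required=(3, 11)):
    """Return a list of human-readable setup problems (empty if none were found)."""
    problems = []
    # Compare only major/minor, since the README asks for Python 3.11.
    if tuple(sys.version_info[:2]) != tuple(required):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {required[0]}.{required[1]} expected, found {found}")
    # shutil.which searches PATH the same way a subprocess launch would.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```

If ffmpeg is installed but reported missing, the conda environment is probably not activated in the shell you are using.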
To install the requirements for this repository, run:
python -m pip install .
You can also install directly from GitHub:
python -m pip install orcasound_noise@git+https://github.com/orcasound/ambient-sound-analysis
This repository includes regression tests that compare pipeline outputs against golden fixtures stored as .pkl files under tests/golden/.
- What is .pkl? A .pkl file is a Python pickle (serialized object) file. In this repo we use it to store pandas.DataFrame objects (PSD and broadband outputs) with their indexes and dtypes preserved.
- Why use it?
  - Fast to read/write in tests
  - Preserves DataFrame structure (timestamps, frequency columns, dtypes) without extra schema handling
  - Compact compared to many text formats
- Tradeoffs:
  - Python-specific and not human-readable
  - Not ideal for diffs in code review
  - Security: only unpickle files you trust (pickle can execute code during load)
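A golden-fixture check boils down to loading the trusted pickle and comparing it to freshly computed output, including index and dtypes. The sketch below shows that pattern with pandas.testing.assert_frame_equal on a small synthetic PSD-shaped frame; the helper names and the tolerance are illustrative assumptions, not the repository's actual test code.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def make_psd_like_frame():
    """Small synthetic frame shaped like a PSD: time index, frequency columns, dB values."""
    index = pd.date_range("2023-03-22 12:00", periods=3, freq="min")
    values = np.array([[80.0, 75.5], [81.2, 74.9], [79.8, 76.1]])
    return pd.DataFrame(values, index=index, columns=["63", "80"])


def check_against_golden(actual, golden_path):
    """Compare `actual` to a trusted pickle fixture; raises AssertionError on mismatch."""
    expected = pd.read_pickle(golden_path)  # only unpickle files you trust
    # Index, columns and dtypes must match; values may differ by a tiny relative tolerance.
    assert_frame_equal(actual, expected, check_exact=False, rtol=1e-6)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "psd_golden.pkl")
    make_psd_like_frame().to_pickle(path)               # stand-in for a committed fixture
    check_against_golden(make_psd_like_frame(), path)   # identical frames, so this passes
    print("fixture matches")
```

Using a relative tolerance rather than exact equality keeps the test stable across small floating-point differences between platforms.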
To regenerate the golden fixtures locally (requires ffmpeg):
python -m tests.generate_test_data
import pandas as pd
from orcasound_noise.pipeline.pipeline import NoiseAnalysisPipeline
from orcasound_noise.utils import Hydrophone
import datetime as dt
To access the hydrophone data, we set up a NoiseAnalysisPipeline object to pull from the S3 buckets. At minimum, the pipeline object requires specifying a hydrophone, a sample length (delta_t), and a frequency resolution (delta_f).
The following are the available hydrophones:
- Port Townsend: PORT_TOWNSEND
- Bush Point: BUSH_POINT
- Sunset Bay: SUNSET_BAY
- Orcasound Lab: ORCASOUND_LAB
In the following example, we collect 60-second samples at 1 Hz frequency resolution from the Port Townsend hydrophone. We recommend not exceeding 10-minute samples.
Note: The following code needs to run inside an if __name__ == '__main__': block.
#Example 1: Port Townsend, 1 Hz Frequency, 60-second samples
if __name__ == '__main__':
    pipeline = NoiseAnalysisPipeline(Hydrophone.PORT_TOWNSEND,
                                     delta_f=1, bands=None,
                                     delta_t=60, mode='safe')
We can also specify fractional octave bands (for example, bands=3 for 1/3 octave bands). Additionally, the pipeline may produce wav and parquet files in temporary folders. If desired, specify folder paths to save these files into.
#Example 2: Port Townsend, 1 Hz Frequency, 60-second samples,
# 1/3rd octave bands, and saving wav+pqt files
if __name__ == '__main__':
    pipeline2 = NoiseAnalysisPipeline(Hydrophone.PORT_TOWNSEND,
                                      delta_f=1, bands=3,
                                      delta_t=60, wav_folder='wav',
                                      pqt_folder='pqt',
                                      mode='safe')
Using the generate_parquet_file function, we can process the raw data from the S3 source and save the resulting PSDs in parquet files.
This function requires a start and end time as datetime objects (in UTC), with granularity down to the minute.
There are two parquet files generated, the PSD and the broadband view, and the function returns the paths to each.
The generate_parquet_file function calls upon generate_psds, which loads the .ts files from S3 and converts them to the desired PSDs.
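Because the date range is interpreted in UTC, times noted in local Pacific time need converting first. A minimal sketch, assuming the pipeline takes naive datetimes that already represent UTC (as the examples in this README do); to_naive_utc is a hypothetical helper. It uses a fixed UTC offset for simplicity; zoneinfo.ZoneInfo("America/Los_Angeles") would handle daylight saving time automatically.

```python
import datetime as dt


def to_naive_utc(local_time, utc_offset_hours):
    """Convert a naive local wall-clock time with a known UTC offset to a naive UTC datetime.

    Seconds and microseconds are truncated, since the pipeline accepts at most
    minute-level granularity.
    """
    tz = dt.timezone(dt.timedelta(hours=utc_offset_hours))
    utc = local_time.replace(tzinfo=tz).astimezone(dt.timezone.utc)
    return utc.replace(tzinfo=None, second=0, microsecond=0)


# 5am-6am Pacific Daylight Time (UTC-7) on 2023-03-22 is 12pm-1pm UTC
start = to_naive_utc(dt.datetime(2023, 3, 22, 5), -7)
end = to_naive_utc(dt.datetime(2023, 3, 22, 6), -7)
print(start, end)  # 2023-03-22 12:00:00 2023-03-22 13:00:00
```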
#Example: Using the pipeline object specified above, we generate the parquet files for 12pm - 1pm UTC
if __name__ == '__main__':
    psd_path, broadband_path = pipeline.generate_parquet_file(dt.datetime(2023, 3, 22, 12),
                                                              dt.datetime(2023, 3, 22, 13),
                                                              upload_to_s3=False)
Now that the data has been processed, it can be visualized as spectrograms and time series plots.
import pandas as pd
import matplotlib.pyplot as plt
from orcasound_noise.pipeline.acoustic_util import plot_spec, plot_bb
To read the parquet files generated in the previous section, we use the read_parquet() method from pandas.
psd_df = pd.read_parquet(psd_path)
bb_df = pd.read_parquet(broadband_path)
With the pandas DataFrames obtained from the parquet files, we can now visualize the PSD in a spectrogram using the plot_spec method.
plot_spec(psd_df)
This will create a spectrogram in plotly on your local machine.
To visualize the broadband dataframe, we plot it with the plot_bb method:
plot_bb(bb_df)
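To focus a plot on a particular event, it can help to trim the PSD DataFrame to a time window and frequency range before plotting. This is a sketch under two assumptions: the index is a DatetimeIndex and the frequency column labels are strings that parse as numbers (parquet stores column names as strings). slice_psd is a hypothetical helper, and the random frame stands in for pd.read_parquet(psd_path).

```python
import numpy as np
import pandas as pd


def slice_psd(psd_df, start, end, f_min, f_max):
    """Rows in [start, end] and columns whose numeric frequency label lies in [f_min, f_max]."""
    freqs = pd.to_numeric(pd.Series(psd_df.columns), errors="coerce").to_numpy()
    keep = (freqs >= f_min) & (freqs <= f_max)  # non-numeric labels become NaN and are dropped
    return psd_df.loc[start:end, psd_df.columns[keep]]


# Synthetic stand-in for pd.read_parquet(psd_path): one hour at 60-second resolution
idx = pd.date_range("2023-03-22 12:00", periods=60, freq="min")
cols = ["63", "80", "100", "125", "160"]
demo = pd.DataFrame(np.random.default_rng(0).normal(90, 3, (60, 5)), index=idx, columns=cols)

window = slice_psd(demo, "2023-03-22 12:10", "2023-03-22 12:19", 80, 125)
print(window.shape)  # (10, 3)
```

The trimmed frame can then be passed to plot_spec in place of the full psd_df.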
Download the repo, then
pip install -r requirements.txt
python taipy_visualization/main.py
The dashboard will open at localhost.
The Taipy dashboard has two main pages. Note: the hosted version is expected to expire at the end of March 2026; after that, the dashboard can still be run locally.
The main page provides an interactive view of the acoustic environment at the Orcasound Lab hydrophone:
- Ship & Whale Detection Timeline — A Gantt-style chart showing ship passages and OrcaHello whale detections over a user-selected date range. Includes an acoustic masking estimate indicating the percentage of time whale communication may have been interfered with by vessel noise.
- PSD Spectrogram — A Power Spectral Density spectrogram (1–16 kHz) for a selected date and hour, with an optional ship passage overlay. This visualization separates low-frequency vessel noise from the mid-to-high frequency bands used by Southern Resident killer whales (SRKWs).
- Broadband Sound Levels — Time-series plots of broadband noise across three frequency bands:
- Full Range — total integrated sound energy
- SRKW Communication Band (1–6 kHz) — the primary vocalization range for killer whale pulsed calls
- Ship Band (1–500 Hz) — the frequency range dominated by commercial vessel noise
- Ship Passage Details — When a ship passage is selected, a detail panel shows general tracking information (vessel type, speed, distance, duration) and acoustic metrics (broadband quantiles and level-to-source ratios).
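The band-limited broadband curves above can be understood as energy sums over the PSD bins inside each band. Levels in decibels cannot be added directly; they are converted to linear power, summed, and converted back. A minimal sketch, assuming cells hold PSD in dB per Hz with 1 Hz bins (delta_f=1) and numeric-string column labels; band_level_db is hypothetical and not necessarily how the dashboard computes its curves.

```python
import numpy as np
import pandas as pd


def band_level_db(psd_df, f_low, f_high, bin_width_hz=1.0):
    """Energy-sum the PSD columns with f_low <= frequency <= f_high into one level per row (dB)."""
    freqs = pd.to_numeric(pd.Series(psd_df.columns), errors="coerce").to_numpy()
    mask = (freqs >= f_low) & (freqs <= f_high)
    linear = 10.0 ** (psd_df.loc[:, mask].to_numpy() / 10.0)  # dB -> linear power
    total = linear.sum(axis=1) * bin_width_hz                  # integrate over frequency
    return pd.Series(10.0 * np.log10(total), index=psd_df.index)


demo = pd.DataFrame([[60.0, 60.0, 90.0], [70.0, 50.0, 40.0]], columns=["100", "200", "3000"])
print(band_level_db(demo, 1, 500))      # Ship Band (1-500 Hz)
print(band_level_db(demo, 1000, 6000))  # SRKW Communication Band (1-6 kHz)
```

Two equal 60 dB bins combine to about 63.01 dB (a doubling of power adds 10 log10(2) dB), which is why a single loud bin dominates a band level.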
The leaderboard ranks individual vessel passages by their acoustic impact:
- Filter by noise metric (broadband, communication band, or ship band), vessel type, and isolation status
- A sortable table of ship passages with summary acoustic statistics
- Selecting a passage displays detailed general and acoustic metrics for that track
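The leaderboard's filter-then-rank behavior can be sketched with pandas. The passage table and its column names (track_id, vessel_type, isolated, broadband_db) are hypothetical illustrations, not the dashboard's actual schema, and "isolated" is assumed to mean a passage without other vessels nearby.

```python
import pandas as pd

# Hypothetical passage table; column names and values are illustrative only.
passages = pd.DataFrame({
    "track_id": [101, 102, 103, 104],
    "vessel_type": ["Cargo", "Tanker", "Cargo", "Passenger"],
    "isolated": [True, True, False, True],
    "broadband_db": [118.2, 121.7, 124.9, 115.3],
})


def leaderboard(df, metric, vessel_type=None, isolated_only=False):
    """Filter passages, then rank them by `metric`, loudest first."""
    out = df
    if vessel_type is not None:
        out = out[out["vessel_type"] == vessel_type]
    if isolated_only:
        out = out[out["isolated"]]
    ranked = out.sort_values(metric, ascending=False).reset_index(drop=True)
    ranked.insert(0, "rank", list(range(1, len(ranked) + 1)))
    return ranked


print(leaderboard(passages, "broadband_db", isolated_only=True))
```

Filtering before ranking means ranks are relative to the selected subset, so the loudest isolated tanker can be rank 1 even if a non-isolated passage was louder.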
This project includes two analysis modules designed to make downstream querying and vessel-noise analysis easier:
The partitioned accessor provides a convenient interface for reading hydrophone parquet datasets that are stored with partitioning by hydrophone and date.
It is intended for efficient retrieval of subsets of large datasets without loading full archives into memory.
Use this module when you need to:
- query specific time windows or hydrophones from partitioned parquet data
- support analysis workflows that read directly from S3-backed parquet stores
- prepare filtered PSD/broadband data for plotting, modeling, or metrics pipelines
See also: src/orcasound_noise/analysis/README.md
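Partitioning pays off because a query only needs to touch the directories for the requested hydrophone and dates, rather than scanning the whole archive. The sketch below enumerates such prefixes for a date range. The hive-style layout hydrophone=<name>/date=<YYYY-MM-DD> and the bucket name are assumptions for illustration; see the module README above for the layout partitioned_accessor actually uses.

```python
import datetime as dt


def partition_prefixes(bucket, hydrophone, start, end):
    """List partition prefixes for every date from start to end, inclusive.

    The layout "hydrophone=<name>/date=<YYYY-MM-DD>/" is a hypothetical example.
    """
    prefixes = []
    day = start
    while day <= end:
        prefixes.append(f"s3://{bucket}/hydrophone={hydrophone}/date={day.isoformat()}/")
        day += dt.timedelta(days=1)
    return prefixes


for p in partition_prefixes("example-bucket", "orcasound_lab", dt.date(2023, 3, 22), dt.date(2023, 3, 24)):
    print(p)
```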
The ship metrics module computes vessel-noise summary metrics from processed hydrophone data and ship tracking context.
It is used to quantify acoustic characteristics during vessel passages and produce analysis-ready metric tables.
Use this module when you need to:
- calculate standardized ship-noise metrics for research or reporting
- summarize vessel passage sound levels across selected windows
- generate metric outputs for dashboards, notebooks, and comparative studies
See also: src/orcasound_noise/analysis/metrics/README.md
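One of the simplest passage metrics is a set of quantiles of the broadband level over the passage window. A minimal sketch, assuming a broadband Series in dB with a DatetimeIndex and a known passage start and end; quantiles are taken directly on the dB values, and passage_quantiles plus the chosen quantile levels are illustrative assumptions rather than the module's definitions.

```python
import numpy as np
import pandas as pd


def passage_quantiles(bb_series, start, end, qs=(0.05, 0.5, 0.95)):
    """Quantiles of broadband levels (dB) within the passage window [start, end]."""
    window = bb_series.loc[start:end]
    return window.quantile(list(qs))


# Synthetic 1-second broadband series rising from 100 dB by 0.1 dB per second
idx = pd.date_range("2023-03-22 12:00", periods=600, freq=pd.Timedelta(seconds=1))
bb = pd.Series(100 + np.arange(600) / 10, index=idx)

q = passage_quantiles(bb, "2023-03-22 12:01:00", "2023-03-22 12:02:00")
print(q)
```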
A Power Spectral Density describes the power present in the audio signal as a function of frequency, per unit frequency and for a given averaging time. In this codebase, PSD values are generally stored as pandas DataFrames, where the index represents the timestamps, the columns represent frequency bands, and each cell value represents the relative power in that frequency band and time interval, in decibels.
The sample duration, or delta_t (time interval), represents the number of seconds per sample. A duration of 1 means that timestamps are one second apart, and each data point represents the average noise level over one second in that frequency band. The default for generated data is 1 second duration.
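The relationship between delta_t and the number of rows is simple arithmetic: an hour of data has 3600 / delta_t rows. The sketch below builds an empty frame with the PSD layout described above (timestamp index, one column per frequency band) to make that concrete; psd_skeleton is a hypothetical helper, not a package function.

```python
import numpy as np
import pandas as pd


def psd_skeleton(start, hours, delta_t, freq_labels):
    """Empty PSD-shaped DataFrame: one row every delta_t seconds, one column per band."""
    n_rows = int(hours * 3600 // delta_t)
    index = pd.date_range(start, periods=n_rows, freq=pd.Timedelta(seconds=delta_t))
    return pd.DataFrame(np.nan, index=index, columns=freq_labels)


# One hour at the default delta_t=1 gives 3600 rows; delta_t=60 gives 60 rows.
print(psd_skeleton("2023-03-22 12:00", 1, 1, ["63", "80"]).shape)   # (3600, 2)
print(psd_skeleton("2023-03-22 12:00", 1, 60, ["63", "80"]).shape)  # (60, 2)
```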
The frequency band represents the frequency range over which the power is integrated. Within the vessel noise and marine bioacoustic literature, this is commonly done in fractions of an octave, e.g. 1/3 or 1/12 octave bands. Each column label gives the upper edge of its band. For example, in a 1/3 octave PSD, the 63 column represents the power from 0 to 63 Hz, while the 80 column represents the power from 63 to 80 Hz.
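Fractional octave frequencies are spaced geometrically: with a 1000 Hz reference and 1/N octave bands, consecutive frequencies are 1000 * 2^(k/N). The sketch below generates those exact base-2 values and assigns a frequency to the band whose upper edge is the first one at or above it, matching the "column label is the upper edge" convention described above. For N = 3 the exact values near the bottom are about 62.5, 78.7 and 99.2 Hz, which appear to correspond to the nominal labels 63, 80 and 100; whether the pipeline uses the exact or nominal values is an assumption here, and both helpers are hypothetical.

```python
import bisect
import math


def fractional_octave_uppers(n, f_min, f_max, f_ref=1000.0):
    """Exact base-2 1/n-octave frequencies f_ref * 2**(k/n) lying within [f_min, f_max]."""
    k_lo = math.ceil(n * math.log2(f_min / f_ref))
    k_hi = math.floor(n * math.log2(f_max / f_ref))
    return [f_ref * 2 ** (k / n) for k in range(k_lo, k_hi + 1)]


def band_for(freq_hz, uppers):
    """Index of the band (previous upper, this upper] containing freq_hz, or None if above all."""
    i = bisect.bisect_left(uppers, freq_hz)
    return i if i < len(uppers) else None


uppers = fractional_octave_uppers(3, 60, 100)
print([round(u, 1) for u in uppers])  # [62.5, 78.7, 99.2]
print(band_for(70, uppers))           # 1, i.e. the band labelled "80" (63 to 80 Hz)
```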
Hydrophones are referenced using the Hydrophone enum located in the utils package. These enums store all the relevant connection info for each hydrophone, including where to find the streamed .ts files and where to store the archived parquet files.
from orcasound_noise.utils import Hydrophone
my_hydrophone = Hydrophone.ORCASOUND_LAB
The S3 File connector provides an interface for interacting with the S3 buckets where data is stored. It is generally initialized within other objects and should rarely be used directly.
Currently, all files are available for download without authentication. To upload files, an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be available in the environment. This can be done by adding a .aws-config file to the root of your working folder (see example file), or by any other means of modifying the environment.
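Before starting a long processing run with uploads enabled, it can be worth confirming that both credential variables are actually set. This minimal sketch checks environment variables only; it does not read the .aws-config file or any other credential source, and missing_aws_credentials is a hypothetical helper.

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Names of required AWS variables that are unset or empty (checks os.environ by default)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_credentials()
if missing:
    print("Uploads will fail; set:", ", ".join(missing))
```

Downloads do not need these variables, so the check only matters when upload_to_s3=True.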
For example, to provide authentication programmatically:
import os
import datetime as dt
from orcasound_noise.pipeline import NoiseAnalysisPipeline
from orcasound_noise.utils import Hydrophone
# Set credentials in the environment
os.environ["AWS_ACCESS_KEY_ID"] = "my_id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "my_secret"
# Process January 2020 and upload the resulting parquet files to S3
if __name__ == '__main__':
    pipeline = NoiseAnalysisPipeline(Hydrophone.ORCASOUND_LAB, pqt_folder='pqt', delta_f=10, bands=3, delta_t=1)
    pipeline.generate_parquet_file(dt.datetime(2020, 1, 1), dt.datetime(2020, 2, 1), upload_to_s3=True)
- librosa - Used for audio spectral analysis.
- ffmpeg - Used for audio conversion.
- Taipy - Used for the dashboard presentation.
- orca-hls-utils - Used for HLS acquisition.
- polars and pandas - Used for DataFrame handling.
MSDS 2023 Project
MSDS 2024 Project
MSDS 2026 Project
- Clayton Brock GitHub LinkedIn
- Erin Mee GitHub LinkedIn
- Hua-Hsing Huang GitHub LinkedIn
- Srimant Mishra GitHub LinkedIn
This project is developed for Orcasound, an open-source community effort, with the primary goal of understanding how underwater noise may affect orcas in Puget Sound.
The datasets, analyses, and code in this repository are intended for research, education, and conservation-oriented analysis.
Ship passage data and derived ship sound metrics are included only to characterize the underwater acoustic environment and its potential effects on orcas.
Data quality, coverage, and processing assumptions may vary by source, location, and time period. Users should validate fitness for their own use case before drawing conclusions.
- Thanks to Valentina Staneva, Val and Scott Veirs, Ben Hendricks, and everyone else connected to the Orcasound org for their input and guidance.
- Thanks to Megan Hazen and the rest of the UW MSDS faculty for their teachings and guidance.
Erbe, Christine. Underwater Acoustics: Noise and the Effects on Marine Mammals. Jasco Applied Sciences, https://www.oceansinitiative.org/wp-content/uploads/2012/07/PocketBook-3rd-ed.pdf.
Gabriele, C. M., Ponirakis, D., & Klinck, H. (2021). Underwater Sound Levels in Glacier Bay During Reduced Vessel Traffic Due to the COVID-19 Pandemic. Frontiers in Marine Science, 8. https://doi.org/10.3389/fmars.2021.674787
Heise, Kathy, et al. “Proposed Metrics for the Management of Underwater Noise for Southern Resident Killer Whales.” Coastal Ocean Report Series. (2017). https://doi.org/10.25317/CORI20172.
Sound Measurement. Discovery of Sound in the Sea. https://dosits.org/science/measurement/
Veirs, V., & Veirs, S. (2018, November 9). Orcasound lab: a soundscape analysis case study in killer whale habitat with implications for coastal ocean observatories. https://orcasound.net/talks/2018-asa-vveirs/
Veirs, S. (2023, March 9). Salish Sea Bioacoustics: Marine Noise Pollution. https://www.emaze.com/@ALOWTWOLL/uwocean409-2023
Wall, Carrie C., et al. “The next wave of passive acoustic data management: How Centralized Access Can Enhance Science.” Frontiers in Marine Science, vol. 8, 14 July 2021, https://doi.org/10.3389/fmars.2021.703682.