Whale Call Detection

Overview | Setup | Data | Usage | Licenses

Overview

This repository implements a framework to detect whale calls embedded in seismic waveforms. Based on the spectrogram of a seismic waveform, the algorithm performs the following two tasks:

recognizing the presence of whale calls (classification task);
predicting the call time if a call is recognized (regression task).

Figure 1. An overview of the framework. The seismic data shown here is a sample of a blue whale call detected on station PMAQ; the timestamp of this call is '2021-10-02 07:13:33.02'. The waveform has been filtered with a [10, 32] Hz band-pass filter.

Setup

Installation

Download and install the latest Anaconda Python distribution.
Download and uncompress the repository here.
Execute the following commands to install all software requirements:

cd whale-call-detection
conda env create
pip install --editable .

(Optional) Install pre-commit hooks:

cd whale-call-detection
conda activate whale
pre-commit install

Installation (Docker)

Download and install Docker.
Execute the following commands to install all software requirements to a Docker image:

cd whale-call-detection
docker build -t whale-call-detection .

The following commands will mount the current source code and provide access to the docker container's terminal:

cd whale-call-detection
docker run -it --rm \
    -v `pwd`:/home/whale-call-detection \
    -p 5000:5000 \
    -p 8888:8888 \
    whale-call-detection \
    /bin/bash

Using the GPU from inside the Docker container

To use the GPU from within the Docker container, make sure to install nvidia-docker and set the Docker runtime to nvidia. For example:

cd whale-call-detection
docker run -it --rm --runtime=nvidia \
    -v `pwd`:/home/whale-call-detection \
    -p 5000:5000 \
    -p 8888:8888 \
    whale-call-detection \
    /bin/bash

Data

The raw seismic data used in this study is publicly available via Natural Resources Canada's FTP server: ftp://ftp.seismo.nrcan.gc.ca/. The labels (blue and fin whale calls) are generated using the algorithm and code developed by Plourde and Nedimovic [2022]. For more details on data availability and preprocessing, please check this documentation.

The directory format of processed waveform data is:

├── root_data_dir/
│   ├── 20200201/
│   │   ├── 2021.06.06.CN.CNQ..EHZ.SAC
│   │   ├── 2021.06.06.CN.ICQ..HHE.SAC
│   │   ├── ...
│   ├── 20200202/
│   │   ├── 2020.02.02.CN.CNQ..EHZ.SAC
│   │   ├── 2020.02.02.CN.ICQ..HHE.SAC
│   │   ├── ...
│   ├── ...
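
The layout above suggests one folder per day (`YYYYMMDD`) holding files named `YYYY.MM.DD.<network>.<station>.<location>.<channel>.SAC`. The following is a minimal sketch, assuming that the folder date and the file date match (as in the `20200202/` example) and that the empty field between the station and the channel is a blank location code. The helper `sac_path` is hypothetical and not part of this repository:

```python
from datetime import date
from pathlib import PurePosixPath


def sac_path(root: str, day: date, station: str, channel: str,
             network: str = "CN", location: str = "") -> PurePosixPath:
    """Build <root>/<YYYYMMDD>/<YYYY.MM.DD>.<NET>.<STA>.<LOC>.<CHA>.SAC.

    Hypothetical helper: the naming pattern is inferred from the tree above.
    """
    folder = day.strftime("%Y%m%d")
    # An empty location code yields the double dot seen in the file names.
    stem = ".".join([day.strftime("%Y.%m.%d"), network, station, location, channel])
    return PurePosixPath(root) / folder / f"{stem}.SAC"


print(sac_path("root_data_dir", date(2020, 2, 2), "CNQ", "EHZ"))
# root_data_dir/20200202/2020.02.02.CN.CNQ..EHZ.SAC
```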

The format of the directory used to initialize an instance of WhaleDataModule is:

├── root_data_dir/
    ├── train.csv
    ├── valid.csv
    ├── test.csv

Here is the dataset dictionary:

Column	Explanation	Type	Example
`file_path`	path to the `.SAC` file with its `component` replaced with `CHANNEL`	`str`	`/root_data_dir/20210822/2021.08.22.CN.SNFQ..CHANNEL.SAC`
`time_window_start`	signal window start time	`str`	`2021-08-22 05:56:45.38`
`time_window_end`	signal window end time	`str`	`2021-08-22 05:57:01.38`
`time_R_max`	target call time, i.e. the time with the maximum whale index (R) value	`str`	`2021-08-22 05:56:54.01`
`time_call_start`	the start time of a whale call ($\sim$ 1s for fin whale calls and $\sim$ 8s for blue whale calls)	`str`	`2021-08-22 05:56:50.01`
`time_call_end`	the end time of a whale call ($\sim$ 1s for fin whale calls and $\sim$ 8s for blue whale calls)	`str`	`2021-08-22 05:56:58.01`
`R`	the whale call index value (fixed as `0.0` for noise samples)	`float`	`19.7`
`SNR`	the signal-to-noise ratio of the whale call (fixed as `-99.0` for noise samples)	`float`	`21.22`
`station_code`	station where the signal is detected	`str`	`SNFQ`
`whale_type`	type of signal (`0` for noise samples and `1` for a whale call)	`int`	`1`
`component`	space-separated list of components available for the given station	`str`	`HHE HHN HHZ`
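
To illustrate how the columns fit together, the sketch below takes the example values from the table, expands the `CHANNEL` placeholder in `file_path` into one path per entry of `component`, and measures where `time_R_max` falls inside the signal window. The helper names are hypothetical, and expressing the call time as an offset in seconds from `time_window_start` is only an assumption about how a regression target could be derived, not a description of the repository's code:

```python
from datetime import datetime

# Timestamp layout used by the example values in the table.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Example row built from the values shown in the dataset dictionary.
row = {
    "file_path": "/root_data_dir/20210822/2021.08.22.CN.SNFQ..CHANNEL.SAC",
    "time_window_start": "2021-08-22 05:56:45.38",
    "time_window_end": "2021-08-22 05:57:01.38",
    "time_R_max": "2021-08-22 05:56:54.01",
    "component": "HHE HHN HHZ",
}


def component_paths(row: dict) -> list:
    """Return one .SAC path per component listed in the row."""
    return [row["file_path"].replace("CHANNEL", c) for c in row["component"].split()]


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed time in seconds between two timestamp strings."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


paths = component_paths(row)
window_length = seconds_between(row["time_window_start"], row["time_window_end"])
call_offset = seconds_between(row["time_window_start"], row["time_R_max"])

print(paths[0])        # /root_data_dir/20210822/2021.08.22.CN.SNFQ..HHE.SAC
print(window_length)   # 16.0
```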

Split

To split the dataset into training, validation and test sets, we have developed a script, scripts/split_data.py.

python scripts/split_data.py -h
usage: split_data.py [-h] [--input-file INPUT_FILE] [--output-path OUTPUT_PATH]
Script to apply bandpass filter to a list of SAC files
optional arguments:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        Path to dataset (.csv) (default: data/LABELS/FW/MIXED/fw_HQ_component_grouped_filt.csv)
  --output-path OUTPUT_PATH
                        Path to output folder. (default: data/datasets/FWC_HQ_3CH_FILT/)

The dataset is split into 3 subsets using the same random seed: 80% for the training set, 10% for the validation set, and 10% for the test set.
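
The idea of a reproducible 80/10/10 split can be sketched as follows. This is not the code of scripts/split_data.py; it is a minimal standard-library illustration, and the function name `split_indices` and the default seed are assumptions:

```python
import random


def split_indices(n_rows: int, seed: int = 0, fractions=(0.8, 0.1, 0.1)):
    """Shuffle row indices with a fixed seed and cut them into three subsets."""
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


train, valid, test = split_indices(100)
print(len(train), len(valid), len(test))  # 80 10 10
```

Reusing the same seed reproduces the same three subsets, which keeps results comparable across experiments.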

Usage

Training

To train a long short-term memory (LSTM) network, please check LSTM for more details. Samples of labels for training can be found here. Please note that raw waveform data is not included and can be downloaded following the instructions in data.md.

Making predictions

To make predictions using a trained model (available here):

python scripts/predict.py -h
usage: predict.py [-h] [--model-ckpt MODEL_CKPT] [--inp-csv INP_CSV] [--out-csv OUT_CSV]
                  [--batch-size BATCH_SIZE]

Make prediction using a pretrained model

optional arguments:
  -h, --help            show this help message and exit
  --model-ckpt MODEL_CKPT
                        path to the pretrained model checkpoint (default: model.ckpt)
  --inp-csv INP_CSV     path to the input csv file (default: samples.csv)
  --out-csv OUT_CSV     path to the predictions csv file (default: predictions.csv)
  --batch-size BATCH_SIZE
                        batch size for prediction (default: 16)

The samples.csv has 4 columns: [file_path,time_window_start,time_window_end,component].
The predictions.csv has 6 columns: [file_path,time_window_start,time_window_end,component,label_pred,time_pred].
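
An input file with the four expected columns can be prepared with the standard `csv` module. The sketch below is illustrative only: `write_samples` and `has_prediction_columns` are hypothetical helpers, the row reuses the example values from the dataset dictionary, and an in-memory buffer stands in for samples.csv:

```python
import csv
import io

SAMPLE_COLUMNS = ["file_path", "time_window_start", "time_window_end", "component"]
PREDICTION_COLUMNS = SAMPLE_COLUMNS + ["label_pred", "time_pred"]


def write_samples(rows, handle) -> None:
    """Write rows (dicts keyed by SAMPLE_COLUMNS) as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=SAMPLE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def has_prediction_columns(handle) -> bool:
    """Check that a predictions file carries the six documented columns."""
    return csv.DictReader(handle).fieldnames == PREDICTION_COLUMNS


rows = [{
    "file_path": "/root_data_dir/20210822/2021.08.22.CN.SNFQ..CHANNEL.SAC",
    "time_window_start": "2021-08-22 05:56:45.38",
    "time_window_end": "2021-08-22 05:57:01.38",
    "component": "HHE HHN HHZ",
}]

# With a real file, use open("samples.csv", "w", newline="") instead.
buffer = io.StringIO()
write_samples(rows, buffer)
print(buffer.getvalue().splitlines()[0])
# file_path,time_window_start,time_window_end,component
```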

Comet experiment logging

Experiments are tracked using Comet.
Please check this quickstart for more details on setting up Comet.
The history of experiments can then be visualized in your online Comet dashboard.

Consulting Optuna logs

Optuna logs from an optuna.sqlite3 database located in the current directory can be consulted as follows:

optuna-dashboard sqlite:///optuna.sqlite3

If the database is located elsewhere, pass its absolute path instead:

optuna-dashboard sqlite:///ABSOLUTE_PATH_TO_OPTUNA.SQLITE3_FILE
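
Because an absolute POSIX path already starts with `/`, the resulting URL contains four consecutive slashes, which is easy to get wrong. A small sketch of building the URL, where the helper `sqlite_url` and the example path are hypothetical:

```python
from pathlib import PurePosixPath


def sqlite_url(db_path: str) -> str:
    """Return the sqlite:/// URL for an absolute POSIX path to a database file."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {db_path!r}")
    # "sqlite:///" followed by "/home/..." gives four slashes in total.
    return f"sqlite:///{path}"


print(sqlite_url("/home/user/experiments/optuna.sqlite3"))
# sqlite:////home/user/experiments/optuna.sqlite3
```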

Licenses

Models

Not applicable, as no pre-trained models are used.

Datasets

The seismograph data is licensed under the Open Government License - Canada.

Packages

Package	Version	License
optuna	3.1.0	MIT License
optuna-dashboard	0.10.0	MIT License
pandas	1.4.3	BSD 3-Clause License
transformers	4.20.1	Apache 2.0 License
torchaudio	0.12.1	BSD 2-Clause License
torch	1.12.1	BSD 3-Clause License
pytorch_lightning	1.9.2	Apache 2.0 License
plotly	5.9.0	MIT License
obspy	1.3.0	LGPL v3.0
matplotlib	3.6.3	Customized License
wget	3.2	GNU General Public License
types-pyyaml	6.0.12.6	Apache 2.0 License
jsonargparse[signatures]	4.20.0	MIT License
wandb	0.15.8	MIT License
