Pod.Cast 🎱 🐋 | Annotation system

License: MIT

Developed by Prakruti Gogia, Akash Mahajan, and Nithya Govindarajan during Microsoft AI4Earth & OneWeek hackathons. (This is volunteer-driven and is not an official product.)

For a general introduction to the Pod.Cast project (initiated in 2019) and its relationship to other AI for Orcas efforts, please read the Pod.Cast project general overview at ai4orcas.net.

Technical Overview

podcast_server.py is a prototype Flask-based web app for labelling unlabelled bioacoustic recordings while viewing predictions from a model. It is useful for setting up quick-and-dirty labelling sessions that don't need advanced features such as automated model inference, user access roles, interfacing with other backends, gamification, etc.

(See prediction-explorer for a related tool to quickly visualize & browse model predictions on a set of audio files; it runs locally.)

Screenshot of Pod.Cast annotation UI

  • Each page/session gets a unique URL (via the sessionid URL param) that you can share if you find something interesting
  • Refer to the instructions on the page for how to edit model predictions or create annotations
  • The progress bar tracks how many sessions in the current "round" of unlabelled sessions have had annotations submitted
  • If you aren't sure, or want to see a new session, skip & refresh loads a random (un-annotated) session without submitting anything

Dataset Creation

This tool has been used in an active-learning loop to create & release new training & test sets at orcadata/wiki.

  • To do so, a candidate 2-3 hr window with likely activity (reported by sighting networks / Orcasound listeners) is identified. Data is processed from Orcasound's S3 archives as follows:
    • Format conversion (HLS -> concatenated wav file)
    • Audio is split into easily browsable 1-minute "sessions"
    • Candidates are selected for labelling using predictions from an ML model, with a mid-low threshold tuned for high recall. This discards data that is very likely empty and prioritizes labelling effort (see the sketch after this list).
  • Each round generates new labelled data that improves models trained on it, making them more robust to the varied acoustic conditions at different hydrophone nodes.
  • Held-out test sets have also been created in a similar fashion, as accuracy and robustness benchmarks.
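As a rough illustration of that selection step, the sketch below keeps only the 1-minute segments whose model score clears a mid-low, recall-oriented threshold. The score_segment function and the threshold value are placeholders, not the actual pipeline (which lives upstream of this repo):

```python
# Illustrative sketch only; score_segment stands in for a real call-detection model.
import random
from pathlib import Path

def score_segment(wav_path):
    """Hypothetical inference call returning P(orca call) for one 1-min wav."""
    return random.random()  # placeholder for real model inference

THRESHOLD = 0.3  # mid-low threshold: favours recall over precision

def select_candidates(session_dir, threshold=THRESHOLD):
    """Keep segments worth human labelling effort; discard likely-empty audio."""
    return [wav for wav in sorted(Path(session_dir).glob("*.wav"))
            if score_segment(wav) >= threshold]
```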

Flowchart of feedback loop between model & human listeners

Architecture

This prototype is a single-page application with a simple Flask backend that interfaces with Azure Blob Storage. For simplicity and ease of access, this version also uses blob storage as a makeshift database: a JSON file acts as a single entry, and separate containers act as tables/collections. (For this hack, that makes it easy to do quick-and-dirty viewing/editing in Azure Storage Explorer, or any equivalent blob viewer for S3 etc.)

Architecture diagram showing API interactions between frontend, backend & blob storage

Backend API:

GET /fetch/session/roundid

Scans the getcontainer blob container for unlabelled sessions, randomly picks one, and returns a {sessionid=X} response. The sessionid is simply the name of the corresponding X.JSON file on the blob. Also updates/resets the internal global variable backend_state that contains info for the progress bar.
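A minimal sketch of what this handler could look like, assuming the v12 azure-storage-blob SDK and that "unlabelled" means no matching blob exists yet in postcontainer; the actual podcast_server.py may differ:

```python
# Sketch only -- container names and the backend_state shape are assumptions.
import random
from flask import Flask, jsonify
from azure.storage.blob import BlobServiceClient

app = Flask(__name__)
service = BlobServiceClient.from_connection_string("<connection string from CREDS.yaml>")
backend_state = {"total": 0, "remaining": 0}  # feeds the progress bar

@app.route("/fetch/session/<roundid>")
def fetch_session(roundid):
    all_sessions = {b.name for b in service.get_container_client("getcontainer").list_blobs()}
    submitted = {b.name for b in service.get_container_client("postcontainer").list_blobs()}
    unlabelled = sorted(all_sessions - submitted)
    backend_state.update(total=len(all_sessions), remaining=len(unlabelled))
    if not unlabelled:
        return jsonify(sessionid=None)  # round complete
    # the blob name (minus .json) doubles as the session id
    return jsonify(sessionid=random.choice(unlabelled).rsplit(".", 1)[0])
```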

GET /load/session/roundid/sessionid

GET <wav file URI on Azure blob storage>

Fetches the corresponding JSON file from the getcontainer blob. (For an example, see example-load.json.) The JSON file contains backend_state for the progress bar, and a uri that points the client directly at the corresponding audio file on blob storage; the browser then retrieves the wav with the second GET above.
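Continuing the sketch above, a hedged illustration of the load handler (schema per example-load.json; the details here are assumptions):

```python
# Sketch only, continuing from the fetch example above.
import json

@app.route("/load/session/<roundid>/<sessionid>")
def load_session(roundid, sessionid):
    raw = service.get_container_client("getcontainer") \
                 .download_blob(f"{sessionid}.json").readall()
    payload = json.loads(raw)                  # same schema as example-load.json
    payload["backend_state"] = backend_state   # progress-bar info
    return jsonify(payload)                    # includes the direct wav `uri`
```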

POST /submit/session/roundid/sessionid

Writes a JSON file to the postcontainer blob. (For an example, see example-submit.json, which has the same schema.) Also updates the internal global variable backend_state that contains info for the progress bar.
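And a matching sketch for the submit handler, again assuming the v12 SDK and the names used in the examples above:

```python
# Sketch only, continuing from the examples above.
from flask import request

@app.route("/submit/session/<roundid>/<sessionid>", methods=["POST"])
def submit_session(roundid, sessionid):
    # persist the annotation JSON (schema as in example-submit.json)
    service.get_container_client("postcontainer").upload_blob(
        f"{sessionid}.json", json.dumps(request.get_json()), overwrite=True)
    backend_state["remaining"] = max(0, backend_state["remaining"] - 1)
    return jsonify(status="ok")
```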

Client logic:

Primary logic is defined in main.js.

  • fetchUrl, dataUrl, postUrl in index.html define the API endpoints above
  • The client first checks for the sessionid URL parameter & runs loadSession or fetchAndLoadSession as appropriate
  • This happens on page load and whenever a submit/skip button is clicked

Use & setup

Setup & local debugging

  1. Create an isolated Python environment, then pip install --upgrade pip && pip install -r requirements.txt. (Python 3.6.8 has been tested, though recent versions should also work as the dependencies are quite simple.)

  2. Set the environment variables FLASK_APP=podcast_server.py and FLASK_ENV=development. If you haven't made your own CREDS file yet, see step 3. Once that's done, start the server from this directory with python -m flask run, and open the link shown in the terminal (e.g. http://127.0.0.1:5000/) in your browser (Edge and Chrome are tested).

  3. The CREDS.yaml file specifies how the backend authenticates with blob storage & the specific container names to use. The provided file is a template and should be replaced with your own values, along these lines (the field names below are illustrative; match the template in the repo):
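```yaml
# Illustrative template only -- use the keys from the repo's CREDS.yaml
account_name: <your storage account name>
account_key: <your storage account key>
audiocontainer: audiocontainer   # *.wav session audio
getcontainer: getcontainer       # model-prediction JSONs to annotate
postcontainer: postcontainer     # destination for submitted annotations
```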

Note that when you run this locally, you will still be connecting & writing to the actual blob storage specified in CREDS.yaml, so be careful.

Using your own blob storage

This assumes you have already created an Azure Storage account & know how to view & access it using Azure Storage Explorer.

  1. Enable a CORS rule on the account. In short, this allows a browser client to make requests directly to blob storage to retrieve *.wav files.

Screenshot of Azure Storage explorer showing CORS permissions

  2. Make sure you have 3 containers:
     • audiocontainer: *.wav audio files (~1 min duration, as each file forms one page/session)
     • getcontainer: model predictions for each *.wav file, in the JSON format of example-load.json
     • postcontainer: destination for user-submitted annotations, in the JSON format of example-submit.json

  3. Enable public read-only access to blobs in audiocontainer (select the "blobs" option). Along with step 1, this is required for the browser to retrieve *.wav files directly.

Screenshot of Azure Storage explorer to set public access level

Deployment to Azure App Service

Prerequisite: Install the Azure CLI

  1. Authenticate and set up your local environment to use the right subscription:
az login 
az account list --output table 
az account set --subscription SUBSCRIPTIONID
  2. In the root directory of your application, create a deployment config file at .azure/config. This contains details about your resource group, the App Service plan to use, etc. (An example file is at .azure/config.)

  3. Now run the following commands to deploy the app. The first command packages your local directory into a *.zip and deploys it to Azure; if an app with the name given in the deployment config already exists it is updated, otherwise a new app is created. The second command only needs to be run the first time, to register the entry point of the app (see note below).

az webapp up --sku B1 --dryrun
az webapp config set -g mldev -n aifororcas-podcast --startup-file "gunicorn --bind=0.0.0.0 --timeout 600 podcast_server:app"

This deployment example is loosely based on the Quickstart. We change the startup command to register the different name of our app file, podcast_server.py. (FYI, some more details about the CLI commands used here are at: az-webapp-up, configuring-python-app.)

References

This code uses a fork of audio-annotator for the frontend code. audio-annotator uses wavesurfer.js for rendering and playing audio. Please refer to the respective references for more info on the core functions/classes used in this repo. (Note: the wavesurfer.js version used here is older than the current docs).

Icons used in readme flowcharts were made by Prosymbols from www.flaticon.com.
