# Bioacoustics AIMs 2025: Hands-on Session 3

# Instructions on how to use this notebook
This notebook is hosted on ``Google Colab``. To work on it, you need your own copy: go to *File* and select *Save a copy in Drive*.



# Intaka Hoplite Database

We will explore data from Intaka Island. It is associated with this eBird hotspot:
https://ebird.org/hotspot/L920375

At your table, try to learn the songs of the five pigeons and doves on the eBird hotspot for Intaka Island. **Write a short description of how to tell their songs apart.**

* Speckled Pigeon
* Ring-necked Dove
* Laughing Dove
* Red-eyed Dove
* Namaqua Dove

Use the **agile modeling notebook to create a classifier** for one or two of these species. You and your table-mates should pick different species.

The following two cells need to be run in the `2_agile_modeling_v2.ipynb` notebook for the rest to work.


In [None]:
# Install hoplite and TF 2.20
!pip install git+https://github.com/google-research/perch-hoplite.git
!pip install tensorflow~=2.20.0

In [None]:
# Copy hoplite database locally.
from etils import epath
import os
from perch_hoplite.db import sqlite_usearch_impl
intaka_path = epath.Path('gs://chirp-public-bucket/soundscapes/intaka')

# Copy the hoplite database to the Colab local storage.
# The usearch.index file is 3.6 GB, so it takes a couple of minutes to download.
# Files are placed in `/content` directory.
for fp in intaka_path.glob('hoplite*'):
  print(fp)
  with fp.open('rb') as f:
    with open(fp.name, 'wb') as g:
      g.write(f.read())
with (intaka_path / 'usearch.index').open('rb') as f:
  print(intaka_path / 'usearch.index')
  with open('usearch.index', 'wb') as g:
    %time g.write(f.read())

# Update the DB with some override values...
# We need to use the perch_v2_cpu model and point the DB to the Google Cloud
# bucket containing the data.
db_path = '/content'
db = sqlite_usearch_impl.SQLiteUsearchDB.create(db_path)

model_cfg = db.get_metadata('model_config')
model_cfg.model_config.tfhub_path = 'google/bird-vocalization-classifier/tensorFlow2/perch_v2_cpu'
model_cfg.model_config.tfhub_version = 1
db.insert_metadata('model_config', model_cfg)

sources_cfg = db.get_metadata('audio_sources')
sources_cfg.audio_globs[0]['base_path'] = 'gs://chirp-public-bucket/soundscapes/intaka'
db.insert_metadata('audio_sources', sources_cfg)

db.commit()
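
The copy cell above reads each file fully into memory with `f.read()` before writing it out. For the 3.6 GB index, a chunked copy may be gentler on Colab's RAM. The following is a minimal sketch, not part of hoplite: `copy_stream` is a hypothetical helper name, and it assumes the source and destination are ordinary binary file objects such as those returned by `fp.open('rb')` and `open(..., 'wb')`.

```python
import io
import shutil


def copy_stream(src, dst, chunk_bytes=16 * 1024 * 1024):
    """Copy src to dst in fixed-size chunks instead of one large read()."""
    # shutil.copyfileobj reads at most `length` bytes at a time.
    shutil.copyfileobj(src, dst, length=chunk_bytes)


# Small in-memory demonstration; in the notebook, src and dst would be
# the open file handles from the copy loop (an assumption, untested here).
src = io.BytesIO(b'abc' * 1000)
dst = io.BytesIO()
copy_stream(src, dst, chunk_bytes=256)
copied = dst.getvalue()
```

In the copy cell, this would replace `g.write(f.read())` with `copy_stream(f, g)`.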

**When you are done, or at the end of our time, make a CSV of all the segments you labeled. The following code prints one comma-separated row per label (source file, offset, label, label type).**

Send the CSV to Maria.

In [None]:
print(db.get_classes())

# Get all labeled examples
for cls in db.get_classes():
  idxes = db.get_embeddings_by_label(cls)
  for idx in idxes:
    s = db.get_embedding_source(idx)
    for lbl in db.get_labels(idx):
      print(f'{s.source_id}, {s.offsets[0]}, {lbl.label}, {lbl.type}')
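
The cell above only prints the rows. To save them as a file, one option is to collect them into a list and write it with Python's `csv` module. This is a sketch under assumptions: `write_label_rows` is a hypothetical helper, the column names are illustrative, and the example rows are made-up placeholders rather than real labels from the database.

```python
import csv
import os
import tempfile


def write_label_rows(rows, csv_path):
    """Write (source_id, offset, label, label_type) rows to a CSV file."""
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        # Header names are an assumption chosen for readability.
        writer.writerow(['source_id', 'offset', 'label', 'label_type'])
        writer.writerows(rows)
    return csv_path


# Placeholder rows for illustration only. In the notebook, append
# (s.source_id, s.offsets[0], lbl.label, lbl.type) inside the loop instead.
example_rows = [
    ('example_file_1.wav', 5.0, 'example_label', 'POSITIVE'),
    ('example_file_2.wav', 10.0, 'example_label', 'NEGATIVE'),
]
out_path = write_label_rows(
    example_rows, os.path.join(tempfile.gettempdir(), 'labels_example.csv'))
```

On Colab, the resulting file can then be downloaded from the file browser in the left sidebar.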