# Create Audio Embeddings

This notebook demonstrates how to generate embeddings for audio files using BirdNET (Kahl et al., 2023), a pretrained model available in the [bioacoustics model zoo](https://github.com/kitzeslab/bioacoustics-model-zoo).

**Notebook Overview:**
1. **Setup:** Define directories and load required libraries.
2. **Model Loading:** Load the BirdNET model.
3. **File Collection:** Get sample `.wav` files from the specified data folder.
4. **Embedding Generation:** For each file, generate embeddings and save them as a CSV file.
5. **Results:** Summarize and display processing information.

_This demo is a simplified version of the script used in the related study, adapted to run on a small number of sample files._

In [1]:
import os
from pathlib import Path
import pandas as pd
import numpy as np
from tqdm import tqdm
import bioacoustics_model_zoo as bmz
from time import time

## Set up directories

We define the location of the input audio files (`sample_audio`) and create a directory
for the resulting embedding CSV files (`sample_embeddings`).

In [2]:
# Set directory for sample audio files
data_root = Path("sample_audio")

# Set and, if needed, create the output directory for embedding CSV files
emb_save_dir = Path("sample_embeddings")
emb_save_dir.mkdir(exist_ok=True, parents=True)

## Load the model

We load BirdNET from the bioacoustics model zoo.
This model will be used to generate embeddings from our audio files.

Loading BirdNET prints several warnings and downloads the files the model needs (a labels file and a `.tflite` model file in the output below) into your working directory; this is expected.

In [5]:
# Load the BirdNET model (this may take a few moments if not already cached)
model = bmz.BirdNET()

Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Labels_af.txt
downloading model from URL...


                    This architecture is not listed in opensoundscape.ml.cnn_architectures.ARCH_DICT.
                    It will not be available for loading after saving the model with .save() (unless using pickle=True). 
                    To make it re-loadable, define a function that generates the architecture from arguments: (n_classes, n_channels) 
                    then use opensoundscape.ml.cnn_architectures.register_architecture() to register the generating function.

                    The function can also set the returned object's .constructor_name to the registered string key in ARCH_DICT

                    See opensoundscape.ml.cnn_architectures module for examples of constructor functions
                    


Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite


    TF 2.20. Please use the LiteRT interpreter from the ai_edge_litert package.
    See the [migration guide](https://ai.google.dev/edge/litert/migration)
    for details.
    
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


## Gather Input Files

We search the `sample_audio` folder (`data_root`) for files with the `.wav` extension. The search is not recursive, so files in subfolders are not included.

In [17]:
# Get a list of input .wav files
input_files = list(data_root.glob("*.wav"))
print(f"Found {len(input_files)} files to embed.")
print(f"Embeddings will be saved to: {emb_save_dir}/")

Found 3 files to embed.
Embeddings will be saved to: sample_embeddings/
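
`Path.glob` does not guarantee any particular order (the status table at the end of this notebook lists the files as 2, 3, 1), and the `*.wav` pattern can miss files named `.WAV` on case-sensitive filesystems. A minimal sketch of a sorted, case-insensitive alternative follows; `find_audio_files` is a hypothetical helper, not part of the original script, and the file names are made up for the demonstration.

```python
from pathlib import Path
import tempfile


def find_audio_files(folder, extensions=(".wav",)):
    """Return files in `folder` (non-recursive), sorted by path,
    whose suffix matches one of `extensions` regardless of case."""
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


# Demonstration on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.wav", "a.WAV", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in find_audio_files(tmp)]

print(found)
```

In the notebook, `input_files = find_audio_files(data_root)` would be a drop-in replacement for the `glob` call, at the cost of changing the processing order shown in the recorded output.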


## Generate Feature Embeddings

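The `embed` function defined below:
- Determines an output CSV filename (saved in `sample_embeddings`).
- Uses BirdNET to generate embeddings.
- Converts the embeddings to float16 to reduce file size, then saves them to disk.

Any exception is caught and returned instead of raised, so a single failing file does not stop the run and the error can be logged afterwards.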

In [9]:
def embed(file_path):
    """
    Generate embeddings for a single audio file and save the result as a CSV.
    
    Parameters:
        file_path (Path): Path to the audio file.
    
    Returns:
        tuple (file_path, status): where status is "Success" or an Exception instance.
    """
    file_path = Path(file_path)
    # Build the output filename directly under the emb_save_dir (no nested folders)
    dest = emb_save_dir / f"{file_path.stem}_embeddings.csv"

    try:
        # Generate embeddings (passing a single file as a list)
        embeddings = model.embed([str(file_path)], batch_size=64, num_workers=1)
        # Convert all columns to float16 for a smaller file size
        embeddings = embeddings.astype("float16")
        # Save embedding DataFrame to CSV
        embeddings.to_csv(dest)
    except Exception as e:
        return file_path, e

    return file_path, "Success"
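
The float16 conversion trades precision for file size: float16 keeps roughly three significant decimal digits, so each value is written to the CSV with fewer characters. The sketch below illustrates that trade-off on random stand-in data rather than real BirdNET output; the table shape and the `csv_size` helper are assumptions made for the example.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for a model output: 5 clips x 8 features (real embeddings are much wider)
emb32 = pd.DataFrame(rng.normal(size=(5, 8)).astype("float32"))
emb16 = emb32.astype("float16")


def csv_size(df):
    """Return the number of characters the DataFrame occupies as CSV text."""
    buf = io.StringIO()
    df.to_csv(buf)
    return len(buf.getvalue())


# Largest absolute difference introduced by the float16 rounding
max_abs_err = float((emb32 - emb16.astype("float32")).abs().to_numpy().max())

print(f"float32 CSV: {csv_size(emb32)} characters")
print(f"float16 CSV: {csv_size(emb16)} characters")
print(f"max absolute rounding error: {max_abs_err:.6f}")
```

Whether this loss of precision is acceptable depends on the downstream analysis; keeping float32 is the safer choice when in doubt.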

## Process Audio Files

We loop over the files and embed each one, displaying a progress bar (`tqdm` required).

In [10]:
start_time = time()
results = []

for file in tqdm(input_files, desc="Processing Audio Files"):
    result = embed(file)
    results.append(result)

total_time = time() - start_time

Processing Audio Files:   0%|          | 0/3 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

Processing Audio Files:  33%|███▎      | 1/3 [00:12<00:25, 12.90s/it]

  0%|          | 0/1 [00:00<?, ?it/s]

Processing Audio Files:  67%|██████▋   | 2/3 [00:25<00:12, 12.57s/it]

  0%|          | 0/1 [00:00<?, ?it/s]

Processing Audio Files: 100%|██████████| 3/3 [00:38<00:00, 12.69s/it]
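
Once the loop has finished, each saved CSV can be loaded back for downstream analysis. The sketch below uses a small synthetic table instead of a real output file. It assumes the saved rows are indexed by `file`, `start_time` and `end_time` (one row per clip), which should be verified against the header of your own CSV; the file name and values are made up.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic stand-in for one saved embedding table.
# ASSUMPTION: rows are indexed by (file, start_time, end_time); check your own CSV header.
index = pd.MultiIndex.from_tuples(
    [("example.wav", 0.0, 3.0), ("example.wav", 3.0, 6.0)],
    names=["file", "start_time", "end_time"],
)
saved = pd.DataFrame(np.arange(8, dtype="float16").reshape(2, 4), index=index)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example_embeddings.csv"  # hypothetical file name
    saved.to_csv(csv_path)
    # index_col restores the three index levels; values are parsed as float64 by default
    loaded = pd.read_csv(csv_path, index_col=[0, 1, 2])

# Plain (n_clips, n_features) array for use with other libraries
features = loaded.to_numpy(dtype="float32")
print(list(loaded.index.names), features.shape)
```

Note that CSV does not store dtypes, so the float16 values come back as float64 unless they are cast again after loading.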


## Check completion

Finally, we report the total processing time and show the outcome status for each file.

In [15]:
# Set up DataFrame for easy previewing
status_df = pd.DataFrame(results, columns=["file", "status"])
print("Processing completed!")
print(f"Total time: {total_time:.2f} seconds.\n")
print(status_df)

Processing completed!
Total time: 38.07 seconds.

                           file   status
0  sample_audio/test_dset_2.wav  Success
1  sample_audio/test_dset_3.wav  Success
2  sample_audio/test_dset_1.wav  Success
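
All three sample files succeeded here, but in a larger run some statuses may hold an exception instead of `"Success"`. The following sketch shows one way to separate those rows for logging. It uses made-up results in the same `(file, status)` shape that `embed` returns; the file names and the error are illustrative only.

```python
import pandas as pd

# Made-up results in the same (file, status) shape that embed() returns
example_results = [
    ("sample_audio/ok.wav", "Success"),
    ("sample_audio/broken.wav", ValueError("could not read audio")),
]
example_df = pd.DataFrame(example_results, columns=["file", "status"])

# A row is a failure when its status holds an Exception instead of the "Success" string
is_failure = example_df["status"].map(lambda s: isinstance(s, Exception))
failed = example_df[is_failure]
n_failed = int(is_failure.sum())

print(f"{n_failed} of {len(example_df)} files failed")
for _, row in failed.iterrows():
    # Report the exception type alongside its message
    print(f"{row['file']}: {type(row['status']).__name__}: {row['status']}")
```

Applied to the notebook's own `status_df`, the same filter would return an empty table for the run shown above.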
