# Segments Extraction  

This notebook extracts annotated audio segments from the official recordings of Tovanella and WABAD using the `Bird_tags_Train.mat` file. Since BirdNET analyzes 3-second clips, every extracted segment is 3 seconds long.  

Segments are generated with a 50% overlap, shifting by 1.5 seconds between consecutive clips.  
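
The windowing described above can be sketched as follows. This is a minimal illustration and not the code in `birdlib.utils`: the function name `window_starts` is hypothetical, and dropping a trailing window that does not fit entirely inside the recording is an assumption.

```python
import math


def window_starts(duration_s, window_s=3.0, hop_s=1.5):
    """Return the start times (in seconds) of fixed-length windows.

    Assumption: only windows that fit fully inside the recording are kept.
    """
    if duration_s < window_s:
        return []
    n_windows = math.floor((duration_s - window_s) / hop_s) + 1
    return [i * hop_s for i in range(n_windows)]


# Hypothetical 10-minute recording
starts = window_starts(600.0)
print(len(starts), starts[:3], starts[-1])  # 399 [0.0, 1.5, 3.0] 597.0
```

Under these assumptions a 600-second recording yields 399 windows, which would match the 399 segments per file reported in the processing output further down if those recordings are 10 minutes long.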

## Extraction Process:
1. **`species_dict`**: maps scientific names to common names for all species.  
2. **`category_annots.json`** & **`audio_annots.json`**: store segment annotations for each species in every audio file.  
3. **`audio_info.json`**: provides total duration and sampling rate for each recording.  
4. **`true_segments.json`**: lists the species present in each extracted segment.  

Unannotated segments can be included (labeled as `"None"`) by enabling `generate_None`, treating them as a non-species class.  
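
One way to picture how windows receive their labels, including the `"None"` class, is the sketch below. It is only an illustration under stated assumptions: `label_windows` is a hypothetical name, annotations are assumed to be `(species, start, end)` tuples in seconds, and any overlap between an annotation and a window counts as presence. The actual rule used by `utils.generate_true_segments` may differ.

```python
def label_windows(annotations, starts, window_s=3.0,
                  none_label="None", generate_none=True):
    """Map each window start time to the species whose annotations overlap it.

    annotations: list of (species, start_s, end_s) tuples (assumed layout).
    Windows with no overlapping annotation get `none_label` when
    `generate_none` is True, and are skipped otherwise.
    """
    labels = {}
    for w_start in starts:
        w_end = w_start + window_s
        present = sorted({
            species
            for species, a_start, a_end in annotations
            if a_start < w_end and a_end > w_start  # intervals overlap
        })
        if present:
            labels[w_start] = present
        elif generate_none:
            labels[w_start] = [none_label]
    return labels


annots = [("Turdus philomelos", 24.4, 26.4)]
starts = [21.0, 22.5, 24.0, 25.5, 27.0]
labels = label_windows(annots, starts)
for start, species in labels.items():
    print(start, species)
```

In this example the windows starting at 22.5, 24.0 and 25.5 seconds contain the annotated song, while the first and last windows fall back to `"None"`.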

For WABAD, a different approach was used because the dataset spans multiple recording sites: only the sites containing the study species were processed.


In [31]:
import os
import json
import pandas as pd
import csv
import scipy.io
# Helper functions used throughout this notebook
from birdlib import utils

In [32]:
DATASET_NAME = 'dataset'
DATASET_PATH = f'/home/giacomoschiavo/segments/{DATASET_NAME}'
AUDIO_SOURCE = '/home/giacomoschiavo/Tovanella'

# Update: Birds_tags_Train_2.mat
In this new dataset, some files are named in the form

"< code > - < common species > - < scientific name >.mp3", for example "XC636429 - Merlo dal collare - Turdus torquatus.mp3"

To simplify processing, these files are converted to WAV and renamed to "< code >_0.WAV", for example "XC636429_0.WAV"

First, the .mp3 files need to be converted to .wav

In [33]:
from pydub import AudioSegment
tovanella_path = '/home/giacomoschiavo/Tovanella'

def convert_mp3_to_wav(path):
    for filename in os.listdir(path):
        if filename.endswith(".mp3"):
            mp3_path = os.path.join(path, filename)
            wav_filename = os.path.splitext(filename)[0] + ".WAV"
            wav_path = os.path.join(path, wav_filename)

            try:
                audio = AudioSegment.from_mp3(mp3_path)
                audio.export(wav_path, format="wav")
                print(f"✅ Converted: {filename} → {wav_filename}")
            except Exception as e:
                print(f"Error with {filename}: {e}")

# convert_mp3_to_wav(tovanella_path)


In [34]:
for audio in os.listdir(tovanella_path):
    # Only rename files that use the " - " separator the split below relies on
    if audio.upper().endswith('.WAV') and ' - ' in audio:
        code = audio.split(' - ')[0]
        print(audio)
        os.rename(
            os.path.join(tovanella_path, audio),
            os.path.join(tovanella_path, f'{code}_0.WAV')
        )

# Species Dict
Create a dictionary that maps each scientific name to its common name.

In [35]:
species_dict = utils.get_species_dict("utils/BirdNET_GLOBAL_6K_V2.4_Labels_en_uk.txt")
# export species_dict to json
# with open('utils/species_dict_map.json', 'w') as f:
#     json.dump(species_dict, f)
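
BirdNET label files list one `Scientific name_Common name` entry per line, so a mapping like the one above can be built by splitting each line on the first underscore. The sketch below is a hypothetical stand-in for illustration only: `parse_label_lines` is not part of `birdlib`, and the real `utils.get_species_dict` may normalize or structure names differently.

```python
def parse_label_lines(lines):
    """Build {scientific name: common name} from BirdNET-style label lines.

    Assumption: each line looks like "Scientific name_Common name".
    Lines without an underscore are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or "_" not in line:
            continue
        scientific, common = line.split("_", 1)
        mapping[scientific] = common
    return mapping


sample = [
    "Turdus philomelos_Song Thrush",
    "Fringilla coelebs_Common Chaffinch",
]
species_map = parse_label_lines(sample)
print(species_map["Turdus philomelos"])  # Song Thrush
```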

# Category and Audio Annotation Files

These two files store the annotations indexed in two different ways.

**`category_annots.json`**: a species-centric view. For each species, it lists all of its annotations across every audio recording in the Tovanella folder.

**`audio_annots.json`**: an audio-centric view. For each audio file, it lists all the annotations found in that recording.
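
The difference between the two views can be sketched with a small example. The record layout and the name `build_views` are assumptions made for illustration. The actual JSON structure produced by `utils.get_audio_category_annots` is not shown in this notebook and may differ.

```python
from collections import defaultdict


def build_views(records):
    """Index the same annotations by species and by audio file.

    records: iterable of (species, audio_file, start_s, end_s) tuples
    (a hypothetical flat layout).
    """
    by_category = defaultdict(list)
    by_audio = defaultdict(list)
    for species, audio_file, start_s, end_s in records:
        by_category[species].append(
            {"audio": audio_file, "start": start_s, "end": end_s}
        )
        by_audio[audio_file].append(
            {"species": species, "start": start_s, "end": end_s}
        )
    return dict(by_category), dict(by_audio)


records = [
    ("Turdus philomelos", "20190607_030000.WAV", 24.4, 26.4),
    ("Fringilla coelebs", "20190607_030000.WAV", 30.0, 32.5),
    ("Turdus philomelos", "20190621_010000.WAV", 5.0, 7.0),
]
category_view, audio_view = build_views(records)
print(len(category_view["Turdus philomelos"]))  # 2
print(len(audio_view["20190607_030000.WAV"]))   # 2
```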

In [36]:
# extract annotations from the given file
bird_tags = scipy.io.loadmat('Birds_tags_Train_2.mat')["Bird_tags"] 
# visualize an example, showing all the properties
for i, prop in enumerate(bird_tags[12][0][0][0]):
    print(i, prop)

0 ['Turdus_philomelos']
1 ['20190607_030000.WAV']
2 [[ 5.05964467  1.72461929  1.72461929  5.05964467 24.41782537 26.39896524]]
3 [[24.41782537  5.05964467]
 [24.41782537  1.72461929]
 [26.39896524  1.72461929]
 [26.39896524  5.05964467]
 [24.41782537  5.05964467]]
4 [[2]]
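
Property 3 in the output above looks like a closed polygon drawn on a spectrogram. The sketch below shows how time and frequency bounds could be read from such vertices. It rests on an assumption the notebook does not confirm: that the first column is time (in seconds) and the second is frequency. The name `polygon_bounds` is hypothetical.

```python
def polygon_bounds(vertices):
    """Return the time and frequency extent of a bounding polygon.

    Assumption: each vertex is (time, frequency).
    """
    times = [v[0] for v in vertices]
    freqs = [v[1] for v in vertices]
    return {
        "start": min(times),
        "end": max(times),
        "f_low": min(freqs),
        "f_high": max(freqs),
    }


# Vertices copied from property 3 of the example annotation
vertices = [
    (24.41782537, 5.05964467),
    (24.41782537, 1.72461929),
    (26.39896524, 1.72461929),
    (26.39896524, 5.05964467),
    (24.41782537, 5.05964467),
]
bounds = polygon_bounds(vertices)
print(bounds)
```

Read this way, the bounds agree with the six values stored in property 2 of the same annotation.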


In [None]:
category_annots, audio_annots, _ = utils.get_audio_category_annots("Bird_tags_Train.mat", AUDIO_SOURCE, species_dict)
category_annots_2, audio_annots_2, _ = utils.get_audio_category_annots("Birds_tags_Train_2.mat", AUDIO_SOURCE, species_dict)

# category_annots.update(category_annots_2)
# audio_annots.update(audio_annots_2)
# with open("utils/category_annots.json", "w") as f:
#     json.dump(category_annots, f)
# with open("utils/audio_annots.json", "w") as f:
#     json.dump(audio_annots, f)
# with open("utils/category_annots_2.json", "w") as f:
#     json.dump(category_annots_2, f)
# with open("utils/audio_annots_2.json", "w") as f:
#     json.dump(audio_annots_2, f)


In [None]:
category_annots_test, audio_annots_test, _ = utils.get_audio_category_annots("Bird_tags_Test.mat", AUDIO_SOURCE, species_dict)

# with open("utils/category_annots_test.json", "w") as f:
#     json.dump(category_annots_test, f)
# with open("utils/audio_annots_test.json", "w") as f:
#     json.dump(audio_annots_test, f)

In [40]:
# creates species list
species_list = list(category_annots.keys())

# Segments Creation
Create all the segments described by the annotation files.

In [41]:
audio_info = utils.load_or_generate_info('audio_info.json', audio_annots, AUDIO_SOURCE, 'utils')
audio_info_test = utils.load_or_generate_info('audio_info_test.json', audio_annots_test, AUDIO_SOURCE, 'utils')

In [42]:
true_segments_train = utils.generate_true_segments(audio_annots, audio_info)
true_segments_test = utils.generate_true_segments(audio_annots_test, audio_info_test)

In [43]:
import copy
true_segments = copy.deepcopy(true_segments_train)
true_segments.update(true_segments_test)

In [44]:
# SAVE
os.makedirs(f'utils/{DATASET_NAME}', exist_ok=True)
with open(f'utils/{DATASET_NAME}/true_segments_train.json', 'w') as f:
    json.dump(true_segments_train, f)
with open(f'utils/{DATASET_NAME}/true_segments_test.json', 'w') as f:
    json.dump(true_segments_test, f)
with open(f'utils/{DATASET_NAME}/true_segments.json', 'w') as f:
    json.dump(true_segments, f)


In [45]:
utils.generate_segments(audio_source_path=AUDIO_SOURCE,
                  target_path=f"{DATASET_PATH}/train",
                  true_segments=true_segments_train,
                  audio_info=audio_info,
                  generate_None=True)

Processing segments for 20190621_010000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 2497.35it/s]
Processing segments for 20190621_020000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 2491.83it/s]
Processing segments for 20190621_030000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 1811.66it/s]
Processing segments for 20190621_040000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 1351.55it/s]
Processing segments for 20190621_050000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 1610.24it/s]
Processing segments for 20190621_060000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 1488.93it/s]
Processing segments for 20190621_070000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 1453.09it/s]
Processing segments for 20190621_080000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 1873.86it/s]
Processing segments for 20190621_090000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 2172.31it/s]
Processing segments

In [46]:
utils.generate_segments(audio_source_path=AUDIO_SOURCE,
                  target_path=f"{DATASET_PATH}/test",
                  true_segments=true_segments_test,
                  audio_info=audio_info_test,
                  generate_None=True)

Processing segments for 20190601_000000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 475.80it/s]
Processing segments for 20190601_030000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 841.47it/s]
Processing segments for 20190601_040000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 648.35it/s]
Processing segments for 20190601_050000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 533.50it/s]
Processing segments for 20190601_060000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 553.33it/s]
Processing segments for 20190601_070000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 784.32it/s]
Processing segments for 20190601_080000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 727.97it/s]
Processing segments for 20190601_090000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 587.69it/s]
Processing segments for 20190601_100000.WAV...: 100%|██████████| 399/399 [00:00<00:00, 581.84it/s]
Processing segments for 20

In [47]:
# count segments by species
target_path = f"{DATASET_PATH}/train"
species_count = {species: len(os.listdir(os.path.join(target_path, species))) for species in os.listdir(target_path)}
species_count_df = pd.DataFrame(list(species_count.items()), columns=["Species", "Count"])
species_count_df.sort_values(by="Count", ascending=False).reset_index(drop=True)

|   | Species | Count |
|---|---------|-------|
| 0 |  | 13860 |
| 1 | Fringilla coelebs_Common Chaffinch | 8330 |
| 2 | Turdus philomelos_Song Thrush | 4379 |
| 3 | Sylvia atricapilla_Eurasian Blackcap | 3789 |
| 4 | Regulus ignicapilla_Common Firecrest | 3218 |
| 5 | Phylloscopus collybita_Common Chiffchaff | 2172 |
| 6 | Erithacus rubecula_European Robin | 1726 |
| 7 | Troglodytes troglodytes_Eurasian Wren | 1394 |
| 8 | Periparus ater_Coal Tit | 1160 |
| 9 | Regulus regulus_Goldcrest | 877 |
