# Segments Extraction  

This notebook extracts annotated audio segments from the official Tovanella and WABAD recordings. The Tovanella annotations come from the `Bird_tags_Train.mat` and `Bird_tags_Test.mat` files. Because BirdNET analyzes 3-second clips, every extracted segment is 3 seconds long.  

Segments are generated with a 50% overlap, shifting by 1.5 seconds between consecutive clips.  
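
The windowing arithmetic can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the function name `window_starts` and its parameters are hypothetical, and it assumes that only full 3-second windows are kept.

```python
def window_starts(total_duration, window=3.0, hop=1.5):
    """Return the start time (in seconds) of every full window in a recording."""
    if total_duration < window:
        return []
    # Number of full windows that fit when the window advances by `hop` seconds.
    n_windows = int((total_duration - window) // hop) + 1
    return [i * hop for i in range(n_windows)]


starts = window_starts(600.0)
print(len(starts), starts[:3], starts[-1])
```

Under these assumptions a 600-second recording yields 399 windows, which would match the `399/399` progress counters shown later if the Tovanella recordings are ten minutes long.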

## Extraction Process:
1. **`species_dict`**: maps scientific names to common names for all species.  
2. **`category_annots.json`** & **`audio_annots.json`**: store segment annotations for each species in every audio file.  
3. **`audio_info.json`**: provides total duration and sampling rate for each recording.  
4. **`true_segments.json`**: lists the species present in each extracted segment.  

Unannotated segments can be included (labeled as `"None"`) by enabling `generate_None`, treating them as a non-species class.  
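
As a rough picture of how windows could be labelled, the sketch below assigns to each window the labels of every annotation that overlaps it, and falls back to `"None"` when nothing overlaps. The function `label_windows` is hypothetical, and the "any overlap counts" rule is an assumption; `utils.generate_true_segments` may apply a different criterion, such as a minimum overlap. The annotation keys (`label`, `start_time`, `duration`) follow the entries built later in this notebook.

```python
def label_windows(annotations, total_duration, window=3.0, hop=1.5, generate_none=True):
    """Map each window start time to the sorted labels of the annotations overlapping it."""
    labelled = {}
    n_windows = int((total_duration - window) // hop) + 1 if total_duration >= window else 0
    for i in range(n_windows):
        start = i * hop
        end = start + window
        # An annotation overlaps the window if it starts before the window ends
        # and ends after the window starts.
        labels = sorted({
            a["label"] for a in annotations
            if a["start_time"] < end and a["start_time"] + a["duration"] > start
        })
        if labels:
            labelled[start] = labels
        elif generate_none:
            labelled[start] = ["None"]  # unannotated window kept as a non-species class
    return labelled


annots = [{"label": "Fringilla coelebs_Common Chaffinch", "start_time": 4.0, "duration": 1.0}]
segments = label_windows(annots, total_duration=9.0)
print(segments)
```

With `generate_none=False`, the windows that overlap no annotation are simply left out of the result.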

WABAD spans multiple recording sites, so it is handled differently: only the sites that contain the study species are processed.


In [1]:
import os
import json
import pandas as pd
import csv
import scipy.io
from birdlib import utils  # the only `utils` module used below

In [2]:
DATASET_NAME = 'PROVA'
DATASET_PATH = f'/home/giacomoschiavo/segments/{DATASET_NAME}'
AUDIO_SOURCE = '/home/giacomoschiavo/Tovanella'

In [3]:
# Configuration variable
# DATASET_NAME = 'NEW_DATASET_2'                              # name of the dataset (used to save utils file under its name) 
# DATASET_PATH = f'E:/Giacomo/Tovanella/{DATASET_NAME}'       # path of the dataset
# AUDIO_SOURCE = 'E:/Giacomo/Tovanella/Tovanella'             # folder that contains all the audio files

# Species Dict
Create a dictionary that maps each scientific name to its common name.

In [4]:
species_dict = utils.get_species_dict("utils/BirdNET_GLOBAL_6K_V2.4_Labels_en_uk.txt")
# export species_dict to json
# with open('utils/species_dict_map.json', 'w') as f:
#     json.dump(species_dict, f)
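
For illustration, a mapping like this could be built from label lines as sketched below. The sketch assumes each line of the labels file has the form `Scientific name_Common name`; the helper `parse_species_lines` is hypothetical and is not the `utils.get_species_dict` implementation.

```python
def parse_species_lines(lines):
    """Build {scientific_name: common_name} from 'Scientific name_Common name' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if "_" not in line:
            continue  # skip blank or malformed lines
        scientific, common = line.split("_", 1)
        mapping[scientific] = common
    return mapping


# Synthetic sample lines, assumed to follow the labels-file format.
sample = ["Fringilla coelebs_Common Chaffinch", "Turdus merula_Eurasian Blackbird", ""]
species_dict_sketch = parse_species_lines(sample)
print(species_dict_sketch["Turdus merula"])
```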

# Category and Audio Annotation Files

These two files index the same annotations in two different ways.

**`category_annots.json`**: the species-centric view. For each species, it lists *all* of its annotations across *every* audio recording in the Tovanella folder.

**`audio_annots.json`**: the audio-centric view. For each audio file, it lists *all* the annotations found in that recording.

In [5]:
# extract annotations from the given file
bird_tags = scipy.io.loadmat('Bird_tags_Train.mat')["Bird_tags"] 
# visualize an example, showing all the properties
for i, prop in enumerate(bird_tags[12][0][0][0]):
    print(i, prop)

0 ['Fringilla_coelebs']
1 ['20190621_030000.WAV']
2 [[ 6.08474576  1.61016949  1.61016949  6.08474576 42.61703208 45.50069122]]
3 [[42.61703208  6.08474576]
 [42.61703208  1.61016949]
 [45.50069122  1.61016949]
 [45.50069122  6.08474576]
 [42.61703208  6.08474576]]
4 [[2]]
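
One way to read the record above is sketched here, using field 3 (the bounding polygon) copied as plain Python lists. It assumes that the first column is time in seconds and the second is frequency; the source does not state the units, so treat both as assumptions.

```python
# Field 3 of the example record, copied from the output above.
# Assumption: each row is (time in seconds, frequency).
polygon = [
    [42.61703208, 6.08474576],
    [42.61703208, 1.61016949],
    [45.50069122, 1.61016949],
    [45.50069122, 6.08474576],
    [42.61703208, 6.08474576],
]

times = [point[0] for point in polygon]
freqs = [point[1] for point in polygon]

start_time = min(times)              # where the annotation box begins
duration = max(times) - start_time   # width of the box along the time axis
freq_range = (min(freqs), max(freqs))

print(start_time, round(duration, 4), freq_range)
```

Under that reading, this Fringilla coelebs annotation starts at about 42.6 s and lasts about 2.9 s.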


In [6]:
category_annots, audio_annots = utils.get_audio_category_annots("Bird_tags_Train.mat", AUDIO_SOURCE, species_dict)

# with open("utils/category_annots.json", "w") as f:
#     json.dump(category_annots, f)
# with open("utils/audio_annots.json", "w") as f:
#     json.dump(audio_annots, f)

In [7]:
category_annots_test, audio_annots_test = utils.get_audio_category_annots("Bird_tags_Test.mat", AUDIO_SOURCE, species_dict)

# with open("utils/category_annots_test.json", "w") as f:
#     json.dump(category_annots_test, f)
# with open("utils/audio_annots_test.json", "w") as f:
#     json.dump(audio_annots_test, f)

In [8]:
# creates species list
species_list = list(category_annots.keys())

# Segments Creation
Generate `audio_info`, derive `true_segments` from the annotations, and write the segment files to disk.

In [11]:
def load_or_generate_info(filename, annots, audio_source, save_path):
    full_path = os.path.join(save_path, filename)
    if os.path.exists(full_path):
        with open(full_path) as f:
            return json.load(f)
    info = utils.generate_audio_info(audio_source, annots)
    with open(full_path, 'w') as f:
        json.dump(info, f)
    return info

audio_info = load_or_generate_info('audio_info.json', audio_annots, AUDIO_SOURCE, 'utils')
audio_info_test = load_or_generate_info('audio_info_test.json', audio_annots_test, AUDIO_SOURCE, 'utils')
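
For reference, the kind of information stored in `audio_info.json` (total duration and sampling rate) can be read from a PCM WAV file with the standard-library `wave` module. The helper name `read_audio_info` and the dictionary keys are illustrative assumptions; `utils.generate_audio_info` may be implemented differently.

```python
import os
import tempfile
import wave


def read_audio_info(path):
    """Return duration (seconds) and sampling rate (Hz) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sampling_rate = wav.getframerate()
        n_frames = wav.getnframes()
    return {"duration": n_frames / sampling_rate, "sampling_rate": sampling_rate}


# Demo on a synthetic file: one second of 16-bit mono silence at 8 kHz.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(b"\x00\x00" * 8000)
    demo_info = read_audio_info(demo_path)

print(demo_info)
```

Note that `wave` only handles uncompressed PCM files; other formats would need a different reader.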

In [13]:
true_segments_train = utils.generate_true_segments(audio_annots, audio_info)
true_segments_test = utils.generate_true_segments(audio_annots_test, audio_info_test)

In [15]:
import copy
true_segments = copy.deepcopy(true_segments_train)
true_segments.update(true_segments_test)

In [None]:
# SAVE
os.makedirs(f'utils/{DATASET_NAME}', exist_ok=True)
# with open(f'utils/{DATASET_NAME}/true_segments_train.json', 'w') as f:
#     json.dump(true_segments_train, f)
# with open(f'utils/{DATASET_NAME}/true_segments_test.json', 'w') as f:
#     json.dump(true_segments_test, f)

# with open(f'utils/{DATASET_NAME}/true_segments.json', 'w') as f:
#     json.dump(true_segments, f)



In [None]:
utils.generate_segments(audio_source_path=AUDIO_SOURCE,
                  target_path=f"{DATASET_PATH}/train",
                  true_segments=true_segments_train,
                  audio_info=audio_info,
                  generate_None=True)

Elaborating audio 20190621_010000.WAV...
Processing segments...: 100%|██████████| 399/399 [00:00<00:00, 1024.73it/s]
[... the same two lines repeat for each remaining recording, always 399/399 ...]
Elaborating audio 20190603_230000.WAV...
Processing segments...: 100%|██████████| 399/399 [00:00<00:00, 825.87it/s]


In [20]:
utils.generate_segments(audio_source_path=AUDIO_SOURCE,
                  target_path=f"{DATASET_PATH}/final_test",
                  true_segments=true_segments_test,
                  audio_info=audio_info_test,
                  generate_None=True)

Elaborating audio 20190601_000000.WAV...
Processing segments...: 100%|██████████| 399/399 [00:01<00:00, 387.51it/s]
[... the same two lines repeat for each remaining recording, always 399/399 ...]
Elaborating audio 20190601_230000.WAV...
Processing segments...: 100%|██████████| 399/399 [00:00<00:00, 1239.92it/s]


In [14]:
# count segments by species
target_path = f"{DATASET_PATH}/test"
species_count = {species: len(os.listdir(os.path.join(target_path, species))) for species in os.listdir(target_path)}
species_count_df = pd.DataFrame(list(species_count.items()), columns=["Species", "Count"])
species_count_df.sort_values(by="Count", ascending=False).reset_index(drop=True)

|   | Species | Count |
|---|---------|-------|
| 0 | None | 4907 |
| 1 | Fringilla coelebs_Common Chaffinch | 1067 |
| 2 | Phylloscopus collybita_Common Chiffchaff | 674 |
| 3 | Erithacus rubecula_European Robin | 556 |
| 4 | Sylvia atricapilla_Eurasian Blackcap | 493 |
| 5 | Turdus merula_Eurasian Blackbird | 315 |
| 6 | Regulus ignicapilla_Common Firecrest | 238 |
| 7 | Wind | 189 |
| 8 | Troglodytes troglodytes_Eurasian Wren | 111 |
| 9 | Muscicapa striata_Spotted Flycatcher | 108 |

# WABAD Segments Extraction

For WABAD, segments are extracted with the same strategy as above, adapted in three ways.

First, only the **less represented species** are targeted: species with **750 or fewer training segments**. The non-species classes are excluded.

Second, annotations are read directly from the WABAD site datasets listed in `wabad_datasets.txt`. The rest follows the **same pipeline** as before: build the category and audio annotations (`category_annots`, `audio_annots`), save the audio details (`audio_info`), and generate the labeled segments (`true_segments`).

Finally, unannotated segments are excluded from this analysis because they are so abundant (more than 10,000 `"None"` samples).

In [8]:
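# labels have the form "Scientific name_Common name", so split("_")[0] is the scientific name
# (despite the variable name); labels without "_" are non-species classes and are skipped
species_common_name_list = [species.split("_")[0] for species in list(category_annots.keys()) if len(species.split("_")) > 1]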

In [36]:
minority_threshold = 750
train_folder = f'{DATASET_PATH}/train'
train_species = os.listdir(train_folder)
# number of training segments in each class folder
species_count = {species : len(os.listdir(os.path.join(train_folder, species))) for species in train_species}
# scientific names of species with at most `minority_threshold` segments (non-species classes skipped)
species_to_augment = [species.split("_")[0] for species in train_species if species_count[species] <= minority_threshold and len(species.split("_")) != 1]
species_to_augment

['Muscicapa striata',
 'Periparus ater',
 'Regulus regulus',
 'Dryocopus martius',
 'Certhia familiaris',
 'Turdus merula',
 'Loxia curvirostra',
 'Dendrocopos major',
 'Lophophanes cristatus']

In [37]:
# 1. Locate site dataset list: `wabad_datasets.txt` in the `utils` folder.
# 2. Manually download and extract ALL listed datasets.
# 3. Place the extracted datasets into the designated `WABAD` folder.

# this is an example of the outcome
# E:\Giacomo\Tovanella\WABAD\BAM\BAM\Raven Pro annotations\BAM_20151116_060801.txt
# WABAD_PATH = "E:/Giacomo/Tovanella/WABAD"
WABAD_PATH = "/home/giacomoschiavo/WABAD/audio"

In [38]:
def extract_wabad_info(folder_path):
    # folder_path = ".../WABAD/BIAL/BIAL/Raven Pro annotations"
    audio_info_wabad = {}
    category_info_wabad = {}
    for txt_file in os.listdir(folder_path):
        complete_path = os.path.join(folder_path, txt_file)
        with open(complete_path, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile, delimiter='\t') 
            for row in reader:
                if row["Species"] not in species_to_augment or "End Time (s)" not in row:
                    continue
                file_name = txt_file.replace(".txt", ".wav")
                start_time = float(row["Begin Time (s)"])
                end_time = float(row["End Time (s)"])
                duration = end_time - start_time
                common_name = species_dict[row["Species"]]
                label = f"{row['Species']}_{common_name}"

                audio_info_entry = {
                    "scientific_name": row["Species"],
                    "common_name": common_name, 
                    "start_time": start_time,
                    "duration": duration,
                    "label": label
                }
                category_info_entry = {
                    "file_name": file_name,
                    "start_time": start_time,
                    "duration": duration,
                    "label": label
                }
                
                if file_name not in audio_info_wabad:
                    audio_info_wabad[file_name] = []
                if label not in category_info_wabad:
                    category_info_wabad[label] = []
                audio_info_wabad[file_name].append(audio_info_entry)
                category_info_wabad[label].append(category_info_entry)
    return audio_info_wabad, category_info_wabad
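
The parsing step can be tried in isolation on an in-memory table. The rows below are synthetic and only mimic a tab-separated selection table; the column names `Begin Time (s)`, `End Time (s)` and `Species` are the ones used by `extract_wabad_info`, while `Selection` and the `wanted` set are assumptions made for this sketch.

```python
import csv
import io

# Synthetic tab-separated table (not real WABAD data).
sample_table = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tSpecies\n"
    "1\t2.50\t4.00\tTurdus merula\n"
    "2\t10.00\t11.25\tParus major\n"
)
wanted = {"Turdus merula"}  # stands in for species_to_augment

rows = []
for row in csv.DictReader(io.StringIO(sample_table), delimiter="\t"):
    if row.get("Species") not in wanted:
        continue  # keep only the species of interest
    start = float(row["Begin Time (s)"])
    end = float(row["End Time (s)"])
    rows.append({"species": row["Species"], "start_time": start, "duration": end - start})

print(rows)
```

Using `row.get("Species")` instead of `row["Species"]` means that a file without a `Species` column is skipped rather than raising a `KeyError`.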

In [39]:
audio_annots_wabad = {}
category_annots_wabad = {}

for folder in os.listdir(WABAD_PATH):
    if not os.path.isdir(os.path.join(WABAD_PATH, folder)):
        continue
    annotations = os.path.join(WABAD_PATH, folder, folder, "Raven Pro annotations")
    audio_info_update, category_info_update = extract_wabad_info(annotations)
    for audio in audio_info_update.keys():
        if audio not in audio_annots_wabad:
            audio_annots_wabad[audio] = []
        audio_annots_wabad[audio].extend(audio_info_update[audio])
    for category in category_info_update.keys():
        if category not in category_annots_wabad:
            category_annots_wabad[category] = []
        category_annots_wabad[category].extend(category_info_update[category])    

# make sure the output folder exists before writing
os.makedirs("utils/WABAD", exist_ok=True)
with open("utils/WABAD/audio_annots_wabad.json", 'w', encoding='utf-8') as jsonfile:
    json.dump(audio_annots_wabad, jsonfile)

with open("utils/WABAD/category_annots_wabad.json", 'w', encoding='utf-8') as jsonfile:
    json.dump(category_annots_wabad, jsonfile)

In [40]:
# show contribution of WABAD for every species
species_count_wabad = {species_name: len(segms) for species_name, segms in category_annots_wabad.items()}
species_count_wabad_df = pd.DataFrame(list(species_count_wabad.items()), columns=["Species", "Count WABAD"])
merged_df = pd.merge(species_count_df, species_count_wabad_df, on="Species", how="inner")
merged_df.sort_values(by=["Count"], ascending=False)

|   | Species | Count | Count WABAD |
|---|---------|-------|-------------|
| 5 | Turdus merula_Eurasian Blackbird | 315 | 2308 |
| 0 | Muscicapa striata_Spotted Flycatcher | 108 | 117 |
| 6 | Loxia curvirostra_Common Crossbill | 52 | 25 |
| 1 | Periparus ater_Coal Tit | 28 | 768 |
| 7 | Dendrocopos major_Great Spotted Woodpecker | 25 | 242 |
| 3 | Dryocopus martius_Black Woodpecker | 21 | 32 |
| 8 | Lophophanes cristatus_Crested Tit | 14 | 132 |
| 2 | Regulus regulus_Goldcrest | 12 | 437 |
| 4 | Certhia familiaris_Eurasian Treecreeper | 3 | 96 |


In [42]:
# move all WABAD audio into a single folder -> run "move_files.py" in the VM
WABAD_PATH = "/home/giacomoschiavo/WABAD/audio"
WABAD_AUDIO_SOURCE = "/home/giacomoschiavo/WABAD/all_wabad_audio"
# for folder in os.listdir(WABAD_PATH):
#     if not os.path.isdir(os.path.join(WABAD_PATH, folder)):
#         continue
#     # ...\BAM\BAM\Recordings
#     folder_path = os.path.join(WABAD_PATH, folder, folder, "Recordings")
#     all_audio = os.listdir(folder_path)
#     for audio in all_audio:
#         if audio.upper() in audio_annots_wabad.keys():
#             os.rename(
#                 os.path.join(folder_path, audio),
#                 os.path.join(WABAD_AUDIO_SOURCE, audio)
#             )
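
A hedged alternative to the commented loop above is sketched here. The helper `collect_annotated_audio` is hypothetical and is not the content of `move_files.py`. It matches file names case-insensitively, since recordings may use `.WAV` while the annotation keys end in `.wav`, and it uses `shutil.move`, which also works when source and target are on different filesystems.

```python
import os
import shutil
import tempfile


def collect_annotated_audio(recordings_dir, target_dir, annotated_names):
    """Move recordings whose names match `annotated_names` (ignoring case) into target_dir."""
    wanted = {name.lower() for name in annotated_names}
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(recordings_dir)):
        if name.lower() in wanted:
            shutil.move(os.path.join(recordings_dir, name), os.path.join(target_dir, name))
            moved.append(name)
    return moved


# Demo on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    rec_dir = os.path.join(tmp, "Recordings")
    out_dir = os.path.join(tmp, "all_audio")
    os.makedirs(rec_dir)
    for name in ("SITE_A.WAV", "SITE_B.WAV"):
        open(os.path.join(rec_dir, name), "wb").close()
    moved = collect_annotated_audio(rec_dir, out_dir, ["site_a.wav"])
    remaining = sorted(os.listdir(rec_dir))

print(moved, remaining)
```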
    

In [43]:
# with open("utils/audio_info_wabad.json") as f:
#     audio_info_wabad = json.load(f)

audio_info_wabad = utils.generate_audio_info(WABAD_AUDIO_SOURCE, audio_annots_wabad)
with open("utils/WABAD/audio_info_wabad.json", "w") as f:
    json.dump(audio_info_wabad, f)

In [44]:
true_segments_wabad = utils.generate_true_segments(audio_annots_wabad, audio_info_wabad)
with open("utils/WABAD/true_segments_wabad.json", "w") as f:
    json.dump(true_segments_wabad, f)

In [45]:
# here we generate the segments for WABAD in WABAD_SEGMENTS_PATH folder
WABAD_SEGMENTS_PATH = "/home/giacomoschiavo/WABAD/segments"
os.makedirs(WABAD_SEGMENTS_PATH, exist_ok=True)

In [46]:
utils.generate_segments(WABAD_AUDIO_SOURCE, WABAD_SEGMENTS_PATH, true_segments_wabad, audio_info_wabad, generate_None=False)

Processing segments for PITI_20220313_070800.wav...: 100%|██████████| 39/39 [00:00<00:00, 3779.26it/s]
Processing segments for OESF_20230518_060601.wav...: 100%|██████████| 39/39 [00:00<00:00, 2997.30it/s]
Processing segments for OESF_20230611_144932.wav...: 100%|██████████| 39/39 [00:00<00:00, 3135.24it/s]
Processing segments for PINA_20220603_082300.wav...: 100%|██████████| 39/39 [00:00<00:00, 2543.39it/s]
Processing segments for PINA_20220506_073700.wav...: 100%|██████████| 39/39 [00:00<00:00, 2351.71it/s]
Processing segments for PINA_20220502_083700.wav...: 100%|██████████| 39/39 [00:00<00:00, 2756.20it/s]
Processing segments for PINA_20220504_070800.wav...: 100%|██████████| 39/39 [00:00<00:00, 2683.14it/s]
Processing segments for PINA_20220502_083200.wav...: 100%|██████████| 39/39 [00:00<00:00, 2656.65it/s]
Processing segments for PINA_20220506_074900.wav...: 100%|██████████| 39/39 [00:00<00:00, 2708.

In [47]:
species_count_wabad_fr = {}
for species in os.listdir(WABAD_SEGMENTS_PATH):
    species_count_wabad_fr[species] = len(os.listdir(os.path.join(WABAD_SEGMENTS_PATH, species)))

species_count_wabad_fr_df = pd.DataFrame(list(species_count_wabad_fr.items()), columns=["Species", "Count WABAD FR"])
merged_df = pd.merge(species_count_df, species_count_wabad_fr_df, on="Species", how="inner")
merged_df.sort_values(by=["Count"], ascending=False)

|   | Species | Count | Count WABAD FR |
|---|---------|-------|----------------|
| 4 | None | 4907 | 12951 |
| 6 | Turdus merula_Eurasian Blackbird | 315 | 8239 |
| 0 | Muscicapa striata_Spotted Flycatcher | 108 | 23 |
| 7 | Loxia curvirostra_Common Crossbill | 52 | 122 |
| 1 | Periparus ater_Coal Tit | 28 | 3661 |
| 8 | Dendrocopos major_Great Spotted Woodpecker | 25 | 578 |
| 3 | Dryocopus martius_Black Woodpecker | 21 | 112 |
| 9 | Lophophanes cristatus_Crested Tit | 14 | 338 |
| 2 | Regulus regulus_Goldcrest | 12 | 1086 |
| 5 | Certhia familiaris_Eurasian Treecreeper | 3 | 250 |
