<a href="https://colab.research.google.com/github/lydiakatsis/zsl-acoustic-monitoring-scripts/blob/main/Batch_run_birdnet_googledrive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Run BirdNet Lite on GoogleDrive folder**

The original code is from https://github.com/kahst/BirdNET-Lite

A forked version is used because it was adapted to run through all files in a directory: https://github.com/UCSD-E4E/BirdNET-Lite

This notebook now uses my own fork, to which I uploaded a bird list to run with the model: https://github.com/lydiakatsis/BirdNET-Lite.git


# BirdNET-Lite
TFLite version of BirdNET. Bird sound recognition for more than 6,000 species worldwide.

Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University. Go to https://birdnet.cornell.edu to learn more about the project.

You can run BirdNET via the command line. You can add a few parameters that affect the output.

The input parameters include:

```
--i, Path to input folder. All the nested folders will also be processed.
--o, Path to output folder. By default results are written into the input folder.
--lat, Recording location latitude. Set -1 to ignore.
--lon, Recording location longitude. Set -1 to ignore.
--week, Week of the year when the recording was made. Values in [1, 48] (4 weeks per month). Set -1 to ignore.
--overlap, Overlap in seconds between extracted spectrograms. Values in [0.0, 2.9]. Defaults to 0.0.
--sensitivity, Detection sensitivity; Higher values result in higher sensitivity. Values in [0.5, 1.5]. Defaults to 1.0.
--min_conf, Minimum confidence threshold. Values in [0.01, 0.99]. Defaults to 0.1.
--custom_list, Path to text file containing a list of species. Not used if not provided.
--filetype, Filetype of soundscape recordings. Defaults to 'wav'.
```

Note: A custom species list needs to contain one species label per line. Take a look at the `model/label.txt` for the correct species label. Only labels from this text file are valid. You can find an example of a valid custom list in the 'example' folder.
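
Because only labels from the model's label file are valid, it can help to check a custom list before running the model. The following is a minimal sketch, not part of BirdNET-Lite: `find_invalid_labels` is a hypothetical helper, and both file paths are passed in by the caller rather than assumed.

```python
from pathlib import Path


def find_invalid_labels(custom_list_path, labels_path):
    """Return entries of the custom list that do not appear in the label file.

    Hypothetical helper: both files are assumed to hold one label per line.
    """
    def read_lines(path):
        text = Path(path).read_text(encoding="utf-8")
        # Ignore blank lines and surrounding whitespace
        return [line.strip() for line in text.splitlines() if line.strip()]

    valid = set(read_lines(labels_path))
    return [label for label in read_lines(custom_list_path) if label not in valid]
```

An empty list means every line in the custom list matches a label exactly. Any returned entries would need correcting before being passed to `--custom_list`.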

Here are two example commands to run this BirdNET version:

```

python3 analyze.py --i 'example/XC558716 - Soundscape.mp3' --lat 35.4244 --lon -120.7463 --week 18

python3 analyze.py --i 'example/XC563936 - Soundscape.mp3' --lat 47.6766 --lon -122.294 --week 11 --overlap 1.5 --min_conf 0.25 --sensitivity 1.25 --custom_list 'example/custom_species_list.txt'

```

Note: Please make sure to provide lat, lon, and week. BirdNET will work without these values, but the results might be less reliable.
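
The `--week` value uses a 48-week year with four weeks per month, so it differs from the ISO calendar week. As a rough sketch, a date could be mapped to that range as below. The exact assignment of days to the four weeks of a month is an assumption here (seven-day blocks, with days 29 to 31 folded into week 4), and `birdnet_week` is a hypothetical helper.

```python
from datetime import date


def birdnet_week(d):
    """Map a calendar date to a week index in [1, 48], assuming 4 weeks per month."""
    # Assumption: days 1-7 -> week 1, 8-14 -> week 2, 15-21 -> week 3, 22-31 -> week 4
    week_in_month = min((d.day - 1) // 7 + 1, 4)
    return (d.month - 1) * 4 + week_in_month


print(birdnet_week(date(2022, 3, 4)))  # 9
```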

The results of the analysis will be stored in a result file in CSV format. All confidence values are raw prediction scores and should be post-processed to eliminate occasional false-positive results.
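
One simple form of post-processing is to drop low-confidence detections and species that were detected only a handful of times. The sketch below illustrates this with pandas. The column name `Confidence` is an assumption about the result file, the thresholds are illustrative only, and `filter_detections` is a hypothetical helper. It does not replace manual validation.

```python
import pandas as pd


def filter_detections(df, min_conf=0.5, min_count=2,
                      conf_col="Confidence", species_col="Common name"):
    """Keep confident detections of species that occur at least `min_count` times.

    Assumption: `conf_col` holds the raw prediction score for each detection.
    """
    confident = df[df[conf_col] >= min_conf]
    counts = confident[species_col].value_counts()
    keep = counts[counts >= min_count].index
    return confident[confident[species_col].isin(keep)].reset_index(drop=True)
```

Species removed by the count threshold may still be genuine, so it can be worth reviewing them separately rather than discarding them outright.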


# Load libraries

In [22]:
import glob
import pandas as pd
from datetime import datetime, timedelta
import os
import numpy as np
from google.colab import drive
import random
from pathlib import Path

# Clone the required Git repos

In [2]:
!git clone https://github.com/lydiakatsis/BirdNET-Lite.git

Cloning into 'BirdNET-Lite'...
remote: Enumerating objects: 42, done.
remote: Total 42 (delta 0), reused 0 (delta 0), pack-reused 42
Unpacking objects: 100% (42/42), done.


**Copy the Python script required to run the detector**

In [3]:
!cp /content/BirdNET-Lite/analyze.py .

# Mount Google Drive
For instructions on mounting Teams or downloading data from a Google Cloud bucket, go here.


In [4]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
os.chdir('/content/BirdNET-Lite')

# Run the model


Provide the folder name. The script recursively analyses all files in the folder and saves one CSV per sound file, named after the sound file and written next to it.

In [6]:
# Change folder to where sound files are stored
folder = "/content/drive/MyDrive/ZSL/Network Rail project/trial_data/"    

In [7]:
# Make sure to update lat, long, week, and min_conf 
!python /content/BirdNET-Lite/analyze.py --i "$folder" --lat 51.507359 --lon -0.136439 --week 9 --min_conf 0.5 --sensitivity 0.85 --custom_list 'london_birdnet_a.txt'


INFO: Created TensorFlow Lite delegate for select TF ops.
INFO: TfLiteFlexDelegate delegate: 1 nodes delegated out of 182 nodes with 1 partitions.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
LOADING TF LITE MODEL... DONE!
READING AUDIO DATA... tcmalloc: large alloc 1382400000 bytes == 0x88a8000 @  0x7fc73c4601e7 0x7fc7063ba0ce 0x7fc706410cf5 0x7fc706410f4f 0x7fc7064b3673 0x58f62c 0x510bf2 0x58fd37 0x510325 0x5b4ee6 0x58ff2e 0x50d482 0x5b4ee6 0x58ff2e 0x50d482 0x5b4ee6 0x58ff2e 0x50c4fc 0x58fd37 0x50c4fc 0x5b4ee6 0x6005a3 0x607796 0x60785c 0x60a436 0x64db82 0x64dd2e 0x7fc73c05dc87 0x5b636a
DONE! READ 2400 CHUNKS.
ANALYZING AUDIO... DONE! Time 242.6 SECONDS
WRITING RESULTS TO /content/drive/MyDrive/ZSL/Network Rail project/trial_data/20220304_172600.csv ... DONE! WROTE 164 RESULTS.
READING AUDIO DATA... tcmalloc: large alloc 1382400000 bytes == 0x5e678000 @  0x7fc73c4601e7 0x7fc7063ba0ce 0x7fc706410cf5 0x7fc706410f4f 0x7fc7064b3673 0x58f62c 0x510bf2 0x58fd37 0x510325 0x5b4ee

# Read in results CSVs and reformat to make them easier to interpret
This code reads in all the individual CSVs created and compiles them into one, adding the file name, date and time. The resulting CSV is written to the folder location as 'concatenated_results.csv'.

In [10]:
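# Skip the compiled file so that re-running this cell does not read it back in
results_list = [f for f in glob.glob(folder +  '/*.csv') if os.path.basename(f) != 'concatenated_results.csv']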

In [13]:
li = []

for filename in results_list:
    df = pd.read_csv(filename, sep=';')
    df['file_name'] = os.path.splitext(os.path.basename(filename))[0]
    df['path'] = filename[:-3]+'WAV'
    # File names look like 20220304_172600: the date is the first 8 characters, the time is characters 9 to 14
    df['date'] = [f[0:8] for f in df['file_name']]
    df['time'] = [f[9:15] for f in df['file_name']]

    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

frame.to_csv(folder + '/concatenated_results.csv', index=False)
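
The date and time columns are kept as strings. If real timestamps are needed, for example to place each detection on a clock time, a sketch like the following could be used. It assumes that file names follow the `YYYYMMDD_HHMMSS` pattern seen in the outputs above and that the name encodes the start of the recording. `detection_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def detection_time(file_name, start_seconds):
    """Return the clock time of a detection.

    Assumptions: `file_name` looks like '20220304_172600' and marks the moment
    the recording started; `start_seconds` is the detection offset in seconds.
    """
    recording_start = datetime.strptime(file_name, "%Y%m%d_%H%M%S")
    return recording_start + timedelta(seconds=float(start_seconds))


print(detection_time("20220304_172600", 495.0))  # 2022-03-04 17:34:15
```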

# Results summary - create list of unique species IDs
- Later: show a sample of spectrograms with labels printed

In [14]:
print('number of species:', frame['Common name'].nunique()  )

number of species: 22


In [15]:
frame['Common name'].value_counts()

European Robin              323
Eurasian Wren               139
Great Tit                    65
Eurasian Collared-Dove       54
Rose-ringed Parakeet         39
Great Spotted Woodpecker     26
Eurasian Magpie               9
Dunnock                       9
Common Wood-Pigeon            7
Eurasian Blackbird            7
Eurasian Blue Tit             7
Carrion Crow                  3
European Pied Flycatcher      2
Eurasian Siskin               2
Common Firecrest              1
Spotted Flycatcher            1
Black Redstart                1
European Goldfinch            1
Eurasian Treecreeper          1
Long-eared Owl                1
Goldcrest                     1
Hawfinch                      1
Name: Common name, dtype: int64

# Create separate folders for validation
The code below copies x random examples of sounds classified as each species into a new folder called 'validation'. Inside it, a separate folder named after each species is created, containing examples to validate manually.

In [19]:
random.seed(0)
np.random.seed(0)

In [47]:
validation_folders = os.path.join(folder, 'validation')  # works with or without a trailing slash on folder
Path(validation_folders).mkdir(exist_ok=True)

In [24]:
bird_results = pd.read_csv(os.path.join(folder, 'concatenated_results.csv'))

In [32]:
# Select up to 50 random detections for each species
size = 50        # number of draws per species
replace = True   # sample with replacement, so species with fewer than 50 detections still work
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace),:]
bird_results_random = bird_results.groupby('Common name', as_index=False).apply(fn)

# Remove duplicates introduced by sampling with replacement
bird_results_random = bird_results_random.drop_duplicates(subset=['Common name','path','Start (s)'])
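
Sampling with replacement and then dropping duplicates usually returns fewer than 50 rows per species, even when more than 50 detections exist. An alternative, sketched below, samples without replacement and caps the sample at the number of available rows. This is a different technique from the cell above, not a description of it, and `sample_per_species` is a hypothetical helper.

```python
import pandas as pd


def sample_per_species(df, size=50, species_col="Common name", seed=0):
    """Draw up to `size` distinct rows per species, without replacement."""
    parts = [
        # Cap n at the group size so rare species contribute all their rows
        group.sample(n=min(len(group), size), random_state=seed)
        for _, group in df.groupby(species_col)
    ]
    if not parts:
        return df.iloc[0:0]  # empty input: return an empty frame with the same columns
    return pd.concat(parts, ignore_index=True)
```

With this approach every species with at least `size` detections yields exactly `size` clips, and no de-duplication step is needed.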


In [None]:
!pip install opensoundscape


In [40]:
from opensoundscape.audio import Audio

In [43]:
# For each species, create a folder in val-folders, name is species name
# Read file, and trim between start and end time
# write file into new folder

grouped = bird_results_random.groupby('Common name')

for group_name, df_group in grouped:
    save_dir = validation_folders + '/' + group_name
    print('making new folder: ' + save_dir)
    Path(save_dir).mkdir(exist_ok=True)

    for row_index, row in df_group.iterrows():
        file_path = row['path']
        basename = os.path.basename(file_path)
        start = row['Start (s)']
        end = row['End (s)']
        new_name = save_dir + '/' + str(start) + '_' + basename
        print(new_name)
        audio_object = Audio.from_file(file_path).trim(start,end)
        audio_object.save(new_name)

        
        
    print(");")    

making new folder: /content/drive/MyDrive/ZSL/Network Rail project/trial_data//validation/Black Redstart
/content/drive/MyDrive/ZSL/Network Rail project/trial_data//validation/Black Redstart/495.0_20220304_172600.WAV
);
making new folder: /content/drive/MyDrive/ZSL/Network Rail project/trial_data//validation/Carrion Crow
/content/drive/MyDrive/ZSL/Network Rail project/trial_data//validation/Carrion Crow/7041.0_20220305_062700.WAV
/content/drive/MyDrive/ZSL/Network Rail project/trial_data//validation/Carrion Crow/1875.0_20220305_062700.WAV


KeyboardInterrupt: ignored
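
The loop above reloads the full recording for every detection and starts over if it is interrupted. One way to reduce that cost, sketched here under the same column names, is to plan the work first: group the clips by source recording and skip any clip that already exists on disk. `plan_clips` is a hypothetical helper and only builds the plan; the audio calls in the trailing comment mirror the ones used in the cell above.

```python
from pathlib import Path

import pandas as pd


def plan_clips(df, validation_root):
    """Map each source recording to the clips that still need to be written.

    Returns {source_path: [(start, end, output_path), ...]}.
    """
    plan = {}
    for _, row in df.iterrows():
        source = row["path"]
        out_name = f"{row['Start (s)']}_{Path(source).name}"
        out_path = Path(validation_root) / row["Common name"] / out_name
        if out_path.exists():
            continue  # already written in an earlier, interrupted run
        plan.setdefault(source, []).append((row["Start (s)"], row["End (s)"], out_path))
    return plan


# Possible use with the objects from the cells above (not run here):
# for source, clips in plan_clips(bird_results_random, validation_folders).items():
#     audio = Audio.from_file(source)  # load each recording once
#     for start, end, out_path in clips:
#         out_path.parent.mkdir(parents=True, exist_ok=True)
#         audio.trim(start, end).save(str(out_path))
```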

In [55]:
list_dfs = []

species_folders = sorted(glob.glob(validation_folders + '/*'))

# Use a distinct loop variable so the data folder stored in `folder` is not overwritten
for species_folder in species_folders:
    species = os.path.basename(species_folder)
    files = sorted(glob.glob(species_folder + '/*.WAV'))
    names = [os.path.basename(x) for x in files]
    df = pd.DataFrame(data=files, columns = ['filename'])
    df['number'] = [float(a.split('_')[0]) for a in names]
    df['name'] = names
    df['Insert verification column here...'] = ''
    df = df.sort_values(by=['number'])
    df = df.drop(columns='number')
    df['species'] = species
    list_dfs.append(df)

df = pd.concat(list_dfs)
df = df.reset_index()
df = df[['species', 'filename', 'name', 'Insert verification column here...']]
df.to_excel(validation_folders + '/validation_spreadsheet.xlsx' , index=False)
