# Run BirdNET on bucket files #

* Create a user-managed Vertex notebook on GCP.
* This script doesn't use GPUs, so select CPU-only resources.
* The model is TFLite, so the default TensorFlow environment the notebook gives you is sufficient.
* Drag this script into the home directory that opens with the notebook and run everything from there.

Load packages

In [1]:
import glob
import os
import random
from datetime import datetime, timedelta
from pathlib import Path

import numpy as np
import pandas as pd

In [17]:
!pip install librosa

Collecting librosa
  Downloading librosa-0.9.2-py3-none-any.whl (214 kB)
Collecting soundfile>=0.10.2
  Downloading soundfile-0.11.0-py2.py3-none-any.whl (23 kB)
Collecting audioread>=2.1.9
  Downloading audioread-3.0.0.tar.gz (377 kB)
  Preparing metadata (setup.py) ... done
Collecting resampy>=0.2.2
  Downloading resampy-0.4.2-py3-none-any.whl (3.1 MB)
Collecting pooch>=1.0
  Downloading pooch-1.6.0-py3-none-any.whl (56 kB)
Collecting appdirs>=1.3.0
  Downloading appdirs-1.4.4-py2.py

## Mount the Google Cloud bucket ##

In [3]:
# Change this if using a different bucket
bucket_name = 'nr-acoustic-data'

In [4]:
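# "$bucket_name" expands the Python variable defined above; without the $ the literal text is passed to gcsfuse
!mountpoint -q /home/jupyter/gcs && echo "mounted" || gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "$bucket_name" "/home/jupyter/gcs"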

mounted
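
The mount step can also be assembled in Python, which makes it explicit that the value of `bucket_name` is what reaches gcsfuse. This is a minimal sketch, not part of the original notebook: the helper names `build_mount_command` and `mount_bucket` are hypothetical, and the flags are copied unchanged from the shell command above.

```python
import os
import subprocess


def build_mount_command(bucket_name, mount_dir="/home/jupyter/gcs"):
    """Return the gcsfuse argument list for mounting bucket_name at mount_dir."""
    return [
        "gcsfuse",
        "--implicit-dirs",
        "--rename-dir-limit=100",
        "--disable-http2",
        "--max-conns-per-host=100",
        bucket_name,
        mount_dir,
    ]


def mount_bucket(bucket_name, mount_dir="/home/jupyter/gcs"):
    """Mount the bucket unless mount_dir is already a mount point."""
    if os.path.ismount(mount_dir):
        return "mounted"
    # gcsfuse needs the mount directory to exist before mounting
    os.makedirs(mount_dir, exist_ok=True)
    subprocess.run(build_mount_command(bucket_name, mount_dir), check=True)
    return "mounted"


# Only builds the command here; call mount_bucket(...) on the notebook VM to run it
command = build_mount_command("nr-acoustic-data")
print(" ".join(command))
```

Calling `mount_bucket(bucket_name)` on the notebook instance would then mirror the `mountpoint ... || gcsfuse ...` one-liner.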


## Clone the BirdNET-Lite repo ##

In [2]:
!git clone https://github.com/lydiakatsis/BirdNET-Lite.git

Cloning into 'BirdNET-Lite'...
remote: Enumerating objects: 45, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 45 (delta 0), reused 0 (delta 0), pack-reused 42
Unpacking objects: 100% (45/45), done.


In [3]:
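# Copy the analysis script next to the notebook.
# Note: the file name copied here must match the script name invoked in the analysis cell.
!cp BirdNET-Lite/analyze_.py .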

In [7]:
# Change these to your input and output folders inside the mounted bucket
input_folder = "gcs/MSD-56/"
results_folder = "gcs/output/"

In [8]:
# Make sure to update --lat, --lon, --week, and --min_conf for your recordings
# This version uses a custom list of bird species found in London

!python analyze.py --i "$input_folder" --o "$results_folder" --lat 51.507359 --lon -0.136439 --week 9 --min_conf 0.5 --sensitivity 0.85 --custom_list 'BirdNET-Lite/london_birdnet_a.txt'

INFO: Created TensorFlow Lite delegate for select TF ops.
INFO: TfLiteFlexDelegate delegate: 1 nodes delegated out of 182 nodes with 1 partitions.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
LOADING TF LITE MODEL... DONE!
READING AUDIO DATA... DONE! READ 2 CHUNKS.
ANALYZING AUDIO... DONE! Time 0.2 SECONDS
WRITING RESULTS TO gcs/output/MSD-56/20220302_090845.csv ... DONE! WROTE 0 RESULTS.
READING AUDIO DATA... DONE! READ 200 CHUNKS.
ANALYZING AUDIO... DONE! Time 15.8 SECONDS
WRITING RESULTS TO gcs/output/MSD-56/20220302_120000.csv ... DONE! WROTE 1 RESULTS.
READING AUDIO DATA... DONE! READ 200 CHUNKS.
ANALYZING AUDIO... DONE! Time 16.0 SECONDS
WRITING RESULTS TO gcs/output/MSD-56/20220302_130000.csv ... DONE! WROTE 20 RESULTS.
READING AUDIO DATA... DONE! READ 200 CHUNKS.
ANALYZING AUDIO... DONE! Time 15.9 SECONDS
WRITING RESULTS TO gcs/output/MSD-56/20220302_140000.csv ... DONE! WROTE 23 RESULTS.
READING AUDIO DATA... DONE! READ 200 CHUNKS.
ANALYZING AUDIO... ^C
Error in pr
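
The `--week` value passed to the script has to be updated by hand for each deployment. The sketch below derives it from a recording date, assuming the script follows the upstream BirdNET-Lite convention of 48 weeks per year (four per month); that convention is an assumption here, so check the script's argument help before relying on it. The helper name `birdnet_week` is hypothetical.

```python
from datetime import date


def birdnet_week(d):
    """Map a calendar date to a week number in 1..48 (four weeks per month)."""
    # Days 1-7 -> week 1, 8-14 -> week 2, 15-21 -> week 3, 22 onwards -> week 4
    week_in_month = min((d.day - 1) // 7 + 1, 4)
    return (d.month - 1) * 4 + week_in_month


# The recordings above start on 2 March 2022
print(birdnet_week(date(2022, 3, 2)))  # 9
```

Under that assumption, the recordings dated 20220302 map to week 9, which agrees with the `--week 9` used in the command above.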

## Read in results csvs and reformat to make easier to interpret ##
This code reads in all the individual CSVs that were created and compiles them into one table, adding the file name, date, and time of each recording. The result is written to the results folder as 'concatenated_results.csv'.

In [9]:
# One CSV per recording, inside one subfolder per input folder
results_list = glob.glob(os.path.join(results_folder, '*', '*.csv'))
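
As an alternative to `glob`, the same file listing can be written with `pathlib`, which the notebook already imports. This is a minimal sketch with a hypothetical helper name, `collect_result_csvs`; the temporary folder and placeholder files only stand in for the real results folder so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def collect_result_csvs(results_folder):
    """Return per-recording CSVs found exactly one subfolder below results_folder."""
    return sorted(Path(results_folder).glob("*/*.csv"))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in layout: one subfolder per site, plus a top-level combined file
    site = Path(tmp) / "MSD-56"
    site.mkdir()
    (site / "20220302_120000.csv").write_text("placeholder")
    (Path(tmp) / "concatenated_results.csv").write_text("placeholder")

    found = collect_result_csvs(tmp)
    print([p.name for p in found])
```

Because the pattern only matches files one level down, a combined CSV saved at the top of the results folder is not picked up again when the cell is rerun.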

In [11]:
li = []

for filename in results_list:
    df = pd.read_csv(filename, sep=';')
    # File names look like 20220302_090845, i.e. YYYYMMDD_HHMMSS
    stem = os.path.splitext(os.path.basename(filename))[0]
    df['file_name'] = stem
    # Results path with the extension swapped from csv to WAV
    df['path'] = filename[:-3] + 'WAV'
    df['date'] = stem[0:8]   # YYYYMMDD (eight characters, not seven)
    df['time'] = stem[9:15]  # HHMMSS

    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

frame.to_csv(results_folder + 'concatenated_results.csv')
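
The date and time columns above are plain strings. If real timestamps are needed, for example to sort or group detections by hour, the file name can be parsed directly. This is a minimal sketch using only the standard library; it assumes every file name follows the `YYYYMMDD_HHMMSS` pattern seen in the output above, and `parse_recording_start` is a hypothetical helper name.

```python
from datetime import datetime


def parse_recording_start(file_name):
    """Parse a recording name such as '20220302_090845' into a datetime."""
    return datetime.strptime(file_name, "%Y%m%d_%H%M%S")


start = parse_recording_start("20220302_090845")
print(start.isoformat())  # 2022-03-02T09:08:45
```

In the notebook this could be applied to the combined table with `frame['file_name'].map(parse_recording_start)`.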

## Results summary - create list of unique species IDs ##

In [12]:
print('number of species:', frame['Common name'].nunique())

number of species: 24


In [13]:
frame['Common name'].value_counts()

Eurasian Wren               107
European Robin               88
Great Tit                    38
Eurasian Blue Tit            32
Eurasian Magpie               9
Long-tailed Tit               5
Hawfinch                      5
Dunnock                       4
Eurasian Collared-Dove        4
Redwing                       3
Great Spotted Woodpecker      3
Eurasian Treecreeper          2
Short-toed Treecreeper        2
Carrion Crow                  2
Rose-ringed Parakeet          2
Eurasian Bullfinch            1
Spotted Flycatcher            1
Coal Tit                      1
European Serin                1
Eurasian Hoopoe               1
Common Firecrest              1
European Goldfinch            1
Song Thrush                   1
Eurasian Nutcracker           1
Name: Common name, dtype: int64