<a href="https://colab.research.google.com/github/lydiakatsis/zsl-acoustic-monitoring-scripts/blob/main/Birdnet-CNN/Run_BirdNet_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Run BirdNet on Colab #

## Instructions ##
 
1. The only parts of this notebook you need to change are the paths to the audio input folder and the results folder.
    * Change these file paths to the paths of your input folder and output folder within the Bucket.
2. Once you have set the folder paths, select Runtime -> Run all from the menu above.
3. The final command should shut down the instance once the run has finished. If an error occurs or the notebook becomes unresponsive, that command will not be executed, so check on the notebook's progress periodically.
4. If the notebook becomes unresponsive before it has completed, restart the kernel. The notebook is unresponsive if the box next to the current command changes from [*] / [number] to [ ].

**Source of potential errors:** 

* The notebook may become unresponsive after running for a long time. If it does, restart the notebook and run the code again.

* Some files will not be analysed because the AudioMoth records several 0 MB files, so don't be concerned if some files can't be analysed.
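
The empty recordings mentioned above can be listed before a run so that skipped files come as no surprise. The following is a minimal sketch, assuming recordings use a `.WAV` extension (matched case-insensitively); `find_empty_recordings` is a hypothetical helper and not part of BirdNET.

```python
import os

def find_empty_recordings(folder, extension=".wav"):
    """Return sorted paths of zero-byte audio files under `folder` (searched recursively)."""
    empty = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Match the extension case-insensitively and keep only files of size 0
            if name.lower().endswith(extension) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)

# Example with the input folder used later in this notebook (adjust to your own path):
# empty_files = find_empty_recordings("/content/gcs_raw/trial_data_2021/bird-config/")
# print(len(empty_files), "empty recordings found")
```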





## Clone repo ##

In [None]:
# Clone BirdNET-Analyzer (the newer BirdNET release, which covers more species) if it is not already present,
# then download the London species list
![ -d "/content/BirdNET-Analyzer" ] && echo "Scripts are downloaded" || git clone https://github.com/kahst/BirdNET-Analyzer.git
!wget -O BirdNET-Analyzer/species_list.txt "https://www.dropbox.com/s/3ji3dzxs9gsa6t7/london_birdnet_a.txt?dl=0"

In [None]:
# Install libraries
!pip show librosa && echo "librosa installed" || pip install librosa
!pip show tensorflow && echo "tensorflow installed" || pip install tensorflow 


## 1. Running on Google Cloud Bucket data

Mount the Google Cloud Bucket so that its data can be accessed like a local directory.

In [None]:
# Make sure gcsfuse is installed so that the bucket can be mounted
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse

In [5]:
# Authentication and PROJECT_ID allocation - change PROJECT_ID if necessary
from google.colab import auth
auth.authenticate_user()
PROJECT_ID = "zsl-acoustic-pipeline"

In [None]:
# Mount buckets - change 'acoustic-data-raw' to the name of the bucket holding the raw data,
# and 'acoustic-processing-outputs' to the name of the bucket used for output storage.
# The folder is only created and mounted if it is not already a mount point.
!mountpoint -q /content/gcs_raw && echo "mounted" || (mkdir -p /content/gcs_raw && gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-data-raw" "/content/gcs_raw")
!mountpoint -q /content/gcs_outputs && echo "mounted" || (mkdir -p /content/gcs_outputs && gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-processing-outputs" "/content/gcs_outputs")
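
Whether the two buckets are actually mounted can also be checked from Python before moving on. This is a rough sketch, assuming the mount points used in the cell above; `mount_status` is a hypothetical helper, and `os.path.ismount` is used on the assumption that a FUSE mount is reported as a separate device.

```python
import os

def mount_status(path):
    """Classify `path` as 'missing', 'not mounted' or 'mounted'."""
    if not os.path.isdir(path):
        return "missing"
    if not os.path.ismount(path):
        return "not mounted"
    return "mounted"

# Mount points taken from the cell above
for mount_point in ("/content/gcs_raw", "/content/gcs_outputs"):
    print(mount_point, "->", mount_status(mount_point))
```

If either folder is reported as "missing" or "not mounted", rerun the mount cell before setting the input and output folders.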


**OR**

## 2. Running on Google Drive data

Mount Google Drive so that it can be accessed like a local directory.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Set folders for input (i.e. sound folders to analyse) and output ##

These folders will appear in the file browser on the left and their paths will start with '/content/', but they access files from the Google Cloud Bucket that you mounted. Make sure input_folder is the raw audio folder and results_folder is within gcs_outputs.

**All you need to change from below is 'trial_data_2021' on both lines to the new folder.**

# &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; #

In [7]:
# Change these folders
# Make sure the output folder is in a GCS bucket
input_folder = "/content/gcs_raw/trial_data_2021/bird-config/"
results_folder = "/content/gcs_outputs/trial_data_2021/birdnet/"

# &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; #
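
Because the same project folder name appears in both paths, the two strings can be derived from a single value to avoid changing one and forgetting the other. This is only a sketch, assuming the `bird-config` and `birdnet` subfolder layout shown above; `build_folders` is a hypothetical helper, and the example deliberately uses separate variable names so it does not overwrite the folders set in the cell above.

```python
def build_folders(project_folder):
    """Return (input_folder, results_folder) for a project folder name."""
    # Layout assumed from the paths used in this notebook
    input_path = f"/content/gcs_raw/{project_folder}/bird-config/"
    results_path = f"/content/gcs_outputs/{project_folder}/birdnet/"
    return input_path, results_path

example_input, example_results = build_folders("trial_data_2021")
print(example_input)
print(example_results)
```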

# Run analyser #

In [9]:
cd /content/BirdNET-Analyzer

/content/BirdNET-Analyzer


In [10]:
# Adjust arguments for lat, lon, week of year and species list accordingly
# species_list.txt is a list of species occurring in London for use in Network Rail analysis
!python analyze.py --i "$input_folder" --o "$results_folder" --lat 51.507359 --lon -0.136439 --week 11 --min_conf 0.8 --slist 'species_list.txt' --threads 2

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Species list contains 109 species
Found 2 files to analyze
Analyzing /content/gcs_raw/trial_data_2021/bird-config/MSD-X/20220306_062700.WAV
Analyzing /content/gcs_raw/trial_data_2021/bird-config/MSD-Y/20220303_172600.WAV
tcmalloc: large alloc 1382400000 bytes == 0x8878000 @  0x7f54d8db41e7 0x7f54d65fe14e 0x7f54d6656745 0x7f54d66569bf 0x7f54d66f9773 0x5aae14 0x49abe4 0x4fd2db 0x4997c7 0x4fd8b5 0x49abe4 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4997c7 0x5d8868 0x4990ca 0x5d8868 0x532594 0x5d1e94 0x5d8cdf 0x55dc1e 0x5d8868 0x5d8506 0x55f797 0x55cd91 0x5d8941 0x5d8506 0x55f797
tcmalloc: large alloc 1382400000 bytes == 0x8878000 @  0x7f54d8db41e7 0x7f54d65fe14e 0x7f54d6656745 0x7f54d66569bf 0x7f54d66f9773 0x5aae14 0x49abe4 0x4fd2db 0x4997c7 0x4fd8b5 0x49abe4 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4997c7 0x5d8868 0x4990ca 0x5d8868 0x532594 0x5d1e94 0x5d8cdf 0x55dc1e 0x5d8868 0x5d8506 0x55f797 0x55cd91 0x5d8941 0x5d8506 0x55
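
The `--week` argument in the command above takes values from 1 to 48, because BirdNET-Analyzer counts four weeks per month. The sketch below shows one way to derive such a value from a recording date. The rule for assigning days to weeks (seven-day blocks, with days 29 to 31 folded into the fourth week) is an assumption made here for illustration, and `birdnet_week` is a hypothetical helper.

```python
from datetime import date

def birdnet_week(d):
    """Map a calendar date to a week index in 1..48, assuming four 'weeks' per month."""
    # Days 1-7 -> week 1, 8-14 -> week 2, 15-21 -> week 3, 22 onwards -> week 4
    week_in_month = min((d.day - 1) // 7 + 1, 4)
    return (d.month - 1) * 4 + week_in_month

print(birdnet_week(date(2022, 3, 15)))
```

Under this mapping, a recording made in mid-March falls in week 11.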

# Concatenate results into one csv #

In [11]:
import glob
import pandas as pd
import os
from datetime import date, datetime

In [12]:
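# Date stamp used in the name of the concatenated output file
d = str(date.today())
# Results are tab-separated .txt files stored in one subfolder per recorder ID
results_list = glob.glob(os.path.join(results_folder, '*', '*.txt'))
metadata = pd.read_csv('/content/gcs_raw/nr-acoustic-data/metadata/NR_deployment_2022_ARUs.csv')

li = []

for filename in results_list:
    df = pd.read_csv(filename, sep='\t')
    df['file_name'] = os.path.splitext(os.path.basename(filename))[0]
    # Replace the last three characters ('txt') of the results path with 'WAV'
    df['path'] = filename[:-3] + 'WAV'
    # File names start with the recording start time as YYYYMMDD_HHMMSS (15 characters)
    df['datetime'] = [f[0:15] for f in df['file_name']]
    # Detection time = recording start time + offset of the detection within the file
    df['time'] = pd.to_datetime(df['datetime'], format='%Y%m%d_%H%M%S') + pd.to_timedelta(df["Begin Time (s)"], unit='s')
    # The date is the first 8 characters (YYYYMMDD)
    df['date'] = [f[0:8] for f in df['file_name']]
    # The parent folder name is the ID used to join the deployment metadata
    df['ID'] = os.path.basename(os.path.dirname(filename))
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
# Inner join: detections whose ID is not listed in the metadata are dropped
frame = pd.merge(frame, metadata, on='ID', how='inner')
frame.to_csv(os.path.join(results_folder, d + '_concatenated_results_birdnet.csv'))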
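
The timestamp logic in the loop above can be tried on a single value without pandas. This is a minimal sketch using only the standard library, assuming file names begin with `YYYYMMDD_HHMMSS` as in the recordings listed earlier; `detection_time` is a hypothetical helper, and the file name suffix in the example is made up for illustration.

```python
from datetime import datetime, timedelta

def detection_time(file_name, begin_seconds):
    """Return the absolute time of a detection from a file name and an offset in seconds."""
    # The first 15 characters hold the recording start time, e.g. 20220306_062700
    start = datetime.strptime(file_name[:15], "%Y%m%d_%H%M%S")
    return start + timedelta(seconds=begin_seconds)

# Hypothetical results file name; only the leading timestamp matters here
print(detection_time("20220306_062700.example_results", 12.0))
```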