<a href="https://colab.research.google.com/github/lydiakatsis/zsl-acoustic-monitoring-scripts/blob/main/Birdnet-CNN/Run_BirdNET_Vertex_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Run BirdNET on Cloud Bucket files in Vertex AI #

## Instructions ##
 
1. The only parts of this script you need to change are the paths to the audio input folder and the results folder.
    * Set these to the paths of your input folder and output folder within the Bucket.
2. Once you have set the folders, select Run -> Run all cells from the menu above.
3. The final cell shuts down the instance once everything has finished. If an error occurs or the script becomes unresponsive, that cell will not run, so check on the script's progress periodically.
4. If the script becomes unresponsive before it has completed, restart the kernel. The script is unresponsive if the box next to the current cell changes from [*] / [number] to [ ].

**Sources of potential errors:**
* If you have set up a new notebook with fewer CPUs, the `--threads` argument in cell 6 may need lowering.

* The notebook may become unresponsive after running for a long time. If it does, restart the notebook and run the code again.

* The AudioMoth records some 0 MB files, and these cannot be analysed, so don't be concerned if a few files fail.
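
To see in advance which recordings will fail because they are empty, something like the following could be run before the analysis. It is only a sketch: the function name `find_empty_recordings` and the assumption that recordings end in `.wav` (in any letter case) are illustrative and not part of the original notebook.

```python
import os

def find_empty_recordings(folder, extensions=(".wav",)):
    """Return the sorted paths of zero-byte audio files found under `folder`."""
    empty = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Compare in lower case so both .wav and .WAV are matched
            if name.lower().endswith(extensions):
                path = os.path.join(root, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)

# Example use once input_folder has been set further down:
# print(len(find_empty_recordings(input_folder)), "empty recordings will be skipped")
```
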





## Mount buckets ##

In [None]:
# Mount raw data bucket - this bucket contains all the wav files
# (gcsfuse only runs if the folder is not already a mount point)
!mountpoint -q /home/jupyter/gcs_raw && echo "mounted" || ( mkdir -p /home/jupyter/gcs_raw && gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-data-raw" "/home/jupyter/gcs_raw" )
# Mount outputs bucket - results csvs will be written to this bucket
!mountpoint -q /home/jupyter/gcs_outputs && echo "mounted" || ( mkdir -p /home/jupyter/gcs_outputs && gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-processing-outputs" "/home/jupyter/gcs_outputs" )

2022/11/24 10:48:38.247257 Start gcsfuse/0.41.8 (Go version go1.18.4) for app "" using mount point: /home/jupyter/gcs_raw
2022/11/24 10:48:38.264325 Opening GCS connection...
2022/11/24 10:48:38.395506 Mounting file system "acoustic-data-raw"...
2022/11/24 10:48:38.419094 File system has been successfully mounted.
2022/11/24 10:48:38.551976 Start gcsfuse/0.41.8 (Go version go1.18.4) for app "" using mount point: /home/jupyter/gcs_outputs
2022/11/24 10:48:38.566143 Opening GCS connection...
2022/11/24 10:48:38.672340 Mounting file system "acoustic-processing-outputs"...
2022/11/24 10:48:38.705055 File system has been successfully mounted.
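
If you would rather confirm the mounts from Python than read the log above, a small check like this may help. It is a sketch that relies only on `os.path.ismount`; the helper name `check_mounts` is made up for illustration.

```python
import os

def check_mounts(mount_points):
    """Map each path to True if it is currently a mount point, else False."""
    return {path: os.path.ismount(path) for path in mount_points}

# The two mount points used by this notebook
status = check_mounts(["/home/jupyter/gcs_raw", "/home/jupyter/gcs_outputs"])
for path, mounted in status.items():
    print(f"{path}: {'mounted' if mounted else 'NOT mounted'}")
```
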


## Set folders for input (i.e. sound folders to analyse) and output ##

These folders appear in the file browser on the left, so their paths start with '/home/jupyter/', but they access files from the Google Cloud Bucket that you mounted. Make sure input_folder is the raw audio folder and results_folder is within gcs_outputs.

**All you need to change from below is 'trial_data_2021' on both lines to the new folder.**

# &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; #

In [None]:
# Change these folders
# Make sure output is in a gcs bucket
input_folder = "/home/jupyter/gcs_raw/trial_data_2021/bird-config/"
results_folder = "/home/jupyter/gcs_outputs/trial_data_2021/birdnet/"

# &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; #
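
A quick sanity check of the two paths can catch typos before a long run. The sketch below assumes the outputs bucket is mounted at `/home/jupyter/gcs_outputs`, as in the mount cell; `validate_folders` is a hypothetical helper, not part of the original notebook.

```python
import os

def validate_folders(input_folder, results_folder,
                     outputs_mount="/home/jupyter/gcs_outputs"):
    """Return a list of problems found; an empty list means the paths look fine."""
    problems = []
    if not os.path.isdir(input_folder):
        problems.append(f"input folder not found: {input_folder}")
    # Compare absolute, normalised paths so trailing slashes do not matter
    results_abs = os.path.abspath(results_folder)
    mount_abs = os.path.abspath(outputs_mount)
    if os.path.commonpath([results_abs, mount_abs]) != mount_abs:
        problems.append(f"results folder is not inside {outputs_mount}: {results_folder}")
    return problems

# Example use in the notebook:
# for problem in validate_folders(input_folder, results_folder):
#     print("WARNING:", problem)
```

The results folder itself does not need to exist yet for this check; only its location is tested.
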

## Clone repo ##

In [None]:
# Clone BirdNET-Analyzer (the newer BirdNET version, which covers more species) unless it is already downloaded
![ -d "/home/jupyter/BirdNET-Analyzer" ] && echo "Scripts are downloaded" || git clone https://github.com/kahst/BirdNET-Analyzer.git
# Download the custom species list used by the --slist argument
!wget -O BirdNET-Analyzer/species_list.txt "https://www.dropbox.com/s/3ji3dzxs9gsa6t7/london_birdnet_a.txt?dl=0"

fatal: destination path 'BirdNET-Analyzer' already exists and is not an empty directory.
--2022-11-23 10:58:09--  https://www.dropbox.com/s/3ji3dzxs9gsa6t7/london_birdnet_a.txt?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.2.18, 2620:100:6020:18::a27d:4012
Connecting to www.dropbox.com (www.dropbox.com)|162.125.2.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/raw/3ji3dzxs9gsa6t7/london_birdnet_a.txt [following]
--2022-11-23 10:58:10--  https://www.dropbox.com/s/raw/3ji3dzxs9gsa6t7/london_birdnet_a.txt
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc79d509f3034583e0c46adc6d13.dl.dropboxusercontent.com/cd/0/inline/BxQUf8KRDLr5gdfGLQStBMFDCF4Byj4Bx3tL3q-IY_ZeUzUmCNnGT3M5SeWxq3eVLrAcfmwdIubQX754F4fKwgC0epYpwPvl6lnN0ppKyniMBROEL5D8F14hT2cft3sUNZijANKASQxUU2eczvysKSaqxB_e7rW2LM2sj-D9u2A_sA/file# [following]
--2022-11-23 10:58:10--  https://uc79d509f3034583e0c46adc6d13

In [None]:
# If librosa doesn't import, pip install it
!pip show librosa && echo "librosa installed" || pip install librosa

## Run analyser ##

In [None]:
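# Run from the BirdNET-Analyzer folder so that analyze.py and species_list.txt are found
cd /home/jupyter/BirdNET-Analyzer
# --threads sets the number of CPU threads; lower it if the instance has fewer CPUs
!python analyze.py --i "$input_folder" --o "$results_folder" --lat 51.507359 --lon -0.136439 --week 11 --min_conf 0.8 --slist 'species_list.txt' --threads 8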
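
Rather than editing the thread count by hand on a smaller instance, it could be capped at the number of CPUs the machine reports. This is a minimal sketch; `pick_thread_count` and its `reserve` parameter are illustrative names, not part of BirdNET-Analyzer.

```python
import os

def pick_thread_count(requested=8, reserve=0):
    """Return `requested` threads, capped at the available CPUs minus `reserve` (minimum 1)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, cpus - reserve))

threads = pick_thread_count(8)
print("Using", threads, "threads")
```

In the notebook, `threads` could then be passed to the command as `--threads $threads`, in the same way `$input_folder` is passed.
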

## Concatenate results into one csv ##

In [None]:
import glob
import pandas as pd
import os
from datetime import date

In [None]:
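# Today's date, used as a prefix for the output file name
d = str(date.today())
# Results files sit one folder level below results_folder
results_list = glob.glob(results_folder + '*/*.txt')
metadata = pd.read_csv('/home/jupyter/gcs_raw/nr-acoustic-data/metadata/NR_deployment_2022_ARUs.csv')

li = []

for filename in results_list:
    df = pd.read_csv(filename, sep='\t')
    df['file_name'] = os.path.splitext(os.path.basename(filename))[0]
    df['path'] = filename[:-3] + 'WAV'
    # File names start with the recording start time as YYYYMMDD_HHMMSS (15 characters)
    df['datetime'] = [f[0:15] for f in df['file_name']]
    # Detection time = recording start time + offset of the detection within the file
    df['time'] = pd.to_datetime(df['datetime'], format='%Y%m%d_%H%M%S') + pd.to_timedelta(df["Begin Time (s)"], unit='s')
    # Recording date as YYYYMMDD (first 8 characters of the file name)
    df['date'] = [f[0:8] for f in df['file_name']]
    # The parent folder name is the ID used to join the deployment metadata
    df['ID'] = os.path.basename(os.path.dirname(filename))
    li.append(df)

# Stack all results, then keep only rows whose ID appears in the metadata (inner join)
frame = pd.concat(li, axis=0, ignore_index=True)
frame = pd.merge(frame, metadata, on='ID', how='inner')
frame.to_csv(results_folder + d + '_concatenated_results_birdnet.csv')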
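
Because the merge above is an inner join, detections from any folder whose `ID` is missing from the metadata file are dropped without warning. The sketch below shows one way to list such IDs using the `indicator` option of pandas' merge. The helper name `unmatched_ids` and the example IDs are made up for illustration.

```python
import pandas as pd

def unmatched_ids(frame, metadata, key="ID"):
    """Return the sorted key values that occur in `frame` but not in `metadata`."""
    merged = frame[[key]].drop_duplicates().merge(
        metadata[[key]].drop_duplicates(), on=key, how="left", indicator=True
    )
    # Rows marked 'left_only' have no matching row in the metadata
    return sorted(merged.loc[merged["_merge"] == "left_only", key])

# Hypothetical example data
detections = pd.DataFrame({"ID": ["ARU01", "ARU01", "ARU02"]})
deployments = pd.DataFrame({"ID": ["ARU01"]})
print(unmatched_ids(detections, deployments))  # ['ARU02']
```

In the notebook this would have to be called on the concatenated results before the merge, since after the inner join the unmatched rows are already gone.
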

In [None]:
# Shut down the instance
!sudo shutdown -h now