## Run bat detect on bucket files using Vertex AI ##

GPU will automatically be used if select the resources when making the notebook

Run time is approx 17 seconds per minute of audio with 4 CPUS and no GPU
Run time is 6 seconds per minute of audio with 8 CPUS and 1 GPU


## Instructions ##
 
1. The only part of this script you will need to change is the directory of audio files and results folders.
    * Change these file paths to the paths of your input folder and output folder within the Bucket.
2. Once you have set the file directories, select Run -> Run all cells from the menu above.
3. The final argument should shut down the instance when this has finished, however if there is an error at some point or the script becomes unresponsive, then this argument will not be executed, so you should check in on progress of script periodically.
4. If scripts becomes unresponsive before it has completed, then restart the kernel. The script is unresponsive if the boxes next to the current command changes from [*] / [number] to [ ] 


**Source of potential errors:** 
* If you have multiple kernels running in this instance, then GPU allocation may be disrupted and it will throw errors - make sure there is only one notebook open when running this script. You can view kernels on the left, with the Stop Icon that is below the folder browser. Shut down all the kernels except for this script.

* The notebook may become unresponsive after running for a long time - you will have to restart notebook and code again. It will resume classifying from where it left off.

* Some files will not be analysed as there will be several 0MB files recorded by the AudioMoth, so don't be concerned if some files can't be analysed.




In [2]:
# mount buckets
!mountpoint -q /home/jupyter/gcs_outputs && echo "mounted" || mkdir -p gcs_outputs; gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-processing-outputs" "/home/jupyter/gcs_outputs"
!mountpoint -q /home/jupyter/gcs_raw && echo "mounted" || mkdir -p gcs_raw; gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-data-raw" "/home/jupyter/gcs_raw"

mounted
2022/12/12 16:39:50.706569 Start gcsfuse/0.41.8 (Go version go1.18.4) for app "" using mount point: /home/jupyter/gcs_outputs
2022/12/12 16:39:50.719365 Opening GCS connection...
2022/12/12 16:39:50.792784 Mounting file system "acoustic-processing-outputs"...
daemonize.Run: readFromProcess: sub-process: mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
mounted
2022/12/12 16:39:50.927539 Start gcsfuse/0.41.8 (Go version go1.18.4) for app "" using mount point: /home/jupyter/gcs_raw
2022/12/12 16:39:50.939939 Opening GCS connection...
2022/12/12 16:39:51.010956 Mounting file system "acoustic-data-raw"...
daemonize.Run: readFromProcess: sub-process: mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1


In [3]:
# download scripts if not already there
![ -d "/home/jupyter/batdetect_v3-master" ] && echo "Scripts are downloaded" || gcloud storage cp -r 'gs://data-processing-scripts/batdetect_v3-master' .

Scripts are downloaded


In [4]:
!pip show librosa && echo "librosa installed" || pip install librosa==0.8.1

Name: librosa
Version: 0.8.1
Summary: Python module for audio and music processing
Home-page: https://librosa.org
Author: Brian McFee, librosa development team
Author-email: brian.mcfee@nyu.edu
License: ISC
Location: /opt/conda/lib/python3.7/site-packages
Requires: audioread, decorator, joblib, numba, numpy, packaging, pooch, resampy, scikit-learn, scipy, soundfile
Required-by: 
librosa installed


In [6]:
import os
path = '/home/jupyter/batdetect_v3-master'
os.chdir(path)

In [7]:
# import the necessary libraries
import os
import glob
import config
import matplotlib.pyplot as plt

import bat_detect.utils.detector_utils as du
import bat_detect.utils.audio_utils as au
import bat_detect.utils.plot_utils as viz
import time

In [8]:
from time import sleep
from tqdm.notebook import trange, tqdm

In [9]:
# setup the arguments
args = {}

## Change input and output directories here: ##

# Set folders for input (i.e. sound folders to analyse) and output #

These folders will be mounted on the left, so will start with '/home/jupyter/' but they are accessing files from the Google Cloud Bucket that you mounted. Make sure input_folder is the raw audio folder, and the results_folder is within gcs_outputs.

**All you need to change from below is 'trial_data_2021' on both lines to the new folder.**

# &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; &darr; #

In [10]:
args['ann_dir'] = "/home/jupyter/gcs_outputs/trial_data_2021/batdetect/"
args['audio_dir'] = "/home/jupyter/gcs_raw/trial_data_2021/bat-config/"

# &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; &uarr; #


In [11]:
## Leave these as is
args['detection_threshold'] = 0.3
args['time_expansion_factor'] = 1
args['model_url'] = config.MODEL_URL
args['model_path'] = os.path.join('models', os.path.basename(args['model_url']))

args['cnn_features'] = False
args['spec_features'] = False
args['quiet'] = True
args['save_preds_if_empty'] = False
args['spec_slices'] = False
args['chunk_size'] = 3
args['save_preds_if_empty'] = True


path = '/home/jupyter/batdetect_v3-master'
os.chdir(path)

In [12]:
# load the model
model, params = du.load_model(args['model_url'], args['model_path'])

# load files
files = du.get_audio_files(args['audio_dir'] )
error_files = []

for ii, audio_file in enumerate(tqdm(files, total = len(files))):
    t1 = int(time.time())

    print('\n' + str(ii).ljust(6) + os.path.basename(audio_file))
    try:
        results = du.process_file(audio_file, model, params, args)
        if args['save_preds_if_empty'] or (len(results['pred_dict']['annotation']) > 0):
            msd = os.path.basename(os.path.dirname(audio_file))
            project = os.path.basename(os.path.dirname(os.path.dirname(os.path.dirname(audio_file))))
            results_path = os.path.join(args['ann_dir'], project, msd, os.path.basename(audio_file))
            print("Saving results to_" + results_path)
            du.save_results_to_file(results, results_path)
            
        t2 = int(time.time())
        print('File processed in ' + str(t2 - t1) + ' seconds')
    
    except:
        error_files.append(audio_file)
        print("Error processing file!")
        


  0%|          | 0/2 [00:00<?, ?it/s]


0     Copy of 20220325_210000.WAV


  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)


Saving results to_/home/jupyter/gcs_outputs/trial_data_2021/batdetect/trial_data_2021/MSD-Z/Copy of 20220325_210000.WAV
File processed in 4 seconds

1     Copy of 20220325_212600.WAV
Saving results to_/home/jupyter/gcs_outputs/trial_data_2021/batdetect/trial_data_2021/MSD-Z/Copy of 20220325_212600.WAV
File processed in 4 seconds


In [13]:
# print summary info for the individual detections 
print('Results for ' + results['pred_dict']['id'])
print('{} calls detected\n'.format(len(results['pred_dict']['annotation'])))

print('time\tprob\tlfreq\tspecies_name')
for ann in results['pred_dict']['annotation']:
    print('{}\t{}\t{}\t{}'.format(ann['start_time'], ann['class_prob'], ann['low_freq'], ann['class']))

Results for Copy of 20220325_212600.WAV
1 calls detected

time	prob	lfreq	species_name
18.3115	0.245	23750	Nyctalus leisleri
