<a href="https://colab.research.google.com/github/lydiakatsis/zsl-acoustic-monitoring-scripts/blob/main/CityNet-CNN/Run_CityNet_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Script to run the CityNet algorithm on Colab

Analyses audio files stored either in Google Drive or in a Google Cloud Platform (GCP) bucket.

The original scripts were developed by Fairbrass et al. and are available [here](https://github.com/mdfirman/CityNet).

### Outline

CityNet uses two algorithms: one classifies the level of anthropogenic sound, and the other classifies the level of biotic sound. The script calculates the mean level of each for every sound clip and stores the results in a CSV file.
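As a rough illustration of the averaging step, the sketch below takes a list of per-frame predictions for one clip, averages them, and writes a one-row CSV. The helper names `summarise_clip` and `write_summary` and the example values are assumptions made for illustration. The column names `Filename` and `Average sound` follow those read back by the concatenation cells later in this notebook. The CityNet scripts themselves may organise this differently.

```python
import csv
import io

import numpy as np


def summarise_clip(filename, frame_predictions):
    """Return (filename, mean prediction) for one clip (hypothetical helper)."""
    preds = np.asarray(frame_predictions, dtype=float)
    return filename, float(preds.mean())


def write_summary(rows, handle):
    """Write summary rows as CSV using the column names read later in this notebook."""
    writer = csv.writer(handle)
    writer.writerow(["Filename", "Average sound"])
    writer.writerows(rows)


# Made-up per-frame activity values between 0 and 1, for illustration only
row = summarise_clip("20210601_120000.WAV", [0.0, 0.5, 1.0, 0.5])
buffer = io.StringIO()
write_summary([row], buffer)
print(buffer.getvalue())
```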


**Warning: this script is slow to run. Each 10-minute file takes about 20 seconds to process.**
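To get a feel for what this means for a whole deployment, the sketch below turns the 20-seconds-per-file figure into an estimated total run time. The function name `estimated_runtime_hours` and the example of one week of continuous 10-minute recordings are assumptions for illustration. Actual timings will depend on the runtime Colab allocates.

```python
def estimated_runtime_hours(n_files, seconds_per_file=20.0):
    """Estimate total processing time in hours, assuming a fixed cost per file."""
    return n_files * seconds_per_file / 3600.0


# Hypothetical deployment: one 10-minute file every 10 minutes for 7 days
n_files = 6 * 24 * 7
print(f"{n_files} files -> about {estimated_runtime_hours(n_files):.1f} hours")
```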

## Instructions
* Set the runtime to GPU (Runtime -> Change runtime type).
* The only parts of this script you will need to change are:
    1. Mounting either the bucket or Google Drive, and changing the file directories accordingly.
    2. Setting the directories for the audio files and the results.

* If the script becomes unresponsive before it has completed, restart the kernel. The script is unresponsive if the box next to the current command changes from [*] or [number] to [ ].


## Sources of potential error
* If multiple kernels are running in this instance, GPU allocation may be disrupted and the script will throw errors. Make sure only one notebook is open when running this script. You can view running kernels on the left, under the Stop icon below the folder browser. Shut down all kernels except the one for this script.

* The notebook may become unresponsive after running for a long time. If this happens, restart the notebook and run the code again. It will resume classifying from where it left off.

* Some files will not be analysed because the AudioMoth records some 0 MB files, so don't be concerned if some files can't be analysed.

* You will only be able to mount the GCP Bucket if you have been granted permission to access it by the project owner.
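To see in advance how many recordings are empty, a check along the following lines can be run on the audio folder. This is a minimal sketch: the helper name `split_by_size` and the `.wav` extension filter are assumptions, and the demonstration uses a temporary folder with made-up file names rather than real AudioMoth data.

```python
import os
import tempfile


def split_by_size(folder, extension=".wav"):
    """Return (usable, empty) lists of audio file paths found under `folder`."""
    usable, empty = [], []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if not name.lower().endswith(extension):
                continue
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
            else:
                usable.append(path)
    return usable, empty


# Demonstration on a temporary folder holding one empty and one non-empty file
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "20210601_120000.WAV"), "wb").close()
    with open(os.path.join(tmp, "20210601_121000.WAV"), "wb") as f:
        f.write(b"\x00" * 44)
    usable, empty = split_by_size(tmp)
    print(len(usable), "usable,", len(empty), "empty")
```

In the notebook, the same function could be pointed at the `folder` variable set in the file paths cell.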

In [2]:
# Make sure all necessary libraries are installed
!pip show tensorflow && echo "tensorflow installed" || pip install tensorflow 
!pip show librosa && echo "librosa installed" || pip install librosa
!pip show tf_slim && echo "tf_slim installed" || pip install tf_slim
!pip show PyYAML && echo "PyYAML installed" || pip install -U PyYAML

import os
# Suppress TensorFlow C++ log messages below FATAL
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

Name: tensorflow
Version: 2.9.2
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.8/dist-packages
Requires: astunparse, h5py, tensorboard, gast, six, packaging, numpy, opt-einsum, tensorflow-estimator, keras-preprocessing, flatbuffers, tensorflow-io-gcs-filesystem, libclang, protobuf, setuptools, grpcio, termcolor, typing-extensions, wrapt, keras, google-pasta, absl-py
Required-by: kapre
tensorflow installed
Name: librosa
Version: 0.8.1
Summary: Python module for audio and music processing
Home-page: https://librosa.org
Author: Brian McFee, librosa development team
Author-email: brian.mcfee@nyu.edu
License: ISC
Location: /usr/local/lib/python3.8/dist-packages
Requires: scipy, decorator, packaging, resampy, numba, audioread, numpy, pooch, soundfile, scikit-learn, joblib
Required-by: kapre
librosa installed

In [3]:
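# Clone the CityNet scripts unless they are already present (wildcards are not expanded inside quotes, so use the full path)
![ -d "/content/CityNet" ] && echo "Scripts are downloaded" || git clone https://github.com/mdfirman/CityNet.git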
!wget -O CityNet/multi_predict_.py "https://www.dropbox.com/s/wgg8zi118uqgy5e/multi_predict_.py?dl=0"


Cloning into 'CityNet'...
remote: Enumerating objects: 1574, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 1574 (delta 7), reused 13 (delta 5), pack-reused 1553
Receiving objects: 100% (1574/1574), 55.52 MiB | 13.70 MiB/s, done.
Resolving deltas: 100% (951/951), done.
--2022-12-12 15:57:46--  https://www.dropbox.com/s/wgg8zi118uqgy5e/multi_predict_.py?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.69.18, 2620:100:6031:18::a27d:5112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.69.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/raw/wgg8zi118uqgy5e/multi_predict_.py [following]
--2022-12-12 15:57:48--  https://www.dropbox.com/s/raw/wgg8zi118uqgy5e/multi_predict_.py
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc1c78d1957bf3b0abedc1687b65.dl.dropboxusercontent.com/cd/0/inline/

In [4]:
cd /content/CityNet

/content/CityNet


In [5]:
# Make sure model is downloaded
![ -d "/content/CityNet/__MACOSX" ] && echo "Models are downloaded" || python demo.py

-> Downloading and unzipping pre-trained model...
-> ...Done
->  Making predictions for biotic and anthropogenic separately
Loading from tf_models/biotic/weights_99.pkl-1
Took 7.205s to classify
Loading from tf_models/anthrop/weights_99.pkl-1
Took 0.229s to classify
-> ...Done
-> Saving predictions to disk
-> ...Done
-> Plotting predictions
60.00036281179138
60.00036281179138
-> ...Done


# Depending on where the data for processing are stored, mount either the GCP bucket or Google Drive:

## 1. Running on Google Cloud Bucket data

Mount the Google Cloud bucket so that its data can be accessed like a local directory.

In [6]:
# Make sure gcsfuse is installed so that the bucket can be mounted
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100  2426  100  2426    0     0   103k      0 --:--:-- --:--:-- --:--:--  103k
OK
34 packages can be upgraded. Run 'apt list --upgradable' to see them.
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  gcsfuse
0 upgraded, 1 newly installed, 0 to remove and 34 not upgraded.
Need to get 13.3 MB of archives.
After this operation, 30.7 MB of additional disk space will be used.
Selecting previously unselected package gcsfuse.
(Reading database ... 124013 files and directories currently installed.)
Preparing to unpack .../gcsfuse_0.41.9_amd64.deb ...
Unpacking gcsfuse (0.41.9) ...
Setting up gcsfuse (0.41.9) ...


In [7]:
# Authentication and PROJECT_ID allocation - change PROJECT_ID if necessary
from google.colab import auth
auth.authenticate_user()
PROJECT_ID = "zsl-acoustic-pipeline"

In [8]:
# Mount buckets - change 'acoustic-data-raw' to the name of the bucket holding the raw data, and 'acoustic-processing-outputs' to the name of the bucket for output storage.
# The braces ensure gcsfuse only runs when the mount point is not already mounted.
!mountpoint -q /content/gcs_raw && echo "mounted" || { mkdir -p /content/gcs_raw; gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-data-raw" "/content/gcs_raw"; }
!mountpoint -q /content/gcs_outputs && echo "mounted" || { mkdir -p /content/gcs_outputs; gcsfuse --implicit-dirs --rename-dir-limit=100 --disable-http2 --max-conns-per-host=100 "acoustic-processing-outputs" "/content/gcs_outputs"; }


2022/12/12 15:58:38.141447 Start gcsfuse/0.41.9 (Go version go1.18.4) for app "" using mount point: /content/gcs_raw
2022/12/12 15:58:38.154509 Opening GCS connection...
2022/12/12 15:58:40.484334 Mounting file system "acoustic-data-raw"...
2022/12/12 15:58:40.484735 File system has been successfully mounted.
2022/12/12 15:58:40.557137 Start gcsfuse/0.41.9 (Go version go1.18.4) for app "" using mount point: /content/gcs_outputs
2022/12/12 15:58:40.569842 Opening GCS connection...
2022/12/12 15:58:42.938376 Mounting file system "acoustic-processing-outputs"...
2022/12/12 15:58:42.938746 File system has been successfully mounted.
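Before moving on, it can be worth confirming from Python that the mount points are live. The sketch below wraps the standard library call `os.path.ismount`. The helper name `is_mounted` is an assumption, and the demonstration checks `/` and a temporary folder so that it runs anywhere.

```python
import os
import tempfile


def is_mounted(path):
    """Return True if `path` exists and is a mount point."""
    return os.path.ismount(path)


# "/" is a mount point on Linux; a freshly created temporary folder is not
with tempfile.TemporaryDirectory() as tmp:
    root_mounted = is_mounted("/")
    tmp_mounted = is_mounted(tmp)
    print(root_mounted, tmp_mounted)
```

In this notebook the paths to check would be `/content/gcs_raw` and `/content/gcs_outputs`.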


**OR**

## 2. Running on Google Drive data

Mount Google Drive so that it can be accessed like a local directory.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Set file paths

# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ #

In [9]:
# Change these folders to match where your raw data and outputs are stored
folder = "/content/gcs_raw/trial_data_2021/city-config/"
results = "/content/gcs_outputs/trial_data_2021/citynet/"

# ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ #
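A mistyped path is easiest to catch before the classifier starts. The sketch below checks that the audio folder exists and creates the results folder if it is missing. The helper name `check_paths` is an assumption, and so is the choice to create the results folder, since this notebook does not show whether `multi_predict_.py` does that itself. The demonstration uses temporary folders rather than the bucket paths.

```python
import os
import tempfile


def check_paths(folder, results):
    """Raise if the audio folder is missing; create the results folder if needed.

    Returns the number of entries found directly inside `folder`.
    """
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"Audio folder not found: {folder}")
    os.makedirs(results, exist_ok=True)
    return len(os.listdir(folder))


# Demonstration with temporary folders standing in for the real paths
with tempfile.TemporaryDirectory() as tmp:
    demo_folder = os.path.join(tmp, "raw")
    demo_results = os.path.join(tmp, "outputs", "citynet")
    os.makedirs(demo_folder)
    n_entries = check_paths(demo_folder, demo_results)
    results_created = os.path.isdir(demo_results)
    print(n_entries, results_created)
```

In the notebook this would be called as `check_paths(folder, results)` once the two variables above have been set.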

# Run classifier

In [10]:
%run -i multi_predict_.py "$folder" "$results"

->  Making predictions for biotic and anthropogenic separately




Loading from tf_models/biotic/weights_99.pkl-1


  0%|          | 0/2 [00:00<?, ?it/s]

Loading from tf_models/anthrop/weights_99.pkl-1


  0%|          | 0/2 [00:00<?, ?it/s]

Took 3.674s to classify
-> ...Done


# Concatenate results

Join all the separate CSVs created for each file into one final CSV. It is stored in the results directory and named with today's date (YYMMDD) followed by `_concatenated_results_city_net.csv`.

In [18]:
import glob
import os
from datetime import date, datetime

import pandas as pd


In [19]:
results_list_anth = glob.glob(results +  '*/*/*anthrop.csv')
results_list_bio = glob.glob(results +  '*/*/*biotic.csv')

d = date.today()
d = d.strftime('%y%m%d')

In [20]:
li = []

for filename in results_list_anth:
    df = pd.read_csv(filename)
    df = df.rename(columns={"Average sound": "Average anthropogenic sound"})
    li.append(df)

frame_anth = pd.concat(li, axis=0, ignore_index=True)

In [21]:
li = []

for filename in results_list_bio:
    df = pd.read_csv(filename)
    df = df.rename(columns={"Average sound": "Average biotic sound"})
    li.append(df)

frame_bio = pd.concat(li, axis=0, ignore_index=True)


In [22]:
# Keep only files that have both an anthropogenic and a biotic result
frame_merged = pd.merge(frame_anth, frame_bio, how='inner', on='Filename')
# Name of the folder that contains each file
frame_merged['SD'] = [os.path.basename(os.path.dirname(f)) for f in frame_merged['Filename']]
# File name without extension, expected in the form YYYYMMDD_HHMMSS
frame_merged['basename'] = [os.path.basename(f).split('.')[0] for f in frame_merged['Filename']]
frame_merged['files_timestamp'] = [datetime.strptime(f, '%Y%m%d_%H%M%S') for f in frame_merged['basename']]
frame_merged['hour'] = [f.hour for f in frame_merged['files_timestamp']]

# Replace spaces in column names with underscores before saving
frame_merged.columns = frame_merged.columns.str.replace(' ', '_')
frame_merged.to_csv(results + d + '_concatenated_results_city_net.csv')
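The derived columns depend on each file name having the form `YYYYMMDD_HHMMSS` and on the parent folder identifying the recorder or card. The sketch below applies the same parsing to a single path so the expected result is easy to inspect. The helper name `describe_recording` and the example path are made up for illustration.

```python
import os
from datetime import datetime


def describe_recording(path):
    """Derive the SD, basename, timestamp and hour fields for one file path."""
    basename = os.path.basename(path).split(".")[0]
    timestamp = datetime.strptime(basename, "%Y%m%d_%H%M%S")
    return {
        "SD": os.path.basename(os.path.dirname(path)),
        "basename": basename,
        "files_timestamp": timestamp,
        "hour": timestamp.hour,
    }


# Hypothetical path, for illustration only
info = describe_recording("/content/gcs_raw/site_A/SD01/20210601_134500.WAV")
print(info)
```

A file whose name does not match the pattern makes `datetime.strptime` raise a `ValueError`, which is worth knowing if the concatenation cell fails on unexpected files.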
