# Analyzing audio recordings using a CNN

## Set up

Here we create a coding environment for TensorFlow. Background references and resources:

https://www.pnas.org/doi/10.1073/pnas.2004702117

https://onlinelibrary.wiley.com/doi/10.1111/oik.08525

https://github.com/tensorflow/models/tree/master/research/audioset/vggish

https://zenodo.org/records/3907296


To create the environment for this workflow, follow these steps:
1) Install Anaconda (https://www.anaconda.com/download/)
2) Create and activate the environment, using PowerShell
```
conda create --name soundScape python=3.10
conda activate soundScape
```

3) Install base libraries
```
conda install -c conda-forge mamba
mamba install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0 cudatoolkit-dev ipykernel nbformat numpy scipy
python -m pip install "tensorflow<2.11" tf-slim resampy soundfile
```

4) Check the TensorFlow install (the output should list at least one GPU device)
```
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

5) Copy the vggish folder (the project copy is stored at the path below)
```
# "G:\Shared drives\Projets\Actif\2023_ECCC4_Biodiv\3-Analyses\2-Analyses\vggish"
```
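
Steps 4 and 5 can also be checked from Python. The sketch below reports the installed TensorFlow version and visible GPUs, and lists which VGGish files are missing from a folder. The helper names (`version_tuple`, `describe_tensorflow`, `missing_vggish_files`) are illustrative and not part of TensorFlow or VGGish. The file list is an assumption based on the VGGish folder of the tensorflow/models repository and the checkpoint name that appears in the training log.

```python
import os

# Assumed file names: the VGGish modules imported later in this notebook,
# the mel_features module they depend on, and the pre-trained checkpoint.
REQUIRED_VGGISH_FILES = (
    "vggish_input.py",
    "vggish_params.py",
    "vggish_slim.py",
    "mel_features.py",
    "vggish_model.ckpt",
)


def version_tuple(version_string):
    """Return (major, minor) from a version string such as '2.10.1'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def describe_tensorflow():
    """Return (version, gpu_names), or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    return tf.__version__, gpus


def missing_vggish_files(folder):
    """Return the required VGGish files that are not present in folder."""
    return [
        name for name in REQUIRED_VGGISH_FILES
        if not os.path.isfile(os.path.join(folder, name))
    ]


info = describe_tensorflow()
if info is None:
    print("TensorFlow is not installed in this environment")
else:
    version, gpus = info
    # Step 3 pins "tensorflow<2.11", so this comparison should print True.
    print("TensorFlow", version, "is <2.11:", version_tuple(version) < (2, 11))
    print("GPUs:", gpus if gpus else "none detected")

# "vggish" is a placeholder for wherever the folder was copied in step 5.
print("Missing VGGish files:", missing_vggish_files("vggish"))
```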

## Training

https://colab.research.google.com/drive/1E3CaPAqCai9P9QhJ3WYPNCVmrJU4lAhF#scrollTo=O1YVQb-MBiUx

https://github.com/tensorflow/models/tree/master/research/audioset/vggish

https://github.com/tensorflow/models/blob/master/research/audioset/vggish/vggish_train_demo.py

In [1]:
# Specify parameters
script="\"C:/Projects/2023_ECCC4_Biodiv/6.BirdTransfereLearning/vggish_train.py\""
data_path="\"C:/Users/Jurie/Desktop/VGGish_transfer_learning_data\""
model_saved_path="\"P:/Projets/Actif/2023_ECCC4_Biodiv/3-Analyses/2-Analyses/Fine_Tuned_Bird_Model\""

# Run terminal command
!python {script} --num_batches 20 --num_units 100 --train_vggish=False --_NUM_CLASSES 3 --data_path {data_path} --model_saved_path {model_saved_path}

Step 0: loss 0.696466
Step 1: loss 0.694889
Step 2: loss 0.693323
Step 3: loss 0.691771
Step 4: loss 0.690225
Step 5: loss 0.68869
Step 6: loss 0.687164
Step 7: loss 0.685649
Step 8: loss 0.684151
Step 9: loss 0.682666
Step 10: loss 0.681186
Step 11: loss 0.679716
Step 12: loss 0.678263
Step 13: loss 0.676821
Step 14: loss 0.675379
Step 15: loss 0.67396
Step 16: loss 0.672544
Step 17: loss 0.671142
Step 18: loss 0.669753
Step 19: loss 0.668371
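
To read these loss values, it can help to know what an uninformed model produces. The training script's source is not shown here, but assuming it uses the same sigmoid cross-entropy loss as the notebook code in the Testing section, a classifier that outputs 0.5 for every class has a loss of ln 2 ≈ 0.693, whatever the labels are. The sketch below checks this with plain Python; `binary_cross_entropy` is an illustrative helper, not a TensorFlow function.

```python
import math


def binary_cross_entropy(probability, label):
    """Cross-entropy of one sigmoid output against a 0/1 label."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1 - probability))


# One-hot label for a 3-class example, and an uninformed score of 0.5 per class.
labels = [0, 1, 0]
predictions = [0.5, 0.5, 0.5]

losses = [binary_cross_entropy(p, z) for p, z in zip(predictions, labels)]
baseline = sum(losses) / len(losses)
print(round(baseline, 4))  # 0.6931, i.e. ln 2
```

The first logged loss (0.696) is close to this baseline and the last (0.668) is only slightly below it, which suggests the classification head has barely started to learn after 20 steps.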


2024-06-26 15:45:33.109010: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-26 15:45:33.712496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5450 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3070 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2024-06-26 15:45:34.599119: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
INFO:tensorflow:Restoring parameters from P:\Projets\Actif\2023_ECCC4_Biodiv\3-Analyses\2-Analyses\vggish\vggish_model.ckpt
I0626 15:45:35.852332 28156 saver.py:1410] Restoring parameters from P:\Projets\Actif\2023_ECCC4_Biodiv\3-Analyses\2-Analyses
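
The `!python ...` line in the cell above relies on manually escaped quotes so that paths survive the shell. A possible alternative, sketched below, is to build the argument list in Python and pass it to `subprocess.run`, which needs no extra quoting. The flag names are copied from the command above. `build_train_command` is a hypothetical helper, and the paths in the usage example are placeholders.

```python
import subprocess
import sys


def build_train_command(script, data_path, model_saved_path,
                        num_batches=20, num_units=100,
                        train_vggish=False, num_classes=3):
    """Return the training command as a list of arguments."""
    return [
        sys.executable, script,
        "--num_batches", str(num_batches),
        "--num_units", str(num_units),
        f"--train_vggish={train_vggish}",
        "--_NUM_CLASSES", str(num_classes),
        "--data_path", data_path,
        "--model_saved_path", model_saved_path,
    ]


command = build_train_command(
    script="vggish_train.py",                         # placeholder path
    data_path="data/VGGish transfer data",            # placeholder path with spaces
    model_saved_path="models/Fine_Tuned_Bird_Model",  # placeholder path
)
print(command)

# To actually start the training, uncomment the next line:
# subprocess.run(command, check=True)
```

Using `sys.executable` keeps the call inside the Python interpreter that runs the notebook kernel, so the script sees the same conda environment.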

## Inference

In [2]:
# Specify parameters
script="\"C:/Projects/2023_ECCC4_Biodiv/6.BirdTransfereLearning/vggish_inference.py\""
data_path="\"C:/Users/Jurie/Desktop/VGGish_transfer_learning_data\""
model_saved_path="\"P:/Projets/Actif/2023_ECCC4_Biodiv/3-Analyses/2-Analyses/Fine_Tuned_Bird_Model\""

# Run terminal command
!python {script} --num_units 100 --_NUM_CLASSES 3 --data_path {data_path} --model_saved_path {model_saved_path}

Predictions: [[0.50816524 0.51649326 0.4742054 ]
 [0.48452118 0.5010323  0.45452964]
 [0.51822746 0.5058652  0.4632561 ]
 [0.46989432 0.41727766 0.4205827 ]
 [0.5032104  0.48684517 0.42770538]
 [0.5002762  0.5110371  0.46943647]
 [0.47058296 0.510253   0.4101257 ]
 [0.48229095 0.5119603  0.4632344 ]
 [0.5185407  0.503988   0.48020592]
 [0.48956704 0.51232976 0.44047752]
 [0.49554595 0.4754516  0.43008384]
 [0.5193809  0.5252369  0.46190527]
 [0.48738706 0.43887824 0.434222  ]
 [0.51930934 0.50040036 0.45064166]
 [0.4876032  0.46114442 0.38721654]
 [0.480481   0.5252765  0.47085956]
 [0.4925799  0.5192422  0.3936627 ]
 [0.48519555 0.45814112 0.45107076]
 [0.5088823  0.49923906 0.46741572]
 [0.478486   0.45230237 0.44223654]
 [0.5066587  0.47697797 0.48992696]
 [0.5123584  0.5313728  0.4608509 ]
 [0.49222705 0.49288934 0.43106696]
 [0.55344    0.4886327  0.49478355]
 [0.4947077  0.4790139  0.4565901 ]
 [0.49459967 0.50301826 0.45125178]
 [0.55489767 0.48204517 0.5004328 ]
 [0.49359468 0.

2024-06-26 15:47:50.310790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-26 15:47:50.823463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5450 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3070 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
INFO:tensorflow:Restoring parameters from P:/Projets/Actif/2023_ECCC4_Biodiv/3-Analyses/2-Analyses/Fine_Tuned_Bird_Model\final_model
I0626 15:47:50.990137 14248 saver.py:1410] Restoring parameters from P:/Projets/Actif/2023_ECCC4_Biodiv/3-Analyses/2-Analyses/Fine_Tuned_Bird_Model\final_model
2024-06-26 15:47:51.001279: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:35
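
Each row of the printed predictions holds one sigmoid score per class for one audio example. The source does not show how these scores are turned into class labels. Two common options are to take the highest-scoring class per row, or to threshold each score at 0.5. The sketch below applies both to the first three rows printed above, using plain Python lists; `top_class` and `above_threshold` are illustrative helpers.

```python
# First three rows of the predictions printed above.
predictions = [
    [0.50816524, 0.51649326, 0.4742054],
    [0.48452118, 0.5010323, 0.45452964],
    [0.51822746, 0.5058652, 0.4632561],
]


def top_class(row):
    """Index of the highest score in a row (single-label decision)."""
    return max(range(len(row)), key=row.__getitem__)


def above_threshold(row, threshold=0.5):
    """Multi-hot vector with 1 where the score exceeds the threshold."""
    return [int(score > threshold) for score in row]


top_classes = [top_class(row) for row in predictions]
multi_hot = [above_threshold(row) for row in predictions]
print(top_classes)  # [1, 1, 0]
print(multi_hot)    # [[1, 1, 0], [0, 1, 0], [1, 1, 0]]
```

All scores shown lie close to 0.5, so either rule gives low-confidence labels for this model.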

## Testing

In [None]:
# __future__ imports must be the first statement of the cell (no-op on Python 3)
from __future__ import print_function
import os
from random import shuffle
# Work from the vggish folder so that the vggish_* modules can be imported
os.chdir('P:\\Projets\\Actif\\2023_ECCC4_Biodiv\\3-Analyses\\2-Analyses\\vggish')
import numpy as np
import tensorflow.compat.v1 as tf
import tf_slim as slim
import vggish_input
import vggish_params
import vggish_slim
# Training parameters
num_batches=30
num_units=100
train_vggish=False
checkpoint='P:\\Projets\\Actif\\2023_ECCC4_Biodiv\\3-Analyses\\2-Analyses\\vggish\\vggish_model.ckpt'
_NUM_CLASSES=3
# Input data, and where the fine-tuned model is saved to and restored from
data_path='C:\\Users\\Jurie\\Desktop\\VGGish_transfer_learning_data'
model_saved_path="C:\\Users\\Jurie\\Desktop\\New folder\\my_model_final"
saved_model="C:\\Users\\Jurie\\Desktop\\New folder\\my_model_final"
def _get_batches():
  # Returns shuffled VGGish examples with matching one-hot labels for all
  # audio classes. Each subfolder of data_path is treated as one class.
  #
  # Loop over the files in each folder and
  # convert them to the VGGish input format
  files_list=[]
  label_list=[]
  folder_indices={}
  folder_index=0
  for root,_,files in os.walk(data_path):
    if root not in folder_indices:
      folder_indices[root]=folder_index
      folder_index+=1
    for file in files:
      # Convert to VGGish input format
      examples=vggish_input.wavfile_to_examples(os.path.join(root,file))
      # Create one-hot label (the root folder has index 0, so class subfolders start at 1)
      label=np.eye(_NUM_CLASSES,dtype=int)[folder_indices[root]-1].reshape(1, -1)
      label=np.repeat(label,examples.shape[0],axis=0)
      # Add to list
      files_list.append(examples)
      label_list.append(label)
  # Concatenate results
  concatenate_files=np.concatenate(files_list,axis=0)
  concatenate_labels=np.concatenate(label_list,axis=0)
  # Zip and shuffle
  labeled_files=list(zip(concatenate_files,concatenate_labels))
  shuffle(labeled_files)
  # Separate and return the features and labels.
  features=[example for (example,_) in labeled_files]
  labels=[label for (_,label) in labeled_files]
  # Return
  return (features,labels)
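
`_get_batches` derives each label from the order in which `os.walk` visits folders. The root `data_path` receives index 0, its subfolders receive 1, 2, and so on, and the `-1` offset maps them to classes 0, 1, and so on. The order in which `os.walk` lists subfolders is not guaranteed to be alphabetical, so the class assigned to a folder may differ between machines. The sketch below is one way to make the mapping explicit and deterministic by sorting the subfolder names. `class_index_by_folder` and `one_hot` are hypothetical helpers, and the demo folder names are invented.

```python
import os
import tempfile


def class_index_by_folder(data_path):
    """Map each direct subfolder of data_path to a class index, in sorted order."""
    subfolders = sorted(
        entry.name for entry in os.scandir(data_path) if entry.is_dir()
    )
    return {name: index for index, name in enumerate(subfolders)}


def one_hot(index, num_classes):
    """One-hot label as a list, e.g. index 1 of 3 gives [0, 1, 0]."""
    return [int(position == index) for position in range(num_classes)]


# Demo on a temporary folder tree with three invented class folders.
with tempfile.TemporaryDirectory() as demo_path:
    for name in ("species_b", "species_a", "noise"):
        os.mkdir(os.path.join(demo_path, name))
    mapping = class_index_by_folder(demo_path)

print(mapping)  # {'noise': 0, 'species_a': 1, 'species_b': 2}
print(one_hot(mapping["species_a"], num_classes=3))  # [0, 1, 0]
```

Storing such a mapping next to the saved model would also make it possible to name the columns of the prediction output later.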


In [None]:
# Load model, add head, define loss/optimizer and train
with tf.Graph().as_default(), tf.Session() as sess:
    # Define VGGish.
    embeddings=vggish_slim.define_vggish_slim(training=False)
    # Define a shallow classification model
    with tf.variable_scope('mymodel'):
      # Add a fully connected layer
      fc=slim.fully_connected(tf.nn.relu(embeddings),num_units)
      # Add a classifier layer at the end (classification head), 
      # consisting of parallel logistic classifiers, one per class. 
      # This allows for multi-class tasks.
      logits=slim.fully_connected(fc,_NUM_CLASSES,activation_fn=None,scope='logits')
      tf.sigmoid(logits,name='prediction')
      # Add training ops.
      with tf.variable_scope('train'):
        global_step=tf.train.create_global_step()
        # Labels are assumed to be fed as a batch multi-hot vectors, with
        # a 1 in the position of each positive class label, and 0 elsewhere.
        labels_input=tf.placeholder(
            tf.float32,shape=(None,_NUM_CLASSES),name='labels')
        # Cross-entropy label loss.
        xent=tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=labels_input,name='xent')
        loss=tf.reduce_mean(xent,name='loss_op')
        tf.summary.scalar('loss',loss)
        # We use the same optimizer and hyperparameters as used to train VGGish.
        optimizer=tf.train.AdamOptimizer(
            learning_rate=vggish_params.LEARNING_RATE,
            epsilon=vggish_params.ADAM_EPSILON)
        train_op=optimizer.minimize(loss,global_step=global_step)
    # Initialize all variables in the model, and then load the pre-trained
    # VGGish checkpoint.
    sess.run(tf.global_variables_initializer())
    vggish_slim.load_vggish_slim_checkpoint(sess,checkpoint)
    # Define Saver to save the model
    saver = tf.train.Saver()
    # The training loop.
    features_input=sess.graph.get_tensor_by_name(vggish_params.INPUT_TENSOR_NAME)
    for _ in range(num_batches):
      (features,labels)=_get_batches()
      [num_steps,loss_value,_]=sess.run(
          [global_step,loss,train_op],
          feed_dict={features_input:features,labels_input:labels})
      print('Step %d: loss %g' % (num_steps, loss_value))
    # Save the final model
    saver.save(sess,model_saved_path)
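
In the loop above, `_get_batches()` runs on every step, so every audio file is read and converted again for each of the `num_batches` steps, and each step trains on the full data set in a new random order. If that becomes slow, one option is to convert the files once and only reshuffle between steps. The sketch below shows the reshuffling part on stand-in data. `reshuffled_batches` is a hypothetical helper, and the strings stand in for the feature arrays and label vectors.

```python
import random


def reshuffled_batches(features, labels, num_batches, seed=None):
    """Yield (features, labels) num_batches times, reshuffled but still paired."""
    rng = random.Random(seed)
    paired = list(zip(features, labels))
    for _ in range(num_batches):
        rng.shuffle(paired)
        shuffled_features = [feature for feature, _label in paired]
        shuffled_labels = [label for _feature, label in paired]
        yield shuffled_features, shuffled_labels


# Stand-in data: in the notebook these would be the VGGish examples and
# one-hot labels, computed once before the training loop starts.
features = ["example_0", "example_1", "example_2", "example_3"]
labels = ["label_0", "label_1", "label_2", "label_3"]

batches = list(reshuffled_batches(features, labels, num_batches=3, seed=0))
for batch_features, batch_labels in batches:
    print(batch_features, batch_labels)
```

Each yielded pair could be passed to `feed_dict` in place of the output of `_get_batches()`.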

### Inference

In [None]:
def _get_batches():
  # Returns the VGGish examples of all audio files under data_path,
  # concatenated along the first axis. No labels are needed for inference.
  files_list=[]
  for root,_,files in os.walk(data_path):
    for file in files:
      # Convert to VGGish input format
      examples=vggish_input.wavfile_to_examples(os.path.join(root,file))
      # Add to list
      files_list.append(examples)
  # Concatenate results
  concatenate_files=np.concatenate(files_list,axis=0)
  # Return
  return concatenate_files


# Rebuild the model graph, restore the fine-tuned weights and predict
with tf.Graph().as_default(), tf.Session() as sess:
    # Define VGGish.
    embeddings=vggish_slim.define_vggish_slim(training=False)
    # Define a shallow classification model
    with tf.variable_scope('mymodel'):
      # Add a fully connected layer
      fc=slim.fully_connected(tf.nn.relu(embeddings),num_units)
      # Add a classifier layer at the end (classification head), 
      # consisting of parallel logistic classifiers, one per class. 
      # This allows for multi-class tasks.
      logits=slim.fully_connected(fc,_NUM_CLASSES,activation_fn=None,scope='logits')
      prediction=tf.sigmoid(logits, name='prediction')
    # Restore the model from the checkpoint
    saver=tf.train.Saver()
    
    saver.restore(sess,saved_model)
    # Get new data for prediction
    new_features=_get_batches()
    # Get the tensor for the input features
    features_input=sess.graph.get_tensor_by_name(vggish_params.INPUT_TENSOR_NAME)
    # Run the prediction
    predictions=sess.run(prediction,feed_dict={features_input:new_features})
    # Print
    print('Predictions:', predictions)
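
Because `_get_batches` concatenates the examples of all files, the printed predictions can no longer be traced back to individual recordings. One way to recover that link, sketched below, is to record how many examples each file produced (`examples.shape[0]` in the loop) and then average the rows that belong to each file. `mean_prediction_per_file` is a hypothetical helper, and the file names and scores are invented for illustration.

```python
def mean_prediction_per_file(predictions, examples_per_file):
    """Average consecutive prediction rows per file.

    predictions: list of rows, one row of class scores per audio example.
    examples_per_file: list of (file_name, number_of_examples), in the same
        order in which the files were concatenated.
    """
    means = {}
    start = 0
    for file_name, count in examples_per_file:
        if count == 0:
            # A file that produced no examples has nothing to average.
            continue
        rows = predictions[start:start + count]
        num_classes = len(rows[0])
        means[file_name] = [
            sum(row[c] for row in rows) / count for c in range(num_classes)
        ]
        start += count
    return means


# Invented example: two files with 2 and 1 examples, 3 classes.
predictions = [
    [0.2, 0.6, 0.4],
    [0.4, 0.8, 0.2],
    [0.9, 0.1, 0.3],
]
examples_per_file = [("recording_a.wav", 2), ("recording_b.wav", 1)]

per_file = mean_prediction_per_file(predictions, examples_per_file)
print(per_file)
```

Averaging is only one possible aggregation; taking the maximum score per class would instead flag a recording as soon as any single example scores high.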