# Prediction

**Paper:** Automatic identification of Hainan Gibbon calls in passive acoustic recordings

**Authors:** Emmanuel Dufourq, Ian Durbach, James Hansford, Sam Turvey, Amanda Hoepfner

**Year:** March 2020

**Repository:** https://github.com/emmanueldufourq/GibbonClassifier

Predict on a single .wav audio file.

The weights file is loaded from the folder given in the 'location_model' variable ('../Experiments/' in this example), and its name is specified in the 'weights_name' variable. In this example the weights file is 'pretrained_weights_from_paper.hdf5'. The test file is 'HGSM3B_0+1_20160308_055700.wav', and the test files are located in '../Raw_Data/Test/'.

The segment length used in training must also be used when testing the model. For example, if 10-second segments were used in the 'Extract_Audio' notebook, then the model expects 10-second inputs for prediction. This value is stored in the variable 'time_to_extract'. The sampling rate ('sample_rate') must likewise match the value used in training.

Two output files are written to the folder '../Predictions/'. One ends with '_prediction.txt' and the other with '_probabilities.txt'. The former contains two values per segment (the softmax probabilistic output), in the order [probability non-gibbon, probability gibbon]. The latter is a comma-separated table with the start time, end time, Pr(absence) and Pr(presence) of each segment. In this example the output files are named 'HGSM3B_0+1_20160308_055700.wav_prediction.txt' and 'HGSM3B_0+1_20160308_055700.wav_probabilities.txt'.
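
When a hard 0 (non-gibbon) or 1 (gibbon) label per segment is needed, the two-column softmax output can be reduced with an argmax. The sketch below uses made-up probabilities, and the argmax rule is an assumption for illustration; the notebook itself applies a 0.76 threshold in its post-processing step.

```python
import numpy as np

# Hypothetical softmax rows: [probability non-gibbon, probability gibbon]
softmax_output = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.55, 0.45],
])

# argmax over the class axis yields 0 (non-gibbon) or 1 (gibbon) per segment
binary_labels = np.argmax(softmax_output, axis=1)
print(binary_labels.tolist())
```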

In [None]:
import numpy as np  # used below via np.column_stack and np.savetxt
import pandas as pd
import soundfile as sf
from os import listdir
import librosa
import collections
import time

from Augmentation import convert_to_image
from CNN_Network import *
from Predict_Helper import *

## Parameters

In [2]:
testing_file = 'HGSM3B_0+1_20160308_055700.wav'
testing_folder = '../Raw_Data/Test/'
prediction_folder = '../Predictions/'
weights_name = 'pretrained_weights_from_paper.hdf5'
location_model = "../Experiments/"
output_directory = ''
time_to_extract = 10
sample_rate = 4800

## Read test file

In [3]:
start = time.time()
start_reading = time.time()
test_file_audio, test_file_sample_rate = librosa.load(testing_folder + testing_file, 
                                                      sr=sample_rate) 
end_reading = time.time()

## Check the sample rate from the file

In [4]:
test_file_sample_rate

4800

## Extract segments from test file

In [5]:
X = create_X_new(test_file_audio, time_to_extract, 
                 sample_rate, verbose = 0)
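
The internals of 'create_X_new' are not shown in this notebook. As a rough illustration of what segment extraction can look like, the sketch below cuts a signal into fixed-length windows. The function name 'sliding_segments', the 'hop_seconds' parameter and the 1-second overlap step are all assumptions, not the helper's actual behaviour.

```python
import numpy as np

def sliding_segments(audio, segment_seconds, sample_rate, hop_seconds=1):
    """Cut a 1-D signal into overlapping fixed-length windows (hypothetical helper)."""
    seg_len = int(segment_seconds * sample_rate)
    hop = int(hop_seconds * sample_rate)
    if len(audio) < seg_len:
        # Signal shorter than one segment: nothing to extract
        return np.empty((0, seg_len), dtype=audio.dtype)
    starts = range(0, len(audio) - seg_len + 1, hop)
    return np.stack([audio[s:s + seg_len] for s in starts])

# 12 seconds of dummy audio at 4800 Hz, cut into 10-second windows
dummy_audio = np.zeros(12 * 4800, dtype=np.float32)
segments = sliding_segments(dummy_audio, segment_seconds=10, sample_rate=4800)
print(segments.shape)
```

Each row holds 10 x 4800 = 48000 samples, which is why the segment length and sampling rate must match the values used in training.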

## Convert data into spectrograms

In [6]:
start_convert = time.time()
X = convert_to_image(X)
end_convert = time.time()
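
'convert_to_image' comes from the repository's 'Augmentation' module and is not reproduced here. To illustrate the general idea of turning a waveform segment into a time-frequency image, the sketch below computes a plain log-magnitude STFT with NumPy. This is a swapped-in technique, not the authors' conversion, and 'n_fft=1024' and 'hop_length=256' are assumed values. With centred frames and a hop of 256, a 48000-sample segment yields 188 frames, which agrees with the time dimension reported in the next cell; the frequency dimension differs because no mel scaling is applied here.

```python
import numpy as np

def log_spectrogram(segment, n_fft=1024, hop_length=256):
    """Log-magnitude STFT of a 1-D segment (illustrative, not convert_to_image)."""
    # Reflect-pad so that each frame is centred on a multiple of hop_length
    padded = np.pad(segment, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([
        padded[i * hop_length:i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft over each frame, then transpose to (frequency bins, time frames)
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)

# One 10-second segment at 4800 Hz containing a 1 kHz tone
t = np.arange(10 * 4800) / 4800.0
spec = log_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
print(spec.shape)
```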

## Shape of data after converting to spectrograms

In [7]:
X.shape

(28790, 128, 188, 1)

## Build the model and load weights

In [8]:
start_model_loading = time.time()
model = network()
model.load_weights(location_model+weights_name)
end_model_loading = time.time()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 113, 173, 8)       2056      
_________________________________________________________________
dropout_1 (Dropout)          (None, 113, 173, 8)       0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 28, 43, 8)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 13, 28, 8)         16392     
_________________________________________________________________
dropout_2 (Dropout)          (None, 13, 28, 8)         0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 7, 8)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 168)               0         
__________
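
The visible part of the summary can be checked by hand. The numbers are consistent with 16x16 convolution kernels without padding and 4x4 max pooling; these sizes are inferred from the output shapes and parameter counts, not stated in the notebook, so treat the sketch below as a consistency check under that assumption.

```python
def conv_valid(size, kernel):
    # Output length of an unpadded ("valid") convolution with stride 1
    return size - kernel + 1

def pool(size, pool_size):
    # Output length of non-overlapping max pooling (floor division)
    return size // pool_size

def conv_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel element per input channel per filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

KERNEL, POOL, FILTERS = 16, 4, 8  # assumed sizes, inferred from the summary

h, w = 128, 188                                      # input spectrogram
h, w = conv_valid(h, KERNEL), conv_valid(w, KERNEL)  # conv2d_1
conv1_shape = (h, w, FILTERS)
h, w = pool(h, POOL), pool(w, POOL)                  # max_pooling2d_1
pool1_shape = (h, w, FILTERS)
h, w = conv_valid(h, KERNEL), conv_valid(w, KERNEL)  # conv2d_2
conv2_shape = (h, w, FILTERS)
h, w = pool(h, POOL), pool(w, POOL)                  # max_pooling2d_2
pool2_shape = (h, w, FILTERS)
flatten_size = h * w * FILTERS                       # flatten_1

print(conv1_shape, conv_params(KERNEL, KERNEL, 1, FILTERS))
print(conv2_shape, conv_params(KERNEL, KERNEL, FILTERS, FILTERS))
print(pool1_shape, pool2_shape, flatten_size)
```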

## Predict

In [9]:
start_prediction = time.time()
model_prediction = model.predict(X, batch_size=128)

## Store the results in a dataframe

In [10]:
start_times, end_times = create_time_index(time_to_extract, int(len(test_file_audio)/test_file_sample_rate))
results = pd.DataFrame(np.column_stack((start_times, end_times, model_prediction[:,0],model_prediction[:,1])), 
             columns=['Start(seconds)', 'End(seconds)', 'Pr(absence)', 'Pr(presence)'])
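
'create_time_index' is defined in 'Predict_Helper' and is not shown here. A minimal sketch of how start and end times for fixed-length segments might be generated follows; the function name 'time_index_sketch', the 'hop_seconds' parameter and the 1-second step are assumptions, and the real helper may treat the final window differently.

```python
import numpy as np

def time_index_sketch(segment_seconds, duration_seconds, hop_seconds=1):
    """Start and end times (in seconds) of each window (hypothetical helper)."""
    # The last window starts where a full segment still fits in the recording
    starts = np.arange(0, duration_seconds - segment_seconds + 1, hop_seconds)
    ends = starts + segment_seconds
    return starts, ends

# A 15-second recording cut into 10-second windows
starts, ends = time_index_sketch(segment_seconds=10, duration_seconds=15)
print(starts.tolist())
print(ends.tolist())
```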

## Save predictions to file

In [11]:
np.savetxt(prediction_folder + testing_file + '_prediction.txt',model_prediction, fmt='%5f')
results.to_csv(prediction_folder + testing_file + '_probabilities.txt', index=False)

## Predicted segments

These correspond to the output for file 3 in the research article.

The correct values for this file are as follows: [3667, 3803], [14750, 14963], [19548, 20265], [20524, 20863]

In [12]:
segments = post_process(model_prediction, 0.76)
end_prediction = time.time()
end = time.time()

print(segments)

[[3623, 3802], [14752, 14962], [19365, 20262], [20526, 20860]]
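
The implementation of 'post_process' is not shown in this notebook. One simple way to turn per-segment probabilities into [start, end] ranges is to threshold the presence probability and group consecutive positive segments, as sketched below. The function 'threshold_runs' is hypothetical; the real helper may additionally smooth, merge or filter the ranges.

```python
import numpy as np

def threshold_runs(probabilities, threshold):
    """Group consecutive segments whose Pr(presence) meets the threshold (hypothetical)."""
    presence = np.asarray(probabilities)[:, 1] >= threshold
    runs = []
    run_start = None
    for index, is_present in enumerate(presence):
        if is_present and run_start is None:
            run_start = index                     # a new run begins
        elif not is_present and run_start is not None:
            runs.append([run_start, index - 1])   # the run ended on the previous segment
            run_start = None
    if run_start is not None:
        runs.append([run_start, len(presence) - 1])  # run reaches the end of the file
    return runs

# Made-up softmax rows: [Pr(absence), Pr(presence)]
example = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.6, 0.4], [0.2, 0.8]]
example_runs = threshold_runs(example, 0.76)
print(example_runs)
```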


## Execution clock

(assuming the entire script was run in a single execution without delays from the user)

In [13]:
check_clock(start, end, start_reading, end_reading, start_convert ,end_convert,
                start_model_loading, end_model_loading, 
                start_prediction,end_prediction)

Total execution time (seconds): 362

Break down:
Time to read input file (seconds): 190
Time to convert audio to spectrograms (seconds): 162
Time to load CNN model (seconds): 1
Time to perform predictions (seconds): 5
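
The four stages listed above sum to 358 of the 362 seconds; the remaining time is not itemised. As an alternative to paired 'start_*' and 'end_*' variables, stage timings can be collected with a small context manager. This is a sketch only: 'timed' and 'timings' are hypothetical names and are not part of 'Predict_Helper'.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Stand-ins for the real stages
with timed("read input file"):
    time.sleep(0.01)
with timed("convert to spectrograms"):
    sum(range(1000))

for label, seconds in timings.items():
    print(f"Time to {label} (seconds): {seconds:.3f}")
```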
