# Augmentation

**Paper:** Automatic identification of Hainan Gibbon calls in passive acoustic recordings

**Authors:** Emmanuel Dufourq, Ian Durbach, James Hansford, Sam Turvey, Amanda Hoepfner

**Year:** March 2020

**Repository:** https://github.com/emmanueldufourq/GibbonClassifier

Augment the number of gibbon calls and convert both the gibbon and non-gibbon calls into spectrograms. Hyper-parameters control the amount of new data to generate.

For this example, we process the file 'HGSM3D_0+1_20160429_051600.wav'. The .wav file itself is not read from disk; instead, the pickled files created by the 'Extract_Audio' notebook are loaded, which is more efficient.

The augmented audio data and the spectrograms are both saved to disk as pickle files. The example file originally contains 369 gibbon calls. After augmenting with a probability of 1 and creating 10 new files for each original file, we obtain 3690 augmented gibbon calls (369 * 10). The sample rate is set to 4800 (downsampled from 9600 in the Extract_Audio notebook).

Before augmentation, each audio segment has 48000 points (4800 sample rate * 10 seconds). After augmentation, the audio segments are converted to spectrograms by the code in the file 'Augmentation.py'. The number of mels is set to 128 and the hop size to 256, which results in an image of shape 128 x 188. Keras requires the depth of an image to be provided, and since the spectrograms are used as image inputs to the CNN, we reshape the images to have a depth of 1. The final shape of each spectrogram is thus 128 x 188 x 1. The minimum and maximum frequencies used when generating the spectrograms are 1 kHz and 2 kHz respectively (set in the file 'Augmentation.py').
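
The shape arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the code from 'Augmentation.py': it assumes a centred framing convention in which the number of frames is `1 + n_samples // hop_length`, which is consistent with the 188 columns reported here. The variable names `hop_length`, `n_mels` and `image_shape` are chosen for illustration.

```python
# Sketch: derive the spectrogram shape from the parameters stated in the text.
sample_rate = 4800   # Hz, as set in the notebook parameters
alpha = 10           # seconds per extracted segment
hop_length = 256     # hop size stated in the text
n_mels = 128         # number of mel bands stated in the text

n_samples = sample_rate * alpha

# Assumption: centred framing, so the frame count is 1 + n_samples // hop_length.
n_frames = 1 + n_samples // hop_length

# A trailing axis of size 1 is the image depth that Keras expects.
image_shape = (n_mels, n_frames, 1)
print(n_samples, image_shape)
```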

The spectrograms are saved to 'Augmented_Image_Data' as two pickle files (one for the calls and the other for the non-calls). In our example these are 'g_HGSM3D_0+1_20160429_051600_augmented_img.pkl' and 'n_HGSM3D_0+1_20160429_051600_augmented_img.pkl' where 'g_' and 'n_' represent gibbon and non-gibbon calls respectively.
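
The naming convention for the two output files can be sketched as a small helper. The function `augmented_image_paths` is hypothetical and does not appear in the repository; it only reproduces the 'g_' / 'n_' prefix and '_augmented_img.pkl' suffix described above.

```python
import os

def augmented_image_paths(file_without_extension,
                          directory='../Augmented_Image_Data/'):
    """Return (gibbon_path, non_gibbon_path) using the g_/n_ naming convention."""
    suffix = '_augmented_img.pkl'
    gibbon_path = os.path.join(directory, 'g_' + file_without_extension + suffix)
    noise_path = os.path.join(directory, 'n_' + file_without_extension + suffix)
    return gibbon_path, noise_path

g_path, n_path = augmented_image_paths('HGSM3D_0+1_20160429_051600')
print(os.path.basename(g_path))
print(os.path.basename(n_path))
```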

In [1]:
import numpy as np
import math
import random
import time
import pickle
from Augmentation import augment_data, augment_background, convert_to_image

## File name

Without extension

In [2]:
file_without_extension = 'HGSM3D_0+1_20160429_051600'

## Parameters

In [3]:
audio_directory = '../Pickled_Data/'
logs_directory = '../Logs/'
augment_directory = '../Augmented_Data/'
augment_image_directory = '../Augmented_Image_Data/'
audio_file_name_gibbon = audio_directory+'g_'+file_without_extension+'.pkl'
audio_file_name_noise = audio_directory+'n_'+file_without_extension+'.pkl'
seed = 1337 # seed for random generator
sample_rate = 4800 # rate used in the Extract_Audio notebook
alpha = 10 # seconds extracted in the Extract_Audio notebook
augmentation_probability = 1.0
augmentation_amount_noise = 2
augmentation_amount_gibbon = 10

## Read pickled extracted audio data

In [4]:
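# Use context managers so the file handles are closed after loading
with open(audio_file_name_gibbon, "rb") as gibbon_file:
    gibbon_extracted = pickle.load(gibbon_file)
with open(audio_file_name_noise, "rb") as noise_file:
    noise_extracted = pickle.load(noise_file)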

## Shapes before augmentation

In [5]:
print ('gibbon_extracted:',gibbon_extracted.shape)
print ('noise_extracted:',noise_extracted.shape)

gibbon_extracted: (369, 48000)
noise_extracted: (1179, 48000)


In [6]:
if gibbon_extracted.shape[0] == 0:
    # Adjacent string literals are joined, which avoids embedding the indentation in the message
    print('Caution: no gibbon data available. Augmentation operations on the gibbon data will not work. '
          'Augmentation operations on non-gibbon data will however still work.')

## Augment non-gibbon data

In [7]:
noise_extracted_augmented = augment_background(seed, augmentation_amount_noise, augmentation_probability,
                 noise_extracted, sample_rate, alpha)

## Augment gibbon data

In [8]:
gibbon_extracted_augmented = augment_data(seed, augmentation_amount_gibbon, augmentation_probability, 
                                          gibbon_extracted, noise_extracted_augmented, sample_rate, alpha)

## Shapes after augmentation

In [9]:
print('gibbon_extracted_augmented:',gibbon_extracted_augmented.shape)
print('noise_extracted_augmented:',noise_extracted_augmented.shape)

gibbon_extracted_augmented: (3690, 48000)
noise_extracted_augmented: (2358, 48000)
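
The row counts printed above match a simple product of the original count and the augmentation amount. The sketch below assumes an augmentation probability of 1.0 and that the returned arrays contain only the newly generated segments, which is what the printed shapes suggest; the helper `augmented_count` is hypothetical and is not part of 'Augmentation.py'.

```python
def augmented_count(n_original, amount):
    """Rows expected after augmentation, assuming a probability of 1.0 and
    that every original segment yields `amount` new segments."""
    return n_original * amount

# Values taken from the shapes printed in this notebook.
gibbon_rows = augmented_count(369, 10)   # augmentation_amount_gibbon = 10
noise_rows = augmented_count(1179, 2)    # augmentation_amount_noise = 2
print(gibbon_rows, noise_rows)
```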


## Randomly sample from noise

This step samples from the augmented non-gibbon data to create a balanced augmented dataset: the number of non-gibbon segments sampled equals the number of augmented gibbon calls. Sampling is done with replacement, so when more segments are requested than are available (as in this example, where 3690 are requested from 2358), some segments are repeated. If there are no gibbon calls, the amount must be specified by hand: comment out `sample_amount = gibbon_extracted_augmented.shape[0]` and uncomment `#sample_amount = 100 or specify another value`, replacing the text after `#` with a valid assignment such as `sample_amount = 100`.

In [10]:
sample_amount = gibbon_extracted_augmented.shape[0]
#sample_amount = 100 or specify another value

In [11]:
noise_extracted_augmented = noise_extracted_augmented[np.random.choice(noise_extracted_augmented.shape[0], 
                                                                       sample_amount, 
                                                                       replace=True)]

In [12]:
print('noise_extracted_augmented:',noise_extracted_augmented.shape)

noise_extracted_augmented: (3690, 48000)
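
The sampling cell as shown does not pass `seed` to `np.random.choice`, and whether NumPy's global random state is seeded elsewhere is not visible in this notebook. As a hedged alternative, the sketch below draws the indices from a seeded `np.random.default_rng` generator so the selection is repeatable. The array `noise` is a small stand-in for the real data (8 columns instead of 48000), and the names `rng`, `indices` and `balanced_noise` are illustrative.

```python
import numpy as np

seed = 1337
rng = np.random.default_rng(seed)

# Stand-in for the augmented non-gibbon array: 2358 rows as in the notebook,
# but only 8 columns to keep the example light.
noise = np.zeros((2358, 8), dtype=np.float32)
sample_amount = 3690

# Sampling without replacement is only possible when enough rows exist,
# so fall back to replacement when more rows are requested than available.
replace = sample_amount > noise.shape[0]
indices = rng.choice(noise.shape[0], size=sample_amount, replace=replace)
balanced_noise = noise[indices]

print(balanced_noise.shape, 'replace =', replace)
```

Because 3690 rows are drawn from 2358, at least some rows are necessarily selected more than once.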


## Save the augmented data to disk

In [13]:
pickle.dump(gibbon_extracted_augmented, 
            open(augment_directory+'g_'+file_without_extension+'_augmented.pkl', "wb" ))

pickle.dump(noise_extracted_augmented, 
            open(augment_directory+'n_'+file_without_extension+'_augmented.pkl', "wb" ))

## Convert the augmented gibbon audio data into spectrograms

In [14]:
gibbon_extracted_augmented_image = convert_to_image(gibbon_extracted_augmented)

## Convert the augmented non-gibbon audio data into spectrograms

In [15]:
noise_extracted_augmented_image = convert_to_image(noise_extracted_augmented)

## Shapes after conversion to spectrograms

In [16]:
print ('gibbon_extracted_augmented_image:', gibbon_extracted_augmented_image.shape)

gibbon_extracted_augmented_image: (3690, 128, 188, 1)


In [17]:
print ('noise_extracted_augmented_image:', noise_extracted_augmented_image.shape)

noise_extracted_augmented_image: (3690, 128, 188, 1)


## Save the augmented image data to disk

In [18]:
pickle.dump(gibbon_extracted_augmented_image, 
            open(augment_image_directory+'g_'+file_without_extension+'_augmented_img.pkl', "wb" ))

pickle.dump(noise_extracted_augmented_image, 
            open(augment_image_directory+'n_'+file_without_extension+'_augmented_img.pkl', "wb" ))
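
A quick way to confirm that a pickle file was written correctly is to load it back and compare. The sketch below is not part of the original notebook: `save_pickle` and `load_pickle` are hypothetical helpers, the data is a small placeholder list, and a temporary directory is used so nothing is written to the project folders.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Write obj to path, closing the file handle afterwards."""
    with open(path, 'wb') as handle:
        pickle.dump(obj, handle)

def load_pickle(path):
    """Read and return the object stored at path."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'g_example_augmented_img.pkl')
    data = [[0.0, 1.0], [2.0, 3.0]]   # placeholder for a spectrogram array
    save_pickle(data, path)
    restored = load_pickle(path)

print(restored == data)
```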

### Delete these variables that use a lot of CPU RAM

In [19]:
del noise_extracted_augmented, gibbon_extracted_augmented, gibbon_extracted_augmented_image, noise_extracted_augmented_image
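
To see why these arrays are worth deleting, their approximate size can be estimated from the shapes printed earlier. The element width is an assumption, since the dtype of the arrays is not shown in this notebook; the sketch reports both 4-byte and 8-byte floats, and `array_bytes` is a hypothetical helper.

```python
def array_bytes(shape, bytes_per_element):
    """Size in bytes of a dense array with the given shape and element width."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element

# Shapes taken from the outputs printed in this notebook.
shapes = {
    'augmented audio (per class)': (3690, 48000),
    'augmented images (per class)': (3690, 128, 188, 1),
}

for name, shape in shapes.items():
    for width in (4, 8):  # assumed float32 or float64 storage
        size_gib = array_bytes(shape, width) / 1024 ** 3
        print(f'{name}, {width}-byte elements: {size_gib:.2f} GiB')
```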