![alt text](https://www.mbari.org/wp-content/uploads/2014/11/logo-mbari-3b.png "MBARI")
<div align="center">Copyright (c) 2021, MBARI</div>

* Distributed under the terms of the GPL License
* Maintainer: dcline@mbari.org
* Author: Danelle Cline dcline@mbari.org

## Preprocessing using PCEN

Noise removal is an essential part of effective sound detection and classification: noise from boats and recording equipment, as well as "noise" from other species vocalizing in the same frequency band, all degrade classification performance.

While we can't do much about species vocalizing in the same band, other noise, in particular stationary narrow-band noise, can be removed with a method called Per-Channel Energy Normalization (PCEN) [1]. PCEN has other desirable properties as well: a) it Gaussianizes the background, and b) it can enhance the onset of a call [1][2].
In short, PCEN helps isolate sound units, which is essential for both detection and classification.
We have found that PCEN improves performance across both the supervised and unsupervised machine learning methods we have tried.
Google also found PCEN useful in their work on humpback whale song [3].
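
For reference, the PCEN transform of [1] can be written as below. $E(t,f)$ is the energy (in this notebook, the STFT magnitude) in frame $t$ and frequency channel $f$; $M(t,f)$ is a smoothed copy of it produced by a first-order IIR low-pass filter with coefficient $s$; $\alpha$ is the gain, $\delta$ the bias, $r$ the compression exponent, and $\epsilon$ a small constant that prevents division by zero. In `librosa.pcen` these appear as the `gain`, `bias`, `power`, `eps` and `b` arguments, with `b` derived from `time_constant` when it is not given.

$$
M(t,f) = (1 - s)\,M(t-1,f) + s\,E(t,f)
$$

$$
\mathrm{PCEN}(t,f) = \left(\frac{E(t,f)}{\left(\epsilon + M(t,f)\right)^{\alpha}} + \delta\right)^{r} - \delta^{r}
$$

Computing $M$ is the temporal integration, dividing by $(\epsilon + M)^{\alpha}$ is the gain control, and $(\cdot + \delta)^{r} - \delta^{r}$ is the dynamic range compression.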

## Install dependencies
First, let's install the dependencies and import all packages used in this tutorial. This only needs to be done once per notebook session.

In [None]:
!pip install numpy soundfile librosa matplotlib boto3

In [None]:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import numpy as np
import soundfile as sf
import librosa
import matplotlib.pyplot as plt
from pathlib import Path

# Read in a sound file

In [None]:

# First, let's download the data used in this notebook
bucket = 'emso-tsc2021-session3-eu-west-3'
wav_filename = 'blue_A_3.wav'

# the bucket is public, so use anonymous (unsigned) requests; no credentials needed
s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

# only download if needed
if not Path(wav_filename).exists():
    print('Downloading')
    s3.Bucket(bucket).download_file(wav_filename, wav_filename)
    print(f'Done downloading {wav_filename}')

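samples, sample_rate = sf.read(wav_filename, dtype='float32')
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # down-mix multichannel audio to mono for librosa.stft
nsec = samples.shape[0] / sample_rate  # number of seconds in the recording
print(f'Read {nsec:.1f} seconds of data')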

# Compute the STFT

In [None]:
num_fft = 1024
overlap = 0.95
hop_length = int(num_fft * (1 - overlap))  # samples between consecutive frames

# Frequency band of interest (Hz)
low_freq = 20
high_freq = 100

# compute the magnitude STFT
P = np.abs(librosa.stft(y=samples, n_fft=num_fft, window="hann", hop_length=hop_length))

# keep only the frequency bins (rows) whose centre frequency is between low_freq and high_freq
freqs = librosa.fft_frequencies(sr=sample_rate, n_fft=num_fft)
P = P[(freqs >= low_freq) & (freqs <= high_freq)]
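
The band selection above relies on two facts: row $k$ of an STFT with `n_fft` points is centred at $k \cdot sr / n_{fft}$ Hz, and consecutive frames are `hop_length` samples apart. The sketch below works through those numbers for this notebook's settings. `band_rows` is a hypothetical helper, and the 2000 Hz sample rate is an assumed value for illustration only; the real rate comes from the WAV file.

```python
import numpy as np


def band_rows(sr, n_fft, low_freq, high_freq):
    """Slice of STFT rows whose centre frequency lies in [low_freq, high_freq] Hz."""
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft  # centre frequency of each row
    rows = np.flatnonzero((freqs >= low_freq) & (freqs <= high_freq))
    return slice(int(rows[0]), int(rows[-1]) + 1)


sr = 2000                                  # hypothetical sample rate (Hz)
num_fft, overlap = 1024, 0.95
hop_length = int(num_fft * (1 - overlap))  # samples between frames
band = band_rows(sr, num_fft, 20, 100)

print(f"bin spacing: {sr / num_fft:.3f} Hz, rows kept: {band.start}..{band.stop - 1}")
print(f"hop: {hop_length} samples = {1000 * hop_length / sr:.1f} ms, {sr / hop_length:.1f} frames/s")
```

At this assumed rate a 1024-point FFT gives roughly 2 Hz per row, so the 20-100 Hz band spans about 40 rows, and the 95% overlap yields a new frame every 25.5 ms.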

# PCEN Components

The three main components in PCEN are:

1. Gain control
2. Temporal integration
3. Dynamic range compression
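
To make the three components concrete, here is a minimal NumPy sketch of the PCEN formula from [1]. It is not librosa's implementation: the function name `pcen_sketch`, the illustrative smoothing coefficient `s=0.025`, and the choice to start the smoother at the first frame's energy are our assumptions (the remaining defaults match `librosa.pcen`'s `gain`, `bias`, `power` and `eps`). The two-channel spectrogram at the end is hypothetical test data: one channel holds a constant tone, standing in for stationary narrow-band noise, and the other holds a short call at the same raw level.

```python
import numpy as np


def pcen_sketch(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Minimal PCEN of a non-negative spectrogram E shaped (channels, frames)."""
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    # Temporal integration: first-order IIR low-pass along time, per channel.
    # Assumption: the smoother starts at the first frame's energy.
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    # Gain control: divide by the smoothed energy raised to the power alpha
    G = E / (eps + M) ** alpha
    # Dynamic range compression: offset by delta, then root compression
    return (G + delta) ** r - delta ** r


# Hypothetical spectrogram: 2 channels x 400 frames
E = np.ones((2, 400))
E[0, :] = 50.0        # channel 0: constant tone (stationary narrow-band noise)
E[1, 200:210] = 50.0  # channel 1: quiet background with a short call at the same level

out = pcen_sketch(E)
print("tone:", out[0, 200:210].round(2))
print("call:", out[1, 200:210].round(2))
```

Although the tone and the call have the same raw level, the tone ends up close to the quiet background after PCEN while the call stands out, which is the stationary narrow-band noise suppression described at the top of this notebook.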

PCEN can also be implemented as a trainable neural network layer and optimized jointly with, for example, a CNN [1]. This is an advanced topic outside the scope of this tutorial.

For isolated sound units, gain control is the most significant of the three components. You will want to determine the best settings for your data experimentally.
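
One way to build intuition for the gain setting before experimenting on real data is to look at a channel whose energy has been constant long enough that the smoother has caught up, so $M = E$. The sketch below is our own illustration: `steady_state_pcen` is a hypothetical helper, the two levels are arbitrary, and `bias`, `power` and `eps` are set to `librosa.pcen`'s defaults. It compares how much louder a loud stationary channel remains than a quiet one after PCEN, for several gains.

```python
import numpy as np


def steady_state_pcen(level, gain, bias=2.0, power=0.5, eps=1e-6):
    """PCEN output of a channel whose smoothed energy M has converged to its energy E = level."""
    level = np.asarray(level, dtype=float)
    return (level / (eps + level) ** gain + bias) ** power - bias ** power


levels = np.array([1.0, 100.0])  # quiet and loud stationary channels (arbitrary units)
ratios = {}
for g in (0.25, 0.5, 0.75, 0.98):
    out = steady_state_pcen(levels, g)
    ratios[g] = out[1] / out[0]  # how much louder the loud channel remains after PCEN
    print(f"gain={g}: loud/quiet ratio after PCEN = {ratios[g]:.2f}")
```

Before PCEN the loud channel is 100 times the quiet one. As the gain approaches 1, the division by $M^{\alpha}$ normalizes that loudness difference away almost completely; smaller gains preserve more of it.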

# Gain control

In [None]:
gain = [0.25, 0.5, 0.75]
# PCEN must use the same hop as the STFT so that time_constant is converted correctly
hop_length = int(num_fft * (1 - overlap))
plt.figure(figsize=(10, 5))
for i, g in enumerate(gain):
    D = librosa.pcen(P, gain=g, sr=sample_rate, hop_length=hop_length)
    plt.subplot(len(gain), 1, i + 1)
    plt.axis('off')
    # origin='lower' puts low frequencies at the bottom; aspect='auto' fills the panel
    plt.imshow(D, origin='lower', aspect='auto', interpolation='bilinear', cmap='Blues')
    plt.title(f'PCEN spectrogram gain={g}')
plt.tight_layout()
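
When `b` is not given, `librosa.pcen` uses `sr` and `hop_length` only to turn `time_constant` (in seconds) into the per-frame smoothing coefficient $s$ of the temporal integration step, so `hop_length` should be the hop actually used to compute the STFT. The sketch below reproduces that conversion as we understand it from librosa's source (check the version you have installed); `smoothing_coef` is a hypothetical helper, 0.4 s is librosa's default `time_constant`, and the 2000 Hz sample rate is an assumed value.

```python
import numpy as np


def smoothing_coef(time_constant, sr, hop_length):
    """Per-frame IIR coefficient for a smoothing time constant given in seconds."""
    t_frames = time_constant * sr / float(hop_length)  # time constant measured in STFT frames
    return (np.sqrt(1 + 4 * t_frames ** 2) - 1) / (2 * t_frames ** 2)


sr = 2000                                # hypothetical sample rate (Hz)
num_fft, overlap = 1024, 0.95
stft_hop = int(num_fft * (1 - overlap))  # 51 samples, the hop used by the STFT
other_hop = round(num_fft * overlap)     # 973 samples, a hop that does not match the STFT

b_matched = smoothing_coef(0.4, sr, stft_hop)
b_mismatched = smoothing_coef(0.4, sr, other_hop)
print(f"matched hop: b={b_matched:.3f}, mismatched hop: b={b_mismatched:.3f}")
```

A larger coefficient means the smoother forgets faster. With the matched hop, $1/s$ is about 16 frames, or roughly the intended 0.4 s; with the mismatched hop the background estimate averages over only a frame or two, so it tracks the calls themselves and weakens the distinction between stationary background and calls.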

### References
[1] https://arxiv.org/pdf/1607.05666.pdf
[2] http://www.justinsalamon.com/uploads/4/3/9/4/4394963/lostanlen_pcen_spl2018.pdf
[3] https://tfhub.dev/google/humpback_whale/1
