![alt text](https://www.mbari.org/wp-content/uploads/2014/11/logo-mbari-3b.png "MBARI")
<div align="center">Copyright (c) 2021, MBARI</div>

* Distributed under the terms of the GPL License
* Maintainer: dcline@mbari.org
* Author: Danelle Cline dcline@mbari.org

## Humpback song PCEN versus Log 

This notebook takes a closer look at per-channel energy normalization (PCEN) using two 5-minute segments of humpback song from the Pacific Sound archive.

Humpback song is more complex than blue whale calls and covers a wider frequency band, so I thought this would be helpful to share with researchers working with sounds more complex than blue whale calls. It demonstrates a side-by-side view of two humpback song segments with different background levels.

## Install dependencies
First, let's install dependencies and include all packages used in this tutorial. This only needs to be done once for the duration of this notebook.

In [None]:
!pip install boto3 librosa seaborn soundfile --quiet

In [None]:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import librosa
import librosa.display
from pathlib import Path
from scipy.stats import norm
import soundfile as sf
from urllib.request import urlopen
import io
%matplotlib inline

# First, let's download the data used in this notebook

In [None]:
bucket = 'emso-tsc2021-session3-eu-west-3'
wav_file_1 = 'MARS_20161221_000046_SongSession_16kHz_HPF5Hz_offset6000.wav'
wav_file_2 = 'MARS_20161221_000046_SongSession_16kHz_HPF5Hz_offset7200.wav'

wav_filenames = [wav_file_1, wav_file_2]

# The bucket is read anonymously, so create one unsigned client and reuse it
s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

for f in wav_filenames:
    # only download if needed
    if not Path(f).exists():
        print(f'Downloading {f}')
        s3.Bucket(bucket).download_file(f, f)
        print(f'Done downloading {f}')

# Logarithmic versus PCEN
PCEN is often a better front end than a logarithmic transform for noisy natural acoustic data sets because it brings
the magnitude distributions closer to Gaussian while decorrelating frequency bands. This can improve event detection
and classification, as it enhances the onset of natural calls while suppressing background noise.
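
To make the idea concrete, here is a minimal NumPy sketch of the PCEN computation: a per-band smoothed energy `M` is used to normalize each frame, followed by a root compression. The function name `pcen_sketch` and the smoothing coefficient `s` are assumptions made for illustration; librosa derives its smoothing coefficient from `time_constant`, `sr` and `hop_length`, and initializes the filter differently, so this is not a drop-in replacement for `librosa.pcen`. The other defaults mirror the values used later in this notebook.

```python
import numpy as np

def pcen_sketch(E, s=0.2, gain=0.70, bias=0.02, power=0.5, eps=10e-6):
    """Illustrative PCEN on a non-negative array E of shape (n_bands, n_frames).

    s is a hypothetical first-order smoothing coefficient (0 < s <= 1).
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]  # simplified initialization (an assumption of this sketch)
    for t in range(1, E.shape[1]):
        # first-order low-pass filter along time, one filter per band
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    # adaptive gain control followed by root compression
    return (E / (eps + M) ** gain + bias) ** power - bias ** power

# A constant background with one short burst in frame 50
E = np.ones((4, 100))
E[:, 50] = 10.0
P = pcen_sketch(E)

# The same constant background at two very different levels
quiet = pcen_sketch(np.ones((4, 100)))
loud = pcen_sketch(100.0 * np.ones((4, 100)))
print(P[0, 49], P[0, 50])
print(quiet[0, -1], loud[0, -1])
```

The burst stands out against the background, while a 100-fold change in a constant background level changes the output far less than 100-fold, which is the behavior the paragraph above describes.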
 
First, let's set a few parameters used throughout the notebook, then generate data for an ideal Gaussian curve for comparison.

In [None]:
sample_rate = 16000
fmax = 8000  # maximum frequency
window_size = 4096 # fft window size
overlap = 0.5 
hop_length = int(window_size * (1 - overlap))  # convenience as it's used in multiple places
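
As a quick sanity check, the following sketch works out what these parameters imply for time and frequency resolution. It assumes each segment is exactly 5 minutes long and that frames are centered and padded (librosa's default `center=True`); the variable names other than those defined above are introduced only for illustration.

```python
sample_rate = 16000   # Hz, as in the notebook
window_size = 4096    # FFT window size in samples
overlap = 0.5
hop_length = int(window_size * (1 - overlap))

frame_seconds = hop_length / sample_rate   # time step between frames
bin_hz = sample_rate / window_size         # spacing between FFT bins
n_samples = 5 * 60 * sample_rate           # assumed: exactly 5 minutes of audio
# Assumption: centered, padded framing gives 1 + n_samples // hop_length frames
n_frames = 1 + n_samples // hop_length

print(f'{hop_length=} {frame_seconds=} {bin_hz=} {n_frames=}')
```

With a 4096-sample window at 16 kHz, each frame advances by 0.128 s and FFT bins are spaced about 3.9 Hz apart.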

In [None]:
xs_norm = np.arange(-3, 3, 0.001)
ys_norm = norm.pdf(xs_norm, 0, 1)

In [None]:
def display_spec_mag(log_array: np.ndarray, pcen_s_array: np.ndarray):
    """
    Displays a grid with the spectrogram and magnitude distribution of the log versus PCEN arrays
    :param log_array:  numpy array of the log-scaled mel spectrogram
    :param pcen_s_array:  numpy array of the PCEN-scaled mel spectrogram
    :return: None
    """
    fig = plt.figure(constrained_layout=True,  figsize=(10, 10))
    widths = [4, 1]
    heights = [1, 1]
    axes = []
    spec = fig.add_gridspec(ncols=2, nrows=2, width_ratios=widths, height_ratios=heights )
    for row in range(2):
        for col in range(2):
            axes.append(fig.add_subplot(spec[row, col]))
    def normalize(a):
        return (a - a.mean(axis=0)) / a.std(axis=0)
    
    librosa.display.specshow(log_array, x_axis='time', y_axis='mel', ax=axes[0], cmap='Blues', sr=sample_rate,
                             hop_length=hop_length)
    axes[0].set_title('Logarithmic transformation ')
    sns.distplot(normalize(log_array), ax=axes[1])
    axes[1].set_title('magnitude distribution');axes[1].set_xlim(-4, 4); axes[1].set_ylim(0, 0.5)
    axes[1].plot(xs_norm, ys_norm, 'r--', label="Gaussian")
    
    librosa.display.specshow(pcen_s_array, x_axis='time', y_axis='mel', ax=axes[2], cmap='Blues', sr=sample_rate, 
                             hop_length=hop_length)
    axes[2].set_title('Per-channel energy normalization')
    #plt.colorbar(format='%+2.0f dB')
    sns.distplot(normalize(pcen_s_array), ax=axes[3])
    axes[3].set_title('magnitude distribution');axes[3].set_xlim(-4, 4); axes[3].set_ylim(0, 0.5)
    axes[3].plot(xs_norm, ys_norm, 'r--', label="Gaussian")
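
The plots compare each distribution to the Gaussian curve by eye. As a complementary numeric check, the sketch below computes skewness and excess kurtosis after the same per-frame normalization; both are close to zero for Gaussian data. The helper name `gaussianity_summary` and the synthetic test arrays are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def gaussianity_summary(a):
    """Return (skewness, excess kurtosis) of an array after per-frame standardization."""
    # same normalization as in display_spec_mag: standardize each column (frame)
    z = ((a - a.mean(axis=0)) / a.std(axis=0)).ravel()
    skew = np.mean(z ** 3)
    excess_kurtosis = np.mean(z ** 4) - 3.0
    return skew, excess_kurtosis

rng = np.random.default_rng(0)
gaussian_like = rng.normal(size=(128, 2000))   # stand-in for a well-behaved spectrogram
skewed = rng.exponential(size=(128, 2000))     # stand-in for a heavily skewed one

print(gaussianity_summary(gaussian_like))
print(gaussianity_summary(skewed))
```

In the notebook, the same function could be applied to the log and PCEN arrays to put a number on how close each one is to Gaussian.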

# Side-by-side comparison
### Load two 5-minute segments representing two different background levels

In [None]:
def read_from_bucket(wav_file: str):
    """Reads a wav file from the bucket over HTTPS and returns (samples, sample_rate)"""
    bucket = 'emso-tsc2021-session3-eu-west-3'
    url = f'https://{bucket}.s3.amazonaws.com/{wav_file}'
    print(f'Reading from {url}')
    samples, sample_rate = sf.read(io.BytesIO(urlopen(url).read()), dtype='float32')
    # alternatively, read the local copy downloaded earlier:
    # samples, sample_rate = sf.read(wav_file, dtype='float32')
    nsec = samples.size / sample_rate  # number of seconds in vector (assumes mono audio)
    print(f'Read {nsec} seconds of data')
    return samples, sample_rate

samples_1, sample_rate = read_from_bucket(wav_file_1)
samples_2, sample_rate = read_from_bucket(wav_file_2)

# Sound recording sample 1
### Compute the spectrogram with both the log mel transformation and PCEN for the first sound recording sample

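Note that the librosa PCEN assumes audio scaled in the range -2**31 to 2**31, *not* -1 to 1 as is generally the case for many sound files. The cells below pass the float32 samples, which are read in the -1 to 1 range, through without rescaling, so the PCEN parameters used here apply to that scale.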
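
If you want to follow the scaling that librosa documents (a range of -2**31 to 2**31), the factor can be applied either to the samples or to the magnitude spectrogram, because a magnitude spectrum is linear in amplitude. The sketch below checks that equivalence with a plain FFT on synthetic data; the random `samples` array is a stand-in for real audio and is an assumption of this example. With librosa, the corresponding step would be to multiply the `power=1` mel spectrogram by `2**31` before calling `librosa.pcen`, and the PCEN parameters would likely need retuning for that scale.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for audio read as float32 in the -1 to 1 range
samples = rng.uniform(-1.0, 1.0, size=4096).astype('float32')

scale = 2 ** 31
mag = np.abs(np.fft.rfft(samples.astype('float64')))
mag_scaled = np.abs(np.fft.rfft(samples.astype('float64') * scale))

# scaling the samples and scaling the magnitude spectrum give the same result
print(np.allclose(mag_scaled, mag * scale))
```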

In [None]:
S = librosa.feature.melspectrogram(y=samples_1, sr=sample_rate, power=1, fmax=fmax, n_fft=window_size, hop_length=hop_length)
log_S_1 = librosa.amplitude_to_db(S, ref=np.max)
pcen_S_A = librosa.pcen(S, sr=sample_rate, hop_length=hop_length, gain=0.70, time_constant=0.6, power=0.5, bias=.02, eps=10e-6)
display_spec_mag(log_S_1, (pcen_S_A))
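
For reference, the log branch above uses `librosa.amplitude_to_db` with `ref=np.max`, which expresses each magnitude in decibels relative to the largest value and limits the dynamic range. A minimal NumPy sketch of that behavior follows, assuming librosa's default `amin=1e-5` and `top_db=80`; the function name `log_scale_sketch` is hypothetical.

```python
import numpy as np

def log_scale_sketch(S, amin=1e-5, top_db=80.0):
    """Approximate amplitude-to-dB conversion relative to the maximum of S."""
    S = np.abs(np.asarray(S, dtype=float))
    ref = S.max()
    # 20*log10 of the amplitude ratio, with a floor to avoid log of zero
    db = 20.0 * np.log10(np.maximum(amin, S)) - 20.0 * np.log10(np.maximum(amin, ref))
    # keep only the top `top_db` decibels below the peak
    return np.maximum(db, db.max() - top_db)

S_demo = np.array([[1.0, 0.1], [0.01, 1e-6]])
print(log_scale_sketch(S_demo))
```

Unlike PCEN, this transform applies one fixed reference to the whole spectrogram, so a change in background level shifts every value by the same number of decibels.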

# Sound recording sample 2

In [None]:
S = librosa.feature.melspectrogram(y=samples_2, sr=sample_rate, power=1, fmax=fmax, n_fft=window_size, hop_length=hop_length)
log_S_2 = librosa.amplitude_to_db(S, ref=np.max)
pcen_S_B = librosa.pcen(S, sr=sample_rate, hop_length=hop_length, gain=0.70, time_constant=0.6, power=0.50, bias=0.02, eps=10e-6)
display_spec_mag(log_S_2, pcen_S_B)