# What does this notebook do?

* It applies log compression and PCEN to the same audio files and compares the resulting spectrograms.

I use the librosa 0.9.1 implementation[2] of PCEN.

# Introduction

When converting an audio file to a mel-frequency spectrogram, we usually apply log compression to handle the dynamic range of the spectrogram. It is simple and easy to use, but Wang et al. developed a more advanced method: **per-channel energy normalization (PCEN)**[1]. Although it is called "normalization", it does more than rescale: it combines adaptive gain control with dynamic range compression. In short, it enhances signals whose frequency changes over time (chirps[3]) and suppresses signals whose frequency stays constant over time, so it can also be regarded as a form of noise suppression. In some sound event detection (SED) and audio tagging applications, it is reported to perform better than simple log compression[4].
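
To make the difference concrete, here is a minimal NumPy sketch of the two compression schemes. It assumes the PCEN formulation of [1]: a first-order IIR smoother `M`, adaptive gain control `E / (eps + M) ** alpha`, and root compression with an offset `delta`. The function names `log_compress` and `pcen_sketch`, the default values, and the toy input are made up for illustration. This is not the librosa implementation used in the rest of the notebook, which differs in details such as filter initialization and calls `alpha`, `delta`, and `r` by the names `gain`, `bias`, and `power`.

```python
import numpy as np


def log_compress(S, amin=1e-5):
    """Static compression: amplitude -> dB relative to the global peak."""
    S = np.maximum(np.asarray(S, dtype=float), amin)
    return 20.0 * np.log10(S / S.max())


def pcen_sketch(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """PCEN on a (channels, frames) array E of non-negative energies."""
    E = np.asarray(E, dtype=float)
    # Smoothed energy per channel: M[t] = (1 - s) * M[t - 1] + s * E[t]
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control followed by root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r


# Toy input: channel 0 is a loud stationary band,
# channel 1 is quiet except for a short burst at frame 25.
E = np.ones((2, 50))
E[0] *= 50.0
E[1, 25] = 20.0

log_out = log_compress(E)
pcen_out = pcen_sketch(E)
```

In `log_out` the loud stationary band remains the strongest component, whereas in `pcen_out` it is flattened to a small constant and the short burst dominates.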

# References

* [1] [Wang, Y., Getreuer, P., Hughes, T., Lyon, R. F., & Saurous, R. A. (2017, March). Trainable frontend for robust and far-field keyword spotting. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 5670-5674). IEEE.](https://www.semanticscholar.org/paper/Trainable-frontend-for-robust-and-far-field-keyword-Wang-Getreuer/4f23f2d194ddb615a31d6f75da793dcc89962b22)
* [2] https://librosa.org/doc/0.9.1/generated/librosa.pcen.html?highlight=pcen#librosa.pcen
* [3] https://en.m.wikipedia.org/wiki/Chirp
* [4] [Lostanlen, V., Salamon, J., McFee, B., Cartwright, M., Farnsworth, A., Kelling, S., & Bello, J. P. (2019). Per-Channel Energy Normalization: Why and How. IEEE Signal Processing Letters, 26(1), 39-43.](https://www.semanticscholar.org/paper/Per-Channel-Energy-Normalization%3A-Why-and-How-Lostanlen-Salamon/ff26a25c0c48666847965f4e9a0691ab6e5a7c75)

In [None]:
!pip install nb-black > /dev/null

In [None]:
import os
import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
import soundfile as sf

import librosa
import librosa.display
import IPython.display as ipd

plt.style.use("ggplot")

%load_ext lab_black
%load_ext autoreload
%autoreload 2

In [None]:
librosa.__version__

# Helper Functions

In [None]:
def create_mel_spectrogram(audio_file, **spec_params):
    sr, hop_length, n_fft, n_mels, fmin, fmax = [
        spec_params[k] for k in ["sr", "hop_length", "n_fft", "n_mels", "fmin", "fmax"]
    ]
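    # Load as mono and resample to the target sampling rate.
    audio, _ = librosa.load(audio_file, sr=sr, mono=True)
    # Pass the signal by keyword: positional audio arguments are deprecated in librosa 0.9.
    melspec = librosa.feature.melspectrogram(
        y=audio,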
        sr=sr,
        n_fft=n_fft,
        hop_length=hop_length,
        n_mels=n_mels,
        fmin=fmin,
        fmax=fmax,
        power=1,
    )
    return melspec


def pcen_bird(melspec, **spec_params):
    """
    Apply PCEN with parameters taken from reference [4] of this notebook:
        Lostanlen, et al. Per-Channel Energy Normalization: Why and How. IEEE Signal Processing Letters, 26(1), 39-43.
    """
    sr, hop_length = [spec_params[k] for k in ["sr", "hop_length"]]
    return librosa.pcen(
        # Scale up to the range of 32-bit integer audio, as in the librosa PCEN examples.
        melspec * (2 ** 31),
        time_constant=0.06,
        eps=1e-6,
        gain=0.8,
        power=0.25,
        bias=10,
        sr=sr,
        hop_length=hop_length,
    )


def mel2audio(melspec, **spec_params):
    n_fft, sr, hop_length = [spec_params[k] for k in ["n_fft", "sr", "hop_length"]]
    return librosa.feature.inverse.mel_to_audio(
        melspec, sr=sr, n_fft=n_fft, hop_length=hop_length, power=1
    )


def get_fullpath(filename, audio_path="../input/birdclef-2022/train_audio"):
    return f"{audio_path}/{filename}"


def play_audio(audio_file):
    # Use the imported IPython.display module instead of relying on the notebook builtin.
    ipd.display(ipd.Audio(audio_file))


def gen_spec_and_audio(audio_name):
    # Plot the comparison figure, then embed a player for the source recording.
    plot_spectrograms(audio_name)
    print("source audio:")
    play_audio(get_fullpath(audio_name))

In [None]:
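def plot_spectrograms(audio_name, spec_params=None):
    # Avoid a mutable default argument; fall back to the notebook's standard settings.
    if spec_params is None:
        spec_params = dict(
            sr=32_000, hop_length=320, n_fft=800, n_mels=128, fmin=20, fmax=14_000
        )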
    sr, hop_length, fmin, fmax, n_mels = [
        spec_params[k] for k in ["sr", "hop_length", "fmin", "fmax", "n_mels"]
    ]
    print(f"parameters: {spec_params}")
    audio_file = get_fullpath(audio_name)
    if not os.path.isfile(audio_file):
        raise FileNotFoundError(audio_file)
    melspec = create_mel_spectrogram(audio_file, **spec_params)
    log_melspec = librosa.amplitude_to_db(melspec, ref=np.max)
    pcen_melspec = pcen_bird(melspec, **spec_params)

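    # plot: spectrograms on the left, histograms in the middle, correlations on the right
    # (plt.tight_layout() was called here before any figure existed, which only created an empty figure)
    fig = plt.figure(figsize=(24, 8))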
    gs0 = gridspec.GridSpec(1, 5, figure=fig)
    gs00 = gridspec.GridSpecFromSubplotSpec(2, 1, subplot_spec=gs0[0:3])
    ax1 = fig.add_subplot(gs00[0])
    ax2 = fig.add_subplot(gs00[1])
    gs01 = gridspec.GridSpecFromSubplotSpec(2, 1, subplot_spec=gs0[3])
    ax3 = fig.add_subplot(gs01[0])
    ax4 = fig.add_subplot(gs01[1])
    gs02 = gridspec.GridSpecFromSubplotSpec(2, 1, subplot_spec=gs0[4])
    ax5 = fig.add_subplot(gs02[0])
    ax6 = fig.add_subplot(gs02[1])

    # ax1
    img = librosa.display.specshow(
        log_melspec,
        sr=sr,
        hop_length=hop_length,
        fmin=fmin,
        fmax=fmax,
        x_axis="time",
        y_axis="mel",
        ax=ax1,
    )
    ax1.set(title="Spectrogram after log-compression", xlabel=None)
    ax1.label_outer()
    fig.colorbar(img, ax=ax1, format="%+2.0f dB")

    # ax2
    img_pcen = librosa.display.specshow(
        pcen_melspec,
        sr=sr,
        hop_length=hop_length,
        fmin=fmin,
        fmax=fmax,
        x_axis="time",
        y_axis="mel",
        ax=ax2,
    )
    ax2.set(title="Spectrogram after PCEN")
    fig.colorbar(img_pcen, ax=ax2)

    # ax3: histogram of all log-compressed values (flattened explicitly)
    ax3.set(title="Amplitude dist. log-comp")
    sns.histplot(log_melspec.ravel(), ax=ax3)

    # ax4: histogram of all PCEN values (flattened explicitly)
    ax4.set(title="Amplitude dist. PCEN")
    sns.histplot(pcen_melspec.ravel(), ax=ax4)

    # ax5: correlation between mel channels (rows) after log compression
    corr = np.corrcoef(log_melspec)
    assert corr.shape == (n_mels, n_mels)
    sns.heatmap(corr, ax=ax5)
    ax5.set(title="Corr. log-comp.")
    ax5.axis("off")

    # ax6: correlation between mel channels (rows) after PCEN
    corr_pcen = np.corrcoef(pcen_melspec)
    assert corr_pcen.shape == (n_mels, n_mels)
    sns.heatmap(corr_pcen, ax=ax6)
    ax6.set(title="Corr. PCEN")
    ax6.axis("off")

# Visualize scored bird call/song

## Discussion

* Left column, top and bottom: spectrograms after log compression and PCEN, respectively. In general, **PCEN enhances signals whose frequency changes over time**, whereas other signals are suppressed.
* Middle column, top and bottom: distributions of the mel-spectrogram values after applying log compression and PCEN, respectively. PCEN generally makes the distribution **closer to Gaussian** than log compression does.
* Right column, top and bottom: correlation matrices between frequency channels. The PCEN matrix is closer to the identity matrix, which means that **the channels are decorrelated** (this is usually called **"whitening"**).
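
The visual impressions above can also be summarized as numbers. The following is a minimal sketch, assuming a `(n_mels, n_frames)` array such as `log_melspec` or `pcen_melspec` inside `plot_spectrograms`. The helper names `skewness_and_kurtosis` and `mean_offdiag_corr` are hypothetical, and the random input is only a stand-in for real spectrograms.

```python
import numpy as np


def skewness_and_kurtosis(spec):
    """Sample skewness and excess kurtosis over all bins; both are 0 for a Gaussian."""
    x = np.asarray(spec, dtype=float).ravel()
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)


def mean_offdiag_corr(spec):
    """Mean absolute correlation between different channels (rows)."""
    corr = np.corrcoef(spec)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[off_diag]).mean())


# Stand-in data: independent Gaussian channels versus channels sharing one component.
rng = np.random.default_rng(0)
white = rng.normal(size=(16, 2000))
shared = rng.normal(size=(1, 2000))
entangled = white + 3.0 * shared

print(skewness_and_kurtosis(white))
print(mean_offdiag_corr(white), mean_offdiag_corr(entangled))
```

On real data, one would pass the log-compressed and PCEN spectrograms to both helpers and compare the results: values closer to zero indicate a more Gaussian distribution and more decorrelated channels.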

In [None]:
gen_spec_and_audio("akiapo/XC306424.ogg")

In [None]:
gen_spec_and_audio("aniani/XC210206.ogg")

In [None]:
gen_spec_and_audio("apapan/XC27331.ogg")

In [None]:
gen_spec_and_audio("barpet/XC441955.ogg")

In [None]:
gen_spec_and_audio("crehon/XC122341.ogg")

In [None]:
gen_spec_and_audio("elepai/XC27344.ogg")

In [None]:
gen_spec_and_audio("elepai/XC27352.ogg")

In [None]:
gen_spec_and_audio("ercfra/XC252696.ogg")

In [None]:
gen_spec_and_audio("ercfra/XC410276.ogg")

In [None]:
gen_spec_and_audio("hawama/XC27350.ogg")

In [None]:
gen_spec_and_audio("hawgoo/XC210217.ogg")

In [None]:
gen_spec_and_audio("hawgoo/XC314509.ogg")

In [None]:
gen_spec_and_audio("hawhaw/XC648121.ogg")

In [None]:
gen_spec_and_audio("hawpet1/XC328230.ogg")

In [None]:
gen_spec_and_audio("houfin/XC423359.ogg")

In [None]:
gen_spec_and_audio("iiwi/XC326576.ogg")

In [None]:
gen_spec_and_audio("jabwar/XC236038.ogg")

In [None]:
gen_spec_and_audio("omao/XC385074.ogg")

In [None]:
gen_spec_and_audio("puaioh/XC144894.ogg")

In [None]:
gen_spec_and_audio("puaioh/XC144893.ogg")

In [None]:
gen_spec_and_audio("skylar/XC595571.ogg")

In [None]:
gen_spec_and_audio("warwhe1/XC503528.ogg")

In [None]:
gen_spec_and_audio("yefcan/XC513862.ogg")