<a href="https://colab.research.google.com/github/jwillbailey/clarity/blob/main/notebooks/baseline_HASPI_scores.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Generating Baseline HASPI Scores**

The Clarity Enhancement Challenge 2 (CEC2) comes with a baseline enhancement model that participants can use to benchmark their systems' performance. <br><br>

To access the enhancer and sample data, first clone the GitHub repository, then run the functions that fetch the metadata and scenes components of the demo dataset:

In [None]:
print('Cloning git repo...')
!git clone --quiet https://github.com/jwillbailey/clarity.git

print('Changing directory...')
%cd clarity

print('Installing requirements with pip...')
!pip install -qr requirements.txt
!more setup.py
print('Setting up toolkit modules...')
!pip install -e .
%cd /content/

import clarity
from clarity.notebooks import demo_data

demo_data.get_metadata_demo()
demo_data.get_scenes_demo()


---
The baseline enhancer is based on <a href='https://pubmed.ncbi.nlm.nih.gov/3743918/'>NAL-R prescription fitting</a>. Since output signals are required to be in 16-bit integer format, a slow-acting automatic gain control (AGC) is applied to reduce the clipping that NAL-R fitting introduces for audiograms representing more severe hearing loss.

The NAL-R and AGC (compressor) classes can be accessed by importing them from the <code>clarity.enhancer</code> module.

In [2]:
from clarity.clarity.enhancer.nalr import NALR
from clarity.clarity.enhancer.compressor import Compressor
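
For intuition about what a slow-acting AGC does, here is a minimal sketch. It is not the toolkit's <code>Compressor</code>: the function name <code>slow_agc</code>, its parameters and the default values are all assumptions chosen for illustration. It tracks the signal level with a smoothed envelope and turns the gain down only while that level sits above a threshold.

```python
import numpy as np


def slow_agc(x, fs, threshold=0.35, attack_s=0.05, release_s=0.5):
    """Hypothetical slow AGC: track the signal level, turn the gain down above threshold."""
    a_att = np.exp(-1.0 / (attack_s * fs))   # smoothing when the level is rising
    a_rel = np.exp(-1.0 / (release_s * fs))  # slower smoothing when it is falling
    level = 0.0
    out = np.empty(len(x), dtype=float)
    for n, sample in enumerate(x):
        mag = abs(sample)
        coeff = a_att if mag > level else a_rel
        level = coeff * level + (1.0 - coeff) * mag
        gain = 1.0 if level <= threshold else threshold / level
        out[n] = gain * sample
    return out


fs_demo = 8000
t = np.arange(fs_demo) / fs_demo
loud = 2.0 * np.sin(2 * np.pi * 220 * t)   # peaks well above full scale
quiet = 0.1 * np.sin(2 * np.pi * 220 * t)  # stays below the threshold

loud_out = slow_agc(loud, fs_demo)
quiet_out = slow_agc(quiet, fs_demo)
print(np.abs(loud).max(), np.abs(loud_out[-2000:]).max())
```

Because the level estimate moves slowly, brief peaks at the start of a loud passage can still exceed full scale, which is one reason a separate clip-handling step is still needed afterwards.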


---
NAL-R fitting involves creating a complementary filterbank based on the audiogram of a listener. Accessing listener data from scene definitions is covered in more detail in <a href='https://github.com/jwillbailey/clarity/blob/main/notebooks/Installing_clarity_tools_and_using_metadata.ipynb'>this notebook</a>.
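
As a rough illustration of what the filterbank is fitted to, the sketch below computes per-frequency insertion gains using the commonly cited form of the NAL-R rule: gain = X + 0.31 × HL + k, where X depends on the hearing levels at 500, 1000 and 2000 Hz. The helper <code>nalr_gains_db</code>, the constants table and the 0 dB floor are written out here as assumptions for illustration and are not the toolkit's code. The <code>NALR</code> class handles the prescription internally and also builds the FIR filter, so its output may differ in detail.

```python
# Illustrative only: commonly cited NAL-R constants (dB) per audiogram frequency (Hz).
# These are an assumption for this sketch; check them against the NAL-R reference.
NALR_K_DB = {250: -17.0, 500: -8.0, 1000: 1.0, 2000: -1.0,
             3000: -2.0, 4000: -2.0, 6000: -2.0}


def nalr_gains_db(cfs, levels):
    """Return {frequency: insertion gain in dB} for one ear (hypothetical helper)."""
    hl = dict(zip(cfs, levels))
    # X is 0.05 times the sum of the hearing levels at 500, 1000 and 2000 Hz
    x = 0.05 * (hl[500] + hl[1000] + hl[2000])
    gains = {}
    for f, k in NALR_K_DB.items():
        if f in hl:
            # Negative prescriptions are floored at 0 dB (no attenuation) in this sketch
            gains[f] = max(0.0, x + 0.31 * hl[f] + k)
    return gains


cfs = [250, 500, 1000, 2000, 3000, 4000, 6000, 8000]
levels_left = [40, 30, 20, 50, 60, 65, 80, 75]  # an example left-ear audiogram (dB HL)
gains = nalr_gains_db(cfs, levels_left)
for f in sorted(gains):
    print(f"{f:>5} Hz: {gains[f]:5.1f} dB")
```

Frequencies with no constant in the table (8000 Hz here) are simply skipped in this sketch.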

Listener data should be looked up through a scene definition in the metadata set.
<br><br>
Firstly, load in the scene, listeners and scene_listeners metadata files:

In [3]:
import json

with open('demo/metadata/scenes.demo.json') as f:
  scene_metadata = json.load(f)

with open('demo/metadata/listeners.json') as f:
  listeners_metadata = json.load(f)

with open('demo/metadata/scenes_listeners.dev.json') as f:
  scene_listeners_metadata = json.load(f)

Next, index a scene from <code>scene_metadata</code> and use it to retrieve an audiogram from <code>listeners_metadata</code> via the scene ID entry in <code>scene_listeners_metadata</code>. In practice, this procedure is performed within a loop.

Each scene has three listeners who have been randomly selected for that scene. Looking listeners up via the scene ID ensures that the correct listener is used, consistent with the evaluation procedure of the challenge.

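Each listener metadata entry is a dict containing:

- Name (<code>name</code>)
- Audiogram centre frequencies in Hz (<code>audiogram_cfs</code>)
- Left ear audiogram hearing levels in dB HL (<code>audiogram_levels_l</code>)
- Right ear audiogram hearing levels in dB HL (<code>audiogram_levels_r</code>)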


In [4]:
scene_index = 0
listener_choice = 0

scene = scene_metadata[scene_index]

scene_id = scene['scene']

scene_listeners = scene_listeners_metadata[scene_id]

listener = listeners_metadata[scene_listeners[listener_choice]]

print(listener)

{'name': 'L0064', 'audiogram_cfs': [250, 500, 1000, 2000, 3000, 4000, 6000, 8000], 'audiogram_levels_l': [40, 30, 20, 50, 60, 65, 80, 75], 'audiogram_levels_r': [40, 35, 30, 50, 60, 75, 80, 80]}
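
In practice the lookup above runs inside a loop over every scene and each of its listeners. A minimal sketch follows, using small made-up metadata in the same shape as the demo files. The <code>toy_*</code> names, scene IDs, listener names and audiogram values are invented for illustration, and are named so that they do not overwrite the variables loaded earlier.

```python
# Toy metadata mirroring the structure of the demo JSON files (values are made up)
toy_scenes = [{"scene": "S0001"}, {"scene": "S0002"}]
toy_scene_listeners = {
    "S0001": ["L0001", "L0002", "L0003"],
    "S0002": ["L0002", "L0003", "L0001"],
}
toy_listeners = {
    name: {
        "name": name,
        "audiogram_cfs": [250, 500, 1000, 2000, 3000, 4000, 6000, 8000],
        "audiogram_levels_l": [20, 25, 30, 40, 50, 55, 60, 65],
        "audiogram_levels_r": [20, 30, 35, 45, 50, 60, 65, 70],
    }
    for name in ("L0001", "L0002", "L0003")
}

pairs = []
for toy_scene in toy_scenes:
    toy_scene_id = toy_scene["scene"]
    # Look listeners up via the scene ID so the pairing matches the evaluation
    for listener_id in toy_scene_listeners[toy_scene_id]:
        toy_listener = toy_listeners[listener_id]
        pairs.append((toy_scene_id, toy_listener["name"]))

print(len(pairs), "scene/listener pairs, first:", pairs[0])
```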


---
To allow scalable and flexible running on both local and HPC platforms, many Clarity Challenge CEC2 scripts and tools depend on <a href='https://hydra.cc/'>hydra</a> and <a href='https://github.com/facebookincubator/submitit'>submitit</a> for configuring Python code, for setting environment variables such as dataset directories, and for enabling parallelisation of Python on both HPC and local machines. A full description of how hydra and submitit are used in the Clarity challenges is outside the scope of this notebook, but more details can be found <a href=''>here</a>.

For the sake of this notebook, we will be importing a configuration file directly using <code>omegaconf</code>.

In [6]:
from omegaconf import OmegaConf, DictConfig

# Load the baseline recipe configuration directly, without going through hydra
cfg = OmegaConf.load('clarity/recipes/cec2/baseline/config.yaml')
assert isinstance(cfg, DictConfig)

In ordinary use, parameter overrides are passed as command-line arguments to a <code>python my_script.py</code> command. However, for the sake of this notebook we will configure the <code>cfg</code> object directly.

We need to supply:
- The root directory of the project data and metadata
- The directory of the metadata
- The directory of the audio data

as these will differ from the standard installation paths for the project in this case.

In [7]:
cfg.path['root'] = 'demo'
cfg.path['metadata_dir'] = '${path.root}/metadata'
cfg.path['scenes_folder'] = '${path.root}/scenes'

With the configuration modified, we can now instantiate our <code>NALR</code> and <code>Compressor</code> objects.

In [8]:
enhancer = NALR(**cfg.nalr)
compressor = Compressor(**cfg.compressor)


Next it is necessary to load in audio to process. 

As with the listener metadata, scene audio should be loaded using the scene ID from <code>scene_metadata</code>.

Signals are stored as 16-bit integer audio and must be converted to floating point before use. 

In [15]:
from scipy.io import wavfile
import os


fs, signal = wavfile.read(
  os.path.join(cfg.path.scenes_folder, f"{scene_id}_mix_CH1.wav")
)


signal = signal / 32768.0
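
The scaling step can be checked in isolation. The sketch below uses a short synthetic int16 array in place of a real scene file (the sample values and the names <code>pcm</code> and <code>audio</code> are made up). It shows that dividing by 32768 maps the 16-bit range onto [-1, 1).

```python
import numpy as np

# Synthetic stereo int16 samples standing in for the output of wavfile.read
pcm = np.array([[-32768, 32767], [0, 16384], [-16384, 1]], dtype=np.int16)

# Dividing by 2**15 promotes to float64 and maps the int16 range onto [-1, 1)
audio = pcm / 32768.0

print(audio.dtype, audio.min(), audio.max())
```

Note that the largest positive int16 value maps to just under 1.0, while the most negative value maps to exactly -1.0.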


The final stage of processing the audio is to build the appropriate filterbank from the audiogram data and apply the frequency-dependent amplification.

Following this, slow AGC is applied and a clip detection pass is performed. A tanh function is applied to remove high-frequency distortion components from clipped samples, and the signal is converted back to 16-bit integer format for saving.

In [12]:
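import numpy as np

# Build a NAL-R filter for each ear from the listener's audiogram and apply it
nalr_fir, _ = enhancer.build(listener['audiogram_levels_l'], listener['audiogram_cfs'])
out_l = enhancer.apply(nalr_fir, signal[:, 0])

nalr_fir, _ = enhancer.build(listener['audiogram_levels_r'], listener['audiogram_cfs'])
out_r = enhancer.apply(nalr_fir, signal[:, 1])

# Apply the slow-acting AGC to each channel
out_l, _, _ = compressor.process(out_l)
out_r, _, _ = compressor.process(out_r)

enhanced_audio = np.stack([out_l, out_r], axis=1)
# Name the output after the scene ID and listener name (not the full metadata dicts)
filename = f"{scene_id}_{listener['name']}_HA-output.wav"

# Report clipping, optionally soft clip, then hard clip to [-1, 1]
n_clipped = np.sum(np.abs(enhanced_audio) > 1.0)
if n_clipped > 0:
  print(f"Writing {filename}: {n_clipped} samples clipped")
if cfg.soft_clip:
  enhanced_audio = np.tanh(enhanced_audio)
np.clip(enhanced_audio, -1.0, 1.0, out=enhanced_audio)
# Convert to 16-bit integers; the extra clip keeps +1.0 (32768) inside the int16 range
signal_16 = np.clip(32768.0 * enhanced_audio, -32768, 32767).astype(np.int16)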
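
To see what the soft-clipping and conversion steps do, the sketch below runs them on a few synthetic samples (the values are made up and include some that exceed full scale). It assumes the same order of operations as the cell above: count clipped samples, apply tanh, hard clip, then scale to int16.

```python
import numpy as np

# Synthetic float samples; three of them exceed the [-1, 1] full-scale range
samples = np.array([0.0, 0.5, -0.9, 1.5, -2.0, 30.0])

n_over = int(np.sum(np.abs(samples) > 1.0))

# tanh compresses large values smoothly towards +/-1 instead of flattening them abruptly
soft = np.tanh(samples)
hard = np.clip(soft, -1.0, 1.0)

# Clip after scaling so that +1.0 (which scales to 32768) still fits in int16
pcm16 = np.clip(32768.0 * hard, -32768, 32767).astype(np.int16)

print(n_over, pcm16)
```

The final clip matters because a value of exactly 1.0 scales to 32768, one more than the largest value int16 can hold.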





---
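The remaining step is to compute HASPI scores for the enhanced signals. This requires loading the enhanced audio together with the corresponding target audio, since the HASPI function needs the target signal as its reference.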


