We use ABX tests to measure the discriminative ability of our models. We rely on the h5features and ABXpy modules from the Cognitive Machine Learning (CoML) research team.

https://cognitive-ml.fr/

https://github.com/bootphon
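
As a reminder of what an ABX test measures, here is a toy sketch of a single ABX decision. It assumes each stimulus is reduced to one fixed-length vector and compared with cosine distance; ABXpy itself works on variable-length frame sequences, so this only illustrates the principle. The function names `cosine_distance`, `abx_correct` and `abx_score` are hypothetical and are not part of ABXpy.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two 1-D vectors (0 means same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_correct(a, b, x):
    """X is known to belong to A's category: correct if X is closer to A than to B."""
    return bool(cosine_distance(a, x) < cosine_distance(b, x))

def abx_score(triplets):
    """Fraction of (A, B, X) triplets answered correctly."""
    return sum(abx_correct(a, b, x) for a, b, x in triplets) / len(triplets)

# toy vectors, for illustration only
triplets = [
    ([1, 0], [0, 1], [0.9, 0.1]),  # X is close to A: correct
    ([1, 0], [0, 1], [0.1, 0.9]),  # X is close to B: wrong
]
print(abx_score(triplets))
```

A score near 1 means the representation separates the two categories well; a score near 0.5 is chance level.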

In [None]:
!mkdir abx
%cd abx
!git clone https://github.com/bootphon/h5features.git
%cd h5features
!python setup.py build && python setup.py install
!pytest -v ./test

In [None]:
%cd ../..
!rm -rf abx
!git clone https://github.com/bootphon/ABXpy.git
%cd ABXpy

In [None]:
!git checkout 7b254b99d6ce3f386f45d3da9c92cd45720dd9dd
# each "!" line runs in its own subshell, so the module must be loaded on the same line as make
!module load gcc/4.7.2; make install && make test
%cd ..
!rm -rf ABXpy

## 4. Generating test stimuli

We use text-to-speech (TTS) systems to automatically generate sound stimuli for a given language. We chose MBROLA voices, which cover 29 usable languages in total (for details on "usable" languages, see the corresponding training .ipynb file), with one or several voices per language, and we use espeak-ng as the front-end.

In the first stage of this project, we aim to generate every possible syllable of each language. We proceed like this:

1. Install espeak-ng, MBROLA system and MBROLA voices;

2. Go to [LAPSyD DATABASE](https://lapsyd.huma-num.fr/lapsyd/index.php) to get all the vowels and consonants of a language;

3. Find a nonce word generator (like [this one](http://akana.conlang.org/tools/awkwords/)) and generate all possible syllables based on a CV rule (consonant+vowel). To do this, set a very large number of samples and filter out all duplicate diphones.

4. Since espeak-ng only accepts X-SAMPA phonemes as input, we need to convert the generated IPA phonemes to X-SAMPA. We use [this converter](https://tools.lgm.cl/xsampa.html), but the quality of the conversions should be checked by listening to the result in espeak-ng.

5. (see the code cell below) We now have a set of diphones separated by spaces:

  1. We split them into separate diphones;

  2. We wrap these diphones in "[[...]]" to tell espeak-ng that the item is a set of phonemes; 

  3. We use espeak-ng to speak them and store the result in xxx.wav files where xxx is the diphone.
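
As an alternative to sampling from a nonce word generator and filtering duplicates (step 3), the CV syllables can be enumerated directly. The sketch below assumes the consonant and vowel inventories are already available as lists of X-SAMPA strings; the inventory shown is a toy example, not taken from LAPSyD, and the helper names `cv_syllables` and `espeak_phoneme_arg` are hypothetical.

```python
from itertools import product

def cv_syllables(consonants, vowels):
    """Return every consonant+vowel combination once, in input order."""
    seen = set()
    syllables = []
    for c, v in product(consonants, vowels):
        syllable = c + v
        if syllable not in seen:  # guards against duplicates in the inventories
            seen.add(syllable)
            syllables.append(syllable)
    return syllables

def espeak_phoneme_arg(diphone):
    """Wrap a diphone in [[...]] so that espeak-ng reads it as phonemes (step 5.2)."""
    return "[[" + diphone + "]]"

# toy inventory, for illustration only
consonants = ["p", "t", "k"]
vowels = ["a", "i"]
syllables = cv_syllables(consonants, vowels)
print(" ".join(syllables))  # pa pi ta ti ka ki
```

The printed string has the same space-separated form as the one expected by the code cell for step 5.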

In [None]:
# install espeak-ng, MBROLA and MBROLA voices
!apt-get install -y espeak-ng mbrola
!git clone https://github.com/numediart/MBROLA-voices.git
# -p avoids an error if the directory already exists
!mkdir -p /usr/share/mbrola
!mv ./MBROLA-voices/data/* /usr/share/mbrola/

In [None]:
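# paste your generated X-SAMPA phonemes here (diphones separated by spaces)
phonemes = ""
# split() without an argument also drops empty strings and repeated spaces
phonemes = phonemes.split()

for phoneme in phonemes:
  # wrap the diphone in [[...]] so that espeak-ng reads it as phonemes, not as text
  phoneme_processed = '"[[' + phoneme + ']]"'
  # quote the file name, since X-SAMPA symbols may be special characters for the shell
  !espeak-ng -v mb-en1 -w "{phoneme}.wav" {phoneme_processed}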

MBROLA: https://github.com/numediart/MBROLA

MBROLA voices: https://github.com/numediart/MBROLA-voices

espeak-ng: https://github.com/espeak-ng/espeak-ng

## 5. Processing test stimuli
In the previous section we generated a set of .wav files. We now do two things: we summarize their information in an item file, and we load them as numpy arrays to extract mel-spectrogram features.
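
A minimal sketch of what the item file could contain, assuming the column layout described in the ABXpy documentation: a header of the form `#file onset offset #label ...`, followed by one space-separated line per stimulus. The single `diphone` label column, the helper name `item_lines` and the durations are assumptions made for illustration; the real file should carry whatever labels the ABX task needs.

```python
def item_lines(stimuli):
    """Build item-file lines from (file_id, duration_in_seconds) pairs."""
    # assumed header: file id, onset, offset, then label columns (first one prefixed by '#')
    lines = ["#file onset offset #diphone"]
    for file_id, duration in stimuli:
        # each stimulus spans its whole file, and its label is the diphone itself
        lines.append(f"{file_id} 0 {duration:.3f} {file_id}")
    return lines

# hypothetical stimuli, for illustration only
stimuli = [("pa", 0.25), ("ti", 0.3)]
print("\n".join(item_lines(stimuli)))
```

The resulting lines can then be written to `stimuli.item`, one per line.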

In [None]:
# item file generation
with open("stimuli.item", 'w') as f:
  pass

In [None]:
# feature extraction and storage in HDF5 format
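import os

import librosa
import numpy as np
import h5features as h5f

id_list = []
time_list = []
feature_list = []

# sorted() makes the item order reproducible
for f in sorted(os.listdir()):
  if f.endswith(".wav"):
    wav_file = f

    # we can also try sr, y = scipy.io.wavfile.read(wav_file) and see which one is more efficient (test if normalizing affects the final mel-spectrogram)
    # note: librosa.load resamples to 22050 Hz by default; pass sr=None to keep the file's own rate
    y, sr = librosa.load(wav_file)

    # librosa defaults: window length n_fft=2048, hop_length=512; output shape is (n_mels, n_frames)
    mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

    # h5features expects one row per frame, so we transpose to (n_frames, n_mels)
    feature_list.append(mel_spectrogram.T)

    # one timestamp (in seconds) per frame, based on the default hop length
    n_frames = mel_spectrogram.shape[1]
    time_list.append(librosa.frames_to_time(np.arange(n_frames), sr=sr, hop_length=512))

    item_id = os.path.splitext(f)[0]
    id_list.append(item_id)

feature_file = "stimuli.features"

with h5f.Writer(feature_file) as writer:
  # positional arguments: items, times, features
  data = h5f.Data(id_list, time_list, feature_list, check=True)
  writer.write(data, 'features')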
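
h5features needs one timestamp per feature frame. The sketch below shows the arithmetic behind those timestamps, assuming librosa's default centered framing with a hop length of 512 samples, which yields `1 + n_samples // hop_length` frames. The helper name `frame_times` is hypothetical.

```python
def frame_times(n_samples, sr, hop_length=512):
    """Start time in seconds of each frame for a signal of n_samples samples."""
    # with centered framing, librosa produces 1 + n_samples // hop_length frames
    n_frames = 1 + n_samples // hop_length
    # frame i starts i * hop_length samples into the signal
    return [i * hop_length / sr for i in range(n_frames)]

# one second of audio at 22050 Hz, librosa's default loading rate
times = frame_times(n_samples=22050, sr=22050)
print(len(times))  # 44
print(times[1])    # spacing between frames: 512 / 22050 seconds
```

The frame spacing depends only on the hop length and the sample rate, not on the window length.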

In [None]:
!python generate_task.py stimuli.item abx.task

In [None]:
!python run_abx.py stimuli.features abx.task res_folder res_id cos true