Conventional speech quality assessment (SQA) models predict a single score for the entire audio clip. In many applications, however, quality estimates at a finer temporal resolution are desirable. This repository implements a framework for training SQA models that predict frame-level quality scores. Building upon SSL-MOS [1], the idea is to add a consistency constraint that brings the encoder outputs of an audio segment processed within its context and detached from it close to each other in the embedding space.
[1] Erica Cooper, Wen-Chin Huang, Tomoki Toda, and Junichi Yamagishi, “Generalization Ability of MOS Prediction Networks,” International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8442–8446, 2022.
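The following is a minimal sketch of such a consistency loss, not the code in this repository; the encoder interface, the segment handling, and the use of an MSE distance are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_loss(encoder, audio, seg_start, seg_end, frames_per_sample):
    """Sketch of a consistency constraint between two views of a segment.

    Assumed interface: `encoder` maps a (batch, samples) waveform to
    frame-level embeddings of shape (batch, frames, dim); `seg_start` and
    `seg_end` are sample indices, `frames_per_sample` is the encoder's
    frame rate. All of these are illustrative assumptions.
    """
    # View 1: frame embeddings of the segment encoded within its full context
    ctx = encoder(audio.unsqueeze(0))
    f0 = int(seg_start * frames_per_sample)
    f1 = int(seg_end * frames_per_sample)
    # View 2: frame embeddings of the segment encoded detached from the context
    iso = encoder(audio[seg_start:seg_end].unsqueeze(0))
    # Truncate both views to a common length and pull them together
    n = min(f1 - f0, iso.shape[1])
    return F.mse_loss(ctx[:, f0:f0 + n], iso[:, :n])
```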
```
git clone https://github.com/fgnt/local_sqa.git
cd local_sqa
pip install -e .[fgnt]
```

Alternatively, install directly from GitHub:

```
pip install git+https://github.com/fgnt/local_sqa.git
```

We train our models on BVCC + NISQA. Training can be extended to other datasets with utterance-level MOS annotations.
Please refer to the data preparation instructions for downloading and preparing the data.
We use SSL-based encoders with a simple decoder architecture (one BLSTM layer followed by a linear layer and average pooling). Encoder configurations are provided in `conf/encoder`.
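For illustration, a decoder of this shape could look as follows in PyTorch; the class name, layer sizes, and input dimension are placeholders rather than the repository's actual implementation.

```python
import torch.nn as nn

class FrameLevelDecoder(nn.Module):
    """Sketch of the decoder: BLSTM -> linear yields frame-level scores;
    average pooling over time yields the utterance-level MOS estimate."""

    def __init__(self, input_dim=768, hidden_dim=256):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        self.linear = nn.Linear(2 * hidden_dim, 1)

    def forward(self, encoder_out):  # encoder_out: (B, T, input_dim)
        frame_scores = self.linear(self.blstm(encoder_out)[0])  # (B, T, 1)
        utt_score = frame_scores.mean(dim=1)                    # (B, 1)
        return frame_scores.squeeze(-1), utt_score.squeeze(-1)
```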
Training can be started with
```
python -m local_sqa.train
```

The default behaviour is as follows:
- Use `conf/default.yaml` as configuration
- Create a new directory under `./exp/` to save logs and checkpoints
- Load `bvcc.json` and `nisqa.json` from `local_sqa/data`
- Use `wav2vec2_base` as encoder
You can visualize the training progress with TensorBoard:
```
tensorboard --logdir ./exp/
```

We use Hydra for configuration management.
You can change the output directory by overriding `base_dir`.
Databases are configured under the key `databases`.
You can add or remove databases by adding or deleting entries there.
To change the path pointing to the database structure file, override `json_path`, e.g., `databases.bvcc.json_path=path/to/bvcc.json`.
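For example, several overrides can be combined in one call; the experiment directory below is illustrative, and the `encoder` group name is inferred from the `conf/encoder` layout rather than confirmed by the repository:

```
python -m local_sqa.train base_dir=./my_exp encoder=wav2vec2_base databases.bvcc.json_path=path/to/bvcc.json
```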
Instructions to follow soon.