This is the official implementation of the paper:
PhiNet: Speaker Verification with Phonetic Interpretability
Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li
Automatic speaker verification (ASV) systems typically lack the transparency required for high-accountability applications. Inspired by how human experts perform forensic speaker comparison (FSC), we propose PhiNet, a speaker verification network with phonetic interpretability, designed to enhance both local and global interpretability by leveraging phonetic evidence in decision-making.
- Local Interpretability: PhiNet provides detailed phonetic-level comparisons for each trial, revealing each phoneme's contribution to the verification decision, enabling manual inspection of speaker-specific features.
- Global Interpretability: PhiNet ranks phonemes based on their distinctiveness for speaker identification, helping researchers understand potential system biases.
- First self-interpretable speaker verification network that explains its decision-making process
- Dual interpretability through phoneme distinctiveness (local trial-level + global pattern-level)
- Training scheme that simulates the verification process, ensuring consistency between training and inference
- Achieves performance comparable to black-box ASV models (e.g., ECAPA-TDNN) while providing meaningful explanations
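The local-interpretability idea above can be illustrated with a toy sketch: given per-phoneme speaker representations for the two sides of a trial, score each shared phoneme separately and aggregate, so every phoneme's contribution to the decision is visible. This is an illustration of the concept, not PhiNet's actual scoring code; the embeddings and the mean-of-cosines aggregation are stand-ins.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def phoneme_level_score(emb_a, emb_b):
    """Score a trial from per-phoneme embeddings.

    emb_a, emb_b: dicts mapping phoneme label -> embedding vector
    (illustrative stand-ins for the model's internal representations).
    Returns the overall score and a per-phoneme breakdown, so each
    phoneme's contribution can be inspected manually.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    contributions = {p: cosine(emb_a[p], emb_b[p]) for p in shared}
    score = sum(contributions.values()) / len(shared)
    return score, contributions

# Toy trial: the "AA" renderings match, the "IY" renderings are orthogonal.
emb_a = {"AA": [1.0, 0.0], "IY": [0.0, 1.0]}
emb_b = {"AA": [1.0, 0.0], "IY": [1.0, 0.0]}
score, contrib = phoneme_level_score(emb_a, emb_b)
print(contrib)  # per-phoneme evidence
print(score)
```

The per-phoneme breakdown is what makes the decision auditable: a human inspector can see which phonemes drove the accept/reject outcome.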
- Python 3.8+
- PyTorch 1.10+
- Other dependencies are the same as [WeSpeaker]
git clone https://github.com/mmmmayi/PhiNet.git
cd PhiNet
pip install -r requirements.txt

PhiNet is trained and evaluated on the following datasets:
| Dataset | Usage | Description |
|---|---|---|
| VoxCeleb1 | Training / Test | Celebrity speech from YouTube interviews |
| VoxCeleb2 | Training | Extended celebrity speech dataset |
| SITW | Test | Speakers in the Wild |
| LibriSpeech | Test | Read English speech from audiobooks |
| MUSAN | Augmentation | Music, speech, and noise corpus |
| RIR Noises | Augmentation | Room impulse responses |
Stage 1: Prepare VoxCeleb Data
Run stage 1 in examples/voxceleb/v2/run.sh. Modify the VoxCeleb2 data path in local/prepare_data.sh (stage 4) to your local directory:
# Edit local/prepare_data.sh, change the VoxCeleb2 path to yours
bash examples/voxceleb/v2/run.sh --stage 1

Stage 2: Extract Features
Run stage 2 with raw data type:
bash examples/voxceleb/v2/run.sh --stage 2 --data_type raw

Stage 3: Prepare Augmentation Data
Generate file lists for RIRS_NOISES and MUSAN datasets. Each file should list the paths to all audio samples (one per line). Refer to the existing rirs_list and musan_list files in the data directory for the format.
Stage 4: Configure Training Parameters
Modify stage 3 parameters in examples/voxceleb/v2/run.sh:
- `--reverb_data`: path to your RIR reverb data list
- `--noise_data`: path to your MUSAN noise data list
- `--pho_path`: path to phoneme alignment files (no change needed if using the default)
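Putting these flags together, a training launch might look like the following; the two list paths are placeholders for the files you prepared in Stage 3, and `--pho_path` can be dropped if the default applies:

```shell
bash examples/voxceleb/v2/run.sh --stage 3 \
  --reverb_data data/rirs_list \
  --noise_data data/musan_list
```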
Stage 5: Configure Phoneme Path
Update the phoneme file path at line 231 in wespeaker/dataset/processor.py to point to each sample's phoneme alignment file.
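How a sample maps to its alignment file depends on how you store the alignments. A hypothetical resolver is sketched below, assuming one alignment file per utterance under a single root; both `PHO_ROOT` and the `.ali` suffix are placeholders, not PhiNet defaults, so adjust the pattern to your layout before editing `processor.py`.

```python
import os

# Hypothetical layout: alignments stored flat under PHO_ROOT as "<utt_id>.ali".
PHO_ROOT = "data/phoneme_align"

def phoneme_path(utt_id, root=PHO_ROOT, suffix=".ali"):
    """Resolve an utterance id to its phoneme alignment file path."""
    return os.path.join(root, utt_id + suffix)

print(phoneme_path("id10001-xyz-00001"))
```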
Stage 6: Start Training
bash examples/voxceleb/v2/run.sh --stage 3

If you find this work useful, please cite:
@article{ma2025phinet,
title={PhiNet: Speaker Verification with Phonetic Interpretability},
author={Ma, Yi and Wang, Shuai and Liu, Tianchi and Li, Haizhou},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year={2025}
}

- ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification (our prior work)
- WeSpeaker: The speaker embedding learning toolkit this project is built upon
This project builds upon and is inspired by several open-source repositories. We are grateful to their authors and contributors for open-sourcing their code!