PhiNet: Speaker Verification with Phonetic Interpretability

This is the official implementation of the paper:

PhiNet: Speaker Verification with Phonetic Interpretability

Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li

[Paper]

Introduction

Automatic speaker verification (ASV) systems typically lack the transparency required for high-accountability applications. Inspired by how human experts perform forensic speaker comparison (FSC), we propose PhiNet, a speaker verification network with phonetic interpretability, designed to enhance both local and global interpretability by leveraging phonetic evidence in decision-making.

  • Local Interpretability: PhiNet provides detailed phonetic-level comparisons for each trial, revealing each phoneme's contribution to the verification decision, enabling manual inspection of speaker-specific features.
  • Global Interpretability: PhiNet ranks phonemes based on their distinctiveness for speaker identification, helping researchers understand potential system biases.
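To illustrate the idea (this is a conceptual sketch, not the paper's actual scoring code), the local view can be thought of as a per-phoneme similarity between two utterances, and the trial decision as an aggregation of those phoneme-level scores. A minimal sketch, assuming each utterance is represented by one embedding per phoneme:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def local_scores(enroll, test):
    # Per-phoneme similarity for one trial: {phoneme: score}.
    # `enroll` and `test` map phoneme labels to embedding vectors;
    # only phonemes present in both utterances are compared.
    shared = enroll.keys() & test.keys()
    return {p: cosine(enroll[p], test[p]) for p in shared}

def trial_score(enroll, test):
    # Trial-level decision: average of the phoneme-level scores.
    scores = local_scores(enroll, test)
    return sum(scores.values()) / len(scores)
```

Ranking phonemes by their average contribution across many trials would then give the global view described above.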

Key Features

  • First self-interpretable speaker verification network that explains its decision-making process
  • Dual interpretability through phoneme distinctiveness (local trial-level + global pattern-level)
  • Training scheme that simulates the verification process, ensuring consistency between training and inference
  • Achieves performance comparable to black-box ASV models (e.g., ECAPA-TDNN) while providing meaningful explanations

Installation

Requirements

  • Python 3.8+
  • PyTorch 1.10+
  • Other dependencies are the same as those of [WeSpeaker]
git clone https://github.com/mmmmayi/PhiNet.git
cd PhiNet
pip install -r requirements.txt

Data Preparation

PhiNet is trained and evaluated on the following datasets:

| Dataset     | Usage           | Description                              |
|-------------|-----------------|------------------------------------------|
| VoxCeleb1   | Training / Test | Celebrity speech from YouTube interviews |
| VoxCeleb2   | Training        | Extended celebrity speech dataset        |
| SITW        | Test            | Speakers in the Wild                     |
| LibriSpeech | Test            | Read English speech from audiobooks      |
| MUSAN       | Augmentation    | Music, speech, and noise corpus          |
| RIR Noises  | Augmentation    | Room impulse responses                   |

Step-by-step Training

Stage 1: Prepare VoxCeleb Data

Run stage 1 in examples/voxceleb/v2/run.sh. Modify the VoxCeleb2 data path in local/prepare_data.sh (stage 4) to your local directory:

# Edit local/prepare_data.sh, change the VoxCeleb2 path to yours
bash examples/voxceleb/v2/run.sh --stage 1

Stage 2: Extract Features

Run stage 2 with raw data type:

bash examples/voxceleb/v2/run.sh --stage 2 --data_type raw

Stage 3: Prepare Augmentation Data

Generate file lists for RIRS_NOISES and MUSAN datasets. Each file should list the paths to all audio samples (one per line). Refer to the existing rirs_list and musan_list files in the data directory for the format.
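Such a list can be generated with a short script; the corpus and output paths below are placeholders for your local setup:

```python
from pathlib import Path

def write_wav_list(corpus_dir, list_path):
    # Recursively collect all .wav files under `corpus_dir` and
    # write their absolute paths, one per line, to `list_path`.
    wavs = sorted(str(p.resolve()) for p in Path(corpus_dir).rglob("*.wav"))
    Path(list_path).write_text("\n".join(wavs) + "\n")
    return len(wavs)

# Example (adjust paths to your setup):
# write_wav_list("/data/musan", "data/musan_list")
# write_wav_list("/data/RIRS_NOISES", "data/rirs_list")
```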

Stage 4: Configure Training Parameters

Modify stage 3 parameters in examples/voxceleb/v2/run.sh:

  • --reverb_data: path to your RIR reverb data list
  • --noise_data: path to your MUSAN noise data list
  • --pho_path: path to phoneme alignment files (no change needed if using default)

Stage 5: Configure Phoneme Path

Update the phoneme file path at line 231 in wespeaker/dataset/processor.py to point to each sample's phoneme alignment file.
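The exact expression at that line depends on how your alignment files are laid out. As a purely hypothetical example, if the alignments mirror the VoxCeleb directory structure, the edit might construct the path like this (the root directory, key format, and `.txt` extension are all assumptions, not the repository's actual convention):

```python
import os

# Hypothetical: root directory holding one alignment file per utterance.
PHONEME_ROOT = "/path/to/phoneme_alignments"

def phoneme_file(utt_key):
    # Map an utterance key such as "id10001/1zcIwhmdeo4/00001"
    # to its phoneme alignment file under PHONEME_ROOT.
    return os.path.join(PHONEME_ROOT, utt_key + ".txt")
```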

Stage 6: Start Training

bash examples/voxceleb/v2/run.sh --stage 3

Citation

If you find this work useful, please cite:

@article{ma2025phinet,
  title={PhiNet: Speaker Verification with Phonetic Interpretability},
  author={Ma, Yi and Wang, Shuai and Liu, Tianchi and Li, Haizhou},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2025}
}

Related Work

  • ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification (our prior work)
  • WeSpeaker: The speaker embedding learning toolkit this project is built upon

Acknowledgements

This project builds upon and is inspired by the work of several open-source repositories. We extend our gratitude to the authors and contributors of the following projects:

  • charsiu
  • WeSpeaker
  • ECAPA-TDNN
  • voxceleb_trainer

Thanks to these authors for open-sourcing their code!
