s2i-prosody

This repository is the official implementation of the paper Improving End-to-End SLU performance with Prosodic Attention and Distillation (accepted at Interspeech 2023).

Abstract: Most End-to-End SLU methods depend on the pretrained ASR or language model features for intent prediction. However, other essential information in speech, such as prosody, is often ignored. Recent research has shown improved results in classifying dialogue acts by incorporating prosodic information. The margins of improvement in these methods are minimal as the neural models ignore prosodic features. In this work, we propose prosody-attention, which uses the prosodic features differently to generate attention maps across time frames of the utterance. Then we propose prosody-distillation to explicitly learn the prosodic information in the acoustic encoder rather than concatenating the implicit prosodic features. Both the proposed methods improve the baseline results, and the prosody-distillation method gives an intent classification accuracy improvement of 8% and 2% on SLURP and STOP datasets over the prosody baseline.

Installation

git clone https://github.com/skit-ai/s2i-prosody
cd s2i-prosody

conda create -n env_name --file requirements.txt
conda activate env_name

To extract the prosodic features and save them as .pt files, download the original dataset, set its path in the corresponding script, and then run the script.

# SLURP Dataset
python data_preparation/slurp.py

# STOP Dataset
python data_preparation/stop.py
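
This README does not list which prosodic features the scripts compute. As a rough illustration only, the sketch below derives two common frame-level prosodic cues, RMS energy and an autocorrelation-based pitch estimate, in plain Python. The function names, frame sizes, and pitch search range are assumptions made for this example and are not taken from data_preparation/.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split a 1-D list of samples into overlapping frames (no padding)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def rms_energy(frame):
    """Root-mean-square energy of one frame, a simple loudness cue."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate F0 in Hz from the lag with the highest autocorrelation.

    Returns 0.0 when no lag has a positive score (e.g. a silent frame).
    The fmin/fmax search range is an assumption for this sketch.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    # Synthetic 200 Hz tone sampled at 8 kHz, standing in for real speech.
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
    frames = frame_signal(tone, frame_len=400, hop_len=160)
    features = [(rms_energy(f), autocorr_pitch(f, sr)) for f in frames[:3]]
    print(features)
```

The data preparation scripts remain the reference for the actual feature set and for how the features are written to .pt files.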

Training and Evaluation

Set the dataset, training method, and training hyperparameters in train.py.

Datasets:

SLURP
STOP

Training Methods:

Baseline Whisper
Baseline Prosody Concat
Prosody-Attention
Prosody-Distillation
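
To give a feel for the two proposed methods, here is a minimal plain-Python sketch of the general ideas described in the abstract: attention weights over time frames derived from prosodic features, and an auxiliary loss that pushes the acoustic encoder to predict prosody. It is not the code in model.py. The function names, the single projection vector `proj`, and the choice of mean squared error for the distillation term are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prosody_attention_pool(acoustic, prosody, proj):
    """Pool T acoustic frames using attention weights computed from prosody.

    acoustic: T x D frame embeddings (e.g. from a speech encoder)
    prosody:  T x P frame-level prosodic features
    proj:     length-P vector (hypothetical; would be learned in practice)
    """
    # One scalar score per time frame, computed from prosody only.
    scores = [sum(w * p for w, p in zip(proj, frame)) for frame in prosody]
    attn = softmax(scores)  # one weight per time frame, sums to 1
    dim = len(acoustic[0])
    # Attention-weighted sum of the acoustic frames.
    pooled = [sum(a * frame[d] for a, frame in zip(attn, acoustic))
              for d in range(dim)]
    return attn, pooled


def distillation_loss(predicted_prosody, target_prosody):
    """Mean squared error between prosody predicted from the encoder and the
    extracted prosodic features (MSE is an assumption for this sketch)."""
    total, count = 0.0, 0
    for pred, target in zip(predicted_prosody, target_prosody):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    acoustic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T=3 frames, D=2
    prosody = [[0.1], [2.0], [0.5]]                  # T=3 frames, P=1
    attn, pooled = prosody_attention_pool(acoustic, prosody, proj=[1.0])
    print(attn, pooled)
```

In a setup like this, the distillation term would typically be added to the intent classification loss with a weighting factor, so the prosodic features are needed only during training.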

train.py also reports the evaluation metrics once training finishes.

python train.py

Visualization of Attention Map

Code for visualizing the attention maps of the pretrained models is given in visualize_attention.ipynb.

Results

Citation

@misc{rajaa2023improving,
      title={Improving End-to-End SLU performance with Prosodic Attention and Distillation}, 
      author={Shangeth Rajaa},
      year={2023},
      eprint={2305.08067},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
