Domain-Aware Data Selection for Speech Classification via Meta-Reweighting

This is the code repository for Domain-Aware Data Selection for Speech Classification via Meta-Reweighting. It includes the implementation of DoReMe (Domain-AwaRe Data Selection for Speech Classification via Meta-Reweighting), our novel approach to speech classification on a target domain.

Abstract

Given speech data from diverse domains, how can we train an accurate classifier for a specific target domain by utilizing the other source domains? The problem commonly arises in real-world scenarios, such as identifying the intents of utterances from individuals with a specific speech disorder using utterances from speakers with other disorders. However, previous data selection methods for utilizing the source instances encounter two main challenges: they cannot account for the diversity of source domains, and their hard selection schemes may ignore helpful source instances if the given information about the target domain is insufficient. In this work, we propose DoReMe, a domain-aware data selection method for accurate speech classification on a target domain. The key idea is to softly select source instances by dynamically assigning an importance score to each instance based on two similarities: instance-scores and domain-scores. Extensive experiments show that DoReMe achieves the best classification performance.
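To make the difference between soft and hard selection concrete, here is a minimal sketch in plain Python. It is not the implementation in DOREME.py: the function names, the choice to combine the two scores by multiplication, and the example numbers are all assumptions made for illustration. The abstract only states that the importance of an instance depends on an instance-score and a domain-score.

```python
# Illustrative sketch only; all names and the product rule are assumptions,
# not the method implemented in this repository.

def soft_weights(instance_scores, domain_scores, domains):
    """Give every source instance a normalized, nonzero-capable weight.

    instance_scores: one score per source instance
    domain_scores:   dict mapping a domain name to its score
    domains:         domain name of each source instance
    """
    # Assumed combination rule: multiply the two scores.
    raw = [s * domain_scores[d] for s, d in zip(instance_scores, domains)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return [r / total for r in raw]


def hard_weights(instance_scores, threshold):
    """Hard selection: keep an instance only if its score passes a threshold."""
    kept = [1.0 if s >= threshold else 0.0 for s in instance_scores]
    total = sum(kept)
    return [k / total for k in kept] if total else kept


def weighted_loss(losses, weights):
    """Weighted sum of per-instance losses."""
    return sum(w * l for w, l in zip(weights, losses))


scores = [0.9, 0.4, 0.2]
domains = ["a", "a", "b"]
domain_scores = {"a": 0.5, "b": 1.0}
losses = [1.0, 2.0, 3.0]

soft = soft_weights(scores, domain_scores, domains)
hard = hard_weights(scores, threshold=0.5)
print("soft:", soft, weighted_loss(losses, soft))
print("hard:", hard, weighted_loss(losses, hard))
```

With hard selection, the second and third instances contribute nothing to the loss. With soft weights they still contribute, and the third instance is lifted by its higher domain score even though its instance score is the lowest.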

Requirements

We recommend using the following versions of packages:

pytorch==1.13.1
tqdm==4.66.2
pandas==2.0.3
torcheval==0.0.7
scikit-learn==1.3.2
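
If you want to confirm that your environment matches these versions, a small check such as the following may help. It is a sketch rather than part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes PyTorch was installed from PyPI, where the distribution is named `torch` rather than `pytorch`.

```python
from importlib import metadata

# Recommended versions from the list above. "torch" is the PyPI distribution
# name for PyTorch; this mapping is an assumption about how it was installed.
RECOMMENDED = {
    "torch": "1.13.1",
    "tqdm": "4.66.2",
    "pandas": "2.0.3",
    "torcheval": "0.0.7",
    "scikit-learn": "1.3.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(recommended, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in recommended.items():
        found = lookup(name)
        # A local build tag such as "1.13.1+cu117" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(RECOMMENDED).items():
        print(f"{name}: recommended {wanted}, found {found or 'not installed'}")
```

The script prints nothing when every package matches the recommended version.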

Data Overview

We use two datasets. Download the datasets from the official links.

Dataset	Link
Skit-S2I	`https://github.com/skit-ai/speech-to-intent-dataset`
ITALIC	`https://github.com/RiTA-nlp/ITALIC/`

How to Run

Run the demo script from the repository's root directory with the following command.

python main.py

Reference

If you use this code, please cite the following paper.

@inproceedings{Kim24DoReMe,
  author    = {Junghun Kim and Ka Hyun Park and Hoyoung Yoon and U Kang},
  title     = {Domain-Aware Data Selection for Speech Classification via Meta-Reweighting},
  booktitle = {Interspeech},
  year      = {2024}
}
