
Domain-Aware Data Selection for Speech Classification via Meta-Reweighting (Interspeech'24)


snudatalab/DoReMe


This is the code repository for the paper "Domain-Aware Data Selection for Speech Classification via Meta-Reweighting" (Interspeech 2024). It includes the implementation of DoReMe (Domain-AwaRe Data Selection for Speech Classification via Meta-Reweighting), our novel approach for speech classification on a target domain.

Abstract

Given speech data from diverse domains, how can we train an accurate classifier for a specific target domain by utilizing the other source domains? This problem commonly arises in real-world scenarios, such as identifying the intents of utterances from individuals with a specific speech disorder using utterances from individuals with other disorders. However, previous data selection methods for utilizing source instances face two main challenges: they cannot account for the diversity of source domains, and their hard selection schemes may discard helpful source instances when the available information about the target domain is insufficient. In this work, we propose DoReMe, a domain-aware data selection method for accurate speech classification on a target domain. The key idea is to softly select source instances by dynamically assigning importance scores to each instance based on two similarities: instance-scores and domain-scores. Extensive experiments show that DoReMe achieves the best classification performance among the compared methods.
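The contrast between soft and hard selection can be illustrated with a minimal, pure-Python sketch. It is not the DoReMe implementation: it assumes each instance is already represented by a fixed feature vector, defines the instance-score as the cosine similarity between a source instance and the target-domain centroid, and defines the domain-score as the cosine similarity between a source domain's centroid and the target centroid. Summing the two scores and normalizing them with a temperature-scaled softmax is an illustrative assumption, as are the hypothetical names `soft_weights` and `weighted_loss`. DoReMe assigns its scores dynamically through meta-reweighting during training, which this static sketch does not reproduce.

```python
import math
from collections import defaultdict

# Hypothetical illustration of similarity-based soft weighting; not the DoReMe code.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def soft_weights(source_feats, source_domains, target_feats, temperature=1.0):
    """Return one positive weight per source instance; the weights sum to 1."""
    target_centroid = mean(target_feats)

    # Instance-score (assumed form): similarity of each source instance to the target.
    inst = [cosine(x, target_centroid) for x in source_feats]

    # Domain-score (assumed form): similarity of each source domain's centroid to the target.
    by_domain = defaultdict(list)
    for x, d in zip(source_feats, source_domains):
        by_domain[d].append(x)
    dom = {d: cosine(mean(xs), target_centroid) for d, xs in by_domain.items()}

    # Combine the two scores and apply a numerically stable softmax.
    logits = [(s + dom[d]) / temperature for s, d in zip(inst, source_domains)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(per_instance_losses, weights):
    """Weighted sum of per-instance losses, so every source instance still contributes."""
    return sum(w * l for w, l in zip(weights, per_instance_losses))


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    domains = ["a", "b", "a"]
    target = [[1.0, 0.0]]
    print(soft_weights(feats, domains, target))
```

Because the softmax gives every source instance a strictly positive weight, instances that look less similar to the target are down-weighted rather than discarded, which is the behavior a hard selection scheme cannot offer.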

Requirements

We recommend the following package versions:

  • pytorch==1.13.1
  • tqdm==4.66.2
  • pandas==2.0.3
  • torcheval==0.0.7
  • scikit-learn==1.3.2
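A minimal sketch, assuming Python 3.8+ for `importlib.metadata`, that compares the installed packages against the recommended versions listed above. PyTorch is distributed on PyPI under the name `torch`, so that is the key checked here. The helper `check_versions` is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Recommended versions from the Requirements list (PyTorch's PyPI name is "torch").
RECOMMENDED = {
    "torch": "1.13.1",
    "tqdm": "4.66.2",
    "pandas": "2.0.3",
    "torcheval": "0.0.7",
    "scikit-learn": "1.3.2",
}


def check_versions(recommended=RECOMMENDED):
    """Return {package: (recommended, installed or None)} for every mismatch."""
    mismatches = {}
    for pkg, wanted in recommended.items():
        try:
            # Drop local build tags such as "+cu117" before comparing.
            installed = version(pkg).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[pkg] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for pkg, (wanted, installed) in check_versions().items():
        print(f"{pkg}: recommended {wanted}, found {installed or 'not installed'}")
```

An empty result means every listed package matches its recommended version.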

Data Overview

We use two datasets, which can be downloaded from their official repositories:

  • Skit-S2I: https://github.com/skit-ai/speech-to-intent-dataset
  • ITALIC: https://github.com/RiTA-nlp/ITALIC/

How to Run

You can run the demo script from the repository directory with the following command:

python main.py

Reference

If you use this code, please cite the following paper.

@inproceedings{Kim24DoReMe,
  author    = {Junghun Kim and Ka Hyun Park and Hoyoung Yoon and U Kang},
  title     = {Domain-Aware Data Selection for Speech Classification via Meta-Reweighting},
  booktitle = {Interspeech},
  year      = {2024}
}
