Spoken StoryCloze Benchmark

This is the official repository of the Spoken StoryCloze benchmark, introduced by Hassid, Michael, et al., 2023, Textually Pretrained Speech Language Models.

Textual StoryCloze

The textual StoryCloze benchmark contains 4k five-sentence commonsense stories (split into validation and test sets). For each story, there is an additional negative sample composed of the first four sentences followed by a negative continuation (ending sentence). The goal is to distinguish the original fifth sentence from the negative one.

Spoken Benchmark

To generate the spoken benchmark, we synthesize the stories from the test set using a single-speaker TTS system provided by Wang, Changhan, et al., fairseq Sˆ2: A Scalable and Integrable Speech Synthesis Toolkit, EMNLP 2021, which comprises a FastSpeech2.0 model (Ren et al., 2020) and a HiFi-GAN vocoder (Kong et al., 2020).

We release two versions of the Spoken Story Cloze benchmark:

  • Topic Story Cloze (tStoryCloze).
  • Spoken Story Cloze (sStoryCloze).

For sStoryCloze, we follow the original StoryCloze negative samples. With this benchmark, researchers can evaluate a model’s capability to capture fine-grained causal and temporal commonsense relations.

For tStoryCloze, we randomly sample the negative ending sentence from the dataset. The premise behind tStoryCloze is to evaluate continuation coherence given a spoken prompt. This version is far easier for text-based language models, but quite challenging for speech-based language models. As with previous zero-shot speech metrics (e.g., sWUGGY), both speech segments are fed into the SpeechLM, and the probability of each spoken sentence is measured. We report the percentage of examples in which the probability of the positive sample is higher than that of the negative one.
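The scoring rule above can be sketched as follows. This is a minimal illustration, not the official evaluation code: `story_cloze_accuracy` is a hypothetical helper, and the log-probabilities would in practice come from a SpeechLM scoring each spoken continuation.

```python
def story_cloze_accuracy(pairs):
    """Zero-shot StoryCloze accuracy sketch.

    pairs: iterable of (positive_logprob, negative_logprob) tuples,
    one per story, where each log-probability would be produced by
    the SpeechLM over the corresponding spoken sentence.
    Returns the fraction of stories where the true continuation
    scores strictly higher than the negative one.
    """
    pairs = list(pairs)
    correct = sum(1 for pos, neg in pairs if pos > neg)
    return correct / len(pairs)

# Toy example with made-up log-probabilities for three stories.
print(story_cloze_accuracy([(-10.2, -12.5), (-8.1, -7.9), (-9.0, -11.3)]))
```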

Download

You can download both benchmarks using the following links: sStoryCloze, tStoryCloze.

Evaluation

tStoryCloze

| Model         | Reference           | Accuracy |
|---------------|---------------------|----------|
| LLaMA-7B-text | Hassid, et al. 2023 | 98.3     |
| TWIST-1.3B    | Hassid, et al. 2023 | 61.3     |
| TWIST-7B      | Hassid, et al. 2023 | 64.4     |

sStoryCloze

TBD

Citation

If you use this benchmark and find it useful for your research, please cite our work using the following BibTeX entry:

@article{hassid2023textually,
  title={Textually Pretrained Speech Language Models},
  author={Hassid, Michael and Remez, Tal and Nguyen, Tu Anh and Gat, Itai and Conneau, Alexis and Kreuk, Felix and Copet, Jade and Defossez, Alexandre and Synnaeve, Gabriel and Dupoux, Emmanuel and others},
  journal={arXiv preprint arXiv:2305.13009},
  year={2023}
}
