Spoken StoryCloze Benchmark

This is the official repository of the Spoken StoryCloze benchmark, introduced by Hassid, Michael, et al., 2023, Textually Pretrained Speech Language Models.

Textual StoryCloze

The textual StoryCloze benchmark contains 4k five-sentence commonsense stories (split into validation and test sets). For each story, there is an additional negative sample composed of the first four sentences followed by a negative continuation (ending sentence). The goal is to distinguish the original fifth sentence from the negative one.

Spoken Benchmark

To generate the spoken benchmark, we synthesize the stories from the test set using a single-speaker TTS system provided by Wang, Changhan, et al., fairseq Sˆ2: A Scalable and Integrable Speech Synthesis Toolkit, EMNLP 2021, which comprises a FastSpeech2.0 model (Ren et al., 2020) and a HiFi-GAN vocoder (Kong et al., 2020).

We release two versions of the Spoken Story Cloze benchmark:

  • Topic Story Cloze (tStoryCloze).
  • Spoken Story Cloze (sStoryCloze).

For sStoryCloze, we follow the original StoryCloze negative samples. With this benchmark, researchers can evaluate a model’s capability to capture fine-grained causal and temporal commonsense relations.

For tStoryCloze, we randomly sample the negative ending sentence from the dataset. The premise behind tStoryCloze is to evaluate continuation coherence given a spoken prompt. This version is far easier for text-based language models, but quite challenging for speech-based language models. As with previous zero-shot speech metrics (e.g., sWUGGY), both speech segments are fed into the SpeechLM, and the probability of each spoken sentence is measured. We report the percentage of examples in which the probability of the positive sample is higher than that of the negative one.
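The scoring rule above can be sketched as follows. This is a minimal illustration, not the official evaluation code: `story_cloze_accuracy` is a hypothetical helper, and the log-probabilities would in practice come from a SpeechLM scoring each spoken continuation.

```python
def story_cloze_accuracy(pairs):
    """Zero-shot StoryCloze accuracy sketch.

    pairs: iterable of (positive_logprob, negative_logprob) tuples,
    one per story, where each log-probability would be produced by
    the SpeechLM over the corresponding spoken sentence.
    Returns the fraction of stories where the true continuation
    scores strictly higher than the negative one.
    """
    pairs = list(pairs)
    correct = sum(1 for pos, neg in pairs if pos > neg)
    return correct / len(pairs)

# Toy example with made-up log-probabilities for three stories.
print(story_cloze_accuracy([(-10.2, -12.5), (-8.1, -7.9), (-9.0, -11.3)]))
```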

Download

You can download both benchmarks using the following links: sStoryCloze, tStoryCloze.

Evaluation

tStoryCloze

| Model         | Reference           | Accuracy |
|---------------|---------------------|----------|
| LLaMA-7B-text | Hassid, et al. 2023 | 98.3     |
| TWIST-1.3B    | Hassid, et al. 2023 | 61.3     |
| TWIST-7B      | Hassid, et al. 2023 | 64.4     |

sStoryCloze

TBD

Citation

If you use this benchmark and find it useful for your research, please cite our work using the following BibTeX entry:

@article{hassid2023textually,
  title={Textually Pretrained Speech Language Models},
  author={Hassid, Michael and Remez, Tal and Nguyen, Tu Anh and Gat, Itai and Conneau, Alexis and Kreuk, Felix and Copet, Jade and Defossez, Alexandre and Synnaeve, Gabriel and Dupoux, Emmanuel and others},
  journal={arXiv preprint arXiv:2305.13009},
  year={2023}
}
