PAVOQUE Corpus of Expressive Speech

Corpus design

A single-speaker, multi-style corpus of German speech, with a large neutral subset and four subsets that act out different expressive speaking styles. The styles are named for virtual characters in the SEMAINE and IDEAS4GAMES projects (quoting the original director's instructions):

Poppy ist fröhlich, optimistisch und sieht das Gute an allen Dingen! (Poppy is cheerful and optimistic.)
Obadiah ist von Natur aus niedergeschlagen und blickt pessimistisch in die Zukunft... (Obadiah is gloomy and pessimistic.)
Spike ist aggressiv und geht keinem Streit aus dem Weg! (Spike is aggressive and confrontational.)
Max ist ein ausgekochter Pokerspieler. Er ist cool, ihn bringt nichts aus der Ruhe. (Max is a hard-boiled poker player. He is cool and laid-back.)

The speaker is Stefan Röttig, a male native speaker of German trained as a professional actor and baritone opera singer.

Data format

Audio

The audio data is provided in the losslessly compressed FLAC format, which can be played by a wide range of software, including Praat. It is sampled at a rate of 44.1 kHz, with 16 bits per sample, in mono. No filters of any sort have been applied to this raw data; high-pass filtering at 50 Hz is recommended.
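Because the audio is 44.1 kHz, 16-bit mono, a time in seconds maps directly to a sample index and to a byte offset in the decoded PCM stream. The following is a minimal sketch of that arithmetic. The function names are illustrative and not part of the corpus tooling, and rounding to the nearest sample is an assumption.

```python
SAMPLE_RATE_HZ = 44_100   # sampling rate stated for the corpus audio
BYTES_PER_SAMPLE = 2      # 16 bits per sample, one channel (mono)


def seconds_to_sample(t_seconds: float) -> int:
    """Return the index of the sample nearest to the given time."""
    return round(t_seconds * SAMPLE_RATE_HZ)


def seconds_to_byte_offset(t_seconds: float) -> int:
    """Return the byte offset of that sample in decoded 16-bit mono PCM."""
    return seconds_to_sample(t_seconds) * BYTES_PER_SAMPLE


# An utterance spanning 27.0 s to 28.92 s in the recording
first = seconds_to_sample(27.0)
last = seconds_to_sample(28.92)
print(first, last, last - first)
```

These offsets apply to the decoded audio only; the FLAC file itself is compressed, so byte positions in it do not correspond to time in this simple way.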

Phonetic segmentation

Annotations are provided as one YAML file per style. Each file is a list of utterances, each of which contains:

- a prompt code (file basename),
- the utterance text,
- the speaking style,
- utterance start and end times (in seconds) in the FLAC file, and
- optionally, the (manually corrected) phonetic segments, each of which has
  - a label (based on SAMPA, _ denotes silence), and
  - its end time (in seconds), relative to that utterance's start time.

For example,

- prompt: spike0008
  text: Ach ja?
  style: angry
  start: 27.0
  end: 28.92
  segments:
  - {lab: H#, end: 0.280902}
  - {lab: '?', end: 0.324898}
  - {lab: a, end: 0.408238}
  - {lab: x, end: 0.475}
  - {lab: j, end: 0.61}
  - {lab: 'a:', end: 0.963273}
  - {lab: _, end: 1.915}
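Since each segment stores only its end time, relative to the utterance start, a segment's start can be taken as the end of the previous segment. The sketch below turns the example above into absolute (label, start, end) intervals in the FLAC file. It assumes the first segment begins at the utterance start, and that the utterance has already been loaded into a Python dict (for instance with a YAML parser such as PyYAML's `yaml.safe_load`). The function name is hypothetical.

```python
def absolute_intervals(utterance):
    """Yield (label, start, end) per segment, in seconds from the start of the FLAC file."""
    offset = utterance["start"]
    previous_end = 0.0  # assumption: the first segment begins at the utterance start
    for segment in utterance.get("segments", []):  # segments are optional
        yield (segment["lab"], offset + previous_end, offset + segment["end"])
        previous_end = segment["end"]


# The example utterance, written out as the dict a YAML parser would produce
utterance = {
    "prompt": "spike0008",
    "text": "Ach ja?",
    "style": "angry",
    "start": 27.0,
    "end": 28.92,
    "segments": [
        {"lab": "H#", "end": 0.280902},
        {"lab": "?", "end": 0.324898},
        {"lab": "a", "end": 0.408238},
        {"lab": "x", "end": 0.475},
        {"lab": "j", "end": 0.61},
        {"lab": "a:", "end": 0.963273},
        {"lab": "_", "end": 1.915},
    ],
}

for label, start, end in absolute_intervals(utterance):
    print(f"{label}\t{start:.3f}\t{end:.3f}")
```

Utterances without a `segments` key simply yield no intervals.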

Downloading the data

Use the links on the releases page, or run the download task (see below).

Converting the data

For convenience, the utterances for each subset can be extracted from the YAML and FLAC files by running Gradle tasks. After cloning this repository, or downloading and unpacking it, run ./gradlew tasks (or gradlew tasks on Windows) for details.

Prerequisites

You will need Java to run the Gradle tasks. Extracting the utterances to WAV files also requires sox to be installed.
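To cut out a single utterance by hand, the sketch below builds a sox command line from an utterance's start and end times, using sox's `trim` effect (a start position followed by a length). It is an illustration only and not necessarily how the Gradle tasks invoke sox. The function name and the file names are hypothetical, and it assumes a sox build with FLAC support.

```python
def sox_trim_command(flac_path, wav_path, start, end):
    """Build a sox argument list that extracts [start, end) seconds to a WAV file."""
    if end <= start:
        raise ValueError("end must be greater than start")
    duration = end - start
    # sox INPUT OUTPUT trim START LENGTH
    return ["sox", flac_path, wav_path, "trim", f"{start:.6f}", f"{duration:.6f}"]


# Hypothetical file names; start and end taken from an utterance's annotation
command = sox_trim_command("recording.flac", "spike0008.wav", 27.0, 28.92)
print(" ".join(command))

# To actually run it (requires sox on the PATH and the FLAC file to exist):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.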

Copyright and license

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact

If you encounter problems, please open a new issue.
