DFKI OT Data

Speech database for the OT Turkish TTS voices in MaryTTS.

Data format

Audio

The audio data is provided in the losslessly compressed FLAC format, which can be played by many programs, including Praat. The speaker was recorded at a sampling rate of 44.1 kHz with 16 bits per sample, in mono. No filtering of any kind has been applied to this raw data.
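
To confirm that a downloaded FLAC file has the recording parameters listed above, you can read its STREAMINFO header directly. The following is a minimal sketch that uses only the Python standard library and follows the bit layout in the FLAC format specification. The function names are illustrative and are not part of this repository's tooling. SoX's `soxi` utility reports the same fields.

```python
import struct

# Recording parameters stated in this README.
EXPECTED = {"sample_rate": 44100, "channels": 1, "bits_per_sample": 16}


def parse_streaminfo(data: bytes) -> dict:
    """Decode the STREAMINFO block at the very start of a FLAC stream.

    Layout per the FLAC format specification: the 4-byte marker "fLaC",
    a 4-byte metadata block header, then the 34-byte STREAMINFO body.
    """
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    if (data[4] & 0x7F) != 0:  # block type 0 is STREAMINFO
        raise ValueError("first metadata block is not STREAMINFO")
    # data[8:18] holds min/max block size and min/max frame size (unused here).
    # The next 64 bits pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total number of samples (36 bits).
    (packed,) = struct.unpack(">Q", data[18:26])
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def read_streaminfo(path: str) -> dict:
    """Read just the first 42 bytes of a FLAC file and decode them."""
    with open(path, "rb") as f:
        return parse_streaminfo(f.read(42))


def matches_expected(info: dict) -> bool:
    """True if the format matches the documented 44.1 kHz / 16-bit / mono."""
    return all(info[key] == value for key, value in EXPECTED.items())
```

Dividing `total_samples` by `sample_rate` also gives the length of the recording in seconds.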

Phonetic segmentation

Annotations are provided as a single YAML file, dfki-ot-data.yaml. It contains a list of utterances, each of which consists of

- a prompt code (file basename),
- the utterance text,
- the recording date,
- the utterance start and end times (in seconds) within the FLAC file, and
- the phonetic segments (obtained using the eHMM tool from FestVox 2.1), each of which has
  - a label (based on SAMPA; _ denotes silence), and
  - its duration (in seconds).
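
The record structure above maps naturally onto a small data model. The Python sketch below is illustrative only, and the class and method names are hypothetical. Its `boundaries` method converts per-segment durations into onset and offset times. This conversion assumes that the segments are contiguous and that the first one begins at the utterance's `start` time. This README does not state that assumption, so verify it against the audio.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    lab: str    # SAMPA-based label; "_" denotes silence
    dur: float  # duration in seconds


@dataclass
class Utterance:
    prompt: str   # prompt code (file basename)
    text: str     # utterance text
    date: str     # recording date, kept as the ISO 8601 string from the YAML
    start: float  # start time in the FLAC file, in seconds
    end: float    # end time in the FLAC file, in seconds
    segments: List[Segment] = field(default_factory=list)

    def boundaries(self, absolute: bool = False) -> List[Tuple[str, float, float]]:
        """Return (label, onset, offset) for each segment by summing durations.

        ASSUMPTION: segments are contiguous and the first one begins at
        `start`. Times are relative to `start` unless absolute=True, in which
        case they are positions in the FLAC file.
        """
        t = self.start if absolute else 0.0
        result = []
        for seg in self.segments:
            result.append((seg.lab, t, t + seg.dur))
            t += seg.dur
        return result

    def phones(self) -> List[Segment]:
        """All segments except silences."""
        return [seg for seg in self.segments if seg.lab != "_"]
```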

For example,

- prompt: ot0798
  text: Resim ticaretine başladı.
  date: 2009-07-10T07:32:17Z
  start: 4553.4053741500
  end: 4556.6553741500
  segments:
  - { lab: _, dur: 0.324969 }
  - { lab: r, dur: 0.1 }
  - { lab: e, dur: 0.1 }
  - { lab: s, dur: 0.135 }
  - { lab: I, dur: 0.035 }
  - { lab: m, dur: 0.08 }
  - { lab: t, dur: 0.095 }
  - { lab: I, dur: 0.05 }
  - { lab: dZ, dur: 0.115 }
  - { lab: a, dur: 0.11 }
  - { lab: r, dur: 0.025 }
  - { lab: e, dur: 0.065 }
  - { lab: t, dur: 0.125 }
  - { lab: I, dur: 0.025 }
  - { lab: n, dur: 0.035 }
  - { lab: e, dur: 0.07 }
  - { lab: b, dur: 0.065 }
  - { lab: a, dur: 0.12 }
  - { lab: S, dur: 0.18 }
  - { lab: l, dur: 0.015 }
  - { lab: a, dur: 0.06 }
  - { lab: d, dur: 0.07 }
  - { lab: '@', dur: 0.17 }
  - { lab: _, dur: 0.595 }
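
The simplest way to load the whole file is with a YAML library such as PyYAML (`yaml.safe_load`). If you would rather avoid the dependency, you can extract segment lines in the layout shown above with a regular expression. The following sketch assumes that every segment is written on a single line in exactly this flow style. It is not a general YAML parser, and it does not track which utterance each segment belongs to. Applied to the example above, it shows that the durations sum to about 2.765 s, whereas `end - start` is 3.25 s. The segments therefore do not necessarily cover the entire utterance interval.

```python
import re
from typing import Iterable, List, Tuple

# One flow-style segment entry per line, e.g.   - { lab: dZ, dur: 0.115 }
# Labels that are special in YAML (such as @) appear quoted: '@'.
SEGMENT_RE = re.compile(
    r"""^\s*-\s*\{\s*lab:\s*(['"]?)(.*?)\1\s*,\s*dur:\s*([0-9.eE+-]+)\s*\}\s*$"""
)


def parse_segments(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Collect (label, duration) pairs; lines that are not segments are skipped."""
    segments = []
    for line in lines:
        match = SEGMENT_RE.match(line)
        if match:
            segments.append((match.group(2), float(match.group(3))))
    return segments


def total_duration(segments: List[Tuple[str, float]]) -> float:
    """Sum of all segment durations, in seconds."""
    return sum(dur for _, dur in segments)
```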

Extracting the data

Prerequisites

Java 8 (or later) and SoX must be installed.
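
The following minimal sketch checks these prerequisites before you run Gradle. It looks for the `java` and `sox` executables on PATH and parses the major version from `java -version`, which writes its output to stderr. The helper names are hypothetical. The version parsing accounts for two numbering schemes: Java 8 and earlier report versions such as "1.8.0_292", while later releases report versions such as "17.0.2".

```python
import re
import shutil
import subprocess
from typing import List


def missing_tools(tools: List[str]) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def java_major_version(version_output: str) -> int:
    """Parse the major version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("unrecognized `java -version` output")
    numbers = re.findall(r"\d+", match.group(1))
    major = int(numbers[0])
    if major == 1 and len(numbers) > 1:
        major = int(numbers[1])  # legacy "1.x" scheme: 1.8 means Java 8
    return major


def installed_java_major() -> int:
    """Run `java -version` (which writes to stderr) and parse the result."""
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(proc.stderr)
```

On a correctly prepared machine, `missing_tools(["java", "sox"])` should return an empty list and `installed_java_major()` should return a value of at least 8.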

Assembling the data

The data processing is delegated to Gradle and the FLAML plugin.

To download and extract all data, run

./gradlew downloadAudio extractTextFiles extractLabFiles extractWavFiles

See the FLAML plugin documentation for details.
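
The Gradle tasks perform extraction automatically. To cut out a single utterance by hand, for example to inspect it in Praat, you can pass the `start` and `end` values from the YAML file to SoX's `trim` effect. The following sketch converts these values to sample offsets at 44.1 kHz. The input file name `ot.flac` is hypothetical, and this is not necessarily how `extractWavFiles` works internally.

```python
import shlex
from typing import List

SAMPLE_RATE = 44100  # Hz, per the Audio section of this README


def sox_trim_command(flac_path: str, wav_path: str, start: float, end: float) -> List[str]:
    """Build a SoX command that copies [start, end) seconds of flac_path to wav_path.

    Times are converted to sample counts; SoX reads a number with an "s"
    suffix as samples, so the cut lands on exact sample boundaries.
    `trim A B` skips to A and then copies B (a length, not an end position).
    """
    first = round(start * SAMPLE_RATE)
    length = round(end * SAMPLE_RATE) - first
    return ["sox", flac_path, wav_path, "trim", f"{first}s", f"{length}s"]


def as_shell(cmd: List[str]) -> str:
    """Quote the argument list for pasting into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

For the example utterance ot0798, the result is `sox ot.flac ot0798.wav trim 200805177s 143325s`, which corresponds to 3.25 s of audio.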

To prepare the data for distribution, run

./gradlew assemble

Copyright and license

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
