Add Chinese TTS dataset baker. #1304

Merged: 1 commit merged into lhotse-speech:master on Apr 23, 2024.


csukuangfj (Contributor):

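Example labelling file. Each utterance takes two lines. The first line holds the utterance ID, a tab, and the Chinese text with prosodic boundary markers (#1 to #4). The second line is tab-indented and holds the tone-numbered pinyin.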

000001	卡尔普#2陪外孙#1玩滑梯#4。
	ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1
000002	假语村言#2别再#1拥抱我#4。
	jia2 yu3 cun1 yan2 bie2 zai4 yong1 bao4 wo3
000003	宝马#1配挂#1跛骡鞍#3,貂蝉#1怨枕#2董翁榻#4。
	bao2 ma3 pei4 gua4 bo3 luo2 an1 diao1 chan2 yuan4 zhen3 dong3 weng1 ta4
000004	邓小平#2与#1撒切尔#2会晤#4。
	deng4 xiao3 ping2 yu3 sa4 qie4 er3 hui4 wu4
000005	老虎#1幼崽#2与#1宠物犬#1玩耍#4。
	lao2 hu3 you4 zai3 yu2 chong3 wu4 quan3 wan2 shua3
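A minimal Python sketch of how this two-line format could be parsed. It assumes UTF-8 input and that every entry is exactly an "<id>\t<text>" line followed by a tab-indented pinyin line. The helper name `parse_labels` is hypothetical and does not come from the PR. Stripping the `#1` to `#4` markers reproduces the `normalized_text` values in the supervision manifest below.

```python
import re
from typing import Dict, List

# Prosodic boundary markers such as "#1" ... "#4" embedded in the text.
PROSODY_MARK = re.compile(r"#\d")


def parse_labels(lines: List[str]) -> List[Dict[str, str]]:
    """Parse Baker-style labelling lines into one dict per utterance.

    Hypothetical helper: assumes each utterance is an "<id>\\t<text>"
    line followed by a "\\t<pinyin>" line.
    """
    entries = []
    # Drop blank lines, then walk the rest two lines at a time.
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for text_line, pinyin_line in zip(rows[0::2], rows[1::2]):
        utt_id, text = text_line.split("\t", maxsplit=1)
        text = text.strip()
        entries.append(
            {
                "id": utt_id.strip(),
                "text": text,
                "pinyin": pinyin_line.strip(),
                "normalized_text": PROSODY_MARK.sub("", text),
            }
        )
    return entries


sample = [
    "000001\t卡尔普#2陪外孙#1玩滑梯#4。\n",
    "\tka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1\n",
    "000003\t宝马#1配挂#1跛骡鞍#3,貂蝉#1怨枕#2董翁榻#4。\n",
    "\tbao2 ma3 pei4 gua4 bo3 luo2 an1 diao1 chan2 yuan4 zhen3 dong3 weng1 ta4\n",
]
for entry in parse_labels(sample):
    print(entry["id"], entry["normalized_text"], entry["pinyin"])
```

Pairing lines with slices keeps the sketch short. A production parser would likely also check that every ID line has a matching pinyin line.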

Example baker_zh_supervisions_all.jsonl

{"id": "000001", "recording_id": "000001", "start": 0.0, "duration": 2.66, "channel": 0, "text": "卡尔普#2陪外孙#1玩滑梯#4。", "language": "Chinese", "gender": "female", "custom": {"pinyin": "ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1", "normalized_text": "卡尔普陪外孙玩滑梯。"}}
{"id": "000002", "recording_id": "000002", "start": 0.0, "duration": 2.86, "channel": 0, "text": "假语村言#2别再#1拥抱我#4。", "language": "Chinese", "gender": "female", "custom": {"pinyin": "jia2 yu3 cun1 yan2 bie2 zai4 yong1 bao4 wo3", "normalized_text": "假语村言别再拥抱我。"}}
{"id": "000003", "recording_id": "000003", "start": 0.0, "duration": 4.4, "channel": 0, "text": "宝马#1配挂#1跛骡鞍#3,貂蝉#1怨枕#2董翁榻#4。", "language": "Chinese", "gender": "female", "custom": {"pinyin": "bao2 ma3 pei4 gua4 bo3 luo2 an1 diao1 chan2 yuan4 zhen3 dong3 weng1 ta4", "normalized_text": "宝马配挂跛骡鞍,貂蝉怨枕董翁榻。"}}
{"id": "000004", "recording_id": "000004", "start": 0.0, "duration": 2.6, "channel": 0, "text": "邓小平#2与#1撒切尔#2会晤#4。", "language": "Chinese", "gender": "female", "custom": {"pinyin": "deng4 xiao3 ping2 yu3 sa4 qie4 er3 hui4 wu4", "normalized_text": "邓小平与撒切尔会晤。"}}
{"id": "000005", "recording_id": "000005", "start": 0.0, "duration": 3.09, "channel": 0, "text": "老虎#1幼崽#2与#1宠物犬#1玩耍#4。", "language": "Chinese", "gender": "female", "custom": {"pinyin": "lao2 hu3 you4 zai3 yu2 chong3 wu4 quan3 wan2 shua3", "normalized_text": "老虎幼崽与宠物犬玩耍。"}}
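The supervision lines above share a fixed layout. The stdlib-only sketch below reproduces one line from its parts. It assumes `language` and `gender` are constants, which all five sample rows suggest. `make_supervision` is a hypothetical name, and the PR itself most likely builds these records through lhotse's `SupervisionSegment` rather than raw dicts.

```python
import json
import re

# Prosodic boundary markers such as "#1" ... "#4".
PROSODY_MARK = re.compile(r"#\d")


def make_supervision(utt_id: str, text: str, pinyin: str, duration: float) -> str:
    """Return one supervision manifest line (hypothetical helper).

    Each utterance is its own recording, so the segment starts at 0.0
    and spans the whole file; language and gender are fixed here.
    """
    record = {
        "id": utt_id,
        "recording_id": utt_id,
        "start": 0.0,
        "duration": duration,
        "channel": 0,
        "text": text,
        "language": "Chinese",
        "gender": "female",
        "custom": {
            "pinyin": pinyin,
            "normalized_text": PROSODY_MARK.sub("", text),
        },
    }
    # ensure_ascii=False keeps the Chinese characters readable in the file.
    return json.dumps(record, ensure_ascii=False)


line = make_supervision(
    "000004",
    "邓小平#2与#1撒切尔#2会晤#4。",
    "deng4 xiao3 ping2 yu3 sa4 qie4 er3 hui4 wu4",
    2.6,
)
print(line)
```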

Example baker_zh_recordings_all.jsonl

{"id": "000001", "sources": [{"type": "file", "channels": [0], "source": "BZNSYP/Wave/000001.wav"}], "sampling_rate": 48000, "num_samples": 127680, "duration": 2.66, "channel_ids": [0]}
{"id": "000002", "sources": [{"type": "file", "channels": [0], "source": "BZNSYP/Wave/000002.wav"}], "sampling_rate": 48000, "num_samples": 137280, "duration": 2.86, "channel_ids": [0]}
{"id": "000003", "sources": [{"type": "file", "channels": [0], "source": "BZNSYP/Wave/000003.wav"}], "sampling_rate": 48000, "num_samples": 211200, "duration": 4.4, "channel_ids": [0]}
{"id": "000004", "sources": [{"type": "file", "channels": [0], "source": "BZNSYP/Wave/000004.wav"}], "sampling_rate": 48000, "num_samples": 124800, "duration": 2.6, "channel_ids": [0]}
{"id": "000005", "sources": [{"type": "file", "channels": [0], "source": "BZNSYP/Wave/000005.wav"}], "sampling_rate": 48000, "num_samples": 148320, "duration": 3.09, "channel_ids": [0]}
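In each recording line, `duration` equals `num_samples / sampling_rate`. The small check below uses values copied from the five rows above and confirms that the fields agree at 48 kHz.

```python
# (id, num_samples, duration) copied from the five recording lines above.
ROWS = [
    ("000001", 127680, 2.66),
    ("000002", 137280, 2.86),
    ("000003", 211200, 4.4),
    ("000004", 124800, 2.6),
    ("000005", 148320, 3.09),
]
SAMPLING_RATE = 48000  # Hz, as stated in every recording line


def duration_from_samples(num_samples: int, sampling_rate: int) -> float:
    """Length in seconds of a recording with `num_samples` samples per channel."""
    return num_samples / sampling_rate


for utt_id, num_samples, duration in ROWS:
    computed = duration_from_samples(num_samples, SAMPLING_RATE)
    # Compare with a tolerance because the manifest stores float values.
    assert abs(computed - duration) < 1e-9, utt_id
    print(utt_id, computed)
```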

CutSet info

wc -l *
  10000 baker_zh_recordings_all.jsonl
  10000 baker_zh_supervisions_all.jsonl
  20000 total
lhotse cut simple -r ./baker_zh_recordings_all.jsonl.gz -s ./baker_zh_supervisions_all.jsonl.gz baker_zh_cuts.jsonl.gz
total 2.5M
-rw-r--r-- 1 kuangfangjun root 1.4M Mar 13 21:21 baker_zh_cuts.jsonl.gz
-rw-r--r-- 1 kuangfangjun root 127K Mar 13 21:19 baker_zh_recordings_all.jsonl.gz
-rw-r--r-- 1 kuangfangjun root 1.1M Mar 13 21:19 baker_zh_supervisions_all.jsonl.gz
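Conceptually, for single-channel recordings like these, `lhotse cut simple` produces one cut per recording. Each cut spans the full recording and carries every supervision with a matching `recording_id`, which is why 10000 recordings and 10000 supervisions yield 10000 cuts. The stdlib sketch below only mimics that pairing on plain dicts. It is not lhotse's implementation, `pair_cuts` is a hypothetical name, and the output layout is much simpler than a real lhotse cut manifest (for example, lhotse assigns its own cut IDs).

```python
from collections import defaultdict
from typing import Dict, List


def pair_cuts(recordings: List[dict], supervisions: List[dict]) -> List[dict]:
    """Group supervisions under their recording: one cut-like dict per recording.

    Simplified illustration only; real lhotse cuts carry more fields.
    """
    by_recording: Dict[str, List[dict]] = defaultdict(list)
    for sup in supervisions:
        by_recording[sup["recording_id"]].append(sup)

    cuts = []
    for rec in recordings:
        cuts.append(
            {
                "recording_id": rec["id"],
                "start": 0.0,  # each cut covers the whole recording
                "duration": rec["duration"],
                "supervisions": by_recording.get(rec["id"], []),
            }
        )
    return cuts


recs = [{"id": "000001", "duration": 2.66}, {"id": "000002", "duration": 2.86}]
sups = [
    {"id": "000001", "recording_id": "000001", "duration": 2.66},
    {"id": "000002", "recording_id": "000002", "duration": 2.86},
]
cuts = pair_cuts(recs, sups)
print(len(cuts), [len(c["supervisions"]) for c in cuts])
```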

lhotse cut describe ./baker_zh_cuts.jsonl.gz
Cut statistics:
╒═══════════════════════════╤══════════╕
│ Cuts count:               │ 10000    │
├───────────────────────────┼──────────┤
│ Total duration (hh:mm:ss) │ 11:51:21 │
├───────────────────────────┼──────────┤
│ mean                      │ 4.3      │
├───────────────────────────┼──────────┤
│ std                       │ 1.3      │
├───────────────────────────┼──────────┤
│ min                       │ 1.4      │
├───────────────────────────┼──────────┤
│ 25%                       │ 3.2      │
├───────────────────────────┼──────────┤
│ 50%                       │ 4.2      │
├───────────────────────────┼──────────┤
│ 75%                       │ 5.2      │
├───────────────────────────┼──────────┤
│ 99%                       │ 7.0      │
├───────────────────────────┼──────────┤
│ 99.5%                     │ 7.3      │
├───────────────────────────┼──────────┤
│ 99.9%                     │ 7.7      │
├───────────────────────────┼──────────┤
│ max                       │ 8.3      │
├───────────────────────────┼──────────┤
│ Recordings available:     │ 10000    │
├───────────────────────────┼──────────┤
│ Features available:       │ 0        │
├───────────────────────────┼──────────┤
│ Supervisions available:   │ 10000    │
╘═══════════════════════════╧══════════╛
SUPERVISION custom fields:
Speech duration statistics:
╒══════════════════════════════╤══════════╤══════════════════════╕
│ Total speech duration        │ 11:51:21 │ 100.00% of recording │
├──────────────────────────────┼──────────┼──────────────────────┤
│ Total speaking time duration │ 11:51:21 │ 100.00% of recording │
├──────────────────────────────┼──────────┼──────────────────────┤
│ Total silence duration       │ 00:00:01 │ 0.00% of recording   │
╘══════════════════════════════╧══════════╧══════════════════════╛
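As a sanity check on the statistics, the mean cut duration is the total duration divided by the cut count. 11:51:21 is 42681 s, and 42681 s / 10000 is about 4.27 s, which matches the reported mean of 4.3 after rounding to one decimal. The total is roughly 11.9 hours of audio.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an "hh:mm:ss" string (as printed by `lhotse cut describe`) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total_seconds = hms_to_seconds("11:51:21")  # total duration from the table
num_cuts = 10000                             # cuts count from the table
mean_seconds = total_seconds / num_cuts
print(total_seconds, round(mean_seconds, 2), round(total_seconds / 3600, 2))
# prints: 42681 4.27 11.86
```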

pzelasko added this to the v1.23.0 milestone on Apr 23, 2024.
pzelasko (Collaborator):

Thanks, I missed this somehow.

pzelasko merged commit ed5797c into lhotse-speech:master on Apr 23, 2024.
11 checks passed
csukuangfj deleted the baker-zh-tts branch on April 23, 2024 at 15:38.