Zipformer for Common Voice #997

yfyeung · 2023-04-12T06:47:00Z

Data Preparation

By default, prepare.sh prepares the English (en) subset of Common Voice Corpus 13.0.

Version	Date	Size	Recorded Hours	Validated Hours	License	Number of Voices	Audio Format
Common Voice Corpus 13.0	3/15/2023	76.39 GB	3,209	2,429	CC-0	86,942	MP3

Stage	Time	Comment
Stage 0: Download data	-	Download the dataset manually, using a machine located in America
Stage 1: Prepare CommonVoice manifest	50 minutes	num_jobs=8
Stage 2: Prepare musan manifest	Very little	-
Stage 3: Preprocess CommonVoice manifest	5 minutes	-
Stage 4: Compute fbank for dev and test subsets of CommonVoice	2 minutes	1 GPU
Stage 5: Split train subset into 1000 pieces	10 minutes	-
Stage 6: Compute features for train subset of CommonVoice	3 hours	4 processes with 2 GPUs
Stage 7: Combine features for train	15 minutes	-
Stage 8: Compute fbank for musan	15 minutes	-
Stage 9: Prepare BPE based lang	5 minutes	-
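As a rough cross-check of the table above, the timed stages can be summed to estimate the total preparation time. This is only a sketch: the dictionary keys are shortened labels, and Stage 0 (manual download) and Stage 2 ("very little") are left out because the table gives no figure for them.

```python
# Stage durations in minutes, copied from the table above.
# Stage 0 and Stage 2 are omitted because no duration is reported for them.
stage_minutes = {
    "Stage 1: Prepare CommonVoice manifest": 50,
    "Stage 3: Preprocess CommonVoice manifest": 5,
    "Stage 4: Compute fbank for dev and test": 2,
    "Stage 5: Split train subset": 10,
    "Stage 6: Compute features for train": 3 * 60,
    "Stage 7: Combine features for train": 15,
    "Stage 8: Compute fbank for musan": 15,
    "Stage 9: Prepare BPE based lang": 5,
}

total_minutes = sum(stage_minutes.values())
total_hours = total_minutes / 60

# Share of the total taken by the slowest stage.
slowest_stage = max(stage_minutes, key=stage_minutes.get)
slowest_share = stage_minutes[slowest_stage] / total_minutes

print(f"Total: {total_minutes} minutes (~{total_hours:.1f} hours)")
print(f"Slowest: {slowest_stage} ({slowest_share:.0%} of the total)")
```

Under these figures, feature extraction for the train subset (Stage 6) dominates the pipeline, so it is the stage where extra GPUs or processes would matter most.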

Result

Decoding method	Dev WER (%)	Test WER (%)
greedy search	9.96	12.54
modified beam search	9.86	12.48
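To put the gap between the two decoding methods in perspective, the relative reduction can be computed from the numbers above. This sketch assumes the figures are word error rates in percent; `relative_reduction` is a helper name introduced here for illustration only.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction from baseline to improved, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Values copied from the result table: greedy search vs. modified beam search.
dev_gain = relative_reduction(9.96, 9.86)
test_gain = relative_reduction(12.54, 12.48)

print(f"Dev:  {dev_gain:.2f}% relative reduction")
print(f"Test: {test_gain:.2f}% relative reduction")
```

Both reductions come out at about one percent relative or less, so with these settings modified beam search gives only a small gain over greedy search.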

To reproduce the above result, use the following commands for training:

export CUDA_VISIBLE_DEVICES="0,1,2,3"
./pruned_transducer_stateless7/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir pruned_transducer_stateless7/exp \
  --max-duration 550

and the following commands for decoding:

# greedy search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 5 \
  --decoding-method greedy_search \
  --exp-dir pruned_transducer_stateless7/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --max-duration 600

# modified beam search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 5 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --exp-dir pruned_transducer_stateless7/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --max-duration 600
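The decoding commands pass `--epoch 30 --avg 5`. One common reading is that the parameters of the last five epoch checkpoints (epochs 26 to 30) are averaged before decoding. How decode.py actually performs the averaging is not shown in this thread, so the following is only a plain-Python sketch of uniform parameter averaging; `epochs_to_average` and `average_checkpoints` are hypothetical names, and the checkpoints are toy dictionaries rather than real model state.

```python
from typing import Dict, List


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Return the epochs covered by averaging `avg` checkpoints ending at `epoch`."""
    if avg < 1 or avg > epoch:
        raise ValueError("avg must be between 1 and epoch")
    return list(range(epoch - avg + 1, epoch + 1))


def average_checkpoints(
    checkpoints: List[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Uniformly average parameters that share the same name and shape."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Dict[str, List[float]] = {}
    for name, first_values in checkpoints[0].items():
        sums = [0.0] * len(first_values)
        for ckpt in checkpoints:
            for i, value in enumerate(ckpt[name]):
                sums[i] += value
        averaged[name] = [s / len(checkpoints) for s in sums]
    return averaged


selected = epochs_to_average(epoch=30, avg=5)

# Toy checkpoints standing in for real parameter tensors.
toy = [{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]
toy_average = average_checkpoints(toy)
print(selected, toy_average)
```

Averaging several late checkpoints is a widely used way to smooth out noise between epochs; the recipe's own implementation may weight or select checkpoints differently.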

Pretrained model is available at
https://huggingface.co/yfyeung/icefall-asr-cv-corpus-13.0-2023-03-09-en-pruned-transducer-stateless7-2023-04-17

The tensorboard log for training is available at
https://tensorboard.dev/experiment/j4pJQty6RMOkMJtRySREKw/

desh2608 · 2023-04-16T13:40:26Z

BTW it seems that num_jobs > 1 is currently not supported for the CommonVoice Lhotse recipe (see here). It may be worth implementing this option to speed up the manifest creation time.

csukuangfj · 2023-04-16T17:45:17Z

> BTW it seems that num_jobs > 1 is currently not supported for the CommonVoice Lhotse recipe (see here). It may be worth implementing this option to speed up the manifest creation time.

There's an ongoing PR about that: lhotse-speech/lhotse#1025
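To illustrate what a num_jobs option could look like, here is a minimal sketch of building manifest entries in parallel. It is not the Lhotse implementation and does not use the Lhotse API: `build_entry` and `build_manifest` are hypothetical helpers, and the rows are made-up tab-separated lines standing in for a Common Voice TSV file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def build_entry(row: str) -> Dict[str, str]:
    """Turn one tab-separated 'path<TAB>sentence' row into a manifest entry."""
    path, sentence = row.split("\t", maxsplit=1)
    return {"audio": path, "text": sentence.strip()}


def build_manifest(rows: List[str], num_jobs: int = 1) -> List[Dict[str, str]]:
    """Build entries serially, or with a pool of workers when num_jobs > 1."""
    if num_jobs <= 1:
        return [build_entry(row) for row in rows]
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        # Executor.map yields results in input order, so the manifest
        # is identical to the serial one.
        return list(pool.map(build_entry, rows))


# Made-up rows for illustration only.
rows = [f"clip_{i:04d}.mp3\tsentence number {i}" for i in range(100)]
serial = build_manifest(rows, num_jobs=1)
parallel = build_manifest(rows, num_jobs=8)
print(len(parallel), parallel[0])
```

A thread pool fits when each entry is dominated by I/O such as reading audio headers; for CPU-bound work a process pool would be the more usual choice.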

csukuangfj · 2023-04-17T04:33:38Z

egs/commonvoice/ASR/local/compute_fbank_commonvoice_splits.py

@@ -90,7 +90,7 @@ def compute_fbank_commonvoice_splits(args):
    subset = "train"


Could you please add a RESULTS.md to document the results, pre-trained models, and tensorboard logs?

egs/commonvoice/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

csukuangfj · 2023-04-17T09:17:57Z

egs/commonvoice/ASR/pruned_transducer_stateless7/decoder.py

@@ -0,0 +1,105 @@
+# Copyright    2021  Xiaomi Corp.        (authors: Fangjun Kuang)


Could you replace it with a symlink?

The same applies to other files such as joiner.py and model.py.
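The request above amounts to replacing duplicated files with relative symlinks to a shared copy. The sketch below does this inside a temporary directory so that it is safe to run. It assumes the shared files live in a sibling librispeech recipe directory, which this thread does not state, and `replace_with_symlink` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def replace_with_symlink(copy: Path, original: Path) -> None:
    """Replace `copy` with a relative symlink that points at `original`."""
    rel_target = os.path.relpath(original, start=copy.parent)
    if copy.exists() or copy.is_symlink():
        copy.unlink()
    copy.symlink_to(rel_target)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    # Assumed layout: the shared source recipe sits next to the new recipe.
    src_dir = root / "librispeech" / "ASR" / "pruned_transducer_stateless7"
    dst_dir = root / "commonvoice" / "ASR" / "pruned_transducer_stateless7"
    src_dir.mkdir(parents=True)
    dst_dir.mkdir(parents=True)

    for name in ("decoder.py", "joiner.py", "model.py"):
        (src_dir / name).write_text(f"# shared {name}\n")
        (dst_dir / name).write_text(f"# duplicated {name}\n")
        replace_with_symlink(dst_dir / name, src_dir / name)

    link = dst_dir / "decoder.py"
    is_link = link.is_symlink()
    link_target = os.readlink(link)
    link_text = link.read_text()
    print(link_target)
```

Relative targets keep the links valid when the repository is cloned to a different location, which absolute paths would not.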

csukuangfj

Thanks!

Yifan Yang added 7 commits April 12, 2023 11:44

Add soft links in pruned_transducer_stateless7 for CommonVoice

9b35fa8

Add python files

71d35a4

Update prepare.sh

996f2f7

update

67befd1

Update normalization

788766c

Update prepare.sh

c25f039

Update

ba3c923

Fix for soft links

1197e0d

yfyeung force-pushed the cv branch from ddd07c5 to 1197e0d April 17, 2023 04:27

yfyeung changed the title ~~[WIP] zipformer for Common Voice~~ zipformer for Common Voice Apr 17, 2023

Fix for style check

bc2b751

csukuangfj reviewed Apr 17, 2023

Yifan Yang added 2 commits April 17, 2023 16:40

Add some docs

7a4a13d

Add export

ebcf848

yfyeung changed the title ~~zipformer for Common Voice~~ Zipformer for Common Voice Apr 17, 2023

csukuangfj reviewed Apr 17, 2023

egs/commonvoice/ASR/RESULTS.md (outdated, resolved)

Update egs/commonvoice/ASR/RESULTS.md

dc892a2

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

csukuangfj reviewed Apr 17, 2023

Yifan Yang added 2 commits April 17, 2023 17:34

Add export for onnx

fb58374

Merge branch 'cv' of github.com:yfyeung/icefall

c67351b

csukuangfj approved these changes Apr 17, 2023

yfyeung merged commit 8838fe0 into k2-fsa:master Apr 17, 2023
3 checks passed

yfyeung deleted the cv branch April 17, 2023 09:48
