
[WIP] Smaller models #745

Open · wants to merge 1 commit into master

Conversation

@pkufool (Collaborator) commented Dec 8, 2022

I created this PR to post my current results for smaller models (i.e. models with fewer parameters). I'd say the parameters used to construct the models were chosen somewhat arbitrarily; the aim is to tune a good model with around 15M parameters. The following results are based on pruned_transducer_stateless5 (WER pairs below are test-clean & test-other); I will try Zipformer and update the results once available.

| Num of params | Greedy search | Modified beam search | Fast beam search | Fast beam search LG | Model desc |
|---|---|---|---|---|---|
| 8.6 M | 4.24 & 10.2 | 4.09 & 10.03 | 4.13 & 9.89 | 4.11 & 9.59 | --epoch 25 --avg 3 --num-encoder-layers 16 --encoder-dim 144 --decoder-dim 320 --dim-feedforward 512 --nhead 4 --joiner-dim 320 |
| 8.8 M | 4.13 & 10.22 | 4.01 & 9.84 | 4.05 & 9.83 | 4.09 & 9.55 | --epoch 25 --avg 2 --num-encoder-layers 16 --encoder-dim 144 --decoder-dim 320 --dim-feedforward 512 --nhead 4 --joiner-dim 512 |
| 10.4 M | 4.03 & 10.11 | 3.89 & 9.82 | 3.97 & 9.91 | 4.0 & 9.51 | --epoch 25 --avg 3 --num-encoder-layers 12 --encoder-dim 144 --decoder-dim 320 --dim-feedforward 1024 --nhead 4 --joiner-dim 512 |
| 19 M | 3.29 & 8.24 | 3.17 & 8.02 | 3.25 & 8.01 | 3.33 & 7.93 | --epoch 25 --avg 6 --num-encoder-layers 16 --encoder-dim 256 --decoder-dim 512 --dim-feedforward 512 --nhead 4 --joiner-dim 512 |
| 15 M | 3.7 & 8.94 | 3.56 & 8.71 | 3.62 & 8.68 | 3.7 & 8.53 | --epoch 25 --avg 3 --num-encoder-layers 12 --encoder-dim 256 --decoder-dim 512 --dim-feedforward 512 --nhead 4 --joiner-dim 512 |

These models are all non-streaming models; if anyone needs them, I can upload them to Hugging Face.
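
For reference, the flags in the "Model desc" column are the recipe's model arguments, and --epoch/--avg are the checkpoints averaged at decoding time. Below is a minimal sketch of how the first (8.6M) configuration would be trained, assuming the usual egs/librispeech/ASR recipe layout; the non-model flags (--world-size, --exp-dir, --full-libri, --max-duration, --use-fp16) are illustrative values, not taken from this thread:

```bash
# Sketch only: train the 8.6M-parameter configuration from the first table row.
# Only the model args come from the table; everything else is an illustrative default.
cd egs/librispeech/ASR

./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 25 \
  --full-libri 1 \
  --use-fp16 1 \
  --max-duration 300 \
  --exp-dir pruned_transducer_stateless5/exp-8.6M \
  --num-encoder-layers 16 \
  --encoder-dim 144 \
  --decoder-dim 320 \
  --dim-feedforward 512 \
  --nhead 4 \
  --joiner-dim 320
```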

@wangtiance (Contributor)

These numbers look pretty good. Would love to see some streaming model results too!

@pkufool (Collaborator, Author) commented Dec 12, 2022

Update: here are the results of a streaming model with 15M parameters.

| Num of params | Decoding params | Greedy search | Modified beam search | Fast beam search | Fast beam search LG | Model desc |
|---|---|---|---|---|---|---|
| 15 M | decode-chunk-size=8; left-context=32 | 5.37 & 14.08 | 5.21 & 13.55 | 5.26 & 13.53 | 5.28 & 13.18 | --epoch 25 --avg 3 --num-encoder-layers 12 --encoder-dim 256 --decoder-dim 512 --dim-feedforward 512 --nhead 4 --joiner-dim 512 |
| 15 M | decode-chunk-size=16; left-context=64 | 5.01 & 13.0 | 4.87 & 12.62 | 4.88 & 12.61 | 4.88 & 12.34 | --epoch 25 --avg 3 --num-encoder-layers 12 --encoder-dim 256 --decoder-dim 512 --dim-feedforward 512 --nhead 4 --joiner-dim 512 |
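
The decoding parameters in the second column correspond to the simulated-streaming options of the recipe's decode.py. A hedged sketch for the first row (chunk size 8, left context 32) follows; the --simulate-streaming flag and the exp-dir path are assumptions about the standard pruned_transducer_stateless5 recipe rather than something stated in this thread:

```bash
# Sketch only: simulated-streaming decoding with chunk size 8 and left context 32.
# Model args must match the ones used at training time (the 15M config above);
# --simulate-streaming and the exp-dir path are assumptions, not from this thread.
./pruned_transducer_stateless5/decode.py \
  --epoch 25 \
  --avg 3 \
  --exp-dir pruned_transducer_stateless5/exp-15M-streaming \
  --decoding-method modified_beam_search \
  --simulate-streaming 1 \
  --decode-chunk-size 8 \
  --left-context 32 \
  --num-encoder-layers 12 \
  --encoder-dim 256 \
  --decoder-dim 512 \
  --dim-feedforward 512 \
  --nhead 4 \
  --joiner-dim 512
```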

@desh2608 (Collaborator)

@pkufool Did you try any experiments with smaller Zipformer models?

@pkufool (Collaborator, Author) commented Jan 26, 2023

> @pkufool Did you try any experiments with smaller Zipformer models?

I think @yaozengwei tried a smaller Zipformer. Can you share some results, @yaozengwei?

@yaozengwei (Collaborator)

I have trained a smaller version of the merged Zipformer (pruned_transducer_stateless7) on full LibriSpeech for 30 epochs, with model args:

```
--num-encoder-layers 2,2,2,2,2 \
--feedforward-dims 768,768,768,768,768 \
--nhead 8,8,8,8,8 \
--encoder-dims 256,256,256,256,256 \
--attention-dims 192,192,192,192,192 \
--encoder-unmasked-dims 192,192,192,192,192 \
--zipformer-downsampling-factors 1,2,4,8,2 \
--cnn-module-kernels 31,31,31,31,31 \
--decoder-dim 512 \
--joiner-dim 512 \
```

Number of model parameters: 20697573
It got WERs of 2.67 & 6.4 at epoch-30-avg-9 with greedy search.
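
For context, a sketch of how these args would be passed to the recipe's training script; only the model args listed above come from this comment, while the remaining flags (world size, exp dir, batching, fp16) are illustrative assumptions:

```bash
# Sketch only: the ~20M-parameter Zipformer configuration described above.
# Non-model flags are illustrative; only the model args come from this thread.
./pruned_transducer_stateless7/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --full-libri 1 \
  --use-fp16 1 \
  --max-duration 750 \
  --exp-dir pruned_transducer_stateless7/exp-small \
  --num-encoder-layers 2,2,2,2,2 \
  --feedforward-dims 768,768,768,768,768 \
  --nhead 8,8,8,8,8 \
  --encoder-dims 256,256,256,256,256 \
  --attention-dims 192,192,192,192,192 \
  --encoder-unmasked-dims 192,192,192,192,192 \
  --zipformer-downsampling-factors 1,2,4,8,2 \
  --cnn-module-kernels 31,31,31,31,31 \
  --decoder-dim 512 \
  --joiner-dim 512
```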

@desh2608 (Collaborator)

@yaozengwei Thanks for your reply, those numbers look very good! Are you planning to upload the pretrained model to HF?

@yaozengwei (Collaborator)

I have uploaded the pretrained model to https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-20M-2023-01-28.

It is a tiny version of Zipformer-Transducer (https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7).

Number of model parameters: 20697573

Decoding results at epoch-30-avg-9:

  • greedy_search: 2.67 & 6.4
  • modified_beam_search: 2.6 & 6.26
  • fast_beam_search: 2.64 & 6.3
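
A hedged sketch of reproducing these three numbers with the recipe's decode.py; the exp-dir path and --max-duration are placeholders, and the model args are assumed to be the ones quoted earlier in the thread:

```bash
# Sketch only: run the three decoding methods reported above at epoch 30, avg 9.
# The exp-dir is a placeholder for wherever the downloaded checkpoints live;
# model args must match the training configuration quoted above.
for method in greedy_search modified_beam_search fast_beam_search; do
  ./pruned_transducer_stateless7/decode.py \
    --epoch 30 \
    --avg 9 \
    --exp-dir pruned_transducer_stateless7/exp-20M \
    --max-duration 600 \
    --decoding-method $method \
    --num-encoder-layers 2,2,2,2,2 \
    --feedforward-dims 768,768,768,768,768 \
    --nhead 8,8,8,8,8 \
    --encoder-dims 256,256,256,256,256 \
    --attention-dims 192,192,192,192,192 \
    --encoder-unmasked-dims 192,192,192,192,192 \
    --zipformer-downsampling-factors 1,2,4,8,2 \
    --cnn-module-kernels 31,31,31,31,31 \
    --decoder-dim 512 \
    --joiner-dim 512
done
```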

@maltium commented Feb 4, 2023

@yaozengwei the results you got with the 20M parameter model are better than those with the 70M model, according to the results posted at https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md.

Isn't that unexpected?

@wangtiance (Contributor)

> @yaozengwei the results you got with the 20M parameter model are better than those with the 70M model, according to the results posted at https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md.
>
> Isn't that unexpected?

You're probably comparing it with the streaming ASR results. For non-streaming results see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#pruned_transducer_stateless7-zipformer

@desh2608 (Collaborator) commented Feb 4, 2023

> I have trained a smaller version of the merged Zipformer (pruned_transducer_stateless7) on full LibriSpeech for 30 epochs, with model args:
>
> ```
> --num-encoder-layers 2,2,2,2,2 \
> --feedforward-dims 768,768,768,768,768 \
> --nhead 8,8,8,8,8 \
> --encoder-dims 256,256,256,256,256 \
> --attention-dims 192,192,192,192,192 \
> --encoder-unmasked-dims 192,192,192,192,192 \
> --zipformer-downsampling-factors 1,2,4,8,2 \
> --cnn-module-kernels 31,31,31,31,31 \
> --decoder-dim 512 \
> --joiner-dim 512 \
> ```
>
> Number of model parameters: 20697573. It got WERs of 2.67 & 6.4 at epoch-30-avg-9 with greedy search.

BTW have you also trained a streaming version of this smaller Zipformer model?

@desh2608 (Collaborator) commented Feb 12, 2023

As a follow-up, I trained a small streaming Zipformer model based on the configuration provided by @yaozengwei, using the recipe pruned_transducer_stateless7_streaming. It obtains WERs of 3.88/9.53 using modified beam search.

The training logs, tensorboard, and pretrained model are available at: https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small
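
For anyone who wants to try it, a minimal sketch of fetching this pretrained model; it assumes git-lfs is installed and that the repo follows the usual icefall pretrained-model layout (checkpoints under exp/, BPE model under data/lang_bpe_500/), which is not spelled out in this comment:

```bash
# Sketch only: download the small streaming Zipformer posted above.
# Requires git-lfs; the repo's internal layout is assumed, not stated in this thread.
git lfs install
git clone https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small
```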

@csukuangfj (Collaborator)

> As a follow-up, I trained a small streaming Zipformer model based on the configuration provided by @yaozengwei, using the recipe pruned_transducer_stateless7_streaming. It obtains WERs of 3.88/9.53 using modified beam search.
>
> The training logs, tensorboard, and pretrained model are available at: https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small

Could you update RESULTS.md to include this small model?

@mohsen-goodarzi

Why are these results still far behind Conformer-S with 10M params from the Conformer paper (2.7/6.3)?

@csukuangfj (Collaborator)

> Why are these results still far behind Conformer-S with 10M params from the Conformer paper (2.7/6.3)?

I think it is difficult, if not impossible, to reproduce the results listed in the conformer paper.

@danpovey (Collaborator)

You can't really compare streaming vs. non-streaming results; our 20M Zipformer is about the same as the reported 10M Conformer. But no one has really been able to reproduce that result. For example, in Table 2 of https://arxiv.org/pdf/2207.02971.pdf, the reported 2.1/4.3 becomes 2.5/6.0 when they try to reproduce it. I suspect they might have got the scoring wrong: for example, scoring on a token level or something like that. It's still a good architecture, just not quite as good as reported. Either that, or something else about their setup is different that we don't understand.
