Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming decoding for multi-zh_hans recipe #1369

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
51 changes: 51 additions & 0 deletions .github/scripts/run-multi-corpora-zipformer.sh
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,57 @@ done

rm -rf $repo

log "==== Test icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05 ===="
repo_url=https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/

log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)


log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
ln -s epoch-20.pt epoch-99.pt
popd

ls -lh $repo/exp/*.pt


./zipformer/pretrained.py \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--use-ctc 1 \
--causal 1 \
--chunk-size 16 \
--left-context-frames 128 \
--method greedy_search \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav

for method in modified_beam_search fast_beam_search; do
log "$method"

./zipformer/pretrained.py \
--method $method \
--beam-size 4 \
--use-ctc 1 \
--causal 1 \
--chunk-size 16 \
--left-context-frames 128 \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
done

rm -rf $repo

cd ../../../egs/multi_zh_en/ASR
log "==== Test icefall-asr-zipformer-multi-zh-en-2023-11-22 ===="
repo_url=https://huggingface.co/zrjin/icefall-asr-zipformer-multi-zh-en-2023-11-22/
Expand Down
49 changes: 45 additions & 4 deletions egs/multi_zh-hans/ASR/RESULTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,47 @@

This is the [pull request #1238](https://github.com/k2-fsa/icefall/pull/1238) in icefall.

#### Streaming (with CTC head)

Best results (num of params : ~69M):

The training command:

```
./zipformer/train.py \
--world-size 4 \
--num-epochs 20 \
--use-fp16 1 \
--max-duration 600 \
--num-workers 8 \
--use-ctc 1 \
--causal 1 \
--exp-dir ./zipformer/exp-streaming
```

The decoding command:

```
./zipformer/decode.py \
--causal 1 \
--avg 1 \
--epoch 20 \
--chunk-size 32 \
--left-context-frames 256 \
--use-ctc 1 \
--exp-dir ./zipformer/exp-streaming
```

Character Error Rates (CERs) listed below are produced by the checkpoint of the 20th epoch using BPE model ( # tokens is 2000, byte fallback enabled).

| Datasets | aidatatang _200zh | aidatatang _200zh | alimeeting | alimeeting | aishell-1 | aishell-1 | aishell-2 | aishell-2 | aishell-4 | magicdata | magicdata | kespeech-asr | kespeech-asr | kespeech-asr | WenetSpeech | WenetSpeech | WenetSpeech |
|--------------------------------|------------------------------|-------------|-------------------|--------------|----------------|-------------|------------------|-------------|------------------|------------------|-------------|-----------------------|-----------------------|-------------|--------------------|-------------------------|---------------------|
| Zipformer CER (%) | dev | test | eval | test | dev | test | dev | test | test | dev | test | dev phase1 | dev phase2 | test | dev | test meeting | test net |
| Encoder CTC Decoding (Non-streaming) | 6.17 | 6.75 | 30.0 | 31.44 | 3.8 | 4.48 | 4.69 | 5.18 | 19.59 | 10.85 | 9.83 | 14.45 | 8.53 | 16.48 | 10.24 | 9.73 | 12.03 |
| Transducer Greedy Search | 4.71 | 5.18 | 33.78 | 35.2 | 3.28 | 3.63 | 4.34 | 4.76 | 21.99 | 4.65 | 3.98 | 10.42 | 3.94 | 12.54 | 8.16 | 9.73 | 9.66 |

Pre-trained model can be found here : https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/

#### Non-streaming (with CTC head)

Best results (num of params : ~69M):
Expand Down Expand Up @@ -33,9 +74,9 @@ Character Error Rates (CERs) listed below are produced by the checkpoint of the

| Datasets | aidatatang _200zh | aidatatang _200zh | alimeeting | alimeeting | aishell-1 | aishell-1 | aishell-2 | aishell-2 | aishell-4 | magicdata | magicdata | kespeech-asr | kespeech-asr | kespeech-asr | WenetSpeech | WenetSpeech | WenetSpeech |
|--------------------------------|------------------------------|-------------|-------------------|--------------|----------------|-------------|------------------|-------------|------------------|------------------|-------------|-----------------------|-----------------------|-------------|--------------------|-------------------------|---------------------|
| Zipformer CER (%) | dev | test | eval | test | dev | test | dev | test | test | dev | test | dev phase1 | dev phase2 | test | dev | test meeting | test net |
| CTC Decoding | 2.86 | 3.36 | 22.93 | 24.28 | 2.05 | 2.27 | 3.33 | 3.82 | 15.45 | 3.49 | 2.77 | 6.90 | 2.85 | 8.29 | 9.41 | 6.92 | 8.57 |
| Greedy Search | 3.36 | 3.83 | 23.90 | 25.18 | 2.77 | 3.08 | 3.70 | 4.04 | 16.13 | 3.77 | 3.15 | 6.88 | 3.14 | 8.08 | 9.04 | 7.19 | 8.17 |
| Zipformer CER (%) | dev | test | eval | test | dev | test | dev | test | test | dev | test | dev phase1 | dev phase2 | test | dev | test meeting | test net |
| Encoder CTC Decoding | 2.86 | 3.36 | 22.93 | 24.28 | 2.05 | 2.27 | 3.33 | 3.82 | 15.45 | 3.49 | 2.77 | 6.90 | 2.85 | 8.29 | 9.41 | 6.92 | 8.57 |
| Transducer Greedy Search | 3.36 | 3.83 | 23.90 | 25.18 | 2.77 | 3.08 | 3.70 | 4.04 | 16.13 | 3.77 | 3.15 | 6.88 | 3.14 | 8.08 | 9.04 | 7.19 | 8.17 |

Pre-trained model can be found here : https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24/

Expand Down Expand Up @@ -66,7 +107,7 @@ Character Error Rates (CERs) listed below are produced by the checkpoint of the

| Datasets | aidatatang _200zh | aidatatang _200zh | alimeeting | alimeeting | aishell-1 | aishell-1 | aishell-2 | aishell-2 | aishell-4 | magicdata | magicdata | kespeech-asr | kespeech-asr | kespeech-asr | WenetSpeech | WenetSpeech | WenetSpeech |
|--------------------------------|------------------------------|-------------|-------------------|--------------|----------------|-------------|------------------|-------------|------------------|------------------|-------------|-----------------------|-----------------------|-------------|--------------------|-------------------------|---------------------|
| Zipformer CER (%) | dev | test | eval| test | dev | test | dev| test | test | dev| test | dev phase1 | dev phase2 | test | dev | test meeting | test net |
| Zipformer CER (%) | dev | test | eval| test | dev | test | dev| test | test | dev| test | dev phase1 | dev phase2 | test | dev | test meeting | test net |
| Greedy Search | 3.2 | 3.67 | 23.15 | 24.78 | 2.91 | 3.04 | 3.59 | 4.03 | 15.68 | 3.68 | 3.12 | 6.69 | 3.19 | 8.01 | 9.32 | 7.05 | 8.78 |


Expand Down
1 change: 1 addition & 0 deletions egs/multi_zh-hans/ASR/zipformer/decode_stream.py
1 change: 0 additions & 1 deletion egs/multi_zh-hans/ASR/zipformer/streaming_decode.py

This file was deleted.