Offline/Online (standalone) ESPnet2 Transducer #4479

Merged
merged 125 commits into from Aug 17, 2022
Changes from 118 commits
de3976e
remove espnet2 transducer tests
b-flo Feb 4, 2022
170093a
remove previous transducer version
b-flo Feb 4, 2022
a10025b
add new transducer version
b-flo Feb 4, 2022
a9a08f1
add dummy handle for transducer asr task
b-flo Feb 4, 2022
fee3c08
fix conflict
b-flo Feb 4, 2022
4c7351b
cleaner changes to template
b-flo Feb 4, 2022
c9cd279
Merge branch 'master' into espnet2_transducer_v2
b-flo Feb 11, 2022
8488e46
fix conflict
b-flo Feb 14, 2022
8c7fc89
add back initialization options + chainer_espnet1 option
b-flo Feb 14, 2022
32edf84
transducer v2.1
b-flo Mar 21, 2022
e9410ec
fix conflicts
b-flo Mar 21, 2022
75bf418
fix conflicts (2)
b-flo Mar 21, 2022
cdbae6b
fix ci
b-flo Mar 21, 2022
1d0175f
remove joint_network argument
b-flo Mar 21, 2022
a1feb7c
add missing commit (fix)
b-flo Mar 22, 2022
2c2e5c7
fix assert, init and lm batch score
b-flo Mar 22, 2022
26b5519
add first tests
b-flo Mar 22, 2022
d44ec84
fix case where input_conf is empty
b-flo Mar 22, 2022
306db25
fixes and clean-up
b-flo Mar 26, 2022
5caa3f6
second batch of unit tests
b-flo Mar 26, 2022
c99131c
revert autocast changes for ci
b-flo Mar 26, 2022
57d8533
fix conflict
b-flo Mar 27, 2022
0fe2b57
add missing test files
b-flo Mar 28, 2022
d1ece4e
naming + default params
b-flo Mar 30, 2022
5db3e0e
add transducer documentation
b-flo Mar 30, 2022
0872f78
fix/improve doc
b-flo Mar 31, 2022
47695ab
fix error reporting
b-flo Apr 1, 2022
e53077a
improve test coverage
b-flo Apr 1, 2022
24f8751
Merge branch 'master' into espnet2_transducer_v2
b-flo Apr 2, 2022
1ca2089
fix typo
b-flo Apr 4, 2022
3e06834
reduce beam search test time
b-flo Apr 4, 2022
63705c7
Merge branch 'master' into espnet2_transducer_v2
b-flo Apr 7, 2022
c787011
add integration test + missing doc
b-flo Apr 7, 2022
bbe4a44
refactor activation func handler + add new types + add tests
b-flo Apr 11, 2022
372ff80
add + fix documentation for act. func. parameters
b-flo Apr 11, 2022
cec16da
fix/update act. func. docs and types
b-flo Apr 11, 2022
9ffe08e
add/remove activations based on experiments
b-flo Apr 12, 2022
4f400b3
fix label_sequence typing
b-flo Apr 12, 2022
b92aedc
fix label_sequence typing (2)
b-flo Apr 12, 2022
b25d39f
typo
b-flo Apr 14, 2022
bb66c00
fix importerror during inference if warp_rnnt is missing
b-flo Apr 14, 2022
7b5292e
fix bad rendering + sentence
b-flo Apr 17, 2022
8a76ff2
Merge branch 'master' into espnet2_transducer_v2
b-flo Apr 19, 2022
6b22e2b
add quantization capabilities
b-flo Apr 21, 2022
5100601
add quantization capabilities (2)
b-flo Apr 21, 2022
2e4a7d1
add quantization tests
b-flo Apr 21, 2022
7af9159
(mAES) apply recombine_hyps after each timestep
b-flo Apr 22, 2022
8eed522
Merge branch 'master' into espnet2_transducer_v2
b-flo Apr 23, 2022
3c7c4ea
remove transducer reference for enh task
b-flo Apr 23, 2022
0abbf7a
fix quantize test indent
b-flo Apr 23, 2022
2cf36a3
add kwargs to forward and collect_feats for utts_id handling
b-flo Apr 23, 2022
2887f67
(mAES) swap sorted and recombine_hyps
b-flo Apr 24, 2022
fb477ff
Merge branch 'master' into espnet2_transducer_v2
b-flo Apr 25, 2022
60e17b3
move espnet2/asr/transducer to espnet2/asr_transducer
b-flo Apr 26, 2022
9aa2499
add back old transducer version
b-flo Apr 26, 2022
beb786c
add back test for old transducer version and rearrange new tests
b-flo Apr 26, 2022
bd06120
fix file mode
b-flo Apr 26, 2022
e58fc60
fix conflict between asr/transducer and asr_transducer tests
b-flo Apr 26, 2022
70b1db3
add disclaimer to transducer tutorial for the multiple versions
b-flo Apr 26, 2022
294610d
add missing files
b-flo Apr 26, 2022
4f2e28e
remove espnet1 ref (single local usage)
b-flo Apr 27, 2022
4e954d9
revert espnet1 doc changes; clean-up espnet2 doc introduction
b-flo Apr 27, 2022
571814c
use asr_task scheme instead of asr_transducer
b-flo Apr 27, 2022
4fd9e63
fix integration tests
b-flo Apr 27, 2022
0157e81
add back blank lines
b-flo Apr 27, 2022
2d912bc
add streaming custom transducer version
b-flo Jun 28, 2022
78d495c
fix conflicts
b-flo Jun 28, 2022
7b951b8
fix conflicts (2)
b-flo Jun 28, 2022
5ba1015
fix conflicts (3)
b-flo Jun 28, 2022
40e88a1
use correct files version + small clean-up
b-flo Jun 29, 2022
ce44ab3
Merge branch 'master' into streaming_transducer_v2
b-flo Jun 29, 2022
65a8b71
add init + patch1
b-flo Jun 29, 2022
f42068e
add init + patch2
b-flo Jun 29, 2022
4998148
apply black
b-flo Jun 29, 2022
f6bd2b8
apply isort
b-flo Jun 30, 2022
3ec8137
modify recombine_hyps + fix doc and typehint
b-flo Jun 30, 2022
a672fe6
Merge branch 'master' into streaming_transducer_v2
b-flo Jun 30, 2022
0c1603f
fix yseq conversion to str
b-flo Jun 30, 2022
3e67544
patch3: (minor) refactor for caches + fix docstrings + clean-up
b-flo Jul 1, 2022
9a3bdee
patch4: switch distutils for packaging + activations functions fix+doc
b-flo Jul 1, 2022
6d5858c
renaming: sequence -> x
b-flo Jul 1, 2022
75b70cf
patch5: fix conv1d and related issues + docstrings
b-flo Jul 3, 2022
26aa0b3
update tests for new version
b-flo Jul 3, 2022
7d5cc9a
update transducer tests (shared/v1)
b-flo Jul 3, 2022
47dd412
add dummy tests for streaming model
b-flo Jul 4, 2022
4fbfc82
fix docstrings and typehints
b-flo Jul 5, 2022
e0a0431
fix score_cache and docstring
b-flo Jul 5, 2022
5ee5be9
remove unnecessary typehints
b-flo Jul 5, 2022
986f542
address reviews
b-flo Jul 5, 2022
7f03651
address reviews (2)
b-flo Jul 5, 2022
1212fcd
remove blank_id mentions (hardcoded to 0)
b-flo Jul 6, 2022
097c12a
apply black
b-flo Jul 6, 2022
730d874
remove leading comment symbol
b-flo Jul 6, 2022
3e8d90f
handles for left_context=0 + fix right_context default values
b-flo Jul 18, 2022
625f691
fix raw context computation
b-flo Jul 20, 2022
726a87f
modify last chunk condition and padding
b-flo Jul 25, 2022
168ee92
Merge branch 'master' into streaming_transducer_v2
b-flo Jul 25, 2022
d398ff2
apply black
b-flo Jul 25, 2022
89e936c
add simplified attention score
b-flo Jul 26, 2022
e8302b3
add basic norm support
b-flo Jul 26, 2022
3ea97a2
update building blocks
b-flo Jul 26, 2022
ab36203
update unit tests
b-flo Jul 26, 2022
1bc112f
update transducer docs
b-flo Jul 26, 2022
4e16acc
fix norm layer selection and conv_size definition
b-flo Jul 27, 2022
b92faa6
remove init related to RNN encoder
b-flo Jul 27, 2022
ffc9e51
update tests + improve coverage
b-flo Jul 27, 2022
f9b8a36
remove unecessary condition
b-flo Jul 28, 2022
e95ba8a
fix and import various tests
b-flo Jul 28, 2022
f085c83
revert changes
b-flo Jul 28, 2022
59ad8ff
clean up doc, docstring and default parameters
b-flo Aug 1, 2022
b97990a
Merge branch 'master' into streaming_transducer_v2
b-flo Aug 1, 2022
5ef55e7
change condition for RTF computation
b-flo Aug 1, 2022
be2f2d6
typo
b-flo Aug 2, 2022
e3b17f9
add doc for adding new block
b-flo Aug 2, 2022
0f53891
split offline and online decoding methods
b-flo Aug 2, 2022
27c2b92
typo
b-flo Aug 2, 2022
5233e1b
fix FAQ
b-flo Aug 2, 2022
74bc79d
remove initialization
b-flo Aug 2, 2022
9801644
remove initialization (2)
b-flo Aug 2, 2022
84a5181
remove unused initialization code
b-flo Aug 4, 2022
a3c6c22
refactor normalization module
b-flo Aug 4, 2022
5df4f43
update unit tests
b-flo Aug 4, 2022
61eb415
update+fix parameters and doc
b-flo Aug 4, 2022
6d0f7e1
fix conflict
b-flo Aug 11, 2022
64ae24c
fix conflict (2)
b-flo Aug 11, 2022
24 changes: 24 additions & 0 deletions ci/test_integration_espnet2.sh
@@ -53,6 +53,30 @@ if python3 -c "import k2" &> /dev/null; then
--asr-args "--model_conf extract_feats_in_collect_stats=false --max_epoch=1"
fi

+if python3 -c "from warprnnt_pytorch import RNNTLoss" &> /dev/null; then
+echo "==== [ESPnet2] ASR Transducer (standalone) ==="
+
+for t in ${token_types}; do
+asr_tag="transducer_${t}"
+
+echo "==== [Conformer-RNN-T] feats_type=raw, token_types=${t}, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ==="
+./run.sh --asr_task "asr_transducer" --ngpu 0 --stage 10 --stop-stage 13 --skip-upload false --feats-type "raw" --token-type ${t} \
+--feats_normalize "utterance_mvn" --lm-args "--max_epoch=1" --python "${python}" --inference_asr_model "valid.loss.best.pth" \
+--asr-tag "${asr_tag}_conformer" --asr-args "--model_conf extract_feats_in_collect_stats=false --max_epoch=1 \
+--encoder_conf body_conf='[{'block_type': 'conformer', 'hidden_size': 30, 'linear_size': 30, 'heads': 2, 'conv_mod_kernel_size': 3}]' \
+--decoder_conf='{'embed_size': 30, 'hidden_size': 30}' --joint_network_conf joint_space_size=30"
+
+echo "==== [Streaming Conformer-RNN-T] feats_type=raw, token_types=${t}, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ==="
+./run.sh --asr_task "asr_transducer" --ngpu 0 --stage 10 --stop-stage 13 --skip-upload false --feats-type "raw" --token-type ${t} \
+--feats_normalize "utterance_mvn" --lm-args "--max_epoch=1" --python "${python}" --inference_asr_model "valid.loss.best.pth" \
+--asr-tag "${asr_tag}_conformer_streaming" --asr-args "--model_conf extract_feats_in_collect_stats=false --max_epoch=1 \
+--encoder_conf main_conf='{'dynamic_chunk_training': True}' \
+--encoder_conf body_conf='[{'block_type': 'conformer', 'hidden_size': 30, 'linear_size': 30, 'heads': 2, 'conv_mod_kernel_size': 3}]' \
+--decoder_conf='{'embed_size': 30, 'hidden_size': 30}' --joint_network_conf joint_space_size=30 " \
+--inference-args "--streaming true --chunk_size 2 --left_context 2 --right_context 0"
+done
+fi
+
# Remove generated files in order to reduce the disk usage
rm -rf exp dump data
cd "${cwd}"
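
The new CI block only runs when `from warprnnt_pytorch import RNNTLoss` succeeds. As a rough illustration, the same guard could be written in Python as below. `can_import` is a hypothetical helper name chosen for this sketch, not something defined in ESPnet.

```python
import importlib


def can_import(module_name: str, attr: str) -> bool:
    """Return True if the module imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # The shell guard treats any failure (missing package, broken
        # shared library, ...) as "not available", so this does too.
        return False
    return hasattr(module, attr)


# Mirrors: python3 -c "from warprnnt_pytorch import RNNTLoss" &> /dev/null
if can_import("warprnnt_pytorch", "RNNTLoss"):
    print("running standalone Transducer integration tests")
else:
    print("warprnnt_pytorch not available, skipping Transducer tests")
```

Catching every exception rather than only `ImportError` is a deliberate choice here, because the shell version discards all output and only looks at the exit status.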
244 changes: 242 additions & 2 deletions doc/espnet2_tutorial.md

Large diffs are not rendered by default.

46 changes: 24 additions & 22 deletions egs2/TEMPLATE/asr1/asr.sh
@@ -83,6 +83,7 @@ num_splits_lm=1 # Number of splitting for lm corpus.
word_vocab_size=10000 # Size of word vocabulary.

# ASR model related
+asr_task=asr # ASR task mode. Either 'asr' or 'asr_transducer'.
asr_tag= # Suffix to the result dir for asr model training.
asr_exp= # Specify the directory path for ASR experiment.
# If this option is specified, asr_tag is ignored.
Expand Down Expand Up @@ -203,6 +204,7 @@ Options:
--num_splits_lm # Number of splitting for lm corpus (default="${num_splits_lm}").

# ASR model related
--asr_task # ASR task mode. Either 'asr' or 'asr_transducer'. (default="${asr_task}").
--asr_tag # Suffix to the result dir for asr model training (default="${asr_tag}").
--asr_exp # Specify the directory path for ASR experiment.
# If this option is specified, asr_tag is ignored (default="${asr_exp}").
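
The `asr_task` option decides which training entry point the script calls, since later hunks substitute it into `espnet2.bin.${asr_task}_train`. A minimal Python sketch of that mapping follows. `train_module` and `VALID_ASR_TASKS` are hypothetical names, and the explicit validation is an assumption added for illustration: the shell script in this diff substitutes the value without checking it.

```python
# The two values named in the option's help text.
VALID_ASR_TASKS = ("asr", "asr_transducer")


def train_module(asr_task: str = "asr") -> str:
    """Build the training module name the way asr.sh interpolates it."""
    if asr_task not in VALID_ASR_TASKS:
        # Assumption: fail early instead of letting `python -m` fail later.
        raise ValueError(
            f"asr_task must be one of {VALID_ASR_TASKS}, got {asr_task!r}"
        )
    return f"espnet2.bin.{asr_task}_train"


print(train_module())                 # default task
print(train_module("asr_transducer"))  # standalone Transducer task
```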
@@ -974,7 +976,7 @@ if ! "${skip_train}"; then

# shellcheck disable=SC2046,SC2086
${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \
-${python} -m espnet2.bin.asr_train \
+${python} -m espnet2.bin.${asr_task}_train \
--collect_stats true \
--use_preprocessor true \
--bpemodel "${bpemodel}" \
@@ -1101,7 +1103,7 @@ if ! "${skip_train}"; then
--num_nodes "${num_nodes}" \
--init_file_prefix "${asr_exp}"/.dist_init_ \
--multiprocessing_distributed true -- \
-${python} -m espnet2.bin.asr_train \
+${python} -m espnet2.bin.${asr_task}_train \
--use_preprocessor true \
--bpemodel "${bpemodel}" \
--token_type "${token_type}" \
@@ -1191,24 +1193,24 @@ if ! "${skip_eval}"; then
# 2. Generate run.sh
log "Generate '${asr_exp}/${inference_tag}/run.sh'. You can resume the process from stage 12 using this script"
mkdir -p "${asr_exp}/${inference_tag}"; echo "${run_args} --stage 12 \"\$@\"; exit \$?" > "${asr_exp}/${inference_tag}/run.sh"; chmod +x "${asr_exp}/${inference_tag}/run.sh"
-if "${use_k2}"; then
-# Now only _nj=1 is verified if using k2
-asr_inference_tool="espnet2.bin.asr_inference_k2"
-
-_opts+="--is_ctc_decoding ${k2_ctc_decoding} "
-_opts+="--use_nbest_rescoring ${use_nbest_rescoring} "
-_opts+="--num_paths ${num_paths} "
-_opts+="--nll_batch_size ${nll_batch_size} "
-_opts+="--k2_config ${k2_config} "
-else
-if "${use_streaming}"; then
-asr_inference_tool="espnet2.bin.asr_inference_streaming"
-elif "${use_maskctc}"; then
-asr_inference_tool="espnet2.bin.asr_inference_maskctc"
-else
-asr_inference_tool="espnet2.bin.asr_inference"
-fi
-fi
+
+inference_bin_tag=""
+if [ ${asr_task} == "asr" ]; then
+if "${use_k2}"; then
+# Now only _nj=1 is verified if using k2
+inference_bin_tag="_k2"
+
+_opts+="--is_ctc_decoding ${k2_ctc_decoding} "
+_opts+="--use_nbest_rescoring ${use_nbest_rescoring} "
+_opts+="--num_paths ${num_paths} "
+_opts+="--nll_batch_size ${nll_batch_size} "
+_opts+="--k2_config ${k2_config} "
+elif "${use_streaming}"; then
+inference_bin_tag="_streaming"
+elif "${use_maskctc}"; then
+inference_bin_tag="_maskctc"
+fi
+fi

for dset in ${test_sets}; do
_data="${data_feats}/${dset}"
@@ -1250,7 +1252,7 @@ if ! "${skip_eval}"; then
rm -f "${_logdir}/*.log"
# shellcheck disable=SC2046,SC2086
${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/asr_inference.JOB.log \
-${python} -m ${asr_inference_tool} \
+${python} -m espnet2.bin.${asr_task}_inference${inference_bin_tag} \
--batch_size ${batch_size} \
--ngpu "${_ngpu}" \
--data_path_and_name_and_type "${_data}/${_scp},speech,${_type}" \
@@ -1261,7 +1263,7 @@ if ! "${skip_eval}"; then
${_opts} ${inference_args} || { cat $(grep -l -i error "${_logdir}"/asr_inference.*.log) ; exit 1; }

# 3. Calculate and report RTF based on decoding logs
-if [ $asr_inference_tool == "espnet2.bin.asr_inference" ]; then
+if [ ${asr_task} == "asr" ] && [ -z ${inference_bin_tag} ]; then
log "Calculating RTF & latency... log: '${_logdir}/calculate_rtf.log'"
rm -f "${_logdir}"/calculate_rtf.log
_fs=$(python3 -c "import humanfriendly as h;print(h.parse_size('${fs}'))")
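
The inference changes above replace the single `asr_inference_tool` variable with a task name plus an optional suffix. The sketch below restates that selection logic, and the RTF condition as it appears in this revision of the diff, in Python. The function names `inference_module` and `computes_rtf` are hypothetical and exist only for this illustration.

```python
from typing import Tuple


def inference_module(
    asr_task: str,
    use_k2: bool = False,
    use_streaming: bool = False,
    use_maskctc: bool = False,
) -> Tuple[str, str]:
    """Return (module name, inference_bin_tag) following the new asr.sh logic."""
    tag = ""
    # The suffix is only considered for the plain 'asr' task, with the
    # same precedence as the shell if/elif chain: k2, streaming, maskctc.
    if asr_task == "asr":
        if use_k2:
            tag = "_k2"
        elif use_streaming:
            tag = "_streaming"
        elif use_maskctc:
            tag = "_maskctc"
    return f"espnet2.bin.{asr_task}_inference{tag}", tag


def computes_rtf(asr_task: str, tag: str) -> bool:
    """RTF is reported only for the default 'asr' inference binary."""
    return asr_task == "asr" and tag == ""


module, tag = inference_module("asr_transducer", use_streaming=True)
print(module, computes_rtf("asr_transducer", tag))
```

Under this logic the `use_streaming` switch has no effect on the module name for `asr_transducer`. The integration test in this PR instead requests streaming decoding through `--inference-args "--streaming true ..."`.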
2 changes: 1 addition & 1 deletion espnet2/asr/espnet_model.py
@@ -14,7 +14,7 @@
from espnet2.asr.preencoder.abs_preencoder import AbsPreEncoder
from espnet2.asr.specaug.abs_specaug import AbsSpecAug
from espnet2.asr.transducer.error_calculator import ErrorCalculatorTransducer
-from espnet2.asr.transducer.utils import get_transducer_task_io
+from espnet2.asr_transducer.utils import get_transducer_task_io
from espnet2.layers.abs_normalize import AbsNormalize
from espnet2.torch_utils.device_funcs import force_gatherable
from espnet2.train.abs_espnet_model import AbsESPnetModel
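
This hunk moves `get_transducer_task_io` from `espnet2.asr.transducer.utils` to `espnet2.asr_transducer.utils`. For downstream code that has to work on both sides of the move, one possible approach is to try the new location first and fall back to the old one. This is only a sketch: `import_first` is a hypothetical helper and nothing in the PR suggests ESPnet provides such a shim.

```python
import importlib
from typing import Any, Optional, Sequence


def import_first(name: str, module_candidates: Sequence[str]) -> Optional[Any]:
    """Return `name` from the first candidate module that provides it."""
    for module_name in module_candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or one of its dependencies) is unavailable; try the next.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    return None


# New location first, old location as a fallback.
get_transducer_task_io = import_first(
    "get_transducer_task_io",
    ["espnet2.asr_transducer.utils", "espnet2.asr.transducer.utils"],
)
if get_transducer_task_io is None:
    print("get_transducer_task_io not found in either location")
```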
2 changes: 1 addition & 1 deletion espnet2/asr/transducer/beam_search_transducer.py
@@ -8,7 +8,7 @@
import torch

from espnet2.asr.decoder.abs_decoder import AbsDecoder
-from espnet2.asr.transducer.joint_network import JointNetwork
+from espnet2.asr_transducer.joint_network import JointNetwork
from espnet2.lm.transformer_lm import TransformerLM
from espnet.nets.pytorch_backend.transducer.utils import (
is_prefix,
49 changes: 0 additions & 49 deletions espnet2/asr/transducer/utils.py

This file was deleted.

Empty file.