
Why T>=S constraint? #20

Closed
BuaaAlban opened this issue Dec 6, 2022 · 15 comments

Comments

@BuaaAlban

Why do we need this constraint? In a regular RNN-T, the joint network normally emits many blank symbols, in which case T > S. But it's also possible that S > T, e.g. if we emit at least one non-blank symbol for each encoder frame.

Actually, I have run into this:

    File "/rnnt_related/rnnt-mlperf-training/model_rnnt.py", line 203, in fast_joint
      simple_loss, (px_grad, py_grad) = fast_rnnt.rnnt_loss_simple(
    File "/anaconda3/envs/fast-rnnt/lib/python3.8/site-packages/fast_rnnt-1.2-py3.8-linux-x86_64.egg/fast_rnnt/rnnt_loss.py", line 282, in rnnt_loss_simple
      px, py = get_rnnt_logprobs(
    File "/anaconda3/envs/fast-rnnt/lib/python3.8/site-packages/fast_rnnt-1.2-py3.8-linux-x86_64.egg/fast_rnnt/rnnt_loss.py", line 149, in get_rnnt_logprobs
      assert T >= S, (T, S)
    AssertionError: (272, 274)

@csukuangfj

In a regular rnnt

As you have mentioned, that is for regular RNN-T.


The version we are using is not regular. It has the same condition as CTC training, i.e., S <= T.
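As a minimal illustrative sketch (not fast_rnnt code), the condition can be checked per utterance before calling the loss, using per-utterance frame and token counts:

    import torch

    # Illustrative only: keep utterances whose token count S does not exceed
    # the (subsampled) frame count T, the condition this loss requires.
    def keep_mask(frame_lens: torch.Tensor, token_lens: torch.Tensor) -> torch.Tensor:
        # frame_lens: (N,) encoder frames T per utterance
        # token_lens: (N,) non-blank symbols S per utterance
        return token_lens <= frame_lens

    frame_lens = torch.tensor([272, 300])
    token_lens = torch.tensor([274, 120])
    print(keep_mask(frame_lens, token_lens))  # tensor([False,  True])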

@csukuangfj

Here is the paper about fast_rnnt:

https://arxiv.org/pdf/2206.13236.pdf

@csukuangfj

Here is the code in icefall that filters out data that doesn't satisfy S <= T:
https://github.com/k2-fsa/icefall/blob/f13cf61b05432a989e6a42c95b843a56639bcbde/egs/librispeech/ASR/pruned_transducer_stateless2/train.py#L958

        # In ./conformer.py, the conv module uses the following expression
        # for subsampling
        T = ((c.num_frames - 1) // 2 - 1) // 2
        tokens = sp.encode(c.supervisions[0].text, out_type=str)

        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {c.id} from training. "
                f"Number of frames (before subsampling): {c.num_frames}. "
                f"Number of frames (after subsampling): {T}. "
                f"Text: {c.supervisions[0].text}. "
                f"Tokens: {tokens}. "
                f"Number of tokens: {len(tokens)}"
            )
            return False
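For context, in the linked train.py that snippet is the tail of a per-cut predicate (remove_short_and_long_utt in that file), which lhotse applies to the training CutSet. A rough sketch of the wiring, assuming a loaded SentencePiece model `sp` and a lhotse CutSet `train_cuts`:

    def remove_short_and_long_utt(c):
        # ... duration-based checks elided ...
        # Frame count after the ~4x subsampling done in ./conformer.py
        T = ((c.num_frames - 1) // 2 - 1) // 2
        tokens = sp.encode(c.supervisions[0].text, out_type=str)
        if T < len(tokens):
            return False  # would violate S <= T after subsampling
        return True

    train_cuts = train_cuts.filter(remove_short_and_long_utt)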

@BuaaAlban
Author

Thanks for your fast reply.
I have tried to modify my code based on this example; I think it's a normal transducer. I can filter the data as you said to make it work. I just wonder why we have this limitation (for optimization? Actually, I read your paper yesterday but didn't notice this condition; I will double-check it). Could I just comment out this assert to make the pruned loss work just like rnnt_loss (as in torchaudio or warp-transducer)?

@desh2608

@BuaaAlban as you noted, this constraint is indeed not required for the "regular" RNNT topology. Only if you train with the "modified" topology, where you are constrained to emit exactly 1 symbol per time frame, will this constraint be required. We have a PR here (k2-fsa/k2#1149) to remove this constraint from k2. I will also make a similar PR for fast_rnnt.
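To make the distinction concrete, here is a small illustrative sketch (not library code) of why the two topologies differ on this point:

    def alignable(S: int, T: int, topology: str) -> bool:
        """Can S non-blank symbols be aligned to T encoder frames?"""
        if topology == "regular":
            # A regular RNN-T path has T advance-on-blank steps plus S symbol
            # emissions; any number of symbols may be emitted per frame,
            # so S > T is legal.
            return T > 0 or S == 0
        if topology == "modified":
            # Exactly one symbol (blank or non-blank) per frame, so at most
            # T non-blank symbols fit, i.e. S <= T.
            return S <= T
        raise ValueError(f"unknown topology: {topology}")

    print(alignable(274, 272, "regular"))   # True  -- the case in this issue
    print(alignable(274, 272, "modified"))  # False -- hence the assertion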

@arkadyark

@desh2608 are you still planning to make this PR? This would be very useful for my work!

@desh2608

desh2608 commented May 1, 2023

@arkadyark sorry, I forgot to actually push the changes. BTW, I believe Dan fixed some OOM issues in the pruned transducer loss in k2, and those fixes haven't yet been merged into fast_rnnt. So you may want to make those changes yourself.

@arkadyark

Thanks! Which changes are you referring to? Looking through recent changes to rnnt_loss.py I don't see anything there.

@desh2608

desh2608 commented May 1, 2023

Thanks! Which changes are you referring to? Looking through recent changes to rnnt_loss.py I don't see anything there.

Check k2-fsa/k2#1177 and k2-fsa/k2#1183

@danpovey
Collaborator

danpovey commented May 2, 2023

Ah yes. Arkady, it would be great if you could make a PR to fast_rnnt with those changes; I had forgotten about that. If not, LMK and I'll ask someone here.

@arkadyark

I would love to contribute those back, but unfortunately there's a fairly involved open-source contribution process at my organization that would take a while; it'd probably be best to find someone else to do so.

However, I did test this out locally and re-ran the benchmarking at https://github.com/csukuangfj/transducer-loss-benchmarking. The results look really good: peak memory usage goes from 3820 all the way down to 1182 (!), and from 2647 to 835 when sorting utterances. Step time (on my hardware) went from 343k to 280k us.

Pretty cool! Always gotta be careful with those torch.gathers.

@arkadyark

Hey @danpovey , just wanted to follow up - is anybody able to make those changes here?

@danpovey
Collaborator

@pkufool could you please have a look at this?

@pkufool
Contributor

pkufool commented Jul 19, 2023

@danpovey Yifan has already made PRs here, #26 and #24; you can merge them.

@pkufool
Contributor

pkufool commented Aug 25, 2023

Closed by #29.

@pkufool pkufool closed this as completed Aug 25, 2023