Add TDT loss kernel by ebezzam · Pull Request #46048 · huggingface/transformers

ebezzam · 2026-05-19T02:32:47Z

What does this PR do?

Add a kernel for faster TDT loss computation (and thus faster training).

Corresponding PR in kernels-community: huggingface/kernels-community#882

Implement Token-and-Duration Transducer (TDT) decoding for Parakeet models, extending the existing CTC-only support. This adds ParakeetForTDT with greedy TDT decoding in generate(), per-token timestamp generation, and full integration with AutoModelForTDT, processors, and ASR pipeline.
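For readers new to TDT, the greedy decoding loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the `ParakeetForTDT` code: `greedy_tdt_decode`, `toy_joint`, and the `joint` callable signature are hypothetical, and the real model works on batched tensors with a prediction-network state. At each frame the joint output gives a token and a duration; the duration says how many frames to skip, and the frame index at emission time serves as the token timestamp.

```python
from typing import Callable, List, Sequence, Tuple


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_tdt_decode(
    num_frames: int,
    joint: Callable[[int, Sequence[int]], Tuple[Sequence[float], Sequence[float]]],
    durations: Sequence[int],
    blank_id: int,
    max_symbols_per_step: int = 10,
) -> List[Tuple[int, int, int]]:
    """Return (token_id, start_frame, duration) triples for one utterance."""
    history: List[int] = []  # non-blank tokens emitted so far
    output: List[Tuple[int, int, int]] = []
    t = 0
    symbols_at_frame = 0
    while t < num_frames:
        token_scores, duration_scores = joint(t, history)
        token = argmax(token_scores)
        skip = durations[argmax(duration_scores)]
        if token != blank_id:
            output.append((token, t, skip))  # timestamp = current frame
            history.append(token)
            symbols_at_frame += 1
        elif skip == 0:
            skip = 1  # a blank must always advance, otherwise the loop stalls
        if skip == 0 and symbols_at_frame >= max_symbols_per_step:
            skip = 1  # guard against emitting forever on a single frame
        if skip > 0:
            t += skip
            symbols_at_frame = 0
    return output


def toy_joint(t: int, history: Sequence[int]):
    """Hypothetical joint network: two tokens, then blanks (blank_id = 2)."""
    if len(history) == 0:
        return [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]  # token 1, duration 2
    if len(history) == 1:
        return [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]  # token 0, duration 0
    return [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]  # blank, duration 1


decoded = greedy_tdt_decode(4, toy_joint, durations=[0, 1, 2], blank_id=2)
```

The `max_symbols_per_step` guard only matters when the model keeps predicting non-blank tokens with duration 0; without it, such a model would never leave the frame.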

- Use -100 label padding for training (HF convention)
- Fix timestamp recording in inner blank-seeking loop
- Add max_symbols_per_step guard matching NeMo
- Clean up decoding loop
- Add TDT training example to docs
- Use setUpClass for TDT integration tests
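As a rough illustration of the -100 label-padding convention mentioned in the list, the sketch below derives per-sample target lengths and replaces the padding with a valid id before the labels reach a transducer loss. The helper name `split_padded_labels` and the `fill_id` default are assumptions made for this example; the PR may handle the padding differently.

```python
import torch


def split_padded_labels(labels: torch.Tensor, pad_value: int = -100, fill_id: int = 0):
    """Split HF-style padded labels into (targets, target_lengths).

    `labels` has shape (batch, max_len); positions equal to `pad_value` are padding.
    """
    mask = labels != pad_value
    target_lengths = mask.sum(dim=-1)  # number of real tokens per sample
    # Replace padding with a valid id so later indexing never sees -100.
    targets = labels.masked_fill(~mask, fill_id)
    return targets, target_lengths


labels = torch.tensor([[5, 7, -100], [3, -100, -100]])
targets, target_lengths = split_padded_labels(labels)
```

The loss should then ignore everything past `target_lengths`, so the value of `fill_id` does not affect the result.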


Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


github-actions · 2026-05-19T06:46:07Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: parakeet

ebezzam · 2026-05-19T06:46:36Z

    "finegrained-fp8": {"repo_id": "kernels-community/finegrained-fp8", "version": 1},
    "deep-gemm": {"repo_id": "kernels-community/deep-gemm", "version": 1},
    "sonic-moe": {"repo_id": "kernels-community/sonic-moe", "revision": "ep-support"},
+    "tdt-loss": {"repo_id": "eustlb/tdt-loss", "revision": "v1"},


Related comment: https://github.com/huggingface/transformers/pull/44171/changes#r3094638013
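For context, here is a minimal sketch of how an allow-listed mapping like the one in the diff can be combined with a fallback path. `KERNEL_MAPPING`, `resolve_kernel`, and the injected `loader` callable are hypothetical names for illustration only; this is not the transformers integration code, and the loader stands in for whatever function actually fetches a kernel from the Hub.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical allow-list that mirrors the shape of the entries in the diff.
KERNEL_MAPPING: Dict[str, Dict[str, Any]] = {
    "tdt-loss": {"repo_id": "eustlb/tdt-loss", "revision": "v1"},
}


def resolve_kernel(
    name: str,
    loader: Callable[..., Any],
    mapping: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Optional[Any]:
    """Return the loaded kernel, or None so the caller can use the PyTorch path."""
    mapping = KERNEL_MAPPING if mapping is None else mapping
    spec = mapping.get(name)
    if spec is None:
        return None  # not allow-listed: never load arbitrary kernels
    try:
        return loader(**spec)
    except Exception:
        return None  # no network, no compatible GPU build, and so on
```

A caller would check the return value and run the pure PyTorch loss whenever it is `None`.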

ebezzam · 2026-05-19T06:50:14Z

+        Verify that ParakeetForTDT loss matches NeMo's TDT loss (sigma=0) for both
+        the CUDA kernel and the pure PyTorch implementation.
        reproducer: https://gist.github.com/883ea42bf7d8ce2af42f3055627476a7


Should we also test for sigma != 0?
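To make the sigma question concrete, below is a small pure-Python sketch of a TDT forward pass for a single utterance. It rests on assumptions that should be checked against NeMo: sigma is treated as a constant subtracted from every token log-probability (duration log-probabilities are left untouched), blanks must use a duration greater than zero, and a path must end with a blank that lands exactly on the last frame. `tdt_loss` and its argument layout are hypothetical, not the kernel's interface.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(values: Sequence[float]) -> float:
    if not values:
        return NEG_INF
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits: Sequence[float]) -> List[float]:
    lse = logsumexp(logits)
    return [x - lse for x in logits]


def tdt_loss(token_logits, duration_logits, targets, durations, blank_id, sigma=0.0):
    """Negative log-likelihood; logits are indexed as [t][u][class]."""
    T, U = len(token_logits), len(targets)
    # Token log-probs are shifted by sigma (assumed under-normalization).
    tok = [[[lp - sigma for lp in log_softmax(token_logits[t][u])]
            for u in range(U + 1)] for t in range(T)]
    dur = [[log_softmax(duration_logits[t][u]) for u in range(U + 1)]
           for t in range(T)]
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            terms = []
            for i, d in enumerate(durations):
                src = t - d
                if src < 0:
                    continue
                if d > 0:  # blank: skip d frames, keep the label position
                    terms.append(alpha[src][u] + tok[src][u][blank_id] + dur[src][u][i])
                if u > 0:  # emit targets[u - 1], then skip d frames
                    terms.append(alpha[src][u - 1] + tok[src][u - 1][targets[u - 1]]
                                 + dur[src][u - 1][i])
            alpha[t][u] = logsumexp(terms)
    # Termination: a final blank whose duration lands exactly on frame T.
    final = [alpha[T - d][U] + tok[T - d][U][blank_id] + dur[T - d][U][i]
             for i, d in enumerate(durations) if 0 < d <= T]
    return -logsumexp(final)
```

Because different alignments contain different numbers of token emissions, a nonzero sigma generally reweights paths rather than shifting the loss by a constant, which is why a sigma != 0 comparison would exercise code that the sigma=0 test cannot.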

HuggingFaceDocBuilderDev · 2026-05-19T07:01:27Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hainan Xu and others added 30 commits February 20, 2026 09:45

parakeet tdt integration

fa7d6e0

Add expected outputs for TDT, small fixes.

fa36657

Separate CTC and TDT generate outputs.

05e2e34

Work with auto device, better init,

bb5ff33

Test timestamps and expose token duration.

9ec79b0

Add reproducer link.

33f128e

revert: restore lasr generated files to original state

b33002f

warn: torchaudio rnnt_loss does not train duration head

48b39dd

Relax timestamp test, and test nits.

e9f23ab

feat: TDT training

e2b97aa

chore: for cuda detection and run without patching

6b9fc73

Equivalent timestamp processing as Nemo, and various nits/cleanup.

6c879bc

Merge branch 'parakeet-tdt' of github.com:lmaksym/transformers into parakeet-tdt

149e17f

Simplify durations config.

36bfa63

Update training examples.

2df0ccc

chore: enable parallelism

388c6d3

chore: performance optimization

08b2b55

fix: formatting

0c4e05a

Doc and testing nits

1ddd804

Use active mask from current step, and nits.

f512670

Better pre-allocate.

07d8e35

TDT has separate pad token and blank token.

fab050a

Merge branch 'main' into parakeet-tdt

c438565

Regenerate lasr.

86d980c

Merge branch 'parakeet-tdt' of github.com:lmaksym/transformers into parakeet-tdt

895c4a0

Style checks and nits

ab21380

Nits, put back ctc loss test

d0141d5

More standard model output.

f7529d4

eustlb and others added 21 commits April 16, 2026 16:23

kernel loss

7cc9d2e

test loss integration

e753eab

push to hub pr

ed3fa4d

integration tests to rely fully on transcripts

ab66b23

update fixtures

a5ba0c6

we don't need to monkey patch with numba anymore!

48279a6

fix pipeline usage

1d7680d

nit

59ddced

fix usage

31490d1

Pass through tests and examples: improve kernel fallback, update with nvidia checkpoint, style checks.

d8eb1b6

Update checkpoint

1f1b912

Merge branch 'main' into parakeet-tdt

9ab08d1

Add TDT to mapping after merge.

fd9f8b1

Fix lasr generate test.

136f676

Output attention mask if labels provided for computing loss.

833d289

Apply suggestion from @ArthurZucker

a1c62a1

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Improve ParakeetTDTDecoderCache definition and usage.

8683570

Remove tuple parsing.

1d4b0f4

processor refactor

a418eca

Merge branch 'parakeet-tdt' of github.com:lmaksym/transformers into parakeet-tdt

5d0c631

Update conversion.

5c603c1

ebezzam marked this pull request as draft May 19, 2026 02:33

ebezzam and others added 2 commits May 19, 2026 15:42

Merge branch 'main' into tdt_loss_kernel

09ba99c

Modular after merge.

e743b2d


Don't allow all kernels.

8d09cb6


ebezzam mentioned this pull request May 19, 2026

tdt-loss: add TDT loss kernel huggingface/kernels-community#882

ebezzam wants to merge 82 commits into huggingface:main from ebezzam:tdt_loss_kernel


4 participants
