fix(nemo): extract Hypothesis.text for TDT/RNNT ASR models #10012

Merged: mudler merged 2 commits into mudler:master from fqscfqj:fix/nemo-tdt-hypothesis-text (May 26, 2026)

fqscfqj (Contributor) commented May 26, 2026

Problem

NeMo Parakeet TDT models (e.g. parakeet-tdt-0.6b-v3) deployed via the nemo backend always produce empty transcription output.

Root Cause

NeMo's transcribe() returns different types depending on the model architecture:

  • CTC models (e.g. parakeet-ctc-0.6b): List[str], which works fine
  • TDT/RNNT models (e.g. parakeet-tdt-0.6b-v3): List[Hypothesis], where the decoded text lives in the Hypothesis.text attribute

The backend code at backend/python/nemo/backend.py line 105 did:

text = results[0]  # Hypothesis object, not a str!

This assigned the entire Hypothesis dataclass to the protobuf string field. When protobuf tried to serialize it, it either raised a TypeError (caught by the except block → returns empty) or silently converted to an empty string. Either way, the transcript was always blank.
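
The failure mode can be reproduced without NeMo or protobuf installed. The sketch below is illustrative only: `Hypothesis` and `TranscriptReply` are hypothetical stand-ins for the NeMo dataclass and the generated protobuf message, and it models only the TypeError path, assuming the backend's except block swallows the error and returns an empty transcript.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Stand-in for NeMo's Hypothesis; only the `text` field is modelled."""
    text: str
    score: float = 0.0


class TranscriptReply:
    """Hypothetical stand-in for a protobuf message with a string field.

    Like protobuf's Python API, it rejects non-str assignments with TypeError.
    """

    def __init__(self) -> None:
        self._text = ""

    @property
    def text(self) -> str:
        return self._text

    @text.setter
    def text(self, value: object) -> None:
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        self._text = value


def transcribe_old(results: list) -> str:
    """Mimic the pre-fix logic: assign results[0] directly to the field."""
    reply = TranscriptReply()
    try:
        reply.text = results[0]
    except TypeError:
        # Assumed to mirror the backend's except block: the error is
        # swallowed and the transcript stays empty.
        pass
    return reply.text


# CTC-style output (List[str]) survives; TDT/RNNT-style output does not.
print(repr(transcribe_old(["hello world"])))
print(repr(transcribe_old([Hypothesis("hello world")])))
```

The first call prints 'hello world' and the second prints '', which matches the blank transcripts described above.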

Fix

Check the return type and extract .text from Hypothesis objects when present:

result = results[0]
if isinstance(result, str):
    text = result
elif hasattr(result, 'text'):
    text = result.text if result.text else ""
else:
    text = str(result) if result else ""

This is backward-compatible — CTC models still return strings and take the first branch.
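
To exercise the three branches in isolation, the snippet can be wrapped in a helper and fed stand-in values. This is only a sketch: `extract_text_v1` and the `Hypothesis` dataclass below are hypothetical names used for illustration, not code from the backend or from NeMo.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Hypothesis:
    """Stand-in for NeMo's Hypothesis; only `text` is modelled."""
    text: Optional[str] = None


def extract_text_v1(result: Any) -> str:
    """Same branching as the snippet above, returning instead of assigning."""
    if isinstance(result, str):
        # CTC-style result: already a plain string.
        return result
    elif hasattr(result, "text"):
        # Hypothesis-style result: use .text, mapping None/"" to "".
        return result.text if result.text else ""
    else:
        # Anything else: stringify if truthy, otherwise empty.
        return str(result) if result else ""


for sample in ["plain string", Hypothesis("from hypothesis"), Hypothesis(None), None, 42]:
    print(type(sample).__name__, "->", repr(extract_text_v1(sample)))
```

Note that the last branch stringifies any truthy object without a `text` attribute, so an unexpected type would reach the client as its `str()` form.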

Testing

Verified by reading the NeMo source code:

  • nemo/collections/asr/parts/submodules/rnnt_decoding.py: rnnt_decoder_predictions_tensor() always returns List[Hypothesis], even when return_hypotheses=False
  • nemo/collections/asr/models/rnnt_models.py: _transcribe_output_processing() passes this through unchanged

Copilot AI review requested due to automatic review settings May 26, 2026 07:24

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates transcript extraction logic in the NeMo backend to correctly handle different model output types (CTC strings vs. RNNT/TDT Hypothesis objects).

Changes:

  • Adds type-aware handling for results[0] to extract text from either str or an object with a .text attribute.
  • Improves inline documentation clarifying expected return types from different model families.

Comment thread backend/python/nemo/backend.py Outdated
Comment on lines +108 to +111
elif hasattr(result, 'text'):
    text = result.text if result.text else ""
else:
    text = str(result) if result else ""
Comment thread backend/python/nemo/backend.py Outdated
Comment on lines +110 to +111
else:
    text = str(result) if result else ""
fqscfqj added 2 commits May 26, 2026 08:49
CTC models (e.g. parakeet-ctc-0.6b) return List[str] from transcribe(), but
TDT/RNNT models (e.g. parakeet-tdt-0.6b-v3) return List[Hypothesis]
where the decoded text lives in the Hypothesis.text attribute.

Previously, results[0] was assigned directly to the protobuf string
field, causing silent empty output for non-CTC models.

Now checks the return type and extracts .text from Hypothesis objects,
with a safe fallback via getattr().

Use single getattr() call instead of hasattr() + double access,
and return empty string for unknown types instead of str(result)
to avoid leaking internal repr to clients.
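
The revision described in this commit message might look roughly like the following. This is a sketch of the stated idea (one `getattr()` call, empty string for unknown types), not the merged code; `extract_text` and the stand-in `Hypothesis` are hypothetical names, and the final `isinstance` guard is an extra assumption added here so the helper always returns a `str`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Hypothesis:
    """Stand-in for NeMo's Hypothesis; only `text` is modelled."""
    text: Optional[str] = None


def extract_text(result: Any) -> str:
    """Return the transcript for a str or Hypothesis-like result, else ""."""
    if isinstance(result, str):
        return result
    # Single lookup: None when the attribute is missing or unset.
    text = getattr(result, "text", None)
    # Unknown types and non-string values yield "" rather than a repr.
    return text if isinstance(text, str) else ""


for sample in ["plain string", Hypothesis("from hypothesis"), Hypothesis(None), object(), 42]:
    print(type(sample).__name__, "->", repr(extract_text(sample)))
```

With this shape, an object that has no `text` attribute produces an empty transcript instead of its internal representation.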
@fqscfqj fqscfqj force-pushed the fix/nemo-tdt-hypothesis-text branch from da11db9 to 2e07e48 Compare May 26, 2026 08:50
mudler (Owner) commented May 26, 2026

thanks!

@mudler mudler enabled auto-merge (squash) May 26, 2026 20:10
@mudler mudler merged commit df7623f into mudler:master May 26, 2026
57 checks passed