What's new
Added support for Adalat AI's whisper-medium-ml-rmft, a Malayalam-specific Whisper fine-tune trained with the R-MFT (Reverse Multi-Stage Fine-Tuning) recipe. The model now has first-class support in GenSRT's chunked inference pipeline, alongside SMC's vegam.
Compared to vegam, on the same hardware and chunking plan:
| Measurement | adalat-ai R-MFT | vegam | Direction |
|---|---|---|---|
| Vividh-ASR Broadcast WER | 31.66 | 55.10 | ~43% relative improvement |
| Vividh-ASR Global WER | 39.64 | 53.39 | ~26% relative improvement |
| Runtime on 197s Malayalam news clip (RTX 3060 Ti) | 155s | 296s | ~2× faster |
| Mid-character truncation rate | 2.6% | 10.2% | ~4× cleaner |
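
The Direction column can be re-derived from the two measurement columns. The snippet below is a minimal sketch of that arithmetic; the helper names `relative_reduction` and `ratio` are illustrative and not part of GenSRT.

```python
def relative_reduction(new: float, baseline: float) -> float:
    """Fraction by which `new` is lower than `baseline` (for lower-is-better metrics)."""
    return (baseline - new) / baseline


def ratio(baseline: float, new: float) -> float:
    """How many times larger `baseline` is than `new`."""
    return baseline / new


# Figures copied from the table above (R-MFT first, vegam as the baseline).
broadcast = relative_reduction(31.66, 55.10)
global_wer = relative_reduction(39.64, 53.39)
speedup = ratio(296, 155)        # runtime in seconds on the 197s clip
truncation = ratio(10.2, 2.6)    # mid-character truncation rate, percent

print(f"Broadcast WER reduction: {broadcast:.1%}")
print(f"Global WER reduction:    {global_wer:.1%}")
print(f"Runtime speedup:         {speedup:.2f}x")
print(f"Truncation rate ratio:   {truncation:.2f}x")
```

The runtime ratio works out to a little under 2, which the table rounds to "~2×".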
Native-reader review on broadcast Malayalam content (FIFA World Cup news clip) confirms a noticeable quality improvement over vegam, especially on English code-switching, place names, and entity names.
How to use
In GenSRT's footer Model dropdown:
- Click New...
- Paste: adalat-ai/ct2-whisper-medium-ml-rmft
- Click Validate & Add
The model auto-routes to chunked inference and auto-pins source language to Malayalam, same as vegam.
Known limitation
The R-MFT model emits more chunk-tail hallucination fragments (typically 1-second cues containing common verb endings) than vegam. These are perceptually invisible in playback but visible in the SRT cue list. A UI affordance for surfacing and cleaning them is planned for v1.3.
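Until then, suspect cues can be listed with a small script run against the exported SRT. The following is a rough sketch of one possible heuristic, not GenSRT's planned feature: it flags cues that last at most one second and contain very little text. The thresholds (`max_ms=1000`, `max_chars=12`), the function names, and the sample cues are all assumptions made for illustration, and the flagged cues should be reviewed by a reader before anything is deleted.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (some tools write a dot instead of a comma).
STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp to integer milliseconds."""
    h, m, s, ms = map(int, STAMP.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse well-formed SRT text into cues; malformed blocks are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (to_ms(part) for part in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:]).strip()))
    return cues


def flag_tail_fragments(cues, max_ms=1000, max_chars=12):
    """Return cues that are both short in duration and short in text (assumed heuristic)."""
    return [c for c in cues
            if c.end_ms - c.start_ms <= max_ms and len(c.text) <= max_chars]


# Made-up sample for demonstration only.
SAMPLE = """\
1
00:00:00,000 --> 00:00:04,000
ഇന്നത്തെ പ്രധാന വാർത്തകൾ

2
00:00:04,000 --> 00:00:05,000
ചെയ്തു

3
00:00:05,000 --> 00:00:09,500
ലോകകപ്പ് ഫുട്ബോൾ മത്സരങ്ങൾ ആരംഭിച്ചു
"""

suspects = flag_tail_fragments(parse_srt(SAMPLE))
for cue in suspects:
    print(cue.index, cue.text)
```

Durations are kept as integer milliseconds so that a cue of exactly one second compares cleanly against the threshold.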
Acknowledgments
CT2 conversion published by Adalat AI in cooperation with GenSRT. Thanks to Kavya Manohar for the conversion and for technical guidance on Whisper's CT2 decoder length limit (the 224-token clamp on text tokens that motivated the chunked inference approach in v1.2.0).
Paper: Vividh-ASR: A Robust ASR Benchmark for Indic Languages (arXiv:2605.13087) by Juvekar, Manohar, et al.
Download
gensrt-install-1.2.1.exe — Windows 7z self-extracting installer. Requirements: Windows 10/11, NVIDIA GPU recommended (CPU fallback works).