Releases: mountlord/GenSRT

Release-1.2.01

18 Jun 04:06

What's new

Added support for Adalat AI's whisper-medium-ml-rmft, a Malayalam-specific Whisper fine-tune trained with the R-MFT (Reverse Multi-Stage Fine-Tuning) recipe. The model is now fully supported in GenSRT's chunked inference pipeline, alongside SMC's vegam.

Compared to vegam, on the same hardware and chunking plan:

| Measurement | adalat-ai R-MFT | vegam | Direction |
|---|---|---|---|
| Vividh-ASR Broadcast WER | 31.66 | 55.10 | ~43% relative improvement |
| Vividh-ASR Global WER | 39.64 | 53.39 | ~26% relative improvement |
| Runtime on 197s Malayalam news clip (RTX 3060 Ti) | 155s | 296s | ~2× faster |
| Mid-character truncation rate | 2.6% | 10.2% | ~4× cleaner |
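
For readers who want to check the Direction column, the arithmetic can be reproduced with a few lines of Python. This is only a sketch of how such figures are usually derived (relative reduction for error rates, a simple ratio for runtime and truncation rate); the function names are illustrative and not part of GenSRT.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline


def ratio(baseline: float, candidate: float) -> float:
    """How many times larger `baseline` is than `candidate`."""
    return baseline / candidate


# Figures copied from the comparison above (vegam is the baseline).
broadcast_wer = relative_reduction(55.10, 31.66)
global_wer = relative_reduction(53.39, 39.64)
runtime_speedup = ratio(296, 155)
truncation_ratio = ratio(10.2, 2.6)

print(f"Broadcast WER reduction: {broadcast_wer:.1%}")
print(f"Global WER reduction:    {global_wer:.1%}")
print(f"Runtime speedup:         {runtime_speedup:.2f}x")
print(f"Truncation ratio:        {truncation_ratio:.2f}x")
```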

Native-reader review on broadcast Malayalam content (FIFA World Cup news clip) confirms a noticeable quality improvement over vegam, especially on English code-switching, place names, and entity names.

How to use

In GenSRT's footer Model dropdown:

  1. Click New...
  2. Paste: adalat-ai/ct2-whisper-medium-ml-rmft
  3. Click Validate & Add

The model auto-routes to chunked inference and auto-pins source language to Malayalam, same as vegam.

Known limitation

The R-MFT model emits more chunk-tail hallucination fragments (typically 1-second cues containing common verb endings) than vegam. These are effectively invisible during playback but do appear in the SRT cue list. A UI affordance for surfacing and cleaning them is planned for v1.3.
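
Until that lands, one way to spot candidate fragments by hand is to list the very short cues in the generated SRT. The sketch below is a rough heuristic and not GenSRT's planned cleanup: it assumes a standard SRT layout (index line, timing line, text lines), the one-second threshold is taken from the description above, and `short_cues` is a hypothetical helper. It will also flag legitimate short cues, so the output is a review list rather than a delete list.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345
STAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_millis(stamp: str) -> int:
    """Convert an SRT timestamp to integer milliseconds."""
    h, m, s, ms = map(int, STAMP.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def short_cues(srt_text: str, max_millis: int = 1000):
    """Return (cue_index, text) for cues no longer than `max_millis`."""
    flagged = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete cue
        start, end = lines[1].split("-->")
        if to_millis(end) - to_millis(start) <= max_millis:
            flagged.append((int(lines[0]), " ".join(lines[2:])))
    return flagged


sample = """1
00:00:01,000 --> 00:00:04,200
Hello and welcome

2
00:00:04,200 --> 00:00:05,200
ing
"""
print(short_cues(sample))
```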

Acknowledgments

CT2 conversion published by Adalat AI in cooperation with GenSRT — thanks to Kavya Manohar for the conversion and for the technical guidance on Whisper's CT2 decoder length limit (the 224-token text-token clamp that motivated the chunked inference approach in v1.2.0).

Paper: Vividh-ASR: A Robust ASR Benchmark for Indic Languages (arxiv 2605.13087) by Juvekar, Manohar, et al.

Download

gensrt-install-1.2.1.exe — Windows 7z self-extracting installer. Requirements: Windows 10/11, NVIDIA GPU recommended (CPU fallback works).

Release-1.2.00

15 Jun 01:46

GenSRT v1.2 — Chunked inference for fine-tuned Whisper models

What's new

Chunked inference for fine-tuned Whisper models. Community fine-tunes like smcproject/vegam-whisper-medium-ml were practically unusable on long-form audio in v1.1: they would transcribe the first ~6-8 seconds and silently drop the rest. v1.2 solves this with silent-boundary chunked inference. Audio is sliced along naturally detected pauses, each chunk is transcribed independently, and the per-chunk results are assembled into a single SRT with original timestamps preserved.

For Malayalam users with vegam, this typically produces 2-3× more transcribed content than running the same model without chunking. The chunked path engages automatically when a fine-tuned model is selected — no configuration required.
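
The idea can be sketched in a few lines. This is an illustration under stated assumptions, not GenSRT's actual code: pause detection is taken as given (a list of `(start, end)` silence intervals in seconds), the 25-second target length is invented for the example, and `plan_chunks`, `assemble`, and `srt_time` are hypothetical names. The transcriber is passed in as a callable that returns cues with chunk-local timestamps.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def plan_chunks(pauses, duration, max_len=25.0):
    """Greedily cut at pause midpoints, keeping chunks under max_len when possible.

    A stretch with no pause for longer than max_len stays as one long chunk.
    """
    cuts = sorted((s + e) / 2.0 for s, e in pauses) + [duration]
    chunks, start, prev = [], 0.0, None
    for cut in cuts:
        if prev is not None and cut - start > max_len:
            chunks.append((start, prev))  # close the chunk at the last good pause
            start = prev
        prev = cut
    chunks.append((start, duration))
    return chunks


def assemble(chunks, transcribe):
    """Transcribe each chunk independently and shift cues back to global time."""
    cues = []
    for start, end in chunks:
        for cue in transcribe(start, end):
            cues.append(Cue(cue.start + start, cue.end + start, cue.text))
    return cues


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


# Stand-in transcriber: one cue per chunk, half a second into the chunk.
def fake_transcribe(start, end):
    return [Cue(0.5, 1.5, f"chunk starting at {start:.0f}s")]


chunks = plan_chunks([(9, 11), (19, 21), (29, 31), (44, 46)], duration=60.0)
cues = assemble(chunks, fake_transcribe)
for i, cue in enumerate(cues, 1):
    print(f"{i}\n{srt_time(cue.start)} --> {srt_time(cue.end)}\n{cue.text}\n")
```

Cutting at the midpoint of a pause keeps a little silence on both sides of each chunk, which is one plausible way to avoid clipping the first or last word.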

ASR engine factory. A pluggable engine layer under gensrt/asr/ mirrors the existing translation factory pattern. Two engines ship in v1.2: Whisper (Multilingual) for the built-in OpenAI Whisper sizes and Whisper (Monolingual) for chunked inference on fine-tunes. Future engines (e.g. IndicConformer) can slot in alongside without changes to the pipeline.

Other improvements

  • Auto-Detect now works correctly with monolingual fine-tunes. Models in the known-monolingual registry (currently the vegam variants from SMC and Kurian Benoy's namespace) use their registered training language directly, skipping the per-chunk language detection that produces unreliable results on fine-tuned models. For unknown custom Whisper models, language is detected on the first chunk and reused for the rest.
  • Translation batching fixed for Indic scripts. Google Translate calls now budget by URL-encoded byte count rather than Unicode character count. Previously, batches of Malayalam (or Hindi, Tamil, Bengali — any 3-byte UTF-8 script) blew past Google's URL length limit and silently fell back to MyMemory.
  • In-player subtitle display works after long-running operations. Several <track> element state issues in CEF/WebView2 (pywebview's renderer) were causing subtitles not to display in the player overlay after Generate completed. Replacing the <track> element on each refresh sidesteps the issue.
  • Bundled ffmpeg and ffprobe. No separate ffmpeg install required on target machines (in fact, an outdated ffmpeg on the system PATH is bypassed in favor of the bundled one).
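
To illustrate the translation batching fix above: a Malayalam character is one Unicode code point but three UTF-8 bytes, and each byte becomes a three-character `%XX` escape in a URL, so a batch sized by character count can be roughly nine times longer on the wire than expected. The sketch below budgets by URL-encoded length instead. The budget value, the newline separator, and the function names are assumptions for the example; the real limit and GenSRT's batching code are not shown in these notes.

```python
from urllib.parse import quote


def encoded_len(text: str) -> int:
    """Length of `text` once percent-encoded for use in a URL."""
    return len(quote(text, safe=""))


def batch_by_encoded_size(lines, budget):
    """Group lines so each batch's URL-encoded size stays within `budget`.

    Assumes lines in a batch are joined with a newline, which is also encoded.
    A single line larger than the budget still gets a batch of its own.
    """
    separator_cost = encoded_len("\n")
    batches, current, used = [], [], 0
    for line in lines:
        cost = encoded_len(line) + separator_cost
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        batches.append(current)
    return batches


print(len("മ"), encoded_len("മ"))  # character count vs URL-encoded length
lines = ["മമമ"] * 4
batches = batch_by_encoded_size(lines, budget=70)
print([len(b) for b in batches])
```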

Acknowledgments

GenSRT's chunked inference path was developed against vegam-whisper-medium-ml from Swathanthra Malayalam Computing (SMC). Kavya Manohar, Leena G Pillai, and Elizabeth Sherly's analysis of Indic-script ASR evaluation pitfalls (arxiv 2409.02449) shaped how we think about quality measurement for these models. AI4Bharat's OIWER benchmark (arxiv 2603.00941) provides the most rigorous Malayalam ASR comparison currently published.

Known limitations

  • Vegam occasionally emits a phrase from earlier in the audio at chunk tails — visible as substring overlap with the previous cue's text. A post-processor for this is candidate work for v1.3.
  • Whisper's tokenizer can stop generating mid-character on Indic scripts. A
    at the end of a subtitle line is GenSRT signaling this honestly rather than masking it; the text before the
    is accurate. See
    docs/INVESTIGATIONS.md
    
    for the full story.
  • IndicConformer-based ASR was evaluated during v1.2 development and deferred, because vegam with chunked inference proved sufficient for the release quality bar. The investigation arc is documented in docs/INVESTIGATIONS.md and can be re-opened if quality complaints arise.
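
As a rough idea of what a post-processor for the first limitation could look for, the sketch below finds the longest tail of a cue that already appears in the previous cue's text. It is a hypothetical heuristic and not the planned v1.3 implementation: `repeated_tail` and the `min_chars` threshold are invented for illustration, and comparison is by plain code points, which is a simplification for Indic scripts.

```python
def repeated_tail(prev_text: str, text: str, min_chars: int = 8) -> str:
    """Return the longest suffix of `text` (at least `min_chars` long)
    that also occurs somewhere in `prev_text`, or "" if there is none."""
    # i = 0 tries the whole cue first; later iterations try shorter suffixes.
    for i in range(len(text) - min_chars + 1):
        suffix = text[i:]
        if suffix in prev_text:
            return suffix
    return ""


prev_cue = "the match ended in a draw"
this_cue = "fans left early in a draw"
overlap = repeated_tail(prev_cue, this_cue)
print(repr(overlap))
```

A cleanup pass could then trim or flag the overlapping tail; whether to trim automatically or only surface it for review is a design choice this sketch leaves open.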

Getting started

Download gensrt-install.exe from the assets below and run it in a folder of your choice. GenSRT will be installed in a gensrt subfolder of that folder. From there, run gensrt.exe. See user_guide.html next to the executable for full usage.

Requirements: Windows 10 or 11, a CUDA-capable NVIDIA GPU (~2 GB VRAM is plenty), and a stable internet connection for the first-run model download.

For Malayalam transcription: select smcproject/vegam-whisper-medium-ml-int8_float16 from the footer Model dropdown. The model auto-downloads on first use (~1.5 GB) and is cached for subsequent runs.

Release-1.1.00

10 Jun 23:03

Added the ability to burn subtitles into the video using an ffmpeg filter. Added a new HTML user guide next to the executable.
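
For context, burning subtitles in with ffmpeg is commonly done with its `subtitles` video filter. The notes above do not say which filter or options GenSRT uses, so the sketch below is an assumption: it only builds a typical command line, `burn_in_command` is a hypothetical helper, and the file names are placeholders. It assumes simple relative paths; Windows drive-letter paths need extra escaping inside filter arguments, as described in the ffmpeg documentation.

```python
def burn_in_command(video: str, subtitles: str, output: str) -> list:
    """Build an ffmpeg argv that renders `subtitles` onto `video`.

    The video stream is re-encoded (required for burn-in); audio is copied.
    """
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={subtitles}",  # libass-based subtitle rendering
        "-c:a", "copy",
        output,
    ]


cmd = burn_in_command("clip.mp4", "clip.srt", "clip_subbed.mp4")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```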