High-quality Turkish Dizi transcription – issues with segmentation, alignment, and diarization #2762

abnatan39-dot · 2026-04-16T16:12:00Z

I’m building a pipeline to transcribe Turkish TV series (Dizi) and generate high-quality, well-synchronized Turkish SRT subtitles.
My goal is very high accuracy in three areas:

Timing (precise sync with speech)
Sentence segmentation (natural subtitle breaks)
Speaker separation

Current pipeline:

Extract audio from video using FFmpeg
Separate audio into vocals and background (non-vocals)
Run WhisperX transcription on the vocals track
Perform alignment
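
For reference, the FFmpeg extraction step can be sketched roughly as below. This is a minimal illustration, not my exact command: the function names, file names, and the default sample rate and channel count are assumptions. If the vocal-separation tool expects full-bandwidth stereo input, pass its native rate and `channels=2` here, then downsample the vocals track afterwards.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_extract_cmd(video_path: str, wav_path: str,
                             sample_rate: int = 16000, channels: int = 1) -> list[str]:
    """Build an FFmpeg command that extracts an uncompressed WAV track from a video.

    The defaults (16 kHz, mono) are assumptions chosen for speech recognition input.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", video_path,          # input video
        "-vn",                     # drop the video stream
        "-ac", str(channels),      # number of output audio channels
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-c:a", "pcm_s16le",       # 16-bit PCM, no lossy re-encoding
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> Path:
    """Run FFmpeg (must be on PATH) and return the path of the extracted WAV file."""
    subprocess.run(build_ffmpeg_extract_cmd(video_path, wav_path), check=True)
    return Path(wav_path)
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to log or inspect the exact arguments per episode.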

Transcription model:

WhisperX with large-v3

Alignment models tested:

ozcangundes/wav2vec2-large-xlsr-53-turkish
mpoyraz/wav2vec2-xls-r-300m-cv7-turkish
Cosmobillian/turkish_whisper_for_noisy_datas_v1

I also tested with and without diarization.

Issues I’m encountering:

Poor silence-based segmentation: WhisperX does not split segments properly on pauses. Long chunks of speech remain merged even when there are clear silences.
Multiple speakers in the same subtitle line: Different speakers are often grouped into a single subtitle line instead of being separated.
Broken subtitle flow between speakers: In some cases, one speaker starts within another speaker’s subtitle line, causing the text to break unnaturally into the next subtitle.
Diarization not effective enough: Enabling diarization does not significantly improve speaker separation.
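
To make the segmentation problem concrete, here is a sketch of the behavior I am after, written as a post-processing pass over word-level timestamps rather than anything WhisperX does itself. It assumes each aligned word is a dict with `word`, `start`, `end`, and an optional `speaker` key, and that every word has timestamps (in practice some words may come back untimed and would need extra handling). The function name and the `max_gap` and `max_chars` thresholds are hypothetical values for illustration.

```python
def split_words(words, max_gap=0.6, max_chars=84):
    """Group aligned words into subtitle blocks.

    A new block starts when the pause before a word is longer than max_gap
    seconds, when the speaker label changes, or when adding the word would
    push the block text past max_chars characters.
    Assumes every word dict has "word", "start", and "end".
    """
    blocks = []
    current = None
    for w in words:
        speaker = w.get("speaker")
        if current is not None:
            gap = w["start"] - current["end"]
            too_long = len(current["text"]) + 1 + len(w["word"]) > max_chars
            if gap > max_gap or speaker != current["speaker"] or too_long:
                blocks.append(current)  # close the block at the boundary
                current = None
        if current is None:
            current = {"start": w["start"], "end": w["end"],
                       "speaker": speaker, "text": w["word"]}
        else:
            current["end"] = w["end"]
            current["text"] += " " + w["word"]
    if current is not None:
        blocks.append(current)
    return blocks
```

With this kind of pass, a speaker change or a clear pause always ends the current block, so one block never carries two speakers.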

Goal:

Produce clean, professional-grade Turkish SRT subtitles with:

Accurate timing and alignment
Natural sentence splitting
Clear speaker separation (ideally one speaker per subtitle block)
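
For clarity, this is the output shape I mean by an SRT block, sketched as a small serializer. It assumes blocks are dicts with `start` and `end` in seconds plus a `text` field; the function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    ms = max(0, round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(blocks) -> str:
    """Serialize subtitle blocks to SRT text, numbering entries from 1."""
    entries = []
    for index, block in enumerate(blocks, start=1):
        entries.append(
            f"{index}\n"
            f"{format_timestamp(block['start'])} --> {format_timestamp(block['end'])}\n"
            f"{block['text'].strip()}\n"
        )
    # Entries are separated by a single blank line.
    return "\n".join(entries)
```

When writing the result to disk, opening the file with `encoding="utf-8"` keeps Turkish characters such as ş, ğ, and ı intact.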

Questions:

Are there recommended configurations or preprocessing steps specifically for Turkish (Dizi-style content)?
Which alignment model gives the best results for Turkish in WhisperX?
How can I improve silence-based segmentation?
Are there best practices for combining WhisperX with diarization to achieve reliable speaker separation?
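
On the diarization question, what I have in mind is labeling each word (not each segment) with a speaker, roughly as in the sketch below. It is an illustration under assumptions, not WhisperX's own implementation: diarization output is taken to be a list of `(start, end, speaker)` turns, words are dicts with `start` and `end`, and the function name is hypothetical.

```python
def assign_speakers(words, turns):
    """Label each word with the diarization turn it overlaps the most.

    words: list of dicts with "start" and "end" in seconds.
    turns: list of (start, end, speaker) tuples from a diarization step.
    Words that overlap no turn get speaker None.
    """
    labeled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            # Length of the intersection of the word and the turn (negative if disjoint).
            overlap = min(w["end"], t_end) - max(w["start"], t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({**w, "speaker": best_speaker})
    return labeled
```

Word-level labels matter here because a segment that spans a speaker change has no single correct speaker, which may be part of why segment-level diarization has not helped much in my tests.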

Thanks in advance for your help 🙏

Replies: 0 comments
