No speaker labels in txt format with diarization enabled #801
Comments
In case anyone is interested...
Can you please elaborate on the text format? Are you using whisperx on the command line or as a Python library? Could you share an example snippet and your conclusions about this?
@SeeknnDestroy,
The Python that I included above is the workaround I implemented for the time being. It takes the "srt" output and simplifies it to a "txt" format that includes SPEAKER labels, like the example I included above.
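For readers who want something to start from, here is a minimal sketch of that kind of SRT-to-TXT conversion. It is not the script referred to above. It assumes that diarized SRT cues carry the speaker as a `[SPEAKER_00]: ` prefix on the text line, and the function name `srt_to_speaker_txt` is made up for illustration.

```python
import re

# Assumption: each diarized cue's text starts with "[SPEAKER_XX]: ".
SPEAKER_PREFIX = re.compile(r"^\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.*)$")


def srt_to_speaker_txt(srt_text: str) -> str:
    """Reduce SRT content to one 'SPEAKER: text' line per cue (hypothetical helper)."""
    lines_out = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # Locate the timestamp row; everything after it is the cue text.
        ts_index = next((i for i, row in enumerate(rows) if "-->" in row), None)
        if ts_index is None:
            continue
        text = " ".join(rows[ts_index + 1:])
        if not text:
            continue
        match = SPEAKER_PREFIX.match(text)
        if match:
            lines_out.append(f"{match.group('speaker')}: {match.group('text')}")
        else:
            # No speaker prefix found: keep the text unchanged.
            lines_out.append(text)
    return "\n".join(lines_out)


sample = (
    "1\n00:00:00,000 --> 00:00:02,000\n[SPEAKER_00]: Hello there.\n\n"
    "2\n00:00:02,000 --> 00:00:04,000\n[SPEAKER_01]: Hi.\n"
)
print(srt_to_speaker_txt(sample))
```

Cues without a recognizable speaker prefix are passed through as plain text, so the same function also works on non-diarized SRT files.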
@veenified Currently the WriteTXT class writes only the transcript text to the file. We can modify it as follows:

    class WriteTXT(ResultWriter):
        extension: str = "txt"

        def write_result(self, result: dict, file: TextIO, options: dict):
            for segment in result["segments"]:
                start = format_timestamp(segment["start"])
                end = format_timestamp(segment["end"])
                # Segments without an assigned speaker fall back to "Unknown".
                speaker = segment.get("speaker", "Unknown")
                text = segment["text"].strip()
                # One tab-separated line per segment: start, end, speaker, text.
                print(f"{start}\t{end}\t{speaker}\t{text}", file=file, flush=True)

The output
@nkilm This works great! I would suggest keying off the I tried to do this and submit a pull request, but I am failing to pass the diarization flag/parameter through to utils.py as an option. |
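One possible way around passing a flag through to utils.py is to infer diarization from the data itself: if any segment carries a `speaker` key, emit labels, otherwise write plain text. The sketch below is a standalone illustration of that idea, not WhisperX's code. The function `write_txt` and its `with_speakers` parameter are hypothetical, and the `[SPEAKER]: text` layout is just one choice of format.

```python
import io
from typing import Optional, TextIO


def write_txt(result: dict, file: TextIO, with_speakers: Optional[bool] = None) -> None:
    """Write one line per segment, adding speaker labels only when they exist."""
    segments = result.get("segments", [])
    if with_speakers is None:
        # Assumption: diarization ran if at least one segment has a speaker.
        with_speakers = any("speaker" in seg for seg in segments)
    for seg in segments:
        text = seg["text"].strip()
        if with_speakers:
            speaker = seg.get("speaker", "Unknown")
            print(f"[{speaker}]: {text}", file=file, flush=True)
        else:
            print(text, file=file, flush=True)


buf = io.StringIO()
write_txt(
    {"segments": [{"text": " Hello.", "speaker": "SPEAKER_00"}, {"text": " Hi there."}]},
    buf,
)
print(buf.getvalue(), end="")
```

Passing `with_speakers=True` or `with_speakers=False` explicitly overrides the inference, which would be the hook for a command-line option if one were wired through.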
Can you please explain more about what you are trying to achieve?
Is the PR still open? I'll see if I can help.
Hi, I also use Example:
I have been using WhisperX for transcribing multi-speaker audio files and I enabled diarization to distinguish between different speakers. However, I noticed that the TXT format output does not include speaker labels, which are crucial for my use case to identify who is speaking at any given time.
Could you provide some insights on why the speaker labels are missing in the TXT output when diarization is enabled? Is this an intended behavior or a potential oversight? Additionally, if this feature is not currently supported, are there any plans to include speaker labels in future updates of the TXT format output?
Thank you for your assistance and for the great work on this tool!
PS: Here's a sample command if it helps...
whisperx --hf_token <your_hf_token> --print_progress True --language en --diarize --compute_type int8 voice_chat.mp4 -o ~/transcriptions -f txt --min_speakers 4 --max_speakers 12