Feature requests for foreign-language translation transcripts -- diarization (diarisation), unique names & multiple output options #1849
tapearchives
started this conversation in
Ideas
Replies: 1 comment
-
@tapearchives, you can use speechlib, a Python package I created (https://pypi.org/project/speechlib/). It can do transcription, speaker diarization, and speaker recognition together and give you a transcript with actual speaker names!
-
If you are implementing any foreign-language speaker identification, please find a way to have Whisper produce both the original-language transcript and the translated one, and to uniquely identify each speaker. The output should be a single file containing all the languages heard, with an option to split the unique speakers (or languages) into separate output files.
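One way this could look in practice, as a rough sketch only: as far as I know, the openai-whisper `transcribe()` call accepts `task="transcribe"` (original language) and `task="translate"` (into English), and each result carries a `"segments"` list with `"start"`, `"end"`, and `"text"` keys. Running both passes gives two segment lists whose boundaries may not line up exactly, so the sketch below pairs them by time overlap. The names `overlap` and `pair_bilingual` are hypothetical helpers, not part of Whisper.

```python
from typing import Any, Dict, List

# Assumed segment shape: {"start": float, "end": float, "text": str},
# e.g. taken from the "segments" list of two separate Whisper passes
# (one with task="transcribe", one with task="translate").
Segment = Dict[str, Any]


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time span shared by two segments (0 if none)."""
    return max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))


def pair_bilingual(original: List[Segment], translated: List[Segment]) -> List[dict]:
    """For each original-language segment, attach the translated segment
    that overlaps it most in time (hypothetical helper)."""
    paired = []
    for seg in original:
        best = max(translated, key=lambda t: overlap(seg, t), default=None)
        has_match = best is not None and overlap(seg, best) > 0
        paired.append({
            "start": seg["start"],
            "end": seg["end"],
            "original": seg["text"].strip(),
            "english": best["text"].strip() if has_match else "",
        })
    return paired
```

Each row then holds the original wording and its English rendering side by side, which is enough to write one combined bilingual file. Overlap matching is a simplification: if the two passes segment the audio very differently, one translated segment may be attached to several original ones.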
When dealing with an English speaker and a foreign translator, or vice versa, or with a panel of the same...
Of course it's critical to be able to identify unique speakers. Speaker diarization is very important.
A nice polishing feature for diarization would be a post-transcription call to an LLM to relabel "Speaker 1" with the actual person's name. Before the final text file is saved, a call could be made to an LLM with the full transcript to try to find every speaker's name and title. A speaker's name or title is often given in the introduction of a recording, or used during the discussion. Using it instead of "Speaker 1" would increase the value of the transcript significantly.
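A minimal sketch of that relabeling step, under some assumptions of mine: the transcript uses lines of the form `Speaker N: text`, and the LLM is asked to answer with a JSON object mapping labels to names. `ask_llm` is a hypothetical callable standing in for whatever LLM client you use; `build_prompt` and `relabel_speakers` are also made-up names.

```python
import json
import re
from typing import Callable


def build_prompt(transcript: str) -> str:
    """Ask for a JSON mapping such as {"Speaker 1": "Dr. Ana Ruiz"}."""
    return (
        "Below is a diarized transcript. Return only a JSON object mapping "
        "each speaker label to the person's name or title if it is stated "
        "in the transcript, or null if it is unknown.\n\n" + transcript
    )


def relabel_speakers(transcript: str, ask_llm: Callable[[str], str]) -> str:
    """Replace 'Speaker N' line labels with names suggested by the LLM.

    ask_llm is a placeholder: it takes a prompt and returns the model's reply.
    """
    try:
        mapping = json.loads(ask_llm(build_prompt(transcript)))
    except json.JSONDecodeError:
        return transcript  # unusable reply: keep the generic labels
    if not isinstance(mapping, dict):
        return transcript

    def swap(match: re.Match) -> str:
        label = match.group(0)
        name = mapping.get(label)
        # Keep the generic label when the model had no confident answer.
        return name.strip() if isinstance(name, str) and name.strip() else label

    # Only touch labels at the start of a line, directly before the colon.
    return re.sub(r"^Speaker \d+(?=:)", swap, transcript, flags=re.MULTILINE)
```

Falling back to the original label whenever the reply is missing, null, or not valid JSON means a bad LLM answer can never make the transcript worse than the plain "Speaker 1" version. The suggested names should still be reviewed, since a model can attribute a name to the wrong speaker.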
Also very useful is the full back-and-forth transcription of a speaker and their translator, with the option to transcribe only one person's speech, or to write each individual's transcription to a separate output file, e.g. "Recording XYZ - Speaker 1", "Recording XYZ - Speaker 2", etc.
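To illustrate the output options, here is a small sketch that assumes the diarized transcript is already available as a list of `(speaker, text)` turns. It writes one combined file plus one file per speaker, following the "Recording XYZ - Speaker 1" naming above. `write_transcripts` is a hypothetical helper and the `.txt` extension is my assumption.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable, List, Tuple


def write_transcripts(
    turns: Iterable[Tuple[str, str]], recording_name: str, out_dir: str
) -> List[Path]:
    """Write the full back-and-forth transcript and one file per speaker.

    Returns the written paths, combined file first. Speaker labels are used
    verbatim in file names, so they are assumed to be filesystem-safe.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    turns = list(turns)

    # Group each speaker's text, preserving the order of first appearance.
    by_speaker = defaultdict(list)
    for speaker, text in turns:
        by_speaker[speaker].append(text)

    combined = out / f"{recording_name}.txt"
    combined.write_text(
        "".join(f"{speaker}: {text}\n" for speaker, text in turns),
        encoding="utf-8",
    )

    written = [combined]
    for speaker, lines in by_speaker.items():
        path = out / f"{recording_name} - {speaker}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

The same grouping would work for splitting by language instead of by speaker, if each turn carried a language tag in place of (or alongside) the speaker label.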
(Output with multiple-language translation could also be useful for training additional translation LLMs, since you'd have a native speaker translating the text.)