--diarize labels everything (speaker ?) #489

SLong97 · 2023-02-09T23:39:52Z

The following command produces the output excerpt below. Can anyone explain why every line is labeled (speaker ?)?
(Using the Windows x64 executable.)

$ ./main -m models/ggml-tiny.bin -f audio/The-Big-Tech-Show_Dropbox-CEO.wav --diarize

Also, why does it repeat the same sentence for a minute and a half?

[00:09:00.000 --> 00:09:10.000]  (speaker ?) Do you say you're more or less likely or there's no difference in advancing the career giving a promotion to somebody who you don't meet and see regularly?
[00:09:10.000 --> 00:09:23.000]  (speaker ?) I'd say, so to the extent the promotions are based on FaceTime, then clearly folks that have any imbalances there are harmful to equity.
[00:09:23.000 --> 00:09:29.000]  (speaker ?) But that said, you know, FaceTime probably shouldn't be using some other process.
[00:09:29.000 --> 00:09:33.000]  (speaker ?) Then just how much have you been physically together with someone.
[00:09:33.000 --> 00:09:43.000]  (speaker ?) I think one of the great things about the distributed world is some of the ways that it does level the playing field is, you know, I'm the CEO of the company.
[00:09:43.000 --> 00:09:47.000]  (speaker ?) But my tile is not any bigger than anyone else's.
[00:09:47.000 --> 00:09:55.000]  (speaker ?) I think we were all learned a lot before the pandemic about how little subtle biases or where you set it a table.
[00:09:55.000 --> 00:10:02.000]  (speaker ?) You know, what you interrupt people and all these little kind of micro patterns actually have a big effect on how you're perceived.
[00:10:02.000 --> 00:10:05.000]  (speaker ?) I think that's a great way to do that.
[00:10:05.000 --> 00:10:08.000]  (speaker ?) I think that's a great way to do that.
[00:10:08.000 --> 00:10:11.000]  (speaker ?) I think that's a great way to do that.
[00:10:11.000 --> 00:10:14.000]  (speaker ?) I think that's a great way to do that.
[00:10:14.000 --> 00:10:17.000]  (speaker ?) I think that's a great way to do that.
[00:10:17.000 --> 00:10:20.000]  (speaker ?) I think that's a great way to do that.
[00:10:20.000 --> 00:10:23.000]  (speaker ?) I think that's a great way to do that.
[00:10:23.000 --> 00:10:26.000]  (speaker ?) I think that's a great way to do that.
[00:10:26.000 --> 00:10:29.000]  (speaker ?) I think that's a great way to do that.
[00:10:29.000 --> 00:10:32.000]  (speaker ?) I think that's a great way to do that.
[00:10:32.000 --> 00:10:35.000]  (speaker ?) I think that's a great way to do that.

Below is the ffmpeg command I use to convert an MP3 file to a 16 kHz stereo WAV file:

command = [
    "ffmpeg",
    "-i", input_file,
    "-ac", "2", # stereo
    "-ar", "16000", # 16kHz
    "-acodec", "pcm_s16le",
    output_file
]
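
If the source MP3 is mono, "-ac 2" typically just copies the same signal into both channels. A minimal sketch for checking that, using only the Python standard library, follows. The function name stereo_channels_identical is hypothetical, and the code assumes a 16-bit PCM stereo WAV such as the one the command above produces.

```python
import array
import wave


def stereo_channels_identical(path):
    """Return True if a 16-bit stereo WAV has the same samples in both channels."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)

    # Stereo PCM is interleaved: left, right, left, right, ...
    left = samples[0::2]
    right = samples[1::2]
    return left == right
```

If this returns True for the converted file, the two channels carry no information that could distinguish one speaker from another.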

strangelearning · 2023-02-14T15:58:04Z

As a complete novice, I wonder whether one can simply "convert" a mono (1-channel) audio file into a stereo one.

I just ran into the same issue: every line is labeled (speaker ?), whoever is speaking. I'm going to download a "natively" stereo audio file and try that.

strangelearning · 2023-02-14T18:27:40Z

I'm aware of that ffmpeg command, but it would seem weird if that actually worked, and in fact it does not work for me. After running it, I unfortunately still get (speaker ?) for each line of the transcription.

ggerganov · 2023-02-15T18:15:13Z

The existing --diarize option is designed only for stereo recordings where one speaker speaks in channel 1 and the other in channel 2. This is a very basic strategy and will not work in the general case.

Converting mono to stereo will not work.
Better diarization might be available in the future.
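
A minimal sketch of the channel-based strategy described above, assuming the label for each segment is chosen by comparing the signal energy of the two channels. The function name guess_speaker, the 1.1 margin, and the energy measure are illustrative assumptions, not a copy of the whisper.cpp implementation.

```python
def guess_speaker(left, right, margin=1.1):
    """Pick a speaker label for one segment of stereo audio.

    left, right: sample sequences for channel 1 and channel 2.
    margin: how much louder one channel must be to win (assumed value).
    """
    energy_left = sum(abs(s) for s in left)
    energy_right = sum(abs(s) for s in right)

    if energy_left > margin * energy_right:
        return "0"  # channel 1 clearly dominates
    if energy_right > margin * energy_left:
        return "1"  # channel 2 clearly dominates
    return "?"      # neither channel dominates, so the speaker is unknown
```

Under these assumptions, a mono recording duplicated into two channels gives equal energy on both sides, so every segment falls through to "?".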

strangelearning · 2023-02-23T21:40:16Z

> The existing --diarize option is designed only for stereo recordings where one speaker speaks in channel 1 and the other in channel 2. This is a very basic strategy and will not work in the general case.
>
> Converting mono to stereo will not work. Better diarization might be available in the future.

Thanks for addressing our concerns and for all of your work on this. It is truly appreciated 🙏

albino1 mentioned this issue Feb 10, 2023

Potential whisper.cpp GPU support via the Const-me Windows Implementation SubtitleEdit/subtitleedit#6651

Closed

ggerganov added the "question" label Feb 15, 2023

ggerganov closed this as completed Feb 15, 2023
