
--diarize labels everything (speaker ?) #489

Closed
SLong97 opened this issue Feb 9, 2023 · 4 comments
Labels
question Further information is requested


SLong97 commented Feb 9, 2023

The following command (using the Windows x64 executable) produces the output snippet below, where every segment is labelled (speaker ?). Can anyone explain why?

$ ./main -m models/ggml-tiny.bin -f audio/The-Big-Tech-Show_Dropbox-CEO.wav --diarize

Also, why does it repeat the same sentence for a minute and a half?

[00:09:00.000 --> 00:09:10.000]  (speaker ?) Do you say you're more or less likely or there's no difference in advancing the career giving a promotion to somebody who you don't meet and see regularly?
[00:09:10.000 --> 00:09:23.000]  (speaker ?) I'd say, so to the extent the promotions are based on FaceTime, then clearly folks that have any imbalances there are harmful to equity.
[00:09:23.000 --> 00:09:29.000]  (speaker ?) But that said, you know, FaceTime probably shouldn't be using some other process.
[00:09:29.000 --> 00:09:33.000]  (speaker ?) Then just how much have you been physically together with someone.
[00:09:33.000 --> 00:09:43.000]  (speaker ?) I think one of the great things about the distributed world is some of the ways that it does level the playing field is, you know, I'm the CEO of the company.
[00:09:43.000 --> 00:09:47.000]  (speaker ?) But my tile is not any bigger than anyone else's.
[00:09:47.000 --> 00:09:55.000]  (speaker ?) I think we were all learned a lot before the pandemic about how little subtle biases or where you set it a table.
[00:09:55.000 --> 00:10:02.000]  (speaker ?) You know, what you interrupt people and all these little kind of micro patterns actually have a big effect on how you're perceived.
[00:10:02.000 --> 00:10:05.000]  (speaker ?) I think that's a great way to do that.
[00:10:05.000 --> 00:10:08.000]  (speaker ?) I think that's a great way to do that.
[00:10:08.000 --> 00:10:11.000]  (speaker ?) I think that's a great way to do that.
[00:10:11.000 --> 00:10:14.000]  (speaker ?) I think that's a great way to do that.
[00:10:14.000 --> 00:10:17.000]  (speaker ?) I think that's a great way to do that.
[00:10:17.000 --> 00:10:20.000]  (speaker ?) I think that's a great way to do that.
[00:10:20.000 --> 00:10:23.000]  (speaker ?) I think that's a great way to do that.
[00:10:23.000 --> 00:10:26.000]  (speaker ?) I think that's a great way to do that.
[00:10:26.000 --> 00:10:29.000]  (speaker ?) I think that's a great way to do that.
[00:10:29.000 --> 00:10:32.000]  (speaker ?) I think that's a great way to do that.
[00:10:32.000 --> 00:10:35.000]  (speaker ?) I think that's a great way to do that.

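Below is the ffmpeg command I use to convert an MP3 file to a 16 kHz stereo WAV file:

import subprocess

input_file = "input.mp3"    # placeholder path
output_file = "output.wav"  # placeholder path

command = [
    "ffmpeg",
    "-i", input_file,
    "-ac", "2",              # stereo
    "-ar", "16000",          # 16 kHz sample rate
    "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
    output_file,
]
subprocess.run(command, check=True)  # raise if ffmpeg exits with an error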
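A quick way to confirm that the converted file really has the properties the ffmpeg command asks for (two channels, 16 kHz, 16-bit PCM) is Python's standard wave module. This is a minimal sketch; the helper names are illustrative, not part of whisper.cpp.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getsampwidth() * 8


def matches_ffmpeg_settings(path):
    """True if the file is 2-channel, 16 kHz, 16-bit PCM (the -ac/-ar/-acodec values above)."""
    return describe_wav(path) == (2, 16000, 16)
```

For example, describe_wav("output.wav") on the file produced above should report (2, 16000, 16). Note that this only checks the container format; it says nothing about whether the two channels contain different audio.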
@strangelearning

As a complete novice, I wonder whether one can simply "convert" a mono (1-channel) audio file into a stereo one.

I just ran into the same issue: every speaker is labelled (speaker ?). I'm going to download a "natively" stereo audio file and try that instead.
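Whether a stereo file is really just upmixed mono can be checked directly by comparing the left and right samples. The sketch below assumes a 16-bit stereo WAV; the function name is hypothetical. A mono file converted with ffmpeg's -ac 2 is expected to come out with the same signal in both channels, so this check should return True for it.

```python
import array
import sys
import wave


def channels_are_identical(path):
    """Return True if a 16-bit stereo WAV has identical samples in both channels."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Frames are interleaved: L, R, L, R, ...
    left, right = samples[0::2], samples[1::2]
    return left == right
```

A "natively" stereo recording with each speaker on a separate microphone and channel should return False here.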


strangelearning commented Feb 14, 2023 via email

ggerganov added the question (Further information is requested) label on Feb 15, 2023
@ggerganov
Owner

The existing --diarize option is designed only for stereo recordings where one speaker speaks in channel 1 and the other in channel 2. This is a very basic strategy and will not work in the general case.

Converting mono to stereo will not work.
Better diarization might be available in the future.
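The channel-based strategy described above can be sketched as follows: for each transcribed segment, compare the energy of the two channels over that segment's time range and label the segment by whichever channel is clearly louder. This is a hedged illustration of the idea, not whisper.cpp's actual code; the function name, the sum-of-absolute-values energy measure and the 1.1 dominance ratio are assumptions. It also shows why an upmixed mono file ends up as (speaker ?): with identical channels, neither side can dominate.

```python
def label_segment(left, right, t0, t1, sample_rate=16000, ratio=1.1):
    """Guess the speaker for the segment [t0, t1) seconds of a two-channel recording.

    left and right are sequences of samples for channel 1 and channel 2.
    Returns "0" if the left channel is clearly louder, "1" if the right
    channel is clearly louder, and "?" if neither dominates.
    """
    i0 = max(0, int(t0 * sample_rate))
    i1 = min(len(left), len(right), int(t1 * sample_rate))
    energy_left = sum(abs(x) for x in left[i0:i1])
    energy_right = sum(abs(x) for x in right[i0:i1])
    if energy_left > ratio * energy_right:
        return "0"
    if energy_right > ratio * energy_left:
        return "1"
    return "?"


# Usage: one second of a loud left channel against a silent right channel.
loud = [1000] * 16000
silent = [0] * 16000
print(label_segment(loud, silent, 0, 1))  # prints "0"
print(label_segment(loud, loud, 0, 1))    # prints "?" (identical channels, as after upmixing mono)
```

The ratio leaves a margin so that small level differences (crosstalk between microphones, for instance) do not flip the label; anything within that margin is reported as unknown rather than guessed.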

@strangelearning


Thanks for addressing our concerns and for all of your work on this. It is truly appreciated 🙏
