Real-time identification of microphone has no result. #155

Closed
eugeneYz opened this issue Jan 25, 2024 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments

@eugeneYz

I record from a microphone, convert the PCM data into a WAV-format stream, and pass it to the ProcessAsync method, but execution stays stuck at this step: await foreach (var result in processor.ProcessAsync(waveStream)).
I cannot obtain any result. I suspect the recording is too short: the total duration of the audio data is only 100 ms.

{
    nAudioHelper.PcmDataAvailable += async (data) =>
    {
        try
        {
            if (data != null && data.Length > 0)
            {
                pcmData = data;
                MemoryStream outStream = new();
                MemoryStream memoryStream = new MemoryStream(pcmData);
                // Wrap the raw PCM bytes as a 16 kHz, 16-bit, mono wave stream.
                WaveFormat waveFormat = new WaveFormat(16000, 16, 1);
                //WaveStream waveStream = new RawSourceWaveStream(pcmData, 0, pcmData.Length, new WaveFormat(16000, 16, 1));
                WaveStream waveStream = new RawSourceWaveStream(memoryStream, waveFormat);
                var pcmStream = WaveFormatConversionStream.CreatePcmStream(waveStream);
                var resampler = new WdlResamplingSampleProvider(pcmStream.ToSampleProvider(), 16000);

                // Write the audio as WAV (with header) into outStream, then rewind it.
                WaveFileWriter.WriteWavFileToStream(outStream, resampler.ToWaveProvider16());
                outStream.Seek(0, SeekOrigin.Begin);

                // Note: waveStream (the raw PCM source) is passed here; the WAV data written above is in outStream.
                await foreach (var result in processor.ProcessAsync(waveStream))
                {
                    string recognizedText = result.Text;
                    // Append the recognized text on the UI thread.
                    Application.Current.Dispatcher.Invoke(() =>
                    {
                        Value += recognizedText;
                    });
                }

@sandrohanea
Owner

There are multiple issues with this approach.

  1. Currently, we don't reliably support real-time identification, as described here: How to handle real-time sound streams #25

  2. If you provide a small chunk of audio (say, half a second) and run ProcessAsync, it is normal for Whisper not to recognize anything, as there was probably no complete word within that half second. Also, whisper.cpp has a minimum duration limit of 1 second: SIGFPE on certain audio files ggerganov/whisper.cpp#39
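
One way to work around the minimum-duration point is to buffer the microphone chunks until at least one second of audio is available, and only then build a WAV stream and call ProcessAsync. A minimal sketch of that buffering step follows. PcmAccumulator and its members are hypothetical names, not part of Whisper.net or NAudio, and the 16 kHz, 16-bit, mono defaults are taken from the WaveFormat in the question. This only addresses chunk length; it does not make real-time recognition reliable.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Whisper.net or NAudio): collects raw PCM
// chunks and reports when at least `minSeconds` of audio has been buffered.
public sealed class PcmAccumulator
{
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly int _minBytes;

    public PcmAccumulator(int sampleRate = 16000, int bitsPerSample = 16,
                          int channels = 1, double minSeconds = 1.0)
    {
        // 16000 Hz * 2 bytes * 1 channel = 32000 bytes per second by default.
        int bytesPerSecond = sampleRate * (bitsPerSample / 8) * channels;
        _minBytes = (int)Math.Ceiling(bytesPerSecond * minSeconds);
    }

    public int BufferedBytes => (int)_buffer.Length;

    // Appends a chunk (null or empty chunks are ignored) and returns true
    // once enough audio is buffered.
    public bool Append(byte[] chunk)
    {
        if (chunk != null && chunk.Length > 0)
        {
            _buffer.Write(chunk, 0, chunk.Length);
        }
        return BufferedBytes >= _minBytes;
    }

    // Returns the buffered PCM bytes and clears the buffer.
    public byte[] Drain()
    {
        byte[] data = _buffer.ToArray();
        _buffer.SetLength(0);
        return data;
    }
}
```

With 100 ms chunks of 16 kHz, 16-bit, mono audio (3200 bytes each), Append returns true on the tenth chunk; the bytes returned by Drain can then be wrapped and written to a WAV stream as in the question. The class is not thread-safe, so calls from an audio callback would need a lock or a single consumer.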
