Real-time identification of microphone has no result. #155

eugeneYz · 2024-01-25T09:21:05Z

I record from a microphone, convert the PCM data into a WAV-format stream, and pass that stream to the ProcessAsync method, but execution always gets stuck at this step: await foreach (var result in processor.ProcessAsync(waveStream)).
No result is ever returned. I suspect the recording is too short: the total duration of the audio data is only 100 ms.

            {
                // Runs once per captured microphone buffer (about 100 ms of PCM).
                nAudioHelper.PcmDataAvailable += async (data) =>
                {
                    try
                    {
                        if (data != null && data.Length > 0)
                        {
                            pcmData = data;
                            MemoryStream outStream = new();
                            MemoryStream memoryStream = new MemoryStream(pcmData);
                            // Raw input format: 16 kHz, 16-bit, mono.
                            WaveFormat waveFormat = new WaveFormat(16000, 16, 1);
                            //WaveStream waveStream = new RawSourceWaveStream(pcmData, 0, pcmData.Length, new WaveFormat(16000, 16, 1));
                            WaveStream waveStream = new RawSourceWaveStream(memoryStream, waveFormat);
                            var pcmStream = WaveFormatConversionStream.CreatePcmStream(waveStream);
                            var resampler = new WdlResamplingSampleProvider(pcmStream.ToSampleProvider(), 16000);

                            // Write the resampled audio to outStream as a WAV file, then rewind it.
                            WaveFileWriter.WriteWavFileToStream(outStream, resampler.ToWaveProvider16());
                            outStream.Seek(0, SeekOrigin.Begin);

                            // Execution never gets past this loop.
                            await foreach (var result in processor.ProcessAsync(waveStream))
                            {
                                string recognizedText = result.Text;
                                // Append the recognized text on the UI thread.
                                Application.Current.Dispatcher.Invoke(() =>
                                {
                                    Value += recognizedText;
                                });
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        // Placeholder handler: the posted snippet is cut off before its catch block.
                        System.Diagnostics.Debug.WriteLine(ex);
                    }
                };
            }


sandrohanea · 2024-02-03T15:24:07Z

There are multiple issues with this approach.

Currently, we don't reliably support real-time recognition, as described here: How to handle real-time sound streams #25
If you provide a small chunk of audio (say, half a second) and run ProcessAsync, it is normal for Whisper not to recognize anything, since there was probably no complete word within that half second. Also, whisper.cpp has a minimum duration limit of 1 second: SIGFPE on certain audio files ggerganov/whisper.cpp#39
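
One way to work around the short-chunk problem is to buffer the incoming PCM data and only run recognition once enough audio has accumulated. The following is a minimal sketch under stated assumptions, not something Whisper.net or NAudio provides: `PcmAccumulator` and its `Append` method are hypothetical names, the 16 kHz / 16-bit / mono format is taken from the snippet above, and the 2-second default threshold is an arbitrary choice above the 1-second minimum mentioned here.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Whisper.net or NAudio): collects small PCM
// chunks until enough audio is buffered to be worth transcribing.
public sealed class PcmAccumulator
{
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly int _minBytes;

    // minSeconds is an assumed threshold; tune it for your use case.
    public PcmAccumulator(int sampleRate = 16000, int bitsPerSample = 16,
                          int channels = 1, double minSeconds = 2.0)
    {
        int bytesPerSecond = sampleRate * (bitsPerSample / 8) * channels;
        _minBytes = (int)(bytesPerSecond * minSeconds);
    }

    // Appends one chunk. Returns true and hands back everything buffered so far
    // once at least minSeconds of audio is available; otherwise returns false.
    public bool Append(byte[] chunk, out byte[] segment)
    {
        if (chunk == null) throw new ArgumentNullException(nameof(chunk));

        _buffer.Write(chunk, 0, chunk.Length);
        if (_buffer.Length < _minBytes)
        {
            segment = Array.Empty<byte>();
            return false;
        }

        segment = _buffer.ToArray();
        _buffer.SetLength(0); // start collecting the next segment
        return true;
    }
}
```

In the PcmDataAvailable handler, each 100 ms buffer would go through `Append`, and only when it returns true would the returned segment be wrapped as WAV and passed to ProcessAsync. This only addresses the minimum-duration limit; words cut at segment boundaries can still be misrecognized.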

sandrohanea closed this as completed Feb 3, 2024

sandrohanea added the duplicate label Feb 3, 2024

zhouwg mentioned this issue Mar 13, 2024

PoC:clean-room implementation of real-time AI subtitle for English online-TV(OTT TV) zhouwg/kantv#64

Closed
