False detection when there is no audio #60

shekit · 2016-04-12T03:01:10Z

Hi,

I am using pocketsphinx and it keeps detecting words which are not being spoken. This is visible in the live demo on the website as well. Even in a silent room, it detects and prints out the keywords repeatedly. Is there any way to prevent this? I basically need it to detect a single word and have built my keyword list consisting of the single keyword. However once I press start, it starts printing it out almost continuously even though nothing has been said. It's basically detecting almost any sound as the keyword.

I have edited this in my live_kws.html file to detect a single word.

var wordList = [["PICO", "P IY K OW"]];
var keywords = [{title:"PICO", g:"PICO"}];

syl22-00 · 2016-04-13T13:39:01Z

You would get more and better answers by reaching out to the pocketsphinx community directly (http://cmusphinx.sourceforge.net/).

For keyword spotting, there are a few parameters you can play with:

-keyphrase                  Keyphrase to spot
-kws                        A file with keyphrases to spot, one per line
-kws_delay        10        Delay to wait for best detection score
-kws_plp          1e-1      Phone loop probability for keyword spotting
-kws_threshold    1         Threshold for p(hyp)/p(alternatives) ratio
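
As a rough sketch of how these flags might be set from JavaScript: recognizer options are commonly passed as `[flag, value]` string pairs. The helper name `buildKwsArgs` and its option names are hypothetical, "1e-35" is only an example value, and the message shape in the trailing comment is an assumption that should be checked against your copy of pocketsphinx.js.

```javascript
// Hypothetical helper: turn a plain options object into the
// [flag, value] string pairs used for recognizer arguments.
function buildKwsArgs(options) {
    var flagNames = {
        threshold: "-kws_threshold",
        delay: "-kws_delay",
        plp: "-kws_plp"
    };
    var args = [];
    Object.keys(flagNames).forEach(function (key) {
        if (options[key] !== undefined) {
            // Values are passed as strings, e.g. "1e-35".
            args.push([flagNames[key], String(options[key])]);
        }
    });
    return args;
}

var kwsArgs = buildKwsArgs({ threshold: "1e-35", delay: 10 });

// Assumed usage (not verified here):
// recognizer.postMessage({ command: 'initialize', data: kwsArgs });
```

Only the options that are actually set end up in the list, so any flag left out keeps its default.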

justinoverton · 2016-05-08T01:03:50Z

I experienced this too. I updated the audioRecord.js file to also output whatever is getting passed to sphinx to the speakers. See below:

    var jolisten = new (window.AudioContext || window.webkitAudioContext)();
    var jobuf = jolisten.createBuffer(1, outputBufferLength, (config.outputSampleRate || 16000));

    worker.onmessage = function(e) {
        if (e.data.error && (e.data.error == "silent")) errorCallback("silent");
        if ((e.data.command == 'newBuffer') && recording) {
            myClosure.consumers.forEach(function(consumer, y, z) {
                consumer.postMessage({ command: 'process', data: e.data.data });
            });

            //S remove this

            var nowbuf = jobuf.getChannelData(0);
            for (var i = 0; i < e.data.data.length; i++) {
                var k = e.data.data[i];
                //This supposedly converts it back to float, but it doesn't matter if you do it or not for the playback
                var f = (k >= 0x8000) ? -(0x10000 - k) / 0x8000 : k / 0x7FFF;
                nowbuf[i] = k;
            }

            var josrc = jolisten.createBufferSource();
            josrc.buffer = jobuf;
            josrc.connect(jolisten.destination);
            josrc.start();
            //E remove this

        }
    };

After much experimentation I've found that the conversion from the microphone's higher sampling rate to 16000 Hz is partly to blame, specifically the step that converts the Float32 samples from JavaScript to the Int16 samples that sphinx wants.

In audioRecorderWorker.js, in the record() method, it looks like this:

    for (var i = 0; i < inputBuffer[0].length; i++) {
        recBuffers.push((inputBuffer[0][i] + inputBuffer[1][i]) * 16383.0);
    }

Basically there's a bunch of loud white-noise in the audio that's getting passed to sphinx. I don't know enough about audio yet to know exactly what to do, but I think maybe a highpass and/or lowpass filter might help.

FYI: If you use the snippet to hear what's coming out of the microphone, you need to use headphones. The feedback will be deafening otherwise.
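
For reference, a minimal sketch of the Float32 to Int16 step with explicit clamping, so that samples outside [-1, 1] cannot overflow the 16-bit range. This is not the project's code: `floatToInt16` is a hypothetical name, and averaging the two channels (instead of summing them and scaling by 16383.0 as above) is a choice made for this sketch.

```javascript
// Hypothetical helper, not part of audioRecorderWorker.js.
// Mixes two Float32 channels to mono and converts to 16-bit PCM.
function floatToInt16(left, right) {
    var out = new Int16Array(left.length);
    for (var i = 0; i < left.length; i++) {
        // Average the channels so the mix stays within [-1, 1].
        var sample = (left[i] + right[i]) / 2;
        // Clamp so that out-of-range input cannot overflow Int16.
        sample = Math.max(-1, Math.min(1, sample));
        // Scale to the signed 16-bit range.
        out[i] = Math.round(sample * 32767);
    }
    return out;
}

var pcm = floatToInt16(
    new Float32Array([0, 0.5, 1, -1, 2]),
    new Float32Array([0, 0.5, 1, -1, 2])
);
```

The last input sample (2) is deliberately out of range to show the clamp: it comes out as 32767 rather than wrapping around.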

justinoverton · 2016-05-08T03:00:15Z

I created a pull request that graphs the wave form and enables the ability to listen to what is passed to sphinx.

justinoverton · 2016-05-10T02:55:42Z

I have determined that a lowpass filter at 800 Hz and a highpass filter at 50 Hz do reduce some of the background noise. However, sphinx is still recognizing random words even when there is no speech. When there is speech, it recognizes whatever it wants to. It doesn't matter whether it's in the normal mode or the keyword spotting mode.
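
A sketch of how such a filter pair could be wired up with the Web Audio API's `BiquadFilterNode`. The function name `insertBandFilters` and its parameters are hypothetical; `createBiquadFilter`, `type`, `frequency.value` and `connect` are standard Web Audio calls.

```javascript
// Hypothetical helper: places a highpass and a lowpass filter between
// a source node and a destination node.
function insertBandFilters(audioContext, source, destination, highpassHz, lowpassHz) {
    var highpass = audioContext.createBiquadFilter();
    highpass.type = "highpass";
    highpass.frequency.value = highpassHz; // attenuate below this frequency

    var lowpass = audioContext.createBiquadFilter();
    lowpass.type = "lowpass";
    lowpass.frequency.value = lowpassHz; // attenuate above this frequency

    // Chain: source -> highpass -> lowpass -> destination.
    source.connect(highpass);
    highpass.connect(lowpass);
    lowpass.connect(destination);
    return { highpass: highpass, lowpass: lowpass };
}

// Assumed browser usage, with micSource created via
// audioContext.createMediaStreamSource(stream) and recorderNode
// standing in for whatever node feeds the recorder:
// insertBandFilters(audioContext, micSource, recorderNode, 50, 800);
```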

I've tried adjusting the operating system's output levels for the mic, and I've tried using the cmusphinx acoustic model, lm, and dict, but neither helps.

I'm at a loss for what to do next.

nshmyrev · 2016-05-10T09:09:47Z

Justin, cmusphinx uses a bandwidth between 100 and 6800 Hz. It also tries to compensate for filtering, but overall any signal processing is usually harmful to accuracy.

To debug pocketsphinx keyword spotting, the tutorial recommends recording a file and experimenting with pocketsphinx_continuous on the desktop until recognition is reliable. You need to select a keyphrase of 3-4 syllables for reliable detection, and you need to configure the threshold appropriately. You can share the recorded file if you have trouble.

Once you have reliable detection on the command line, you can proceed with the JavaScript version.

justinoverton · 2016-05-10T12:38:22Z

Nickolay,

Thanks for the info. I thought about the keyphrase threshold, but I'm experiencing the issue without keywords as well. The issue is occurring on the examples for this project. Is anyone able to confirm that the default example "live.html" works as expected on a specific machine?

Tonight I'll play around with some tuning parameters on the command line.

It would be nice if there was a known working model, lm, etc and the cmu args that would enable a dev to test the feasibility of sphinx for his/her project prior to investing a lot of time into building and tuning a grammar/lm/etc.

justgeek · 2016-06-14T11:58:01Z

Decreasing the microphone boost in my Windows control panel helped, so I think this is definitely an issue with noise being processed. But the question is: how can noise be recognized as words? Isn't there a confidence factor?

nshmyrev · 2016-06-14T15:27:08Z

@justgeek You need to provide more details (configuration, keywords, thresholds, audio data) in order to get help with detection. It is better to ask that on the cmusphinx forum, not here.

seekM · 2016-08-29T08:52:59Z

@justinoverton I'd be interested to know if you could make progress and maybe share your insights.

justinoverton · 2016-08-29T11:18:51Z

@seekM I think it may be an issue where training the model could help. I'm not working on this at the moment though.

jenweber · 2016-12-04T19:16:19Z

For anyone experiencing many false positives or low detection accuracy with keyword search, try grabbing a fresh copy of the minified pocketsphinx file in .pocketsphinx.js/webapp/js/pocketsphinx.js. I believe that older versions were missing essential components for the keyword search detection threshold variable ( -kws_threshold ) to work. With the file as of commit id 67cf722, and after changing the threshold syntax to something like "1e-35" instead of whole numbers, hotword detection was working great for me with very few false positives. When I was using an older copy of the file, hotword detection was poor, yet it would randomly "hear" the hotword in just about any sound.

Check your console for these errors to confirm if your issue is the same as mine. Filter logs by "kws". This is the sign that you need to grab a new file:

ERROR: "cmd_ln.c", line 938: Unknown argument: -kws_threshold

And a closer look may show:
INFO: kws_search.c(405): KWS(beam: -1080, plp: -23, default threshold -524288, delay 10)

A threshold of -524288 would be unbelievably permissive, allowing just about any random noise to be interpreted as the keyword. The useful range of values appears to be somewhere between "1e-50", which is permissive, and "1e-0", which would be very strict. The documentation about this feature on the CMUSphinx site itself is very poor, so I just had to play around with it.
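
The two log checks above could be automated along these lines. This is a sketch only: the function names are hypothetical, and the patterns are based solely on the two log lines quoted in this comment, so other pocketsphinx builds may format them differently.

```javascript
// Hypothetical helpers for scanning captured recognizer log lines.

// True if the build rejected -kws_threshold as an unknown argument.
function thresholdArgRejected(logLines) {
    return logLines.some(function (line) {
        return /Unknown argument: -kws_threshold/.test(line);
    });
}

// Extracts the numbers from a "KWS(beam: ..., plp: ..., ...)" line,
// or returns null if the line does not match.
function parseKwsInfo(line) {
    var m = /KWS\(beam: (-?\d+), plp: (-?\d+), default threshold (-?\d+), delay (\d+)\)/.exec(line);
    if (!m) return null;
    return {
        beam: Number(m[1]),
        plp: Number(m[2]),
        defaultThreshold: Number(m[3]),
        delay: Number(m[4])
    };
}

var sampleLog = [
    'ERROR: "cmd_ln.c", line 938: Unknown argument: -kws_threshold',
    'INFO: kws_search.c(405): KWS(beam: -1080, plp: -23, default threshold -524288, delay 10)'
];
var rejected = thresholdArgRejected(sampleLog);
var kwsInfo = parseKwsInfo(sampleLog[1]);
```

If `rejected` is true, or `kwsInfo.defaultThreshold` is still -524288 after a threshold was supplied, that matches the symptom described above.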

syl22-00 added the question label Apr 13, 2016

justinoverton mentioned this issue May 8, 2016

added some debug capabilities #61

Closed

This was referenced Dec 4, 2016

background noises #74

Closed

Increasing accuracy with multiple keyphrases #65

Open

Fine tune keyword detection demo and documentation #77

Merged

skibulk mentioned this issue Dec 5, 2020

Recognizing Background Noise #89

Open
