False detection when there is no audio #60

Open
shekit opened this issue Apr 12, 2016 · 11 comments

@shekit

shekit commented Apr 12, 2016

Hi,

I am using pocketsphinx and it keeps detecting words which are not being spoken. This is visible in the live demo on the website as well. Even in a silent room, it detects and prints out the keywords repeatedly. Is there any way to prevent this? I basically need it to detect a single word and have built my keyword list consisting of the single keyword. However once I press start, it starts printing it out almost continuously even though nothing has been said. It's basically detecting almost any sound as the keyword.

I have edited this in my live_kws.html file to detect a single word.

var wordList = [["PICO", "P IY K OW"]];
var keywords = [{title:"PICO", g:"PICO"}];
@syl22-00
Owner

You would get more and better answers by reaching out to the pocketsphinx community directly (http://cmusphinx.sourceforge.net/).

For keyword spotting, there are a few parameters you can play with:

Option            Default   Description
-keyphrase                  Keyphrase to spot
-kws                        A file with keyphrases to spot, one per line
-kws_delay        10        Delay to wait for best detection score
-kws_plp          1e-1      Phone loop probability for keyword spotting
-kws_threshold    1         Threshold for p(hyp)/p(alternatives) ratio
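
A minimal sketch of collecting these options as flag/value pairs before the recognizer is initialized. The helper name `buildKwsArgs` and the keys of the options object are hypothetical; only the flag names come from the list above. How the pairs are handed to the recognizer depends on your setup and is not shown here.

```javascript
// Hypothetical helper: turns an options object into [flag, value] string pairs.
// Only the flag names (-kws_threshold, -kws_plp, -kws_delay) come from the list above.
function buildKwsArgs(options) {
    var flagNames = {
        threshold: "-kws_threshold",
        plp: "-kws_plp",
        delay: "-kws_delay"
    };
    var args = [];
    Object.keys(flagNames).forEach(function (key) {
        if (options[key] !== undefined) {
            // Values are kept as strings so that "1e-35" is passed through unchanged.
            args.push([flagNames[key], String(options[key])]);
        }
    });
    return args;
}

var kwsArgs = buildKwsArgs({ threshold: "1e-35", delay: 10 });
```

Keeping the values as strings avoids any reformatting of scientific notation before the value reaches the recognizer.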

@justinoverton

I experienced this too. I updated the audioRecord.js file to also output whatever is getting passed to sphinx to the speakers. See below:

    var jolisten = new (window.AudioContext || window.webkitAudioContext)();
    var jobuf = jolisten.createBuffer(1, outputBufferLength, (config.outputSampleRate || 16000));

    worker.onmessage = function(e) {
        if (e.data.error && (e.data.error == "silent")) errorCallback("silent");
        if ((e.data.command == 'newBuffer') && recording) {
            myClosure.consumers.forEach(function(consumer, y, z) {
                consumer.postMessage({ command: 'process', data: e.data.data });
            });

            //S remove this

            var nowbuf = jobuf.getChannelData(0);
            for (var i = 0; i < e.data.data.length; i++) {
                var k = e.data.data[i];
                //This supposedly converts it back to float, but it doesn't matter if you do it or not for the playback
                var f = (k >= 0x8000) ? -(0x10000 - k) / 0x8000 : k / 0x7FFF;
                //Note: f is computed but not used; the raw value k is what gets written
                nowbuf[i] = k;
            }

            var josrc = jolisten.createBufferSource();
            josrc.buffer = jobuf;
            josrc.connect(jolisten.destination);
            josrc.start();
            //E remove this

        }
    };

After much experimentation I've discovered that the conversion from the microphone's higher sampling rate to 16000 Hz is partly to blame, specifically the part that converts the Float32 samples from JavaScript to the Int16 that sphinx wants.

It looks like this in audioRecorderWorker.js in method record():

    for (var i = 0 ; i < inputBuffer[0].length ; i++) {
        recBuffers.push((inputBuffer[0][i] + inputBuffer[1][i]) * 16383.0);
    }
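
For comparison, here is a minimal sketch of a Float32 to Int16 mono conversion with explicit clamping. It is not the project's code: the function name `floatToInt16Mono` is hypothetical, and it assumes both channels carry samples nominally in [-1, 1]. The original expression is already close to averaging the two channels and scaling to 16 bits, so this mainly adds the clamp and is not claimed to remove the noise.

```javascript
// Sketch: mix two Float32 channels down to mono and convert to 16-bit integers.
// Assumes input samples are nominally in [-1, 1]; the function name is hypothetical.
function floatToInt16Mono(left, right) {
    var out = new Int16Array(left.length);
    for (var i = 0; i < left.length; i++) {
        // Average rather than sum, so two full-scale channels cannot overflow.
        var s = (left[i] + right[i]) / 2;
        // Clamp in case the source delivers values outside [-1, 1].
        s = Math.max(-1, Math.min(1, s));
        out[i] = Math.round(s * 32767);
    }
    return out;
}

var pcm = floatToInt16Mono(
    new Float32Array([1, -1, 0, 2]),
    new Float32Array([1, -1, 0, 2])
);
```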

Basically there's a bunch of loud white-noise in the audio that's getting passed to sphinx. I don't know enough about audio yet to know exactly what to do, but I think maybe a highpass and/or lowpass filter might help.

FYI: If you use the snippet to hear what's coming out of the microphone, you need to use headphones. The feedback will be deafening otherwise.
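
One caveat when listening this way: Web Audio buffers expect floats nominally in [-1, 1], so integer samples written in unscaled would clip heavily on playback. A minimal sketch of the scaling, assuming the buffer sent to sphinx holds signed 16-bit samples (the thread does not confirm this); `int16ToFloat32` is a hypothetical name.

```javascript
// Sketch: scale signed 16-bit samples into the [-1, 1) range used by Web Audio.
// Assumes `samples` holds signed 16-bit integers (e.g. an Int16Array).
function int16ToFloat32(samples) {
    var out = new Float32Array(samples.length);
    for (var i = 0; i < samples.length; i++) {
        out[i] = samples[i] / 32768;
    }
    return out;
}

var floats = int16ToFloat32(new Int16Array([-32768, 0, 16384]));
```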

@justinoverton

I created a pull request that graphs the wave form and enables the ability to listen to what is passed to sphinx.

@justinoverton

I have determined that a lowpass filter at 800 Hz and a highpass filter at 50 Hz do reduce some of the background noise. However, sphinx is still recognizing random words even when there is no speech. When there is speech, it recognizes whatever it wants to. It doesn't matter whether it's in the normal mode or the keyword spotting mode.
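
The thread does not say how these filters were implemented. As an illustration only, here is a sketch of first-order (one-pole) lowpass and highpass filters in plain JavaScript using the same cutoffs; the function names are hypothetical, and a real setup might use the browser's built-in filter nodes instead.

```javascript
// Sketch of first-order (one-pole) filters; an illustration, not the code used in the thread.
function lowPass(samples, cutoffHz, sampleRate) {
    var dt = 1 / sampleRate;
    var rc = 1 / (2 * Math.PI * cutoffHz);
    var alpha = dt / (rc + dt);
    var out = new Float32Array(samples.length);
    var prev = 0;
    for (var i = 0; i < samples.length; i++) {
        // Move the output a fraction alpha of the way toward the input.
        prev = prev + alpha * (samples[i] - prev);
        out[i] = prev;
    }
    return out;
}

function highPass(samples, cutoffHz, sampleRate) {
    var dt = 1 / sampleRate;
    var rc = 1 / (2 * Math.PI * cutoffHz);
    var alpha = rc / (rc + dt);
    var out = new Float32Array(samples.length);
    var prevIn = 0;
    var prevOut = 0;
    for (var i = 0; i < samples.length; i++) {
        // Pass changes in the input through; let constant offsets decay away.
        prevOut = alpha * (prevOut + samples[i] - prevIn);
        prevIn = samples[i];
        out[i] = prevOut;
    }
    return out;
}

// Cutoffs taken from the comment above: 50 Hz highpass, then 800 Hz lowpass.
function bandLimit(samples, sampleRate) {
    return lowPass(highPass(samples, 50, sampleRate), 800, sampleRate);
}
```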

I've tried adjusting the operating system's output levels for the mic, but that doesn't help either. I've tried using the cmusphinx acoustic model, lm, and dict but it doesn't help either.

I'm at a loss for what to do next.

@nshmyrev

Justin, cmusphinx uses a bandwidth between 100 and 6800 Hz. It also tries to compensate for filtering, but overall any signal processing is usually harmful to accuracy.

To debug pocketsphinx keyword spotting, the tutorial recommends that you record a file and experiment with pocketsphinx_continuous on the desktop until you get reliable recognition. You need to select a keyphrase of 3-4 syllables for reliable detection, and you need to configure the threshold appropriately. You can share the recorded file if you have trouble.

Once you have a reliable detection in command line, you can proceed with the javascript version.

@justinoverton

Nickolay,

Thanks for the info. I thought about the keyphrase threshold, but I'm experiencing the issue without keywords as well. The issue is occurring on the examples for this project. Is anyone able to confirm that the default example "live.html" works as expected on a specific machine?

Tonight I'll play around with some tuning parameters on the command line.

It would be nice if there was a known working model, lm, etc and the cmu args that would enable a dev to test the feasibility of sphinx for his/her project prior to investing a lot of time into building and tuning a grammar/lm/etc.

@justgeek

Decreasing the microphone boost in my Windows control panel helped, so I think this is definitely an issue of noise being processed. But the question is: how can noise be recognized as words? Isn't there a confidence factor?

@nshmyrev

@justgeek You need to provide more details (configuration, keywords, thresholds, audio data) in order to get help with detection. It is better to ask that on the cmusphinx forum, not here.

@seekM

seekM commented Aug 29, 2016

@justinoverton I'd be interested to know if you made any progress, and whether you could share your insights.

@justinoverton

@seekM I think it may be an issue where training the model could help. I'm not working on this at the moment though.


@jenweber
Contributor

jenweber commented Dec 4, 2016

For anyone experiencing many false positives or low detection accuracy with keyword search, try grabbing a fresh copy of the minified pocketsphinx file in .pocketsphinx.js/webapp/js/pocketsphinx.js. I believe that older versions were missing essential components needed for the keyword search detection threshold variable ( -kws_threshold ) to work. As of commit id 67cf722, and after changing the threshold syntax to something like "1e-35" instead of whole numbers, hotword detection was working great for me with very few false positives. When I was using an older copy of the file, hotword detection was poor, yet it would randomly "hear" the hotword in just about any sound.

Check your console for these errors to confirm if your issue is the same as mine. Filter logs by "kws". This is the sign that you need to grab a new file:

ERROR: "cmd_ln.c", line 938: Unknown argument: -kws_threshold

And a closer look may show:
INFO: kws_search.c(405): KWS(beam: -1080, plp: -23, default threshold -524288, delay 10)

A threshold of -524288 would be unbelievably permissive, allowing just about any random noise to be interpreted as the keyword. The useful range of values appears to be somewhere between "1e-50", which is permissive, and "1e-0", which would be very strict. The documentation about this feature on the CMUSphinx site itself is very poor, so I just had to play around with it.
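
The check described above can be sketched as a small helper that scans captured log lines for the two signs quoted in this comment. The function name `diagnoseKwsLog` is hypothetical, and how you capture the console output as an array of strings is an assumption left to your setup.

```javascript
// Sketch: scan log lines for the two symptoms quoted above.
// `lines` is assumed to be an array of strings captured from the console.
function diagnoseKwsLog(lines) {
    var result = { unknownThresholdArg: false, defaultThreshold: null };
    lines.forEach(function (line) {
        // Sign 1: the build does not know the -kws_threshold argument.
        if (line.indexOf("Unknown argument: -kws_threshold") !== -1) {
            result.unknownThresholdArg = true;
        }
        // Sign 2: the KWS info line reports the threshold actually in use.
        var match = /default threshold (-?\d+)/.exec(line);
        if (match) {
            result.defaultThreshold = parseInt(match[1], 10);
        }
    });
    return result;
}

var report = diagnoseKwsLog([
    'ERROR: "cmd_ln.c", line 938: Unknown argument: -kws_threshold',
    "INFO: kws_search.c(405): KWS(beam: -1080, plp: -23, default threshold -524288, delay 10)"
]);
```

If `unknownThresholdArg` is true, the threshold setting is being ignored and a newer copy of the file is the first thing to try.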
