False detection when there is no audio #60
Comments
You would get more and better answers by reaching out to the pocketsphinx community directly (http://cmusphinx.sourceforge.net/). For keyword spotting, there are a few parameters you can play with:
|
I experienced this too. I updated the audioRecord.js file to also output whatever is getting passed to sphinx to the speakers. See below:

    var jolisten = new (window.AudioContext || window.webkitAudioContext)();
    var jobuf = jolisten.createBuffer(1, outputBufferLength, (config.outputSampleRate || 16000));
    worker.onmessage = function(e) {
        if (e.data.error && (e.data.error == "silent")) errorCallback("silent");
        if ((e.data.command == 'newBuffer') && recording) {
            myClosure.consumers.forEach(function(consumer, y, z) {
                consumer.postMessage({ command: 'process', data: e.data.data });
            });
            //S remove this
            var nowbuf = jobuf.getChannelData(0);
            for(var i=0; i<e.data.data.length; i++) {
                var k = e.data.data[i];
                //This supposedly converts it back to float, but it doesn't matter if you do it or not for the playback
                var f = (k >= 0x8000) ? -(0x10000 - k) / 0x8000 : k / 0x7FFF;
                nowbuf[i] = k;
            }
            var josrc = jolisten.createBufferSource();
            josrc.buffer = jobuf;
            josrc.connect(jolisten.destination);
            josrc.start();
            //E remove this
        }
    };

After much experimentation I've discovered that the conversion from the microphone's higher sampling rate to 16000 Hz is partly to blame. Specifically, the part that converts the Float32 from JavaScript to the Int16 that sphinx wants. It looks like this in audioRecorderWorker.js, in the method record():

    for (var i = 0 ; i < inputBuffer[0].length ; i++) {
        recBuffers.push((inputBuffer[0][i] + inputBuffer[1][i]) * 16383.0);
    }

Basically there's a bunch of loud white noise in the audio that's getting passed to sphinx. I don't know enough about audio yet to know exactly what to do, but I think maybe a highpass and/or lowpass filter might help. FYI: If you use the snippet to hear what's coming out of the microphone, you need to use headphones. The reverb will be deafening otherwise. |
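A note on the conversion step discussed above: Web Audio buffers hold floats nominally in the range [-1, 1], so integer-scaled samples written straight into a playback buffer may clip heavily, which could be one reason the monitored audio sounds like loud noise. The following is only a minimal sketch of a round trip between the two formats. The function names `int16ToFloat32` and `mixToInt16` are hypothetical and are not part of pocketsphinx.js, and averaging the two channels before scaling is an assumption, not what audioRecorderWorker.js does.

```javascript
// Hypothetical helper: convert signed 16-bit PCM values to floats in [-1, 1)
// so they can be written into a Web Audio buffer for monitoring.
function int16ToFloat32(samples) {
  var out = new Float32Array(samples.length);
  for (var i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 0x8000;
  }
  return out;
}

// Hypothetical helper: average two float channels into mono, clamp to
// [-1, 1], and scale to the signed 16-bit range.
function mixToInt16(left, right) {
  var out = new Int16Array(left.length);
  for (var i = 0; i < left.length; i++) {
    var mono = (left[i] + right[i]) / 2;
    var clamped = Math.max(-1, Math.min(1, mono));
    out[i] = Math.round(clamped * 0x7FFF);
  }
  return out;
}
```

In the monitoring snippet, the output of something like `int16ToFloat32(e.data.data)` would be what gets copied into the playback buffer, assuming the worker really sends 16-bit-scaled values.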
I created a pull request that graphs the wave form and enables the ability to listen to what is passed to sphinx. |
I have determined that a lowpass filter of 800 Hz and a highpass filter of 50 Hz do reduce some of the background noise. However, sphinx is still recognizing random words even when there is no speech. When there is speech, it recognizes whatever it wants to. It doesn't matter if it's in the normal mode or the keyword spotting mode. I've tried adjusting the operating system's input levels for the mic, but that doesn't help either. I've tried using the cmusphinx acoustic model, lm, and dict, but that doesn't help either. I'm at a loss for what to do next. |
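For anyone who wants to reproduce the filtering experiment described above, here is a rough sketch using simple one-pole (RC-style) filters in plain JavaScript. This is an assumption about one way to do it, not the filter actually used in that comment. The function names `lowPass` and `highPass` are hypothetical, and the 800 Hz and 50 Hz cutoffs and the 16000 Hz rate are taken from the thread.

```javascript
// One-pole low-pass: attenuates content above cutoffHz.
function lowPass(samples, cutoffHz, sampleRate) {
  var dt = 1 / sampleRate;
  var rc = 1 / (2 * Math.PI * cutoffHz);
  var alpha = dt / (rc + dt);
  var out = new Float32Array(samples.length);
  var prev = 0;
  for (var i = 0; i < samples.length; i++) {
    prev = prev + alpha * (samples[i] - prev);
    out[i] = prev;
  }
  return out;
}

// One-pole high-pass: attenuates content below cutoffHz (including DC offset).
function highPass(samples, cutoffHz, sampleRate) {
  var dt = 1 / sampleRate;
  var rc = 1 / (2 * Math.PI * cutoffHz);
  var alpha = rc / (rc + dt);
  var out = new Float32Array(samples.length);
  var prevIn = 0;
  var prevOut = 0;
  for (var i = 0; i < samples.length; i++) {
    prevOut = alpha * (prevOut + samples[i] - prevIn);
    prevIn = samples[i];
    out[i] = prevOut;
  }
  return out;
}

// Band-limit a mono float buffer to roughly 50-800 Hz at 16 kHz.
function bandLimit(samples) {
  return lowPass(highPass(samples, 50, 16000), 800, 16000);
}
```

One-pole filters roll off gently, so they only soften out-of-band noise rather than removing it.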
Justin, cmusphinx uses a bandwidth between 100 and 6800 Hz. It also tries to compensate for filtering, but overall any signal processing is usually harmful for accuracy. To debug pocketsphinx keyword spotting, the tutorial recommends that you record a file and experiment with pocketsphinx_continuous on the desktop until you get reliable recognition. You need to select a keyphrase of 3-4 syllables for reliable detection, and you need to configure the threshold appropriately. You can share the recorded file if you have trouble. Once you have reliable detection on the command line, you can proceed with the JavaScript version. |
Nickolay, Thanks for the info. I thought about the keyphrase threshold, but I'm experiencing the issue without keywords as well. The issue is occurring on the examples for this project. Is anyone able to confirm that the default example "live.html" works as expected on a specific machine? Tonight I'll play around with some tuning parameters on the command line. It would be nice if there was a known working model, lm, etc and the cmu args that would enable a dev to test the feasibility of sphinx for his/her project prior to investing a lot of time into building and tuning a grammar/lm/etc. |
I tried decreasing the microphone boost from my Windows control panel, and I think this is definitely a noise issue: noise is being processed. But the question is, how can noise be recognized as words? Isn't there a confidence factor? |
@justgeek You need to provide more details (configuration, keywords, thresholds, audio data) in order to get help with detection. It is better to ask that on the cmusphinx forum, not here. |
@justinoverton I'd be interested to know if you could make progress and maybe share your insights. |
@seekM I think it may be an issue where training the model could help. I'm not working on this at the moment though.
|
For anyone experiencing many false positives or low detection accuracy with keyword search, try grabbing a fresh copy of the minified pocketsphinx file in .pocketsphinx.js/webapp/js/pocketsphinx.js. I believe that older versions were missing essential components for the keyword search detection threshold variable to work. Check your console for these errors to confirm whether your issue is the same as mine. Filter logs by "kws". This is the sign that you need to grab a new file:
And a closer look may show:
A threshold of -524288 would be unbelievably permissive, allowing just about any random noise to be interpreted as the keyword. The useful range of values appears to be somewhere between "1e-50", which is permissive, and "1e-0", which would be very strict. The documentation about this feature on the CMUSphinx site itself is very poor, so I just had to play around with it. |
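To guard against a nonsensical threshold like the one described above, a small check before initializing the recognizer can help. This is only a sketch. The helper name `buildKwsInitMessage` is hypothetical, the (0, 1] range is an assumption based on the "1e-50" to "1e-0" range reported in this thread, and the message shape (an `initialize` command carrying `[key, value]` pairs) is an assumption about the recognizer worker's API that should be checked against the version of pocketsphinx.js in use.

```javascript
// Hypothetical helper: validate a keyword-spotting threshold and build an
// initialization message for the recognizer worker.
// Assumed message shape: { command: "initialize", data: [[key, value], ...] }.
function buildKwsInitMessage(threshold) {
  var value = Number(threshold);
  // Assumed useful range from this thread: "1e-50" (permissive) to "1e-0" (strict).
  if (!isFinite(value) || value <= 0 || value > 1) {
    throw new RangeError("kws threshold should be in (0, 1], got: " + threshold);
  }
  return {
    command: "initialize",
    data: [["-kws_threshold", String(threshold)]]
  };
}
```

The resulting object would then be posted to the recognizer worker, for example `recognizer.postMessage(buildKwsInitMessage("1e-20"))`, where `recognizer` is assumed to be the Web Worker running pocketsphinx.js.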
Hi,
I am using pocketsphinx and it keeps detecting words which are not being spoken. This is visible in the live demo on the website as well. Even in a silent room, it detects and prints out the keywords repeatedly. Is there any way to prevent this? I basically need it to detect a single word and have built my keyword list consisting of the single keyword. However once I press start, it starts printing it out almost continuously even though nothing has been said. It's basically detecting almost any sound as the keyword.
I have edited this in my live_kws.html file to detect a single word.