Sphinx does not return results #127
Comments
All the errors until the last one are normal:
This is a warning from the unix library which handles the denoising. Nothing to worry about.
This error happened because the speech detection tests were executed before the audio processing tests, so no noise profiles existed yet. If you execute the tests again, it will work.
This is an error regarding the Sphinx4 and the language model representation. The newer version of RAPP Platform uses the new Sphinx4, as stated in the mail I sent in Sep. 2:
Please do this and try again. If you don't want to bother with the denoise, use this file and state Also please post the exact arguments of the set_denoise_profile and speech_recognition_sphinx calls so we can check that all the args are correct. Finally, if everything else fails, I suggest using the provided OVA, where everything is set up and tested. |
The arguments are correct, I can see sphinx being invoked and running - I just don't get results. I am now updating sphinx and rebuilding, this is the VM you fixed, so the paths are different: everything is in rapp_platform/rapp-platform-catkin_ws/src/rapp-platform but other than that, everything else works. Once I've rebuilt sphinx, do I need to rebuild rapp-platform? |
Also the tests should be executed as such: If you like, post the arguments, as Sphinx running does not imply that they are correct; it just tries to perform speech recognition. Regarding the new VM, I was referring to the one that contains v0.3.5 of the RAPP platform, which is uploaded to the RAPP FTP server. Once Sphinx is rebuilt you theoretically don't have to rebuild rapp platform. |
catkin_make without the job parameter is single-threaded; giving more than 1 job is multithreaded. I rebuilt sphinx and re-ran the tests, and it seems to be passing all tests, with only one error found:
However, I still get the previous warnings and errors, but not the exception. The arguments posted are:
I can still see ROS invoking sphinx just fine, but I still get empty results. |
using file microphone_nai.wav with audio_source headset still returns empty results. |
I'm getting somewhere, I got a time-out.
The JSON Reply was:
This was by using file microphone_nai.wav and audio-type headset. |
Well, I've officially broken it! Here's the output from ROS/Rapp_platform:
And the reply:
UPDATE: Shutting down ros nodes and restarting, actually gave a good hint:
|
catkin_make by default uses all cores; it's not single-threaded, so you must specify that you want only 1 core. Regarding the error in the test, it's normal: the file does not exist. If you notice, the test passes. Now about the parameters:
For the broken issue: silence_wav_d05_a1.wav and nao_wav_d05_a1.wav are 4-channel files, so you must declare nao_wav_4_ch. Nevertheless, this is a bug; I'll create an issue. One of the tests which you executed and was successful is here: https://github.com/rapp-project/rapp-platform/blob/master/rapp_speech_detection_sphinx4/tests/functional/batch_functional.py#L26 Please try your calls with these parameters and check the results. |
I thought catkin_make invokes make which by default uses a single thread.
I got a new crash:
I'll try with nao_wav_4_ch, but this is starting to make sense: I used set_denoise_profile for a 1 channel, but using a 4 channel file. |
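The mismatch described above (a 1-channel denoise profile used with a 4-channel file) can be caught before calling the platform. A minimal sketch, assuming the recording is a plain PCM WAV that Python's standard wave module can read (other WAV encodings may raise wave.Error). pick_audio_source is a hypothetical helper, not part of the RAPP API, and it maps only the two NAO WAV sources named in this thread:

```python
import wave


def pick_audio_source(wav_path):
    """Return the audio_source value matching the WAV file's channel count."""
    with wave.open(wav_path, "rb") as wav:
        channels = wav.getnchannels()
    # Only the two NAO WAV sources mentioned in this thread are mapped here;
    # any other channel count is rejected instead of guessed.
    sources = {1: "nao_wav_1_ch", 4: "nao_wav_4_ch"}
    if channels not in sources:
        raise ValueError("unsupported channel count: %d" % channels)
    return sources[channels]
```

Using the same value for both the set_denoise_profile call and the speech recognition call would avoid mixing a 1-channel profile with a 4-channel recording.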
I tried your suggestion: I still get an empty result, but this time it is an array of 3 empty strings, which is really bizarre:
|
OK, I think one of the problems is that I've been using wrong languages with wrong files:
Personally, I think that this is an issue for the platform, it should not be so much parametrised, and/or it should be more safe regarding crashing (I should NOT be able to crash the platform via a rapp-api call). |
I think the parameterization is not a bad thing, the denoising is a devious issue. Nevertheless we are trying to perform a dynamic denoising approach where the noise is not known a priori. Regarding the crashes, I agree that they must be eliminated, which will be after identifying them by debugging! |
I tried using the English language, and OGG:
I still got an empty result: one empty string
Running it a second time got me a time-out:
I also got:
What puzzles me is that it seems to be recognising there are words, but not getting their values. I'm not gonna argue about the parametrization, I just feel that being able to crash the platform by using wrong parameters will be an issue. At this point, I'd be happy to know why I am getting empty results. |
Using the Greek dictionary with audio_type set to headset and input file microphone_nai.wav I got:
In ROS/Sphinx, and actual JSON:
If I use nao_wav_1_ch, I get a segmentation fault, and if I use nao_wav_4_ch an exception is thrown (see previous post). So, I decided to try with another WAV file (T3, "Desire is irrelevant, I am a machine") which you can find here: http://www.terminatorfiles.com/media/audio Interestingly, the output from sphinx suggests it picked a few words:
Then, the file was no longer found (I guess it was deleted before it was processed?)
|
I tried with the rapp user as by default, using the .ogg files for denoising and speech to text. The error persists:
I also get random messages like this one:
I think one of the underlying issues is that it times out too soon, before it has had the opportunity to finish (bear in mind I am running this VM on a dual-core laptop). Furthermore, the first error is very clear: the file was deleted too early. Our latest ortelio commit produces random file names with the extension audio, so there is no need to instantly delete the file from the HOP cache directory. UPDATE: New error regarding the same issue:
This is followed by the previous error:
This seems to be transforming an ogg into a wav. Is this due to the extension? |
Trying with the NAO 4 CHANNEL audio source, also throws an exception:
I used user
Trying the same thing with ogg files and audio source, throws many time outs,
|
This is my last post. Trying with "headset" as audio_source, yes-no.wav and email-robot.wav do not get any type of response. Furthermore, in the C++ API, I removed (only for now) the random filename generator, and instead I sent all wav files as audio.wav and all ogg files as audio.ogg. This seems to keep the files instead of deleting them. Obviously there's an issue here, as when I send random string filenames (with or without extension) they seem to be getting prematurely deleted. |
Regarding the time out errors: These errors occur because the Sphinx4 Java library crashes. It seems like you have an outdated sphinxbase (or Sphinx4). Did you use the latest VM? Regarding the ogg being treated as wav: It should be treated as wav, after being encoded as wav, as Sphinx supports only wav. Regarding the error of the missing file: @klpanagi knows more about the removal of the files by the HOP service, so Kostas, jump in. About the random words you are getting back: Are you sure you are giving a language model (the |
|
I made it to produce the error. The error states as follows:
So the error is produced when the speech_detection ROS-Service takes more than 15sec to respond to the HOP-service, three times in a row. So the HOP-Service deletes the file and returns to the client, while the relevant ROS-Service still tries to read the file... I don't know if this is an issue at all, as the RAPP Platform should NEVER take 15sec to process a speechToText request... |
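The behaviour described above (three attempts, each with a 15-second limit, after which the caller gives up while the slow call is still running) can be sketched as follows. This is only an illustration of the race, not the HOP service's actual code; call_with_retries and its parameters are hypothetical names:

```python
import concurrent.futures


def call_with_retries(service, timeout_s=15.0, max_attempts=3):
    """Call `service` up to max_attempts times, each with its own timeout.

    Returns (result, attempts_used); result is None if every attempt timed out.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_attempts)
    try:
        for attempt in range(1, max_attempts + 1):
            future = pool.submit(service)
            try:
                return future.result(timeout=timeout_s), attempt
            except concurrent.futures.TimeoutError:
                # The timed-out call keeps running in the background. This is
                # the window in which deleting its input file would break it.
                continue
        return None, max_attempts
    finally:
        # Do not block on calls that are still running.
        pool.shutdown(wait=False)
```

On a slow machine every attempt can time out, so the caller returns (and may clean up the uploaded file) while the earlier calls are still trying to read it, which would match the "file not found" errors reported earlier in the thread.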
If this occurs on a low-processing-power OS instance, you can try to raise the timeoutValue here. I will include these configuration parameters in properly formatted configuration files on the next merge to the master branch. |
I am back! ******* Results ********
Obviously speech recognition works when running the tests. I copied the same tests as those under the python tests in rapp platform.
1 - Using: yes-no.wav, with user rapp, en lang, words ["yes","no"] and sentence "yes no", for nao_wav_1_ch audio source, I got:
This should not happen because I've run the set_denoise using both the Python tests and the C++ examples.
2 - Using: recording_sentence1.ogg, with words: ['I', 'want', 'to', 'go', 'out'] and sentence "I want to go out", for user rapp, en language, nao_ogg audio source, I get:
I also got the error:
Which stopped after I ran the example a 2nd time.
3 - Using: email-robot.wav, with words: ['email','robot'] and sentence "email robot", for user rapp, en lang, nao_wav_1_ch audio source, I get:
I ran this multiple times, always getting the same error.
4 - Using recording_no.ogg, user rapp, en lang, words ["yes","no"] and sentences:
Which is the FIRST test to actually work when using C++.
5 - Using recording_yes.ogg, with the exact same params as above, it also works:
6 - Same with recording_tuesday.ogg, same params as above, just passed as words: ["monday", "tuesday"], I got:
7 - Using: recording_sentence2.ogg, with params as those in python test, I get:
Which is half the sentence. So it becomes apparent to me that something is broken regarding WAV speech processing.
PS: BTW, what's the point of passing words and sentences as the same thing? This is a duplicate parameter. |
Nice! We are getting somewhere! So: 1 + 3. This error is produced here and means that ".wav" was not in the file's name. That is quite strange, since the HOP service should write the file with the proper postfix. @klpanagi is there a chance to write it without a .wav postfix? 2: That is strange. Normally the word "I" should exist in the English dictionary. I will check it. Regarding the |
Files are stored by the HOP server using the file_uri name value defined into the multipart/form-data post. If the file extension is not defined into the relevant field, then each HOP service is not responsible to append this. Thoughts:
Currently, if you append the full file name (prefix + extension, 'test.wav') into the file_uri name value, it is meant to work :) |
Regarding the WAV extension, I have removed the extension on purpose in the API: Furthermore, it's not that easy to tell if a file is WAV or OGG by looking at the file header.
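Random file names and a proper extension are not mutually exclusive. A minimal sketch of a client-side name generator that randomises the prefix but keeps the original extension, so that the name placed in file_uri still ends in .wav or .ogg. random_upload_name is a hypothetical helper, not part of any RAPP API:

```python
import os
import uuid


def random_upload_name(local_path):
    """Build a random file name that keeps the original file's extension."""
    _, ext = os.path.splitext(local_path)
    if not ext:
        # Refuse to guess: the server side relies on the extension.
        raise ValueError("file has no extension: %r" % local_path)
    # uuid4().hex is a 32-character random hexadecimal string.
    return uuid.uuid4().hex + ext.lower()
```

For example, random_upload_name("yes-no.wav") returns a 32-character random prefix followed by ".wav".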
and so do OGG files, they also define themselves as WAVs:
An accurate detection would require the file format to be parsed, which at this point is definitely over-engineering. Regarding the sentence parameter, the documentation (https://github.com/rapp-project/rapp-platform/tree/master/rapp_speech_detection_sphinx4) does not explain why it's a duplicate. This upcoming week, I will try to record some audio samples, and try to test them. |
About the sentence and grammar parameters (they come from Sphinx): http://cmusphinx.sourceforge.net/wiki/tutoriallm For the audio type, I guess for now we have to trust the user that calls the service. If he is wrong, the platform will return an error like the one you got (file is not a wav) or the speech recognition simply won't work. |
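Short of parsing the whole format, a cheap check on the container signature is possible. A minimal sketch, assuming standard files: RIFF/WAVE files begin with the bytes "RIFF", a 4-byte size, then "WAVE", and Ogg files begin with the capture pattern "OggS". It identifies only the container, not the codec, channel count or sample rate, and sniff_audio_container is a hypothetical helper:

```python
def sniff_audio_container(path):
    """Guess the container from the first 12 bytes: 'wav', 'ogg', or None."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    # RIFF/WAVE: b"RIFF", 4-byte chunk size, then b"WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Every Ogg page starts with the capture pattern b"OggS".
    if header[:4] == b"OggS":
        return "ogg"
    return None
```

A result of None means the file is neither of the two, in which case an early error to the caller may be preferable to passing the file on to Sphinx.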
@alexge233 Can we close this? |
Well the problem is still there, but you can close it if you wish to. On Thu, 29 Oct, 2015 at 7:40 AM, Manos Tsardoulias
|
What exactly is the current problem? |
I am not getting replies, even of words that are in the audio file, and On Thu, 29 Oct, 2015 at 4:18 PM, Manos Tsardoulias
|
You are probably giving wrong parameters. Try putting the same words in the sentences and in grammar (the exact same string vectors). Also try to have just a few words for starters (2-5). Finally pay attention to the audio file type (if you write a wav with the microphone pass |
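The advice above can be sketched as a small helper that builds the request parameters from a single word list, so the words, sentences and grammar vectors cannot drift apart. The dictionary keys are illustrative assumptions taken from the names used in this thread, not a confirmed RAPP API schema, and build_speech_request is a hypothetical function:

```python
def build_speech_request(words, language, audio_source, user):
    """Assemble illustrative request parameters with matching word lists."""
    words = list(words)
    # The 2-5 range is the "for starters" suggestion from this thread,
    # not a limit enforced by the platform.
    if not 2 <= len(words) <= 5:
        raise ValueError("start with 2-5 words, got %d" % len(words))
    return {
        "language": language,
        "audio_source": audio_source,
        "user": user,
        "words": words,
        # Sentences and grammar reuse the exact same strings as words.
        "sentences": list(words),
        "grammar": list(words),
    }
```

For example, build_speech_request(["yes", "no"], "en", "nao_ogg", "rapp") mirrors the parameters of the recording_no.ogg case reported as working earlier in the thread.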
Actually, I am closing this, as we have been testing it successfully over the last months with the python API, with the actual NAO and with a headset. If you get any errors, open another issue with the exact input parameters and attach the audio file. Thanks! |
I have already tried all your suggestions: words, sentences and grammar. In any case I will keep trying, as it is absolutely crucial that this works properly. A. |
As I suggested, if something doesn't work, post all the parameters and the audio file you are using, to make it easier to identify the problem. |
My previous posts (from 27 days ago, scroll up) have all parameters and results listed. |
I remember the posts, and in most of them either the parameters were wrong (especially the audio type), or the denoising had not been performed, or your deployment had missing packages. Now that your system is up to date and more stable, please check the test cases again. Remember that the Sphinx recognition is not suitable for free speech and/or a big pool of words (above 10), since it was created to work for both languages and in simple cases. Thus trying to perform complex speech recognition is not suggested. For reference, in the paper we have published, the performance for 10 Greek words in audio files recorded from a headset was above 90%. Also, we are currently successfully testing it in the NAO cognitive games case for vocabularies of 3-4 words under heavy noise. I am looking forward to your new results! I am leaving this open; close it when you see fit. |
I've been having issues with sphinx speech to text:
When I set_denoise_profile, I get a warning:
So I tried running the tests:
And I got a series of errors:
Then later on:
Finally, it throws an exception:
When running a service request, it takes up to 10 seconds to get an empty response, but no errors are thrown.
Before running a service request, I set denoise profile, using silence_sample.wav and then run speech to text, using the yes_no.wav