Sphinx does not return results #127

Closed
alexge233 opened this issue Sep 22, 2015 · 36 comments
Comments

@alexge233
Contributor

I've been having issues with sphinx speech to text:

When I set_denoise_profile, I get a warning:

sox WARN trim: End position is after expected end of audio.
sox WARN trim: Last 1 position(s) not reached.

So I tried running the tests:

catkin_make run_tests

And I got a series of errors:

['Error:No noise profile for the nao_wav_4_ch type exists']
testspeech_detection_sphinx4_batch_functional ... ok

Then later on:

INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
ERROR: "ngram_model_trie.c", line 458: File sentences.lm.dmp not found
INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
ERROR: "ngram_model_trie.c", line 191: File sentences.lm.dmp not found
INFO: ngram_model_trie.c(548): Trying to read LM in DMP format
ERROR: "ngram_model_trie.c", line 550: Dump file sentences.lm.dmp not found
FATAL: "sphinx_lm_convert.c", line 165: Failed to read the model from the file 'sentences.lm.dmp'

Configuration:

{'grammar_folder': '/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_language_models/tmp_language_pack/', 'configuration_path': '/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_language_models/greekPack/default.config.xml', 'acoustic_model': '/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_acoustic_models/english_acoustic_model', 'dictionary': '/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_language_models/tmp_language_pack/custom.dict', 'jar_path': '.:/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_java_libraries/sphinx4-core-1.0-20150630.174404-9.jar:/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_speech_detection_sphinx4/src', 'language_model': '/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_language_models/tmp_language_pack/sentences.lm.bin', 'grammar_disabled': True, 'grammar_name': 'custom'}

Finally, it throws an exception:

['Error:No noise profile for the nao_wav_4_ch type exists']
testspeech_detection_sphinx4_batch_functional ... ok

Traceback (most recent call last):
  File "/opt/ros/indigo/share/rostest/cmake/../../../bin/rostest", line 36, in <module>
    rostestmain()
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 268, in rostestmain
    _main()
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/rostest_main.py", line 187, in rostestmain
    printRostestSummary(result, subtest_results)
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/rostestutil.py", line 75, in printRostestSummary
    return rosunit.print_runner_summary(result, rostest_results, runner_name='ROSTEST')
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rosunit/baretest.py", line 480, in print_runner_summary
    buff.write(tc_result.description)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 229-231: ordinal not in range(128)

When running a service request, it takes up to 10 seconds to get an empty response, but no errors are thrown.

Before running a service request, I set the denoise profile using silence_sample.wav, and then run speech to text using yes_no.wav.

@etsardou
Member

All the errors until the last one are normal:

sox WARN trim: End position is after expected end of audio.
sox WARN trim: Last 1 position(s) not reached.

This is a warning from SoX, the Unix tool that handles the denoising. Nothing to worry about.

['Error:No noise profile for the nao_wav_4_ch type exists']
testspeech_detection_sphinx4_batch_functional ... ok

This error happened because the speech detection tests were executed before the audio processing tests, so no noise profiles existed yet. If you execute the tests again, it will work.

INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
ERROR: "ngram_model_trie.c", line 458: File sentences.lm.dmp not found
INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
ERROR: "ngram_model_trie.c", line 191: File sentences.lm.dmp not found
INFO: ngram_model_trie.c(548): Trying to read LM in DMP format
ERROR: "ngram_model_trie.c", line 550: Dump file sentences.lm.dmp not found
FATAL: "sphinx_lm_convert.c", line 165: Failed to read the model from the file 'sentences.lm.dmp'

This error concerns Sphinx4 and the language model representation. The newer version of the RAPP Platform uses the new Sphinx4, as stated in the mail I sent on Sep. 2:

1. Sphinx4 related update:
A major change is that we now use the new version of Sphinx4 which accepts binary formatted language models. To use the latest version you must:
Fetch the new master branch
Go to the folder cmusphinx/multisphinx (must be in your home folder) and execute `sudo make uninstall`
Clone the newest sphinxbase library (https://github.com/cmusphinx/sphinxbase) and execute `./autogen.sh && make && sudo make install`

Please do this and try again.

If you don't want to bother with denoising, use this file and state headset as the audio type. The platform won't look for denoising profiles in that case, since the file is assumed to be free of noise.

Also, please post the exact arguments of the set_denoise_profile and speech_recognition_sphinx calls so we can check that all the args are correct.

Finally, if everything else fails, I suggest using the provided OVA, where everything is set up and tested.

@alexge233
Contributor Author

The arguments are correct: I can see Sphinx being invoked and running, I just don't get results. I am now updating Sphinx and rebuilding. This is the VM you fixed, so the paths are different (everything is in rapp_platform/rapp-platform-catkin_ws/src/rapp-platform), but other than that everything works.

Once I've rebuilt Sphinx, do I need to rebuild rapp-platform?
I will try the tests again, and then with headset as the audio type.

@etsardou
Member

Also, the tests should be executed as catkin_make run_tests -j1, as stated in the README file, and not without the -j1, because the tests are not thread safe.

If you like, post the arguments: the fact that Sphinx runs does not imply they are correct, it just means it tries to perform speech recognition.

Regarding the new VM, I was referring to the one that contains v0.3.5 of the RAPP Platform, which is uploaded on the RAPP FTP server.

Once Sphinx is rebuilt you theoretically don't have to rebuild rapp platform.

@alexge233
Contributor Author

catkin_make without the job parameter is single-threaded; giving it more than 1 job makes it multithreaded.

I rebuilt Sphinx and re-ran the tests. They all seem to pass, with only one error:

sox FAIL formats: can't open input file `/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_testing_tools/testing_tools/test_data/silence_wav_d05_a1_nope.wav': No such file or directory

I still get the previous warnings and errors, but not the exception.

The arguments posted are:

POST /hop/speech_detection_sphinx4 HTTP/1.1
Host: localhost
Connection: close
Content-Length: 82610
Content-Type: multipart/form-data; boundary=eo2HYEK79twZG078

--eo2HYEK79twZG078
Content-Disposition: form-data; name="language"

en
--eo2HYEK79twZG078
Content-Disposition: form-data; name="user"

testuser
--eo2HYEK79twZG078
Content-Disposition: form-data; name="audio_source"

nao_wav_1_ch
--eo2HYEK79twZG078
Content-Disposition: form-data; name="grammar"

[]
--eo2HYEK79twZG078
Content-Disposition: form-data; name="words"

[ "yes","no"]
--eo2HYEK79twZG078
Content-Disposition: form-data; name="sentences"

[ "yes","no"]
--eo2HYEK79twZG078
Content-Disposition: form-data; name="file_uri"; filename="ZoHjad7OtBMYdRAl.wav"
Content-Transfer-Encoding: binary
>> BINARY DATA HERE<<
--eo2HYEK79twZG078--

I can still see ROS invoking sphinx just fine, but I still get empty results.
I will try with different audio files, and then try the headset file as you suggested.

@alexge233
Contributor Author

Using file microphone_nai.wav with audio_source headset still returns empty results.

@alexge233
Contributor Author

I'm getting somewhere, I got a time-out.
Please note, the JSON reply is malformed!!!

[INFO] [WallTime: 1442908667.839133] Client connected.  1 clients total.
[ERROR] [WallTime: 1442908669.033837] SELECT username FROM tblUser WHERE username="testuser"
09:57:49.331 INFO dictionary           Loading dictionary from: file:/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_language_models/tmp_language_pack/custom.dict
09:57:49.406 INFO dictionary           Loading filler dictionary from: file:/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_acoustic_models/english_acoustic_model/noisedict
09:57:49.446 INFO trieNgramModel       Loading n-gram language model from: file:/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_sphinx4_language_models/tmp_language_pack/sentences.lm.bin
09:57:53.758 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'a'
09:57:53.759 WARNING trieNgramModel    The dictionary is missing a phonetic transcription for the word 'a'
09:57:53.761 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'is'
09:57:53.761 WARNING trieNgramModel    The dictionary is missing a phonetic transcription for the word 'is'
09:57:53.762 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'test'
09:57:53.763 WARNING trieNgramModel    The dictionary is missing a phonetic transcription for the word 'test'
09:57:53.764 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'this'
09:57:53.764 WARNING trieNgramModel    The dictionary is missing a phonetic transcription for the word 'this'
09:57:53.765 WARNING trieNgramModel    Dictionary is missing 4 words that are contained in the language model.
09:57:59.666 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'is'
09:57:59.666 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'test'
09:57:59.666 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'a'
09:57:59.667 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'this'
09:57:59.668 INFO lexTreeLinguist      Max CI Units 43
09:57:59.668 INFO lexTreeLinguist      Unit table size 79507
09:57:59.670 INFO speedTracker         # ----------------------------- Timers----------------------------------------
09:57:59.671 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
09:57:59.674 INFO speedTracker         Load Dictionary      3       0.1330s   0.0010s   0.1330s   0.0457s   0.1370s   
09:57:59.675 INFO speedTracker         Compile              3       5.9030s   0.2720s   5.9030s   2.3657s   7.0970s   
09:57:59.676 INFO speedTracker         Score                196     0.0000s   0.0000s   0.0470s   0.0015s   0.2930s   
09:57:59.676 INFO speedTracker         Prune                678     0.0000s   0.0000s   0.0010s   0.0000s   0.0040s   
09:57:59.677 INFO speedTracker         Grow                 686     0.0000s   0.0000s   0.0050s   0.0002s   0.1070s   
09:57:59.678 INFO speedTracker         Load AM              1       1.7530s   1.7530s   1.7530s   1.7530s   1.7530s   
09:57:59.678 INFO speedTracker         Load LM              3       4.3200s   4.0400s   4.3200s   4.1937s   12.5810s  
09:57:59.679 INFO speedTracker         Frontend             104     0.0000s   0.0000s   0.0410s   0.0005s   0.0550s   
09:58:00.294 INFO speedTracker            This  Time Audio: 0.30s  Proc: 0.60s  Speed: 1.99 X real time
09:58:00.294 INFO speedTracker            Total Time Audio: 0.90s  Proc: 0.96s 1.07 X real time
09:58:00.295 INFO memoryTracker           Mem  Total: 297.50 Mb  Free: 208.49 Mb
09:58:00.296 INFO memoryTracker           Used: This: 89.01 Mb  Avg: 123.79 Mb  Max: 147.45 Mb
09:58:00.296 INFO trieNgramModel       LM Cache Size: 0 Hits: 0 Misses: 0
['', 'Error: Time out error']
09:58:00.327 INFO trieNgramModel       LM Cache Size: 0 Hits: 0 Misses: 0
09:58:00.328 INFO speedTracker         # ----------------------------- Timers----------------------------------------
09:58:00.331 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
09:58:00.332 INFO speedTracker         Load Dictionary      3       0.1330s   0.0010s   0.1330s   0.0457s   0.1370s   
09:58:00.333 INFO speedTracker         Compile              3       5.9030s   0.2720s   5.9030s   2.3657s   7.0970s   
09:58:00.335 INFO speedTracker         Score                294     0.0000s   0.0000s   0.0590s   0.0019s   0.5600s   
09:58:00.336 INFO speedTracker         Prune                1017    0.0000s   0.0000s   0.0010s   0.0000s   0.0060s   
09:58:00.337 INFO speedTracker         Grow                 1029    0.0000s   0.0000s   0.0090s   0.0003s   0.2720s   
09:58:00.338 INFO speedTracker         Load AM              1       1.7530s   1.7530s   1.7530s   1.7530s   1.7530s   
09:58:00.340 INFO speedTracker         Load LM              3       4.3200s   4.0400s   4.3200s   4.1937s   12.5810s  
09:58:00.341 INFO speedTracker         Frontend             156     0.0000s   0.0000s   0.0410s   0.0007s   0.1070s   
09:58:00.342 INFO speedTracker            Total Time Audio: 0.90s  Proc: 0.96s 1.07 X real time
09:58:00.343 INFO memoryTracker           Mem  Total: 297.50 Mb  Free: 208.49 Mb
09:58:00.344 INFO memoryTracker           Used: This: 89.01 Mb  Avg: 118.00 Mb  Max: 147.45 Mb
[INFO] [WallTime: 1442908680.402743] Client disconnected. 0 clients total.

The JSON Reply was:

{"words":["","Error: Time out error"],"error":""}

This was by using file microphone_nai.wav and audio-type headset.

@alexge233
Contributor Author

Well, I've officially broken it!
I tried set_denoise_profile with file silence_wav_d05_a1.wav and then ran speech_to_text using file nao_wav_d05_a1.wav with audio_source nao_wav_1_ch.

Here's the output from ROS/Rapp_platform:

[ERROR] [WallTime: 1442909782.653997] SELECT username FROM tblUser WHERE username="testuser"
[ERROR] [WallTime: 1442909783.881579] Error processing request: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
['Traceback (most recent call last):\n', '  File "/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 623, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  File "/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_audio_processing/src/rapp_audio_processing/rapp_audio_processing.py", line 199, in energy_denoise\n    self.energy_denoising_debug)\n', '  File "/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_audio_processing/src/rapp_audio_processing/rapp_energy_denoise.py", line 57, in energyDenoise\n    if sq_signal[i] < scale * mean_sq:\n', 'ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()\n']
[ERROR] [WallTime: 1442909783.884939] [Client 9] [id: R2TGv] call_service ServiceException: service [/rapp/rapp_speech_detection_sphinx4/batch_speech_to_text] responded with an error: service cannot process request: service [/rapp/rapp_audio_processing/energy_denoise] responded with an error: error processing request: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

And the reply:

 {"words":[],"error":"RAPP Platform Failure"}

UPDATE:

Shutting down the ROS nodes and restarting actually gave a good hint:

/usr/lib/python2.7/dist-packages/scipy/io/wavfile.py:42: WavFileWarning: Unknown wave file format
 warnings.warn("Unknown wave file format", WavFileWarning)

@etsardou
Member

catkin_make uses all cores by default; it is not single-threaded, so you must specify that you want only 1 core.

Regarding the error in the test: it is normal, the file does not exist. If you look closely, the test passes.

Now about the parameters:

  • Does testuser exist in the MySQL db?
  • Have you called set_denoise_profile for testuser?
  • Is this file (ZoHjad7OtBMYdRAl.wav) indeed from NAO, and does it have 1 channel?

For the broken issue: silence_wav_d05_a1.wav and nao_wav_d05_a1.wav are 4-channel files, so you must declare nao_wav_4_ch. Nevertheless, the crash is a bug; I'll create an issue.
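A minimal sketch of how this ValueError can arise, assuming the samples are loaded as a NumPy array of shape (frames, channels). With a 4-channel file, indexing a single frame returns four values, so a comparison such as `sq_signal[i] < scale * mean_sq` yields an array rather than one boolean. The helper names `to_mono` and `quiet_frames`, and the choice to average the channels, are illustrative assumptions and not the platform's actual code.

```python
import numpy as np

def to_mono(samples):
    """Average the channels of a (frames, channels) array; pass 1-D input through."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 1:
        return samples
    return samples.mean(axis=1)

def quiet_frames(samples, scale=0.5):
    """Return a boolean mask of frames whose energy is below scale * mean energy."""
    mono = to_mono(samples)
    sq_signal = mono * mono
    mean_sq = sq_signal.mean()
    return sq_signal < scale * mean_sq

# A fake 4-channel signal: 3 frames with 4 channels each.
four_ch = np.array([[0.0, 0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0, 1.0],
                    [2.0, 2.0, 2.0, 2.0]])

# One frame of the raw array holds 4 values, so using it in `if` is ambiguous.
row_is_ambiguous = False
try:
    if (four_ch ** 2)[0] < 1.0:
        pass
except ValueError:
    row_is_ambiguous = True

# After mixing down to one channel, the per-frame comparison is well defined.
mask = quiet_frames(four_ch)
```

Mixing down before the comparison is only one possible fix; rejecting the request when the channel count does not match the declared audio source would be another.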

One of the tests that you executed, and that passed, is here: https://github.com/rapp-project/rapp-platform/blob/master/rapp_speech_detection_sphinx4/tests/functional/batch_functional.py#L26

Please try your calls with these parameters and check the results.

@alexge233
Contributor Author

I thought catkin_make invokes make which by default uses a single thread.
Regarding your questions:

  • testuser exists
  • I have called set_denoise_profile
  • the file name is a random string generated when the wav file is loaded (to avoid conflicting filenames);
    the file itself is nao_wav_d05_a1.wav

I got a new crash:

Segmentation fault (core dumped)
['Error:System sox malfunctioned']

I'll try with nao_wav_4_ch, but this is starting to make sense:

I used set_denoise_profile for a 1 channel, but using a 4 channel file.
Then I tried speech2text with a 4 channel file, requesting a 1 channel audio source.

@alexge233
Contributor Author

I tried your suggestion:
silence_wav_d05_a1.wav and nao_wav_d05_a1.wav for nao_wav_4_ch

I still get an empty result, but this time it is an array of 3 empty strings, which is really bizarre:

JSON:{"words":["","",""],"error":""}

@alexge233
Contributor Author

OK, I think one of the problems is that I've been using wrong languages with wrong files:

  • silence_wav_d05_a1.wav and nao_wav_d05_a1.wav are Greek whereas I used English as the parameter
  • silence_wav_d05_a1.wav and nao_wav_d05_a1.wav are nao_wav_4_ch whereas I used nao_wav_1_ch

Personally, I think this is an issue for the platform: it should not be so heavily parametrised, and/or it should be more robust against crashing (I should NOT be able to crash the platform via a rapp-api call).
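One way to guard against the channel mismatch described above is to check the WAV header before any processing starts. The sketch below uses only the Python standard library. The `EXPECTED_CHANNELS` mapping and the function names are hypothetical; only the two audio_source values come from this thread.

```python
import io
import wave

# Hypothetical mapping from the audio_source values seen in this thread
# to the channel count each one implies.
EXPECTED_CHANNELS = {"nao_wav_1_ch": 1, "nao_wav_4_ch": 4}

def check_channels(wav_file, audio_source):
    """Raise ValueError if the WAV channel count does not match audio_source."""
    expected = EXPECTED_CHANNELS.get(audio_source)
    if expected is None:
        return  # no channel expectation recorded for this source
    with wave.open(wav_file, "rb") as wav:
        actual = wav.getnchannels()
    if actual != expected:
        raise ValueError(
            "audio_source %r expects %d channel(s), file has %d"
            % (audio_source, expected, actual)
        )

def make_wav(channels, frames=8):
    """Build a silent 16-bit WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * channels * frames)
    buf.seek(0)
    return buf

# A 4-channel file declared as nao_wav_4_ch passes silently.
check_channels(make_wav(4), "nao_wav_4_ch")
```

Calling `check_channels(make_wav(4), "nao_wav_1_ch")` raises a ValueError with a readable message, which a service could return to the caller instead of crashing.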

@etsardou
Member

I don't think the parameterization is a bad thing; denoising is a tricky issue. Nevertheless, we are working on a dynamic denoising approach where the noise is not known a priori.

Regarding the crashes, I agree that they must be eliminated, which will happen once we have identified them by debugging!

@alexge233
Contributor Author

I tried using the English language, and OGG:

  • silence_ogg_d05_a1.ogg / nao_ogg
  • recording_yes.ogg / nao_ogg / EN

I still got an empty result: one empty string

{"words":[""],"error":""}

Running it a second time got me a time-out:

{"words":["","Error: Time out error"],"error":""}

I also got:

{"words":[],"error":"Error:false"}

What puzzles me is that it seems to be recognising there are words, but not getting their values.

I'm not going to argue about the parametrization; I just feel that being able to crash the platform by using wrong parameters will be an issue.

At this point, I'd be happy to know why I am getting empty results.

@alexge233
Contributor Author

Using the Greek dictionary with audio_type set to headset and input file microphone_nai.wav I got:

['']
Word: ##

In ROS/Sphinx, and actual JSON:

{"words":[],"error":""}

If I use nao_wav_1_ch, I get a segmentation fault, and if I use nao_wav_4_ch an exception thrown (see previous post).

So, I decided to try with another WAV file (T3, "Desire is irrelevant, I am a machine"), which you can find here: http://www.terminatorfiles.com/media/audio

Interestingly, the output from sphinx suggests it picked a few words:

11:47:21.363 INFO trieNgramModel       LM Cache Size: 3021 Hits: 1097275 Misses: 3021
['and', 'you', 'are', 'an', 'Error: Time out error']

Then, the file was no longer found (I guess it was deleted before it was processed?)

11:47:52.506 INFO trieNgramModel       LM Cache Size: 2509 Hits: 3049914 Misses: 7689
cp: cannot stat ‘/home/alex/.hop/cache/services/ppxDyzBjwzi4cLg6-npOdL.audio’: No such file or directory
['Error: Server cp malfunctioned']

@alexge233
Contributor Author

I tried with the default rapp user, using the .ogg files for denoising and speech to text.

The error persists:

cp: cannot stat ‘/home/alex/.hop/cache/services/OIJZVcZrFmL2r3N4-EOexa.audio’: No such file or directory

I also get random messages like this one:

['did', 'it', 'end', 'Error: Time out error']
['and', 'you', 'are', 'an', 'Error: Time out error']

I think one of the underlying issues is that it times out too soon, before it has had the opportunity to finish (bear in mind I am running this VM on a dual-core laptop).

Furthermore, the first error is very clear: the file was deleted too early.

Our latest ortelio commit produces random file names with the extension audio, so there is no need to instantly delete the file from the HOP cache directory.

UPDATE

New error regarding the same issue:

Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.sphinx.result.Lattice.<init>(Lattice.java:171)
    at edu.cmu.sphinx.api.SpeechResult.<init>(SpeechResult.java:38)
    at edu.cmu.sphinx.api.AbstractSpeechRecognizer.getResult(AbstractSpeechRecognizer.java:61)
    at Sphinx4.main(Sphinx4.java:190)
cp: cannot stat ‘/home/alex/.hop/cache/services/Qwx1bELzDgjywTBD-ldIQa.audio_transformed.wav’: No such file or directory
cp: cannot stat ‘/home/alex/.hop/cache/services/Qwx1bELzDgjywTBD-ldIQa.audio_transformed.wav_denoised.wav’: No such file or directory
cp: cannot stat ‘/home/alex/.hop/cache/services/Qwx1bELzDgjywTBD-ldIQa.audio_transformed.wav_denoised.wav_energy_denoised.wav’: No such file or directory
rm: cannot remove ‘/home/alex/.hop/cache/services/Qwx1bELzDgjywTBD-ldIQa.audio_transformed.wav’: No such file or directory
rm: cannot remove ‘/home/alex/.hop/cache/services/Qwx1bELzDgjywTBD-ldIQa.audio_transformed.wav_denoised.wav’: No such file or directory
rm: cannot remove ‘/home/alex/.hop/cache/services/Qwx1bELzDgjywTBD-ldIQa.audio_transformed.wav_denoised.wav_energy_denoised.wav’: No such file or directory
['Error: Time out error']

This is followed by the previous error:

[ERROR] [WallTime: 1442933691.012849] Error processing request: [Errno 32] Broken pipe
['Traceback (most recent call last):\n', '  File "/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 623, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  File "/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_speech_detection_sphinx4/src/rapp_speech_detection_sphinx4/speech_recognition_sphinx4.py", line 164, in speechRecognitionBatch\n    spee_res = self.speechRecognition(spee_req)\n', '  File "/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_speech_detection_sphinx4/src/rapp_speech_detection_sphinx4/speech_recognition_sphinx4.py", line 172, in speechRecognition\n    words = self.sphinx4.performSpeechRecognition(req.path, req.audio_source, req.user)\n', '  File "/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_speech_detection_sphinx4/src/rapp_speech_detection_sphinx4/sphinx4_wrapper.py", line 267, in performSpeechRecognition\n    words = self.callSphinxJava(new_audio_file)\n', '  File "/home/alex/rapp_platform/rapp-platform-catkin-ws/src/rapp-platform/rapp_speech_detection_sphinx4/src/rapp_speech_detection_sphinx4/sphinx4_wrapper.py", line 313, in callSphinxJava\n    self.p.stdin.write("start\\r\\n")\n', 'IOError: [Errno 32] Broken pipe\n']
[ERROR] [WallTime: 1442933691.015660] [Client 10] [id: ldIQa] call_service ServiceException: service [/rapp/rapp_speech_detection_sphinx4/batch_speech_to_text] responded with an error: error processing request: [Errno 32] Broken pipe

This seems to be transforming an ogg into a wav. Is this due to the extension?

@alexge233
Contributor Author

Trying with the NAO 4 CHANNEL audio source, also throws an exception:

Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.sphinx.result.Lattice.<init>(Lattice.java:171)
    at edu.cmu.sphinx.api.SpeechResult.<init>(SpeechResult.java:38)
    at edu.cmu.sphinx.api.AbstractSpeechRecognizer.getResult(AbstractSpeechRecognizer.java:61)
    at Sphinx4.main(Sphinx4.java:190)
['Error: Time out error']

I used user rapp with files:

  • silence_wav_d05_a1.wav (denoise)
  • nao_wav_d05_a1.wav (speech2text)
  • nao_wav_4_ch (audio type)

Trying the same thing with ogg files and audio source throws many time-outs,
but most importantly, I can verify that the ogg file is being manipulated as a wav file:

rm: cannot remove ‘/home/alex/.hop/cache/services/audio-AIt2m.ogg_transformed.wav’: No such file or directory
rm: cannot remove ‘/home/alex/.hop/cache/services/audio-AIt2m.ogg_transformed.wav_denoised.wav’: No such file or directory
...
["they're", 'joie', 'joie', 'de', 'xiao', 'of', 'xiao', 'Error: Time out error']
...
rm: cannot remove ‘/home/alex/.hop/cache/services/audio-AIt2m.ogg_transformed.wav_denoised.wav_energy_denoised.wav’: No such file or directory
['Error: Time out error']

@alexge233
Contributor Author

This is my last post.

Trying with "headset" as audio_source, yes-no.wav and email-robot.wav do not get any type of response.

Furthermore, in C++ API, I removed (only for now) the random filename generator, and instead I sent all wav files as audio.wav and all ogg files as audio.ogg. This seems to keep the files instead of deleting them.

Obviously there's an issue here: when I send random-string filenames (with or without extension), they seem to get deleted prematurely.

@etsardou
Member

Regarding the time-out errors: these occur because the Sphinx4 Java library crashes. It seems like you have an outdated sphinxbase (or Sphinx4). Did you use the latest VM?

Regarding the ogg being treated as wav: that is expected. The file is converted to wav first, as Sphinx supports only wav.

Regarding the error of the missing file: @klpanagi knows more about the removal of files by the HOP service, so Kostas, jump in.

About the random words you are getting back: are you sure you are giving a language model (the words parameter in the service)? If this is empty, it tries to perform speech recognition with the entire English dictionary, which is far from optimal.
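A small client-side sketch of the point about the words parameter: it assembles the text fields shown in the POST request earlier in this thread and refuses an empty words list. The function name `build_speech_fields` is hypothetical, and raising on an empty list is a design choice for illustration, not platform behaviour.

```python
import json

def build_speech_fields(language, user, audio_source, words, sentences, grammar=()):
    """Return the text fields of the multipart request as a dict of strings."""
    if not words:
        # Per the comment above, an empty list means the full dictionary is used.
        raise ValueError("words is empty; recognition would use the full dictionary")
    return {
        "language": language,
        "user": user,
        "audio_source": audio_source,
        "grammar": json.dumps(list(grammar)),
        "words": json.dumps(list(words)),
        "sentences": json.dumps(list(sentences)),
    }

fields = build_speech_fields("en", "rapp", "headset", ["yes", "no"], ["yes", "no"])
```

The audio file itself (the file_uri part of the request) is left out here, since it is sent as binary data rather than as a text field.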

@alexge233
Contributor Author

  • I updated sphinxbase as per your instructions.
  • OK then
  • will wait for Kostas
  • yes I was giving empty parameter for some of the tests, but for example yes-no.wav did have it as parameters
  • I haven't been able to download the VM yet.

@klpanagi
Contributor

I managed to reproduce the error. It arises as follows:

  • Every HOP service defines a timeout value for the websocket communication between the service and the ROS service. For the speech_detection_sphinx4 HOP service, the timeout is set to 15 sec.
  • When this timeout is reached, the HOP service assumes that the websocket communication is broken (for whatever reason; no response from the ROS service) and tries to reconnect.
  • This procedure continues up to a maximum number of reconnect_tries. For speech_detection_sphinx4 this value is set to 3.
  • Files get purged from the cache directory ONLY just before the response is sent to the client.

So the error is produced when the speech_detection ROS service takes more than 15 sec to respond to the HOP service, three times in a row. The HOP service then deletes the file and returns to the client, while the relevant ROS service is still trying to read the file...

I don't know if this is an issue at all, as the RAPP Platform should NEVER take 15 sec to process a speechToText request...
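The sequence described above can be approximated with a back-of-the-envelope model. It assumes the tries simply add up, so the HOP side gives up after timeout multiplied by reconnect_tries seconds; the real reconnect logic may behave differently. The function name `file_purged_early` is hypothetical, and the defaults are the 15 sec and 3 tries quoted above.

```python
def file_purged_early(processing_s, timeout_s=15.0, reconnect_tries=3):
    """Return True if the HOP side gives up (and purges the file) before ROS finishes.

    Simplified model: the HOP service waits timeout_s per try, for
    reconnect_tries tries in a row, then responds and purges the cached file.
    """
    give_up_after_s = timeout_s * reconnect_tries
    return processing_s > give_up_after_s

# On a fast machine the ROS side answers well inside the window.
fast_machine = file_purged_early(10.0)
# On a slow VM it may not, and the ROS side then fails to find the file.
slow_vm = file_purged_early(50.0)
```

Under this model, raising either value widens the window in which a slow ROS service can still find its input file.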

@klpanagi
Contributor

If this happens on an OS instance with low processing power, you can try raising the timeoutValue here

I will include these configuration parameters into properly formatted configuration files on the next merge to the master-branch.

@alexge233
Contributor Author

I am back!
I am now using the RAPP VM v.4.
I run the tests:

******* Results ********

[ Succeded ]: {34 / 39}
- speech_detection_sphinx_test_nao_wav_1_ch_nai_oxi
- face_detection_test_medium_straight
- speech_detection_sphinx_test_ogg_no
- ontology_is_subsuperclass_of_microwave_oven
- face_detection_test_far_straight
- speech_detection_sphinx_test_ogg_triti
- ontology_is_subsuperclass_of_SpatialThing
- qr_detection_test_easy_medium
- speech_detection_sphinx_test_ogg_monday
- speech_detection_sphinx_test_nao_wav_1_ch_email_robot
- speech_detection_sphinx_test_ogg_yes
- face_detection_test_lenna_png
- speech_detection_sphinx_test_ogg_oxi
- speech_detection_sphinx_test_ogg_deutera
- face_detection_test_two_faces_jpg
- ontology_superclasses_of_test_1
- denoise_profile_test_1
- qr_detection_test_hard_far
- qr_detection_test_easy_near
- speech_detection_sphinx_test_nao_wav_1_ch_yes_no
- face_detection_test_multi_faces_jpg
- speech_detection_sphinx_test_ogg_tuesday
- qr_detection_test_medium_medium
- qr_detection_test_1
- qr_detection_test_hard_near
- qr_detection_test_hard_medium
- face_detection_test_close_straight
- qr_detection_test_medium_far
- speech_detection_sphinx_test_nao_wav_1_ch_thelw_voithia
- qr_detection_test_medium_near
- speech_detection_sphinx_test_headset_nai_oxi
- available_services_test
- ontology_subclasses_of_test_Oven
- qr_detection_test_easy_far

[ Failed ]: {5 / 39}
- face_detection_test_far_angle
- speech_detection_google_test_ogg_sentence1
- face_detection_test_medium_angle
- speech_detection_google_test_ogg_sentence2
- face_detection_test_close_angle

Obviously speech recognition works when running the tests.
Looking into the dir rapp_platform_files I can see the denoise profiles and the tested files.

I copied the same tests as those under the python tests in rapp platform.
HOWEVER when using the C++ API, I still get errors or empty responses!

1 - Using: yes-no.wav, with user rapp, en lang, words ["yes","no"] and sentence "yes no", for nao_wav_1_ch audio source, I got:

 {"words":[],"error":"Error:The file for denoising is not wav"}

This should not happen, because I've run set_denoise using both the Python tests and the C++ examples.
I ran this multiple times, and it gets stuck on this error, never spawning Sphinx.

2 - Using: recording_sentence1.ogg, with words: ['I', 'want', 'to', 'go', 'out'] and sentence "I want to go out", for user rapp, en language, nao_ogg audio source, I get:

{"words":["no","no"],"error":""}

I also got the error:

{"words":[],"error":"ERROR: Word I does not exist in the English Dictionary\nERROR: Word I does not exist in the English Dictionary"}

Which stopped after I ran the example a 2nd time.
I know this test was meant for Google and not Sphinx!

3 - Using: email-robot.wav, with words: ['email','robot'] and sentence "email robot", for user rapp, en lang, nao_wav_1_ch audio source, I get:

{"words":[],"error":"Error:The file for denoising is not wav"}

I run this multiple times, always getting the same error

4 - Using recording_no.ogg, user rapp, en lang, words ["yes","no"] and sentences ["yes","no"], for ogg audio source, I get:

{"words":["no"],"error":""}

Which is the FIRST test to actually work when using C++

5 - Using recording_yes.ogg, with the exact same params as above, it also works:

{"words":["yes"],"error":""}

6 - Same with recording_tuesday.ogg, same params as above, just passed as words: ["monday", "tuesday"], I got:

{"words":["tuesday"],"error":""}

7 - Using: recording_sentence2.ogg, with the same params as those in the Python test, I get:

 {"words":["check","my"],"error":""}

Which is half the sentence.

So it seems apparent to me that something is broken in the WAV speech processing.
Only OGG seems to work, and even then only for very simple words.
Obviously I am happy that I got it working, but I would still like to see WAV working properly.

PS: BTW, what's the point of passing words and sentences as the same thing? This is a duplicate parameter.

@etsardou
Member

etsardou commented Oct 2, 2015

Nice! We are getting somewhere! So:

1 + 3. This error is produced here and means that ".wav" was not in the file's name. That is quite strange, since the HOP service should write the file with the proper postfix. @klpanagi is there a chance to write it without a .wav postfix?

2: That is strange. Normally the word "I" should exist in the English dictionary. I will check it.

Regarding the words and sentences check here: https://github.com/rapp-project/rapp-platform/tree/master/rapp_speech_detection_sphinx4

@klpanagi
Contributor

klpanagi commented Oct 2, 2015

Files are stored by the HOP server using the file_uri name value defined in the multipart/form-data POST. If the file extension is missing from that field, the HOP service is not responsible for appending it.

Thoughts:

  • Relevant Hop services can parse the audio_source and define the extension if it is missing. Though I believe that this should be avoided as we can get into a situation where the actual audio data format and given audio_source do not match.
  • We can use a parser in order to parse the actual data format from the file headers. We will discuss this...

Currently, if you put the full file name (prefix + extension, e.g. 'test.wav') into the file_uri name value, it is meant to work :)
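On the caller side, the full file name could be built before posting with something like the sketch below. This is only an illustration: the function name `with_extension` is hypothetical, and the mapping from audio_source to extension is an assumption based on the source names mentioned in this thread, not something taken from the platform code.

```python
import os

# Assumed mapping: the audio_source names appear in this thread, but which
# extension each one expects is an assumption made for this sketch.
ASSUMED_EXTENSIONS = {
    "nao_wav_1_ch": ".wav",
    "nao_wav_4_ch": ".wav",
    "nao_ogg": ".ogg",
    "headset": ".wav",
}


def with_extension(file_name, audio_source):
    """Return file_name carrying the extension assumed for audio_source.

    Raises ValueError for an unknown audio_source, or when file_name
    already has an extension that does not match the assumed one.
    """
    expected = ASSUMED_EXTENSIONS.get(audio_source)
    if expected is None:
        raise ValueError("unknown audio_source: %r" % (audio_source,))
    _, ext = os.path.splitext(file_name)
    if not ext:
        # No extension given: append the assumed one.
        return file_name + expected
    if ext.lower() != expected:
        raise ValueError(
            "%r does not match audio_source %r" % (file_name, audio_source)
        )
    return file_name
```

Raising on a mismatch, rather than silently renaming, keeps the concern above visible: the name and the declared audio_source can disagree, and the caller should hear about it.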

@alexge233
Contributor Author

Regarding the WAV extension, I removed it on purpose in the API:
https://github.com/rapp-project/rapp-api/blob/ortelio/cpp/includes/cloud/speechToText/speechToText.hpp#L88
since it is not possible to know the audio type.
If needed, I can create two specialisation classes for audio: a WAV and an OGG.
If the wrong file is used with the wrong class, we would simply catch the error and alert the user.
This approach also simplifies passing the audio source parameters, as they are implied by the object being passed as a parameter.

Furthermore, it's not that easy to tell whether a file is WAV or OGG by looking at the file header.
WAV files have their first bytes define their format:

hexdump -C yes-no.wav | less
00000000  52 49 46 46 24 40 01 00  57 41 56 45 66 6d 74 20  |RIFF$@..WAVEfmt |

and so do the OGG files; they also identify themselves as WAVs:

hexdump -C recording_monday.ogg | less
00000000  52 49 46 46 24 40 01 00  57 41 56 45 66 6d 74 20  |RIFF$@..WAVEfmt |

An accurate detection would require the file format to be parsed, which at this point is definitely over-engineering.
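For reference, a rough container check (not full format parsing) can be sketched in a few lines. It relies only on the standard signatures: a RIFF/WAVE file starts with "RIFF" and has "WAVE" at byte offset 8, while an Ogg container starts with "OggS". The function names are hypothetical, and this says nothing about the codec or channel layout inside the container.

```python
def sniff_container(header):
    """Guess the audio container from the first bytes of a file.

    Returns "wav", "ogg", or None if neither signature matches.
    """
    if len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[0:4] == b"OggS":
        return "ogg"
    return None


def sniff_file(path):
    """Read just enough of the file at path to check both signatures."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(12))
```

With this check, the first 16 bytes shown in both hexdumps above are reported as "wav", whatever the file is named.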

Regarding the sentence parameter, the README (https://github.com/rapp-project/rapp-platform/tree/master/rapp_speech_detection_sphinx4) does not explain why it's a duplicate.
In fact I thought that the sentence would be one string, and not an array of strings. This needs further documentation and detailed info.
I am still not sure how it affects (if it does indeed) the detection accuracy.

This upcoming week, I will try to record some audio samples, and try to test them.

@etsardou
Member

etsardou commented Oct 4, 2015

About the sentence and grammar parameters (they come from Sphinx): http://cmusphinx.sourceforge.net/wiki/tutoriallm

For the audio type, I guess for now we have to trust the user that calls the service. If he is wrong, the platform will return an error like the one you got (file is not a wav) or the speech recognition simply won't work.

@etsardou
Member

@alexge233 Can we close this?

@alexge233
Contributor Author

Well the problem is still there, but you can close it if you wish to.

On Thu, 29 Oct, 2015 at 7:40 AM, Manos Tsardoulias
notifications@github.com wrote:

@alexge233 Can we close this?


@etsardou
Member

What exactly is the current problem?

@alexge233
Contributor Author

I am not getting replies, even of words that are in the audio file, and
included in the words parameter. I'll try with WAV files instead of
OGG, using the headset audio source, and report back.

On Thu, 29 Oct, 2015 at 4:18 PM, Manos Tsardoulias
notifications@github.com wrote:

What exactly is the current problem?


@etsardou
Member

You are probably giving wrong parameters. Try putting the same words in the sentences and in the grammar (the exact same string vectors). Also try to use just a few words for starters (2-5). Finally, pay attention to the audio file type (if you record a WAV with a microphone, pass headset as the type, as described here: https://github.com/rapp-project/rapp-platform/blob/master/rapp_speech_detection_sphinx4/README.md)
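As a rough illustration, these checks could be run on the caller side before the request is sent. The function name, the argument names and the set of audio sources below are assumptions pieced together from names mentioned in this thread; they are not the platform's API.

```python
# Audio source names as they appear in this thread; assumed, possibly incomplete.
ASSUMED_SOURCES = {"nao_wav_1_ch", "nao_wav_4_ch", "nao_ogg", "headset"}


def check_params(words, sentences, grammar, audio_source):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if not 2 <= len(words) <= 5:
        problems.append("start with a vocabulary of 2-5 words")
    if list(sentences) != list(words):
        problems.append("sentences should be the same string vector as words")
    if list(grammar) != list(words):
        problems.append("grammar should be the same string vector as words")
    if audio_source not in ASSUMED_SOURCES:
        problems.append("unrecognised audio source: %r" % (audio_source,))
    return problems
```

For example, `check_params(["yes", "no"], ["yes", "no"], ["yes", "no"], "headset")` returns an empty list, while passing a single joined sentence such as `["yes no"]` is flagged.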

@etsardou
Member

Actually, I am closing this, as we have been testing it successfully over the last months with the Python API, with the actual NAO and with a headset. If you get any errors, open another issue with the exact input parameters and attach the audio file. Thnx!

@alexge233
Contributor Author

I have tried all your suggestions: words, sentences and grammar.
I've tried all audio from the test data as well as audio I've recorded. It failed in all cases; it rarely recognized a word correctly, and never a sentence.

In any case I will keep trying, as it is absolutely crucial that this works properly.

A.

----- Reply message -----
From: "Manos Tsardoulias" notifications@github.com
To: "rapp-project/rapp-platform" rapp-platform@noreply.github.com
Cc: "Alex" alexge233@hotmail.com
Subject: [rapp-platform] Sphinx does not return results (#127)
Date: Thu, Oct 29, 2015 18:22

You are probably giving wrong parameters. Try putting the same words in the sentences and in grammar (the exact same string vectors). Also try to have just a few words for starters (2-5). Finally pay attention to the audio file type (if you write a wav with the microphone pass headset as type, as referred here: https://github.com/rapp-project/rapp-platform/blob/master/rapp_speech_detection_sphinx4/README.md)


@etsardou
Member

As I suggested, if something doesn't work, post all the parameters and the audio file you are using, so that the problem is easier to identify.

@alexge233
Contributor Author

My previous posts (from 27 days ago, scroll up) have all parameters and results listed.
I will make a spreadsheet this weekend with what works and what doesn't work.
In general: one word sentences seem to work.
Many word sentences fail to work.
I'll get back on this asap.
Close this if you want, and I'll reopen if needed.

@etsardou
Member

I remember the posts and in most of them either the parameters were wrong (especially the audio type) or the denoising had not been performed, or even your deployment had missing packages. Now that your system is up to date and more stable please check the test cases again. Remember that the Sphinx recognition is not suitable for free speech and/or a big pool of words (above 10), since it was created to work for both languages and in simple cases. Thus trying to perform complex speech recognition is not suggested.

For reference, in the paper we have published, the performance for 10 Greek words in audio files recorded from a headset was above 90%. Also, we are currently successfully testing it in the NAO cognitive games case for vocabularies of 3-4 words under heavy noise.

I am looking forward to your new results! I am leaving this open; close it when you see fit.
