transcribe-stream -a not working from input file / stdin #23

Closed
lukifer opened this issue Jul 12, 2020 · 4 comments

lukifer commented Jul 12, 2020

Running the following results in a no-op on both 2.0 and latest:

voice2json transcribe-stream -a etc/test/what_time_is_it.wav --wav-sink streamtest.wav --event-sink streamtest.log

The resulting --wav-sink file is hiccup-y noise, and the --event-sink log contains:

{"type": "speech", "time": 0.06}
{"type": "silence", "time": 0.24}
{"type": "speech", "time": 1.4400000000000008}
{"type": "silence", "time": 1.620000000000001}
{"type": "speech", "time": 8.459999999999981}
{"type": "silence", "time": 8.639999999999983}
{"type": "speech", "time": 8.759999999999984}
{"type": "started", "time": 9.059999999999986}
{"type": "silence", "time": 10.439999999999998}
{"type": "stopped", "time": 11.760000000000009}
{"type": "speech", "time": 0.18}
{"type": "started", "time": 0.48}
{"type": "silence", "time": 0.54}
{"type": "speech", "time": 1.0200000000000005}
{"type": "silence", "time": 4.859999999999998}
{"type": "stopped", "time": 5.459999999999994}
{"type": "speech", "time": 0.54}
{"type": "started", "time": 0.8400000000000003}
{"type": "silence", "time": 1.560000000000001}
{"type": "stopped", "time": 3.5400000000000027}
{"type": "speech", "time": 4.56}
{"type": "silence", "time": 4.859999999999998}
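For reading a log like the one above, here is a minimal Python sketch that pairs each "started" event with the following "stopped" event. It assumes only what the log shows: one JSON object per line with "type" and "time" keys. The function name `voice_command_spans` is made up for illustration and is not part of voice2json. Judging by this log, the timestamps appear to restart after each "stopped" event, so the spans are best read as relative to each detection rather than to the start of the file.

```python
import json


def voice_command_spans(lines):
    """Pair each "started" event with the next "stopped" event.

    `lines` is an iterable of JSON-lines strings as written to --event-sink.
    Returns a list of (start_time, stop_time) tuples in seconds.
    """
    spans = []
    start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event["type"] == "started":
            start = event["time"]
        elif event["type"] == "stopped" and start is not None:
            spans.append((start, event["time"]))
            start = None
    return spans


if __name__ == "__main__":
    # Hypothetical usage: read the log produced by --event-sink streamtest.log
    with open("streamtest.log") as log_file:
        for start, stop in voice_command_spans(log_file):
            print(f"voice command from {start:.2f}s to {stop:.2f}s")
```

A log with three started/stopped pairs, like the one above, would yield three spans.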

Thanks again for your hard work on voice2json! 🙂

@synesthesiam
Owner

You're welcome :)

Everything is working; it's just that the example WAV file has the wrong sample rate (48 kHz). Additionally, there needs to be a bit of silence at the end for the state machine to work.
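A quick way to check a WAV file's format before streaming it is Python's standard `wave` module. The sketch below is illustrative only: the helper names are made up, and the target of 16 kHz mono with 2-byte (16-bit) samples is an assumption based on the conversion discussed in this thread (the 16-bit width in particular is not stated here).

```python
import wave


def describe_wav(source):
    """Return (sample_rate_hz, channels, sample_width_bytes).

    `source` may be a file path or a seekable binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def is_16khz_mono_16bit(source):
    """True if the WAV is 16 kHz, mono, 2 bytes per sample (assumed target format)."""
    return describe_wav(source) == (16000, 1, 2)


if __name__ == "__main__":
    # Path taken from the command in this issue.
    rate, channels, width = describe_wav("etc/test/what_time_is_it.wav")
    print(f"{rate} Hz, {channels} channel(s), {width * 8}-bit")
```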

Here's an example using sox to do both the conversion to 16 kHz and the silence padding at the end. Hope this works for you!

$ sox etc/test/what_time_is_it.wav -r 16000 -e signed-integer -c 1 -t raw - pad 0 1 | \
    voice2json transcribe-stream -a - --wav-sink streamtest.wav --event-sink streamtest.log
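To illustrate what the `pad 0 1` part of the sox command contributes, here is a rough Python sketch that appends one second of digital silence (zero-valued samples) to raw signed PCM audio. It does not resample, so the input must already be at the target rate. The function name `pad_with_silence` and its defaults (16 kHz, mono, 2 bytes per sample) are assumptions for illustration, not part of sox or voice2json.

```python
def pad_with_silence(pcm, seconds=1.0, sample_rate=16000, sample_width=2, channels=1):
    """Append `seconds` of silence to raw signed PCM bytes.

    For signed integer PCM, silence is simply a run of zero bytes.
    Defaults assume 16 kHz, mono, 16-bit audio.
    """
    frames = int(round(seconds * sample_rate))
    return pcm + b"\x00" * (frames * sample_width * channels)


if __name__ == "__main__":
    import sys

    # Read raw PCM from stdin, write the padded stream to stdout.
    raw = sys.stdin.buffer.read()
    sys.stdout.buffer.write(pad_with_silence(raw))
```

The padded bytes could then be piped into `voice2json transcribe-stream -a -` in the same way as the sox output above.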

@lukifer
Author

lukifer commented Jul 14, 2020

Thanks for the silence / state machine context, that's very helpful! I'm still getting a no-op when running that command on both Mac and RPi; it now just echoes Ready.

@synesthesiam
Owner

Ah, I see the problem now. Fixed in 12dea31.

I'll get this fix pushed into the Docker image and Deb packages soon. Thanks!

@lukifer
Author

lukifer commented Jul 24, 2020

That fixed it, thanks so much!

lukifer closed this as completed Jul 24, 2020