This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Ideas for lipsync and visemes? #47

Open
kinoc opened this issue Dec 21, 2021 · 0 comments
kinoc commented Dec 21, 2021

First, love the project!

I have a robotic and virtual agent project that I'm trying to get as close to real-time response as possible.
I use the following to generate speech:
python3 fastVoice.py | larynx -v ek --interactive --ssml --raw-stream --cuda --half --max-thread-workers 8 --stdin-format lines --process-on-blank-line | aplay -r 22050 -c 1 -f S16_LE
where fastVoice.py just dumps SSML received over a socket onto larynx's stdin (remember to flush properly ...)
fastVoice.txt
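For anyone curious what the bridge script amounts to, here is a minimal sketch of the idea (my attachment has the real thing; the host/port below and the `--serve` flag are placeholders, not what fastVoice.py actually uses). The flush on each blank line is the important part, since larynx is invoked with --process-on-blank-line:

```python
# Hypothetical sketch of fastVoice.py: forward SSML lines from a TCP socket
# to stdout so larynx (reading stdin with --process-on-blank-line) can start
# synthesizing each utterance as soon as its blank-line terminator arrives.
import socket
import sys

HOST, PORT = "127.0.0.1", 5500  # placeholder address; adjust to taste

def forward(reader, writer):
    """Copy lines from reader to writer, flushing after each blank line."""
    for line in reader:
        writer.write(line)
        if line.strip() == "":
            # A blank line ends an utterance block; without this flush,
            # larynx sits waiting on buffered input and latency balloons.
            writer.flush()
    writer.flush()

def main():
    with socket.create_server((HOST, PORT)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn, conn.makefile("r", encoding="utf-8") as reader:
                forward(reader, sys.stdout)

if __name__ == "__main__" and "--serve" in sys.argv:
    main()
```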

All works very well. Audio generally starts <1s from receiving the message. The question is how to get a phoneme-viseme sequence synced with the audio output.
I can manage to get level-0-ish lipsync by looking at the amplitude of the audio output, but that gives enough info for just the jaw, not the visemes of the lips.
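For reference, my amplitude-only approach boils down to something like this: tee the same raw S16_LE mono stream that goes to aplay, compute an RMS envelope per ~23 ms frame, and map it to a 0..1 jaw-open value. The calibration constant is something I picked by eye, nothing principled:

```python
# Level-0 jaw lipsync from the raw 16-bit mono PCM that larynx streams out.
# One RMS value per FRAME-sample chunk, normalized to a 0..1 jaw opening.
import math
import struct

RATE = 22050
FRAME = 512           # samples per frame, ~23 ms at 22050 Hz
FULL_OPEN_RMS = 8000  # assumed calibration: RMS that maps to a fully open jaw

def jaw_openness(pcm: bytes) -> list:
    """Return one 0..1 openness value per full FRAME of S16_LE mono PCM."""
    values = []
    step = FRAME * 2  # 2 bytes per sample
    for i in range(0, len(pcm) - step + 1, step):
        samples = struct.unpack(f"<{FRAME}h", pcm[i:i + step])
        rms = math.sqrt(sum(s * s for s in samples) / FRAME)
        values.append(min(rms / FULL_OPEN_RMS, 1.0))
    return values
```

It works, but as said, amplitude alone can't distinguish lip shapes.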

Do you have any ideas/pointers on how to maintain the responsiveness of "--raw-stream" while getting real-time matching info to generate the matching visemes?
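To be concrete about what I'd do once phoneme timings were available: a lookup from (a subset of) IPA phonemes to Oculus-style viseme names. The table below is a hypothetical starting point I sketched myself, not anything larynx or gruut provides:

```python
# Hypothetical IPA-phoneme -> Oculus-style-viseme map for driving the lips,
# assuming the TTS side can emit a timed phoneme sequence.
PHONEME_TO_VISEME = {
    "p": "PP", "b": "PP", "m": "PP",
    "f": "FF", "v": "FF",
    "θ": "TH", "ð": "TH",
    "t": "DD", "d": "DD", "n": "nn", "l": "nn",
    "k": "kk", "ɡ": "kk", "ŋ": "kk",
    "s": "SS", "z": "SS",
    "ʃ": "CH", "ʒ": "CH", "tʃ": "CH", "dʒ": "CH",
    "ɹ": "RR",
    "ɑ": "aa", "æ": "aa",
    "ɛ": "E", "e": "E",
    "ɪ": "I", "i": "I",
    "ɔ": "O", "o": "O", "oʊ": "O",
    "u": "U", "ʊ": "U",
}

def to_visemes(phonemes):
    """Map phonemes to visemes, defaulting unknowns (e.g. 'h') to 'sil'."""
    return [PHONEME_TO_VISEME.get(p, "sil") for p in phonemes]
```

The hard part is getting the per-phoneme timing out of the synthesis without giving up the streaming latency, which is exactly the question.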
