# allow your local agent to publish transcripts on behalf of the avatar agent
.with_attributes({ATTRIBUTE_PUBLISH_ON_BEHALF: ctx.room.local_participant.identity})
Not sure this comment is correct
I think it's to mark that the avatar agent publishes video and audio on behalf of the agent.
@longcw, maybe related to our prior discussion on #1424: is there now a canonical way to hide one of the call participants so that the user experiences a 1:1 call?
The way we implemented the plugin here, only the avatar worker publishes to the room, so that the agent can later be hidden client-side. We thought the line above allows the agent to publish transcripts in the name of the avatar.
However, from your comment it sounds like the intended approach now is for the avatar to publish audio/video in the name of the agent and for the avatar worker to be hidden?
Maybe you can clarify this and suggest some best practices around it once you review this file.
Ideally it works the latter way you described: the avatar publishes audio/video in the name of the agent, and the avatar worker is hidden.
The client can detect this by reading the ATTRIBUTE_PUBLISH_ON_BEHALF attribute and then handle the avatar participant as designed.
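A minimal sketch of that detection, written in Python for illustration only (a real frontend would do this with a LiveKit client SDK). It assumes the attribute key is `"lk.publish_on_behalf"`, which we believe is the value of `ATTRIBUTE_PUBLISH_ON_BEHALF` in livekit-agents, and uses a hypothetical `Participant` stand-in rather than an SDK class. The avatar participant carries the agent's identity in that attribute, so the client can look it up from the agent's identity.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed value of ATTRIBUTE_PUBLISH_ON_BEHALF; verify against your livekit-agents version.
PUBLISH_ON_BEHALF = "lk.publish_on_behalf"


@dataclass
class Participant:
    """Hypothetical stand-in for a participant as seen by the client."""
    identity: str
    attributes: dict[str, str] = field(default_factory=dict)


def find_avatar_for(agent_identity: str, participants: list[Participant]) -> Optional[Participant]:
    """Return the participant publishing media on behalf of `agent_identity`, if any."""
    for p in participants:
        if p.attributes.get(PUBLISH_ON_BEHALF) == agent_identity:
            return p
    return None


room = [
    Participant("user-1"),
    Participant("agent-abc"),
    Participant("bey-avatar-agent", {PUBLISH_ON_BEHALF: "agent-abc"}),
]
avatar = find_avatar_for("agent-abc", room)
print(avatar.identity if avatar else None)  # bey-avatar-agent
```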
The benefit is that other operations involving the agent stay unchanged, e.g. performing an RPC call or sending text or a file to the agent through a data stream. These operations need a destination participant identity, and using the agent's identity is straightforward.
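To illustrate why this is convenient: whichever participant the user interacts with, the client can resolve the destination identity back to the agent. The sketch below makes the same assumptions as before (attribute key `"lk.publish_on_behalf"`, hypothetical `Participant` stand-in); the resolved identity is what would then be passed as the destination of an RPC call or data stream, e.g. the `destination_identity` argument of `perform_rpc` in the LiveKit Python SDK.

```python
from dataclasses import dataclass, field

# Assumed value of ATTRIBUTE_PUBLISH_ON_BEHALF; verify against your livekit-agents version.
PUBLISH_ON_BEHALF = "lk.publish_on_behalf"


@dataclass
class Participant:
    """Hypothetical stand-in for a participant as seen by the client."""
    identity: str
    attributes: dict[str, str] = field(default_factory=dict)


def destination_identity(target: Participant) -> str:
    """Map an avatar participant back to the agent it represents.

    Participants without the attribute are addressed directly.
    """
    return target.attributes.get(PUBLISH_ON_BEHALF, target.identity)


avatar = Participant("bey-avatar-agent", {PUBLISH_ON_BEHALF: "agent-abc"})
agent = Participant("agent-abc")
print(destination_identity(avatar))  # agent-abc
print(destination_identity(agent))   # agent-abc
```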
I see the advantage of keeping the other operations unchanged, but the main thing we want to achieve is that a user can speak to an avatar, right? Do you have an example of how the user could see the video through the "agent worker" in the frontend? We're using ATTRIBUTE_PUBLISH_ON_BEHALF as in your latest example on the dev branch (https://github.com/livekit/agents/blob/dev-1.0/examples/avatar/agent_worker.py#L48), but in the end it is our "avatar worker" that outputs video and audio (see screenshot). The other participant, the "agent worker", outputs neither audio nor video and, I guess, is just there to forward the user audio, so we're currently hiding it on the frontend. Does the ATTRIBUTE_PUBLISH_ON_BEHALF config not work properly? That is, should the "agent worker" receive the audio and video we send through the "avatar worker"?
It needs some modification in the frontend client. For example, if you use the LiveKit playground, it will only show the avatar video without the other participant (just to show it's feasible).

I think we will support this in the client SDK. Alternatively, you can customize the frontend to hide the participant that has the ATTRIBUTE_PUBLISH_ON_BEHALF attribute.
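A sketch of that frontend customization, again in Python for illustration, with a hypothetical `Participant` stand-in (tracks reduced to plain names) and the assumed attribute key `"lk.publish_on_behalf"`. Participants that publish on behalf of someone else get no tile of their own; their tracks are rendered in the tile of the participant they represent, so the user sees a 1:1 call with the agent.

```python
from dataclasses import dataclass, field

# Assumed value of ATTRIBUTE_PUBLISH_ON_BEHALF; verify against your livekit-agents version.
PUBLISH_ON_BEHALF = "lk.publish_on_behalf"


@dataclass
class Participant:
    """Hypothetical stand-in for a participant; tracks are just names here."""
    identity: str
    attributes: dict[str, str] = field(default_factory=dict)
    tracks: list[str] = field(default_factory=list)


def build_tiles(participants: list[Participant]) -> dict[str, list[str]]:
    """Return {tile identity: tracks to render}, hiding on-behalf publishers."""
    tiles: dict[str, list[str]] = {}
    # First pass: one tile per participant that does not publish on behalf of anyone.
    for p in participants:
        if PUBLISH_ON_BEHALF not in p.attributes:
            tiles[p.identity] = list(p.tracks)
    # Second pass: attribute each hidden participant's tracks to the identity it represents
    # (creating that tile if the represented participant has not been seen yet).
    for p in participants:
        owner = p.attributes.get(PUBLISH_ON_BEHALF)
        if owner is not None:
            tiles.setdefault(owner, []).extend(p.tracks)
    return tiles


room = [
    Participant("user-1", tracks=["mic"]),
    Participant("agent-abc"),
    Participant("bey-avatar-agent", {PUBLISH_ON_BEHALF: "agent-abc"}, ["audio", "video"]),
]
print(build_tiles(room))  # {'user-1': ['mic'], 'agent-abc': ['audio', 'video']}
```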
I guess the source of the misunderstanding is that ATTRIBUTE_PUBLISH_ON_BEHALF is not handled automatically by the client SDK at this point.
@longcw, could you share a code snippet showing how this is implemented in the LK playground?
EDIT: I'd also suggest adding this somewhere in your docs / avatar examples / ... so people know the proper way to handle it.
Sure, I'll share an example of how to handle ATTRIBUTE_PUBLISH_ON_BEHALF in the client.
Is this requirements.txt correct? I'm not sure: other plugins don't seem to specify livekit, but maybe I'm missing something.
# LiveKit Beyond Presence Avatar Example

This example demonstrates how to use LiveKit's agent system to create an animated Beyond Presence avatar that responds to audio input.
The avatar worker uses the Beyond Presence API to generate synchronized video and audio from the audio input it receives.

## How it Works

1. The LiveKit agent and the Beyond Presence avatar worker both join the same LiveKit room as the user.
2. The LiveKit agent listens to the user and generates a conversational response, as usual.
3. However, instead of publishing audio directly to the room, the agent sends it over a WebRTC data channel to the Beyond Presence avatar worker.
4. The avatar worker listens only to the audio from the data channel, generates the corresponding avatar video, synchronizes audio and video, and publishes both back to the room for the user to experience.
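The four steps above can be sketched as a toy asyncio pipeline. This is not the plugin's implementation: an `asyncio.Queue` stands in for the WebRTC data channel, the hypothetical `fake_render` replaces Beyond Presence video generation, and the "room" is just a list. It only shows the data flow: the agent publishes nothing to the room itself, and the avatar worker pairs each audio chunk with a rendered frame before publishing both.

```python
import asyncio

END = None  # sentinel marking the end of the agent's audio stream


async def agent(chunks: list[bytes], channel: asyncio.Queue) -> None:
    """Step 3: send response audio to the avatar worker instead of the room."""
    for chunk in chunks:
        await channel.put(chunk)
    await channel.put(END)


def fake_render(chunk: bytes) -> str:
    """Hypothetical stand-in for avatar video generation from one audio chunk."""
    return f"frame<{len(chunk)} bytes>"


async def avatar_worker(channel: asyncio.Queue, room: list[tuple[bytes, str]]) -> None:
    """Step 4: consume audio from the channel, render video, publish synchronized pairs."""
    while (chunk := await channel.get()) is not END:
        room.append((chunk, fake_render(chunk)))


async def main() -> list[tuple[bytes, str]]:
    channel: asyncio.Queue = asyncio.Queue()  # stand-in for the data channel
    room: list[tuple[bytes, str]] = []  # stand-in for tracks published to the room
    await asyncio.gather(agent([b"hel", b"lo!!"], channel), avatar_worker(channel, room))
    return room


print(asyncio.run(main()))
# [(b'hel', 'frame<3 bytes>'), (b'lo!!', 'frame<4 bytes>')]
```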
I would be inclined to change "avatar worker" to "avatar agent", since that seems more in line with what it actually is (an agent joining the call). A worker, from what I understood, is a process that takes care of a job and can spawn zero or more agents for the room. WDYT?
@local_agent_session.output.audio.on("playback_finished")
def on_playback_finished(ev: PlaybackFinishedEvent) -> None:
    logger.info(
        "playback_finished",
        extra={"playback_position": ev.playback_position, "interrupted": ev.interrupted},
    )
I copied this from other examples; what is its purpose exactly?
Here it's just for logging; you can ignore it.
@longcw, is there any reason this is included in all the avatar examples, then? Is there a common avatar-related use case you would use it for?
If not, I'd suggest keeping the examples minimal and omitting it.
## How it Works

1. The LiveKit agent and the Beyond Presence avatar worker both join the same LiveKit room as the user.
2. The LiveKit agent listens to the user and generates a conversational response, as usual.
3. However, instead of publishing audio directly to the room, the agent sends it over a WebRTC data channel to the Beyond Presence avatar worker.
4. The avatar worker listens only to the audio from the data channel, generates the corresponding avatar video, synchronizes audio and video, and publishes both back to the room for the user to experience.
Do you see any problem with this explanation? Please let me know if you think something is wrong or unclear! 🙏
20fcb83 to 3125dbe
@longcw Since you're leading the integration of avatar examples for LK agents, I had a few questions:
Let me know your thoughts. Happy to collaborate to make this as smooth as possible!
3125dbe to 1cd8969
1cd8969 to 219d5fc
I updated the PR to:
219d5fc to e10f534
Co-authored-by: Felix Altenberger <felix@beyondpresence.ai>
Co-authored-by: Lucas Jacobson <lucas@beyondpresence.ai>
Co-authored-by: Nicola De Angeli <nicola@beyondpresence.ai>
7b8acb3 to d8732d2
Hi @longcw, is there anything actionable I can do to help merge this into the main examples PR? 🙏
Can you rebase this PR onto main and change the target branch to main?
@longcw, I merged the latest main instead of rebasing, since the previous history already contained many merges that made rebasing difficult; I hope that's OK! The diff should now be meaningful again.
Thanks @niqodea! I have tested your avatar API with the token, and it works well. If you don't mind, I can take this one over; I may create a new PR with some cleanup.
Sure, go ahead! Thank you!
Add resources for the Beyond Presence API.
- livekit-agents-bey: plugin to handle API calls and local setup for avatar generation
- examples/avatar/bey: a basic script demonstrating how to use the API via the plugin

Marking this as a draft since the API is not live yet. Feedback on integration or improvements is welcome!
Note: we reserved the livekit-plugins-bey PyPI package name; let me know if I should add someone from the LiveKit team as an owner. 🙏