examples for 1.0 #1532

Open
tinalenguyen wants to merge 67 commits into livekit:main from tinalenguyen:main

@tinalenguyen
Member

@tinalenguyen tinalenguyen commented Feb 20, 2025

to be reviewed:

  • dentist scheduler: a multi-agent example offering different functionalities integrated via Cal.com and Supabase APIs
  • conversation persistor (realtime and pipeline): an updated version for 1.0 events
  • conversation recorder: an example of capturing input/output audio frames for a WAV recording via the STT and TTS nodes

@changeset-bot

changeset-bot bot commented Feb 20, 2025

⚠️ No Changeset found

Latest commit: 912c01e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@tinalenguyen tinalenguyen reopened this Feb 20, 2025
@tinalenguyen tinalenguyen changed the base branch from dev-1.0 to main February 20, 2025 09:28
@Bilal-io

Bilal-io commented May 6, 2025

pydub depends on audioop, which was deprecated in Python 3.11 and removed in Python 3.13.
There is a PR to address this, but it has not been merged yet.

@tinalenguyen
Member Author

@Bilal-io thanks for the heads up, i decided to opt for LK's audio resampler which uses SoX

@Bilal-io

@tinalenguyen there is an issue since your last update. I tried to debug it but I am unable to find a solution. The final audio includes the participant's speech but not the agent's TTS. Are you seeing the same issue?

@tinalenguyen
Member Author

@Bilal-io I can't seem to replicate that problem, are you using the same pipeline setup as the conversation_recorder.py example?

@Bilal-io

Yes @tinalenguyen, I am using the same code you shared. Here is a gist.
I am able to converse with the agent without any issue, but as stated before, the final audio file contains my speech without the TTS part, just silence.

@tinalenguyen
Member Author

@Bilal-io Thank you for the gist, I was able to replicate the issue and fix it! Let me know if it works now :)

@Bilal-io

Hey @tinalenguyen thank you for the quick fix.
I am seeing two different issues:
1- The first call works fine, but the second call causes an error at ...return stt(self, record_audio(), model_settings)... I fixed this by using the same pattern as the tts_recorder: instead of returning stt(self, record_audio(), model_settings) directly, I did this:

async for result in stt(self, record_audio(), model_settings):
    yield result

2- The agent's audio sounds great while it is speaking but comes out choppy in the saved file. This is the case even without the change mentioned above. I've attached an audio sample (converted to mp4 to be able to attach it here). Not sure if this is related to LiveKit itself or your implementation.

I appreciate your input

audio-sample.mp4
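
A minimal, self-contained sketch of the pattern from point 1 above, using plain asyncio stand-ins rather than the real LiveKit classes. The names fake_stt, stt_node, mic, and recorded are hypothetical and exist only for illustration; the real default stt node and its model_settings argument are not reproduced here. The sketch assumes the override taps audio in an inner generator and re-yields the downstream events with async for, which makes the override itself an async generator:

```python
import asyncio
from typing import AsyncIterator

recorded: list[bytes] = []  # hypothetical sink standing in for the WAV writer


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for the default STT node: emits one event per audio chunk."""
    async for chunk in audio:
        yield f"event:{len(chunk)}"


async def stt_node(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Override that records audio while forwarding STT events."""

    async def record_audio() -> AsyncIterator[bytes]:
        # Tap each chunk before handing it on to the STT stand-in.
        async for chunk in audio:
            recorded.append(chunk)
            yield chunk

    # Re-yield instead of returning the inner generator, so that this
    # function is itself an async generator.
    async for event in fake_stt(record_audio()):
        yield event


async def mic() -> AsyncIterator[bytes]:
    # Hypothetical audio source producing two chunks of silence.
    for chunk in (b"\x00" * 4, b"\x00" * 8):
        yield chunk


async def main() -> list[str]:
    return [event async for event in stt_node(mic())]


events = asyncio.run(main())
```

Every chunk passes through record_audio() before the STT stand-in sees it, so the recording and the transcription events stay in step.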

@tinalenguyen
Member Author

tinalenguyen commented May 22, 2025

@Bilal-io Good catch, that approach makes more sense!

As for the agent audio, I suspect it's from STT audio cutting in during the agent's speech and not mixing well. I've alleviated it by changing the quality of the resampler to very high:

self._audio_resampler = AudioResampler(input_rate=frame.sample_rate, output_rate=FRAMERATE, quality="very_high")

If the audio still isn't consistent, let me know and I'll look into crossfading/transitioning the audio streams. Thank you again for trying out my work!!

@Bilal-io

Thank you @tinalenguyen for looking into this.
The audio recording still contains the stuttering.

Also, the code you changed needs to call record_audio: async for event in stt(self, record_audio, model_settings) should be async for event in stt(self, record_audio(), model_settings) instead.
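
To illustrate why the parentheses matter, here is a small stand-alone sketch (plain asyncio, no LiveKit imports; record_audio and consume are hypothetical stand-ins). Calling an async generator function returns an object that async for can iterate, while the bare function object cannot be iterated and raises a TypeError:

```python
import asyncio
from typing import AsyncIterator


async def record_audio() -> AsyncIterator[int]:
    # Hypothetical stand-in for the audio-tapping generator.
    for frame in (1, 2, 3):
        yield frame


async def consume(source) -> list[int]:
    return [frame async for frame in source]


async def main() -> tuple[list[int], str]:
    ok = await consume(record_audio())  # called: an async generator object
    try:
        await consume(record_audio)  # not called: a plain function object
        error = ""
    except TypeError as exc:
        error = type(exc).__name__
    return ok, error


frames, error = asyncio.run(main())
```

In the real node the failure would presumably surface wherever the STT implementation first iterates its audio argument, not at the call site.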

Another issue I faced was with deleting the file due to a deadlock. I had to update the aclose to the following:

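    async def aclose(self) -> None:
        # Queue the None end-of-stream marker, then wait for the writer task
        # to drain the queue before touching the file.
        self._audio_q.put_nowait(None)
        await self._main_atask
        # Write any samples still buffered in the resampler.
        if self._audio_resampler:
            frames = self._audio_resampler.flush()
            if frames:
                for flushed_frame in frames:
                    self._file.writeframes(flushed_frame.data.tobytes())
        # Reset resampler state and close the file last.
        self._audio_resampler = None
        self._current_input_rate = 0
        self._file.close()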
@tinalenguyen
Member Author

@Bilal-io Thank you for the feedback, I ended up rewriting most of it and I think it works way better now. Sorry about the bugs/delay, this must be a sign for me to stop coding at 4 AM..

Let me know what you think, and thanks again!!
