# Deploy a Voice AI bot with Pipecat AI and NIM (and Riva TTS & STT)
In this notebook, we walk through how to build and deploy a voice AI bot using Pipecat AI. We illustrate the basic Pipecat flow with the `meta/llama-3.1-70b-instruct` LLM served through NVIDIA NIM, and Riva for STT (Speech-To-Text) and TTS (Text-To-Speech). Pipecat is not opinionated, however, and other models and STT/TTS services can be swapped in easily. See the [Pipecat documentation](https://docs.pipecat.ai/server/services/supported-services#supported-services) for other supported services.

Pipecat AI is an open-source framework for building voice and multimodal conversational agents. Pipecat simplifies the complex voice-to-voice AI pipeline and lets developers build AI capabilities with open-source, commercial, and custom models. See [Pipecat's Core Concepts](https://docs.pipecat.ai/getting-started/core-concepts) for a deep dive into how it works.

The framework was developed by Daily, a company that has provided real-time video and audio communication infrastructure since 2016. Pipecat is fully vendor neutral and is not tightly coupled to Daily's infrastructure. That said, we do use Daily for transport in this demo. Sign up for a Daily-bot API key [here](https://bots.daily.co/sign-up).

## Step 1 - Install dependencies
First, we set up our environment.

We use Daily for transport, OpenAI for context aggregation, Riva for STT & TTS, and Silero for VAD (Voice Activity Detection). If you use different services, for example Cartesia for TTS, you would install the matching extra with `pip install pipecat-ai[cartesia]`.

In [1]:
!pip install python-dotenv
%load_ext dotenv
%dotenv

!pip install "pipecat-ai[daily,openai,riva,silero]"

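Before moving on, it can help to confirm that the installs above succeeded. The following is a minimal sketch, not part of the original notebook: `missing_modules` is a hypothetical helper, and it assumes the two packages expose the top-level import names `pipecat` and `dotenv`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check for the two installs performed in this step.
missing = missing_modules(["pipecat", "dotenv"])
if missing:
    print(f"Not importable, re-run the pip cells: {missing}")
else:
    print("pipecat and dotenv are importable.")
```

This only checks that the packages can be located; it does not verify that each optional extra (Daily, Riva, Silero) is fully installed.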

## Step 2 - Configure Daily transport for WebRTC communication
- room_url: Where the bot connects (and where you will navigate to talk to it)
- None: No authentication token needed
- "NVIDIA NIM": The bot's display name
- Enable audio output for text-to-speech playback and enable VAD

In [11]:
# URL of the room where you talk to the NVIDIA NIM bot.
# Update this to your own room URL after obtaining a Daily-bot API key.
#### NOTE: if this is changed, the "HERE" link in the final step will no longer work.
DAILY_SAMPLE_ROOM_URL="https://pc-34b1bdc94a7741719b57b2efb82d658e.daily.co/prod-test"

In [3]:
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    DAILY_SAMPLE_ROOM_URL,
    None,
    "NVIDIA NIM",
    DailyParams(
        audio_out_enabled=True,
        vad_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
        vad_audio_passthrough=True,
    ),
)

2024-12-12 22:14:15.138 | INFO     | pipecat.audio.vad.vad_analyzer:set_params:69 - Setting VAD params to: confidence=0.7 start_secs=0.2 stop_secs=0.8 min_volume=0.6
2024-12-12 22:14:15.139 | DEBUG    | pipecat.audio.vad.silero:__init__:114 - Loading Silero VAD model...
2024-12-12 22:14:15.237 | DEBUG    | pipecat.audio.vad.silero:__init__:136 - Loaded Silero VAD

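The log output above reports the VAD parameters in effect (`confidence=0.7 start_secs=0.2 stop_secs=0.8 min_volume=0.6`). To build intuition for what `start_secs` and `stop_secs` do, here is a toy state machine. It is only an illustration under stated assumptions, not Silero's or Pipecat's implementation: the `ToyVAD` class, the 100 ms frame size, and the per-frame scores are all hypothetical, and `min_volume` is ignored.

```python
from dataclasses import dataclass


@dataclass
class ToyVAD:
    """Hypothetical VAD state machine; times are integer milliseconds."""

    confidence: float = 0.7  # score at or above this counts as voice
    start_ms: int = 200      # voice must persist this long to start a turn
    stop_ms: int = 800       # silence must persist this long to end a turn
    frame_ms: int = 100      # assumed duration of one audio frame
    speaking: bool = False
    pending_ms: int = 0      # how long the input has disagreed with the state

    def update(self, score: float) -> bool:
        """Feed one frame's voice score and return the current speaking state."""
        is_voice = score >= self.confidence
        if is_voice == self.speaking:
            # Input agrees with the current state, so reset the counter.
            self.pending_ms = 0
            return self.speaking
        self.pending_ms += self.frame_ms
        needed = self.start_ms if is_voice else self.stop_ms
        if self.pending_ms >= needed:
            self.speaking = is_voice
            self.pending_ms = 0
        return self.speaking


vad = ToyVAD()
# Two voiced frames, a brief dip, more voice, then eight silent frames.
scores = [0.9, 0.9, 0.1, 0.9] + [0.1] * 8
states = [vad.update(score) for score in scores]
print(states)
```

In this toy, a single low-scoring frame does not end the turn, while 800 ms of consecutive silence does, which is the kind of smoothing these two parameters suggest.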

Participant left: {'id': '8613db22-b75e-4494-8259-232a6fd74f00', 'info': {'isOwner': False, 'joinedAt': 1734063286, 'permissions': {'hasPresence': True, 'canAdmin': [], 'canSend': ['customVideo', 'camera', 'customAudio', 'microphone', 'screenAudio', 'screenVideo']}, 'userName': 'vanessa', 'isLocal': False}}


## Step 3 - Initialize LLM, STT, and TTS services
We can customize options, for example a different LLM `model` or `voice_id` for FastPitch TTS.

In [4]:
import os
from pipecat.services.nim import NimLLMService
from pipecat.services.riva import FastPitchTTSService, ParakeetSTTService

stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

llm = NimLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-70b-instruct"
)

tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

2024-12-12 22:14:46.247 | DEBUG    | pipecat.services.openai:_stream_chat_completions:174 - Generating chat: [{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way that makes a cat pun if it is possible.", "name": "system"}, {"role": "system", "content": "Please introduce yourself to the user and deliver a cat fact.", "name": "system"}]
2024-12-12 22:14:46.795 | DEBUG    | pipecat.services.riva:run_tts:98 - Generating TTS: [Hello there, it's great to connect with you.]
2024-12-12 22:14:47.824 | DEBUG    | pipecat.services.riva:run_tts:98 - Generating TTS: [ I'm your friendly AI

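All three services in this step read `NVIDIA_API_KEY` with `os.getenv`, which returns `None` without complaint when the variable is missing. A small guard like the one below can surface that earlier. This is a sketch, not part of the original notebook; `require_env` is a hypothetical helper name.

```python
import os


def require_env(name, environ=None):
    """Return the value of environment variable `name`, or raise if unset or blank."""
    environ = os.environ if environ is None else environ
    value = (environ.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it, "
            "then re-run the %dotenv cell."
        )
    return value


# Demonstration with an explicit mapping so the example runs anywhere.
print(require_env("NVIDIA_API_KEY", {"NVIDIA_API_KEY": "example-key"}))

# In the notebook you would call it against the real environment instead:
# api_key = require_env("NVIDIA_API_KEY")
```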
## Step 4 - Define prompt and initialize context aggregator
Edit the prompt as desired.

In [5]:
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

messages = [
    {
        "role": "system",
        "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way that makes a cat pun if it is possible.",
    },
]

context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
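To see what the context holds as a conversation progresses, the sketch below appends turns to an OpenAI-style message list by hand. It is an illustration only: in the notebook the context aggregator does this bookkeeping, `add_turn` is a hypothetical helper, and the example turns are made up.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role!r}")
    return messages + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful LLM in a WebRTC call."}]
history = add_turn(history, "user", "Tell me something about cats.")
history = add_turn(history, "assistant", "Here is a purr-fectly short cat fact.")

for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```

Because the full list is sent to the LLM on each turn, the system prompt at the top keeps shaping every reply.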

## Step 5 - Create pipeline
Here we arrange the services into a pipeline that converts speech to text, sends the text to the LLM, and then turns the LLM's text response into speech.

In [6]:
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline(
    [
        transport.input(),  # Transport user input
        stt,  # STT
        context_aggregator.user(),  # User responses
        llm,  # LLM
        tts,  # TTS
        transport.output(),  # Transport bot output
        context_aggregator.assistant(),  # Assistant spoken responses
    ]
)

2024-12-12 22:14:23.001 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking PipelineSource#0 -> DailyInputTransport#0
2024-12-12 22:14:23.001 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking DailyInputTransport#0 -> ParakeetSTTService#0
2024-12-12 22:14:23.002 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking ParakeetSTTService#0 -> OpenAIUserContextAggregator#0
2024-12-12 22:14:23.003 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking OpenAIUserContextAggregator#0 -> NimLLMService#0
2024-12-12 22:14:23.003 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking NimLLMService#0 -> FastPitchTTSService#0

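As a mental model for the ordering in this step, the toy below chains plain functions in the same STT, LLM, TTS order as the notebook's pipeline. It is a simplified sketch, not how Pipecat is implemented: real Pipecat processors are linked together and exchange frames asynchronously, and every function here (`toy_stt`, `toy_llm`, `toy_tts`, `run_stages`) is a hypothetical stand-in.

```python
def toy_stt(audio: bytes) -> str:
    """Pretend transcription: decode the bytes as text."""
    return audio.decode("utf-8")


def toy_llm(text: str) -> str:
    """Pretend LLM: echo the user's words back."""
    return f"You said: {text}"


def toy_tts(text: str) -> bytes:
    """Pretend speech synthesis: encode the text as bytes."""
    return text.encode("utf-8")


def run_stages(stages, payload):
    """Pass `payload` through each (name, function) stage in order."""
    trace = []
    for name, stage in stages:
        payload = stage(payload)
        trace.append(name)
    return payload, trace


stages = [("stt", toy_stt), ("llm", toy_llm), ("tts", toy_tts)]
audio_out, trace = run_stages(stages, b"hello")
print(trace)
print(audio_out)
```

The output type of each stage must match the input type of the next, which is why the order of the list matters.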
## Step 6 - Create PipelineTask

In [7]:
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

2024-12-12 22:14:26.329 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Source#0 -> Pipeline#0
2024-12-12 22:14:26.330 | DEBUG    | pipecat.processors.frame_processor:link:150 - Linking Pipeline#0 -> Sink#0

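The `allow_interruptions=True` setting is meant to let the user interrupt the bot while it is speaking. The asyncio toy below illustrates the general idea of cutting off in-progress speech by cancelling a task. It is a conceptual sketch only and does not reflect Pipecat's internal mechanism; `speak` and `demo` are hypothetical names and the timings are arbitrary.

```python
import asyncio

WORDS = ["one", "two", "three", "four", "five"]


async def speak(words, spoken):
    """Pretend to speak one word every 10 ms, recording each word as it is said."""
    for word in words:
        spoken.append(word)
        await asyncio.sleep(0.01)


async def demo():
    spoken = []
    bot_speech = asyncio.create_task(speak(WORDS, spoken))
    # Simulate the user starting to talk partway through the bot's sentence.
    await asyncio.sleep(0.025)
    bot_speech.cancel()
    try:
        await bot_speech
    except asyncio.CancelledError:
        pass  # Expected: the speech task was interrupted.
    return spoken


# In a notebook, use `spoken = await demo()` because an event loop is already running.
spoken = asyncio.run(demo())
print(f"Bot got through {len(spoken)} of {len(WORDS)} words before being interrupted.")
```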

## Step 7 - Create a pipeline runner
The runner executes the `PipelineTask` created in the previous step.

In [8]:
from pipecat.pipeline.runner import PipelineRunner

runner = PipelineRunner()

## Step 8 - Set event handlers
The `on_first_participant_joined` handler tells the bot to start the conversation when you join the call.  
The `on_participant_left` handler queues an `EndFrame`, which signals the pipeline to terminate.

In [9]:
from pipecat.frames.frames import LLMMessagesFrame, EndFrame

@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
    # Kick off the conversation.
    messages.append({"role": "system", "content": "Please introduce yourself to the user and deliver a cat fact."})
    await task.queue_frames([LLMMessagesFrame(messages)])

@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    print(f"Participant left: {participant}")
    await task.queue_frame(EndFrame())   
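
The decorator-based registration and the end-of-stream signal used in this step can be pictured with a small asyncio toy. This is a sketch under stated assumptions, not Pipecat's implementation: `ToyTransport`, its `emit` method, and the `END` sentinel are hypothetical stand-ins for `DailyTransport`, Daily's event dispatch, and `EndFrame`.

```python
import asyncio

END = object()  # hypothetical sentinel playing the role of EndFrame


class ToyTransport:
    """Minimal event registry with a decorator, loosely shaped like the cell above."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, name):
        def register(func):
            self._handlers.setdefault(name, []).append(func)
            return func
        return register

    async def emit(self, name, *args):
        for handler in self._handlers.get(name, []):
            await handler(self, *args)


async def main():
    transport = ToyTransport()
    queue = asyncio.Queue()
    processed = []

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await queue.put(f"greet {participant}")

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        await queue.put(END)

    async def run():
        # Consume frames until the end sentinel arrives.
        while True:
            frame = await queue.get()
            if frame is END:
                break
            processed.append(frame)

    runner = asyncio.create_task(run())
    await transport.emit("on_first_participant_joined", "alice")
    await transport.emit("on_participant_left", "alice", "left call")
    await runner
    return processed


# In a notebook, use `processed = await main()` because an event loop is already running.
processed = asyncio.run(main())
print(processed)
```

The run loop exits only because the leave handler queued the sentinel, which mirrors the role the text above gives to `EndFrame`.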

## Step 9 - Run the bot! Then talk to the bot [HERE](https://pc-34b1bdc94a7741719b57b2efb82d658e.daily.co/prod-test)

In [10]:
await runner.run(task)

2024-12-12 22:14:34.062 | DEBUG    | pipecat.pipeline.runner:run:27 - Runner PipelineRunner#0 started running PipelineTask#0
2024-12-12 22:14:34.063 | INFO     | pipecat.transports.services.daily:join:322 - Joining https://pc-34b1bdc94a7741719b57b2efb82d658e.daily.co/prod-test
2024-12-12 22:14:35.698 | INFO     | pipecat.transports.services.daily:join:340 - Joined https://pc-34b1bdc94a7741719b57b2efb82d658e.daily.co/prod-test
2024-12-12 22:14:46.244 | INFO     | pipecat.transports.services.daily:on_participant_joined:595 - Participant joined 8613db22-b75e-4494-8259-232a6fd74f00
2024-12-12 22:14:47.823 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:211 - Bot started speaking
[32m2024-12-12 22