#0. Introduction

This notebook demonstrates how to build a conversational interface that lets users interact with a Retrieval-Augmented Generation (RAG) agent by **voice** or **text**, retrieve up-to-date news headlines, and hear the responses via text-to-speech.

**Key features:**  
- **Speech-to-Text (STT):** Record audio and transcribe user speech into text  
- **Retrieval-Augmented Generation:** Use LangChain’s agent to fetch relevant news snippets and compose answers  
- **Text-to-Speech (TTS):** Convert the agent’s reply into spoken audio  
- **Dual Input Modes:** Switch between typing your query or speaking it, with a unified chat history  
- **Custom UI:** Gradio Blocks layout with theming, CSS customizations, and a toggled voice player  

#1. Imports

Install required Python packages

In [1]:
!pip install langchain langchain-google-genai feedparser google-generativeai --quiet
!pip install openai-whisper ffmpeg-python --quiet
!pip install torch torchaudio --quiet
!pip install numpy scipy librosa unidecode inflect --quiet
!pip install gradio --quiet


General Python and PyTorch imports

In [2]:
import torch
import numpy as np
import scipy
import os
import io
import ffmpeg

Import LangChain core classes

In [3]:
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain.agents import AgentExecutor

Load Gemini API key and set environment variable

In [4]:
import os

from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

Select GPU device if available

In [5]:
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

#2. Speech To Text

Import Whisper for speech‑to‑text

In [6]:
import whisper

Load Whisper STT model

In [7]:
MODEL_SIZE = "base"
stt_model = whisper.load_model(MODEL_SIZE, device=device)

100%|████████████████████████████████████████| 139M/139M [00:01<00:00, 124MiB/s]


Helper function to transcribe an audio file

In [8]:
def transcribe_audio_filepath(audio_filepath):
    result = stt_model.transcribe(audio_filepath)
    transcribed_text = result["text"].strip()
    return transcribed_text
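
The helper above assumes it always receives a valid file path. As a minimal sketch, a guarded variant could return an empty string when no path is given. The names `safe_transcribe` and `transcribe_fn` are hypothetical and not part of the notebook; `transcribe_fn` stands in for `stt_model.transcribe` so the example runs without loading Whisper.

```python
def safe_transcribe(audio_filepath, transcribe_fn):
    """Return the stripped transcription, or "" when there is nothing to transcribe.

    transcribe_fn is a hypothetical stand-in for stt_model.transcribe:
    it takes a file path and returns a dict with a "text" key.
    """
    # Assumption: the caller may pass None or "" when no recording exists.
    if not audio_filepath:
        return ""
    result = transcribe_fn(audio_filepath)
    # Fall back to "" if the result has no "text" entry.
    return result.get("text", "").strip()


# Usage with a stub in place of the real model.
fake_transcribe = lambda path: {"text": "  what is the latest AI news?  "}
print(safe_transcribe("question.wav", fake_transcribe))
print(repr(safe_transcribe(None, fake_transcribe)))
```

In the notebook itself, the second argument would be `stt_model.transcribe`.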

#3. RAG news

Import feedparser for RSS parsing

In [9]:
import feedparser

Define Google News search tool

In [10]:
@tool
def search_google_news(topic: str, max_results: int = 5) -> str:
    """
    Searches Google News RSS feed for a given topic and returns recent headlines.
    Args:
        topic (str): The topic to search for (e.g., "artificial intelligence", "climate change").
        max_results (int): The maximum number of news headlines to return.
    Returns:
        str: A formatted string containing the news headlines and their links,
             or a message if no news is found or an error occurs.
    """
    if not topic:
        return "Error: Please provide a topic to search for."
    try:
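        # Percent-encode the topic so characters such as '&' or '#' do not break the query string
        from urllib.parse import quote_plus
        safe_topic = quote_plus(topic)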
        url = f"https://news.google.com/rss/search?q={safe_topic}&hl=en-US&gl=US&ceid=US:en"
        feed = feedparser.parse(url)

        if not feed.entries:
            return f"No recent news found for '{topic}'."

        headlines = []
        for i, entry in enumerate(feed.entries):
            if i >= max_results:
                break
            title = entry.title
            link = entry.link

            headlines.append(f"  - Title: {title}\n    Link: {link}")

        return f"Recent news for '{topic}':\n" + "\n".join(headlines)
    except Exception as e:
        return f"Error fetching news for '{topic}': {str(e)}"
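
To make the mapping from topic to feed URL concrete, here is a small standalone sketch using only the standard library. The function name `build_news_rss_url` is hypothetical; the URL template and its fixed `hl`, `gl`, and `ceid` values are copied from the tool above. It uses `urllib.parse.quote_plus`, which turns spaces into `+` and percent-encodes reserved characters.

```python
from urllib.parse import quote_plus


def build_news_rss_url(topic):
    """Build the Google News RSS search URL for a topic (hypothetical helper)."""
    query = quote_plus(topic.strip())
    return f"https://news.google.com/rss/search?q={query}&hl=en-US&gl=US&ceid=US:en"


print(build_news_rss_url("climate change"))
# https://news.google.com/rss/search?q=climate+change&hl=en-US&gl=US&ceid=US:en
print(build_news_rss_url("AT&T earnings"))
# https://news.google.com/rss/search?q=AT%26T+earnings&hl=en-US&gl=US&ceid=US:en
```

Without encoding, the `&` in the second topic would be read as a parameter separator and the query would be cut off after `AT`.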

Configure Gemini model with the news tool

In [11]:
rag_llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.2, google_api_key=GOOGLE_API_KEY)
rag_tools = [search_google_news]
rag_llm_with_tools = rag_llm.bind_tools(rag_tools)

Create the prompt template for the agent

In [12]:
rag_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. You have access to a tool called 'search_google_news' "
            "which can find recent news headlines on a given topic. "
            "When a user asks for news, you should use this tool. "
            "After getting the news headlines from the tool, present them to the user in a readable format. "
            "If the user asks something not related to news, try to answer directly."
        ),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

Assemble the LangChain agent

In [13]:
rag_agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
        "chat_history": lambda x: x.get("chat_history", []),
    }
    | rag_prompt
    | rag_llm_with_tools
    | OpenAIToolsAgentOutputParser()
)

Wrap the agent in an executor

In [14]:
rag_agent_executor = AgentExecutor(
    agent=rag_agent,
    tools=rag_tools,
    verbose=False
)

Print confirmation of agent initialisation

In [15]:
print("RAG News Agent initialized.")

RAG News Agent initialized.


#4. Text To Speech

Load Tacotron2 TTS model

In [16]:
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')
tacotron2 = tacotron2.to(device)

Downloading: "https://github.com/NVIDIA/DeepLearningExamples/zipball/torchhub" to /root/.cache/torch/hub/torchhub.zip
Downloading checkpoint from https://api.ngc.nvidia.com/v2/models/nvidia/tacotron2_pyt_ckpt_amp/versions/19.09.0/files/nvidia_tacotron2pyt_fp16_20190427


Load WaveGlow vocoder

In [17]:
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to(device)

Using cache found in /root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
Downloading checkpoint from https://api.ngc.nvidia.com/v2/models/nvidia/waveglow_ckpt_amp/versions/19.09.0/files/nvidia_waveglowpyt_fp16_20190427
  WeightNorm.apply(module, name, dim)


Load additional TTS utilities

In [18]:
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')

Using cache found in /root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
  return s in _symbol_to_id and s is not '_' and s is not '~'
  return s in _symbol_to_id and s is not '_' and s is not '~'


Filename constants for TTS output

In [19]:
from scipy.io.wavfile import write as write_wav
TTS_OUTPUT_FILENAME = "temp_tts_output.wav"

Helper to synthesize speech to a WAV file

In [20]:
def synthesize_speech_to_file(text_to_speak):
    # create a silent audio file if no text has been transcribed
    if not text_to_speak or text_to_speak.strip() == "":
        print("TTS: Empty text, skipping synthesis.")
        sample_rate = 22050
        silent_audio = np.zeros(int(0.1 * sample_rate)) # 0.1 second of silence
        write_wav(TTS_OUTPUT_FILENAME, sample_rate, silent_audio.astype(np.int16))
        return TTS_OUTPUT_FILENAME

    sequences, lengths = utils.prepare_input_sequence([text_to_speak])

    with torch.no_grad():
        mel, _, _ = tacotron2.infer(sequences, lengths)
        audio_tensor = waveglow.infer(mel)

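    # Cast to float32, a sample format SciPy's WAV writer supports (the fp16 models may emit float16)
    audio_numpy = audio_tensor[0].data.cpu().numpy().astype(np.float32)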
    rate = 22050

    write_wav(TTS_OUTPUT_FILENAME, rate, audio_numpy)
    print(f"Speech synthesized and saved to {TTS_OUTPUT_FILENAME}")
    return TTS_OUTPUT_FILENAME
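
The empty-text branch above writes 0.1 seconds of silence at 22050 Hz. As an illustration of what that file contains, the sketch below writes the same kind of clip with Python's standard `wave` module instead of SciPy and reads it back. The helper name `write_silence` and the temporary file name are assumptions made for this example only.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 22050  # same sample rate the notebook uses for the TTS output


def write_silence(path, seconds=0.1, sample_rate=SAMPLE_RATE):
    """Write `seconds` of 16-bit mono silence to `path`; return the frame count."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample, i.e. int16
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # all-zero samples are silence
    return n_frames


demo_path = os.path.join(tempfile.gettempdir(), "silence_demo.wav")
frames_written = write_silence(demo_path)

# Read the header back to confirm length and rate.
with wave.open(demo_path, "rb") as wav:
    frames_read = wav.getnframes()
    rate_read = wav.getframerate()

print(frames_written, frames_read, rate_read)  # 2205 2205 22050
```

Returning a short valid clip rather than `None` keeps the audio player in a consistent state when there is nothing to say.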

#5. GUI

Install Gradio (already installed in the first cell; re-running is harmless)

In [21]:
!pip install gradio --quiet

Import Gradio and wire helpers

In [22]:
import gradio as gr

def stt(audio_path):
    return stt_model.transcribe(audio_path)["text"]

def tts(text):
    return synthesize_speech_to_file(text)

Define helper that calls the agent with chat history

In [23]:
def llm(text, history_pairs):
    """
    text           : last user utterance (str)
    history_pairs  : [(user, bot), …] coming from gr.State
    """
    # convert Gradio’s list of tuples into LangChain messages
    chat_history = []
    for human, bot in history_pairs:
        chat_history.append(HumanMessage(content=human))
        chat_history.append(AIMessage(content=bot))

    resp = rag_agent_executor.invoke(
        {"input": text, "chat_history": chat_history}
    )
    return resp["output"]
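
The history conversion inside `llm` can be sketched on its own, without LangChain installed. In this hedged example, plain `(role, content)` tuples stand in for `HumanMessage` and `AIMessage`, and the name `pairs_to_messages` is hypothetical.

```python
def pairs_to_messages(history_pairs):
    """Flatten [(user, bot), ...] into [("human", user), ("ai", bot), ...]."""
    messages = []
    for human, bot in history_pairs:
        messages.append(("human", human))  # stands in for HumanMessage(content=human)
        messages.append(("ai", bot))       # stands in for AIMessage(content=bot)
    return messages


history = [
    ("Any news on solar power?", "Here are three recent headlines ..."),
    ("And wind?", "Here is what I found on wind energy ..."),
]
for role, content in pairs_to_messages(history):
    print(f"{role}: {content}")
```

Each stored pair becomes two messages in chronological order, so the agent sees the conversation as alternating user and assistant turns.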

Define UI theme and custom CSS

In [24]:
theme = gr.themes.Soft(primary_hue="indigo", neutral_hue="slate")

css = """
/* page */
.gradio-container{
    background:linear-gradient(135deg,#f9fbff 0%,#ffffff 100%);
}
/* title */
#title-bar{
    font-size:2rem;font-weight:700;text-align:center;margin:0.5rem 0;
}
/* microphone and other buttons */
.gr-button{
    border-radius:9999px;font-weight:600;padding:0.6rem 1.4rem;
    box-shadow:0 2px 4px rgba(0,0,0,0.08);
}
"""

Callback: voice interaction (audio → text → answer)

In [25]:
def voice_chat(audio, history):
    user_msg = stt(audio)
    bot_msg  = llm(user_msg, history)
    history += [(user_msg, bot_msg)]
    bot_voice = tts(bot_msg)
    return history, bot_voice, history, None

Callback: text interaction

In [26]:
def text_chat(user_msg, history):
    bot_msg  = llm(user_msg, history)
    history += [(user_msg, bot_msg)]
    bot_voice = tts(bot_msg)
    return "", history, bot_voice, history
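
Each callback returns one value per output component, in the order the components are listed when the event is wired up. The sketch below mirrors the text callback with stub functions so it runs without the models. `make_text_chat`, `fake_llm`, and `fake_tts` are hypothetical names, and unlike the notebook's version this sketch builds a new history list instead of extending the existing one in place.

```python
def make_text_chat(llm_fn, tts_fn):
    """Build a text callback from an LLM function and a TTS function."""
    def text_chat_demo(user_msg, history):
        bot_msg = llm_fn(user_msg, history)
        new_history = history + [(user_msg, bot_msg)]  # new list, input untouched
        bot_voice = tts_fn(bot_msg)
        # Order: textbox value, chatbot content, audio file path, stored state.
        return "", new_history, bot_voice, new_history
    return text_chat_demo


# Stubs standing in for the agent and the synthesizer.
fake_llm = lambda text, history: f"echo: {text}"
fake_tts = lambda text: "temp_tts_output.wav"

text_chat_demo = make_text_chat(fake_llm, fake_tts)
cleared, chat, voice, state = text_chat_demo("hi", [])
print(repr(cleared), chat, voice)
```

The empty string in the first position clears the textbox after each submission, while the history is returned twice: once for display and once for the stored state.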

Build the Gradio interface with tabs and toggles

In [27]:
with gr.Blocks(theme=theme, css=css, title="Voice & Text Chat") as demo:
    gr.HTML("<div id='title-bar'>🌿😝⚙️🌿😝⚙️ - Voice & Text Chat by RAGaoutille - 🌿😝⚙️🌿😝⚙️ </div>")

    out_chat = gr.Chatbot(label="Dialog")
    with gr.Accordion("Assistant voice (click to toggle)", open=False):
        out_voice = gr.Audio(label="", interactive=False)
    state = gr.State([])

    with gr.Tabs():

        with gr.Tab("Text"):
            inp_text = gr.Textbox(
                placeholder="💬  Type a message and press Enter …",
                show_label=False,
                lines=1,
            )
            inp_text.submit(
                text_chat,
                [inp_text, state],
                [inp_text, out_chat, out_voice, state],
            )

        with gr.Tab("Voice"):
            inp_audio = gr.Audio(
                sources="microphone",
                type="filepath",
                label="Speak here",
            )

    inp_audio.stop_recording(
        voice_chat,
        [inp_audio, state],
        [out_chat, out_voice, state, inp_audio],   # resets recorder
    )



Launch the Gradio app. Important: run this cell only once.

In [28]:
demo.queue().launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1127003130aaedecd0.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


