<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_13_4_speechbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 13: Speech Processing**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 13 Material

Module 13: Speech Processing

* Part 13.1: Intro to Speech Processing [[Video]]() [[Notebook]](t81_559_class_13_1_speech_models.ipynb)
* Part 13.2: Text to Speech [[Video]]() [[Notebook]](t81_559_class_13_2_text2speech.ipynb)
* Part 13.3: Speech to Text [[Video]]() [[Notebook]](t81_559_class_13_3_speech2text.ipynb)
* **Part 13.4: Speech Bot** [[Video]]() [[Notebook]](t81_559_class_13_4_speechbot.ipynb)
* Part 13.5: Future Directions in GenAI [[Video]]() [[Notebook]](t81_559_class_13_5_future.ipynb)


# Google CoLab Instructions

The following code detects whether the notebook is running in Google CoLab. If it is, the code loads the OpenAI API key from the CoLab secrets and installs the required libraries.

In [None]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai openai pydub

Note: using Google CoLab

# Part 13.4: Speech Bot

In this part, we will create a speech chatbot. You will talk to it with your own voice, and it will answer with synthesized speech. We will reuse the ChatBot class introduced previously.
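
Before the full implementation, the overall flow can be sketched as a loop of three steps: listen, reply, speak. The sketch below is only an illustration under stated assumptions: `run_speech_loop` and its `listen`, `reply`, and `speak` parameters are hypothetical names, and simple stand-in functions replace the microphone, the LLM, and the speaker so that the cell runs anywhere.

```python
from typing import Callable


def run_speech_loop(
    listen: Callable[[], str],
    reply: Callable[[str], str],
    speak: Callable[[str], None],
    stop_word: str = "bye",
    max_turns: int = 20,
) -> list[tuple[str, str]]:
    """Repeat listen -> reply -> speak until the reply equals stop_word."""
    transcript = []
    for _ in range(max_turns):  # hard cap so the loop cannot run forever
        heard = listen()        # stand-in for record + transcribe
        answer = reply(heard)   # stand-in for the chatbot
        transcript.append((heard, answer))
        if answer.strip().lower() == stop_word:
            break               # end signal: do not speak it
        speak(answer)           # stand-in for text to speech
    return transcript


# Hypothetical stand-ins for the microphone, the LLM, and the speaker
scripted = iter(["Hello", "Goodbye"])
spoken = []
turns = run_speech_loop(
    listen=lambda: next(scripted),
    reply=lambda text: "bye" if "bye" in text.lower() else f"You said: {text}",
    speak=spoken.append,
)
print(turns)
print(spoken)
```

Unlike the loop later in this notebook, this sketch checks for the stop word before speaking, so the end signal itself is never read aloud. That is a design choice of the sketch, not a requirement.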




In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from IPython.display import display_markdown
import pickle

DEFAULT_TEMPLATE = """You are a helpful assistant. Do not use markdown, just regular text.
Limit your response to just a few sentences. If the user says something that indicates
that they wish to end the chat, return just "bye" (no quotes), so I can end the loop.

Current conversation:
{history}
Human: {input}
AI:"""

MODEL = 'gpt-4o-mini'

class ChatBot:
    def __init__(self, llm_chat, llm_summary, template):
        """
        Initializes the ChatBot with language models and a template for conversation.

        :param llm_chat: A large language model for handling chat responses.
        :param llm_summary: A large language model for summarizing conversations.
        :param template: A string template defining the conversation structure.
        """
        self.llm_chat = llm_chat
        self.llm_summary = llm_summary
        self.template = template
        self.prompt_template = PromptTemplate(input_variables=["history", "input"], template=self.template)

        # Initialize memory and conversation chain
        self.memory = ConversationSummaryMemory(llm=self.llm_summary)
        self.conversation = ConversationChain(
            prompt=self.prompt_template,
            llm=self.llm_chat,
            memory=self.memory,
            verbose=False
        )

        self.history = []

    def converse(self, prompt):
        """
        Processes a conversation prompt and updates the internal history and memory.

        :param prompt: The input prompt from the user.
        :return: The generated response from the language model.
        """
        self.history.append([self.memory.buffer, prompt])
        output = self.conversation.invoke(prompt)
        return output['response']

    def chat(self, prompt):
        """
        Handles the full cycle of receiving a prompt, processing it, and displaying the result.

        :param prompt: The input prompt from the user.
        """
        print(f"Human: {prompt}")
        output = self.converse(prompt)
        display_markdown(output, raw=True)

    def print_memory(self):
        """
        Displays the current state of the conversation memory.
        """
        print("**Memory:")
        print(self.memory.buffer)

    def clear_memory(self):
        """
        Clears the conversation memory.
        """
        self.memory.clear()

    def undo(self):
        """
        Reverts the conversation memory to the state before the last interaction.
        """
        if len(self.history) > 0:
            self.memory.buffer = self.history.pop()[0]
        else:
            print("Nothing to undo.")

    def regenerate(self):
        """
        Restores the memory to its state before the most recent prompt and
        submits that prompt again, producing a fresh response.
        """
        if len(self.history) > 0:
            self.memory.buffer, prompt = self.history.pop()
            self.chat(prompt)
        else:
            print("Nothing to regenerate.")

    def save_history(self, file_path):
        """
        Saves the conversation history to a file using pickle.

        :param file_path: The file path where the history should be saved.
        """
        with open(file_path, 'wb') as f:
            pickle.dump(self.history, f)

    def load_history(self, file_path):
        """
        Loads the conversation history from a file using pickle.

        :param file_path: The file path from which to load the history.
        """
        with open(file_path, 'rb') as f:
            self.history = pickle.load(f)
            # Optionally reset the memory based on the last saved state
            if self.history:
                self.memory.buffer = self.history[-1][0]

Next, we create an LLM, use it for both the chat and summary roles of a ChatBot, and test the bot with a greeting.

In [None]:
MODEL = 'gpt-4o-mini'

# Initialize the OpenAI chat model; the API key is read from OPENAI_API_KEY
llm = ChatOpenAI(
    model=MODEL,
    temperature=0.3,
    n=1)

c = ChatBot(llm, llm, DEFAULT_TEMPLATE)

response = c.converse("Hello, my name is Jeff.")
print(response)


Nice to meet you, Jeff! How can I assist you today?


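The next cell reuses the recording code introduced previously and adds helper functions that synthesize speech, play it back, and transcribe recorded audio.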

In [None]:
from IPython.display import Javascript, Audio, display, HTML
from google.colab import output
from base64 import b64decode
import io
import time
import uuid
from openai import OpenAI
from pydub import AudioSegment  # used by record() to convert the recording to WAV

client = OpenAI()

RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
    const reader = new FileReader()
    reader.onloadend = e => resolve(e.srcElement.result)
    reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
    stream = await navigator.mediaDevices.getUserMedia({ audio: true })
    recorder = new MediaRecorder(stream)
    chunks = []
    recorder.ondataavailable = e => chunks.push(e.data)
    recorder.start()
    await sleep(time)
    recorder.onstop = async ()=>{
        blob = new Blob(chunks)
        text = await b2text(blob)
        resolve(text)
    }
    recorder.stop()
})
"""

def generate_text(text):
    """Synthesize speech for `text` with OpenAI text to speech and return the audio bytes."""
    response = client.audio.speech.create(
        model="tts-1",
        voice="nova",
        input=text
    )
    return response.content  # raw audio bytes

def speak_text(text):
    audio_data = generate_text(text)

    # Generate a unique ID for this audio element
    audio_id = f"audio_{uuid.uuid4().hex}"

    # Display the audio with the unique ID
    display(Audio(audio_data, autoplay=True, element_id=audio_id))

    # Create a hidden div to store the audio status
    status_div = f'<div id="{audio_id}_status" style="display: none;">playing</div>'
    display(HTML(status_div))

    # JavaScript to handle audio playback and status
    js_code = f"""
    var audioElement = document.getElementById('{audio_id}');
    if (audioElement) {{
        audioElement.onended = function() {{
            document.getElementById('{audio_id}_status').textContent = 'finished';
        }};
    }}
    """

    # Execute the JavaScript
    display(HTML(f"<script>{js_code}</script>"))

    # Wait for the audio to finish
    while True:
        status = eval_js(f"document.getElementById('{audio_id}_status').textContent")
        if status == 'finished':
            break
        time.sleep(0.1)

def eval_js(js_code):
    # Thin wrapper around CoLab's JavaScript bridge (output is imported above)
    return output.eval_js(js_code)

def record(seconds=3):
    print(f"Recording now for {seconds} seconds.")
    display(Javascript(RECORD))
    s = output.eval_js('record(%d)' % (seconds * 1000))
    binary = b64decode(s.split(',')[1])

    # Convert to AudioSegment
    audio = AudioSegment.from_file(io.BytesIO(binary), format="webm")

    # Export as WAV
    audio.export("recorded_audio.wav", format="wav")
    print("Recording done.")
    return audio

def transcribe_audio(filename):
    with open(filename, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    return transcription.text
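
The `record` function receives the audio from the browser as a base64 data URL and keeps only the part after the comma. The sketch below shows that decoding step in isolation, with a little validation added. `decode_data_url` is a hypothetical helper, and the sample bytes and the `audio/webm` MIME type are made up for the example; the type the browser actually reports may differ.

```python
from base64 import b64decode, b64encode


def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into its MIME type and decoded bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header or not payload:
        raise ValueError("not a base64 data URL")
    mime = header[len("data:"):].split(";")[0]
    return mime, b64decode(payload)


# Made-up payload standing in for a real browser recording
sample_bytes = b"\x1aE\xdf\xa3 fake webm"
sample_url = "data:audio/webm;base64," + b64encode(sample_bytes).decode("ascii")

mime, binary = decode_data_url(sample_url)
print(mime)
print(binary == sample_bytes)
```

Validating the header first turns a failed or empty recording into a clear error instead of an `IndexError` from `split(',')[1]`.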

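We now run the conversation loop. Each pass records five seconds of audio, transcribes it, sends the text to the chatbot, and speaks the reply. The loop ends when the model returns "bye".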

In [None]:
from pydub import AudioSegment

MODEL = 'gpt-4o-mini'

# Initialize the OpenAI chat model; the API key is read from OPENAI_API_KEY
llm = ChatOpenAI(
    model=MODEL,
    temperature=0.3,
    n=1)

c = ChatBot(llm, llm, DEFAULT_TEMPLATE)

# Record, transcribe, reply, and speak until the model answers "bye"
response = None
while response != "bye":
    audio = record(5)
    transcription = transcribe_audio("recorded_audio.wav")
    print(f"Human: {transcription}")
    response = c.converse(transcription)
    print(f"AI: {response}")
    speak_text(response)

Recording now for 5 seconds.


Recording done.
Human: Hello, how are you?
AI: I'm doing well, thank you! How can I assist you today?


Recording now for 5 seconds.


Recording done.
Human: My name is Jeff.
AI: Nice to meet you, Jeff! How can I assist you today?


Recording now for 5 seconds.


Recording done.
Human: I like things related to technology and history. I am.
AI: That's great, Jeff! Technology and history are fascinating subjects. Is there a specific topic within those areas that you're interested in discussing?


Recording now for 5 seconds.


Recording done.
Human: Goodbye, in the chat.
AI: bye
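
The loop above stops only when the response is exactly the string `bye`. A model may occasionally add capitalization, punctuation, or quotes, so a more tolerant check can be useful. The following is a minimal sketch, assuming the end signal is still the single word "bye"; `wants_to_end` is a hypothetical helper, not part of the notebook's ChatBot.

```python
import string


def wants_to_end(response: str, stop_word: str = "bye") -> bool:
    """Return True if the response is the stop word, ignoring case,
    surrounding whitespace, and surrounding punctuation."""
    cleaned = response.strip(string.punctuation + string.whitespace).lower()
    return cleaned == stop_word


for candidate in ["bye", " Bye.\n", '"bye"', "Goodbye, Jeff!"]:
    print(repr(candidate), wants_to_end(candidate))
```

With this helper, the loop condition could become `while response is None or not wants_to_end(response)`. A longer reply such as "Goodbye, Jeff!" is deliberately not treated as an end signal, because the prompt asks the model to return just "bye".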
