# LLM Real-Time Voice Recognition Chatbot

This *Jupyter Notebook* explores how to combine the **OpenAI API** with **speech recognition** to build an interactive voice chatbot.

The study is implemented in *Python* using the *OpenAI* API and the *speech_recognition* library.

---

In [5]:
# Importing modules...
from openai import OpenAI
from openai import OpenAIError
import speech_recognition as sr

import sys
sys.path.append('..') # Add the parent directory to the import path

# Create `openai_key.py` in the `modules` directory and assign your API key, as a string, to the `openai_key` variable
from modules.openai_key import openai_key 

## Large Language Models

A **Large Language Model (LLM)** is an advanced artificial intelligence system trained on vast amounts of text data. LLMs, such as OpenAI's GPT models, can generate human-like responses, assist in writing, summarize content, and much more. They process text input and predict the most likely next words based on the context provided. 

The definition above, for instance, was generated by ChatGPT-4!
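
As a toy illustration of "predicting the most likely next word", the sketch below counts which word follows which in a tiny made-up corpus and returns the most frequent follower. This is only a bigram counter, far simpler than the neural networks behind GPT models; the corpus and the `predict_next` helper are invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, used only for illustration
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows another
followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or "" if unseen."""
    if word not in followers:
        return ""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # cat
```

An LLM does something conceptually similar, but it conditions on the whole preceding context rather than a single word.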

---

## OpenAI's API

To use OpenAI's API in Python or other applications, you need an API key. However, **OpenAI’s API is not free**: you need to add a payment method and will be charged based on usage. To get set up:

1. **Go to OpenAI’s website: https://openai.com**
2. **Click Sign Up (or Log In if you already have an account)**
3. **Set Up a Payment Method**
    1. Different models have different costs per 1,000 tokens (Check current pricing here: https://openai.com/pricing);
    2. Visit Billing Settings: https://platform.openai.com/account/billing;
    3. Click "Add Payment Method" and enter your credit/debit card details;
    4. OpenAI may require you to set up prepaid credits or link a card for pay-as-you-go billing;
    5. Once your payment method is verified, you’re ready to generate API keys.
4. **Generate an API Key**
    1. Visit API Keys: https://platform.openai.com/account/api-keys;
    2. Click "Create a new secret key";
    3. Copy & store the key securely—OpenAI won’t show it again!

You're now ready to use it!
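
One way to avoid pasting the key into source files is to read it from an environment variable. The sketch below assumes the key was exported as `OPENAI_API_KEY` (the variable the official Python client also looks for by default); `load_api_key` is a hypothetical helper, not part of the OpenAI library.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The result could then be passed as `api_key=load_api_key()` when creating the `OpenAI` client, as an alternative to importing the key from a local module.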

---

In [6]:
class LargeLanguageModel:
    def __init__(
        self, 
        verbose=False
    ):
        self.verbose = verbose # Toggle for debug messages

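        try:
            # Create the OpenAI client
            self.client = OpenAI(
                api_key=openai_key
            )

            if self.verbose: print("Client successfully created!")

        except OpenAIError as e:
            # Without a client, chat_completion cannot be used
            self.client = None
            if self.verbose: print(f"Could not create client: {e}")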

    def chat_completion(
        self, 
        system_role,
        prompt, 
        model="gpt-4o-mini", 
        max_tokens=10,
        randomness=0,
        n_responses=1
    ):
        # Generate messages
        messages = [
            {
                "role": "system", 
                "content": system_role
            },
            {
                "role": "user", 
                "content": prompt
            }
        ]

        try:
            if self.verbose: print("Sending message...")

            # Create a request to the GPT model
            completion = self.client.chat.completions.create( 
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                temperature=randomness,
                n=n_responses
            )

            if self.verbose: print("Received response.")

            # Return the text of the first choice (extra choices requested via n_responses are ignored)
            return completion.choices[0].message.content

        except OpenAIError as e:
            if self.verbose: print(f"Chat completion failed: {e}.")

            return "" # Return empty string
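
`chat_completion` above sends only the system role and the latest prompt, so the model does not see earlier turns of the conversation. If conversational memory is wanted, one option is to keep a running history and rebuild the `messages` list on each call. The sketch below assumes a hypothetical `build_messages` helper and a `max_turns` cap; neither exists in the class above.

```python
def build_messages(system_role, history, prompt, max_turns=5):
    """Build a chat `messages` list from past (user, assistant) turns.

    `history` is a list of (user_text, assistant_text) tuples; only the
    last `max_turns` turns are kept to bound the request size.
    """
    messages = [{"role": "system", "content": system_role}]
    recent = history[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt})
    return messages

history = [("hello", "Hi! How can I help?")]
messages = build_messages("You are a friendly chatbot.", history, "what did I just say?")
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```

The resulting list has the same shape as the `messages` variable built inside `chat_completion`, so it could be passed to `client.chat.completions.create` in its place.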

## Speech Recognizer

A **Speech Recognizer** is a system that converts spoken language into text by analyzing audio input. It uses machine learning models and signal processing techniques to identify words and phrases from speech. Speech recognition technology is commonly used in voice assistants, transcription services, and hands-free control systems. It can operate in real-time or process recorded audio, adapting to different accents, languages, and noise conditions.

The following class works by:
1. **Start Listening**
    - Adjusts for ambient noise to improve accuracy;
    - Starts a background thread that listens to the microphone using `listen_in_background`;
    - Each captured phrase is passed to `process_audio` for recognition.
2. **Speech Processing**
    - Converts audio into text using Google's Speech API;
    - If successful, adds the recognized text to a speech queue.
3. **Retrieve Speech**
    - Returns and removes the first recognized speech from the queue when `get_speech` is called;
    - If no speech is available, returns an empty string `("")`.
4. **Stop Listening**
    - Stops background listening when `stop_listening` is called;
    - Resets the stored stop function to `None` once listening has stopped.
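
The queue behaviour described in the list above can be sketched without a microphone. Because `listen_in_background` runs its callback on a separate thread, the sketch below uses the standard library's thread-safe `queue.Queue` instead of a plain list; that swap, and the `SpeechQueue` name, are assumptions of this example rather than what the notebook's class does.

```python
import queue

class SpeechQueue:
    """First-in, first-out buffer for recognized phrases."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, text):
        # Would be called from the background listener thread
        self._queue.put(text)

    def get_speech(self):
        # Return the oldest phrase, or "" when nothing is waiting
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return ""

buffer = SpeechQueue()
buffer.put("hello")
buffer.put("how are you")
print(buffer.get_speech())        # hello
print(buffer.get_speech())        # how are you
print(repr(buffer.get_speech()))  # ''
```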

In [7]:
class SpeechRecognizer:
    def __init__(
        self, 
        language="en-US", 
        verbose=True    
    ):
        self.verbose = verbose   # Toggle for debug messages
        self.is_listening = True # Toggle to turn on and off the recognizer
        self.language = language # Recognizer language

        # Create a recognizer and microphone instance
        self.recognizer = sr.Recognizer()
        self.microphone = sr.Microphone()
        
        # Queue of recognized speech messages
        self.speech_queue = []

        # Listen in the background stop function
        self.stop_listen_in_background = None

    def process_audio(self, _, audio):
        try:
            # Recognize the speech using Google's Speech Recognition API
            recognized_speech = self.recognizer.recognize_google(audio, language=self.language)
            self.speech_queue.append(recognized_speech)

        except sr.UnknownValueError:
            if self.verbose: print("Sorry, I didn't catch that. Can you repeat?")

        except sr.RequestError as e:
            if self.verbose: print(f"Could not request results from Google Speech Recognition service: {e}")

    def start_listening(self):
        # Use the microphone as the audio source
        with self.microphone as source:
            # Adjust for ambient noise
            if self.verbose: print("Adjusting for ambient noise...")
            self.recognizer.adjust_for_ambient_noise(source, duration=1)
            
        # Start listening
        if self.verbose: print("Listening for speech...")
        self.stop_listen_in_background = self.recognizer.listen_in_background(self.microphone, self.process_audio)

    def stop_listening(self):
        if self.stop_listen_in_background is not None:
            # Stop the background listener and clear the stored stop function
            self.stop_listen_in_background()
            self.stop_listen_in_background = None

            if self.verbose: print("Stopped listening.")

    def get_speech(self):
        # Remove and return the oldest recognized phrase (first in, first out)
        if self.speech_queue:
            return self.speech_queue.pop(0)
        
        # Return an empty string if the queue is empty
        else:
            return ""

In [8]:
SR = SpeechRecognizer(
    language="en-US",
    verbose=True
) 

LLM = LargeLanguageModel(
    verbose=False
)

system_role = """
You are a voice-based AI chatbot that engages in friendly and natural conversations.  
You listen to user speech inputs and respond clearly and concisely.  
If the user hints they don't want to continue (e.g., "I'm done," "Goodbye," "I don't want to talk anymore"), respond only with: "Bye Bye!"  
Otherwise, keep the conversation flowing naturally.  
Your tone should be polite, engaging, and easy to understand in spoken form.
"""

import time

recognized_speech, response = "", ""

SR.start_listening()

while response != "Bye Bye!":
    recognized_speech = SR.get_speech()

    if recognized_speech:
        print(f"User: '{recognized_speech}'")

    else:
        time.sleep(0.1) # Avoid busy-waiting while the queue is empty
        continue

    response = LLM.chat_completion(
        system_role=system_role,
        prompt=recognized_speech,
        max_tokens=100
    )

    print(f"LLM: '{response}'")

SR.stop_listening()

Adjusting for ambient noise...
Listening for speech...
User: 'hello how are you'
LLM: 'Hello! I'm doing well, thank you! How about you?'
User: 'do you like being the chatbot'
LLM: 'I really enjoy being a chatbot! It’s great to chat and help out with questions or just have a friendly conversation. What about you? Do you like chatting with chatbots?'
User: 'okay I got to go'
LLM: 'Bye Bye!'
Stopped listening.
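
The loop above stops only when the reply is exactly `Bye Bye!`, so a variation such as `Bye bye!` or a reply with trailing whitespace would keep it running. A more tolerant check could normalize the reply first. The sketch below is one possible approach; `is_farewell` is a hypothetical helper, not part of the notebook's classes.

```python
import string

def is_farewell(response, farewell="bye bye"):
    """Return True if `response` is the farewell phrase, ignoring case,
    surrounding whitespace, and punctuation."""
    cleaned = response.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split()) == farewell

print(is_farewell("Bye Bye!"))      # True
print(is_farewell(" bye bye. "))    # True
print(is_farewell("Goodbye then"))  # False
```

With such a helper, the loop condition could be written as `while not is_farewell(response):`.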
