# Live Voice AI Assistant with Memory (OpenAI)

## Objective

This notebook demonstrates how to build a real-time Voice AI Assistant that:

- Records live audio from microphone
- Converts speech to text using Whisper
- Maintains conversation memory
- Generates contextual GPT responses
- Supports continuous interaction

---

## Architecture Overview

Microphone  
   ↓  
Whisper (Speech-to-Text)  
   ↓  
Conversation Memory  
   ↓  
GPT Model  
   ↓  
Assistant Response  

---

## Models Used

- `whisper-1` → Speech recognition
- `gpt-4.1-nano` → Fast conversational model

Together they form a small prototype that follows the same pipeline a production voice assistant would use.

# Install Dependencies

In [None]:
!pip install openai soundfile

# Secure API Setup

In [None]:
# Load API credentials securely from Colab
from google.colab import userdata

OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
OPENAI_BASE_URL = userdata.get("OPENAI_BASE_URL")

# Import OpenAI SDK
from openai import OpenAI

# Initialize client
client = OpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_BASE_URL
)

print("OpenAI Client Initialized Successfully")

# Step 1: Initialize Conversation Memory

We maintain a `memory` list that stores:

- System instructions
- User messages
- Assistant responses

This enables contextual, multi-turn conversation.

In [None]:
memory = [
    {
        "role": "system",
        "content": "You are a helpful voice assistant. Remember user information and respond contextually."
    }
]

print("Memory Initialized")
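Because the full `memory` list is sent with every request, it grows with each turn. One possible way to bound it is sketched below. The helper name `trim_memory` and the `max_turns` parameter are illustrative and not part of the original notebook; the sketch assumes it is called between turns, when every user message already has a reply.

```python
def trim_memory(messages, max_turns=10):
    """Return the system prompt plus the most recent `max_turns` exchanges.

    Hypothetical helper: one "turn" is assumed to be a user message
    followed by an assistant reply.
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would return the whole list, so handle zero explicitly.
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + recent
```

The function returns a new list and leaves the original untouched, so the trimmed copy could be passed as `messages` while the full history is kept for logging.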

# Step 2: Record Live Audio from Microphone

This function:

- Uses JavaScript inside Colab
- Accesses browser microphone
- Records for a fixed duration
- Saves the audio as a `.webm` file

In [None]:
from IPython.display import Javascript, display
from google.colab.output import eval_js
import base64

def record_audio(seconds=5, filename="recorded_audio.webm"):

    print("🎤 Recording... Speak clearly!")

    display(Javascript("""
    async function recordAudio(seconds) {
      const stream = await navigator.mediaDevices.getUserMedia({audio: true});
      const recorder = new MediaRecorder(stream);
      let chunks = [];
      recorder.ondataavailable = e => chunks.push(e.data);
      recorder.start();
      await new Promise(resolve => setTimeout(resolve, seconds * 1000));
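      // Register the stop handler before stopping, then release the microphone
      const stopped = new Promise(resolve => recorder.onstop = resolve);
      recorder.stop();
      await stopped;
      stream.getTracks().forEach(track => track.stop());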
      const blob = new Blob(chunks, {type: 'audio/webm'});
      const arrayBuffer = await blob.arrayBuffer();
      return btoa(
        new Uint8Array(arrayBuffer)
          .reduce((data, byte) => data + String.fromCharCode(byte), '')
      );
    }
    """))

    audio_base64 = eval_js(f"recordAudio({seconds})")
    audio_bytes = base64.b64decode(audio_base64)

    with open(filename, "wb") as f:
        f.write(audio_bytes)

    print("✅ Recording Saved (webm format)")
    return filename
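The recorder labels its output `audio/webm`, but the actual container depends on what the browser's `MediaRecorder` produces, and a denied microphone permission can leave an empty file. A minimal sanity check is sketched below; `looks_like_webm` is a hypothetical helper, and it only inspects the four-byte EBML signature that WebM/Matroska files start with, not the full file structure.

```python
# First four bytes of a WebM/Matroska file (the EBML header ID).
EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])

def looks_like_webm(data: bytes) -> bool:
    """Return True if `data` is non-trivial and starts with the EBML signature."""
    return len(data) > len(EBML_MAGIC) and data.startswith(EBML_MAGIC)
```

It could be applied to `audio_bytes` inside `record_audio` before the file is written, to warn early instead of sending an unusable file for transcription.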

# Step 3: Convert Speech to Text (Whisper)

This function sends recorded audio to the Whisper model
and returns the transcribed text.

In [None]:
def speech_to_text(audio_path):

    with open(audio_path, "rb") as audio_file:

        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            temperature=0
        )

    return transcription.text.strip()

# Step 4: Generate Assistant Response with Memory

This function:

1. Appends user message to memory
2. Sends full memory to GPT
3. Stores assistant reply
4. Returns the response

In [None]:
def get_assistant_response(user_text):

    memory.append({
        "role": "user",
        "content": user_text
    })

    response = client.chat.completions.create(
        model="gpt-4.1-nano",   # small, low-latency model suited to voice turns
        messages=memory,
        temperature=0.3   # low temperature keeps replies consistent
    )

    assistant_text = response.choices[0].message.content.strip()

    memory.append({
        "role": "assistant",
        "content": assistant_text
    })

    return assistant_text
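One edge case in the flow above: if the API call raises, the user message has already been appended, so `memory` ends up with a user turn that has no reply. The sketch below shows one way to undo that. `respond_with_rollback` is a hypothetical variant, not the notebook's function; it takes the client and memory as arguments so it can be tried without network access.

```python
def respond_with_rollback(client, memory, user_text, model="gpt-4.1-nano"):
    """Variant of get_assistant_response that removes the user message on failure."""
    memory.append({"role": "user", "content": user_text})
    try:
        response = client.chat.completions.create(
            model=model,
            messages=memory,
            temperature=0.3,
        )
    except Exception:
        # Drop the unanswered user message so memory stays in user/assistant pairs.
        memory.pop()
        raise
    # content can be None in some responses; fall back to an empty string.
    reply = (response.choices[0].message.content or "").strip()
    memory.append({"role": "assistant", "content": reply})
    return reply
```

The exception is re-raised after the rollback, so the caller still decides whether to retry or report the error.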

# Step 5: Continuous Live Voice Interaction

This loop:

- Waits for user command
- Records live audio
- Converts to text
- Generates contextual response
- Continues until user exits

In [None]:
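while True:

    # Normalize input so " Speak " and "QUIT" are accepted too
    command = input("\nType 'speak' to talk or 'quit' to exit: ").strip().lower()

    if command == "quit":
        print("Exiting Voice Assistant")
        break

    if command != "speak":
        # Anything other than the two known commands: prompt again
        print("Unrecognized command.")
        continue

    audio_file = record_audio(seconds=5)

    user_text = speech_to_text(audio_file)
    if not user_text:
        # Skip empty transcripts so blank turns are not stored in memory
        print("No speech detected. Try again.")
        continue
    print("\nYou said:", user_text)

    assistant_reply = get_assistant_response(user_text)
    print("\nAssistant:", assistant_reply)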

# Final Observations

## What This Notebook Achieves

- Real-time microphone recording
- Speech-to-text conversion using Whisper
- Memory-based contextual conversation
- Continuous voice interaction

---

## Key Learning Points

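1. Resending the full message history with each request gives the model multi-turn context
2. Whisper handles the speech-to-text step in a single API call
3. The chat model keeps no state between calls, so context comes entirely from the supplied history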
4. JavaScript enables microphone access in Colab

---

## Possible Improvements

- Add Text-to-Speech (assistant speaks back)
- Add streaming GPT responses
- Add long-term memory using database
- Convert into Flask/FastAPI backend
- Convert into AI Agent architecture
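
As an illustration of the streaming item in the list above, the following is a minimal sketch that prints the reply as it arrives. `stream_reply` is a hypothetical helper. It assumes a client compatible with the OpenAI Python SDK v1 interface, where `stream=True` yields chunks that expose `choices[0].delta.content`.

```python
def stream_reply(client, messages, model="gpt-4.1-nano"):
    """Print the assistant reply incrementally and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # ask for incremental chunks instead of one response
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            print(delta, end="", flush=True)
    print()
    return "".join(parts)
```

The returned string could then be appended to `memory` as the assistant message, in the same way `get_assistant_response` stores its reply.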

---

## Conclusion

This notebook demonstrates a production-style Voice AI assistant
that listens, remembers, and responds intelligently in real-time.