<a href="https://colab.research.google.com/github/farmountain/SmartGlass-AI-Agent/blob/main/colab_notebooks/Session6_RealTime_Audio_Streaming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎙️ Session 06: Real-Time Audio Streaming

In this session, we'll simulate real-time audio capture from a smart-glasses microphone, process the audio in short chunks, and transcribe it with Whisper.

In production, this pipeline would run continuously on-device or stream audio over a low-latency socket.
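To make "short chunks" concrete, here is a minimal sketch of splitting a recorded buffer into fixed-length pieces. The helper name `split_into_chunks` and the 2-second chunk length are assumptions chosen for illustration; they are not part of Whisper or of this notebook's later cells.

```python
import numpy as np

def split_into_chunks(audio, fs, chunk_seconds=2.0):
    """Split a 1-D audio array into consecutive chunks of chunk_seconds.

    The last chunk may be shorter if the audio length is not an exact multiple.
    """
    chunk_len = int(chunk_seconds * fs)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]

fs = 16000                                   # sample rate in Hz
audio = np.zeros(10 * fs, dtype="int16")     # 10 seconds of silence
chunks = split_into_chunks(audio, fs, chunk_seconds=2.0)
print(len(chunks), "chunks of", len(chunks[0]), "samples")
```

Shorter chunks reduce latency but give the recognizer less context per call, so the chunk length is a trade-off to tune for your device.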

In [1]:
# ✅ Install core libraries
!pip install -q sounddevice scipy openai-whisper



In [3]:
# ✅ Simulate a short microphone recording (10 seconds of silent audio)
# import sounddevice as sd  # Colab cannot record directly from a microphone; uncomment on a local machine
from scipy.io.wavfile import write
import whisper
import numpy as np # Import numpy to generate dummy audio

duration = 10  # seconds
fs = 16000
# On a local machine with a microphone, record real audio instead:
# print("🔴 Recording for 10 seconds...")
# audio = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
# sd.wait()

# Generate dummy silent audio for demonstration
audio = np.zeros(int(duration * fs), dtype='int16')

write('realtime_input.wav', fs, audio)
print("✅ Saved dummy silent audio as realtime_input.wav") # Indicate dummy file is saved

✅ Saved dummy silent audio as realtime_input.wav


In [4]:
# ✅ Transcribe audio using Whisper (the silent dummy file should yield little or no text)
model = whisper.load_model("base")
result = model.transcribe("realtime_input.wav")
print("🗣️ Transcribed Text:", result['text'])

🗣️ Transcribed Text: 


### 💡 Notes for Production

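- Use `pyaudio` or `ffmpeg` to capture a continuous audio stream, and `webrtcvad` to detect which frames contain speech.
- The open-source `whisper` package transcribes complete files or arrays and processes audio in 30-second windows; to get partial results, transcribe short chunks as they arrive.
- On real smart glasses, use socket streaming or direct hardware microphone access (for example, through a Ray-Ban Meta SDK once one is available).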
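
The notes above can be sketched as a simulated streaming loop that walks through a buffer chunk by chunk and forwards only the chunks that are loud enough to contain speech. This is a rough illustration, not a production design: a plain RMS energy threshold stands in for a real voice activity detector such as `webrtcvad`, and the names `rms` and `stream_speech_chunks`, the 0.5-second chunk length, and the threshold of 500 are all assumptions.

```python
import numpy as np

def rms(chunk):
    """Root-mean-square level of an int16 chunk (0.0 for an empty chunk)."""
    if len(chunk) == 0:
        return 0.0
    samples = chunk.astype(np.float64)  # avoid int16 overflow when squaring
    return float(np.sqrt(np.mean(samples ** 2)))

def stream_speech_chunks(audio, fs, chunk_seconds=0.5, threshold=500.0):
    """Yield (start_time_seconds, chunk) for chunks whose RMS reaches threshold.

    A simple energy gate used here as a stand-in for a real VAD.
    """
    chunk_len = int(chunk_seconds * fs)
    for start in range(0, len(audio), chunk_len):
        chunk = audio[start:start + chunk_len]
        if rms(chunk) >= threshold:
            yield start / fs, chunk

# Demo signal: 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence.
fs = 16000
t = np.arange(fs) / fs
tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
silence = np.zeros(fs, dtype=np.int16)
audio = np.concatenate([silence, tone, silence])

kept = list(stream_speech_chunks(audio, fs))
print("Chunks forwarded for transcription start at:", [start for start, _ in kept])
```

In a real pipeline, each forwarded chunk would be converted to float32 in the range [-1, 1] and passed to the transcription model, while silent chunks are dropped to save compute.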