Log calories by just looking at your food. Cal Glasses pairs Meta Ray-Ban smart glasses with Cal AI to automatically identify food, estimate nutrition, and upload photos to Cal AI — all hands-free.
- Wear your Meta Ray-Ban glasses and open the app
- Say "log food" (or tap the button) to start a 10-second recording
- Look at your food — the glasses capture high-quality photos
- Describe what you're eating — "I had two RX Bars" (optional, improves accuracy)
- Gemini AI analyzes the photos + transcript → identifies food, reads nutrition labels, estimates calories
- Photos are automatically uploaded to Cal AI via Appium automation on your Mac
Meta Glasses → capture photos → Gemini analyzes → save to camera roll → Appium uploads to Cal AI
https://x.com/mohul_shukla/status/2037226258459656246
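The Gemini step in the flow above boils down to one multimodal request: captured photos plus the spoken transcript. A minimal sketch of that request body, assuming the public `generateContent` REST shape (`contents`/`parts`/`inline_data`) — the prompt wording here is illustrative, not the app's actual prompt:

```python
import base64
import json
from typing import List

def build_gemini_payload(jpeg_frames: List[bytes], transcript: str) -> dict:
    """Bundle captured photos + the user's description into one request body."""
    parts = [
        {"inline_data": {
            "mime_type": "image/jpeg",
            "data": base64.b64encode(frame).decode("ascii"),
        }}
        for frame in jpeg_frames
    ]
    # The transcript rides along as a text part next to the images.
    parts.append({
        "text": "Identify the food in these photos, read any nutrition "
                "labels, and estimate calories. The user said: " + transcript
    })
    return {"contents": [{"role": "user", "parts": parts}]}

payload = build_gemini_payload([b"\xff\xd8..."], "I had two RX Bars")
print(json.dumps(payload)[:48])
```

The app targets Gemini 2.5 Flash; a payload like this would be POSTed to that model's `generateContent` endpoint with your API key.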
- Meta Ray-Ban glasses (any model with camera)
- iPhone (iOS 17+, A13 chip or later)
- Mac (for Appium automation server)
- USB cable connecting iPhone to Mac
- Xcode 15+ (with iOS 17+ SDK)
- Node.js (for Appium)
- Python 3.8+
- Meta AI app on iPhone (for glasses pairing)
- Cal AI app on iPhone
- Google Gemini API key — get one at ai.google.dev
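Before wiring everything up, it can save time to confirm the Mac-side tools are actually on PATH — a small convenience sketch, not part of the repo; the tool names come from the list above:

```python
import shutil
import sys

REQUIRED_TOOLS = ("node", "appium", "xcrun")  # Mac-side CLIs from the list above

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the required CLI tools that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def python_ok():
    """The automation scripts assume Python 3.8+."""
    return sys.version_info >= (3, 8)

if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("python 3.8+:", python_ok())
```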
```
git clone https://github.com/launchpad-reflections/cal-glasses.git
cd cal-glasses
open ActiveSpeaker/ActiveSpeaker.xcodeproj
```

The on-device speech-to-text models are not included in git:

```
curl -L -o /tmp/ios-examples.tar.gz \
  https://github.com/moonshine-ai/moonshine/releases/latest/download/ios-examples.tar.gz
tar -xzf /tmp/ios-examples.tar.gz -C /tmp Transcriber/models/
cp -r /tmp/Transcriber/models/small-streaming-en ActiveSpeaker/small-streaming-en
```

In Xcode, verify small-streaming-en appears in the project navigator. If not, drag the folder in (Create folder references → Add to targets: ActiveSpeaker).
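A missing model folder is an easy way to get a confusing build, so a quick check that the copy above landed can help. The path comes from the steps above; the file names inside the folder vary by Moonshine release, so only a non-empty folder is checked:

```python
from pathlib import Path

def model_folder_ready(repo_root="."):
    """True if ActiveSpeaker/small-streaming-en exists and is non-empty."""
    folder = Path(repo_root) / "ActiveSpeaker" / "small-streaming-en"
    return folder.is_dir() and any(folder.iterdir())

if __name__ == "__main__":
    print("small-streaming-en present:", model_folder_ready())
```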
In Xcode: File → Add Package Dependencies → paste:
https://github.com/facebook/meta-wearables-dat-ios
Set version to 0.5.0+, add both MWDATCore and MWDATCamera to the ActiveSpeaker target.
Open ActiveSpeaker/ActiveSpeaker/Glasses/GlassesStreamManager.swift and replace the placeholder with your Gemini API key:

```swift
private let gemini = GeminiService(apiKey: "YOUR_GEMINI_API_KEY")
```

In Xcode:
- Select the ActiveSpeaker target → Signing & Capabilities
- Set your Team (Apple ID)
- Change Bundle Identifier to something unique (e.g., com.yourname.calglasses)
- Connect your iPhone via USB
- Select your iPhone as the build target
- Cmd+R to build and run
- Grant camera, microphone, and Bluetooth permissions when prompted
- Tap Connect Glasses in the app
- You'll be redirected to the Meta AI app to authorize
- Return to the app — glasses should show as connected
- Tap Start Glasses to begin streaming
This is the part that automatically uploads food photos to Cal AI on your iPhone.
```
cd appium
npm install -g appium
appium driver install xcuitest
pip install -r requirements.txt
```

Find your device UDID and Team ID:

```
xcrun xctrace list devices
```

Edit appium/config.py:
```python
DEVICE_UDID = "YOUR_DEVICE_UDID"
TEAM_ID = "YOUR_TEAM_ID"
CAL_AI_BUNDLE_ID = "com.viraldevelopment.CalAI"  # This is Cal AI's bundle ID
```

WebDriverAgent is a test runner that Appium uses to control your iPhone's UI.

```
open ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent/WebDriverAgent.xcodeproj
```

In Xcode:
- Select WebDriverAgentRunner target
- Signing & Capabilities → set your Team
- Change Bundle Identifier to com.yourname.WebDriverAgentRunner
- Select your iPhone as device
- Cmd+U (Product → Test) to build and install WDA
On your iPhone:
- Settings → Privacy & Security → Developer Mode → ON
- Settings → Developer → Enable UI Automation → ON
You need 4 terminal tabs running:
Terminal 1 — Tunnel (required for iOS 17+):
```
sudo pymobiledevice3 remote start-tunnel
```

Terminal 2 — Appium server:

```
appium
```

Terminal 3 — WebDriverAgent: Open WebDriverAgent.xcodeproj in Xcode → Cmd+U

Terminal 4 — Automation server:

```
cd appium
python server.py
```

Test the Cal AI upload independently:
```
cd appium
python cal_ai_automate.py upload                # Upload most recent photo
python cal_ai_automate.py upload 2              # Upload 2nd most recent
python cal_ai_automate.py upload 3 --inclusive  # Upload last 3 photos
```

With everything running:
- Open the app on your iPhone
- Connect glasses → Start streaming
- Say "log food" or tap Log Food
- Look at your food for 10 seconds while describing it
- Gemini analyzes → identifies items → saves photos
- App automatically triggers Mac server → Appium opens Cal AI → uploads photos
- Cal AI analyzes and logs the food
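Step 6 above is the Mac's job. Here is a minimal stand-in for what server.py does, built on the standard library — the `/upload` path and the `{"count": N}` payload are assumptions for illustration, not the repo's actual API:

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_upload_command(payload):
    """Map a trigger payload onto the cal_ai_automate.py CLI shown earlier."""
    cmd = ["python", "cal_ai_automate.py", "upload"]
    count = int(payload.get("count", 1))
    if count > 1:
        cmd += [str(count), "--inclusive"]  # upload the last N photos
    return cmd

class TriggerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        subprocess.Popen(build_upload_command(payload))  # don't block the phone
        self.send_response(202)  # accepted; the upload runs in the background
        self.end_headers()

# To serve on the local network:
# HTTPServer(("0.0.0.0", 8000), TriggerHandler).serve_forever()
```

Replying 202 before the Appium run finishes matters: the upload takes many seconds, and the phone shouldn't sit on an open HTTP connection that long.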
```
┌───────────────────────────────────────────────┐
│             Meta Ray-Ban Glasses              │
│ Camera (720p@24fps) + Microphone (HFP 8kHz)   │
└──────────────┬────────────────────────────────┘
               │ Bluetooth
┌──────────────▼────────────────────────────────┐
│                  iPhone App                   │
│                                               │
│ Stream video ──► Live display                 │
│ Capture photos ──► High-quality JPEGs         │
│ Audio ──► SileroVAD + Moonshine transcription │
│                                               │
│ "log food" detected ──► 10s recording         │
│ Photos + transcript ──► Gemini 2.5 Flash      │
│ Results ──► Save photos to camera roll        │
│ HTTP POST ──► Mac automation server           │
└──────────────┬────────────────────────────────┘
               │ HTTP (local network)
┌──────────────▼────────────────────────────────┐
│                Mac (server.py)                │
│                                               │
│ Receives upload request ──► runs Appium       │
│ Appium ──► WebDriverAgent on iPhone           │
│ WDA ──► Opens Cal AI                          │
│ WDA ──► Tap + → Scan food → Photo → Select    │
│ Cal AI analyzes and logs the food             │
└───────────────────────────────────────────────┘
```
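The `HTTP POST ──► Mac automation server` hop in the diagram (handled by CalAITriggerService.swift on the phone) is a single small request. A sketch of its shape — the host, port, path, and JSON body are all assumptions for illustration:

```python
import json
import urllib.request

def build_trigger_request(mac_host, photo_count=2, port=8000):
    """Build the POST the app fires at the Mac once Gemini has finished."""
    body = json.dumps({"count": photo_count}).encode("utf-8")
    return urllib.request.Request(
        f"http://{mac_host}:{port}/upload",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("192.168.1.20")
# To send: urllib.request.urlopen(req, timeout=5)
```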
```
cal-glasses/
├── ActiveSpeaker/                    # iOS app (Xcode project)
│   ├── ActiveSpeaker/
│   │   ├── App/                      # Entry point, Info.plist
│   │   ├── Glasses/                  # Meta glasses integration
│   │   │   ├── GlassesStreamManager.swift      # Stream + food logging orchestration
│   │   │   ├── GlassesConnectionManager.swift  # Glasses pairing
│   │   │   ├── GlassesAudioCapture.swift       # HFP Bluetooth mic
│   │   │   ├── GlassesSpeaker.swift            # TTS to glasses speakers
│   │   │   ├── FoodRecordingBuffer.swift       # 10s frame + transcript buffer
│   │   │   └── FrameDeduplicator.swift         # Perceptual hash dedup
│   │   ├── Services/
│   │   │   ├── GeminiService.swift             # Gemini API client
│   │   │   └── CalAITriggerService.swift       # HTTP trigger to Mac
│   │   ├── Models/
│   │   │   └── FoodAnalysisResult.swift        # Structured food data
│   │   ├── Pipeline/                 # Audio/video processing
│   │   ├── Processors/               # VAD, transcription, face detection
│   │   └── UI/
│   │       └── CalGlassesView.swift            # Main UI
│   ├── MobileFaceNet.mlpackage/
│   └── silero_vad.onnx
├── appium/                           # Cal AI automation
│   ├── cal_ai_automate.py            # Main automation script
│   ├── server.py                     # HTTP server for app triggers
│   ├── config.py                     # Device config
│   ├── setup.sh                      # One-time installation
│   └── APPIUM.md                     # Detailed Appium docs
├── scripts/
│   └── tts.py                        # Text-to-speech generator
├── PIPELINE.md                       # Food logging pipeline docs
├── MODELS.md                         # Model download instructions
└── CLAUDE.md                         # AI assistant context
```
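FrameDeduplicator.swift keeps near-identical stream frames out of the buffer using a perceptual hash. The idea can be sketched as a difference hash (dHash) over a downsampled grayscale frame — the app's actual hash function and threshold are not specified here, so treat the details as illustrative:

```python
def dhash(gray, hash_w=8, hash_h=8):
    """Difference hash: sample a (hash_w+1) x hash_h grid by nearest neighbour,
    then emit one bit per horizontally adjacent pixel comparison."""
    h, w = len(gray), len(gray[0])
    bits = 0
    for y in range(hash_h):
        for x in range(hash_w):
            left = gray[y * h // hash_h][x * w // (hash_w + 1)]
            right = gray[y * h // hash_h][(x + 1) * w // (hash_w + 1)]
            bits = (bits << 1) | (left > right)
    return bits

def is_duplicate(hash_a, hash_b, max_distance=5):
    """Frames whose hashes differ in only a few bits count as the same shot."""
    return bin(hash_a ^ hash_b).count("1") <= max_distance

# A tiny synthetic 32x24 grayscale frame:
frame = [[(x * 7 + y * 3) % 256 for x in range(32)] for y in range(24)]
print(is_duplicate(dhash(frame), dhash(frame)))  # same frame twice → True
```

Hashing instead of byte-comparing frames means small sensor noise or compression jitter between consecutive frames doesn't defeat the dedup.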
In GlassesStreamManager.swift:
| Flag | Default | Description |
|---|---|---|
| `DEFAULT_CAMERA` | `false` | `true`: use `capturePhoto()` for high-quality images. `false`: use deduplicated stream frames |
| `TEST` | `true` | `true`: skip camera roll save, always upload last 2 photos. `false`: normal flow |
- Swift / SwiftUI — iOS app
- Meta MWDAT SDK 0.5.0 — glasses connectivity and streaming
- Gemini 2.5 Flash — multimodal food analysis (images + text)
- Moonshine v2 — on-device speech-to-text
- Silero VAD — voice activity detection (ONNX Runtime)
- MobileFaceNet — face recognition (CoreML)
- Appium + XCUITest — iOS UI automation
- pymobiledevice3 — iOS 17+ device tunnel
- Python — automation server and scripts
MIT