Log calories by just looking at your food. Cal Glasses pairs Meta Ray-Ban smart glasses with Cal AI to automatically identify food, estimate nutrition, and upload photos to Cal AI — all hands-free.
- Wear your Meta Ray-Ban glasses and open the app
- Say "log food" (or tap the button) to start a 10-second recording
- Look at your food — the glasses capture high-quality photos
- Describe what you're eating — "I had two RX Bars" (optional, improves accuracy)
- Gemini AI analyzes the photos + transcript → identifies food, reads nutrition labels, estimates calories
- Photos are automatically uploaded to Cal AI via Appium automation on your Mac
Meta Glasses → capture photos → Gemini analyzes → save to camera roll → Appium uploads to Cal AI
https://x.com/mohul_shukla/status/2037226258459656246
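The Gemini step in the flow above boils down to one multimodal request: captured photos plus the spoken transcript. A minimal sketch of that request body, assuming the public `generateContent` REST shape (`contents`/`parts`/`inline_data`) — the prompt wording here is illustrative, not the app's actual prompt:

```python
import base64
import json
from typing import List

def build_gemini_payload(jpeg_frames: List[bytes], transcript: str) -> dict:
    """Bundle captured photos + the user's description into one request body."""
    parts = [
        {"inline_data": {
            "mime_type": "image/jpeg",
            "data": base64.b64encode(frame).decode("ascii"),
        }}
        for frame in jpeg_frames
    ]
    # The transcript rides along as a text part next to the images.
    parts.append({
        "text": "Identify the food in these photos, read any nutrition "
                "labels, and estimate calories. The user said: " + transcript
    })
    return {"contents": [{"role": "user", "parts": parts}]}

payload = build_gemini_payload([b"\xff\xd8..."], "I had two RX Bars")
print(json.dumps(payload)[:48])
```

The app targets Gemini 2.5 Flash; a payload like this would be POSTed to that model's `generateContent` endpoint with your API key.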
- Meta Ray-Ban glasses (any model with camera)
- iPhone (iOS 17+, A13 chip or later)
- Mac (for Appium automation server)
- USB cable connecting iPhone to Mac
- Xcode 15+ (with iOS 17+ SDK)
- Node.js (for Appium)
- Python 3.8+
- Meta AI app on iPhone (for glasses pairing)
- Cal AI app on iPhone
- Google Gemini API key — get one at ai.google.dev
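Before wiring everything up, it can save time to confirm the Mac-side tools are actually on PATH — a small convenience sketch, not part of the repo; the tool names come from the list above:

```python
import shutil
import sys

REQUIRED_TOOLS = ("node", "appium", "xcrun")  # Mac-side CLIs from the list above

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the required CLI tools that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def python_ok():
    """The automation scripts assume Python 3.8+."""
    return sys.version_info >= (3, 8)

if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("python 3.8+:", python_ok())
```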
```
git clone https://github.com/launchpad-reflections/cal-glasses.git
cd cal-glasses
open ActiveSpeaker/ActiveSpeaker.xcodeproj
```

The on-device speech-to-text models are not included in git:

```
curl -L -o /tmp/ios-examples.tar.gz \
  https://github.com/moonshine-ai/moonshine/releases/latest/download/ios-examples.tar.gz
tar -xzf /tmp/ios-examples.tar.gz -C /tmp Transcriber/models/
cp -r /tmp/Transcriber/models/small-streaming-en ActiveSpeaker/small-streaming-en
```

In Xcode, verify small-streaming-en appears in the project navigator. If not, drag the folder in (Create folder references → Add to targets: ActiveSpeaker).
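A missing model folder is an easy way to get a confusing build, so a quick check that the copy above landed can help. The path comes from the steps above; the file names inside the folder vary by Moonshine release, so only a non-empty folder is checked:

```python
from pathlib import Path

def model_folder_ready(repo_root="."):
    """True if ActiveSpeaker/small-streaming-en exists and is non-empty."""
    folder = Path(repo_root) / "ActiveSpeaker" / "small-streaming-en"
    return folder.is_dir() and any(folder.iterdir())

if __name__ == "__main__":
    print("small-streaming-en present:", model_folder_ready())
```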
In Xcode: File → Add Package Dependencies → paste:
https://github.com/facebook/meta-wearables-dat-ios
Set version to 0.5.0+, add both MWDATCore and MWDATCamera to the ActiveSpeaker target.
Open ActiveSpeaker/ActiveSpeaker/Glasses/GlassesStreamManager.swift and replace the placeholder with your Gemini API key:

```swift
private let gemini = GeminiService(apiKey: "YOUR_GEMINI_API_KEY")
```

In Xcode:
- Select the ActiveSpeaker target → Signing & Capabilities
- Set your Team (Apple ID)
- Change Bundle Identifier to something unique (e.g., com.yourname.calglasses)
- Connect your iPhone via USB
- Select your iPhone as the build target
- Cmd+R to build and run
- Grant camera, microphone, and Bluetooth permissions when prompted
- Tap Connect Glasses in the app
- You'll be redirected to the Meta AI app to authorize
- Return to the app — glasses should show as connected
- Tap Start Glasses to begin streaming
This is the part that automatically uploads food photos to Cal AI on your iPhone.
```
cd appium
npm install -g appium
appium driver install xcuitest
pip install -r requirements.txt
```

Find your device UDID and Team ID:

```
xcrun xctrace list devices
```

Edit appium/config.py:
```python
DEVICE_UDID = "YOUR_DEVICE_UDID"
TEAM_ID = "YOUR_TEAM_ID"
CAL_AI_BUNDLE_ID = "com.viraldevelopment.CalAI"  # This is Cal AI's bundle ID
```

WebDriverAgent is a test runner that Appium uses to control your iPhone's UI.

```
open ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent/WebDriverAgent.xcodeproj
```

In Xcode:
- Select WebDriverAgentRunner target
- Signing & Capabilities → set your Team
- Change Bundle Identifier to com.yourname.WebDriverAgentRunner
- Select your iPhone as device
- Cmd+U (Product → Test) to build and install WDA
On your iPhone:
- Settings → Privacy & Security → Developer Mode → ON
- Settings → Developer → Enable UI Automation → ON
You need 4 terminal tabs running:
Terminal 1 — Tunnel (required for iOS 17+):
```
sudo pymobiledevice3 remote start-tunnel
```

Terminal 2 — Appium server:

```
appium
```

Terminal 3 — WebDriverAgent: Open WebDriverAgent.xcodeproj in Xcode → Cmd+U

Terminal 4 — Automation server:

```
cd appium
python server.py
```

Test the Cal AI upload independently:
```
cd appium
python cal_ai_automate.py upload                # Upload most recent photo
python cal_ai_automate.py upload 2              # Upload 2nd most recent
python cal_ai_automate.py upload 3 --inclusive  # Upload last 3 photos
```

With everything running:
- Open the app on your iPhone
- Connect glasses → Start streaming
- Say "log food" or tap Log Food
- Look at your food for 10 seconds while describing it
- Gemini analyzes → identifies items → saves photos
- App automatically triggers Mac server → Appium opens Cal AI → uploads photos
- Cal AI analyzes and logs the food
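Step 6 above is the Mac's job. Here is a minimal stand-in for what server.py does, built on the standard library — the `/upload` path and the `{"count": N}` payload are assumptions for illustration, not the repo's actual API:

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_upload_command(payload):
    """Map a trigger payload onto the cal_ai_automate.py CLI shown earlier."""
    cmd = ["python", "cal_ai_automate.py", "upload"]
    count = int(payload.get("count", 1))
    if count > 1:
        cmd += [str(count), "--inclusive"]  # upload the last N photos
    return cmd

class TriggerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        subprocess.Popen(build_upload_command(payload))  # don't block the phone
        self.send_response(202)  # accepted; the upload runs in the background
        self.end_headers()

# To serve on the local network:
# HTTPServer(("0.0.0.0", 8000), TriggerHandler).serve_forever()
```

Replying 202 before the Appium run finishes matters: the upload takes many seconds, and the phone shouldn't sit on an open HTTP connection that long.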
```
┌───────────────────────────────────────────────┐
│             Meta Ray-Ban Glasses              │
│ Camera (720p@24fps) + Microphone (HFP 8kHz)   │
└──────────────┬────────────────────────────────┘
               │ Bluetooth
┌──────────────▼────────────────────────────────┐
│                  iPhone App                   │
│                                               │
│ Stream video ──► Live display                 │
│ Capture photos ──► High-quality JPEGs         │
│ Audio ──► SileroVAD + Moonshine transcription │
│                                               │
│ "log food" detected ──► 10s recording         │
│ Photos + transcript ──► Gemini 2.5 Flash      │
│ Results ──► Save photos to camera roll        │
│ HTTP POST ──► Mac automation server           │
└──────────────┬────────────────────────────────┘
               │ HTTP (local network)
┌──────────────▼────────────────────────────────┐
│                Mac (server.py)                │
│                                               │
│ Receives upload request ──► runs Appium       │
│ Appium ──► WebDriverAgent on iPhone           │
│ WDA ──► Opens Cal AI                          │
│ WDA ──► Tap + → Scan food → Photo → Select    │
│ Cal AI analyzes and logs the food             │
└───────────────────────────────────────────────┘
```
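The `HTTP POST ──► Mac automation server` hop in the diagram (handled by CalAITriggerService.swift on the phone) is a single small request. A sketch of its shape — the host, port, path, and JSON body are all assumptions for illustration:

```python
import json
import urllib.request

def build_trigger_request(mac_host, photo_count=2, port=8000):
    """Build the POST the app fires at the Mac once Gemini has finished."""
    body = json.dumps({"count": photo_count}).encode("utf-8")
    return urllib.request.Request(
        f"http://{mac_host}:{port}/upload",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("192.168.1.20")
# To send: urllib.request.urlopen(req, timeout=5)
```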
```
cal-glasses/
├── ActiveSpeaker/                    # iOS app (Xcode project)
│   ├── ActiveSpeaker/
│   │   ├── App/                      # Entry point, Info.plist
│   │   ├── Glasses/                  # Meta glasses integration
│   │   │   ├── GlassesStreamManager.swift      # Stream + food logging orchestration
│   │   │   ├── GlassesConnectionManager.swift  # Glasses pairing
│   │   │   ├── GlassesAudioCapture.swift       # HFP Bluetooth mic
│   │   │   ├── GlassesSpeaker.swift            # TTS to glasses speakers
│   │   │   ├── FoodRecordingBuffer.swift       # 10s frame + transcript buffer
│   │   │   └── FrameDeduplicator.swift         # Perceptual hash dedup
│   │   ├── Services/
│   │   │   ├── GeminiService.swift             # Gemini API client
│   │   │   └── CalAITriggerService.swift       # HTTP trigger to Mac
│   │   ├── Models/
│   │   │   └── FoodAnalysisResult.swift        # Structured food data
│   │   ├── Pipeline/                 # Audio/video processing
│   │   ├── Processors/               # VAD, transcription, face detection
│   │   └── UI/
│   │       └── CalGlassesView.swift            # Main UI
│   ├── MobileFaceNet.mlpackage/
│   └── silero_vad.onnx
├── appium/                           # Cal AI automation
│   ├── cal_ai_automate.py            # Main automation script
│   ├── server.py                     # HTTP server for app triggers
│   ├── config.py                     # Device config
│   ├── setup.sh                      # One-time installation
│   └── APPIUM.md                     # Detailed Appium docs
├── scripts/
│   └── tts.py                        # Text-to-speech generator
├── PIPELINE.md                       # Food logging pipeline docs
├── MODELS.md                         # Model download instructions
└── CLAUDE.md                         # AI assistant context
```
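FrameDeduplicator.swift keeps near-identical stream frames out of the buffer using a perceptual hash. The idea can be sketched as a difference hash (dHash) over a downsampled grayscale frame — the app's actual hash function and threshold are not specified here, so treat the details as illustrative:

```python
def dhash(gray, hash_w=8, hash_h=8):
    """Difference hash: sample a (hash_w+1) x hash_h grid by nearest neighbour,
    then emit one bit per horizontally adjacent pixel comparison."""
    h, w = len(gray), len(gray[0])
    bits = 0
    for y in range(hash_h):
        for x in range(hash_w):
            left = gray[y * h // hash_h][x * w // (hash_w + 1)]
            right = gray[y * h // hash_h][(x + 1) * w // (hash_w + 1)]
            bits = (bits << 1) | (left > right)
    return bits

def is_duplicate(hash_a, hash_b, max_distance=5):
    """Frames whose hashes differ in only a few bits count as the same shot."""
    return bin(hash_a ^ hash_b).count("1") <= max_distance

# A tiny synthetic 32x24 grayscale frame:
frame = [[(x * 7 + y * 3) % 256 for x in range(32)] for y in range(24)]
print(is_duplicate(dhash(frame), dhash(frame)))  # same frame twice → True
```

Hashing instead of byte-comparing frames means small sensor noise or compression jitter between consecutive frames doesn't defeat the dedup.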
In GlassesStreamManager.swift:
| Flag | Default | Description |
|---|---|---|
| `DEFAULT_CAMERA` | `false` | `true`: use `capturePhoto()` for high-quality images. `false`: use deduplicated stream frames |
| `TEST` | `true` | `true`: skip camera roll save, always upload last 2 photos. `false`: normal flow |
- Swift / SwiftUI — iOS app
- Meta MWDAT SDK 0.5.0 — glasses connectivity and streaming
- Gemini 2.5 Flash — multimodal food analysis (images + text)
- Moonshine v2 — on-device speech-to-text
- Silero VAD — voice activity detection (ONNX Runtime)
- MobileFaceNet — face recognition (CoreML)
- Appium + XCUITest — iOS UI automation
- pymobiledevice3 — iOS 17+ device tunnel
- Python — automation server and scripts
MIT