SignFlow is the first real-time bidirectional ASL interpreter built on Stream's Vision Agents SDK. It bridges communication between deaf/hard-of-hearing signers and hearing speakers inside a live video call.
Built for the WeMakeDevs "Vision Possible: Agent Protocol" hackathon.
A deaf user signs to their webcam. The AI agent watches the video feed, recognizes ASL signs using Gemini's multimodal vision + YOLO Pose skeleton tracking, and speaks the English translation aloud. The translation also appears as text in the side panel.
A hearing user speaks normally. The agent transcribes their speech, converts it to ASL sign-by-sign descriptions (handshape, location, movement, facial expressions), and displays the instructions in the UI so the deaf user knows what was said.
| Layer | Technology | Role |
|---|---|---|
| Video Transport | Stream Edge Network | WebRTC, <30ms latency |
| Sign Recognition | Gemini Realtime (5 FPS) | Sees and interprets signs from video |
| Pose Detection | YOLO Pose (yolo11n-pose.pt) | Skeleton overlay + structural data |
| Speech I/O | Gemini Realtime | Native audio in/out |
| Backend | Vision Agents SDK (Python) | Orchestrates everything |
| Frontend | React + @stream-io/video-react-sdk | Video call UI + interpreter panels |
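The backend wiring stays small because the SDK handles transport and model plumbing. Below is a condensed, hypothetical sketch of what `agent.py` looks like, modeled on Stream's published Vision Agents examples; the import paths, class names, call ID, and constructor arguments are assumptions and may differ from the real file:

```python
# Hypothetical sketch of agent.py, modeled on Stream's Vision Agents examples.
# Import paths and constructor arguments are assumptions, not the repo's exact code.
import asyncio

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, ultralytics

async def main() -> None:
    agent = Agent(
        edge=getstream.Edge(),                        # Stream Edge Network (WebRTC transport)
        agent_user=User(name="SignFlow Interpreter"),
        instructions=open("instructions.md").read(),  # ASL knowledge base system prompt
        llm=gemini.Realtime(fps=5),                   # watches video at 5 FPS, speaks replies
        processors=[ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt")],
    )
    call = agent.edge.client.video.call("default", "signflow-demo")  # call ID is a placeholder
    async with await agent.join(call):                # agent joins as a regular participant
        await agent.finish()                          # run until the call ends

if __name__ == "__main__":
    asyncio.run(main())
```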
| Service | Sign Up |
|---|---|
| Stream | getstream.io |
| Google AI (Gemini) | aistudio.google.com |
```bash
cd backend

# Create .env with your API keys
cp .env.example .env   # then edit .env with real keys

# Install dependencies
uv sync

# Generate a frontend token
uv run generate_token.py

# Start the agent
uv run agent.py run
```
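The backend `.env` should look roughly like this; the variable names below are assumptions (they follow Stream's and Google's usual conventions), so treat `.env.example` as the authoritative list:

```env
# Hypothetical backend .env — confirm variable names against .env.example
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GOOGLE_API_KEY=your_gemini_api_key
```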
```bash
cd frontend

# Add the token from generate_token.py to .env:
# edit .env with VITE_STREAM_API_KEY, VITE_STREAM_TOKEN, VITE_STREAM_USER_ID

# Install dependencies
npm install

# Start the dev server
npm run dev
```
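For example (the variable names come from the setup notes above; the values are placeholders):

```env
# frontend/.env — values are placeholders
VITE_STREAM_API_KEY=your_stream_api_key
VITE_STREAM_TOKEN=token_printed_by_generate_token.py
VITE_STREAM_USER_ID=demo-user
```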
- Open the frontend at http://localhost:5173
- Enter a Call ID and click "Join Call"
- The agent will automatically join the same call
- Start signing or speaking!
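`generate_token.py` mints the Stream user token the frontend needs. Stream user tokens are ordinary HS256 JWTs signed with your API secret, so a minimal stand-in looks like this (the real script may use an SDK helper instead; the env var name and user ID here are placeholders):

```python
# Minimal stand-in for generate_token.py — the repo's actual script may differ.
# Stream user tokens are HS256 JWTs signed with the app's API secret.
import os
import time

import jwt  # pip install PyJWT

api_secret = os.environ["STREAM_API_SECRET"]  # assumed env var name
user_id = "demo-user"                         # must match VITE_STREAM_USER_ID

token = jwt.encode(
    {"user_id": user_id, "iat": int(time.time())},
    api_secret,
    algorithm="HS256",
)
print(token)  # paste into frontend/.env as VITE_STREAM_TOKEN
```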
```
WeMakeDevs4/
├── backend/
│   ├── agent.py              # Main Vision Agent
│   ├── instructions.md       # ASL knowledge base system prompt
│   ├── sign_processor.py     # Custom YOLO Pose processor
│   ├── generate_token.py     # Stream token generator
│   └── pyproject.toml        # Python dependencies
│
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Root with StreamVideo provider
│   │   ├── components/
│   │   │   ├── CallSetup.tsx           # Join/create call UI
│   │   │   ├── VideoCall.tsx           # Main layout (video + panels)
│   │   │   ├── SignToSpeechPanel.tsx   # Detected signs + translation
│   │   │   ├── SpeechToSignPanel.tsx   # Sign instructions
│   │   │   ├── TranscriptLog.tsx       # Conversation history
│   │   │   ├── ModeToggle.tsx          # Switch modes
│   │   │   └── StatusIndicator.tsx     # Connection state
│   │   └── hooks/
│   │       ├── useSignEvents.ts        # Custom event subscription
│   │       └── useTranscript.ts        # Transcript state management
│   └── package.json
│
└── README.md
```
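`sign_processor.py` is the custom YOLO Pose processor from the stack table. The processor hook below (`process_frame`) is an illustrative assumption about the SDK's processor interface, but the Ultralytics calls (`YOLO`, `keypoints`, `plot`) are the library's real API:

```python
# Illustrative sketch in the spirit of sign_processor.py.
# process_frame is an assumed hook name; the Ultralytics usage is the real API.
from ultralytics import YOLO

class SignPoseProcessor:
    def __init__(self, model_path: str = "yolo11n-pose.pt"):
        self.model = YOLO(model_path)  # 17-keypoint COCO pose model

    def process_frame(self, frame):
        """Run pose estimation on one video frame (numpy BGR array)."""
        results = self.model(frame, verbose=False)
        keypoints = results[0].keypoints   # per-person (x, y, confidence) keypoints
        annotated = results[0].plot()      # frame with the skeleton overlay drawn
        # Structural keypoint data accompanies raw frames to Gemini;
        # the annotated frame feeds the skeleton overlay seen in the UI.
        return annotated, keypoints
```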
```
┌─────────────┐    WebRTC    ┌──────────────┐    Gemini    ┌─────────────┐
│  React App  │◄────────────►│ Stream Edge  │◄────────────►│  AI Agent   │
│  (Browser)  │  Video/Audio │   Network    │  Video/Audio │  (Python)   │
└──────┬──────┘              └──────────────┘              └──────┬──────┘
       │                                                          │
       │  Custom Events                        YOLO Pose +        │
       │  (sign_detected,                      Gemini Vision      │
       │   sign_guide,                                            │
       │   transcript)                                            │
       └──────────────────────◄───────────────────────────────────┘
```
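The three custom event types on the left edge of the diagram are what `useSignEvents.ts` subscribes to. The payload shapes below are illustrative assumptions, not the repo's actual schema; the agent would publish objects like these to the call as Stream custom events:

```python
# Illustrative event payloads — field names are assumptions, not the repo's schema.
sign_detected = {
    "type": "sign_detected",
    "gloss": "HELLO",            # recognized ASL gloss
    "translation": "Hello",      # spoken English output
    "confidence": 0.91,
}

sign_guide = {
    "type": "sign_guide",
    "english": "Thank you",
    "steps": [                   # handshape / location / movement / expression
        "Flat hand, fingertips at chin",
        "Move hand forward and down",
        "Neutral-to-warm facial expression",
    ],
}

transcript = {"type": "transcript", "speaker": "hearing-user", "text": "Thank you"}
```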
- Sign "HELLO" → Agent speaks "Hello" + text appears in Sign→Speech panel
- Sign "MY NAME" + fingerspell → Agent speaks your name
- Sign "HOW ARE YOU" → Agent translates with proper grammar
- Switch to Speech→Sign → Speak "Thank you" → Panel shows THANK-YOU sign instructions
- Switch to Both mode → Full bidirectional conversation
MIT