SignFlow — Real-Time Bidirectional Sign Language Interpreter

SignFlow is the first real-time bidirectional ASL interpreter built on Stream's Vision Agents SDK. It bridges communication between deaf/hard-of-hearing signers and hearing speakers inside a live video call.

Built for the WeMakeDevs "Vision Possible: Agent Protocol" hackathon.

What It Does

Sign → Speech

A deaf user signs to their webcam. The AI agent watches the video feed, recognizes ASL signs using Gemini's multimodal vision + YOLO Pose skeleton tracking, and speaks the English translation aloud. The translation also appears as text in the side panel.
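Under the hood, each frame passes through YOLO Pose before Gemini interprets it. A minimal sketch of that pose step, assuming the `ultralytics` package (which provides the `yolo11n-pose.pt` checkpoint); the function name and surrounding interface are illustrative, not the actual `sign_processor.py` implementation:

```python
# Sketch of the pose-extraction step (illustrative; not the exact
# sign_processor.py code). Uses the ultralytics package.
from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")  # the pose checkpoint named in the tech stack

def extract_keypoints(frame):
    """Return per-person COCO keypoints for one BGR video frame."""
    results = model(frame, verbose=False)
    people = []
    for result in results:
        if result.keypoints is not None:
            # Shape (num_people, 17, 2): (x, y) pixel coordinates per joint.
            # Wrists, elbows, and shoulders carry most of the signing signal.
            people.append(result.keypoints.xy.cpu().numpy())
    return people
```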

Speech → Sign

A hearing user speaks normally. The agent transcribes their speech, converts it into sign-by-sign ASL descriptions (handshape, location, movement, facial expression), and displays the instructions in the UI so the deaf user knows what was said.
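Each sign in the breakdown follows a consistent shape. An illustrative example of one guide entry; the field names are assumptions based on the description above, not the project's actual schema:

```python
# Illustrative shape of one sign-guide entry shown in the UI.
# Field names are assumptions, not the project's actual schema.
from dataclasses import dataclass

@dataclass
class SignInstruction:
    gloss: str        # e.g. "THANK-YOU"
    handshape: str    # e.g. "flat open hand"
    location: str     # e.g. "fingertips at the chin"
    movement: str     # e.g. "arc forward and down toward the listener"
    expression: str   # e.g. "warm, slight nod"

guide = [SignInstruction(
    gloss="THANK-YOU",
    handshape="flat open hand",
    location="fingertips at the chin",
    movement="arc forward and down toward the listener",
    expression="warm, slight nod",
)]
```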

Tech Stack

| Layer | Technology | Role |
|---|---|---|
| Video Transport | Stream Edge Network | WebRTC, <30 ms latency |
| Sign Recognition | Gemini Realtime (5 FPS) | Sees and interprets signs from video |
| Pose Detection | YOLO Pose (yolo11n-pose.pt) | Skeleton overlay + structural data |
| Speech I/O | Gemini Realtime | Native audio in/out |
| Backend | Vision Agents SDK (Python) | Orchestrates everything |
| Frontend | React + @stream-io/video-react-sdk | Video call UI + interpreter panels |

Quick Start

1. Get API Keys

| Service | Sign Up |
|---|---|
| Stream | getstream.io |
| Google AI (Gemini) | aistudio.google.com |

2. Backend Setup

```bash
cd backend

# Create .env with your API keys
cp .env.example .env  # Edit .env with real keys

# Install dependencies
uv sync

# Generate a frontend token
uv run generate_token.py

# Start the agent
uv run agent.py run
```
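The backend reads its credentials from `.env`. The authoritative variable names live in `.env.example`; the names below are assumptions about what a typical layout for this stack looks like:

```
# backend/.env — placeholder values; variable names are assumptions,
# so defer to .env.example for the real ones
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GOOGLE_API_KEY=your_gemini_api_key
```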

3. Frontend Setup

```bash
cd frontend

# Add the token from generate_token.py to .env
# Edit .env with VITE_STREAM_API_KEY, VITE_STREAM_TOKEN, VITE_STREAM_USER_ID

# Install dependencies
npm install

# Start the dev server
npm run dev
```
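Using the three variables named above, `frontend/.env` ends up looking like this (values are placeholders):

```
# frontend/.env — keys named in the step above; values are placeholders
VITE_STREAM_API_KEY=your_stream_api_key
VITE_STREAM_TOKEN=token_printed_by_generate_token.py
VITE_STREAM_USER_ID=your_user_id
```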

4. Use It

  1. Open the frontend at http://localhost:5173
  2. Enter a Call ID and click "Join Call"
  3. The agent will automatically join the same call
  4. Start signing or speaking!

Project Structure

```
WeMakeDevs4/
├── backend/
│   ├── agent.py              # Main Vision Agent
│   ├── instructions.md       # ASL knowledge base system prompt
│   ├── sign_processor.py     # Custom YOLO Pose processor
│   ├── generate_token.py     # Stream token generator
│   └── pyproject.toml        # Python dependencies
│
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Root with StreamVideo provider
│   │   ├── components/
│   │   │   ├── CallSetup.tsx          # Join/create call UI
│   │   │   ├── VideoCall.tsx          # Main layout (video + panels)
│   │   │   ├── SignToSpeechPanel.tsx  # Detected signs + translation
│   │   │   ├── SpeechToSignPanel.tsx  # Sign instructions
│   │   │   ├── TranscriptLog.tsx      # Conversation history
│   │   │   ├── ModeToggle.tsx         # Switch modes
│   │   │   └── StatusIndicator.tsx    # Connection state
│   │   └── hooks/
│   │       ├── useSignEvents.ts       # Custom event subscription
│   │       └── useTranscript.ts       # Transcript state management
│   └── package.json
│
└── README.md
```

How It Works

```
┌─────────────┐    WebRTC    ┌──────────────┐    Gemini    ┌─────────────┐
│  React App  │◄────────────►│ Stream Edge  │◄────────────►│  AI Agent   │
│  (Browser)  │  Video/Audio │   Network    │  Video/Audio │  (Python)   │
└──────┬──────┘              └──────────────┘              └──────┬──────┘
       │                                                          │
       │  Custom Events                              YOLO Pose +  │
       │  (sign_detected,                           Gemini Vision │
       │   sign_guide,                                            │
       │   transcript)                                            │
       └──────────────────────◄───────────────────────────────────┘
```
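The three custom events carry the interpreter output from the agent back to the browser, where the hooks in `src/hooks/` subscribe to them. Their payloads are not documented here; a plausible shape, with every field name illustrative:

```python
# Illustrative payloads for the three custom events in the diagram.
# Field names are assumptions, not the project's actual event schema.
sign_detected = {
    "type": "sign_detected",
    "gloss": "HELLO",        # recognized ASL gloss
    "translation": "Hello",  # English text that is also spoken aloud
}

sign_guide = {
    "type": "sign_guide",
    "spoken_text": "Thank you",  # what the hearing user said
    "signs": [
        {"gloss": "THANK-YOU", "handshape": "flat open hand",
         "location": "chin", "movement": "arc forward and down"},
    ],
}

transcript = {
    "type": "transcript",
    "speaker": "hearing_user",
    "text": "Thank you",
}
```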

Demo Script

  1. Sign "HELLO" → Agent speaks "Hello" + text appears in Sign→Speech panel
  2. Sign "MY NAME" + fingerspell → Agent speaks your name
  3. Sign "HOW ARE YOU" → Agent translates with proper grammar
  4. Switch to Speech→Sign → Speak "Thank you" → Panel shows THANK-YOU sign instructions
  5. Both mode → Full bidirectional conversation

License

MIT
