SeeHearBraille is an innovative iOS application designed to help deaf-blind users watch TV using their phone and a Bluetooth braille keyboard. The app combines computer vision, speech recognition, and AI to provide real-time descriptions of visual content and audio transcriptions.
This app addresses the unique needs of deaf-blind individuals by:
- Using the phone's camera to identify what's on TV
- Listening to TV audio and converting speech to text
- Providing braille-friendly text output for external braille keyboards
- Offering AI-powered scene descriptions
- Real-time Object Detection: Uses Core ML with a DETR (Detection Transformer) model for semantic image segmentation
- Live Camera Feed: Continuous analysis of camera input every 5 seconds
- Object Recognition: Identifies and labels objects, people, and scenes in real-time
- Interactive Masking: Tap on detected objects to highlight them with visual overlays
- Real-time Transcription: Converts TV audio to text using Apple's Speech framework
- Braille Keyboard Support: Splits transcript text into chunks based on braille keyboard key count
- Continuous Listening: Background audio processing for seamless experience
- Location-Based Lookup: Find TV channels by state and city
- Category Filtering: Browse channels by News, Sports, Kids, or Education
- Channel Numbers: Get exact channel numbers for your TV remote
- Accessibility: VoiceOver-friendly interface for easy navigation
- VoiceOver Compatible: Full accessibility support for screen readers
- Braille Integration: Designed for external braille keyboard connectivity
- Haptic Feedback: Tactile responses for user interactions
- Customizable Text Chunking: Configurable text splitting for different braille devices (see the chunking sketch below)
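The chunking behavior described above can be pictured with a short sketch. The function below is a hypothetical illustration, not the app's actual code: it splits a transcript into word-aligned chunks no longer than the configured key count.

```swift
// Hypothetical helper: split a transcript into chunks sized for a braille
// display with `keyCount` cells. The app's real splitting rules may differ.
func brailleChunks(from transcript: String, keyCount: Int) -> [String] {
    guard keyCount > 0 else { return [transcript] }
    var chunks: [String] = []
    var current = ""
    for word in transcript.split(separator: " ") {
        // Start a new chunk when adding the next word would exceed the key count.
        let candidate = current.isEmpty ? String(word) : current + " " + word
        if candidate.count <= keyCount {
            current = candidate
        } else {
            if !current.isEmpty { chunks.append(current) }
            current = String(word)
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

// Example: with a 20-cell device this yields
// ["The TV is showing a", "person and a chair"].
let chunks = brailleChunks(from: "The TV is showing a person and a chair", keyCount: 20)
```

One edge case worth noting: a single word longer than the key count would still produce an over-length chunk here; a real implementation has to decide how to wrap or truncate it.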
The project contains the main SeeHearBraille application:
- Entry Point: `SegmentationApp.swift` - Main app entry point with `IntroView`
- Welcome Screen: `IntroView` - Privacy policy agreement and app introduction
- Main Interface: `MyTabView` - Tab-based navigation with three main features
- Core Features:
  - Speech Converter (`SpeechView`) - Real-time speech recognition and braille text output
  - Scene Description (`VisionView`) - Camera-based object detection and AI scene analysis
  - Channel Finder (`ChannelView`) - TV channel lookup by location and category
- Note: This folder contains an earlier attempt at the app and is not the current implementation
- Status: Deprecated - kept for reference only
- iOS: 16.0 or later
- Device: iPhone with camera and microphone
- Storage: ~500MB for ML model and app data
- Camera Access: For object detection and scene analysis
- Microphone Access: For speech recognition
- Speech Recognition: For converting audio to text (see the permission-request sketch after this list)
- Camera: Back-facing camera for object detection
- Microphone: Built-in microphone for audio capture
- Bluetooth: For braille keyboard connectivity (optional)
- Xcode: Version 16.3 or later
- macOS: Latest version recommended
- Apple Developer Account: For device testing and App Store distribution
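The Camera Access, Microphone Access, and Speech Recognition permissions listed above are requested at runtime. The sketch below shows the standard AVFoundation and Speech framework calls for doing so; it is illustrative and not taken from the SeeHearBraille source. The corresponding usage-description keys (NSCameraUsageDescription, NSMicrophoneUsageDescription, NSSpeechRecognitionUsageDescription) must also be present in the app's Info.plist.

```swift
import AVFoundation
import Speech

// Illustrative permission requests; the app's actual onboarding flow may differ.
func requestPermissions() {
    // Camera access for object detection and scene analysis.
    AVCaptureDevice.requestAccess(for: .video) { granted in
        print("Camera access granted: \(granted)")
    }
    // Microphone access for capturing TV audio.
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        print("Microphone access granted: \(granted)")
    }
    // Speech recognition authorization for converting audio to text.
    SFSpeechRecognizer.requestAuthorization { status in
        print("Speech recognition authorized: \(status == .authorized)")
    }
}
```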
- Clone the Repository
  ```bash
  git clone https://github.com/saamerm/DeafBlind.git
  cd DeafBlind
  ```
- Open in Xcode
  ```bash
  # Open the main SeeHearBraille app
  open deafblind/SeeHearBraille.xcodeproj
  ```
- Configure Project Settings
  - Select your development team in Xcode
  - Update bundle identifier if needed
  - Configure signing certificates
- Build and Run
  - Select target device (iPhone or iPad recommended)
  - Press `Cmd + R` to build and run
- Grant Permissions: Allow camera and microphone access when prompted
- Privacy Policy: Accept the privacy policy on first launch
- Model Download: The AI model will download automatically (one-time process)
- Launch the App: Open SeeHearBraille on your device
- Accept Privacy Policy: Review and accept the privacy policy
- Grant Permissions: Allow camera and microphone access
- Navigate to Scene Description Tab: Tap the "Scene Description" tab
- Point Camera: Aim your phone's camera at the TV screen or scene
- Wait for Analysis: The app analyzes the scene every 5 seconds using Core ML
- View Results: See detected objects and AI-generated scene descriptions
- Read Description: View natural language descriptions like "The TV is showing something with a person, chair, and table"
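The 5-second analysis loop described above can be sketched with Vision and Core ML. The class below is an assumption about how such a loop might look, not the app's MLViewModel: it wraps the compiled DETR segmentation model in a VNCoreMLRequest and runs it against the latest camera frame on a repeating timer.

```swift
import Foundation
import Vision
import CoreML
import CoreGraphics

/// Illustrative periodic scene analysis; `detrModel` stands in for the app's
/// compiled "DETR ResNet50 Semantic Segmentation" Core ML model.
final class SceneAnalyzer {
    private let request: VNCoreMLRequest
    private var timer: Timer?

    init(detrModel: MLModel) throws {
        let visionModel = try VNCoreMLModel(for: detrModel)
        request = VNCoreMLRequest(model: visionModel) { request, _ in
            // Segmentation models return observations containing a label map;
            // here we only log how many observations came back.
            print("Received \(request.results?.count ?? 0) observation(s)")
        }
    }

    /// Run the request against the latest camera frame every 5 seconds.
    func start(latestFrame: @escaping () -> CGImage?) {
        timer = Timer.scheduledTimer(withTimeInterval: 5.0, repeats: true) { [weak self] _ in
            guard let self, let frame = latestFrame() else { return }
            let handler = VNImageRequestHandler(cgImage: frame, options: [:])
            try? handler.perform([self.request])
        }
    }

    func stop() { timer?.invalidate() }
}
```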
- Navigate to Speech Converter Tab: Tap the "Speech Converter" tab
- Configure Braille Keyboard: Toggle braille keyboard support if using one
- Set Key Count: Enter the number of keys on your braille device
- Start Transcription: Tap "Start Transcribing" to begin real-time speech recognition
- View Text: See real-time transcript or braille-formatted chunks
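The "Start Transcribing" step relies on Apple's Speech framework. The sketch below shows a minimal live-transcription setup with SFSpeechRecognizer and AVAudioEngine; it is an assumption about how the app's SpeechRecognizer works, not its actual implementation.

```swift
import Foundation
import Speech
import AVFoundation

/// Minimal live-transcription sketch using Apple's Speech framework.
final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onText: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true   // stream partial transcripts as they arrive
        self.request = request

        // Feed microphone audio into the recognition request.
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result {
                onText(result.bestTranscription.formattedString)
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        task?.cancel()
    }
}
```

Each partial transcript handed to `onText` could then be passed through the chunking step shown earlier before being sent to the braille display.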
- Navigate to Channel Finder Tab: Tap the "Channel Finder" tab
- Select Location: Choose your state and city from the pickers
- Choose Category: Select News, Sports, Kids, or Education
- Find Channels: View available TV channels with their numbers
- Use Remote: Use the channel numbers to navigate with your TV remote
- Enable Braille Mode: Toggle "Using Braille Keyboard?" to ON
- Enter Key Count: Specify the number of keys on your device
- Connect Device: Pair your Bluetooth braille keyboard
- Receive Text: Text will be split into appropriate chunks for your device
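As an example of how the chunked text and the haptic feedback mentioned in the feature list might fit together, the sketch below pages through pre-split chunks and fires a light haptic tap on each advance. The type and method names are hypothetical and not taken from the app.

```swift
import UIKit

/// Hypothetical pager that steps through braille-sized text chunks,
/// giving a haptic tap each time the user advances.
final class ChunkPager {
    private let chunks: [String]
    private var index = 0
    private let haptics = UIImpactFeedbackGenerator(style: .light)

    init(chunks: [String]) {
        self.chunks = chunks
    }

    /// Returns the next chunk to present on the braille device, or nil when finished.
    func advance() -> String? {
        guard index < chunks.count else { return nil }
        haptics.impactOccurred()   // tactile confirmation of the page turn
        defer { index += 1 }
        return chunks[index]
    }
}
```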
- SwiftUI: Modern iOS user interface framework
- Core ML: Machine learning framework for on-device inference
- AVFoundation: Camera and audio processing
- Speech Framework: Real-time speech recognition
- Foundation Models: AI-powered text generation (iOS 26+)
- Model: DETR ResNet50 Semantic Segmentation F16P8
- Input: 448x448 RGB images
- Output: Semantic segmentation masks with object labels
- Performance: Optimized for mobile inference
- MVVM Pattern: Model-View-ViewModel architecture (see the sketch after this list)
- Async/Await: Modern Swift concurrency
- Combine: Reactive programming for UI updates
- Core Data: Local data persistence (if implemented)
- Deployment Target: iOS 16.0
- Swift Version: 5.9+
- Xcode Version: 16.3+
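As an illustration of the MVVM, async/await, and SwiftUI pieces listed above, a view model might expose published state that an accessible view observes. The names and the placeholder analysis step below are assumptions for the sketch, not the app's actual view model.

```swift
import SwiftUI
import Combine

/// Hypothetical view model in the MVVM + async/await style.
@MainActor
final class SceneDescriptionViewModel: ObservableObject {
    @Published var sceneDescription: String = "Analyzing…"

    /// Placeholder analysis step; the real app would run Core ML inference here.
    func refresh() async {
        let labels = ["person", "chair", "table"]   // stand-in detection results
        sceneDescription = "The TV is showing something with a " + labels.joined(separator: ", ")
    }
}

struct SceneDescriptionText: View {
    @StateObject private var viewModel = SceneDescriptionViewModel()

    var body: some View {
        Text(viewModel.sceneDescription)
            .accessibilityLabel(viewModel.sceneDescription)  // VoiceOver reads the description
            .task { await viewModel.refresh() }              // async update when the view appears
    }
}
```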
- `SegmentationApp.swift`: Main app entry point
- `IntroView.swift`: Welcome screen and privacy policy
- `MyTabView.swift`: Main navigation interface with three tabs
- `SpeechView.swift`: Speech recognition and braille text output
- `VisionView.swift`: Camera-based scene analysis interface
- `ChannelView.swift`: TV channel lookup functionality
- `MLViewModel.swift`: Core ML processing logic
- `SpeechRecognizer.swift`: Audio transcription handling
- `MLMainView.swift`: Advanced ML camera interface
```bash
# Debug build
xcodebuild -project SeeHearBraille.xcodeproj -scheme SegmentationApp -configuration Debug

# Release build
xcodebuild -project SeeHearBraille.xcodeproj -scheme SegmentationApp -configuration Release
```
- On-Device Processing: All ML inference happens locally
- No Data Collection: No personal data is transmitted
- Privacy Policy: Available at the provided GitHub URL
- Permissions: Minimal required permissions only
- App Sandbox: Enabled for security isolation
- Secure Storage: Local data encryption
- Permission Management: Granular permission controls
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- Follow Swift API Design Guidelines
- Use SwiftUI best practices
- Maintain accessibility compliance
- Include proper documentation
This project is licensed under the terms specified in the LICENSE file.
- Camera Not Working: Check permissions in Settings > Privacy & Security
- Speech Recognition Failing: Ensure microphone access is granted
- ML Model Loading: Ensure stable internet connection for initial download
- Braille Keyboard: Verify Bluetooth pairing and key count configuration
- Issues: Report bugs via GitHub Issues
- Documentation: Check Apple's Core ML and Speech framework docs
- Accessibility: Refer to Apple's Accessibility Guidelines
- Custom ML Models: Support for user-trained models
- Offline Mode: Complete offline functionality
- Multi-language Support: Internationalization
- Advanced Braille Support: More braille device compatibility
- Voice Commands: Hands-free operation
- Cloud Sync: Settings synchronization across devices
- Performance Optimization: Faster inference times
- Battery Efficiency: Reduced power consumption
- Error Handling: Improved error recovery
- User Customization: More personalization options
For questions, suggestions, or support:
- Developer: Saamer Mansoor, Kyle Peterson, Hitesh Parikh
- Repository: GitHub Repository
- Privacy Policy: Available in the repository
- Captioned 60-second ad video URL: https://www.youtube.com/watch?v=qRN_nitff10
- Captioned 5-minute pitch video URL: https://youtu.be/rSBIvyGcbB8
- Live URL: Awaiting App Store approval; the review process is restrictive to help ensure safety. The GitHub repo README will be updated once the app is approved.
- Pitch Deck: https://docs.google.com/presentation/d/1cZAG-1_gR2UqXI0AKpZboyzmZUgIYYkD
- Responsible AI Disclosure: https://docs.google.com/document/d/1-jSgq9VlbfORuwyGApSwcMOnsKp7N47-gs3_D6NO6Lo
- Project Write-up: https://docs.google.com/document/d/1Inr0gsbQ3PZqHula0KA9lGZx2cskXVQitYTgjnJocrg
Note: This app is designed specifically for deaf-blind users and requires more testing with accessibility tools. Always test with real users to ensure proper functionality.