Important Note: This repository provides an overview of the VISTA project. For implementation details, please visit our dedicated repositories:
- 📱 Vista-frontend - Flutter mobile application
- 🖥️ Vista-backend - FastAPI backend server
VISTA aims to revolutionize how Blind and Low Vision (BLV) individuals interact with their environment through cutting-edge AI technologies. Beyond traditional assistive tools, VISTA strives to become a comprehensive multimodal AI companion that enhances perception, cognition, and interaction capabilities.
| Challenge | Solution |
|---|---|
| 🚶‍♂️ Navigation & Mobility | Advanced sensor fusion (mmWave radar + LiDAR) for all-weather perception |
| 👥 Social Interaction | Real-time social cue interpretation and non-visual feedback |
| 📱 Digital Accessibility | Seamless multimodal interaction across devices and platforms |
| 🏥 Healthcare Access | Intelligent medical assistance and health monitoring |
```mermaid
graph TD
    A[Perception Layer] --> B[Inference Layer]
    B --> C[Interaction Layer]
    C --> D[Execution Layer]
    A -->|Sensor Data| E[Event Bus]
    B -->|Decisions| E
    C -->|User Input| E
    D -->|Status| E
```
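As a sketch, the event-bus wiring among the four layers can be modeled with a minimal synchronous publish/subscribe class. The names here (`EventBus`, the `"sensor_data"` topic) are illustrative assumptions; VISTA's actual implementation lives in the dedicated repositories.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal synchronous pub/sub bus connecting the layers (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a handler for all future events on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        """Deliver `event` to every handler subscribed to `topic`."""
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
received: list[dict] = []
# The inference layer listens for sensor data emitted by the perception layer.
bus.subscribe("sensor_data", received.append)
bus.publish("sensor_data", {"source": "lidar", "range_m": 2.5})
```

A production bus would add async delivery and error isolation per handler, but the topic-to-handlers mapping is the core of the pattern.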
**Perception System**
- Multi-sensor fusion
- Environmental mapping
- Real-time object tracking
- Spatial audio processing

**Inference Engine**
- Scene understanding (GPT-4V)
- Risk assessment
- Path planning
- Context awareness

**Interaction Interface**
- Natural language processing
- Haptic feedback system
- 3D audio navigation
- Gesture recognition
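Because scene understanding is delegated to a vision-language model (GPT-4V), a request boils down to shipping an image plus a question. Below is a hedged sketch of building such a payload, following the OpenAI chat-completions shape with `image_url` content parts; the helper name and model string are assumptions, not VISTA's actual code.

```python
import base64

def build_scene_request(image_bytes: bytes, question: str) -> dict:
    """Build a chat-completion style payload for a vision model.

    Hypothetical helper: the payload shape follows the OpenAI chat API
    with base64 image_url content parts.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }

# Fake JPEG bytes stand in for a real camera frame.
payload = build_scene_request(b"\xff\xd8fake-jpeg", "Describe obstacles ahead.")
```

For a BLV assistant the interesting engineering is downstream of this call: converting the model's text into spatial audio or haptic cues with low latency.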
**Cloud Architecture**

```mermaid
graph TD
    A[Mobile Client] <-->|WebSocket/HTTPS| B[Cloud Server]
    B -->|AI Services| A
```

**Edge Computing**

```mermaid
graph TD
    A[Mobile Client] <-->|Local Processing| B[Edge Module]
    B <-->|Config & Updates| C[Cloud Server]
```

**Wearable Integration**

```mermaid
graph TD
    A[Smart Glasses] <-->|Data Sync| B[Mobile Client]
    B <-->|Processing| C[Edge Module]
    C <-->|Management| D[Cloud Server]
```
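Whatever the deployment phase, client and server need an agreed message framing over the WebSocket channel. The following is a minimal sketch of a JSON envelope such a channel might carry; every field name here is an assumption for illustration, not the project's actual protocol.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class Envelope:
    """Hypothetical message envelope for the client<->server WebSocket channel."""
    kind: str       # e.g. "sensor_data", "ai_result", "status"
    payload: dict   # kind-specific body
    timestamp: float

def encode(env: Envelope) -> str:
    """Serialize an envelope to a JSON text frame."""
    return json.dumps(asdict(env))

def decode(raw: str) -> Envelope:
    """Parse a JSON text frame back into an envelope."""
    return Envelope(**json.loads(raw))

msg = Envelope(kind="ai_result", payload={"caption": "door ahead"}, timestamp=time.time())
round_tripped = decode(encode(msg))
```

Keeping the envelope identical across the cloud, edge, and wearable phases is what lets the transport change without touching the application layers.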
**📊 Progress (25%)**
```mermaid
gantt
    title Phase 1 Progress
    dateFormat YYYY-MM-DD
    section Framework
    Basic Architecture :done, 2025-02-20, 3d
    section Features
    Voice Interface :active, 2025-02-21, 1d
    Scene Understanding :active, 2025-02-22, 1d
    Text Recognition :active, 2025-02-23, 1d
```
**Status**
- ✅ Project initialization
- ✅ Basic architecture setup
- ✅ CI/CD pipeline
- 🚧 Scene understanding module
- ⏳ Text recognition system
- ⏳ Voice interaction interface
- ⏳ Real-time processing
| Phase | Status | Progress | Timeline |
|---|---|---|---|
| Cloud Architecture | 🚧 In Progress | 25% | 2025 Q1 |
| Edge Computing | ⏳ Planned | 0% | 2025 Q2 |
| Wearable Integration | ⏳ Planned | 0% | 2025 Q2 |
```mermaid
timeline
    title Sprint Goals (2025 Q1)
    section Scene Understanding
        Basic object detection
        Environment mapping
        Spatial relationships
    section Infrastructure
        Cloud deployment
        API development
        Testing framework
```
- Sensor Fusion: Combining multiple sensor inputs for robust environmental perception
- Privacy Computing: Federated learning and differential privacy protection
- Multimodal AI: Cross-modal learning and understanding
- Edge Intelligence: Distributed AI processing and optimization
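As a toy illustration of the sensor-fusion direction, two independent range estimates (say, from mmWave radar and LiDAR) can be combined by inverse-variance weighting, the building block behind Kalman-style fusion. The numbers below are made up for the example.

```python
def fuse(est_a: float, var_a: float, est_b: float, var_b: float) -> tuple[float, float]:
    """Inverse-variance fusion of two independent estimates.

    The sensor with lower variance (higher confidence) gets more weight,
    and the fused variance is always smaller than either input variance.
    """
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

# Made-up readings: radar says 2.4 m (variance 0.04), LiDAR says 2.5 m (variance 0.01).
distance, variance = fuse(2.4, 0.04, 2.5, 0.01)
```

Real multi-sensor fusion adds time alignment, outlier rejection, and motion models, but the weighting principle is the same.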
We welcome contributions from developers, researchers, and domain experts! Please read our Contributing Guidelines before submitting PRs.
This project is licensed under the MIT License - see the LICENSE file for details.