An open-source project using AI to assist Blind and Low Vision (BLV) individuals. Features scene understanding, text recognition, object detection, and voice interaction. Built with Flutter, FastAPI, and advanced AI tools. Enhance accessibility and contribute to inclusivity!

πŸ” VISTA

Visual Intelligence Support & Technical Assistant for BLV


English | δΈ­ζ–‡

Important Note: This repository provides an overview of the VISTA project. For implementation details, please visit our dedicated repositories (see πŸ“¦ Related Repositories below).

🌟 Project Vision

VISTA aims to revolutionize how Blind and Low Vision (BLV) individuals interact with their environment through cutting-edge AI technologies. Beyond traditional assistive tools, VISTA strives to become a comprehensive multimodal AI companion that enhances perception, cognition, and interaction capabilities.

🎯 Core Challenges We Address

| Challenge | Solution |
| --- | --- |
| πŸšΆβ€β™‚οΈ Navigation & Mobility | Advanced sensor fusion (mmWave radar + LiDAR) for all-weather perception |
| πŸ‘₯ Social Interaction | Real-time social cue interpretation and non-visual feedback |
| πŸ“± Digital Accessibility | Seamless multimodal interaction across devices and platforms |
| πŸ₯ Healthcare Access | Intelligent medical assistance and health monitoring |

πŸ—οΈ System Architecture

```mermaid
graph TD
    A[Perception Layer] --> B[Inference Layer]
    B --> C[Interaction Layer]
    C --> D[Execution Layer]

    A --> |Sensor Data| E[Event Bus]
    B --> |Decisions| E
    C --> |User Input| E
    D --> |Status| E
```

Key Components

  1. Perception System

    • Multi-sensor fusion
    • Environmental mapping
    • Real-time object tracking
    • Spatial audio processing
  2. Inference Engine

    • Scene understanding (GPT-4V)
    • Risk assessment
    • Path planning
    • Context awareness
  3. Interaction Interface

    • Natural language processing
    • Haptic feedback system
    • 3D audio navigation
    • Gesture recognition
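
The layered flow in the architecture diagram centers on a shared event bus. As a rough illustration of that contract (a pure-Python sketch; the class, topic names, and payloads are ours, not from the VISTA codebase):

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal pub/sub bus connecting the perception, inference,
    interaction, and execution layers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        # Deliver the payload to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(payload)

# Example wiring: the perception layer publishes sensor data,
# and an inference-layer handler turns it into a decision.
bus = EventBus()
decisions = []
bus.subscribe("sensor.obstacle", lambda d: decisions.append(
    {"action": "warn", "distance_m": d["distance_m"]}))
bus.publish("sensor.obstacle", {"distance_m": 1.2})
```

In the real system the bus would be asynchronous and cross-process (e.g. the MQTT state sync mentioned in the roadmap), but the publish/subscribe contract is the same.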

πŸ› οΈ Technology Stack

| Layer | Technologies | Features |
| --- | --- | --- |
| Frontend | Flutter | Cross-platform support<br>Accessible UI/UX<br>Real-time processing |
| Backend | FastAPI | High-performance API<br>Async processing<br>Scalable architecture |
| AI Services | GPT-4V | Scene understanding<br>Multimodal fusion<br>Contextual awareness |

πŸ“¦ Related Repositories

Core Components

πŸ—ΊοΈ Development Roadmap

🌀️ Phase 1: Cloud Architecture (Current)

```mermaid
graph TD
    A[Mobile Client] <-->|WebSocket/HTTPS| B[Cloud Server]
    B -->|AI Services| A
```

Core Components

  • πŸ“± Mobile App

    • Lightweight UI
    • Real-time camera
    • Audio I/O
    • State management
    • Network layer
  • ☁️ Cloud Server

    • Vision analysis
    • Speech processing
    • Multimodal fusion
    • Real-time processing

Communication

  • WebSocket streaming
  • RESTful APIs
  • MQTT state sync
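
As one hypothetical illustration of the streaming channel, the client could wrap each camera frame in a small JSON envelope before sending it over the WebSocket. The field names below are illustrative assumptions, not the project's actual protocol:

```python
import base64
import json

def make_frame_message(frame_bytes: bytes, seq: int) -> str:
    """Package a camera frame for WebSocket streaming.

    The binary frame is base64-encoded so it survives a JSON text channel;
    a sequence number lets the server detect dropped frames.
    """
    return json.dumps({
        "type": "camera_frame",
        "seq": seq,
        "data": base64.b64encode(frame_bytes).decode("ascii"),
    })

def parse_message(raw: str) -> dict:
    """Decode an envelope back into its fields on the server side."""
    msg = json.loads(raw)
    if msg["type"] == "camera_frame":
        msg["data"] = base64.b64decode(msg["data"])
    return msg

# Round-trip a (fake) frame through the envelope.
raw = make_frame_message(b"\x89PNG...", seq=7)
decoded = parse_message(raw)
```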

πŸŒ₯️ Phase 2: Edge Computing

```mermaid
graph TD
    A[Mobile Client] <-->|Local Processing| B[Edge Module]
    B <-->|Config & Updates| C[Cloud Server]
```

Key Updates

  • πŸš€ Local AI inference
  • ⚑ Ultra-low latency (~10ms)
  • πŸ”’ Enhanced privacy
  • πŸ“Š Bandwidth optimization
  • πŸ’ͺ Improved reliability

Architecture Shift

  • Edge AI deployment
  • Cloud management
  • Optimized protocols
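
The shift implies a per-request routing decision: serve inference from the edge module when it is healthy, and fall back to the cloud otherwise. A minimal sketch of such a policy (the names and latency threshold are illustrative assumptions, not VISTA's actual logic):

```python
from dataclasses import dataclass

@dataclass
class EdgeStatus:
    """Health snapshot of the edge module."""
    reachable: bool
    latency_ms: float
    model_loaded: bool

def choose_backend(edge: EdgeStatus, max_latency_ms: float = 50.0) -> str:
    """Prefer edge inference for latency and privacy; fall back to the
    cloud whenever the edge module cannot serve the request."""
    if edge.reachable and edge.model_loaded and edge.latency_ms <= max_latency_ms:
        return "edge"
    return "cloud"

# A healthy edge module keeps processing local.
backend = choose_backend(EdgeStatus(reachable=True, latency_ms=10.0, model_loaded=True))
```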

β›… Phase 3: Wearable Integration

```mermaid
graph TD
    A[Smart Glasses] <-->|Data Sync| B[Mobile Client]
    B <-->|Processing| C[Edge Module]
    C <-->|Management| D[Cloud Server]
```

Innovations

  • πŸ•ΆοΈ Smart glasses integration
  • πŸ“‘ Mesh networking
  • 🀝 Device synchronization
  • πŸ”„ Seamless updates
  • 🎯 Context awareness

Benefits

  • Hands-free operation
  • Real-time assistance
  • Enhanced mobility

πŸ“Š Progress (25%)

```mermaid
gantt
    title Phase 1 Progress
    dateFormat  YYYY-MM-DD
    section Framework
    Basic Architecture    :done, 2025-02-20, 3d
    section Features
    Voice Interface      :active, 2025-02-21, 1d
    Scene Understanding   :active, 2025-02-22, 1d
    Text Recognition     :active, 2025-02-23, 1d
```

Status

  • βœ… Project initialization
  • βœ… Basic architecture setup
  • βœ… CI/CD pipeline
  • 🚧 Scene understanding module
  • ⏳ Text recognition system
  • ⏳ Voice interaction interface
  • ⏳ Real-time processing

πŸ“ˆ Overall Progress

| Phase | Status | Progress | Timeline |
| --- | --- | --- | --- |
| Cloud Architecture | 🚧 In Progress | 25% | 2025 Q1 |
| Edge Computing | ⏳ Planned | 0% | 2025 Q2 |
| Wearable Integration | ⏳ Planned | 0% | 2025 Q2 |

🎯 Current Sprint Focus

```mermaid
timeline
    title Sprint Goals (2025 Q1)
    section Scene Understanding
        Basic object detection
        Environment mapping
        Spatial relationships
    section Infrastructure
        Cloud deployment
        API development
        Testing framework
```
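
The spatial-relationships goal ultimately has to be spoken, not drawn. A common convention in BLV tools is clock-face directions; the sketch below maps a detection's horizontal frame position to such a phrase (the bucketing is illustrative, not VISTA's actual mapping):

```python
def clock_direction(x_center: float, frame_width: float) -> str:
    """Map a detection's horizontal position to a coarse clock direction.

    0.0 is the left edge of the frame, frame_width the right edge;
    the camera's forward field of view spans roughly 10 to 2 o'clock.
    """
    fraction = x_center / frame_width  # 0.0 (left) .. 1.0 (right)
    hours = [10, 11, 12, 1, 2]         # left-to-right buckets
    index = min(int(fraction * len(hours)), len(hours) - 1)
    return f"{hours[index]} o'clock"

def describe(label: str, x_center: float, frame_width: float) -> str:
    """Turn one detection into a short spoken phrase."""
    return f"{label} at {clock_direction(x_center, frame_width)}"

# An object in the middle of a 640-px-wide frame is straight ahead.
phrase = describe("door", x_center=320.0, frame_width=640.0)
```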

πŸ”¬ Research Areas

  • Sensor Fusion: Combining multiple sensor inputs for robust environmental perception
  • Privacy Computing: Federated learning and differential privacy protection
  • Multimodal AI: Cross-modal learning and understanding
  • Edge Intelligence: Distributed AI processing and optimization
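
Of these, privacy computing is the easiest to make concrete: in federated learning, each device trains locally and shares only parameter updates, so raw camera and audio data never leave the phone. A toy federated-averaging step (purely illustrative, pure Python, weighted by client sample counts):

```python
def federated_average(client_weights: list[list[float]],
                      client_sizes: list[int]) -> list[float]:
    """Average client model parameters, weighted by local dataset size.

    Only these parameter vectors leave each device; the raw sensor
    data used to produce them stays local.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with different amounts of local data: the client with
# more samples pulls the global model further toward its parameters.
global_weights = federated_average([[1.0, 0.0], [3.0, 2.0]], [1, 3])
```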

🀝 Contributing

We welcome contributions from developers, researchers, and domain experts! Please read our Contributing Guidelines before submitting PRs.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ“š Documentation

🌐 Community
