mihir2452005/object_detection

AutoFetch-Detect: An Automated LLM-Enhanced Real-Time Object Detection System

Python 3.10+ | Flutter | License: MIT

AutoFetch-Detect is an automated system that combines YOLOv8 object detection with contextual understanding from a Retrieval-Augmented Generation (RAG) pipeline backed by a locally run LLM (Llama-2-7B). It detects objects in real time and attaches a contextual description to each detection, and its Flutter frontend targets iOS, Android, and web.

Features

  • Real-time Object Detection: Uses YOLOv8 for fast and accurate object detection
  • Contextual Descriptions: A quantized Llama-2-7B model, combined with a RAG pipeline, generates contextual descriptions for detected objects
  • Automated Pipeline: Complete end-to-end pipeline from dataset fetching to deployment
  • Cross-Platform: Flutter frontend for iOS, Android, and web deployment
  • Local Processing: No external APIs or cloud services required after initial setup
  • Modular Design: Easy to swap datasets, models, and components
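The contextual-description step pairs each detected class label with retrieved reference text before prompting the LLM. The sketch below illustrates that retrieve-then-prompt flow under stated assumptions: it swaps the embeddings and FAISS index used by models/rag_setup.py for a plain bag-of-words cosine similarity so that it runs on the standard library alone, and the knowledge snippets and the `retrieve` and `build_prompt` helpers are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; the real project embeds its own documents.
DOCS = [
    "A screwdriver is a hand tool for turning screws.",
    "A drill bores holes and can drive screws with the right bit.",
    "A hammer drives nails and is used in carpentry.",
]


def _vectorize(text: str) -> Counter:
    """Lower-cased bag-of-words term counts (a stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (stable order on ties)."""
    q = _vectorize(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(label: str, confidence: float, docs: list[str]) -> str:
    """Assemble a prompt that grounds the LLM in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(label, docs))
    return (
        f"Context:\n{context}\n\n"
        f"A {label} was detected with confidence {confidence:.2f}. "
        "Describe it in one sentence using only the context above."
    )


print(build_prompt("drill", 0.91, DOCS))
```

In the real pipeline the prompt would be passed to the local Llama-2 model; a vector index only changes how `retrieve` ranks documents, not the overall flow.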

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Flutter App   │───▶│   FastAPI API    │───▶│  YOLO + RAG      │
│                 │    │                  │    │                  │
│  • Camera feed  │    │  • /detect       │    │  • Object        │
│  • Image upload │    │  • /batch_detect │    │    detection     │
│  • Overlay      │    │  • /classes      │    │  • RAG           │
│  • Descriptions │    │  • /model_info   │    │    descriptions  │
└─────────────────┘    └──────────────────┘    └──────────────────┘
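One way the FastAPI layer could package YOLO detections and RAG descriptions into a single JSON response for the Flutter overlay is sketched below. The field names (`label`, `confidence`, `bbox_xyxy`, `description`), the `to_response` helper, and the 0.25 confidence cut-off are assumptions for illustration, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass
class Detection:
    # Hypothetical schema; the real backend may name these fields differently.
    label: str                                    # class name predicted by YOLO
    confidence: float                             # detection score in [0, 1]
    bbox_xyxy: tuple[float, float, float, float]  # pixel corners (x1, y1, x2, y2)
    description: str = ""                         # filled in by the RAG step


def to_response(detections: list[Detection], descriptions: dict[str, str],
                min_confidence: float = 0.25) -> str:
    """Drop low-confidence boxes, attach per-class descriptions, serialize to JSON."""
    kept = [
        asdict(replace(det, description=descriptions.get(det.label, "")))
        for det in detections
        if det.confidence >= min_confidence
    ]
    return json.dumps({"count": len(kept), "detections": kept})


dets = [Detection("person", 0.88, (10, 20, 110, 220)), Detection("cup", 0.12, (0, 0, 5, 5))]
print(to_response(dets, {"person": "A person standing near the camera."}))
```

Keeping descriptions keyed by class label means the LLM only has to run once per distinct class in a frame, not once per box.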

Prerequisites

  • Python 3.10+
  • Flutter 3.16+
  • Git
  • NVIDIA GPU (recommended) or CPU
  • At least 15 GB of free disk space
  • Access to Hugging Face for the Llama-2 model (you must accept the license terms manually)
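A quick preflight check for the local prerequisites above can save a failed download or install later. This is a minimal sketch, assuming "15 GB" means 15 GiB measured on the current directory; the `preflight` function is hypothetical and does not check GPU or Hugging Face access.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the prerequisites list
MIN_FREE_GB = 15      # interpreted here as GiB


def preflight(path: str = ".") -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {sys.version.split()[0]}"
        )
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free at {path!r}, need {MIN_FREE_GB} GB")
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    if shutil.which("flutter") is None:
        problems.append("flutter not found on PATH (only needed for the app)")
    return problems


for issue in preflight():
    print("WARN:", issue)
```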

Setup

1. Clone the Repository

git clone https://github.com/mihir2452005/object_detection.git
cd object_detection

2. Install Python Dependencies

# Create a virtual environment (recommended)
conda create -n autofetch python=3.10
conda activate autofetch

# Install Python dependencies
pip install -r requirements.txt

3. Set Up Hugging Face Access (for Llama-2)

Accept the Llama-2 license terms at https://huggingface.co/meta-llama/Llama-2-7b-chat-hf (this must be done manually), then export your access token:

# Set your Hugging Face token
export HF_TOKEN=your_huggingface_token_here
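Scripts that download the model can fail early with a clear message when the token is missing, instead of failing midway through a download. A minimal sketch, assuming the token is read from the HF_TOKEN variable exported above; `require_hf_token` is a hypothetical helper.

```python
import os


def require_hf_token(var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token, or raise with setup instructions if it is missing."""
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"{var} is not set. Accept the Llama-2 terms on Hugging Face, "
            f"then run: export {var}=<your token>"
        )
    return token
```

The huggingface_hub library also picks up HF_TOKEN from the environment on its own; the explicit check only makes the failure easier to diagnose.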

4. Install Flutter Dependencies

# Navigate to Flutter app
cd flutter_app

# Get Flutter dependencies
flutter pub get

Usage

main.py orchestrates the system; select a stage with the --phase flag:

Phase 1: Set Up Environment

python main.py --phase setup

Phase 2: Fetch Dataset

python main.py --phase fetch

Phase 3: Train Models

python main.py --phase train

Phase 4: Run Complete Pipeline

# Run all phases in sequence
python main.py --phase all

Phase 5: Start Server

python main.py --phase serve

Phase 6: Run Flutter App

cd flutter_app
flutter run
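The phase-based command line above can be sketched with argparse. This is a hypothetical outline, not the project's main.py: the phase names come from the project tree, and it assumes that `all` runs setup through test in order while `serve` is launched separately, as the ordering of the steps above suggests.

```python
import argparse

# Assumed ordering; "all" expands to these, "serve" is started on its own.
PIPELINE = ["setup", "fetch", "train", "test"]
PHASES = PIPELINE + ["serve", "all"]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="AutoFetch-Detect orchestrator (sketch)")
    parser.add_argument("--phase", choices=PHASES, required=True)
    parser.add_argument("--classes", default=None,
                        help='comma-separated class filter, e.g. "hammer,screwdriver,drill"')
    return parser.parse_args(argv)


def phases_to_run(phase: str) -> list[str]:
    """Expand 'all' into the ordered pipeline; any other phase runs alone."""
    return list(PIPELINE) if phase == "all" else [phase]


def main(argv=None) -> list[str]:
    args = parse_args(argv)
    ran = []
    for name in phases_to_run(args.phase):
        # A real orchestrator would dispatch to data/, models/, or backend/ here.
        print(f"[{name}] running")
        ran.append(name)
    return ran
```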

Custom Class Filtering

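Pass a comma-separated list of class names to --classes to restrict the fetched dataset to those classes. The names must match labels in the source dataset; the standard 80-class COCO label set, for instance, has no hammer, screwdriver, or drill category.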

python main.py --phase fetch --classes "hammer,screwdriver,drill"
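Filtering a COCO-format annotation file by class name takes three passes: keep the matching categories, keep annotations that point at those categories, then keep only images that still have an annotation. A minimal sketch, assuming the annotations are already loaded as a COCO-style dict; `parse_classes` and `filter_coco` are hypothetical helpers, and raising on unknown names is a design choice rather than documented behavior.

```python
def parse_classes(arg: str) -> set[str]:
    """Split a --classes value like "hammer, screwdriver,drill" into clean names."""
    return {name.strip() for name in arg.split(",") if name.strip()}


def filter_coco(coco: dict, keep: set[str]) -> dict:
    """Return a COCO-style dict restricted to the requested category names."""
    cats = [c for c in coco["categories"] if c["name"] in keep]
    missing = keep - {c["name"] for c in cats}
    if missing:
        # Fail loudly instead of silently producing an empty dataset.
        raise ValueError(f"classes not in dataset: {sorted(missing)}")
    cat_ids = {c["id"] for c in cats}
    anns = [a for a in coco["annotations"] if a["category_id"] in cat_ids]
    image_ids = {a["image_id"] for a in anns}
    images = [img for img in coco["images"] if img["id"] in image_ids]
    return {"images": images, "annotations": anns, "categories": cats}
```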

Project Structure

obj_detect_project/
├── README.md                  # Setup, run instructions, thesis outline
├── requirements.txt           # Python deps
├── Dockerfile                 # For backend
├── main.py                    # Orchestrator: python main.py --phase [setup|fetch|train|test|serve|all]
├── data/
│   ├── fetch_dataset.py      # Auto-download/extract/convert COCO
│   ├── coco.yaml             # YOLO config (auto-generated)
│   └── raw/                  # Downloaded zips/JSONs (gitignored)
├── models/
│   ├── train.py              # YOLO training + Ray Tune hyperparams
│   ├── rag_setup.py          # LLM download, embed docs, FAISS index
│   └── inference.py          # Detect + RAG query → JSON
├── backend/
│   └── app.py                # FastAPI server (/detect endpoint)
├── tests/
│   ├── test_train.py         # Pytest for metrics
│   └── test_end2end.py       # Full pipeline validation
├── notebooks/
│   └── eval_ablation.ipynb   # Jupyter for results viz (mAP tables, charts)
├── flutter_app/
│   ├── pubspec.yaml          # Flutter deps
│   ├── lib/
│   │   ├── main.dart         # Entry + camera screen
│   │   └── detection_overlay.dart  # Bbox drawing + LLM text
│   └── android/ ios/ web/    # Standard Flutter dirs
├── docs/
│   └── thesis_outline.md     # 50-page thesis template (LaTeX ready)
└── .gitignore                # Ignore data/models/large files
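The "convert COCO" step in data/fetch_dataset.py has to turn COCO boxes, stored as absolute [x_min, y_min, width, height], into YOLO label lines of the form "class x_center y_center width height" normalized by image size, and map COCO's sparse category ids to contiguous 0-based indices. A minimal sketch of that conversion; the helper names are hypothetical and the script's actual implementation may differ.

```python
def coco_bbox_to_yolo(bbox, img_w: float, img_h: float) -> tuple[float, float, float, float]:
    """[x_min, y_min, w, h] in pixels -> (x_center, y_center, w, h) normalized to [0, 1]."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_label_lines(coco: dict, image_id: int) -> list[str]:
    """Build the YOLO .txt label lines for one image of a COCO-style dict."""
    # COCO category ids are sparse (e.g. 1..90); YOLO expects 0..N-1.
    ordered = sorted(coco["categories"], key=lambda c: c["id"])
    index = {c["id"]: i for i, c in enumerate(ordered)}
    img = next(im for im in coco["images"] if im["id"] == image_id)
    lines = []
    for ann in coco["annotations"]:
        if ann["image_id"] != image_id:
            continue
        xc, yc, w, h = coco_bbox_to_yolo(ann["bbox"], img["width"], img["height"])
        lines.append(f"{index[ann['category_id']]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines
```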

API Endpoints

The backend provides the following endpoints:

  • GET / - Health check
  • POST /detect - Detect objects in an uploaded image
  • POST /batch_detect - Detect objects in multiple images
  • GET /classes - Get the list of detectable classes
  • GET /model_info - Get information about the loaded model
  • GET /stats - Get API usage statistics
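A client can call POST /detect with a multipart file upload using only the standard library. This is a sketch under assumptions: the upload field name `file`, the server address localhost:8000 (the port the Docker instructions expose), and the `build_multipart` and `detect` helpers are all hypothetical; check backend/app.py for the real parameter name and response shape.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "image/jpeg") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def detect(image_path: str, base_url: str = "http://localhost:8000",
           field: str = "file") -> dict:
    """POST one image to /detect and decode the JSON reply (field name is assumed)."""
    with open(image_path, "rb") as fh:
        body, ctype = build_multipart(field, os.path.basename(image_path), fh.read())
    req = urllib.request.Request(f"{base_url}/detect", data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```

Usage, once the server is running: `print(detect("sample.jpg"))`.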

Performance

  • Detection Speed: 41+ FPS on an RTX 3080, 23+ FPS on an RTX 3060
  • Model Accuracy: mAP@0.5 of 0.7+ on COCO
  • RAG Response: under 0.8 s average response time
  • Memory Usage: 2-8 GB depending on configuration
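For reference, 41 FPS corresponds to about 1000 / 41 ≈ 24.4 ms per frame and 23 FPS to about 43.5 ms. A minimal sketch of how such figures can be measured, assuming `infer` is any callable that runs detection on one frame; `measure_fps` is a hypothetical helper, and the result depends on whether `infer` covers only the model or the full request path.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], frames: Sequence[object],
                warmup: int = 5) -> float:
    """Average frames per second of `infer` over `frames`, excluding warm-up calls."""
    for frame in frames[:warmup]:
        infer(frame)  # first calls pay one-off costs (CUDA init, caches)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

If `infer` launches asynchronous GPU work, call torch.cuda.synchronize() inside it before returning, otherwise the timer can stop before the GPU has finished.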

Docker Deployment

To run the backend in Docker:

# Build the image
docker build -t autofetch-detect .

# Run the container
docker run -p 8000:8000 autofetch-detect
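Because the container has to load the detector and the LLM before it can answer, a client or CI script may want to wait for the GET / health check before sending images. A minimal polling sketch, assuming the port mapping above and that the health check returns HTTP 200 once the server is ready; `wait_for_health` is a hypothetical helper.

```python
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8000/", timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Poll the health check until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused, and socket timeouts are all OSError:
            # the server is most likely still starting up.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```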

Troubleshooting

GPU Issues

If you encounter GPU-related issues:

  1. Ensure CUDA drivers are properly installed
  2. Verify PyTorch is installed with CUDA support
  3. Check that your GPU has enough memory

Model Download Issues

If model downloads fail:

  1. Verify your internet connection
  2. Confirm that you have accepted the Llama-2 terms on Hugging Face and that HF_TOKEN is set
  3. Ensure you have sufficient disk space

Flutter Build Issues

For Flutter-related issues:

  1. Ensure Flutter is properly installed and in PATH
  2. Run flutter doctor to check for issues
  3. Verify iOS/Android SDKs are properly configured

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • YOLOv8 for the object detection backbone
  • Llama-2 for the language model foundation
  • Hugging Face for model hosting
  • Flutter team for the cross-platform framework
