
Real-Time Object Detection System

A single-stage object detection system built with TensorFlow that predicts bounding boxes and class probabilities for objects in images and videos in real time. The model architecture is inspired by YOLO (You Only Look Once) and is optimized for both speed and accuracy.

(Object detection example image)

Features

  • Single-Stage Detection: Fast, real-time object detection with a single forward pass through the network
  • Multi-Scale Detection: Detection at three different scales for improved accuracy across object sizes
  • Anchor-Based Prediction: Uses pre-defined anchor boxes to improve detection of objects with different aspect ratios
  • Real-Time Inference: Optimized for real-time detection on GPU and CPU hardware
  • Comprehensive Training Pipeline: Complete with data loading, augmentation, and validation
  • Visualization Tools: Real-time visualization of detection results with class labels and confidence scores
  • Performance Metrics: Evaluation using standard metrics like mAP (mean Average Precision)
  • Configurable System: Highly customizable through YAML configuration files
  • Camera & Video Support: Process live camera feeds or pre-recorded videos

Directory Structure

machine_learning/
├── configs/               # Configuration files
│   └── config.yaml        # Main configuration file
├── model_architecture/    # Model architecture definitions
│   └── model.py           # Single-stage detector implementation
├── utils/                 # Utility functions
│   ├── data_processing.py # Data loading and preprocessing
│   ├── visualization.py   # Visualization utilities
│   └── metrics.py         # Evaluation metrics
├── training/              # Training components
│   └── train.py           # Training script
├── checkpoints/           # Model checkpoints (created during training)
├── data/                  # Dataset directory (not included)
│   ├── train.txt          # Training annotations
│   ├── val.txt            # Validation annotations
│   └── coco_classes.txt   # Class names
├── main.py                # Main script for real-time detection
└── README.md              # Project documentation

Installation

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended for training)
  • Linux, Windows, or macOS

Setup

  1. Clone this repository:

    git clone https://github.com/kiamaikocoders/object-detection.git
    cd object-detection
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install dependencies:

    pip install -r requirements.txt

Dependencies

  • TensorFlow 2.18.0+
  • OpenCV 4.11.0+
  • NumPy
  • Matplotlib
  • PyYAML
  • scikit-learn

Configuration

The system is configured through configs/config.yaml. A short sketch of loading the file is shown below, followed by the main configuration sections.
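
For orientation, a minimal sketch of reading this file with PyYAML; the key names follow the sections documented below:

import yaml

# Load the YAML configuration into a nested dict
with open("configs/config.yaml", "r") as f:
    config = yaml.safe_load(f)

# Access nested parameters, e.g. the model input size
input_size = config["model"]["input_size"]            # [416, 416]
confidence = config["runtime"]["confidence_threshold"]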

Model Parameters

model:
  name: "SingleStageDetector"
  input_size: [416, 416]         # Input image dimensions
  channels: 3                     # RGB channels
  backbone: "darknet"             # Feature extraction backbone
  num_classes: 80                 # Number of object classes to detect
  grid_sizes: [13, 26, 52]        # Feature map sizes for multi-scale detection
  anchors: [...]                  # Anchor box dimensions for each scale

Training Parameters

training:
  batch_size: 8                   # Batch size for training
  epochs: 100                     # Total training epochs
  optimizer: "adam"               # Optimizer type
  learning_rate: 0.001            # Initial learning rate
  lr_scheduler: "cosine"          # Learning rate scheduler type
  checkpoint_dir: "checkpoints"   # Directory for model checkpoints
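
As a rough illustration, the Adam optimizer and cosine scheduler configured above map onto standard Keras APIs; this is a sketch, not the exact code in train.py, and the decay_steps value is hypothetical (the training script would typically derive it from epochs and dataset size):

import tensorflow as tf

# Cosine-decay schedule starting from the configured learning rate
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.001,
    decay_steps=100_000,  # hypothetical: usually epochs * steps_per_epoch
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)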

Data Parameters

data:
  train_annotations: "data/train.txt"  # Path to training annotations
  val_annotations: "data/val.txt"      # Path to validation annotations
  augmentation:
    enabled: true                      # Enable/disable data augmentation
    # ... augmentation parameters
  preprocess:
    normalize: true                    # Normalize pixel values
    mean: [0.485, 0.456, 0.406]        # RGB mean for normalization
    std: [0.229, 0.224, 0.225]         # RGB standard deviation
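
A minimal sketch of the preprocessing these parameters describe, assuming images are resized to the configured input size and normalized with the ImageNet-style mean/std above (the function name is illustrative, not part of the codebase):

import cv2
import numpy as np

def preprocess(image_bgr, input_size=(416, 416)):
    # OpenCV loads BGR; convert to RGB to match the normalization constants
    image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, input_size)
    image = image.astype(np.float32) / 255.0           # scale to [0, 1]
    mean = np.array([0.485, 0.456, 0.406], np.float32)
    std = np.array([0.229, 0.224, 0.225], np.float32)
    return (image - mean) / std                        # channel-wise normalize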

Runtime Parameters

runtime:
  confidence_threshold: 0.5      # Minimum confidence score for detections
  nms_threshold: 0.45            # IoU threshold for non-maximum suppression
  max_boxes: 100                 # Maximum boxes to detect per image
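
These three parameters map directly onto TensorFlow's built-in non-maximum suppression. A sketch of how they might be applied to raw per-image predictions (filter_detections is illustrative, and boxes/scores are assumed tensor shapes):

import tensorflow as tf

def filter_detections(boxes, scores, max_boxes=100,
                      nms_threshold=0.45, confidence_threshold=0.5):
    # boxes: [N, 4] in (y1, x1, y2, x2); scores: [N] per-box confidence
    keep = tf.image.non_max_suppression(
        boxes, scores,
        max_output_size=max_boxes,
        iou_threshold=nms_threshold,
        score_threshold=confidence_threshold,
    )
    return tf.gather(boxes, keep), tf.gather(scores, keep)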

Dataset Preparation

The system expects annotation files in the following format:

path/to/image1.jpg x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ...
path/to/image2.jpg x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ...

Where:

  • Each line starts with the path to an image
  • Followed by one or more bounding box annotations
  • Each bounding box is in format: x1,y1,x2,y2,class_id
  • Coordinates are in absolute pixel values
  • Class IDs start from 0
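
A small sketch of parsing one annotation line in this format (parse_annotation_line is illustrative, not a function from the codebase):

def parse_annotation_line(line):
    # "path/to/image.jpg x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ..."
    parts = line.strip().split()
    image_path = parts[0]
    boxes = []
    for box in parts[1:]:
        # Coordinates are absolute pixel values; class IDs start from 0
        x1, y1, x2, y2, class_id = map(int, box.split(","))
        boxes.append((x1, y1, x2, y2, class_id))
    return image_path, boxes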

Class Names File

Create a file data/coco_classes.txt with one class name per line:

person
bicycle
car
...

Training

Basic Training

To train the model with default settings:

python training/train.py --config configs/config.yaml

Advanced Training Options

# Resume training from a checkpoint
python training/train.py --config configs/config.yaml --resume checkpoints/model_epoch_050_loss_0.1234.h5

# Use specific GPU(s)
python training/train.py --config configs/config.yaml --gpu 0  # Use first GPU
python training/train.py --config configs/config.yaml --gpu 0,1  # Use multiple GPUs

# Enable debug mode for more information
python training/train.py --config configs/config.yaml --debug

Training Outputs

  • Trained model checkpoints in checkpoints/ directory
  • Training logs in training.log
  • Training metrics visualizations in checkpoints/figures/

Inference & Detection

Real-time Detection with Camera

python main.py --checkpoint checkpoints/your_model.h5 --input 0  # Use camera index 0
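
main.py implements the capture loop internally; for reference, a minimal sketch of the OpenCV pattern it is built on (the inference line is a placeholder, not the actual model call):

import cv2

cap = cv2.VideoCapture(0)              # camera index 0, as in the command above
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # detections = detect(frame)       # placeholder for model inference
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()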

Process Video File

python main.py --checkpoint checkpoints/your_model.h5 --input path/to/video.mp4 --output results/output.mp4 --save

Detection Options

# Adjust detection confidence
python main.py --checkpoint checkpoints/your_model.h5 --confidence 0.7

# Adjust NMS threshold
python main.py --checkpoint checkpoints/your_model.h5 --nms 0.5

# Disable display (for headless systems)
python main.py --checkpoint checkpoints/your_model.h5 --no-display --save

GPU Requirements and Setup

Hardware Recommendations

  • Training: NVIDIA GPU with at least 8GB VRAM (16GB+ recommended for larger batch sizes)
  • Inference: NVIDIA GPU with 4GB+ VRAM, or CPU for slower inference

CUDA Setup

The system requires:

  • CUDA 11.8+ (for TensorFlow 2.18.0)
  • cuDNN 8.6+

Once CUDA is installed, no further GPU setup is needed: the scripts enable GPU memory growth automatically so TensorFlow does not reserve all VRAM up front. You can specify which GPU(s) to use with the --gpu argument.
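
The memory-growth behavior corresponds to this standard TensorFlow pattern (a sketch, not necessarily the exact code in the scripts):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving all VRAM up front
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)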

Performance Optimization

  • Use smaller input sizes (e.g., [320, 320]) for faster inference
  • Use larger input sizes (e.g., [608, 608]) for more accurate detections
  • Adjust confidence threshold for precision-recall trade-off
  • Export to TensorFlow Lite or ONNX for deployment on edge devices (see the sketch below)
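
For the TensorFlow Lite route, a minimal export sketch, assuming the checkpoint loads as a standard Keras model (the ONNX path would go through a separate converter such as tf2onnx):

import tensorflow as tf

# compile=False avoids needing the custom training loss at load time
model = tf.keras.models.load_model("checkpoints/your_model.h5", compile=False)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)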

Acknowledgments

This implementation is inspired by the YOLO (You Only Look Once) family of single-stage object detectors.

License

This project is licensed under the MIT License - see the LICENSE file for details.
