A single-stage object detection system built with TensorFlow that predicts bounding boxes and class probabilities for objects in images and videos in real time. The model architecture is inspired by YOLO (You Only Look Once) and is optimized for both speed and accuracy.
- Single-Stage Detection: Fast, real-time object detection with a single forward pass through the network
- Multi-Scale Detection: Detection at three different scales for improved accuracy across object sizes
- Anchor-Based Prediction: Uses pre-defined anchor boxes to improve detection of objects with different aspect ratios (see the shape sketch after this list)
- Real-Time Inference: Optimized for real-time detection on GPU and CPU hardware
- Comprehensive Training Pipeline: Complete with data loading, augmentation, and validation
- Visualization Tools: Real-time visualization of detection results with class labels and confidence scores
- Performance Metrics: Evaluation using standard metrics like mAP (mean Average Precision)
- Configurable System: Highly customizable through YAML configuration files
- Camera & Video Support: Process live camera feeds or pre-recorded videos
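
To make the multi-scale, anchor-based design concrete, here is a shape sketch under the default configuration (416×416 input, 80 classes, and a YOLO-style 3 anchors per scale — the anchor count is an assumption, since the config elides the anchor list):

```python
# Each detection scale produces a grid of predictions. With 3 anchors per
# cell and 80 classes, every anchor predicts 4 box offsets + 1 objectness
# score + 80 class scores = 85 values.
num_anchors_per_scale = 3  # assumed; config.yaml elides the anchor list
num_classes = 80
values_per_anchor = 4 + 1 + num_classes  # 85

for grid in (13, 26, 52):  # grid_sizes from config.yaml
    shape = (grid, grid, num_anchors_per_scale, values_per_anchor)
    print(shape)  # (13, 13, 3, 85), (26, 26, 3, 85), (52, 52, 3, 85)
```

Smaller grids (13×13) are responsible for large objects, while the finer 52×52 grid picks up small ones.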
```
machine_learning/
├── configs/                 # Configuration files
│   └── config.yaml          # Main configuration file
├── model_architecture/      # Model architecture definitions
│   └── model.py             # Single-stage detector implementation
├── utils/                   # Utility functions
│   ├── data_processing.py   # Data loading and preprocessing
│   ├── visualization.py     # Visualization utilities
│   └── metrics.py           # Evaluation metrics
├── training/                # Training components
│   └── train.py             # Training script
├── checkpoints/             # Model checkpoints (created during training)
├── data/                    # Dataset directory (not included)
│   ├── train.txt            # Training annotations
│   ├── val.txt              # Validation annotations
│   └── coco_classes.txt     # Class names
├── main.py                  # Main script for real-time detection
└── README.md                # Project documentation
```
- Python 3.10+
- CUDA-compatible GPU (recommended for training)
- Linux, Windows, or macOS
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/object-detection.git
  cd object-detection
  ```
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use venv\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- TensorFlow 2.18.0+
- OpenCV 4.11.0+
- NumPy
- Matplotlib
- PyYAML
- scikit-learn
The system is configured through `configs/config.yaml`. The main configuration sections are:
```yaml
model:
  name: "SingleStageDetector"
  input_size: [416, 416]        # Input image dimensions
  channels: 3                   # RGB channels
  backbone: "darknet"           # Feature extraction backbone
  num_classes: 80               # Number of object classes to detect
  grid_sizes: [13, 26, 52]      # Feature map sizes for multi-scale detection
  anchors: [...]                # Anchor box dimensions for each scale

training:
  batch_size: 8                 # Batch size for training
  epochs: 100                   # Total training epochs
  optimizer: "adam"             # Optimizer type
  learning_rate: 0.001          # Initial learning rate
  lr_scheduler: "cosine"        # Learning rate scheduler type
  checkpoint_dir: "checkpoints" # Directory for model checkpoints

data:
  train_annotations: "data/train.txt" # Path to training annotations
  val_annotations: "data/val.txt"     # Path to validation annotations
  augmentation:
    enabled: true               # Enable/disable data augmentation
    # ... augmentation parameters
  preprocess:
    normalize: true             # Normalize pixel values
    mean: [0.485, 0.456, 0.406] # RGB mean for normalization
    std: [0.229, 0.224, 0.225]  # RGB standard deviation

runtime:
  confidence_threshold: 0.5     # Minimum confidence score for detections
  nms_threshold: 0.45           # IoU threshold for non-maximum suppression
  max_boxes: 100                # Maximum boxes to detect per image
```
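
As a minimal sketch of how these values get consumed (assuming only PyYAML and the keys shown above; `load_config` is an illustrative helper, not necessarily the project's own function):

```python
import yaml

def load_config(path="configs/config.yaml"):
    """Load the YAML configuration into a plain dict."""
    with open(path, "r") as f:
        return yaml.safe_load(f)

config = load_config()
input_h, input_w = config["model"]["input_size"]          # 416, 416
num_classes = config["model"]["num_classes"]              # 80
conf_thresh = config["runtime"]["confidence_threshold"]   # 0.5
```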
The system expects annotation files in the following format:
```
path/to/image1.jpg x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ...
path/to/image2.jpg x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ...
```
Where:
- Each line starts with the path to an image
- Followed by one or more bounding box annotations
- Each bounding box is in the format `x1,y1,x2,y2,class_id`
- Coordinates are in absolute pixel values
- Class IDs start from 0
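
A minimal parser for this format might look like the sketch below (`parse_annotation_line` is a hypothetical helper, not necessarily what `utils/data_processing.py` uses):

```python
def parse_annotation_line(line):
    """Split one annotation line into an image path and a list of boxes.

    Each box is (x1, y1, x2, y2, class_id) in absolute pixel coordinates.
    """
    parts = line.strip().split()
    image_path = parts[0]
    boxes = []
    for token in parts[1:]:
        *coords, class_id = token.split(",")
        x1, y1, x2, y2 = map(float, coords)
        boxes.append((x1, y1, x2, y2, int(class_id)))
    return image_path, boxes

# Example:
# path, boxes = parse_annotation_line("img.jpg 10,20,110,220,0 30,40,90,100,2")
```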
Create a file `data/coco_classes.txt` with one class name per line:
```
person
bicycle
car
...
```
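
Loading the class names is then a one-liner (a sketch, not necessarily how the project's data loader does it):

```python
# Class IDs in the annotations index into this list: class_names[0] == "person".
with open("data/coco_classes.txt") as f:
    class_names = [line.strip() for line in f if line.strip()]
```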
To train the model with default settings:

```bash
python training/train.py --config configs/config.yaml
```
```bash
# Resume training from a checkpoint
python training/train.py --config configs/config.yaml --resume checkpoints/model_epoch_050_loss_0.1234.h5

# Use specific GPU(s)
python training/train.py --config configs/config.yaml --gpu 0    # Use first GPU
python training/train.py --config configs/config.yaml --gpu 0,1  # Use multiple GPUs

# Enable debug mode for more information
python training/train.py --config configs/config.yaml --debug
```
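
For reference, a cosine learning-rate schedule of the kind named in the config can be built with standard Keras APIs. This is a sketch under the assumption that `train.py` does something equivalent; `steps_per_epoch` is a hypothetical value that depends on dataset size and batch size:

```python
import tensorflow as tf

# Values taken from the training section of config.yaml.
initial_lr = 0.001
epochs = 100
steps_per_epoch = 1000  # hypothetical; dataset_size / batch_size in practice

schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=initial_lr,
    decay_steps=steps_per_epoch * epochs,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```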
Training produces:

- Trained model checkpoints in the `checkpoints/` directory
- Training logs in `training.log`
- Training metrics visualizations in `checkpoints/figures/`
```bash
# Run detection on a live camera feed
python main.py --checkpoint checkpoints/your_model.h5 --input 0  # Use camera index 0

# Run detection on a video file and save the result
python main.py --checkpoint checkpoints/your_model.h5 --input path/to/video.mp4 --output results/output.mp4 --save

# Adjust detection confidence
python main.py --checkpoint checkpoints/your_model.h5 --confidence 0.7

# Adjust NMS threshold
python main.py --checkpoint checkpoints/your_model.h5 --nms 0.5

# Disable display (for headless systems)
python main.py --checkpoint checkpoints/your_model.h5 --no-display --save
```
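
The `--confidence` and `--nms` flags map onto a standard post-processing step: filter low-scoring boxes, then apply non-maximum suppression. A minimal sketch of that step, assuming per-box scores and corner-format coordinates (illustrative, not the exact code in `main.py`):

```python
import tensorflow as tf

def postprocess(boxes, scores, conf_thresh=0.5, nms_thresh=0.45, max_boxes=100):
    """Filter low-confidence boxes, then apply non-maximum suppression.

    boxes:  float tensor of shape (N, 4) as (y1, x1, y2, x2)
    scores: float tensor of shape (N,)
    """
    keep = scores >= conf_thresh  # confidence filtering
    boxes = tf.boolean_mask(boxes, keep)
    scores = tf.boolean_mask(scores, keep)
    selected = tf.image.non_max_suppression(
        boxes, scores,
        max_output_size=max_boxes,
        iou_threshold=nms_thresh,
    )
    return tf.gather(boxes, selected), tf.gather(scores, selected)
```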
- Training: NVIDIA GPU with at least 8GB VRAM (16GB+ recommended for larger batch sizes)
- Inference: NVIDIA GPU with 4GB+ VRAM, or CPU for slower inference
The system requires:
- CUDA 11.8+ (for TensorFlow 2.18.0)
- cuDNN 8.6+
Once CUDA is installed, the scripts automatically enable GPU memory growth to avoid consuming all VRAM. You can specify which GPU(s) to use with the `--gpu` argument.
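
In TensorFlow, that memory-growth behavior is typically enabled like this (a sketch of the standard API, assumed to match what the scripts do):

```python
import tensorflow as tf

# Enable memory growth on every visible GPU so TensorFlow allocates
# VRAM incrementally instead of claiming it all up front.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```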
- Use smaller input sizes (e.g., `[320, 320]`) for faster inference
- Use larger input sizes (e.g., `[608, 608]`) for more accurate detections
- Adjust the confidence threshold to tune the precision-recall trade-off
- Export to TensorFlow Lite or ONNX for deployment on edge devices
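
As an example of the TensorFlow Lite route (a sketch using the standard converter API; the checkpoint name is a placeholder, and the model is assumed to load without custom objects):

```python
import tensorflow as tf

# Load a trained Keras checkpoint and convert it to TensorFlow Lite.
model = tf.keras.models.load_model("checkpoints/your_model.h5", compile=False)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```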
This implementation is inspired by the YOLO (You Only Look Once) family of single-stage detectors.
This project is licensed under the MIT License - see the LICENSE file for details.