A Python-based object detection tool using Facebook's DETR (DEtection TRansformer) model to detect and annotate objects in images. The tool processes all images in a folder, draws bounding boxes around detected objects, and provides comprehensive analysis reports.
- Object Detection: Uses Facebook's DETR-ResNet-50 model for accurate object detection
- Visual Annotations: Draws red bounding boxes around detected objects
- Text Labels: Shows object name and confidence percentage on each detection
- Batch Processing: Processes all images in a folder automatically
- Comprehensive Reports: Provides detailed analysis including:
- Expected vs. detected objects comparison
- Missing objects list
- Unexpected objects found
- Images with no detections
- Output Management: Saves annotated images to an output folder
The tool is configured to detect the following 38 object categories:
- airplane, banana, bench, bird, boat, bowl, broccoli, bus, car, cat
- chair, clock, couch, dining table, dog, elephant, fire hydrant, fork
- giraffe, handbag, horse, person, pizza, potted plant, remote, sheep
- traffic light, truck, zebra, leopard, robot, firearm, tank, tie
- tv, phone, cup, keyboard, mouse, chicken, squirrel, bed, pillow, parsley, knife
- Python 3.7 or higher
- Windows 10/11, Linux (Ubuntu/Debian/CentOS), or macOS
- For Windows: Long path support enabled (see below)
Important: Windows has a default path length limit of 260 characters. To avoid issues with deep folder structures, enable long path support:
-
Open Registry Editor:
- Press
Win + R, typeregedit, and press Enter - Click "Yes" when prompted by User Account Control
- Press
-
Navigate to the correct key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem -
Create or modify the LongPathsEnabled value:
- Right-click in the right panel
- Select "New" → "DWORD (32-bit) Value"
- Name it
LongPathsEnabled - Double-click the new value
- Set "Value data" to
1 - Click "OK"
-
Restart your computer for the changes to take effect
# Open PowerShell as Administrator and run:
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -ForceThen restart your computer.
- Press
Win + R, typegpedit.msc, and press Enter - Navigate to:
Computer Configuration→Administrative Templates→System→Filesystem - Double-click "Enable Win32 long paths"
- Select "Enabled" and click "OK"
- Restart your computer
pip install torch torchvision
pip install transformers
pip install pillowOption A: Virtual Environment (Recommended)
# Update package list
sudo apt update
# Install system dependencies
sudo apt install python3-pip python3-dev python3-venv python3-full
# Create a virtual environment
python3 -m venv oDEtector_env
# Activate the virtual environment
source oDEtector_env/bin/activate
# Install Python packages in the virtual environment
pip install torch torchvision
pip install transformers
pip install pillow
# Note: Always activate the virtual environment before running the script
# source oDEtector_env/bin/activateOption B: System-wide Installation (Alternative)
# Update package list
sudo apt update
# Install system dependencies
sudo apt install python3-pip python3-dev
# Install packages system-wide (may require --break-system-packages)
pip3 install --user torch torchvision transformers pillow
# Or if you get "externally managed environment" error:
pip3 install --break-system-packages torch torchvision transformers pillow# Install system dependencies
sudo yum install python3-pip python3-devel
# Create a virtual environment
python3 -m venv oDEtector_env
# Activate the virtual environment
source oDEtector_env/bin/activate
# Install Python packages in the virtual environment
pip install torch torchvision
pip install transformers
pip install pillow# Using pip
pip install torch torchvision
pip install transformers
pip install pillow
# Or using conda
conda install pytorch torchvision -c pytorch
conda install -c conda-forge transformers pillowpip install torch torchvision transformers pillow- Clone or download this repository
- Place your images in the
imagesfolder - Run the script
For convenience, you can use this one-liner to set up everything on Linux:
# Download and setup in one command
git clone <repository-url> oDEtector && cd oDEtector && python3 -m venv oDEtector_env && source oDEtector_env/bin/activate && pip install torch torchvision transformers pillowOr create a setup script:
#!/bin/bash
# setup.sh - Quick setup script for Linux
echo "Setting up oDEtector..."
# Create virtual environment
python3 -m venv oDEtector_env
source oDEtector_env/bin/activate
# Install dependencies
pip install torch torchvision transformers pillow
echo "Setup complete! To run the script:"
echo "1. source oDEtector_env/bin/activate"
echo "2. python3 main.py"-
Prepare your images:
- Place all images you want to process in the
imagesfolder - Supported formats: JPG, JPEG, PNG, BMP, GIF, TIFF (case insensitive)
- Place all images you want to process in the
-
Run the detection:
# Windows python main.py # Linux/macOS (with virtual environment) source oDEtector_env/bin/activate # Activate virtual environment first python3 main.py # Linux/macOS (without virtual environment - if installed system-wide) python3 main.py
-
View results:
- Annotated images will be saved in the
outputfolder - Console will show detailed processing information
- Summary report will be displayed at the end
- Annotated images will be saved in the
project/
├── images/ # Input images folder
│ ├── image1.jpg
│ ├── image2.png
│ └── ...
├── output/ # Generated annotated images
│ ├── annotated_image1.jpg
│ ├── annotated_image2.png
│ └── ...
└── main.py # Main detection script
Loading DETR model...
Model loaded successfully!
Found 25 images to process
================================================================================
Processing image 1/25: IMG_0773.JPG
================================================================================
Detected objects:
- person with confidence 0.95 at location [120.5, 80.2, 200.3, 300.1]
- car with confidence 0.92 at location [50.1, 150.8, 180.5, 250.3]
Saved annotated image: output/annotated_IMG_0773.JPG
================================================================================
PROCESSING SUMMARY
================================================================================
Total images processed: 25
Images with objects detected: 20
Images with no objects detected: 3
Images with errors: 2
Expected objects list:
airplane, banana, bench, bird, boat, bowl, broccoli, bus, car, cat, chair, clock, couch, dining table, dog, elephant, fire hydrant, fork, giraffe, handbag, horse, person, pizza, potted plant, remote, sheep, traffic light, truck, zebra, leopard, robot, firearm, tank, tie, tv, phone, cup, keyboard, mouse, chicken, squirrel, bed, pillow, parsley, knife
Objects actually found: car, cat, dog, person, pizza
Missing objects: airplane, banana, bench, bird, boat, bowl, broccoli, bus, chair, clock, couch, dining table, elephant, fire hydrant, fork, giraffe, handbag, horse, potted plant, remote, sheep, traffic light, truck, zebra, leopard, robot, firearm, tank, tie, tv, phone, cup, keyboard, mouse, chicken, squirrel, bed, pillow, parsley, knife
Unexpected objects found: bicycle, laptop
Images with no objects detected: IMG_0794.JPG, IMG_3133.jpg, IMG_3151.jpg
================================================================================
Processing complete!
Annotated images saved to: output/
================================================================================
The confidence threshold is set to 0.9 (90%). To modify this, edit line 71 in main.py:
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]Change 0.9 to your desired threshold (0.0 to 1.0).
To modify the list of expected objects, edit the expected_objects list in main.py (lines 35-40).
-
"Path too long" error (Windows only):
- Ensure long path support is enabled (see installation step 1)
- Restart your computer after enabling long path support
-
CUDA out of memory:
- The script uses CPU by default
- If you have CUDA and want to use GPU, modify the code to move tensors to GPU
-
Font not found warning:
- The script will fall back to default font if system fonts are not available
- This doesn't affect functionality, only text appearance
- On Linux, install fonts:
sudo apt install fonts-dejavu-core(Ubuntu/Debian)
-
No objects detected:
- Check if your images contain the expected object types
- Try lowering the confidence threshold
- Ensure images are in RGB format (the script converts automatically)
-
Permission denied errors (Linux):
- Ensure you have write permissions in the project directory
- Use
chmod +x main.pyif needed - Check that the output folder can be created
-
Python not found (Linux):
- Use
python3instead ofpython - Install Python 3:
sudo apt install python3(Ubuntu/Debian) - Check Python version:
python3 --version
- Use
-
"This environment is externally managed" error (Linux):
- This is a new protection in modern Linux distributions
- Solution A: Use a virtual environment (recommended)
# Create and activate virtual environment python3 -m venv oDEtector_env source oDEtector_env/bin/activate pip install torch torchvision transformers pillow
- Solution B: Use --user flag for user-level installation
pip3 install --user torch torchvision transformers pillow
- Solution C: Override the protection (use with caution)
pip3 install --break-system-packages torch torchvision transformers pillow
- Solution D: Use pipx for isolated installations
sudo apt install pipx pipx install --include-deps torch transformers pillow
- Batch Processing: The script processes images sequentially for memory efficiency
- Memory Usage: Each image is processed individually to minimize memory usage
- Large Images: Very large images may take longer to process
- Model: facebook/detr-resnet-50
- Architecture: DETR (DEtection TRansformer)
- Backbone: ResNet-50
- Dataset: COCO 2017
- Classes: 91 object categories
- Python 3.7+
- PyTorch
- Transformers library
- Pillow (PIL)
- Windows: Windows 10/11 with long path support
- Linux: Ubuntu 18.04+, Debian 10+, CentOS 7+, or similar
- macOS: macOS 10.14+ (Mojave or later)
This project uses the DETR model from Facebook Research, which is available under the MIT License.
Feel free to submit issues, feature requests, or pull requests to improve this tool.
If you encounter any issues:
- Check the troubleshooting section above
- Ensure all dependencies are properly installed
- Verify that long path support is enabled on Windows
- Check that your images are in supported formats