RotDet is a lightweight PyTorch training and evaluation framework for detecting page orientation (“rotated vs. normal”) in scanned documents.
It re-implements and extends the fcrescio/rotdet model from Hugging Face Hub, supporting both map-style and streaming datasets (e.g. HuggingFaceM4/Docmatix).
- ✅ Training and evaluation of rotation detector CNNs
- ✅ Works with Hugging Face datasets (map-style or streaming)
- ✅ Handles multi-page documents (`images` key in Docmatix)
- ✅ Auto-splitting into train/validation sets
- ✅ Checkpointing (`last.safetensors`, `best.safetensors`)
- ✅ Resume or fine-tune from pretrained weights
- ✅ Automatic config selection (`images`, `pdf`, `zero-shot-exp`, etc.)
- ✅ Evaluation with fail logging (JSONL + failed image export)
- ✅ Modular design: reusable data, model, and script modules
```
rotdet/
├── scripts/
├── rotdet_data.py       # Dataset wrappers + loaders
├── rotdet_model.py      # SimpleCNN + weight loading utilities
├── rotdet_hf.py         # Hugging Face config + dataset loader helpers
├── train_rotdet.py      # Training entry point
├── evaluate_rotdet.py   # Evaluation entry point
└── pyproject.toml       # Build & CLI definitions
```
When installed, the following console commands are available:
- `rotdet-train` → train or fine-tune a model
- `rotdet-eval` → evaluate a model checkpoint
Requires Python ≥3.9 and PyTorch ≥2.0.
Using uv:

```
uv pip install -e .
```

or with pip:

```
pip install -e .
```

Evaluate the pretrained model on FUNSD:

```
uv run rotdet-eval \
  --repo_id fcrescio/rotdet \
  --filename model.safetensors \
  --dataset nielsr/funsd \
  --split test
```

Train a model on FUNSD:

```
uv run rotdet-train \
  --dataset nielsr/funsd \
  --split train \
  --val-fraction 0.2 \
  --epochs 3 --batch-size 64 \
  --output-dir checkpoints/funsd
```

Train on Docmatix with streaming:

```
uv run rotdet-train \
  --dataset HuggingFaceM4/Docmatix \
  --config images \
  --streaming \
  --pages-per-doc 3 \
  --val-pages 4000 \
  --max-train-pages 20000 \
  --epochs 2 --batch-size 64 \
  --output-dir checkpoints/docmatix_stream
```

Resume or fine-tune from a checkpoint:

```
uv run rotdet-train \
  --resume checkpoints/docmatix_stream/best.safetensors \
  --dataset HuggingFaceM4/Docmatix --config images --streaming \
  --epochs 2
```

Evaluate a local checkpoint:

```
uv run rotdet-eval \
  --repo_id checkpoints/docmatix_stream/best.safetensors \
  --dataset HuggingFaceM4/Docmatix \
  --config images --streaming \
  --max_samples 2000
```

Log evaluation failures to JSONL:

```
uv run rotdet-eval \
  --repo_id checkpoints/funsd/best.safetensors \
  --dataset nielsr/funsd --split test \
  --fail-log failures.jsonl
```

Each line of the fail log looks like:

```
{"true":1,"pred":0,"meta":{"row_idx":12,"page_idx":3,"id":"doc123","image_path":"..."}}
```

To also export the failed page images:

```
uv run rotdet-eval \
  --repo_id checkpoints/funsd/best.safetensors \
  --dataset HuggingFaceM4/Docmatix --config images \
  --fail-log failures.jsonl \
  --save-fail-images debug_fails/
```

This writes a `.jsonl` log plus the actual failed page images into `debug_fails/`.
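The fail log is plain line-delimited JSON, so it can be summarized with the standard library alone. A minimal sketch (the `summarize_failures` helper is illustrative, not part of RotDet):

```python
import json
from collections import Counter

def summarize_failures(path):
    """Tally (true, pred) label pairs in a fail log with one JSON object per line."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            counts[(rec["true"], rec["pred"])] += 1
    return counts
```

For example, `summarize_failures("failures.jsonl")` returns a `Counter` keyed by misclassification direction, which quickly shows whether the model fails more often on rotated or on normal pages.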
RotDet works with any Hugging Face dataset that provides images via:
- `image` → single page per row (e.g. FUNSD)
- `images` → list of pages per document (e.g. Docmatix)
- `image_path` or `url` → path/URL to image
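The three layouts above can be normalized with a small helper. This is a sketch, not part of the RotDet API; only the field names come from the list above:

```python
def iter_pages(row):
    """Yield every page image (or image reference) from one dataset row.

    Covers the three layouts: an `images` list per document, a single
    `image` per row, or an `image_path`/`url` string.
    """
    if row.get("images"):                 # multi-page documents (e.g. Docmatix)
        yield from row["images"]
    elif row.get("image") is not None:    # one page per row (e.g. FUNSD)
        yield row["image"]
    elif row.get("image_path"):           # path to an image on disk
        yield row["image_path"]
    elif row.get("url"):                  # remote image URL
        yield row["url"]
```

Iterating pages this way lets the same training loop consume single-page and multi-page datasets uniformly.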
For Docmatix, the config is handled automatically:
```
# Automatically picks the best available config (usually 'images')
uv run rotdet-train --dataset HuggingFaceM4/Docmatix
```

or explicitly:

```
uv run rotdet-train --dataset HuggingFaceM4/Docmatix --config zero-shot-exp
```

Need a lightweight document dataset without relying on an existing Hugging Face repo? Use the new helper script to mine the `readerservice` collection from Internet Archive and export it as a Hugging Face `DatasetDict`:
```
python scripts/readerservice_miner.py \
  --output-dir data/readerservice \
  --val-fraction 0.1 \
  --max-images 2000
```

The script downloads the images, writes them under `data/readerservice/images/`, materializes an HF-compliant dataset via `datasets.save_to_disk`, and produces a `manifest.json` with provenance info, so you can train with `load_from_disk("data/readerservice/hf_dataset")` directly.
Each training run writes its own timestamped folder with:
- `last.safetensors` — weights after the final epoch
- `best.safetensors` — best validation accuracy
- `training_summary.json` — per-epoch metrics
- `hparams.json` — full training configuration
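Because `training_summary.json` records per-epoch metrics, the best epoch can be recovered with the standard library alone. A sketch, assuming the schema shown in the example below (`best_epoch` is an illustrative helper, not part of RotDet):

```python
import json

def best_epoch(summary_path):
    """Return the epoch entry with the highest validation accuracy."""
    with open(summary_path) as f:
        summary = json.load(f)
    return max(summary["epochs"], key=lambda e: e["val_acc"])
```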
Example:
```
checkpoints/docmatix_stream/
├── best.safetensors
├── last.safetensors
├── hparams.json
└── training_summary.json
```

A `training_summary.json` then looks like:

```
{
  "epochs": [
    {"epoch": 1, "train_loss": 0.2413, "val_acc": 0.974, "time_s": 42.1},
    {"epoch": 2, "train_loss": 0.1384, "val_acc": 0.982, "time_s": 41.8}
  ],
  "best": {"val_acc": 0.982}
}
```

The default SimpleCNN is a small three-layer convolutional classifier:
```
Input: (1, 128, 128)
 ↓ Conv2d(1→16) + ReLU + MaxPool2d
 ↓ Conv2d(16→32) + ReLU + MaxPool2d
 ↓ Conv2d(32→32) + ReLU + MaxPool2d
 ↓ Flatten → Linear(8192→32) → ReLU → Linear(32→2)
Output: logits for [Normal, Rotated]
```
You can replace or extend it in `rotdet_model.py`.
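The diagram above maps to a straightforward `nn.Module`. A sketch consistent with the stated shapes — the actual class lives in `rotdet_model.py` and may differ; in particular, the 3×3 kernels with padding 1 are an assumption (any same-padding kernel gives the same 8192-feature flatten size):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Three conv blocks, then a two-layer classifier head.

    Each MaxPool2d halves the 128x128 input (128 -> 64 -> 32 -> 16),
    so the final feature map is 32 channels x 16 x 16 = 8192 features.
    Kernel sizes here are assumed, not taken from rotdet_model.py.
    """
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(8192, 32), nn.ReLU(), nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A forward pass on a batch of grayscale 128×128 pages returns one logit pair `[Normal, Rotated]` per page.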
Typical workflow:
```
uv run rotdet-train   # train new model
uv run rotdet-eval    # evaluate checkpoint
uv run pytest         # (optional) run tests
```

- Use `--output-dir` per experiment (each run auto-timestamps).
- Enable `--save-every-epoch` for detailed logs.
- Add `--from-pretrained` to start from Hugging Face weights.
- Use Optuna or Hydra for hyperparameter sweeps (supported by design).
- Add `--fail-log` during evaluation to detect dataset issues.
MIT License © 2025 — RotDet contributors. Based on the pretrained model fcrescio/rotdet.
- fcrescio/rotdet — Original model
- HuggingFaceM4/Docmatix — Document dataset
- Hugging Face Datasets
- PyTorch DataLoader
✅ Summary: This README documents installation, training, evaluation, dataset support, fail logging, and experiment organization — everything needed to train and analyze document rotation detection models with RotDet.