RotDet is a lightweight PyTorch training and evaluation framework for detecting page orientation (“rotated vs. normal”) in scanned documents.
It re-implements and extends the fcrescio/rotdet model from Hugging Face Hub, supporting both map-style and streaming datasets (e.g. HuggingFaceM4/Docmatix).
- ✅ Training and evaluation of rotation detector CNNs
- ✅ Works with Hugging Face datasets (map-style or streaming)
- ✅ Handles multi-page documents (`images` key in Docmatix)
- ✅ Auto-splitting into train/validation sets
- ✅ Checkpointing (`last.safetensors`, `best.safetensors`)
- ✅ Resume or fine-tune from pretrained weights
- ✅ Automatic config selection (`images`, `pdf`, `zero-shot-exp`, etc.)
- ✅ Evaluation with fail logging (JSONL + failed image export)
- ✅ Modular design: reusable data, model, and script modules
```
rotdet/
├── scripts/
├── rotdet_data.py       # Dataset wrappers + loaders
├── rotdet_model.py      # SimpleCNN + weight loading utilities
├── rotdet_hf.py         # Hugging Face config + dataset loader helpers
├── train_rotdet.py      # Training entry point
├── evaluate_rotdet.py   # Evaluation entry point
└── pyproject.toml       # Build & CLI definitions
```
When installed, the following console commands are available:
- `rotdet-train` → train or fine-tune a model
- `rotdet-eval` → evaluate a model checkpoint
Requires Python ≥3.9 and PyTorch ≥2.0.
Using uv:

```
uv pip install -e .
```

or with pip:

```
pip install -e .
```

Evaluate the pretrained model on FUNSD:

```
uv run rotdet-eval \
  --repo_id fcrescio/rotdet \
  --filename model.safetensors \
  --dataset nielsr/funsd \
  --split test
```

Train a model on FUNSD:

```
uv run rotdet-train \
  --dataset nielsr/funsd \
  --split train \
  --val-fraction 0.2 \
  --epochs 3 --batch-size 64 \
  --output-dir checkpoints/funsd
```

Train on Docmatix with streaming:

```
uv run rotdet-train \
  --dataset HuggingFaceM4/Docmatix \
  --config images \
  --streaming \
  --pages-per-doc 3 \
  --val-pages 4000 \
  --max-train-pages 20000 \
  --epochs 2 --batch-size 64 \
  --output-dir checkpoints/docmatix_stream
```

Resume or fine-tune from a checkpoint:

```
uv run rotdet-train \
  --resume checkpoints/docmatix_stream/best.safetensors \
  --dataset HuggingFaceM4/Docmatix --config images --streaming \
  --epochs 2
```

Evaluate a local checkpoint:

```
uv run rotdet-eval \
  --repo_id checkpoints/docmatix_stream/best.safetensors \
  --dataset HuggingFaceM4/Docmatix \
  --config images --streaming \
  --max_samples 2000
```

Log evaluation failures to JSONL:

```
uv run rotdet-eval \
  --repo_id checkpoints/funsd/best.safetensors \
  --dataset nielsr/funsd --split test \
  --fail-log failures.jsonl
```

Each line of the fail log looks like:

```
{"true":1,"pred":0,"meta":{"row_idx":12,"page_idx":3,"id":"doc123","image_path":"..."}}
```

To also export the failed page images:

```
uv run rotdet-eval \
  --repo_id checkpoints/funsd/best.safetensors \
  --dataset HuggingFaceM4/Docmatix --config images \
  --fail-log failures.jsonl \
  --save-fail-images debug_fails/
```

This writes a `.jsonl` log plus the actual failed page images into `debug_fails/`.
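The fail log is plain line-delimited JSON, so it can be summarized with the standard library alone. A minimal sketch (the `summarize_failures` helper is illustrative, not part of RotDet):

```python
import json
from collections import Counter

def summarize_failures(path):
    """Tally (true, pred) label pairs in a fail log with one JSON object per line."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            counts[(rec["true"], rec["pred"])] += 1
    return counts
```

For example, `summarize_failures("failures.jsonl")` returns a `Counter` keyed by misclassification direction, which quickly shows whether the model fails more often on rotated or on normal pages.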
RotDet works with any Hugging Face dataset that provides images via:
- `image` → single page per row (e.g. FUNSD)
- `images` → list of pages per document (e.g. Docmatix)
- `image_path` or `url` → path/URL to image
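The three layouts above can be normalized with a small helper. This is a sketch, not part of the RotDet API; only the field names come from the list above:

```python
def iter_pages(row):
    """Yield every page image (or image reference) from one dataset row.

    Covers the three layouts: an `images` list per document, a single
    `image` per row, or an `image_path`/`url` string.
    """
    if row.get("images"):                 # multi-page documents (e.g. Docmatix)
        yield from row["images"]
    elif row.get("image") is not None:    # one page per row (e.g. FUNSD)
        yield row["image"]
    elif row.get("image_path"):           # path to an image on disk
        yield row["image_path"]
    elif row.get("url"):                  # remote image URL
        yield row["url"]
```

Iterating pages this way lets the same training loop consume single-page and multi-page datasets uniformly.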
For Docmatix, the config is handled automatically:
```
# Automatically picks the best available config (usually 'images')
uv run rotdet-train --dataset HuggingFaceM4/Docmatix
```

or explicitly:

```
uv run rotdet-train --dataset HuggingFaceM4/Docmatix --config zero-shot-exp
```

Need a lightweight document dataset without relying on an existing Hugging Face repo? Use the new helper script to mine the `readerservice` collection from Internet Archive and export it as a Hugging Face `DatasetDict`:
```
python scripts/readerservice_miner.py \
  --output-dir data/readerservice \
  --val-fraction 0.1 \
  --max-images 2000
```

The script downloads the images, writes them under `data/readerservice/images/`, materializes an HF-compliant dataset via `datasets.save_to_disk`, and produces a `manifest.json` with provenance info, so you can train with `load_from_disk("data/readerservice/hf_dataset")` directly.
Each training run writes its own timestamped folder with:
- `last.safetensors` — weights after the final epoch
- `best.safetensors` — best validation accuracy
- `training_summary.json` — per-epoch metrics
- `hparams.json` — full training configuration
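Because `training_summary.json` records per-epoch metrics, the best epoch can be recovered with the standard library alone. A sketch, assuming the schema shown in the example below (`best_epoch` is an illustrative helper, not part of RotDet):

```python
import json

def best_epoch(summary_path):
    """Return the epoch entry with the highest validation accuracy."""
    with open(summary_path) as f:
        summary = json.load(f)
    return max(summary["epochs"], key=lambda e: e["val_acc"])
```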
Example:
```
checkpoints/docmatix_stream/
├── best.safetensors
├── last.safetensors
├── hparams.json
└── training_summary.json
```

A `training_summary.json` then looks like:

```
{
  "epochs": [
    {"epoch": 1, "train_loss": 0.2413, "val_acc": 0.974, "time_s": 42.1},
    {"epoch": 2, "train_loss": 0.1384, "val_acc": 0.982, "time_s": 41.8}
  ],
  "best": {"val_acc": 0.982}
}
```

The default SimpleCNN is a small three-layer convolutional classifier:
```
Input: (1, 128, 128)
 ↓ Conv2d(1→16) + ReLU + MaxPool2d
 ↓ Conv2d(16→32) + ReLU + MaxPool2d
 ↓ Conv2d(32→32) + ReLU + MaxPool2d
 ↓ Flatten → Linear(8192→32) → ReLU → Linear(32→2)
Output: logits for [Normal, Rotated]
```
You can replace or extend it in `rotdet_model.py`.
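The diagram above maps to a straightforward `nn.Module`. A sketch consistent with the stated shapes — the actual class lives in `rotdet_model.py` and may differ; in particular, the 3×3 kernels with padding 1 are an assumption (any same-padding kernel gives the same 8192-feature flatten size):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Three conv blocks, then a two-layer classifier head.

    Each MaxPool2d halves the 128x128 input (128 -> 64 -> 32 -> 16),
    so the final feature map is 32 channels x 16 x 16 = 8192 features.
    Kernel sizes here are assumed, not taken from rotdet_model.py.
    """
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(8192, 32), nn.ReLU(), nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A forward pass on a batch of grayscale 128×128 pages returns one logit pair `[Normal, Rotated]` per page.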
Typical workflow:
```
uv run rotdet-train   # train new model
uv run rotdet-eval    # evaluate checkpoint
uv run pytest         # (optional) run tests
```

- Use `--output-dir` per experiment (each run auto-timestamps).
- Enable `--save-every-epoch` for detailed logs.
- Add `--from-pretrained` to start from Hugging Face weights.
- Use Optuna or Hydra for hyperparameter sweeps (supported by design).
- Add `--fail-log` during evaluation to detect dataset issues.
MIT License © 2025 — RotDet contributors. Based on the pretrained model fcrescio/rotdet.
- fcrescio/rotdet — Original model
- HuggingFaceM4/Docmatix — Document dataset
- Hugging Face Datasets
- PyTorch DataLoader
✅ Summary: This README documents installation, training, evaluation, dataset support, fail logging, and experiment organization — everything needed to train and analyze document rotation detection models with RotDet.