End-to-end instance segmentation on MVTec-D2S using a DINOv3 backbone with Mask2Former-lite (default) or EoMT-lite heads, plus COCO mAP evaluation and instance-level feature extraction ready for retrieval.
- Pluggable instance heads: Mask2Former-lite (default) and EoMT-lite
- Frozen DINOv3 ViT-B/16 backbone with multi-layer features from layers [2, 5, 8, 11]
- Per-epoch COCO metrics in the logs: mAP, AP50, AP75, mAR
- Online Albumentations augmentation (multi-scale, geometric, lighting), with configurable CutMix
- tqdm progress bars for training and validation; an A100 80G supports large-batch training
- The data preparation script can merge dataset/raw/mvtec-d2s/augmented into train
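The CutMix option in the feature list needs extra care for instance segmentation, because masks must be cut the same way as the pixels. Below is a minimal sketch of the standard CutMix recipe adapted to instance masks, assuming masks are nested 0/1 lists (NumPy arrays in practice) and that the pasted region fully occludes image A. `sample_cutmix_box` and `cutmix_instances` are hypothetical names; how the project actually configures CutMix (probability, `alpha`, minimum kept area) is not specified here.

```python
import math
import random


def sample_cutmix_box(height, width, alpha=1.0, rng=random):
    """Sample a CutMix box (x0, y0, x1, y1) and the area ratio lam kept from image A."""
    lam = rng.betavariate(alpha, alpha)
    cut_ratio = math.sqrt(1.0 - lam)  # box side scales with sqrt of the pasted area
    cut_w, cut_h = int(width * cut_ratio), int(height * cut_ratio)
    cx, cy = rng.randrange(width), rng.randrange(height)  # uniform box center
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    # Recompute lam from the clipped box so it matches the pasted area exactly.
    lam = 1.0 - (x1 - x0) * (y1 - y0) / (width * height)
    return (x0, y0, x1, y1), lam


def cutmix_instances(instances_a, instances_b, box, min_area=1):
    """Combine (label, mask) pairs after pasting `box` from image B onto image A.

    Masks are 2-D lists of 0/1. A's masks lose their pixels inside the box,
    B's masks keep only pixels inside it; instances below `min_area` are dropped.
    """
    x0, y0, x1, y1 = box

    def clip(mask, keep_inside):
        return [[v if (x0 <= x < x1 and y0 <= y < y1) == keep_inside else 0
                 for x, v in enumerate(row)]
                for y, row in enumerate(mask)]

    mixed = []
    for source, keep_inside in ((instances_a, False), (instances_b, True)):
        for label, mask in source:
            clipped = clip(mask, keep_inside)
            if sum(map(sum, clipped)) >= min_area:
                mixed.append((label, clipped))
    return mixed
```

The image itself is mixed with the same box, e.g. `image_a[y0:y1, x0:x1] = image_b[y0:y1, x0:x1]` for NumPy arrays. Dropping instances that become too small after occlusion avoids training on near-empty masks.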
cd /data3/projects/aipos_training
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Mask2Former (default)
nohup python scripts/train_segmentation.py --config configs/instance_mask2former.yml --no-dry-run > logs/mask2former_$(date +%Y%m%d_%H%M%S).log 2>&1 &
# EoMT (for comparison)
nohup python scripts/train_segmentation.py --config configs/instance_eomt.yml --model-type instance_eomt --no-dry-run > logs/eomt_$(date +%Y%m%d_%H%M%S).log 2>&1 &
Common parameters:
--epochs 80 --batch-size 24 (the batch size can be raised further on an A100 80G)
- Local DINOv3 weights (default):
DINOV3_LOCAL_WEIGHTS=/data3/projects/aipos_training/models/facebook/dinov3-vitb16-pretrain-lvd1689m
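The training commands pass `--config`, `--model-type`, `--epochs`, `--batch-size` and `--no-dry-run`. The sketch below shows one plausible way such flags could override YAML values; it is not taken from scripts/train_segmentation.py. The config keys `train.epochs` and `model.type`, and the assumption that dry-run is on by default (suggested by the explicit `--no-dry-run`), are hypothetical; only `dataset.batch_size` is named in this README.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(description="Instance segmentation training (sketch)")
    p.add_argument("--config", required=True, help="YAML config path")
    p.add_argument("--model-type", default=None, help="e.g. instance_eomt; None keeps the config value")
    p.add_argument("--epochs", type=int, default=None)
    p.add_argument("--batch-size", type=int, default=None)
    # BooleanOptionalAction (Python 3.9+) generates both --dry-run and --no-dry-run.
    p.add_argument("--dry-run", action=argparse.BooleanOptionalAction, default=True)
    return p


def apply_overrides(config, args):
    """Return a copy of `config` where CLI values that were given take precedence."""
    # Copy one level deep so the caller's nested dicts are not mutated.
    merged = {k: dict(v) if isinstance(v, dict) else v for k, v in config.items()}
    if args.epochs is not None:
        merged.setdefault("train", {})["epochs"] = args.epochs  # hypothetical key
    if args.batch_size is not None:
        merged.setdefault("dataset", {})["batch_size"] = args.batch_size
    if args.model_type is not None:
        merged.setdefault("model", {})["type"] = args.model_type  # hypothetical key
    merged["dry_run"] = args.dry_run
    return merged
```

Leaving every override at `None` by default means the YAML file stays the single source of truth unless a flag is passed explicitly.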
DINOv3 ViT-B/16 (frozen) → FPN → Pixel Decoder → Instance Head (Mask2Former-lite / EoMT-lite) → instance masks + class scores
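As a rough illustration of the shapes in this pipeline: ViT-B/16 turns an H×W image into an (H/16)×(W/16) grid of 768-dim patch tokens, and the outputs of blocks [2, 5, 8, 11] are then resampled into a multi-scale pyramid for the pixel decoder. The pyramid strides (4, 8, 16, 32), the number of prefix tokens (CLS plus any register tokens), and the helper names are assumptions for illustration, not values read from the configs.

```python
def vit_token_grid(height, width, patch_size=16):
    """Patch-token grid (rows, cols) that a ViT with this patch size produces."""
    if height % patch_size or width % patch_size:
        raise ValueError("input sides must be multiples of the patch size")
    return height // patch_size, width // patch_size


def select_layers(block_outputs, layers=(2, 5, 8, 11), num_prefix_tokens=1):
    """Keep the chosen blocks' outputs and drop prefix tokens (CLS, registers).

    `block_outputs[i]` is the token sequence after transformer block i (0-based).
    Hugging Face models called with output_hidden_states=True usually prepend
    the embedding output, in which case block i sits at hidden_states[i + 1].
    """
    return [block_outputs[i][num_prefix_tokens:] for i in layers]


def pyramid_shapes(height, width, patch_size=16, strides=(4, 8, 16, 32)):
    """Spatial size of each pyramid level when the stride-16 grid is resampled."""
    rows, cols = vit_token_grid(height, width, patch_size)
    return {s: (rows * patch_size // s, cols * patch_size // s) for s in strides}
```

For a 512×512 input this gives a 32×32 token grid (1024 patch tokens per selected layer) and pyramid levels of 128×128, 64×64, 32×32 and 16×16.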
configs/instance_mask2former.yml (paths.checkpoint_dir and paths.log_dir use their own directories)
configs/instance_eomt.yml (likewise)
Target layout (COCO instances format):
dataset/mvtec-d2s/
├── images/train/*.jpg
├── images/val/*.jpg
└── annotations/instances_{train,val}.json
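The train split combines the original and augmented images. When two COCO annotation files are merged, image and annotation ids must be remapped so they stay unique, and every annotation's `image_id` must follow its image. A minimal sketch, assuming both files share an identical `categories` list; `merge_coco` is a hypothetical helper, not necessarily what scripts/prepare_mvtec_d2s.py does.

```python
import copy


def merge_coco(base, extra):
    """Append `extra`'s images and annotations to a copy of `base`, remapping ids."""
    if [c["id"] for c in base["categories"]] != [c["id"] for c in extra["categories"]]:
        raise ValueError("category lists differ; remap category ids first")
    merged = copy.deepcopy(base)
    names = {im["file_name"] for im in merged["images"]}
    next_img = max((im["id"] for im in merged["images"]), default=0) + 1
    next_ann = max((a["id"] for a in merged["annotations"]), default=0) + 1

    id_map = {}  # old image id in `extra` -> new id in the merged file
    for im in extra["images"]:
        if im["file_name"] in names:  # both sets end up in the same images/train folder
            raise ValueError(f"duplicate file_name: {im['file_name']}")
        names.add(im["file_name"])
        new_im = copy.deepcopy(im)
        new_im["id"] = id_map[im["id"]] = next_img
        merged["images"].append(new_im)
        next_img += 1

    for ann in extra["annotations"]:
        new_ann = copy.deepcopy(ann)
        new_ann["id"] = next_ann
        new_ann["image_id"] = id_map[ann["image_id"]]
        merged["annotations"].append(new_ann)
        next_ann += 1
    return merged
```

The inputs can be read with `json.load` and the result written back with `json.dump` as annotations/instances_train.json.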
Preparation script (including the augmented merge):
python scripts/prepare_mvtec_d2s.py --raw-dir dataset/raw/mvtec-d2s --out-dir dataset/mvtec-d2s --include-augmented
During training, the validation phase runs the evaluation automatically and prints to the log:
Epoch X/Y - Val mAP: 0.xxxx, AP50: 0.xxxx, AP75: 0.xxxx, mAR: 0.xxxx
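These metrics follow the COCO definitions: AP at a mask-IoU threshold uses greedy, score-ordered matching and 101-point interpolated precision; mAP averages AP over IoU thresholds 0.50:0.05:0.95; AP50 and AP75 are the single-threshold values. The stdlib sketch below reproduces that computation for one category, with masks as sets of pixel coordinates. It ignores crowd regions, area ranges and the per-image detection cap, and the project presumably relies on an evaluator such as pycocotools' `COCOeval` with `iouType="segm"` rather than code like this.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(dets, gts, iou_thr):
    """COCO-style AP for one category at one IoU threshold.

    dets: list of (image_id, score, mask); gts: dict image_id -> list of masks.
    """
    n_gt = sum(len(masks) for masks in gts.values())
    if n_gt == 0:
        return 0.0  # COCO excludes such categories instead (reports -1)
    matched = {img: [False] * len(masks) for img, masks in gts.items()}
    hits = []
    for img, _, mask in sorted(dets, key=lambda d: -d[1]):  # highest score first
        best_iou, best_j = iou_thr, -1
        for j, gt in enumerate(gts.get(img, [])):
            if matched[img][j]:
                continue  # each ground truth can be matched only once
            iou = mask_iou(mask, gt)
            if iou >= best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0:
            matched[img][best_j] = True
        hits.append(best_j >= 0)

    precision, recall, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precision.append(tp / k)
        recall.append(tp / n_gt)
    # Make precision non-increasing from the right (the precision envelope).
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Sample precision at the 101 recall points 0.00, 0.01, ..., 1.00.
    total = 0.0
    for r in (k / 100 for k in range(101)):
        total += next((p for p, rc in zip(precision, recall) if rc >= r), 0.0)
    return total / 101


def coco_map(dets, gts):
    """Mean AP over IoU thresholds 0.50:0.05:0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds)
```

A detection with mask IoU 0.5 against its only ground truth scores AP50 = 1.0 but AP75 = 0.0, so its mAP is 0.1; this is why mAP sits well below AP50 when mask boundaries are loose.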
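python tests/visualize_instances.py
# writes outputs/instance_vis_3x3.png
- PyTorch ≥ 2.0 and a CUDA GPU (A100 80G recommended)
- See requirements.txt for the full dependency list
- Weight loading: if DINOV3_LOCAL_WEIGHTS is not set, the default local directory is used; if local Hugging Face (HF) loading fails, the local hub is tried, and finally the remote hub.
- GPU memory: adjust dataset.batch_size; mixed precision and efficient data loading are already enabled.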
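The weight-loading order above (environment variable or default directory, then local HF, local hub, remote hub) amounts to a fallback chain. A minimal sketch with pluggable loaders; in practice each loader might wrap a call such as `transformers.AutoModel.from_pretrained(path)` or `torch.hub.load(..., source="local")`, but which calls the project actually makes is an assumption, and `resolve_weights_dir` and `load_with_fallback` are hypothetical names.

```python
import os
from pathlib import Path

# Default path taken from this README; DINOV3_LOCAL_WEIGHTS overrides it.
DEFAULT_LOCAL_DIR = "/data3/projects/aipos_training/models/facebook/dinov3-vitb16-pretrain-lvd1689m"


def resolve_weights_dir(env=None):
    """Return the local weights directory, honouring DINOV3_LOCAL_WEIGHTS."""
    env = os.environ if env is None else env
    return Path(env.get("DINOV3_LOCAL_WEIGHTS", DEFAULT_LOCAL_DIR))


def load_with_fallback(loaders):
    """Try (name, loader) pairs in order and return (name, model) for the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # any failure falls through to the next source
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all weight sources failed:\n" + "\n".join(errors))
```

For example, `load_with_fallback([("local_hf", load_hf), ("local_hub", load_local_hub), ("remote_hub", load_remote_hub)])`, where the three loaders are hypothetical zero-argument callables. Collecting every error before raising makes it clear why each source was skipped when all of them fail.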