VLPart model zoo

This file documents a collection of models reported in our paper. Training times were measured on 8 NVIDIA V100 GPUs with NVLink.

How to Read the Tables

The "Name" column contains a link to the config file.

To train a model, run:

```bash
python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml
```

To evaluate a trained or pretrained model, run:

```bash
python train_net.py --num-gpus 8 --config-file /path/to/config/name.yaml --eval-only MODEL.WEIGHTS /path/to/weight.pth
```

An example of cross-dataset evaluation (the Pascal Part model evaluated on PartImageNet):

```bash
python train_net.py --num-gpus 8 --config-file configs/partimagenet/r50_partimagenet.yaml --eval-only MODEL.WEIGHTS models/r50_pascalpart.pth
```
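All of the commands above share one shape: the `train_net.py` flags, optionally followed by `--eval-only` and a trailing `KEY VALUE` config override (`MODEL.WEIGHTS`). The sketch below assumes, as the flag names suggest, that this follows detectron2's argument convention. It is a minimal Python sketch for scripting several runs (for example, a batch of cross-dataset evaluations). `build_command` is a hypothetical helper, not part of VLPart, and it only uses the flags shown in this file.

```python
import shlex
from typing import List, Optional


def build_command(config_file: str,
                  weights: Optional[str] = None,
                  num_gpus: int = 8) -> List[str]:
    """Return a train_net.py argument list in the form used in this file.

    Hypothetical helper (not part of VLPart). With weights=None it builds a
    training command; otherwise an --eval-only command with the checkpoint
    passed as a trailing MODEL.WEIGHTS config override.
    """
    cmd = ["python", "train_net.py",
           "--num-gpus", str(num_gpus),
           "--config-file", config_file]
    if weights is not None:
        cmd += ["--eval-only", "MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    # Cross-dataset evaluation: Pascal Part weights, PartImageNet config.
    cmd = build_command("configs/partimagenet/r50_partimagenet.yaml",
                        weights="models/r50_pascalpart.pth")
    print(shlex.join(cmd))
    # To launch it instead of printing: subprocess.run(cmd, check=True)
```

Keeping the command as a list and passing it to `subprocess.run` avoids shell-quoting problems when config or weight paths contain spaces.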

Before training, make sure the datasets and models are set up as described in Preparing Datasets and Preparing Models.


Cross-dataset part segmentation on PartImageNet

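| Config | All (40) AP | Quadruped: head | Quadruped: body | Quadruped: foot | Quadruped: tail | Training time | Download |
|---|---|---|---|---|---|---|---|
| pascal_part | 4.5 | 17.4 | 0.1 | 0.0 | 2.9 | 1h | model |
| + IN-S11 label | 5.4 | 23.6 | 3.4 | 0.8 | 1.2 | 1.5h | model |
| + IN-S11 parsed | 7.8 | 35.0 | 15.2 | 3.5 | 8.9 | 3h | model |

| Config | All (40) AP | Quadruped: head | Quadruped: body | Quadruped: foot | Quadruped: tail | Training time | Download |
|---|---|---|---|---|---|---|---|
| pascal_part | 4.5 | 17.4 | 0.1 | 0.0 | 2.9 | 1h | model |
| + LVIS_PACO | 7.8 | 22.9 | 7.1 | 0.3 | 4.0 | 15h + 2.5h | model |
| + IN-S11 label | 8.8 | 26.3 | 3.7 | 0.4 | 1.0 | 3h | model |
| + IN-S11 parsed | 11.8 | 47.5 | 13.4 | 4.5 | 14.8 | 3h | model |
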
  • The evaluation metric is mAPmask@[0.5:0.95] on the validation set of PartImageNet.
  • pascal_part + LVIS_PACO is first trained on LVIS and PACO (15h, r50_lvis_paco.pth), then on LVIS, PACO and Pascal Part (2.5h).
  • Before training on IN-S11 parsed, generate the IN-S11 parsed annotations (20 min) by running:
```bash
python train_net.py --num-gpus 8 --config-file configs/ann_parser/build_pascalpart.yaml --eval-only
python train_net.py --num-gpus 8 --config-file configs/ann_parser/find_ins11_mixer.yaml --eval-only
```

or download partimagenet_parsed.json and put it in $VLPart_ROOT/datasets/partimagenet/.
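For reference, mAP@[0.5:0.95] follows the COCO convention. Mask AP is computed at ten IoU thresholds (0.50 to 0.95 in steps of 0.05) and averaged, and the All (40) column additionally averages over the 40 PartImageNet part categories. The following is a minimal sketch of that averaging step only. It assumes per-category, per-threshold AP values are already available, and the dictionary input format and function names are hypothetical. The actual numbers come from the repository's evaluator.

```python
from statistics import mean
from typing import Dict, List

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (10 values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def map_50_95(ap_per_category: Dict[str, List[float]]) -> float:
    """Average AP over the 10 IoU thresholds, then over categories.

    ap_per_category maps a part name (hypothetical keys) to its AP at each
    threshold in IOU_THRESHOLDS. Because every category has the same number
    of thresholds, this equals the mean over all (threshold, category) pairs.
    """
    per_category = []
    for name, values in ap_per_category.items():
        if len(values) != len(IOU_THRESHOLDS):
            raise ValueError(f"{name}: expected {len(IOU_THRESHOLDS)} AP values")
        per_category.append(mean(values))
    return mean(per_category)


def ap50(values: List[float]) -> float:
    """AP50 is simply the entry at IoU threshold 0.5."""
    return values[IOU_THRESHOLDS.index(0.5)]
```

This also shows why AP50 (reported elsewhere in these tables) is always at least as lenient as AP. It uses only the loosest of the ten thresholds.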


Cross-category part segmentation within Pascal Part

| Config | All (93) AP/AP50 | Base (77) AP/AP50 | Novel (16) AP/AP50 | dog: head | dog: torso | dog: leg | dog: paw | dog: tail | Training time | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| pascal_part_base | 15.0/33.4 | 17.8/39.6 | 1.5/3.7 | 6.1 | 7.9 | 2.9 | 13.8 | 3.2 | 1h | model |
| + VOC object | 16.8/36.8 | 19.9/43.3 | 2.1/5.9 | 29.9 | 22.6 | 3.2 | 12.4 | 2.1 | 1.5h | model |
| + IN-S20 label | 17.4/37.5 | 20.8/44.7 | 1.1/3.1 | 12.8 | 17.8 | 2.0 | 5.9 | 0.9 | 3h | model |
| + IN-S20 parsed | 18.4/39.4 | 21.3/45.3 | 4.2/11.0 | 28.7 | 34.8 | 17.2 | 5.7 | 14.3 | 4.5h | model |

  • The evaluation metric is APmask@0.5 on the validation set of Pascal Part.
  • Before training on IN-S20 parsed, generate the IN-S20 parsed annotations (50 min) by running:
```bash
python train_net.py --num-gpus 8 --config-file configs/ann_parser/build_pascalpartbase.yaml --eval-only
python train_net.py --num-gpus 8 --config-file configs/ann_parser/find_ins20_mixer.yaml --eval-only
```

or download imagenet_voc_image_parsed.json and put it in $VLPart_ROOT/datasets/imagenet/.
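If you download the parsed annotation files instead of generating them, they only need to land in the directories named above. Here is a minimal Python sketch of that step. It assumes `VLPart_ROOT` is set as an environment variable, matching the `$VLPart_ROOT` notation used here. `target_path` and `install` are hypothetical helpers, not part of the repository.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def target_path(downloaded: str, subdir: str, root: Optional[str] = None) -> Path:
    """Destination $VLPart_ROOT/datasets/<subdir>/<file name> for a download."""
    root = root if root is not None else os.environ["VLPart_ROOT"]
    return Path(root) / "datasets" / subdir / Path(downloaded).name


def install(downloaded: str, subdir: str, root: Optional[str] = None) -> Path:
    """Copy a downloaded file into place, creating the directory if needed."""
    dest = target_path(downloaded, subdir, root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(downloaded, dest)
    return dest
```

For example, `install("imagenet_voc_image_parsed.json", "imagenet")` or `install("partimagenet_parsed.json", "partimagenet")`. Copying rather than moving keeps the original download in case the destination turns out to be wrong.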


Open-vocabulary object detection and part segmentation

R50 Mask R-CNN:

| Name | VOC AP/AP50 | COCO AP/AP50 | LVIS AP/APr | PartImageNet AP/AP50 | Pascal Part AP/AP50 | PACO AP/AP50 |
|---|---|---|---|---|---|---|
| Dataset-specific | 35.9/69.7 | 38.0/60.8 | 28.1/20.8 | 29.7/54.1 | 19.4/42.3 | 10.6/21.7 |
| Config | r50_voc | r50_coco | r50_lvis | r50_partimagenet | r50_pascalpart | r50_paco |
| Training time | 2h | 6.5h | 7h | 2h | 1h | 7h |
| Download | r50_voc.pth | r50_coco.pth | r50_lvis.pth | r50_partimagenet.pth | r50_pascalpart.pth | r50_paco.pth |

| Config | VOC AP/AP50 | COCO AP/AP50 | LVIS AP/APr | PartImageNet AP/AP50 | Pascal Part AP/AP50 | PACO AP/AP50 | Training time | Download |
|---|---|---|---|---|---|---|---|---|
| joint | 44.5/70.3 | 29.0/48.1 | 27.3/19.0 | 5.4/11.3 | 4.9/11.3 | 9.6/19.5 | 15h | model |
| joint* | 42.8/70.8 | 28.6/48.0 | 26.8/20.4 | 7.8/15.3 | 21.6/46.3 | 9.3/18.9 | 15h + 2.5h | model |
| joint** | 40.6/69.3 | 28.4/47.8 | 26.4/16.0 | 29.1/52.0 | 22.6/47.8 | 9.3/18.9 | 15h + 3h | model |
| + IN label | 38.0/67.8 | 28.2/47.8 | 26.0/15.9 | 30.8/54.4 | 23.6/49.2 | 9.0/18.7 | 15h + 3h + 4h | model |
| + IN parsed | 38.3/67.8 | 28.5/47.8 | 26.2/17.8 | 31.6/55.7 | 24.0/49.8 | 9.6/20.2 | 15h + 3h + 6h | model |

  • joint is trained on LVIS and PACO.
  • joint* is first trained on LVIS and PACO (15h), then on LVIS, PACO and Pascal Part (2.5h).
  • joint** is first trained on LVIS and PACO (15h), then on LVIS, PACO, Pascal Part and PartImageNet (3h).
  • Before training on IN parsed, generate IN parsed (100 min) by running:
```bash
bash tools/golden_image_parse.sh
```

or download golden_image_parsed.zip, put it in $VLPart_ROOT/datasets/imagenet/ and unzip it there.
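Putting the archive in `$VLPart_ROOT/datasets/imagenet/` and unzipping it there is equivalent to extracting it into that directory from wherever it was downloaded. Here is a minimal Python sketch using the standard `zipfile` module. It assumes `VLPart_ROOT` is set in the environment, makes no assumption about the archive's internal layout, and `unzip_parsed` is a hypothetical helper, not part of the repository.

```python
import os
import zipfile
from pathlib import Path
from typing import List, Optional


def unzip_parsed(zip_path, root: Optional[str] = None) -> List[str]:
    """Extract a parsed-ImageNet archive into $VLPart_ROOT/datasets/imagenet/.

    zip_path may be a filesystem path or a binary file-like object.
    Returns the archive member names that were extracted.
    """
    root = root if root is not None else os.environ["VLPart_ROOT"]
    dest = Path(root) / "datasets" / "imagenet"
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

The same call works for either archive mentioned in this file, for example `unzip_parsed("golden_image_parsed.zip")`. Returning the member names makes it easy to check what was extracted.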


SwinBase Cascade Mask R-CNN:

| Name | VOC AP/AP50 | COCO AP/AP50 | LVIS AP/APr | PartImageNet AP/AP50 | Pascal Part AP/AP50 | PACO AP/AP50 |
|---|---|---|---|---|---|---|
| Dataset-specific | 59.0/82.0 | 52.5/72.0 | 43.1/38.7 | 41.7/68.7 | 27.4/56.1 | 15.2/29.4 |
| Config | swinbase_voc | swinbase_coco | swinbase_lvis | swinbase_partimagenet | swinbase_pascal_part | swinbase_paco |
| Training time | 4h | 1day15h | 1day15h | 4.5h | 1.5h | 1day2h |
| Download | swinbase_voc.pth | swinbase_coco.pth | swinbase_lvis.pth | swinbase_partimagenet.pth | swinbase_pascalpart.pth | swinbase_paco.pth |

| Config | VOC AP/AP50 | COCO AP/AP50 | LVIS AP/APr | PartImageNet AP/AP50 | Pascal Part AP/AP50 | PACO AP/AP50 | Training time | Download |
|---|---|---|---|---|---|---|---|---|
| joint | 55.2/72.2 | 41.0/58.4 | 41.3/32.8 | 6.9/13.7 | 5.6/12.5 | 15.9/31.9 | 2day5h | model |
| joint* | 52.6/72.4 | 40.4/57.9 | 39.9/29.8 | 11.8/21.8 | 30.5/59.3 | 15.4/30.2 | 2day5h + 4.5h | model |
| joint** | 50.3/71.6 | 40.3/57.8 | 39.6/30.3 | 40.0/64.8 | 31.2/60.5 | 15.4/30.3 | 2day5h + 6h | model |
| + IN label | 48.1/69.7 | 40.3/57.7 | 39.3/28.9 | 41.2/66.8 | 31.7/61.1 | 15.9/30.8 | 2day5h + 6h + 8h | model |
| + IN parsed | 47.8/69.7 | 40.5/58.1 | 39.6/30.5 | 42.0/68.2 | 31.9/61.6 | 15.6/30.6 | 2day5h + 6h + 20h | model |

  • joint is trained on LVIS and PACO.
  • joint* is first trained on LVIS and PACO (2day5h), then on LVIS, PACO and Pascal Part (4.5h).
  • joint** is first trained on LVIS and PACO (2day5h), then on LVIS, PACO, Pascal Part and PartImageNet (6h).
  • Before training on IN parsed, generate IN parsed (70 min) by running:
```bash
bash tools/golden_image_parse_swinbase.sh
```

or download golden_image_parsed_swinbase.zip, put it in $VLPart_ROOT/datasets/imagenet/ and unzip it there.
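The training-time columns list staged schedules one stage at a time (for example, 2day5h + 6h + 20h for the SwinBase + IN parsed row, where each stage starts from the previous one). To get a total in hours, a small parser for this notation could look like the following sketch. It assumes only the `day` and `h` units that appear in these tables, with 1 day = 24h. `total_hours` is a hypothetical helper.

```python
import re

# One "<number><unit>" term, e.g. "2day", "5h", "2.5h".
_TERM = re.compile(r"(\d+(?:\.\d+)?)(day|h)")
_HOURS_PER_UNIT = {"day": 24.0, "h": 1.0}


def total_hours(schedule: str) -> float:
    """Sum a staged schedule such as '2day5h + 6h + 20h' into hours."""
    total = 0.0
    for stage in schedule.split("+"):
        stage = stage.strip()
        terms = _TERM.findall(stage)
        # Reject anything not made up entirely of recognized terms.
        if not terms or "".join(n + u for n, u in terms) != stage:
            raise ValueError(f"unrecognized stage: {stage!r}")
        total += sum(float(n) * _HOURS_PER_UNIT[u] for n, u in terms)
    return total


if __name__ == "__main__":
    print(total_hours("2day5h + 6h + 20h"))  # 79.0
    print(total_hours("15h + 2.5h"))         # 17.5
```

The totals cover only the listed training stages. They do not include the one-off time needed to generate the parsed data.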