Leiyao Cui, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yixin Zhu
This is an official implementation of STRAP: Structured Object Affordance Segmentation with Point Supervision.
With significant annotation savings, point supervision has been proven effective for numerous 2D and 3D scene understanding problems. This success is primarily attributed to the structured output space; i.e., samples with high spatial affinity tend to share the same labels. Sharing this spirit, we study affordance segmentation with point supervision, wherein the setting inherits an unexplored dual affinity—spatial affinity and label affinity. By label affinity, we refer to affordance segmentation as a multi-label prediction problem: A plate can be both holdable and containable. By spatial affinity, we refer to a universal prior that nearby pixels with similar visual features should share the same point annotation. To tackle label affinity, we devise a dense prediction network that enhances label relations by effectively densifying labels in a new domain (i.e., label co-occurrence). To address spatial affinity, we exploit a Transformer backbone for global patch interaction and a regularization loss. In experiments, we benchmark our method on the challenging CAD120 dataset, showing significant performance gains over prior methods.
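The spatial-affinity prior can be illustrated with a toy regularizer: penalize prediction differences between adjacent pixels, weighted by how similar their visual features are. This is only a generic pairwise-smoothness sketch to convey the idea, not necessarily the regularization loss used in the paper, and `spatial_affinity_loss` is a hypothetical name.

```python
import numpy as np

def spatial_affinity_loss(probs, feats, sigma=0.5):
    """Toy spatial-affinity regularizer (illustrative sketch only).

    probs: (C, H, W) per-class probabilities (multi-label).
    feats: (D, H, W) per-pixel visual features.
    Nearby pixels with similar features get a large affinity weight,
    so differing predictions there are penalized more heavily.
    """
    loss = 0.0
    for axis in (1, 2):  # vertical and horizontal neighbor pairs
        df = np.diff(feats, axis=axis)   # feature difference between neighbors
        dp = np.diff(probs, axis=axis)   # prediction difference between neighbors
        w = np.exp(-(df ** 2).sum(0) / (2 * sigma ** 2))  # feature-similarity weight
        loss += (w * (dp ** 2).sum(0)).mean()
    return loss
```

A spatially uniform prediction map incurs zero penalty, while predictions that change between visually similar neighbors are penalized.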
We have verified our codebase with PyTorch 1.12.1, CUDA 11.6, and Python 3.8.10. The requirements are as follows:
```
numpy==1.21.5
pillow==9.2.0
pytorch==1.12.1
pyyaml==6.0
scikit-image==0.19.2
scipy==1.7.3
tensorboard==2.9.0
timm==0.6.7
tqdm==4.64.0
```
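If you save these pins to a `requirements.txt`, they can be installed with `pip install -r requirements.txt` (note that the pip package for PyTorch is named `torch`; `pytorch` is the conda package name). A small stdlib-only sketch for reading such pins, where `parse_requirements` is a hypothetical helper rather than part of this repo:

```python
def parse_requirements(lines):
    # Hypothetical helper: turn "name==version" pins into a dict,
    # skipping blank lines and comments.
    reqs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        reqs[name] = version
    return reqs

pins = parse_requirements(["numpy==1.21.5", "timm==0.6.7", "# a comment", ""])
print(pins["timm"])  # → 0.6.7
```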
Download the CAD120 affordance dataset from here. All point annotations are stored in `./data_preprocess/CAD120/keypoints.txt`.
Use `./datasets/CAD120/generate.py` to preprocess the dataset; modify the paths in the script to match your own setup.
After preprocessing, the dataset directory should look like this:
```
cad120
├── actor
│   ├── images
│   ├── labels
│   ├── train_affordance_keypoint.yaml
│   ├── train_affordance.txt
│   └── val_affordance.txt
└── object
    ├── images
    ├── labels
    ├── train_affordance_keypoint.yaml
    ├── train_affordance.txt
    └── val_affordance.txt
```
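To confirm that preprocessing produced the expected layout, a quick stdlib-only check can be run. `check_cad120_layout` is a hypothetical helper written against the tree shown above, not a script shipped with the repo.

```python
from pathlib import Path

def check_cad120_layout(root):
    # Hypothetical helper: verify the preprocessed layout shown above
    # and return the entries that are missing (empty list = all good).
    expected = ["images", "labels", "train_affordance_keypoint.yaml",
                "train_affordance.txt", "val_affordance.txt"]
    missing = []
    for split in ("actor", "object"):
        for name in expected:
            if not (Path(root) / split / name).exists():
                missing.append(f"{split}/{name}")
    return missing
```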
Before starting training, modify the following variables in `train.sh`:
```sh
# Split mode of the CAD120 dataset: "object" or "actor".
SPLIT_MODE="object"
# Root path of the dataset preprocessed as described above.
DATASET_ROOT_PATH="../dataset/cad120"
# Directory where training outputs are stored.
OUTPUT_PATH_NAME="outputs"
```
Run `sh train.sh` in a terminal to start training.
We provide a Jupyter notebook, `visualize.ipynb`, for visualizing results. Customize the following variables:
```python
# Split mode of the CAD120 dataset: "object" or "actor".
split_mode = "object"
# Root path of the dataset preprocessed as described above.
dataset_root_path = "../dataset/cad120"
# Path to your pre-trained model checkpoint.
resume = "./model.pth"
# Basename of the file to visualize.
file_name = "10001_1"
```
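From these variables, the notebook presumably resolves the image and label files for `file_name` under the preprocessed layout. A sketch of that resolution is below; the `visualization_paths` helper and the `.png` extensions are assumptions for illustration, not the notebook's actual code.

```python
import os

def visualization_paths(dataset_root_path, split_mode, file_name):
    # Hypothetical helper: build the image/label paths for one sample
    # from the notebook variables above (file extensions assumed).
    base = os.path.join(dataset_root_path, split_mode)
    return {
        "image": os.path.join(base, "images", file_name + ".png"),
        "label": os.path.join(base, "labels", file_name + ".png"),
    }
```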
| Stage | Epoch | URL |
| --- | --- | --- |
| first | 100 | first_100.pth |
| second | 100 | second_100.pth |
| third | 100 | third_100.pth |
| first | BEST | first_best.pth |
| second | BEST | second_best.pth |
| third | BEST | third_best.pth |

| Stage | Epoch | URL |
| --- | --- | --- |
| first | 100 | first_100.pth |
| second | 100 | second_100.pth |
| third | 100 | third_100.pth |
| first | BEST | first_best.pth |
| second | BEST | second_best.pth |
| third | BEST | third_best.pth |
The point annotations of the CAD120 dataset are duplicated from `keypoints.txt`.