Contrastively Learning Visual Attention as Affordance Cues from Demonstrations for Robotic Grasping

IROS Paper | arXiv | Project Website


Cite our Paper

@inproceedings{zha2021contrastively,
  title={Contrastively Learning Visual Attention as Affordance Cues from Demonstrations for Robotic Grasping},
  author={Zha, Yantian and Bhambri, Siddhant and Guan, Lin},
  booktitle={The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021}
}

This repository includes code for reproducing the experiments (Table 1) in the paper:

  1. Ours (Full Model): Siamese + Coupled Triplet Loss
  2. Ours (Ablation): Siamese + Normal Triplet Loss [To Add]
  3. Ours (Ablation): Without Contrastive Learning [To Add]
  4. Baseline [To Add]

Dependencies (pip install with python=3.6 or 3.7):

  1. pybullet
  2. PyTorch (make sure that the torch build matches your CUDA version; a mismatched build may still install, but learning performance can be abnormal; see the sanity check after this list)
  3. ray
  4. opencv-python==4.5.2.52 (similar versions around 4.5 may also be fine)
  5. easydict
  6. matplotlib
  7. wandb (optional; can be removed)
  8. visdom (optional; can be removed)
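
As a quick way to verify the PyTorch/CUDA note above, a minimal sanity check (not part of the repository's code) is:

# Check which PyTorch build is installed and whether it sees CUDA
import torch

print("torch version:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())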

Dataset and Mug Models:

Please download the dataset and mug models here, then change the corresponding paths in the file "src/grasp_bc_13_a.json".
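
As a reference for this step, here is a minimal sketch (not part of the repository) for loading the config with easydict and checking that the dataset paths point somewhere. The key names trainFolderDir, testFolderDir, and obj_folder are the ones referenced in the Training section below; verify them against the actual JSON file.

# Sketch: inspect src/grasp_bc_13_a.json before training
import json
import os
from easydict import EasyDict

with open("src/grasp_bc_13_a.json") as f:
    cfg = EasyDict(json.load(f))

# Key names taken from the Training section; verify against the actual file
for key in ("trainFolderDir", "testFolderDir", "obj_folder"):
    path = cfg.get(key)
    status = "exists" if path and os.path.exists(path) else "missing"
    print(key, "->", path, "(" + status + ")")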

Training:

  1. Modify trainFolderDir, testFolderDir, and obj_folder in src/grasp_bc_13_a.json; modify result_path and model_path in all training scripts (e.g., trainGrasp_full.py).
  2. Train the full model with the coupled triplet loss by running python3 trainGrasp_full.py src/grasp_bc_13_a. If you train on a multi-GPU server, you can run CUDA_VISIBLE_DEVICES=1 nohup python3 trainGrasp_full.py src/grasp_bc_13_a > nohup.out & and monitor training with tail -f nohup.out.
  3. To visualize grasping rollouts, go to the result_path you set in step 1; results can be browsed by training epoch or mug ID (see the sketch after these steps).
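
For step 3, a rough sketch for browsing what a run has written under result_path is given below; the assumption that results are grouped into per-epoch or per-mug subfolders is mine, so adapt it to the layout you actually see.

# Sketch: list what a training run wrote under result_path
# (grouping by epoch / mug ID is an assumed layout; check your own result_path)
import os

result_path = "/path/to/your/result_path"  # same value as set in the training script
for entry in sorted(os.listdir(result_path)):
    sub = os.path.join(result_path, entry)
    if os.path.isdir(sub):
        print(entry, "->", len(os.listdir(sub)), "items")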

FAQ

  1. Where is the implementation of the coupled triplet loss?

    Please check this line.

  2. Which mugs are used, and how do they differ from each other?

    Please check this line, which lists the three sets of mug model indices used for the three affordance categories: body-graspable, handle-left-right-sides-graspable, and handle-front-back-sides-graspable. Note that we consider the challenging case in which these three sets of indices can overlap.

  3. Why do the visualization images of observations in the demonstration trajectory folders sometimes differ in brightness?

    We added grayscale normalization to the code that visualizes recorded demonstrations while those trajectories were being collected. Since only the visualization code was changed, the trajectory data itself is fine to use (see the sketch after this FAQ).
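
For illustration of the answer above, a grayscale normalization of a saved observation image could look roughly like the following generic OpenCV sketch (this is not the repository's actual visualization code, and the filename is hypothetical):

# Generic sketch: grayscale-normalize an observation image, for visualization only
import cv2

img = cv2.imread("observation.png")                        # hypothetical filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)               # convert to grayscale
norm = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)  # stretch intensities to 0-255
cv2.imwrite("observation_normalized.png", norm)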

Acknowledgement

The first author sincerely appreciates the permission to use some code from this repo.
