
VRB: Affordances from Human Videos as a Versatile Representation for Robotics

Carnegie Mellon University, Meta AI Research

Shikhar Bahl*, Russell Mendonca*, Lili Chen, Unnat Jain, Deepak Pathak

[Paper] [Project] [Demo] [Video] [Dataset] [BibTeX]

(Demo GIF of VRB predictions)

Given a scene, our model (VRB) learns actionable representations for robot learning: it predicts contact points and a post-contact trajectory, learned from human videos. We seamlessly integrate VRB with robotic manipulation across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.

Our model takes a human-agnostic frame as input. The contact head outputs a contact heatmap (left) and the trajectory transformer predicts wrist waypoints (orange). This output can be directly used at inference time (with sparse 3D information, such as depth, and robot kinematics).
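
As a concrete sketch of how these outputs could be consumed downstream (this is not the repository's API; the function, the variable names, and the pinhole intrinsics fx, fy, cx, cy are assumptions for illustration), the heatmap can be reduced to a 2D contact pixel, lifted to 3D with the depth image, and paired with a post-contact direction from the predicted waypoints:

import numpy as np

def affordance_to_3d_target(contact_heatmap, wrist_waypoints, depth, fx, fy, cx, cy):
    # Pick the pixel with the highest predicted contact probability (H x W heatmap).
    v, u = np.unravel_index(np.argmax(contact_heatmap), contact_heatmap.shape)
    # Back-project the contact pixel to a 3D point using depth and pinhole intrinsics.
    z = depth[v, u]
    contact_3d = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
    # Use the first and last predicted 2D wrist waypoints as a post-contact direction in the image plane.
    direction = np.asarray(wrist_waypoints[-1], dtype=float) - np.asarray(wrist_waypoints[0], dtype=float)
    direction /= (np.linalg.norm(direction) + 1e-8)
    return contact_3d, direction

The 3D contact point and image-plane direction can then be mapped into the robot's frame using its kinematics, as noted above.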

Installation

This code uses python>=3.9 and pytorch>=2.0, which can be installed as follows.

First create the conda environment:

conda env create -f environment.yml

Install required libraries:

conda activate vrb
pip install -r requirements.txt
pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git
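
A quick way to confirm the environment is set up (this check is not part of the original instructions; it only verifies the pytorch install and whether a GPU is visible):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"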

Either download the model weights and place them in the models folder, or run:

mkdir models
bash download_model.sh
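
To sanity-check the download, the checkpoint should be loadable with torch.load (an assumption based on its .pth.tar extension, not part of the official instructions; on torch>=2.6 you may need to pass weights_only=False for a checkpoint you trust):

python -c "import torch; ckpt = torch.load('./models/model_checkpoint_1249.pth.tar', map_location='cpu'); print(type(ckpt))"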

Running VRB

To run the model:

python demo.py --image ./kitchen.jpeg --model_path ./models/model_checkpoint_1249.pth.tar
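
To try the model on your own image, pass its path with the same flags (the image path below is a placeholder):

python demo.py --image /path/to/your_scene.jpg --model_path ./models/model_checkpoint_1249.pth.tar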

The output should look like the following:

Helpful pointers

Citing VRB

If you find our model useful for your research, please cite the following:

@inproceedings{bahl2023affordances,
  title={Affordances from Human Videos as a Versatile Representation for Robotics},
  author={Bahl, Shikhar and Mendonca, Russell and Chen, Lili and Jain, Unnat and Pathak, Deepak},
  booktitle={CVPR},
  year={2023}
}

Acknowledgements

We thank Shivam Duggal, Yufei Ye and Homanga Bharadhwaj for fruitful discussions and are grateful to Shagun Uppal, Ananye Agarwal, Murtaza Dalal and Jason Zhang for comments on early drafts of this paper. We would also like to thank the authors of HOI-Forecast [1], as the training code for VRB is adapted from their codebase. RM, LC, and DP are supported by NSF IIS-2024594, ONR MURI N00014-22-1-2773 and ONR N00014-22-1-2096.

[1] Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos. Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang. CVPR 2022.
