By Qi Wu, Cheng-Ju (Jimmy) Wu, Yixin Zhu, and Jungseock Joo (IROS 2021)
IROS 2021 Paper | Website | Video | BibTeX
This is the source code repository for the Ges-THOR framework from this IROS paper. The experiments have been tested on Ubuntu 18.04 and 20.04.
If you find this project useful in your research, please consider citing:
@inproceedings{wu2021communicative,
  title={Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene},
  author={Qi Wu and Cheng-Ju Wu and Yixin Zhu and Jungseock Joo},
  booktitle={International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
}
First, clone the repository with `git clone https://github.com/qiwu57kevin/iros2021-gesthor.git && cd iros2021-gesthor`.
You will find two directories:
- `unity`: contains the Unity local build where the user can interact with the simulation environment.
- `allenact`: contains training scripts based on AllenAct. You can start training and evaluation without opening the Unity Editor.
If you want to try the interactive feature of our simulation framework, check this document; if you only want to train the navigation agent with gestures, please `cd allenact` and follow the steps below.
You can download our dataset from here. Unpack this package under the `datasets` folder. After unpacking, the `ithor-objectnav-gesture` folder contains:
- `train`, `val`, and `test` folders for the 3 different stages
- In each of the above folders, you will see 3 subfolders:
  - `episodes`: contains episodic information for all scenes
  - `motions`: contains motion files for referencing gestures
  - `intervention_gestures`: contains motion files for intervention gestures
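For reference, unpacking could look like the following sketch; the archive name and format are assumptions, so adjust them to the file you actually downloaded:

mkdir -p datasets
# Archive name/format assumed -- replace with the actual download.
tar -xzf ithor-objectnav-gesture.tar.gz -C datasets
# You should now have datasets/ithor-objectnav-gesture/{train,val,test}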
After downloading the dataset, you should install all necessary Python packages for training. We recommend creating a conda environment and following the instructions here provided by AllenAct.
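As a rough sketch (the environment name, Python version, and requirements file below are assumptions; treat the AllenAct instructions as authoritative):

# Create and activate a dedicated environment.
conda create -n gesthor python=3.8 -y
conda activate gesthor
# Install the Python dependencies listed by the AllenAct installation guide.
pip install -r requirements.txt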
Here we only provide running scripts for training. If you want to know more details about our codebase, please refer to the AllenAct website.
The experiment config file is `projects/ithor_gesture/objectnav_ithor_rgbd_resnetgru_ddppo_gesture.py`.
To start training, say you want to run an agent with referencing gestures in the kitchen, try running the following script:
python3 main.py \
projects/ithor_gesture/objectnav_ithor_rgbd_resnetgru_ddppo_gesture.py \
-o storage/example_experiment \
-s 12345 \
--config_kwargs "{'recording_percentage':1.0, \
'use_gesture':True, \
'add_intervention':False, \
'room_type':'kitchen'}"
A few notes on the script:
- With `-o storage/example_experiment` we set the output folder into which results and logs will be saved.
- With `-s 12345` we set the random seed.
- In `config_kwargs`:
  - `recording_percentage`: the fraction of samples to use from the dataset; if you only want half of it, just write 0.5. (default `1.0`)
  - `use_gesture`: whether to use referencing gestures. (default `False`)
  - `add_intervention`: whether to use intervention gestures; these can be used together with referencing gestures in the same episode. (default `False`)
  - `room_type`: one of 5 selections: kitchen, livingroom, bedroom, bathroom, and all, which represents all scenes. (default `'all'`)
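For example, to train with both referencing and intervention gestures on all room types, the same command could be adjusted as follows (the output folder name is just an illustration):

python3 main.py \
projects/ithor_gesture/objectnav_ithor_rgbd_resnetgru_ddppo_gesture.py \
-o storage/all_rooms_intervention \
-s 12345 \
--config_kwargs "{'recording_percentage':1.0, \
'use_gesture':True, \
'add_intervention':True, \
'room_type':'all'}"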
To evaluate your trained model, first locate your checkpoint. For example, if the checkpoint from the training experiment above is `storage/my_checkpoint.pt`, you can run the following script:
python3 main.py \
projects/ithor_gesture/objectnav_ithor_rgbd_resnetgru_ddppo_gesture.py \
-o storage/example_experiment \
-s 12345 \
--config_kwargs "{'recording_percentage':1.0, \
'use_gesture':True, \
'add_intervention':False, \
'room_type':'kitchen'}" \
-c storage/my_checkpoint.pt \
--eval
Here, `-c` specifies which checkpoint model to use, and `--eval` marks the process as evaluation; otherwise, training will resume from the checkpoint model and continue.
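AllenAct also writes TensorBoard logs under the output folder; assuming TensorBoard is installed, you can inspect training and evaluation curves with:

# Point TensorBoard at the experiment output directory (the log subfolder
# layout inside it is managed by AllenAct).
tensorboard --logdir storage/example_experiment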