The official PyTorch implementation of "Learning Where to See for Navigation: A Self-Supervised Vision-Action Pre-Training Approach".
Main libraries:
- PyTorch: the main ML framework
- Comet.ml: code tracking and experiment logging
- OmegaConf: for managing configuration files
First, create a virtual environment for the project:

```bash
python3 -m venv .venv
source .venv/bin/activate
```
Then, install the latest version of PyTorch from the official site. Finally, install the remaining dependencies:

```bash
pip install -r requirements.txt
```
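To sanity-check the installation, the following generic verification snippet (not part of this repository) can be run:

```python
import torch

# Print the installed PyTorch version and whether a CUDA device is visible.
print(torch.__version__)
print(torch.cuda.is_available())
```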
To set up Comet.ml, follow the official documentation.
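As a minimal sketch of what the Comet.ml setup looks like (the project and workspace names below are placeholders; the actual integration lives in the training code):

```python
from comet_ml import Experiment

# If api_key is omitted, Comet reads it from the COMET_API_KEY
# environment variable or from ~/.comet.config.
experiment = Experiment(
    project_name="vanp",         # placeholder project name
    workspace="your-workspace",  # placeholder workspace
)

# Typical logging calls used during training.
experiment.log_parameter("batch_size", 64)
experiment.log_metric("loss", 0.42, step=1)
```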
Please follow this guide to download the dataset.
To run the pretext training, edit the config first, then run:

```bash
./run.sh train
```
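As a rough illustration of how an OmegaConf config can be edited or overridden (the file name and keys below are hypothetical, not the repository's actual schema):

```python
from omegaconf import OmegaConf

# Load the YAML config; "config.yaml" and its keys are hypothetical examples.
cfg = OmegaConf.load("config.yaml")

# Override values programmatically before training...
cfg.train.batch_size = 64

# ...or merge in overrides passed on the command line (e.g., `train.lr=3e-4`).
cfg = OmegaConf.merge(cfg, OmegaConf.from_cli())

print(OmegaConf.to_yaml(cfg))
```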
Unlike ImageNet weights, which primarily focus on a single salient object in the environment regardless of its distance, the proposed VANP attends more accurately to multiple nearby objects that directly influence the robot's trajectory, activating regions corresponding to pedestrians, cars, trash cans, doors, and other relevant elements.
However, the model sometimes fails to attend to the important regions affecting the trajectory; for example, it may activate on the sky or produce many unnecessary activations.
Thanks to the authors of GNM, VICReg, and Barlow Twins for making their code public.