
Implement imitation learning baseline #31

Closed
mathfac opened this issue Apr 3, 2019 · 6 comments
Labels
enhancement New feature or request

Comments

@mathfac
Contributor

mathfac commented Apr 3, 2019

Implement an imitation learning baseline that uses the shortest path in action space for each episode as training supervision.
Place it in https://github.com/facebookresearch/habitat-api/tree/master/baselines.

@mathfac mathfac added the enhancement New feature or request label Apr 3, 2019
@mathfac mathfac added this to the Imitation learning baselines milestone Sep 13, 2019
@wannabeOG

Hi
I have been trying to implement an imitation learning baseline (more specifically, behavioral cloning) as detailed in this issue.

First off, just to confirm, when you say "action shortest path", would I be correct in interpreting this to mean that the expert will be an instance of the ShortestPathFollower class?

A really high-level view of what I hope to do is:

  1. For point-goal navigation, I hope to implement an "Expert" class that chooses the best action (in both modes: "geodesic" and "greedy"). The corresponding observations, actions, and rewards will be recorded and saved as a dataset (e.g., as a .npz file).
  2. Implement a Dataset class and a custom data loader that can read from this .npz file and convert it into tensors.
  3. Use this data loader to feed tensors into a model (some variant of the SimpleCNN model used in the PPO baseline) that predicts the action for a particular observation, and train the model on the expert trajectories (a rough sketch of steps 1 and 2 follows below).
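
To make the plan a bit more concrete, here is a rough sketch of steps 1 and 2. The habitat calls follow the ShortestPathFollower example in the repo, but the function/class names, the RGB-only observations, and the goal radius are placeholders I made up, and the exact interfaces may need adjusting to the habitat-api version:

```python
import habitat
import numpy as np
import torch
from habitat.tasks.nav.shortest_path_follower import ShortestPathFollower
from torch.utils.data import Dataset


def collect_expert_trajectories(config_path, num_episodes, out_file=None):
    """Step 1: roll out the shortest-path expert and record (observation, action) pairs."""
    env = habitat.Env(config=habitat.get_config(config_path))
    follower = ShortestPathFollower(env.sim, goal_radius=0.2, return_one_hot=False)

    frames, actions = [], []
    for _ in range(num_episodes):
        observations = env.reset()
        while not env.episode_over:
            goal = env.current_episode.goals[0].position
            action = follower.get_next_action(goal)
            if action is None:  # expert considers the goal reached
                break
            frames.append(observations["rgb"])
            actions.append(action)
            observations = env.step(action)

    frames, actions = np.stack(frames), np.array(actions)
    if out_file is not None:
        np.savez_compressed(out_file, rgb=frames, actions=actions)
    return frames, actions


class ExpertDataset(Dataset):
    """Step 2: read the saved .npz back and serve (observation, action) tensors."""

    def __init__(self, npz_path):
        data = np.load(npz_path)
        self.rgb = torch.as_tensor(data["rgb"], dtype=torch.float32)
        self.actions = torch.as_tensor(data["actions"], dtype=torch.long)

    def __len__(self):
        return len(self.actions)

    def __getitem__(self, idx):
        return self.rgb[idx], self.actions[idx]
```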

Does this seem okay?

@dhruvbatra
Contributor

Sorry for a delayed response.

would I be correct in interpreting this to mean that the expert will be an instance of the ShortestPathFollower class?

I believe so, but someone like @jacobkrantz can confirm.

The corresponding observations, actions, and rewards will be recorded and saved as a dataset (e.g., as a .npz file)
Implement a Dataset class and a custom data loader that can read from this .npz file and convert it into tensors

Unless I'm missing something, this sounds like a bad idea. Writing heavy observations (images, etc.) to disk from a simulator feels like the wrong design. Why can't the observation tensors be fed directly from the simulator to the model?

@erikwijmans
Contributor

One additional note on "(both the modes: "geodesic" and "greedy")": there isn't actually a difference between the two modes; both result in an extremely similar expert (the names are just bad, going to change them now).

Unless I'm missing something, this sounds like a bad idea.

There are some caveats with generating the dataset on-the-fly -- namely that there is a cost to switching scenes and therefore your episodes won't really be IID.

For PointNav, I agree that on-the-fly is a must -- train has 5 million episodes and that is just way too large to ever fit on disk (just the images would take over 20 TB even with excellent image compression).
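
(Back-of-the-envelope with assumed numbers: at roughly 30 KB per compressed RGB frame and on the order of 100-150 steps per episode, 5 million episodes works out to about 5e6 × 125 × 30 KB ≈ 19 TB, before counting depth or any other sensors.)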

@wannabeOG

The reason behind storing an explicit "expert dataset" was to ensure that the episodes could be IID when fed to the model during training, but the memory blow-up concern is the more pressing one.

Does a strategy of having alternate cycles of training the policy and generating expert trajectories sound good enough? To elaborate, suppose I wish to use 100 expert trajectories to train the model. I could generate 5 expert trajectories at a time, temporarily store these trajectories, train the model on it for some epochs and then generate the next 5 trajectories whilst deleting the previously stored trajectories.

Taking this idea to its logical extreme would be to get an (observation, action) pair from the expert and use this to train the model. However, this would break the IID assumption used in Behavioral Cloning since such pairs would be correlated with each other.

In my opinion, having alternate cycles of training and generating would allow a compromise between memory consumption and keeping the training episodes IID.
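
Concretely, the alternating loop I have in mind would look roughly like this (placeholder names throughout; `collect_expert_trajectories` is the helper sketched above, used here to keep the buffer in memory instead of writing it to disk, and `policy` would be a SimpleCNN-style model mapping observations to action logits):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def behavioral_cloning(config_path, policy, optimizer, num_cycles=20,
                       episodes_per_cycle=5, epochs_per_cycle=4,
                       batch_size=64, device="cuda"):
    for _ in range(num_cycles):
        # 1. Generate a small buffer of expert trajectories (kept in memory only).
        frames, actions = collect_expert_trajectories(config_path, episodes_per_cycle)
        loader = DataLoader(
            TensorDataset(
                torch.as_tensor(frames, dtype=torch.float32),
                torch.as_tensor(actions, dtype=torch.long),
            ),
            batch_size=batch_size,
            shuffle=True,  # shuffling decorrelates pairs within the buffer
        )

        # 2. Train on this buffer for a few epochs with a cross-entropy loss
        #    on the expert actions.
        for _ in range(epochs_per_cycle):
            for obs_batch, act_batch in loader:
                logits = policy(obs_batch.to(device))
                loss = F.cross_entropy(logits, act_batch.to(device))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        # 3. Drop the buffer before generating the next one, so memory stays bounded.
        del frames, actions, loader
```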

@erikwijmans
Contributor

Yeah, collecting a set of trajectories from a set of environments, learning on them, deleting, and repeating makes sense to me.

dhruvbatra pushed a commit that referenced this issue May 10, 2020
This PR provides a much faster and more reliable greedy follower. While it doesn't generate the shortest path in action space, the paths it produces still tend to be very good.
@mathfac
Contributor Author

mathfac commented Jan 14, 2021

Thank you @mukulkhanna, @erikwijmans, and @Skylion007 for the reviews. Closing this issue and opening another one for the DAgger baseline.
