# Frequently Asked Questions

## Are the step-by-step instructions aligned with subgoals?

Yes. Each step-by-step instruction has a corresponding subgoal in the training and validation trajectories (see the sketch below). If you use this alignment during training, please consult the leaderboard submission guidelines.
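
A minimal sketch of reading this alignment from a trajectory's annotation file, assuming ALFRED's `traj_data.json` layout; the exact field names (`turk_annotations`, `high_descs`, `high_idx`) may differ across dataset versions, so treat them as assumptions:

```python
# Hypothetical sketch: pair each low-level action with its aligned
# step-by-step instruction via the subgoal index (high_idx).
import json

with open('traj_data.json') as f:       # hypothetical path
    traj = json.load(f)

# One instruction per subgoal, for the first annotator.
instructions = traj['turk_annotations']['anns'][0]['high_descs']

for action in traj['plan']['low_actions']:
    subgoal_idx = action['high_idx']    # index of the aligned subgoal
    print(instructions[subgoal_idx], '->', action['api_action']['action'])
```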

## Can I get a 100% success rate with ground-truth trajectories?

You should be able to achieve a >99% success rate on training and validation tasks with the ground-truth actions and masks from the dataset. Occasionally, non-deterministic behavior in THOR can lead to failures, but these are extremely rare.

## Can you train an agent without mask prediction?

Mask prediction is an important part of the ALFRED challenge. Unlike non-interactive settings (e.g., vision-language navigation), here the agent must specify exactly what it wants to interact with.
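
As a rough illustration, assuming ALFRED's `ThorEnv` wrapper: interaction actions take a binary pixel mask selecting the target object in the egocentric frame. `env` and `pred_mask` are placeholders here, and the exact `va_interact` signature and return values may differ across repo versions:

```python
# Hypothetical sketch: an interaction action grounded by a predicted mask.
# pred_mask is a (H, W) binary mask from your model over the current frame.
success, event, _, err, _ = env.va_interact(
    action='PickupObject',       # interaction action to attempt
    interact_mask=pred_mask,     # selects the object instance to act on
)
```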

## Why does feat_conv.pt in the Full Dataset have 10 more frames than the number of images?

The last 10 frames are copies of the features from the last image frame.
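
A minimal sketch of realigning the precomputed features with the image frames by dropping that padding; `feat_conv.pt` is loaded from a hypothetical path and `num_images` stands for the number of raw images in the trajectory:

```python
# Hypothetical sketch: trim the 10 duplicated feature frames at the end
# so there is exactly one feature map per image frame.
import torch

feats = torch.load('feat_conv.pt')  # shape: (num_images + 10, ...)
feats = feats[:num_images]          # one feature per image frame
```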

## How do I get panoramic image observations?

You can use augment_trajectories.py to replay all the trajectories and augment the visual observations. At each step, use the THOR API to look around and take 6-12 shots of the surroundings, then stitch these shots together into a panoramic image for that frame; a rough sketch follows below. You might have to set 'forceAction': True for smooth MoveAhead/Rotate/Look actions. Note that getting panoramic images at test time incurs the additional cost of physically looking around with the agent.
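
A rough sketch of the capture step, assuming an initialized ai2thor `Controller`. Whether `RotateRight` accepts a `degrees` argument depends on your THOR version (older builds rotate in fixed 90-degree steps), and the stitch here is a naive horizontal concatenation rather than a true cylindrical projection:

```python
# Hypothetical sketch: spin the agent in place and concatenate the shots.
import numpy as np

def capture_panorama(controller, num_shots=8):
    shots = []
    for _ in range(num_shots):
        event = controller.step(dict(action='RotateRight',
                                     degrees=360.0 / num_shots,
                                     forceAction=True))
        shots.append(event.frame)  # egocentric RGB frame as a numpy array
    return np.hstack(shots)       # (H, W * num_shots, 3) panorama
```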