This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[Chunk teacher] Bug with distributed evaluation #2935

Merged: 2 commits, Aug 4, 2020

Conversation

emilydinan
Contributor

@emilydinan emilydinan commented Aug 4, 2020

Patch description
Fixes a very subtle bug when using the chunk teacher with distributed evaluation. The counter used to place samples on different GPUs was being reset every time a chunk was loaded, so certain GPUs received the wrong number of samples, but only if the valid set had more than one chunk assigned to it. The system then hung forever at self.samples.get(), because num_episodes exceeded the actual number of samples on the queue.
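To make the failure mode concrete, here is a minimal sketch of the counting problem. It is not the ParlAI code: it assumes samples are placed round-robin with a `count % world_size` rule, and the names `assign_samples`, `expected_per_rank`, `chunk_sizes`, and `world_size` are hypothetical, chosen only for illustration.

```python
def assign_samples(chunk_sizes, world_size, reset_per_chunk):
    """Count how many samples each rank receives under round-robin placement.

    Hypothetical model of the placement logic, not the actual teacher code.
    """
    placed = [0] * world_size
    count = 0
    for size in chunk_sizes:
        if reset_per_chunk:
            # Buggy behaviour: the counter restarts whenever a chunk is loaded.
            count = 0
        for _ in range(size):
            placed[count % world_size] += 1
            count += 1
    return placed


def expected_per_rank(total, world_size):
    """Samples each rank expects if one global counter is used throughout."""
    return [len(range(rank, total, world_size)) for rank in range(world_size)]


chunk_sizes = [5, 5]  # assumed: a valid set split into two chunks
world_size = 4        # assumed: four GPUs

expected = expected_per_rank(sum(chunk_sizes), world_size)
buggy = assign_samples(chunk_sizes, world_size, reset_per_chunk=True)
fixed = assign_samples(chunk_sizes, world_size, reset_per_chunk=False)

# Ranks that receive fewer samples than they expect would block on the queue.
starved = [rank for rank in range(world_size) if buggy[rank] < expected[rank]]

print("expected:", expected)
print("buggy:   ", buggy)
print("fixed:   ", fixed)
print("starved ranks:", starved)
```

In this toy setup rank 1 expects three samples but only two are ever enqueued for it, so a blocking `get()` for the third would never return. With a single chunk the reset is harmless, which is consistent with the bug only showing up when the valid set spans more than one chunk.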

💀 RIP a day of my life 💀

Contributor

@stephenroller stephenroller left a comment

cheers

Contributor

@wyshi wyshi left a comment

Woohoo! It was fun debugging this one.

@emilydinan emilydinan merged commit 77f34c9 into master Aug 4, 2020
@emilydinan emilydinan deleted the reallyfunbuggreattimes branch August 4, 2020 22:02
4 participants