Fix load hanging in distributed processes #3839

Merged 2 commits on Feb 15, 2023

muellerzr
Contributor

Should solve #3836

As with the export function, we need to call distrib_barrier() here so that all the processes wait and sync before loading the model. I found that wait_for_everyone() doesn't actually apply inside the Learner: since it's added during the context manager, that version of the Learner is stored differently (it doesn't have learn.accelerator, for instance). Fun memory things 🙃
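
For anyone wondering why the barrier matters, here is a minimal sketch of the pattern, not the actual fastai code. It uses threads and `threading.Barrier` as a stand-in for distributed processes and distrib_barrier(); `run_ranks` and the `model.json` checkpoint file are hypothetical names made up for the illustration.

```python
import json
import tempfile
import threading
from pathlib import Path


def run_ranks(world_size: int = 4) -> list:
    """Simulate `world_size` ranks that sync at a barrier before loading."""
    # Stand-in for distrib_barrier(): blocks until every rank has arrived.
    barrier = threading.Barrier(world_size)
    results = [None] * world_size

    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "model.json"  # hypothetical checkpoint file

        def worker(rank: int) -> None:
            if rank == 0:
                # Only rank 0 writes the checkpoint.
                ckpt.write_text(json.dumps({"weights": [1, 2, 3]}))
            # Every rank waits here, so the file is guaranteed to exist
            # before any rank tries to read it.
            barrier.wait()
            results[rank] = json.loads(ckpt.read_text())["weights"]

        threads = [
            threading.Thread(target=worker, args=(rank,))
            for rank in range(world_size)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    return results


if __name__ == "__main__":
    print(run_ranks())
```

Without the `barrier.wait()` call, a non-zero rank could reach the read before rank 0 has finished writing. In real distributed training the ranks are separate processes rather than threads, and the sync point would be distrib_barrier().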

cc @jph00

@muellerzr
Contributor Author

@jph00 not sure where the diff is coming from; nbdev_export shows nothing new changing when building from 2.3.9 and 2.3.10 😕

@jph00 jph00 merged commit 1e0e2d7 into fastai:master Feb 15, 2023
@jph00 jph00 added the bug label Feb 15, 2023