Fix load hanging in distributed processes #3839

muellerzr · 2022-11-14T15:47:28Z

Should solve #3836

As with the export function, we need to include a distrib_barrier() here so that all the processes wait and sync before loading in the model. The wait_for_everyone() I found doesn't actually apply inside the Learner: because it's added during the context manager, that version of the Learner is stored differently (so it doesn't have learn.accelerator, for instance). Fun memory things 🙃

cc @jph00
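A minimal sketch of the barrier-before-load pattern described above, assuming a flow where only rank 0 writes the checkpoint and every rank then loads it. It does not use fastai or torch.distributed: threads stand in for processes, `threading.Barrier` stands in for `distrib_barrier()`, and the names `save_then_load`, `run` and the JSON "checkpoint" are hypothetical. Because no rank may pass the barrier until rank 0 has finished writing, no rank can try to load a missing or half-written file.

```python
import json
import os
import tempfile
import threading


def save_then_load(rank: int, path: str, barrier: threading.Barrier, results: dict) -> None:
    """Simulate one worker (hypothetical): rank 0 saves, every rank loads."""
    if rank == 0:
        # Only the main process writes the checkpoint; the file is closed
        # (and flushed) when the `with` block exits.
        with open(path, "w") as f:
            json.dump({"weights": [1.0, 2.0, 3.0]}, f)
    # Stand-in for distrib_barrier(): every rank, including rank 0,
    # must arrive here before any of them is allowed to continue.
    barrier.wait()
    with open(path) as f:
        results[rank] = json.load(f)


def run(world_size: int = 4) -> dict:
    """Start `world_size` simulated ranks and collect what each one loaded."""
    results: dict = {}
    barrier = threading.Barrier(world_size)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        workers = [
            threading.Thread(target=save_then_load, args=(rank, path, barrier, results))
            for rank in range(world_size)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()  # join before the temporary directory is deleted
    return results


if __name__ == "__main__":
    print(run(4))
```

One caveat of any barrier: it only releases once every rank reaches it, so if some ranks skip the call, the ranks that did call it stay blocked.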


muellerzr · 2022-11-14T21:04:51Z

@jph00 not sure where the diff is coming from; nbdev_export shows no new changes when building from 2.3.9 and 2.3.10 😕

Commit 44a5b08: Fix load hanging in distributed processes

muellerzr requested a review from jph00 as a code owner November 14, 2022 15:47

Commit a8c2e9e: Merge branch 'master' into fix-save

jph00 merged commit 1e0e2d7 into fastai:master Feb 15, 2023

jph00 added the bug label Feb 15, 2023
