Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bug in EthosBinary dataset class and model directory copying logic in RayTuneReportCallback #1129

Merged
merged 13 commits into from
Mar 25, 2021

Conversation

ANarayan
Copy link
Collaborator

This PR fixes two bugs:

  1. processes the downloaded ethos binary dataset to be comma separated in oppose to semicolon separated.
  2. ensures that the model directory copying process in the RayTuneReportCallback is atomic: prior to this addition, issues arose where files within the model checkpoint directory were being accessed before being completely written.

cc: @tgaddair

@@ -722,7 +723,14 @@ def on_epoch_end(self, trainer, progress_tracker, save_path):
if trainer.is_coordinator():
with tune.checkpoint_dir(step=progress_tracker.epoch) as checkpoint_dir:
checkpoint_model = os.path.join(checkpoint_dir, 'model')
shutil.copytree(save_path, checkpoint_model)
if not os.path.isdir(checkpoint_model):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here about the fact that the previous implementation was copytree (you can leave that line commented) but that that's not atomic and so we have to do this temp + move thing, otherwise in the future we may forget about it and maybe replace it with copytree again :)

Copy link
Collaborator Author

@ANarayan ANarayan Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great feedback -- addressed in this commit here: c924b9f

Copy link
Collaborator

@tgaddair tgaddair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Atomic copy is awesome.

@w4nderlust w4nderlust merged commit e281f74 into ludwig-ai:master Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants