Adding SnapshotManager #1598
This allows users to create a snapshot of the current `git` repository and launch the job from this snapshot. This can prevent jobs that are slow to start, or that get requeued, from picking up local changes made after submission.
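A minimal sketch of the "launch the job from a snapshot" idea, written as a context manager. The name `snapshot_context` and the `make_snapshot` callback are hypothetical, not the PR's actual API; the sketch also assumes the scheduler starts jobs in the directory they were submitted from (Slurm's default), so changing into the snapshot before submitting makes the job run against the copied code.

```python
import contextlib
import os
import sys
from pathlib import Path
from typing import Callable, Iterator


@contextlib.contextmanager
def snapshot_context(
    snapshot_dir: Path, make_snapshot: Callable[[Path], object]
) -> Iterator[Path]:
    """Hypothetical sketch: copy the code, then run the body from the copy."""
    snapshot_dir = Path(snapshot_dir).resolve()
    make_snapshot(snapshot_dir)  # e.g. git ls-files + rsync into snapshot_dir
    old_cwd = os.getcwd()
    old_path = list(sys.path)
    os.chdir(snapshot_dir)  # jobs submitted here start from the copy
    sys.path.insert(0, str(snapshot_dir))  # imports resolve to the snapshot first
    try:
        yield snapshot_dir
    finally:
        # Restore the caller's environment even if submission fails.
        os.chdir(old_cwd)
        sys.path[:] = old_path
```

Later edits to the original checkout no longer affect a job submitted inside the `with` block, because that job only sees the snapshot directory.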
submitit/snapshot.py
continue
dest_file = os.path.join(self.snapshot_dir, file)
os.makedirs(os.path.dirname(dest_file), exist_ok=True)
shutil.copy(os.path.join(root_dir, file), dest_file)
Copying a batch of files with a single rsync call is likely faster.
If we use rsync, I think rsync should appear in the name to make this clear to the user.
Also, rsync can respect .gitignore, which we probably want in order to avoid copying models/checkpoints in the root dir: `rsync --exclude='.git/' --filter=':- .gitignore'`
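A small sketch of how that command could be built from Python as an argument list (the helper name `gitignore_rsync_args` is hypothetical). When rsync is called through `subprocess` with a list, the shell quotes from the command above are dropped; `shlex.join` shows the equivalent shell command. Note that rsync's `:-` dir-merge rule only approximates git's ignore semantics.

```python
import shlex
from pathlib import Path


def gitignore_rsync_args(src: Path, dst: Path) -> list:
    """Hypothetical helper: rsync argv that skips .git/ and .gitignore'd paths."""
    return [
        "rsync",
        "-a",
        "--exclude=.git/",
        # ":- .gitignore" is a per-directory merge rule: rsync reads the
        # .gitignore in each directory it visits as a list of exclude patterns.
        "--filter=:- .gitignore",
        f"{src}/",  # trailing slash: copy the contents, not the directory itself
        str(dst),
    ]


cmd = gitignore_rsync_args(Path("/repo"), Path("/snap"))
print(shlex.join(cmd))
```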
Thanks @stephenroller for the suggestion! I'm now getting a list of all the checked-in files using `git ls-files` and stashing them in a temporary file that gets passed to rsync. I think this addresses the comments above (it's fast and avoids copying models/checkpoints, etc.).
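The approach described above can be sketched as follows. This is an illustration, not the PR's code: the function names are hypothetical, and it assumes rsync's `--files-from` and `--from0` options as the way the temporary file list is handed to rsync. Using NUL-separated names (`git ls-files -z`) keeps file names with spaces or special characters intact.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def list_tracked_files(root_dir: Path) -> list[str]:
    """Return the paths, relative to root_dir, of every file in the git index."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],  # NUL-terminated names
        cwd=root_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def rsync_command(file_list: Path, root_dir: Path, snapshot_dir: Path) -> list[str]:
    """Build an rsync call that copies only the paths named in file_list."""
    return [
        "rsync",
        "-a",
        "--from0",  # file_list is NUL-separated
        f"--files-from={file_list}",  # paths are relative to the source dir
        f"{root_dir}/",
        str(snapshot_dir),
    ]


def snapshot_repo(root_dir: Path, snapshot_dir: Path) -> None:
    """Copy the tracked files of root_dir into snapshot_dir in one rsync call."""
    files = list_tracked_files(root_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    # Stash the file list in a temporary file, then pass that file to rsync.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write("".join(f + "\0" for f in files))
        file_list = Path(fh.name)
    try:
        subprocess.run(rsync_command(file_list, root_dir, snapshot_dir), check=True)
    finally:
        file_list.unlink()
```

Untracked files such as checkpoints never enter the list, so they are skipped without needing any exclude patterns.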
I would also check that rsync is installed in `__init__`, and raise a clear error explaining that it is required and how to install it. That would be a nice user experience.
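One possible shape for that check, as a sketch: `require_rsync` is a hypothetical helper that an `__init__` could call, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, Optional


def require_rsync(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path to rsync, or fail early with an actionable message."""
    path = which("rsync")
    if path is None:
        raise RuntimeError(
            "Snapshotting requires the 'rsync' command, which was not found on PATH. "
            "Install it with your system package manager "
            "(e.g. 'apt install rsync', 'yum install rsync' or 'brew install rsync')."
        )
    return path
```

Failing at construction time surfaces the problem before any files are copied or jobs are submitted.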
Thanks for the contribution; several users have asked for this!
My comment has been addressed, but I'm deferring to Guillaume.
@stephenroller do you know what would be the best way to have …?
Apparently we can just add …
AFAIK, rsync is installed by default on nearly every Linux distribution out there.
Hmm, I'm not sure I understand what's going on with rsync. I can see it being installed in the …
Sorry for the mess. I think you need to add the …, and the changes to venv caching aren't needed because rsync isn't part of the venv.
Minor nit, LGTM otherwise.
@jrapin do you think we should put this as `submitit.SnapshotManager` or as `submitit.helpers.SnapshotManager`? If it's the second, I think snapshot.py can be inlined inside helpers.py.
I was going to say exactly that before seeing your comment. `helpers` provides some tools (e.g. `CommandFunction`, `FunctionSequence`, etc.) that can be used to launch jobs but are not strictly necessary, so I think it fits well.
And by the way, a line explaining how it works there would be great: https://github.com/facebookincubator/submitit/blob/master/docs/structure.md#helpers
Also adding a brief description in the docs.
Thanks a lot @lematt1991 for keeping up with the stream of comments!
It's part of the 1.2.0 release. I'll think about whether we should add a …