local override #55
Comments
@julianhess I'm moving the discussion here so there's a more organized record than our Slack convo. I think these are the remaining open questions:
Here is my personal opinion:
Copying over my comments from the Slack convo: I think that the entrypoint should provision disks. There are a few reasons for this, ordered loosely by importance:
The only downside I see is if the user is running Canine on an on-prem Slurm cluster or any other environment where it's not possible to dynamically attach/detach disks. We'd need to test for that. In the future, we'd also need to test for different cloud providers. But for running on GCP, I can't see any downsides.
I'm already planning on using the Google metadata service to get some information about the current node when attaching a disk. I could add a failure case: if it's unable to reach the service, we can assume we're not on a GCP server.
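To make the failure case concrete, here's a rough sketch of what the metadata lookup could look like. This is not the actual Canine code: `fetch_metadata`, `describe_node`, and `attach_disk_command` are hypothetical names, and the 1-second timeout is an arbitrary assumption. The metadata URL, the `Metadata-Flavor: Google` header, and the `gcloud compute instances attach-disk` flags are the standard GCP ones.

```python
import urllib.error
import urllib.request

# Standard GCE metadata endpoint; only resolvable from inside a GCP instance.
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"


def fetch_metadata(path, timeout=1.0, opener=urllib.request.urlopen):
    """Return the metadata value at `path`, or None if the service is unreachable.

    `opener` is injectable so the lookup can be exercised off-cloud (hypothetical design).
    """
    request = urllib.request.Request(
        METADATA_ROOT + path,
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode().strip()
    except (urllib.error.URLError, OSError):
        # DNS failure, timeout, refused connection, HTTP error: assume not on GCP.
        return None


def describe_node(fetch=fetch_metadata):
    """Return {'name', 'zone'} for the current node, or None when not on GCP."""
    name = fetch("instance/name")
    zone = fetch("instance/zone")
    if name is None or zone is None:
        return None
    # The zone comes back as 'projects/<number>/zones/<zone>'; keep the last part.
    return {"name": name, "zone": zone.rsplit("/", 1)[-1]}


def attach_disk_command(node, disk_name):
    """Build (but do not run) the gcloud command that attaches `disk_name` to this node."""
    return [
        "gcloud", "compute", "instances", "attach-disk", node["name"],
        "--disk", disk_name,
        "--zone", node["zone"],
        "--device-name", disk_name,
    ]
```

If `describe_node()` returns `None`, the setup task could skip disk provisioning entirely (or fail loudly), which would cover the on-prem Slurm case without any separate check.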
Current draft of setup tasks added by local download:
Sudo may be a problem, but I figure 99% of users on a GCP cluster will have root access, since they're probably running their own machines. Since this won't work in on-prem clusters anyway, sudo isn't a concern there.
Another thought: the disk will exist outside of the directories normally bind-mounted for docker jobs. |
As mentioned in the GDAN meeting, some people are interested in adding an option to save inputs to node-local storage instead of over the NFS. This is particularly useful for large input files which are only needed once and may clog NFS bandwidth and storage.
In terms of implementation, I think this makes sense as an override which follows the behavior of `Delayed`, except that the file is downloaded to local storage (not over NFS). I'm leaning towards calling this override `Local`, but it is very similar to `Localize`, so it may not be the best choice.
I'm going to try to get to this today or tomorrow, and should have a PR open soon.
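For illustration, this is roughly how the proposed override might appear in a pipeline config, written as a Python dict. It's a sketch under assumptions: the `localization.overrides` layout (input name mapped to a handling mode) is my understanding of the existing config format, `Local` is only the tentative name from this issue, the bucket paths are placeholders, and `inputs_with_override` is a hypothetical helper.

```python
# Hypothetical pipeline config; paths are placeholders, not real buckets.
config = {
    "inputs": {
        "bam": "gs://example-bucket/sample.bam",    # large, needed once
        "reference": "gs://example-bucket/ref.fa",
    },
    "localization": {
        "overrides": {
            "bam": "Local",          # proposed: download to node-local disk, not NFS
            "reference": "Delayed",  # existing behavior this proposal is modeled on
        },
    },
}


def inputs_with_override(config, mode):
    """Return the sorted input names whose override equals `mode`."""
    overrides = config.get("localization", {}).get("overrides", {})
    return sorted(name for name, value in overrides.items() if value == mode)
```

The setup tasks for local download would then only need to be added when `inputs_with_override(config, "Local")` is non-empty.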