
local override #55

Closed
agraubert opened this issue Apr 9, 2020 · 5 comments · Fixed by #56
Assignees: agraubert
Labels: enhancement (New feature or request)

@agraubert (Collaborator)

As mentioned in the GDAN meeting, some people are interested in adding an option to save inputs to node-local storage instead of over the NFS. This is particularly useful for large input files which are only needed once and may clog NFS bandwidth and storage.

In terms of implementation, I think this makes sense as an override which follows the behavior of Delayed, except that the file is downloaded to local storage (not over NFS). I'm leaning towards calling this override Local, but it is very similar to Localize, so it may not be the best choice.
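
A rough sketch of how this might look from the user's side, assuming the current overrides syntax (the input names below are just placeholders):

    # Hypothetical config sketch; 'Local' is the proposed override name and the
    # input names are made up. 'Delayed' is the existing override it would mimic.
    config = {
        'localization': {
            'overrides': {
                'reference_index': 'Delayed',  # existing: deferred download, still lands on the NFS
                'input_bam': 'Local'           # proposed: deferred download to node-local storage
            }
        }
    }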

I'm going to try to get to this today or tomorrow, and should have a PR open soon.

agraubert added the enhancement (New feature or request) label on Apr 9, 2020
agraubert self-assigned this on Apr 9, 2020
@agraubert (Collaborator, Author)

@julianhess I'm moving the discussion here just so there's a more organized record than our Slack convo. I think these are the remaining open questions:

  1. Where do downloaded files go? Should jobs just mktemp -d and save local files there, or should the user have control over where files are saved?
  2. Should workers mount secondary disks over the download location (i.e. over /tmp or a user-specified dir)?

Here is my personal opinion:

  1. It's obviously simpler to just say "no, you don't get to choose where these files go", but I'm definitely not opposed to adding a localizer arg to specify where these files should be saved on workers.
  2. Personally, I'm against the nodes themselves provisioning extra disks. I think it's more straightforward to just say "Transient backends can specify the boot disk size of workers, to accommodate local file downloads." Again, I'm okay with transient backends being able to mount secondary disks on their workers, but I think that should be a backend feature. It would definitely add a bit of complexity, since the gcloud commands to boot workers would need to include secondary-disk options.
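
For reference, sizing worker boot disks up front is just a flag at instance-creation time; a rough sketch (machine name, type, and size below are hypothetical, but --boot-disk-size is an existing gcloud flag):

    # Hypothetical worker-creation command a transient backend might issue;
    # only --boot-disk-size is the point here, everything else is illustrative.
    worker_create_cmd = (
        'gcloud compute instances create {name} '
        '--machine-type {machine_type} '
        '--boot-disk-size {disk_size}GB'
    ).format(
        name='canine-worker-0',
        machine_type='n1-standard-4',
        disk_size=200
    )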

@julianhess (Collaborator)

Copying over my comments from the Slack convo:

I think that the entrypoint should provision disks. There are a few reasons for this, ordered loosely by importance:

  • There is currently no “spectrum” of disks within the node configurations. We don’t have a set of predefined nodes with different disk sizes the way we do with RAM/CPU, and indeed, this would be impractical (the set of possible {RAM}×{CPU} nodes is already large; adding {disk} as a third Cartesian product would make it even larger). Thus, the only way for the user to ensure the local disk is big enough would be to create all nodes with a disk sized for the most demanding task, which would waste resources.
    • Relatedly, we could delete the disk once the task finishes. Since instances get recycled between tasks, this would save a lot of resources: if Task A requires a 1 TB local disk, that disk need only exist for the duration of Task A. If a 1 TB disk were permanently attached to an instance, it would likely sit idle whenever tasks that don't require any local disk are dispatched to that instance.
  • This would be really easy to add to the entrypoint: it would just require adding a line to the localization.sh script and a line to teardown.sh. Plus, by making this a localizer option, it fits nicely into the current framework of constructing localization.sh/teardown.sh.
  • Provisioning the disk on the fly solves Slurm’s crappy disk accounting problem (at least for tasks that need local disk). If multiple tasks share a single local disk, it’s possible to oversubscribe it, but if 1 disk corresponds to 1 Slurm task, then we need only request a disk as large as the files being localized (thanks to Aaron for pointing out that the only files that would be stored on a local disk would be from cloud storage, so we know ahead of time how large the disk needs to be; see the sizing sketch at the end of this comment).

The only downside I see is if the user is running Canine on an on-prem Slurm cluster or any environment where it's not possible to dynamically attach/detach disks. We'd need to test for that. In the future, we'd also need to test for different cloud providers. But for running on GCP, I can't see any downsides.
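
As an aside, the up-front sizing could work roughly like this (required_disk_gb is just an illustrative helper, not anything that exists in canine):

    # Rough sizing sketch: sum the sizes of the cloud files destined for the local
    # disk and round up with some headroom. `gsutil du -s` prints "<bytes> <path>".
    import math
    import subprocess

    def required_disk_gb(gs_paths, overhead=1.1):
        total_bytes = 0
        for path in gs_paths:
            out = subprocess.check_output(['gsutil', 'du', '-s', path])
            total_bytes += int(out.split()[0])
        # GCP persistent disks have a 10 GB minimum
        return max(10, math.ceil(total_bytes * overhead / 1024**3))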

@agraubert (Collaborator, Author)

I'm already planning on using the Google metadata service to get some information about the current node when attaching a disk. I could add a failure case: if the entrypoint can't reach the service, we can assume we're not on a GCP server.
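
Something like this is what I have in mind for the check (a rough sketch, not the final implementation):

    # Rough sketch: the metadata server is only reachable from inside GCE, so a
    # failed or timed-out request means we're probably not on a GCP node.
    import requests

    def on_gcp_node(timeout=1.0):
        try:
            response = requests.get(
                'http://metadata.google.internal/computeMetadata/v1/instance/name',
                headers={'Metadata-Flavor': 'Google'},
                timeout=timeout
            )
            return response.status_code == 200
        except requests.RequestException:
            return False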

@agraubert (Collaborator, Author)

Current draft of setup tasks added by local download:

    [
        'export CANINE_LOCAL_DISK_SIZE={}GB'.format(local_download_size),
        'export CANINE_LOCAL_DISK_TYPE={}'.format(self.temporary_disk_type),
        'export CANINE_NODE_NAME=$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/name)',
        'export CANINE_NODE_ZONE=$(basename $(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/zone))',
        'gcloud compute disks create {} --size {} --type pd-{} --zone $CANINE_NODE_ZONE'.format(
            disk_name,
            local_download_size,
            self.temporary_disk_type
        ),
        'gcloud compute instances attach-disk $CANINE_NODE_NAME --zone $CANINE_NODE_ZONE --disk {} --device-name {}'.format(
            disk_name,
            device_name
        ),
        'gcloud compute instances set-disk-auto-delete $CANINE_NODE_NAME --zone $CANINE_NODE_ZONE --disk {}'.format(disk_name),
        'sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/disk/by-id/google-{}'.format(device_name),
        'sudo mkdir -p {}/{}'.format(self.local_download_dir, disk_name),
        'sudo mount -o discard,defaults /dev/disk/by-id/google-{} {}/{}'.format(
            device_name,
            self.local_download_dir,
            disk_name
        ),
        'sudo chmod a+w {}/{}'.format(self.local_download_dir, disk_name)
    ]

Sudo may be a problem, but I figure 99% of users on a GCP cluster will have root access, since they're probably running their own machines. Since this won't work on on-prem clusters anyway, sudo isn't a concern there.
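
For completeness, a matching draft of the teardown tasks; this is a hypothetical helper, with the disk and mount names assumed to line up with the setup block above:

    # Draft teardown counterpart: unmount, detach, and delete the per-job disk.
    # Assumes CANINE_NODE_NAME / CANINE_NODE_ZONE are still exported in
    # teardown.sh (or are re-queried from the metadata service there).
    def teardown_tasks(local_download_dir, disk_name):
        return [
            'sudo umount {}/{}'.format(local_download_dir, disk_name),
            'gcloud compute instances detach-disk $CANINE_NODE_NAME --zone $CANINE_NODE_ZONE --disk {}'.format(disk_name),
            'gcloud compute disks delete {} --zone $CANINE_NODE_ZONE --quiet'.format(disk_name)
        ]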

@agraubert (Collaborator, Author)

agraubert commented Apr 10, 2020

Another thought: the disk will exist outside of the directories normally bind-mounted for docker jobs.
wolf flows using task auto-dockerization will need a way to detect the extra directory that needs mounting, and other tasks and pipelines which use docker in their scripts will also need to bind-mount the right path. Maybe canine should export $CANINE_DOCKER_ARGS in the entrypoint and just suggest that users include it in their docker command?
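
Rough sketch of that idea (CANINE_DOCKER_ARGS is just the variable name proposed above; the mount path assumes the {local_download_dir}/{disk_name} layout from the setup block in my earlier comment):

    # Hypothetical helper: one more setup task exporting the bind-mount
    # arguments for the node-local disk.
    def docker_args_task(local_download_dir, disk_name):
        mount = '{}/{}'.format(local_download_dir, disk_name)
        return 'export CANINE_DOCKER_ARGS="-v {0}:{0}"'.format(mount)

Task scripts that use docker would then just run docker run $CANINE_DOCKER_ARGS ... and see the local disk at the same path inside the container.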
