
local override #55

Closed
agraubert opened this issue Apr 9, 2020 · 5 comments · Fixed by #56
Assignees: agraubert
Labels: enhancement (New feature or request)

@agraubert (Collaborator)

As mentioned in the GDAN meeting, some people are interested in adding an option to save inputs to node-local storage instead of over the NFS. This is particularly useful for large input files which are only needed once and may clog NFS bandwidth and storage.

In terms of implementation, I think this makes sense as an override which follows the behavior of Delayed, except that the file is downloaded to local storage (not over NFS). I'm leaning towards calling this override Local, but it is very similar to Localize, so it may not be the best choice.
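
A rough sketch of how this might look from the user's side, assuming the current overrides syntax (the input names below are just placeholders):

    # Hypothetical config sketch; 'Local' is the proposed override name and the
    # input names are made up. 'Delayed' is the existing override it would mimic.
    config = {
        'localization': {
            'overrides': {
                'reference_index': 'Delayed',  # existing: deferred download, still lands on the NFS
                'input_bam': 'Local'           # proposed: deferred download to node-local storage
            }
        }
    }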

I'm going to try to get to this today or tomorrow, and should have a PR open soon.

agraubert added the enhancement (New feature or request) label on Apr 9, 2020
agraubert self-assigned this on Apr 9, 2020
@agraubert (Collaborator, Author)

@julianhess I'm moving the discussion here just so there's a more organized record than our Slack convo. I think these are the remaining open questions:

  1. Where do downloaded files go? Should jobs just mktemp -d and save local files there, or should the user have control over where files are saved?
  2. Should workers mount secondary disks over the download location (i.e. over /tmp or a user-specified dir)?

Here is my personal opinion:

  1. It's obviously simpler to just say "no, you don't get to choose where these files go", but I'm definitely not opposed to adding a localizer arg to specify where these files should be saved on workers.
  2. Personally, I'm against the nodes themselves provisioning extra disks. I think it's more straightforward to just say "Transient backends can specify the boot disk size of workers, to accommodate local file downloads." Again, I'm okay with transient backends being able to mount secondary disks on their workers, but I think that should be a backend feature. It would definitely add a bit of complexity, since the gcloud commands to boot workers would need to include secondary-disk options.
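
For reference, sizing worker boot disks up front is just a flag at instance-creation time; a rough sketch (machine name, type, and size below are hypothetical, but --boot-disk-size is an existing gcloud flag):

    # Hypothetical worker-creation command a transient backend might issue;
    # only --boot-disk-size is the point here, everything else is illustrative.
    worker_create_cmd = (
        'gcloud compute instances create {name} '
        '--machine-type {machine_type} '
        '--boot-disk-size {disk_size}GB'
    ).format(
        name='canine-worker-0',
        machine_type='n1-standard-4',
        disk_size=200
    )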

@julianhess (Collaborator)

Copying over my comments from the Slack convo:

I think that the entrypoint should provision disks. There are a few reasons for this, ordered loosely by importance:

  • There is currently no “spectrum” of disks within the node configurations. We don’t have a set of predefined nodes with different disk sizes the way we do with RAM/CPU, and indeed, this would be impractical (the set of possible {RAM}×{CPU} nodes is already large; adding {disk} as a third Cartesian product would make it even larger). Thus, the only way for the user to ensure the local disk is big enough would be to create all nodes with a disk sized for the most demanding task, which would waste resources.
    • Relatedly, we could delete the disk once the task finishes. Since instances get recycled between tasks, this would save a lot of resources: if Task A requires a 1 TB local disk, that disk need only exist for the duration of Task A. If a 1 TB disk were permanently attached to an instance, it would likely sit idle whenever tasks that don't require any local disk are dispatched to that instance.
  • This would be really easy to add to the entrypoint: it would just require adding a line to the localization.sh script and a line to teardown.sh. Plus, by making this a localizer option, it fits nicely into the current framework of constructing localization.sh/teardown.sh.
  • Provisioning the disk on the fly solves Slurm’s crappy disk accounting problem (at least for tasks that need local disk). If multiple tasks share a single local disk, it’s possible to oversubscribe it, but if 1 disk corresponds to 1 Slurm task, then we need only request a disk as large as the files being localized (thanks to Aaron for pointing out that the only files that would be stored on a local disk would be from cloud storage, so we know ahead of time how large the disk needs to be; see the sizing sketch at the end of this comment).

The only downside I see is if the user is running Canine on an on-prem Slurm cluster or any environment where it's not possible to dynamically attach/detach disks. We'd need to test for that. In the future, we'd also need to test for different cloud providers. But for running on GCP, I can't see any downsides.
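
As an aside, the up-front sizing could work roughly like this (required_disk_gb is just an illustrative helper, not anything that exists in canine):

    # Rough sizing sketch: sum the sizes of the cloud files destined for the local
    # disk and round up with some headroom. `gsutil du -s` prints "<bytes> <path>".
    import math
    import subprocess

    def required_disk_gb(gs_paths, overhead=1.1):
        total_bytes = 0
        for path in gs_paths:
            out = subprocess.check_output(['gsutil', 'du', '-s', path])
            total_bytes += int(out.split()[0])
        # GCP persistent disks have a 10 GB minimum
        return max(10, math.ceil(total_bytes * overhead / 1024**3))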

@agraubert (Collaborator, Author)

I'm already planning on using the Google metadata service to get some information about the current node when attaching a disk. I could add a failure case: if the entrypoint can't reach the service, we can assume we're not on a GCP server.
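
Something like this is what I have in mind for the check (a rough sketch, not the final implementation):

    # Rough sketch: the metadata server is only reachable from inside GCE, so a
    # failed or timed-out request means we're probably not on a GCP node.
    import requests

    def on_gcp_node(timeout=1.0):
        try:
            response = requests.get(
                'http://metadata.google.internal/computeMetadata/v1/instance/name',
                headers={'Metadata-Flavor': 'Google'},
                timeout=timeout
            )
            return response.status_code == 200
        except requests.RequestException:
            return False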

@agraubert (Collaborator, Author)

Current draft of setup tasks added by local download:

    [
        'export CANINE_LOCAL_DISK_SIZE={}GB'.format(local_download_size),
        'export CANINE_LOCAL_DISK_TYPE={}'.format(self.temporary_disk_type),
        'export CANINE_NODE_NAME=$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/name)',
        'export CANINE_NODE_ZONE=$(basename $(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/zone))',
        'gcloud compute disks create {} --size {} --type pd-{} --zone $CANINE_NODE_ZONE'.format(
            disk_name,
            local_download_size,
            self.temporary_disk_type
        ),
        'gcloud compute instances attach-disk $CANINE_NODE_NAME --zone $CANINE_NODE_ZONE --disk {} --device-name {}'.format(
            disk_name,
            device_name
        ),
        'gcloud compute instances set-disk-auto-delete $CANINE_NODE_NAME --zone $CANINE_NODE_ZONE --disk {}'.format(disk_name),
        'sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/disk/by-id/google-{}'.format(device_name),
        'sudo mkdir -p {}/{}'.format(self.local_download_dir, disk_name),
        'sudo mount -o discard,defaults /dev/disk/by-id/google-{} {}/{}'.format(
            device_name,
            self.local_download_dir,
            disk_name
        ),
        'sudo chmod a+w {}/{}'.format(self.local_download_dir, disk_name)
    ]

Sudo may be a problem, but I figure 99% of users on a GCP cluster will have root access, since they're probably running their own machines. Since this won't work on on-prem clusters anyway, sudo isn't a concern there.
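
For completeness, a matching draft of the teardown tasks; this is a hypothetical helper, with the disk and mount names assumed to line up with the setup block above:

    # Draft teardown counterpart: unmount, detach, and delete the per-job disk.
    # Assumes CANINE_NODE_NAME / CANINE_NODE_ZONE are still exported in
    # teardown.sh (or are re-queried from the metadata service there).
    def teardown_tasks(local_download_dir, disk_name):
        return [
            'sudo umount {}/{}'.format(local_download_dir, disk_name),
            'gcloud compute instances detach-disk $CANINE_NODE_NAME --zone $CANINE_NODE_ZONE --disk {}'.format(disk_name),
            'gcloud compute disks delete {} --zone $CANINE_NODE_ZONE --quiet'.format(disk_name)
        ]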

@agraubert (Collaborator, Author)

agraubert commented Apr 10, 2020

Another thought: the disk will exist outside of the directories normally bind-mounted for docker jobs.
wolf flows using task auto-dockerization will need a way to detect the extra directory that needs mounting, and other tasks and pipelines which use docker in their scripts will also need to bind-mount the right path. Maybe canine should export $CANINE_DOCKER_ARGS in the entrypoint and just suggest that users include it in their docker command?
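
Rough sketch of that idea (CANINE_DOCKER_ARGS is just the variable name proposed above; the mount path assumes the {local_download_dir}/{disk_name} layout from the setup block in my earlier comment):

    # Hypothetical helper: one more setup task exporting the bind-mount
    # arguments for the node-local disk.
    def docker_args_task(local_download_dir, disk_name):
        mount = '{}/{}'.format(local_download_dir, disk_name)
        return 'export CANINE_DOCKER_ARGS="-v {0}:{0}"'.format(mount)

Task scripts that use docker would then just run docker run $CANINE_DOCKER_ARGS ... and see the local disk at the same path inside the container.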
