Girder Worker provides a built-in task that can be used to run docker containers. It makes it easy to work on data held in Girder from within a docker container.
The docker_run task exposes a container_args parameter, which can be used to pass arguments to the container entrypoint.
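To illustrate what the container side sees, here is a minimal sketch of an entrypoint that parses the arguments it receives. The option names (`--iterations`, `input_path`) are hypothetical and chosen only for illustration; on the task side they would correspond to something like container_args=['--iterations', '3', '/mnt/docker/input.txt'].

```python
import argparse


def parse_args(argv):
    """Parse the arguments handed to a hypothetical container entrypoint."""
    parser = argparse.ArgumentParser(description='Hypothetical container entrypoint')
    # Both arguments are made up for this sketch.
    parser.add_argument('--iterations', type=int, default=1)
    parser.add_argument('input_path')
    return parser.parse_args(argv)


# Inside a real container this would be parse_args(sys.argv[1:]).
args = parse_args(['--iterations', '3', '/mnt/docker/input.txt'])
```

The entrypoint receives the list elements in order, exactly as if they had been typed after the image name on the command line.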
The volumes to be bind mounted into a container can be passed to the docker_run task in one of two ways.
The first way uses docker-py syntax: the value of the volumes parameter is a dict conforming to the specification defined by docker-py, and it is passed directly to docker-py. For example:
volumes = {
    '/home/docker/data': {
        'bind': '/mnt/docker/',
        'mode': 'rw'
    }
}
docker_run.delay('my/image', pull_image=True, volumes=volumes)
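When several directories need to be mounted, the dict can be built programmatically. The sketch below uses a hypothetical helper, `build_volumes`, which is not part of Girder Worker or docker-py. It only produces a mapping with the same host path to bind/mode shape as the example above, and the second host path is made up for illustration.

```python
def build_volumes(mounts):
    """Build a docker-py style volumes dict.

    `mounts` is an iterable of (host_path, container_path, mode) tuples.
    This helper is hypothetical; it only mirrors the dict shape shown above.
    """
    volumes = {}
    for host_path, container_path, mode in mounts:
        if mode not in ('ro', 'rw'):
            raise ValueError('mode must be "ro" or "rw", got %r' % (mode,))
        volumes[host_path] = {'bind': container_path, 'mode': mode}
    return volumes


volumes = build_volumes([
    ('/home/docker/data', '/mnt/docker/', 'rw'),
    ('/home/docker/reference', '/mnt/reference/', 'ro'),  # illustrative path
])
```

The resulting dict could then be passed as volumes=volumes to docker_run.delay in the same way as the hand-written one.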
The second way uses the girder_worker.docker.transforms.BindMountVolume utility class, which Girder Worker provides to define volumes that should be mounted into a container. Instances of this class can also be used in conjunction with other parts of the Girder Worker docker infrastructure, for example to provide a location where a file should be downloaded to. See Downloading files from Girder. When using the girder_worker.docker.transforms.BindMountVolume class, a list of instances is provided as the value of the volumes parameter, and Girder Worker takes care of ensuring that these volumes are mounted. In the example below we create a girder_worker.docker.transforms.BindMountVolume instance and also pass it as a container argument to provide the mounted location to the container. Girder Worker takes care of transforming the instance into the appropriate path inside the container.
vol = BindMountVolume('/home/docker/data', '/mnt/docker/')
docker_run.delay('my/image', pull_image=True, volumes=[vol], container_args=[vol])
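The idea of an object that stands in for a path until the container is launched can be sketched with a toy class. This is not Girder Worker's implementation: `ToyBindMount` and `resolve_args` are hypothetical names, and the sketch only shows the general pattern of replacing such objects with the container-side path before the arguments reach the entrypoint.

```python
class ToyBindMount:
    """Hypothetical stand-in, not girder_worker's BindMountVolume."""

    def __init__(self, host_path, container_path, mode='rw'):
        self.host_path = host_path
        self.container_path = container_path
        self.mode = mode

    def transform(self):
        # Inside the container only the container-side path is meaningful.
        return self.container_path


def resolve_args(container_args):
    """Replace objects that can transform themselves; keep plain values."""
    return [arg.transform() if hasattr(arg, 'transform') else arg
            for arg in container_args]


vol = ToyBindMount('/home/docker/data', '/mnt/docker/')
resolved = resolve_args(['--input-dir', vol])
```

In this toy version the entrypoint would receive the two strings in `resolved`, with the volume object replaced by '/mnt/docker/'.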
A girder_worker.docker.transforms.TemporaryVolume class is provided, representing a temporary directory on the host machine that is mounted into the container. girder_worker.docker.transforms.TemporaryVolume.default holds a default instance that is used as the default location by many other parts of the Girder Worker docker infrastructure, for example when downloading a file. See Downloading files from Girder. It can also be used explicitly; in the example below it is passed as a container argument for use within a container. Again, Girder Worker takes care of transforming the girder_worker.docker.transforms.TemporaryVolume instance into the appropriate path inside the container, so the container entrypoint simply receives a path.
docker_run.delay('my/image', pull_image=True, container_args=[TemporaryVolume.default])
Note that because we are using the default path, we don't have to add the instance to the volumes parameter, as it is automatically added to the list of volumes to mount.
Accessing files held in Girder from within a container is straightforward using the girder_worker.docker.transforms.girder.GirderFileIdToVolume utility class. Provide the file id as an argument to the constructor and pass the instance as a container argument.
docker_run.delay('my/image', pull_image=True,
                 container_args=[GirderFileIdToVolume(file_id)])
The girder_worker.docker.transforms.girder.GirderFileIdToVolume instance takes care of downloading the file from Girder and passing the path it was downloaded to into the docker container's entrypoint as an argument.
If no volume parameter is specified, the file will be downloaded to the task's temporary volume. The file can also be downloaded to a specific girder_worker.docker.transforms.BindMountVolume by specifying a volume parameter, as follows:
vol = BindMountVolume(host_path, container_path)
docker_run.delay('my/image', pull_image=True,
                 container_args=[GirderFileIdToVolume(file_id, volume=vol)])
If the file being downloaded is particularly large you may want to consider streaming it into the container using a named pipe. See Streaming Girder files into a container for more details.
Utility classes are also provided to simplify uploading files generated by a docker container. The girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem class provides the functionality to upload a file to an item. In the example below, we use the girder_worker.docker.transforms.VolumePath utility class to define a file path that we then pass to the docker container. The docker container can write data to this file path. As well as passing the girder_worker.docker.transforms.VolumePath instance as a container argument, we also pass it to girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem, and that instance is added to girder_result_hooks. This tells Girder Worker to upload the file at that path to the item with the given id once the docker container has finished running.
volumepath = VolumePath('write_data_to_be_uploaded.txt')
docker_run.delay('my/image', pull_image=True, container_args=[volumepath],
                 girder_result_hooks=[GirderUploadVolumePathToItem(volumepath, item_id)])
Girder Worker uses named pipes as a language-agnostic way of streaming data in and out of docker containers. A named pipe is created at a path that is mounted into the container. The container can then open that pipe for reading or writing, and the Girder Worker infrastructure can open the same pipe on the host, allowing data to be written to and read from the container.
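The underlying mechanism can be sketched using only the Python standard library. The example below is not Girder Worker code: it creates a FIFO in a temporary directory, writes to it from one thread (standing in for the host side) and reads from it in chunks (standing in for the container side). It assumes a POSIX system, since os.mkfifo is not available on Windows, and the function names are hypothetical.

```python
import os
import tempfile
import threading


def write_to_pipe(path, payload):
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(path, 'wb') as fp:
        fp.write(payload)


def stream_through_fifo(payload, chunk_size=4096):
    """Send payload through a named pipe and return what the reader received."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, 'example_pipe')
    os.mkfifo(path)  # POSIX only
    writer = threading.Thread(target=write_to_pipe, args=(path, payload))
    writer.start()
    received = []
    try:
        # Reader side: consume chunks as they arrive.
        with open(path, 'rb') as fp:
            while True:
                chunk = fp.read(chunk_size)
                if not chunk:  # empty read means the writer closed the pipe
                    break
                received.append(chunk)
    finally:
        writer.join()
        os.remove(path)
        os.rmdir(tmpdir)
    return b''.join(received)


result = stream_through_fifo(b'hello from the host' * 1000)
```

Neither side needs to know how the other is implemented; each just opens a path, which is what makes the approach language agnostic.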
There are two utility classes used to represent a named pipe: girder_worker.docker.transforms.NamedOutputPipe and girder_worker.docker.transforms.NamedInputPipe.
NamedOutputPipe represents a named pipe that can be opened in a docker container for writing, allowing data to be streamed out of a container.
NamedInputPipe represents a named pipe that can be opened in a docker container for reading, allowing data to be streamed into a container.
These pipes can be connected together using the girder_worker.docker.transforms.Connect utility class.
One common example of using a named pipe is to stream a potentially large file into a container. This approach allows the task to start processing immediately rather than having to wait for the entire file to download. It also removes the requirement that the file is held on the local filesystem. In the example below we create an instance of girder_worker.docker.transforms.girder.GirderFileIdToStream, which provides the ability to download a file in chunks. We also create a named pipe called read_in_container. As no volume argument is provided, this pipe will be created on the temporary volume automatically mounted by Girder Worker. Finally, we use the girder_worker.docker.transforms.Connect class to "connect" the stream to the pipe, and we pass the instance as a container argument. Girder Worker takes care of the select logic to stream the file into the pipe.
stream = GirderFileIdToStream(file_id)
pipe = NamedInputPipe('read_in_container')
docker_run.delay('my/image', pull_image=True, container_args=[Connect(stream, pipe)])
All the container has to do is open the path passed to its entrypoint and start reading. Below is an example Python entrypoint:
import sys

# Simply open the path passed into the container.
with open(sys.argv[1]) as fp:
    fp.read()  # This will be reading the file's contents
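For a large file, reading everything with a single read() call holds the whole content in memory, which works against the point of streaming. A possible variant, sketched below, reads fixed-size chunks instead. The function names and the chunk size are assumptions made for illustration, and the "processing" is just a byte count.

```python
def consume(path, chunk_size=64 * 1024):
    """Read `path` in fixed-size chunks and return the total byte count."""
    total = 0
    with open(path, 'rb') as fp:
        # iter() with a sentinel stops at the first empty read (end of stream).
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            total += len(chunk)  # replace with real per-chunk processing
    return total


def main(argv):
    # argv[1] is the path passed in as a container argument.
    print(consume(argv[1]))
    return 0
```

A real entrypoint would finish with something like sys.exit(main(sys.argv)). The same function works whether the path is a regular file or a named pipe.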
Due to some odd symlinking behavior by Docker Engine on macOS, it may be necessary to add a workaround when running girder_worker. If your TMPDIR environment variable is underneath the /var directory and you see errors from Docker about MountsDenied, try running Girder Worker with TMPDIR set underneath /private/var instead of /var. The location should be equivalent since /var is a symlink to /private/var.
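The path rewrite itself is simple enough to script. The helper below is a hypothetical sketch, not part of Girder Worker, that maps a path under /var to the equivalent path under /private/var and leaves every other path untouched. The example path is illustrative only.

```python
def macos_private_tmpdir(tmpdir):
    """Return tmpdir rewritten under /private/var if it lives under /var."""
    if tmpdir == '/var' or tmpdir.startswith('/var/'):
        return '/private' + tmpdir
    return tmpdir


# Illustrative value only; real macOS temporary directories vary per user.
fixed = macos_private_tmpdir('/var/folders/xx/T/')
```

The result could be exported as TMPDIR in the shell, or set in os.environ, before the worker is started.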