Description
The current workflow for developing a component with Docker-based schedulers requires manually building the Docker image every time there's a change. This adds an extra layer of friction and requires significant Docker knowledge and maintenance. It would be nice to provide a way to do this via the torchx CLI to reduce user friction.
Detailed Proposal
This is going to add the concept of "workspaces". A workspace looks like a file system and can be implemented as one via fsspec. It can map either to an on-disk project with a .torchxconfig or to an in-memory file system for use with notebooks.
This requires adding workspace support to the runner and schedulers. There will be a couple of standard patching implementations: a stub one for local, one for Docker, etc.
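As a minimal sketch, a scheduler with workspace support could implement a small interface along these lines (the WorkspaceScheduler name and method are hypothetical, not a final API):

from abc import ABC, abstractmethod

class WorkspaceScheduler(ABC):
    # Hypothetical mixin implemented by schedulers with workspace support.
    # build_workspace_image overlays the workspace contents on top of the
    # role's base image and returns the identifier of the patched image.
    @abstractmethod
    def build_workspace_image(self, img: str, workspace: str) -> str:
        ...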
Programmatic Experience
For programmatic access we need to implement a high-level concept of a workspace. For CLI commands this will map to the project folder (i.e. the one with the .torchxconfig). For programmatic use it's a bit more abstract in order to support the notebook workspaces #344
Workspaces basically just track files. This means that we can potentially use any fsspec filesystem interface to find and build files. When using Docker, we need to build a tarball of the local files to upload as the build context; fsspec provides a clean interface to find all the files and tarball them (see the sketch after the examples below).
from torchx import specs
from torchx.runner import get_runner

app: specs.AppDef = ...
runner = get_runner()
runner.run(app, "kubernetes", workspace="file:///home/d4l3k/my_project")
For things like notebooks we can use an in memory file system:
runner.run(app, "kubernetes", workspace="memory://torchx-notebook/")
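As a rough sketch of how the Docker build context could be assembled from a workspace via fsspec (build_context is a hypothetical helper, not an existing TorchX API):

import io
import tarfile

import fsspec

def build_context(workspace: str) -> io.BytesIO:
    # Resolve the workspace URL (file://, memory://, ...) to a filesystem.
    fs, path = fsspec.core.url_to_fs(workspace)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        # Walk every file in the workspace and add it to the tarball that
        # will be uploaded as the Docker build context.
        for file_path in fs.find(path):
            with fs.open(file_path, "rb") as f:
                data = f.read()
            info = tarfile.TarInfo(name=file_path[len(path):].lstrip("/"))
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf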
CLI Experience
Before:
$ docker build -t repo.sh/my_image:my_tag .
$ docker push repo.sh/my_image:my_tag
$ torchx run -s kubernetes dist.ddp --image repo.sh/my_image:my_tag my_trainer.py
After:
# in folder w/ .torchxconfig
$ torchx run -s kubernetes -cfg push=repo.sh/my_image dist.ddp my_trainer.py
This same syntax with Docker can work with local_docker, kubernetes, and potentially Ray.
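For example, the push repository could live in the project's .torchxconfig so it doesn't need to be repeated on every run (the push key is the proposed run config; the exact layout here is illustrative):

# .torchxconfig
[kubernetes]
push = repo.sh/my_image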
Docker
Docker supports layering so we just have to create a small Dockerfile such as:

# the specified base image
FROM ghcr.io/pytorch/torchx:0.1.1dev0
COPY . .
We'll walk the workspace and upload all files as the Docker context when building.
For local running we just have to build the image and use the local tag. For remote running we need a repository to push it to. We can default to pushing to the same repository the package specifies and use the image hash as the tag. This will be an extra run config that users must override if they're building off of a standard Docker image such as the provided torchx one, and it can be specified in the .torchxconfig file.
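A rough sketch of that flow using the Docker Python SDK (linked below); build_and_push is a hypothetical helper, and context is the workspace tarball from the earlier sketch:

import io

import docker

def build_and_push(context: io.BytesIO, repo: str) -> str:
    client = docker.from_env()
    # Build from the workspace tarball; custom_context tells the SDK that
    # fileobj is a full build context rather than a lone Dockerfile.
    image, _logs = client.images.build(
        fileobj=context,
        custom_context=True,
        rm=True,
    )
    # Use the image hash as the tag so the pushed image is content addressed.
    tag = image.id.split(":")[-1][:12]
    image.tag(repo, tag=tag)
    client.images.push(repo, tag=tag)
    return f"{repo}:{tag}"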
Alternatives
There's some question about whether we should only support Docker or use buildah instead, since that seems to be the more robust option: https://github.com/containers/buildah. However, for now users seem most familiar with Docker, and buildah provides a Docker-compatible API, so for maximum support we can use the existing Docker API.
For small components we can inline the file via the existing Python component. This will work for many things but not everything.
This also doesn't quite address how we can do the same thing on Slurm, though we could potentially support Docker on Slurm, which would be interesting. Slurm does support OCI images, so we should potentially migrate towards supporting those as first class in TorchX.
Additional context/links
Docker Python SDK: https://docker-py.readthedocs.io/en/stable/images.html