

CML docker version tagging #217

Closed
DavidGOrtega opened this issue Aug 14, 2020 · 15 comments · Fixed by #494 or #496
Labels
cml-image Subcommand p0-critical Max priority (ASAP)

Comments

@DavidGOrtega
Contributor

NPM is registering the versions properly; however, Docker Hub is not.
Being able to pin the exact version of CML is important to guarantee the reproducibility of the workflow. Changes in latest may break existing workflows and force us to maintain backward compatibility.

@hsharrison

I brought this up in another issue but was redirected here.

We would like to have a pinned version.

In the other issue I was asked what the pain point was, and it's true that you haven't introduced any backward incompatibilities. So the pain point is merely that we have policies about pinning images, and not having versions forces us to break that policy: we have to explain in a comment why it's pinned to latest, team members occasionally get confused and think someone made a mistake (e.g. if they see cml:latest in the CI logs), etc.

Maybe there will never be a problem and nothing will ever break, but it does take some "cognitive load" to remember why it's not version-pinned.
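A pinning policy like the one described is often enforced by a CI lint step. The sketch below is a hypothetical check of that kind (the name is_pinned is illustrative, not part of CML). It treats a missing tag or the latest tag as unpinned and accepts digests, and it deliberately simplifies Docker's image reference grammar.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if a Docker image reference pins a digest or an explicit tag other than 'latest'.

    Simplified parser: it does not validate the full reference grammar.
    """
    # A digest (name@sha256:...) pins the exact image contents.
    if "@" in image_ref:
        return True
    # A colon only introduces a tag when it appears after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000/cml.
    last_slash = image_ref.rfind("/")
    last_colon = image_ref.rfind(":")
    if last_colon <= last_slash:
        return False  # no tag: Docker implicitly uses "latest"
    tag = image_ref[last_colon + 1:]
    return tag not in ("", "latest")
```

With a check like this, dvcorg/cml:latest and a bare dvcorg/cml would both be flagged, which is exactly the situation the policy exception has to explain today.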

@DavidGOrtega
Contributor Author

👋 @hsharrison I will review this soon

@DavidGOrtega DavidGOrtega added the cml-image Subcommand label Feb 23, 2021
@DavidGOrtega DavidGOrtega added the p0-critical Max priority (ASAP) label Mar 8, 2021
@DavidGOrtega
Contributor Author

With DVC 2.0 breaking changes this is actually very important.
@0x2b3bfa0 we should even consider whether this is a blocker for the latest release.

@hsharrison

> With DVC 2.0 breaking changes this is actually very important.
> @0x2b3bfa0 we should even review if this is not a stopper for the latest release

Oh yeah, have you guys already updated the image? 😬 I can tell you soon if it is causing any problems...

@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 8, 2021

@hsharrison, no worries: we haven't (yet) and we won't without considering the implications.

I wonder if we could avoid shipping DVC with the base CML containers and ask users to explicitly install it, following the reasoning behind some of the comments on iterative/dvc#2774.

We could also ship one DVC major version with each CML release and tag the latter accordingly, but that would (potentially) prevent users tied to DVC 1 from enjoying the benefits of new CML releases. Another option would be shipping the two most recent major versions, but the bloat might outweigh the benefits.

@0x2b3bfa0 0x2b3bfa0 added this to the 0.3.1 milestone Mar 8, 2021
@DavidGOrtega
Contributor Author

DavidGOrtega commented Mar 10, 2021

@0x2b3bfa0 let's tackle this for good.

We have to tag with:

  • CUDA 10 and 11
  • Python 3
  • DVC 2

Not sure about DVC

So the image could be

  • dvcorg/cml:cuda10-python-dvc2:{version}
  • dvcorg/cml:cuda11-python3-dvc:{version}

For simplicity and clarity, we could also get rid of elgohr/Publish-Docker-Github-Action. It has served us very well, but for this it might be easier to do it all ourselves without it, like the tagging we do later on.

We can also do #383.

We should probably also freeze dvcorg/cml:latest, or always keep dvcorg/cml on CUDA 10.

@dmpetrov

@DavidGOrtega DavidGOrtega reopened this Mar 10, 2021
@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 10, 2021

About the Docker tagging, I wouldn't mind spending some time deprecating all the legacy images and tags (if any), and setting up automated tagging for the next releases, but the main issue is what to ship and how to ship it.

  • CUDA supporting libraries are required for all the GPU-enabled orchestrators, so that's definitely nice to have, though I'm not familiar with its usage and don't know how many versions people will need to choose from.

  • Python is also a must-have for most of the training jobs that will run on these containers, but we can't ship the latest Python 3 release because it usually breaks the usual suspects (numpy, scipy) for a while due to CPython API deprecations. Having containers with Python 2 (latest) and Python 3 (latest minus one minor version) could be interesting.

  • DVC is also really nice to have, and we could ship the latest version if that doesn't break anything, but it would be even better to pin it to the major version to avoid incompatibilities.

  • CML itself would always be latest, as we don't plan to make any breaking changes, at least until we hit 1.0.0, right?

We could automatically publish containers with some tag tetrads like cml-0-11-3-2 for CML-CUDA-Python-DVC if we don't find a better solution. Anyhow, it would be nice to test all the containers for correctness with some sort of touchstone workflow collection.
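To illustrate the tetrad idea, the sketch below builds tags in the CML-CUDA-Python-DVC order of the cml-0-11-3-2 example and enumerates every combination, which is roughly what a touchstone workflow collection would have to cover. The helper names and the version lists (CUDA 10/11, Python 2/3, DVC 1/2, taken from the options discussed in this thread) are assumptions for illustration.

```python
from itertools import product


def tetrad_tag(cml: int, cuda: int, python: int, dvc: int) -> str:
    """Build a tag like 'cml-0-11-3-2' from CML, CUDA, Python and DVC major versions."""
    return "cml-" + "-".join(str(v) for v in (cml, cuda, python, dvc))


def tag_matrix(cml: int, cudas, pythons, dvcs) -> list[str]:
    """Every CUDA/Python/DVC combination for one CML major, e.g. to drive a test matrix."""
    return [tetrad_tag(cml, c, p, d) for c, p, d in product(cudas, pythons, dvcs)]


tags = tag_matrix(0, cudas=[10, 11], pythons=[2, 3], dvcs=[1, 2])
print(len(tags), tags[0], tags[-1])  # 8 cml-0-10-2-1 cml-0-11-3-2
```

The output also shows the weakness of positional tags: nothing in cml-0-11-3-2 says which number is the CUDA version and which is the Python version.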

(Originally written as a Slack direct message)

@DavidGOrtega
Contributor Author

DavidGOrtega commented Mar 10, 2021

> We could automatically publish containers with some tag tetrads like cml-0-11-3-2

I prefer to be explicit.

One more thing: our py3 image is a separate image, not a tag. Should we keep it that way?

dvcorg/cml:cuda10-dvc2
dvcorg/cml-py3:cuda10-dvc

@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 10, 2021

> I prefer to be explicit.

Excellent! Me too. 👍

> On thing also is that our py3 image is an image not a tag. Should we continue that way?

If we choose this approach, I would suggest migrating everything to tags for consistency and soft-deprecating the cml-py3 image.


@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 11, 2021

@dmpetrov, it looks like we aren't using the cml-dev images, at least not on any of the repositories I'm familiar with. May we deprecate them in favor of the newly proposed tagging convention?

Once they become useful, we can start publishing images without a pinned major version, like cml:1-··· for the sake of future predictions. The absence of a version number would denote that it's the latest development version, but maybe we could also use another placeholder to make it explicit.

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Mar 11, 2021

When deprecating the old images, we might also want to pin the provider version here, and then publish one last update under the old tags with that change and a deprecation warning message. That would keep current users well informed and ensure that their workflows won't break if we modify the provider.

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Mar 16, 2021

Unification and tagging proposal

Images

Hard deprecations (#383)

Soft deprecations

Updated images

Tag components

  • CML
    • 0
  • DVC
    • 1
    • 2
  • BASE
    • 0
      • CUDA 10.1
      • cuDNN 7
      • Python 2.7
      • Ubuntu 18.04
    • 1
      • CUDA 11.0.1
      • cuDNN 8
      • Python 3.8
      • Ubuntu 20.04

Note: items in bold won't be provided for non-GPU images.

Tag example

  • cml-gpu:0-dvc-2-base-1 would refer to a GPU-enabled image with CML 0.X.X, DVC 2.X.X and the base 1 as per the list above.
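The tag example can be sketched as a small formatter over the tag components. This follows the proposal exactly as written here (not the later structure from #494), omits the registry namespace just as the example does, and the helper names are hypothetical.

```python
from itertools import product

# Major versions from the "Tag components" list above; what each BASE
# number contains (CUDA, cuDNN, Python, Ubuntu) is defined in that list.
CML_MAJORS = [0]
DVC_MAJORS = [1, 2]
BASE_MAJORS = [0, 1]


def image_ref(cml: int, dvc: int, base: int, gpu: bool) -> str:
    """Compose a reference such as 'cml-gpu:0-dvc-2-base-1' under the proposed scheme."""
    name = "cml-gpu" if gpu else "cml"  # separate image names for CPU and GPU
    return f"{name}:{cml}-dvc-{dvc}-base-{base}"


def all_refs() -> list[str]:
    """Enumerate every image/tag combination the proposal would publish."""
    return [
        image_ref(cml, dvc, base, gpu)
        for gpu, cml, dvc, base in product((False, True), CML_MAJORS, DVC_MAJORS, BASE_MAJORS)
    ]


print(image_ref(0, 2, 1, gpu=True))  # cml-gpu:0-dvc-2-base-1
print(len(all_refs()))  # 8
```

Because every tag spells out its component names, a user reading cml-gpu:0-dvc-2-base-1 in CI logs can tell what is pinned without consulting a legend.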

Rationale

This proposal tries to simplify the image offering by tying packages to known major versions, creating flexible labels for users to choose from, and reducing the total number of image names.

Why separate cml and cml-gpu

Given that we're using totally different base images (ubuntu/(bionic|focal) for the former and nvidia/cuda for the latter), it might make sense to use separate names instead of tags to differentiate them. Also, there is a huge difference in their size (30 MB versus 1.11 GB), so we might want to take advantage of this naming scheme to distinguish both use cases as early as possible and let users choose the most appropriate one from the very beginning.

⚠️ See #494 for the most recent structure

Why pin major versions by default

Endorsing the usage of the latest tag won't allow us to (eventually) make a breaking change without disrupting the workflow of our existing users. On the other hand, forcing users to pin a minor version could keep us from delivering new useful features to existing users without requiring them to update their pinned versions.
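One common way to reconcile both goals is to republish moving tags on every release: the major-version tag keeps receiving compatible updates, while a full-version tag stays fixed for users who need exact reproducibility (as requested at the top of this issue). The sketch below assumes semantic version strings like 0.3.1; the minor and full-version tags go beyond what this proposal specifies, and floating_tags is a hypothetical helper.

```python
def floating_tags(version: str, suffix: str = "") -> list[str]:
    """Return major, major.minor and full-version tags for one release.

    Example: '0.3.1' -> ['0', '0.3', '0.3.1']; an optional suffix such as
    '-dvc-2-base-1' is appended to every tag.
    """
    if version.startswith("v"):  # tolerate git-style tags like 'v0.3.1'
        version = version[1:]
    major, minor, patch = version.split(".")
    tags = [major, f"{major}.{minor}", f"{major}.{minor}.{patch}"]
    return [tag + suffix for tag in tags]


print(floating_tags("v0.3.1", "-dvc-2-base-1"))
# ['0-dvc-2-base-1', '0.3-dvc-2-base-1', '0.3.1-dvc-2-base-1']
```

On each release, a CI job along these lines would push the same image under all three tags, so 0-dvc-2-base-1 moves forward while 0.3.1-dvc-2-base-1 never changes.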

Why group system packages

Users might have very different requirements for their system and core packages depending on their stack, but a few of these setups are a lot more common than many others. Ideally, we should be offering a few curated combinations that are known to work correctly, and let the users choose fine-grained variations over that, be it by building images on top of ours or simply by installing additional packages when building each job.

Why merge cml-dev into cml-test

I'm suggesting this additional action because, even if it's useful to have test, development and staging images for internal consumption, there wouldn't be any noticeable benefit in separating them beyond the label level. A single image might be enough for grouping all the transient builds that shouldn't matter for end users.


💡 Note: this proposal has its roots in this Dockerfile and this release.yml workflow.

@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 16, 2021

Tangentially, we should probably also consider using GitHub Environments, as mentioned above, to deploy cml-test images with UID tags from any pull request and run cloud tests on them. It would be a good step towards (hopefully) automated testing of cloud runners.

@0x2b3bfa0 0x2b3bfa0 modified the milestones: 0.3.1, 0.3.2 Mar 17, 2021
@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 19, 2021

Given that image tags should be as stable as possible over time, we probably should discuss this proposal and triple-check everything before applying any of the proposed changes.

🔔 @iterative/cml

@0x2b3bfa0 0x2b3bfa0 modified the milestones: 0.3.2, 0.3.3 Mar 22, 2021
@0x2b3bfa0 0x2b3bfa0 modified the milestones: 0.3.3, 0.3.4 Apr 5, 2021
3 participants