Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker image resolver is too strict #93

Open
mariusae opened this issue Nov 29, 2018 · 4 comments
Open

Docker image resolver is too strict #93

mariusae opened this issue Nov 29, 2018 · 4 comments
Assignees

Comments

@mariusae
Copy link
Collaborator

mariusae commented Nov 29, 2018

Trying to run a simple Reflow program with the image "biocontainers/bwa" fails:

$ reflow run  align.rf 
obtaining manifest digest for "index.docker.io/biocontainers/bwa:latest": MANIFEST_UNKNOWN: "manifest unknown"
$ 
@mariusae
Copy link
Collaborator Author

Probably the behavior should be to not provide a resolution to such images.

@siddharthab
Copy link

These images will also fail on docker pull. So I thought we would want to fail fast. Otherwise, the pipeline fails after several minutes of retries. I don't have my computer with me this week so can not verify.

Examples of failed pulls with this error - openshift/origin#17031
https://stackoverflow.com/questions/41810104/docker-manifest-unknown-manifest-unknown

I think we can provide a flag to turn off this feature explicitly in case there are bugs in this feature. But I think the current behaviour is the desired behaviour for this issue.

@mariusae
Copy link
Collaborator Author

I'm not sure: we may never hit the exec with this image, and there are other reasons resolution can fail but execution can succeed. (e.g., think of using an image that I've pushed locally but not published to a public repository yet).

@siddharthab
Copy link

My main objection to making errors in the resolver non-fatal is that name resolution and cache key computation should be deterministic and not best effort to avoid confusion for users. I say should and not must to acknowledge that caching in general is best-effort. I think it is OK to make the feature opt-in or opt-out, but when this feature is being used, it should not be best effort.

As to the points you bring up, I think the benefits outweigh the costs.

  1. I think the correct solution here will be to resolve the images lazily. But as a user, I don't think this is a drawback. If a certain code path can use an image, I would rather fail early in any run of the workflow if a potential image does not exist.
  2. The fact that we don't do a docker pull before using an image raises correctness/reproducibility concerns in local mode (only run mode where this will be an issue). But if you want, I can make name resolution in local run mode check for image availability in the local docker client first.

I can send a change to make the feature opt-in or opt-out. But as a user, I think we should not make it best effort. Please consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants