Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

[RFC] define fetch behavior between rkt fetch and rkt run/prepare #1196

Closed
sgotti opened this issue Jul 28, 2015 · 14 comments
Closed

[RFC] define fetch behavior between rkt fetch and rkt run/prepare #1196

sgotti opened this issue Jul 28, 2015 · 14 comments

Comments

@sgotti
Copy link
Contributor

sgotti commented Jul 28, 2015

#1087 (comment) initially proposed two different behaviors between rkt fetch and rkt run/prepare: For rkt fetch, always bypass the on-disk store. but it wasn't implemented in the PR.

The implementation in #1087 probably tries to make rkt fetch/run/prepare mimic the kubernetes container imagePullPolicy behavior.

This is going to create some problems trying to control when an image should be fetched from a remote and when it should be firstly searched in the store.
For example, before #1181, if in the docker remote repository a tag (different than latest) is assigned from an already fetched image to another image it won't be noticed as the image is already in the store and this makes impossible to fetch the new image having that tag without removing the actual image from the store. After #1181, adding cache control and etags, the special docker:// scheme doesn't have these properties so the image is downloaded every time. Both behaviors are not optimal.
The same applies with ACI urls on servers not supporting ETag or Cache-Control max age.

One way to get both behavior will be to make rkt fetch always try to fetch an image, while rkt run/prepare to always try to use store images and fetch if not available.

In this way:

  • a user can "force" a download using rkt fetch
  • kubernetes can implement the imagePullPolicy:
    • when Always calling first rkt fetch
    • when IfNotPresent calling rkt run/prepare
    • when Never calling rkt run/prepare --nofetch

Proposal

I'm going to propose two different behaviors for rkt fetch vs rkt run/prepare and the removal of the --local option in favor of a --nofetch option for run/prepare: rkt fetch will always fetch while rkt run/prepare will always try to use the local store and fetch only when the image is not in the store (and when --nofetch=false as by default). As an ACI has the concept of image's dependencies, this logic will be applied also to them.

A little explanation of this proposal:

rkt supports different kind of image strings for fetching:

  • URLs (file:// http:// https:// docker://)
  • ACI image strings made by a name + labels (coreos.com/etcd:2.0.11)

For URLs the image is fetched from that URL.
For ACI image strings the URL must be retrieved with the image discovery process.

Internally to rkt, a docker image is considered an URL with scheme docker://, this URL is passed to docker2aci that does the docker image fetching, squashing and conversion to an ACI. So, with docker images there's no image discovery to find the download URL as available with an ACI image name.
In a next step (going to open a new issue) I think it will be good (if possible) to make docker images much more like ACIs, removing the special docker:// scheme, introducing a docker image discovery and caching using docker registry apis.

Additionally for run/prepare there's the --local options that:

  • for image name+labels strings: it bypasses image discovery and url downloading using directly the store GetACI function
  • for URLs it bypasses downloading and just check the url in the remote table and uses the provided BlobKey.
    This is probably creating more confusion than providing some benefits.

IMHO I think that differentiating the behavior between fetch and run/prepare should be useful for different cases like the previously explained.

Trying to sum it up:

rkt command image behavior
fetch file:// http:// https:// Fetch image from url (for http:// and https:// check the store remote table for the provided URL and if available use the saved ETag and Cache-Control maxage, in this case the image isn't downloaded if it's considered a valid cached image ).
fetch image name (ex. coreos.com/etcd:v2.1.1) Do discovery. If discovery is successful use the discovered URL as in the above step
fetch docker:// use docker2aci (the image is always fetched as no caching checks are available).
run/prepare file:// http:// https:// docker:// Check the store remote table if the URL is available, in that case use the saved BlobKey, otherwise download as with fetch (if --nofetch=false)
run/prepare image name (ex. coreos.com/etcd:v2.1.1) Check store using GetACI(name, labels), if found use that image. If not found, do discovery. If discovery is successful use the discovered URL as in the above step

Note: every step is done also for every image's dependency.

/cc @jonboulle @yifan-gu

@yifan-gu
Copy link
Contributor

@sgotti Thanks for the proposal! There is some back-forth discussion, made me lost...

a user can "force" a download using rkt fetch
kubernetes can implement the imagePullPolicy:
when Always calling first rkt fetch
when IfNotPresent calling rkt run/prepare
when Never calling rkt run/prepare --nofetch

For the kubernetes case, I don't think this solves IfNotPresent or Never, as the image pull policy is per container. For example, how does this solve a pod with container a(pull always) and container b(pull if not present) without launching a pod?

But refrencing kubernetes's image policy is a good point, this makes me lean against my old idea:
Just add something like --no-fetch and --force-fetch, plus the default behavior, which fetch the local disk first, and then the remote(with cache enabled). No different behavior, and easy to understand.

Would like to hear feedback on this :)

@sgotti
Copy link
Contributor Author

sgotti commented Jul 28, 2015

There is some back-forth discussion, made me lost...

Sorry for the confusion.

For the kubernetes case, I don't think this solves IfNotPresent or Never. As the image pull policy is per container. For example, how does this solve a pod with container a(pull always) and container b(pull if not present) without launching a pod?

For the a(pull always) and b(pull if not present) it should work with something like:

rkt fetch a
rkt prepare a b

so a will be always fetched while b only if not present.

The problem arises with a(pull never) and b(pull if not present) as --no-fetch is per command.
Doing this will need making --no-fetch per app (like --signature)

rkt prepare a --no-fetch b

--no-fetch here is after the app name as parseApps firstly wants the appName and then its flags.

But refrencing kubernetes's image policy is a good point, this makes me lean against my old idea:
Just add something like --no-fetch and --force-fetch, plus the default behavior, which fetch the local disk first, and then the remote(with cache enabled). No different behavior, and easy to understand.

Do you mean keeping the actual implementation and adding --no-fetch and --force-fetch to rkt fetch/run/prepare? (having a rkt fetch --no-fetch seems strange...)

which fetch the local disk first

just a note: fetching from the local disk is not just checking the URL in the remote store table. Before there's also image discovery for image strings (coreos.com/etcd:v2.1.1). Actually this isn't IMHO very consistent:

  • Without --local: always do discovery and define if latest is true for calling fetchImageFromURL.
  • With --local: Call fetchImageFromStore wich uses store.GetACI(name, labels) and if not found exit without doing discovery.

This one of the reasons to remove --local and split the fetch behavior from the run/prepare one (something like docker pull vs docker run) and always do GetACI in run/prepare, then discovery, then fetching.

@sgotti
Copy link
Contributor Author

sgotti commented Jul 28, 2015

After out of band talk with @yifan-gu we converged that having different behaviors for fetch and run/prepare will create more confusion to the users.

I'll try to summarize with a matrix the proposed idea from @yifan-gu (please correct me if there's something wrong):

For fetch/run/prepare. fetch won't have a --no-fetch command...

--no-fetch --force-fetch behavior
0 0 from store first, then remote
1 0 from store only
0 1 remote only

Implementation details related to provided image

behavior image detailed behavior
store then remote file:// just use the file.
store then remote http(s):// Search in the store remote table if the URL is available. If it's available: If the saved Cache-Control maxage > 0 use it to determine if the image should be downloaded. If the saved ETag != "" download passing the ETag to the server. If maxage == 0 && ETag == "" use the saved BlobKey. If the URL is not found in the store remote table then download.
store then remote docker:// Search in the store remote table if the URL is available, in that case use the saved BlobKey, otherwise fetch using docker2aci (no http caching data available for docker://)
store then remote image name (ex. coreos.com/etcd:v2.1.1) Search in the store using GetACI(name, labels), if found use that image, If not found, do discovery. If discovery is successful use the discovered URL as in the above http(s):// case.
store only file:// just use the file.
store only http(s):// Search in the store remote table if the URL is available and use the saved BlobKey, otherwise fail.
store only docker:// Search in the store remote table if the URL is available and use the saved BlobKey, otherwise fail.
store only image name (ex. coreos.com/etcd:v2.1.1) Check store using GetACI(name, labels), if found use that image otherwise fail.
remote only file:// just use the file.
remote only http(s):// download ignoring the caching data saved in the remote table (Cache-Control maxage and ETag).
remote only docker:// fetch using docker2aci
remote only image name (ex. coreos.com/etcd:v2.1.1) Do discovery. If discovery is successful use the discovered URL as in the above http(s):// case

Apply the above logic recursively to image dependencies.

kubernets imagePullPolicy integration

a possible solution can be, for every image in the pod:

Always: rkt fetch --force-fetch image

IfNotPresent: rkt fetch image

Never: do not run rkt fetch image

Then prepare the container with rkt prepare --no-fetch

If preferred I'll close this issue and open a new one with this last proposal.

@yifan-gu
Copy link
Contributor

@sgotti Thank you very much for the summary! This looks great and much clearer than our current state :)
/cc @jonboulle @vcaputo @alban @iaguis for feedbacks.

@iaguis
Copy link
Member

iaguis commented Jul 29, 2015

What happens with the docker://image:latest case? An image can be updated and we'd always return the image that's in the store.

@sgotti
Copy link
Contributor Author

sgotti commented Jul 29, 2015

@iaguis about the latest case (for both ACIs name string and for docker url) I'm on the fence if differentiating it from other docker tags/ACIs labels.

With the current proposal the logic doesn't considers the latest value and behave identically for latest/non-latest images. But you can force fetch with --force-fetch. Considering also the latest value will add an additional behavior.

These are my considerations on the "latest" case:

Actually latest is populated differently based on the passed image argument:

  • ACI discovered URLs (coreos.com/etcd:latest). Derived from the image string.
  • directly passed ACI URLs: latest is always false as it cannot be gathered.
  • docker:// URLs: Derived from the docker tag (true when :latest or no tag).

Plus, also if it's probably a less common case, also other docker tags can be moved to different images and for ACI URLs the same URL can contain an updated file (in the actual proposal, the default behavior (first disk, then remote) will use http caching info if available).

Given this I'm not sure if latest should be really considered a special case.

@yifan-gu
Copy link
Contributor

FWIW, in docker it seems to fetch the digest of the image no matter what the tag is. I am thinking that we could fetch the signature/digest, do comparison in our default behavior (the local then remote case).

@iaguis
Copy link
Member

iaguis commented Jul 30, 2015

FWIW, in docker it seems to fetch the digest of the image no matter what the tag is. I am thinking that we could fetch the signature/digest, do comparison in our default behavior (the local then remote case).

Is that only for v2 registries? docker2aci doesn't support them yet

@sgotti
Copy link
Contributor Author

sgotti commented Jul 30, 2015

@yifan-gu @iaguis for docker v1 we can probably save the docker imageID somewhere (not sure if in the remote table, in an new docker_remote table, or save and read it in the converted image manifest metadata) and compare it with the one provided by the server. If it's different then the tag was assigned to a different image that should be downloaded.

Docker does this check only on docker pull, while docker run just uses the local image if available.

As in the current proposal fetch/run/prepare should have the same behavior (not like docker) I have some doubts on what I wrote in the table:

  • should the default behavior check for http(s):// caching infos and updated docker tags (these will do a remote call) or if the image is in the store any caching info should be ignored?
  • if the default behavior should check on remote, in the case of a remote call error, should this error be ignored and fall back to the local image?
  • should --force-fetch really ignore caching infos as I wrote in the table (I'm not sure about it)?

I'm thinking on changing the logic in the table to avoid any remote call in the default behavior and make --force-fetch handle caching data. This will make it clearer and with less corner cases based on the provided image string.

@sgotti
Copy link
Contributor Author

sgotti commented Aug 5, 2015

Given the various suggestions/comments I tried to rework it again. Thank for the patience!
In the previous table there were 3 distinct and sometimes confusing behaviors, now I tried to just make 2 behaviors: store and remote. The store then remote (default) behavior just uses the store and, if it doesn't find an image, uses the remote behavior.

This makes the overall behavior IMHO clearer.
The primary differences from the previous table in #1196 (comment) are:

  • Store behavior doesn't do any http/docker caching checks
  • Only remote does the http/docker caching checks (previously it ignored caching). I think that the ignore any caching data requirement, if really needed should go in an additional option (like --no-cache)

In this way, in the default behavior, if an image for a predefined URL is found in the store, no http/docker caching check is done and no remote call is done. If someone wants to check if an updated image is provided for a specified URL should use --force-fetch

this is IMHO cleaner and removes a lot of corner cases and additional doubts like mine in the above comment.

Here it is:

For fetch/run/prepare. fetch won't have a --no-fetch command...

--no-fetch --force-fetch behavior description
1 0 store Just check the store
0 1 remote Do remote download (handling caching logic for http(s):// and docker://)
0 0 store then remote Default Behavior. Does store and if it doesn't return an image calls remote

behavior details related to provided image

behavior image detailed behavior
store file:// just use the file.
store http(s):// Search in the store remote table if the URL is available and use the saved BlobKey.
store docker:// Search in the store remote table if the URL is available and use the saved BlobKey.
store image name (ex. coreos.com/etcd:v2.1.1) Check store using GetACI(name, labels), if found use that image.
remote file:// just use the file.
remote http(s):// Search in the store remote table if the URL is available. If it's available and the saved Cache-Control maxage > 0 determine if the image should be downloaded. If it's not expired use the saved BlockKey. Otherwise download (sending if available the saved ETag). If download returns a 304 Not Modified use the saved BlobKey.
remote docker:// (TO BE IMPLEMENTED) Search in the store remote/docker_remote table if the URL is available. If it's available ask the server for the current imageID. If the saved imageID matches use the saved BlobKey, otherwise fetch using docker2aci.
remote image name (ex. coreos.com/etcd:v2.1.1) Do discovery. If discovery is successful use the discovered URL doing the above remote http(s):// image case.

Apply the above logic recursively to image dependencies.

kubernets imagePullPolicy integration

a possible solution can be, for every image in the pod:

Always: rkt fetch --force-fetch image

IfNotPresent: rkt fetch image

Never: do not run rkt fetch image

Then prepare the container with rkt prepare --no-fetch

@yifan-gu
Copy link
Contributor

yifan-gu commented Aug 5, 2015

@sgotti This looks good to me. Thanks!

@sgotti
Copy link
Contributor Author

sgotti commented Aug 10, 2015

@jonboulle @vcaputo @alban @iaguis @yifan-gu If you don't have any comments/objections I'd like to start working on it in the next days.

@sgotti
Copy link
Contributor Author

sgotti commented Sep 21, 2015

@xaduha If you're interested the implementing PR and additional discussion are in #1353. If I correctly understand your question, the signature verification behavior is the same as the current one. An image in the store means that it was previously imported via file, http, docker URLs and, in the applicable cases, the signature was already verified.
If you are instead referring to the http caching part done in the remote behavior, are you suggesting to use the image signature to check if the remote image wasn't updated instead of using Cache-Control max age and Etag?

@sgotti
Copy link
Contributor Author

sgotti commented Sep 27, 2015

I'm going to close this because #1353 has been merged!

@sgotti sgotti closed this as completed Sep 27, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants