I think I found a bug where Buildkit will use stale/expired credentials when attempting to communicate with Quay. We are having users who appear to be affected by this in Earthly: earthly/earthly#890
Let's see if I can provide a decent summary. It's one of those bugs that's a confluence of a couple of factors, so here it goes:
Quay returns the bare minimum in its authentication response. See here for details, under the heading "Token Response Fields". Notably, Quay omits the `expires_in` field. The documentation says that if it is missing, you should assume the duration is 60 seconds.
Here is a sample Quay auth payload (JWT elided):
```json
{
  "token": "eyJhb..."
}
```
Containerd interprets this missing field as an `expires_in` of 0 (see this function, and this struct for deserialization), and reports that zero to Buildkit. This is because `FetchTokenResponse` cannot distinguish a missing `expires_in` field from a genuine zero value.
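The missing-vs-zero ambiguity is ordinary `encoding/json` behaviour for non-pointer fields. The sketch below uses two hypothetical structs (not containerd's actual definitions) to show that a plain `int` decodes both payloads to 0, while a `*int` keeps them apart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainResponse uses a plain int: encoding/json leaves it at 0 when the key
// is missing, exactly as if the server had sent "expires_in": 0.
type plainResponse struct {
	Token     string `json:"token"`
	ExpiresIn int    `json:"expires_in"`
}

// pointerResponse uses *int: a missing key leaves it nil, while an explicit
// 0 decodes to a non-nil pointer to 0.
type pointerResponse struct {
	Token     string `json:"token"`
	ExpiresIn *int   `json:"expires_in"`
}

// decode unmarshals body into v and panics on malformed input.
func decode(body []byte, v any) {
	if err := json.Unmarshal(body, v); err != nil {
		panic(err)
	}
}

func main() {
	missing := []byte(`{"token": "eyJhb..."}`)               // Quay-style response
	zero := []byte(`{"token": "eyJhb...", "expires_in": 0}`) // explicit zero

	var a, b plainResponse
	decode(missing, &a)
	decode(zero, &b)
	fmt.Println(a.ExpiresIn, b.ExpiresIn) // 0 0 (indistinguishable)

	var c, d pointerResponse
	decode(missing, &c)
	decode(zero, &d)
	fmt.Println(c.ExpiresIn == nil, d.ExpiresIn == nil) // true false
}
```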
Buildkit treats an `expires_in` of 0 as "doesn't expire" (see here) and dutifully caches the token to avoid the overhead of re-authenticating.
Additionally, Buildkit's garbage collector (`gc`) removes cached credentials that haven't been used for more than 10 minutes, checking every 5 minutes (see here).
So, to reproduce this, you need to run a build that talks to Quay at least once every 10-ish minutes, for at least an hour. (The JWT coming from Quay in my case has an `exp` of 1 hour in the future; why it's not also in the outer JSON is beyond me.)
You should be able to reproduce this with a build that uses a private Quay image. I reproduced the error with Earthly (see the linked bug above), but an equivalent Dockerfile would be something like this:
Dockerfile:
```Dockerfile
FROM quay.io/dchw/testing
RUN echo bye > goodbye.txt
```
After running this for an hour, I start to see logs like this: `unexpected status code [manifests ci]: 401 UNAUTHORIZED`
Unfortunately, this means workarounds are few and far between. The options are to either:
- Restart the affected Buildkit daemon, or
- Wait 10-15 minutes for the GC to clean up the "unused" token
I should also mention that I don't know whose side of the fence this lands on: containerd or Buildkit. I am raising it here since it is visible from Buildkit, and I trust you'll send me to the right place if it doesn't belong here. If the fix is in containerd, Buildkit would probably need an update too.