
Pulls from proxy cache not updating DB, S3 or Redis #21750

@Lep3188

We are running Harbor v2.11.1-6b7ecba1 on a Kubernetes cluster, with PostgreSQL 13.15 as the database and AWS S3 for storage.
An incident in our registry led us to notice discrepancies between the index manifest digests in the DB, in S3, and in the external registry.

We would like to know whether what we are seeing is normal behavior for proxy cache images that have changed over time in the external repo.

Below are the scenarios we have seen, all involving images that have an OCI index. In each scenario we expect Harbor to check whether the proxied repo has a new digest, update our DB and S3 storage with it, and serve the new image to the user. Because nothing changes locally, we think Harbor is simply passing through what the proxied registry has.
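For reference, this is roughly how the external digest can be read without pulling the image: a HEAD request against the registry's `/v2/<name>/manifests/<reference>` endpoint, reading the `Docker-Content-Digest` response header. This is a minimal sketch, not Harbor's code. The helper names, the example URL, and the bearer-token handling are assumptions, and real registries usually require a token exchange first.

```python
import urllib.request

# Media types for multi-arch images (OCI index and Docker manifest list).
INDEX_MEDIA_TYPES = (
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
)


def build_manifest_head_request(registry_url, repository, reference, token=None):
    """Build a HEAD request for /v2/<repository>/manifests/<reference>.

    Hypothetical helper: 'reference' is a tag or a digest.
    """
    url = f"{registry_url.rstrip('/')}/v2/{repository}/manifests/{reference}"
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("Accept", ", ".join(INDEX_MEDIA_TYPES))
    if token:
        # Assumption: the registry accepts a pre-fetched bearer token.
        req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_digest(req):
    """Send the request and return the Docker-Content-Digest header, if any."""
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Docker-Content-Digest")
```

Calling `fetch_digest(build_manifest_head_request("https://registry.example.com", "library/nginx", "latest"))` against both the external registry and the Harbor proxy project endpoint gives the two digests to compare.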

How we assume Harbor should work:

  1. User runs docker pull for a proxy cache image.
  2. Registry checks the external repo for the newest digest.
  3. Registry updates the value in the DB because the digest differs.
  4. The digest in S3 changes to the new one.
  5. The user's machine receives the external digest, not the one we previously had in the DB.
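The steps above can be sketched as a small decision function. This only models our assumption of the behavior, not Harbor's actual implementation. `PullDecision` and `decide_pull` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullDecision:
    serve_digest: str    # digest the user's machine should receive (step 5)
    refresh_cache: bool  # whether DB and S3 should be updated (steps 3 and 4)


def decide_pull(upstream_digest: str, cached_digest: Optional[str]) -> PullDecision:
    """Model of the assumed flow: always serve the upstream digest and
    refresh the local copy whenever it differs from (or is missing in) the cache."""
    needs_refresh = cached_digest != upstream_digest
    return PullDecision(serve_digest=upstream_digest, refresh_cache=needs_refresh)
```

Under this model, whenever `refresh_cache` is true we would expect the DB and S3 to hold `serve_digest` after the pull completes, which is not what the scenarios below show.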

First Scenario: Pull by tag sometimes updates the DB but doesn't update S3.
Assume we already have a cached image that hasn't been pulled in a month. We can see in the external registry that the specific tag was updated and its manifest digest changed. When a user pulls, the DB sometimes changes to another value that does not match the external digest, while nothing changes in the S3 bucket.
When the DB does change, as in the example below, the new digest does not match the external one. That pull and every pull after it serve only the external registry digest.

Digests we see:
External registry: sha256:42124d7a0f4d3fcb82524a0fee72a513a8e575e7398e7ddca8f380a57c76146f
Our DB before pull: sha256:6ef06e35dee686ff88448aba45cfc8076457a138eea0413b4e38c45227f3ad4c
S3 storage: Has nothing in the manifests directory.
DB after pull: sha256:b2dddc9d80bf40665f452fd5b42a942ad9e83c7a651c0d5b991d9b3a0c0de930
When pulling Users get: sha256:42124d7a0f4d3fcb82524a0fee72a513a8e575e7398e7ddca8f380a57c76146f
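To make this comparison repeatable, a helper like the one below could flag which sources disagree with the external registry. It is only a sketch: `find_mismatches` and the source labels are hypothetical, and the digests have to be collected separately (registry API, DB query, S3 listing).

```python
from typing import Dict, Optional


def find_mismatches(observed: Dict[str, Optional[str]],
                    reference: str = "external") -> Dict[str, Optional[str]]:
    """Return every source whose digest differs from the reference source.

    A value of None means the source holds no manifest at all.
    """
    expected = observed[reference]
    return {
        name: digest
        for name, digest in observed.items()
        if name != reference and digest != expected
    }


# Digests from the first scenario above.
scenario_one = {
    "external": "sha256:42124d7a0f4d3fcb82524a0fee72a513a8e575e7398e7ddca8f380a57c76146f",
    "db_after_pull": "sha256:b2dddc9d80bf40665f452fd5b42a942ad9e83c7a651c0d5b991d9b3a0c0de930",
    "s3": None,
    "served_to_user": "sha256:42124d7a0f4d3fcb82524a0fee72a513a8e575e7398e7ddca8f380a57c76146f",
}
mismatches = find_mismatches(scenario_one)
```

For the first scenario this reports the DB and S3 as out of sync, while the digest served to the user matches the external registry.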

Second Scenario: Pull doesn't update DB or S3 for constantly used images, but the DB and S3 match each other locally

In the second scenario we use an image that is pulled constantly (we have pipelines that can pull it every 15 minutes, every day). Because the image is pulled often, there is a manifest directory in S3 whose digest matches what we have in the DB.

We initially believed this could be caused by the Redis cache. But since the registry serves the external registry digest instead of the local one, we now think this is a genuine issue. In very rare cases a pull returns the local digest, but a repeated pull goes back to the external digest rather than the cached one.

Digests we see:
External Registry: sha256:7a5342b7662db8de99e045a2b47b889c5701b8dde0ce5ae3f1577bf57a15ed40
Our DB: sha256:605644ccbf77342e408186a4c4f1a97cdbae8623f712382ad632690b9d01f482
S3 Storage: sha256:605644ccbf77342e408186a4c4f1a97cdbae8623f712382ad632690b9d01f482
When pulling Users get: sha256:7a5342b7662db8de99e045a2b47b889c5701b8dde0ce5ae3f1577bf57a15ed40
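For anyone reproducing the S3 check: the registry storage backend keeps manifest revisions and tag pointers as `link` files under `docker/registry/v2/repositories/<name>/_manifests/`. The sketch below builds the keys we would look for. The function names are hypothetical, the repository name is an example, and the root prefix depends on the `rootdirectory` configured for the S3 storage driver, so treat it as an assumption.

```python
def _join(*parts: str) -> str:
    """Join key segments with '/', skipping empty ones (e.g. an empty root)."""
    return "/".join(p.strip("/") for p in parts if p and p.strip("/"))


def manifest_revision_key(root: str, repository: str, digest: str) -> str:
    """S3 key of the link file for a manifest revision, given 'sha256:<hex>'."""
    algorithm, _, hex_digest = digest.partition(":")
    return _join(root, "docker/registry/v2/repositories", repository,
                 "_manifests/revisions", algorithm, hex_digest, "link")


def tag_current_key(root: str, repository: str, tag: str) -> str:
    """S3 key of the link file that records which digest a tag points to."""
    return _join(root, "docker/registry/v2/repositories", repository,
                 "_manifests/tags", tag, "current/link")


key = manifest_revision_key(
    "",  # assumption: no rootdirectory prefix configured
    "proxy-project/library/nginx",  # example name: <harbor project>/<upstream repo>
    "sha256:605644ccbf77342e408186a4c4f1a97cdbae8623f712382ad632690b9d01f482",
)
```

The content of the tag's `current/link` object is the digest the storage backend currently associates with that tag, which is the value we compared against the DB.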

There is a third scenario, the same as the second except that none of S3, the DB, and the external registry match each other, and nothing gets updated after a pull.
