We're considering using Harbor as a proxy cache in front of JFrog Artifactory, which would remain the source registry. We have 24 AWS EKS clusters, each with ~100 nodes running on spot instances. Because these nodes are recycled weekly, images are pulled frequently, which puts a high load on the registry.
To improve availability and redundancy, we plan to run a Harbor instance in each cluster. However, we're unsure whether to deploy external Redis and PostgreSQL or to rely on Harbor's built-in database and cache.
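To put the load in numbers, here is a rough back-of-envelope estimate in Python based on the figures above. `IMAGES_PER_NODE` and `DISTINCT_IMAGES_PER_CLUSTER` are hypothetical placeholders, not measured values. The model assumes each replacement node pulls each of its images once, and each cluster's Harbor fetches every distinct image from Artifactory at most once per week (cold cache at the start of the week, no mid-week image changes). It counts only full image fetches and ignores any manifest freshness checks a proxy cache may still send upstream.

```python
# Rough pull-volume estimate: 24 EKS clusters x ~100 spot nodes,
# each node recycled about once per week.
# IMAGES_PER_NODE and DISTINCT_IMAGES_PER_CLUSTER are hypothetical
# placeholders; replace them with real numbers from your clusters.

CLUSTERS = 24
NODES_PER_CLUSTER = 100
RECYCLES_PER_WEEK = 1
IMAGES_PER_NODE = 30               # hypothetical
DISTINCT_IMAGES_PER_CLUSTER = 200  # hypothetical

SECONDS_PER_WEEK = 7 * 24 * 3600


def pulls_per_harbor_per_week(nodes: int, recycles: int, images: int) -> int:
    """Pulls served by one per-cluster Harbor: every replacement node pulls its images."""
    return nodes * recycles * images


def total_pulls_per_week(clusters: int, nodes: int, recycles: int, images: int) -> int:
    """Pulls across all clusters, i.e. roughly what Artifactory would see with no cache."""
    return clusters * pulls_per_harbor_per_week(nodes, recycles, images)


def upstream_fetches_per_week(clusters: int, distinct_images: int) -> int:
    """Full fetches reaching Artifactory if each cluster's cache starts cold once a
    week: one fetch per distinct image per cluster. Manifest checks are not counted."""
    return clusters * distinct_images


if __name__ == "__main__":
    per_harbor = pulls_per_harbor_per_week(NODES_PER_CLUSTER, RECYCLES_PER_WEEK, IMAGES_PER_NODE)
    total = total_pulls_per_week(CLUSTERS, NODES_PER_CLUSTER, RECYCLES_PER_WEEK, IMAGES_PER_NODE)
    upstream = upstream_fetches_per_week(CLUSTERS, DISTINCT_IMAGES_PER_CLUSTER)
    print(f"pulls per Harbor instance per week: {per_harbor}")
    print(f"pulls across all clusters per week: {total}")
    print(f"upstream full fetches per week (cold cache): {upstream}")
    print(f"average pulls per Harbor per second: {per_harbor / SECONDS_PER_WEEK:.4f}")
```

A weekly average like this hides bursts (for example, many spot nodes being replaced at once), so peak concurrent pulls per Harbor instance is probably the more useful number to measure.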
Key concerns:
- Will Harbor’s built-in database and caching mechanisms be sufficient to handle the frequent image pulls, or will we see performance degradation?
- What kind of impact (latency, failures) should we expect if we don’t use external Redis and PostgreSQL?
- Are there any benchmarks or guidelines on when external Redis/PostgreSQL becomes necessary for a Harbor proxy cache?
Any insights or best practices would be greatly appreciated!