Reindex is too slow in repos with thousands of charts #336

Open
drewwells opened this issue Nov 28, 2023 · 3 comments

Comments

@drewwells

We have 150k charts, a 2.3 GB bucket, and a 40 MB index file. I ran reindex in a k8s pod inside AWS and it took a week to complete. I can suggest a few optimizations just from looking at the code.

Reindex downloads each chart as it iterates through the index. Instead, it could sync the entire S3 bucket locally and then process it, or perhaps trust the old index and download only the objects that are missing from it.
Optimize the client configuration or expose its options; see https://stackoverflow.com/a/48114553

With the above optimizations, it takes me 5 minutes to download all the objects. I'd guess it would take hours with the default settings.
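
The second idea above (trust the old index and fetch only what it lacks, with more parallelism) could be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's own code: `missing_keys`, `fetch_all`, and the `fetch` callback are hypothetical names, and it assumes every index entry can be mapped back to its object key in the bucket.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def missing_keys(bucket_keys: Iterable[str], indexed_keys: Iterable[str]) -> List[str]:
    """Return chart archives that exist in the bucket but not in the index."""
    known = set(indexed_keys)
    # Assumption: chart archives end in .tgz; everything else (e.g. index.yaml) is skipped.
    return sorted(k for k in bucket_keys if k.endswith(".tgz") and k not in known)


def fetch_all(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    workers: int = 32,
) -> Dict[str, bytes]:
    """Download the given keys concurrently.

    `fetch` is a hypothetical callable that returns the bytes of one object;
    in practice it would wrap whatever S3 client is in use.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its body.
        return dict(zip(keys, pool.map(fetch, keys)))
```

Only the objects returned by `missing_keys` are downloaded, so the cost scales with the number of unindexed charts rather than with the size of the whole bucket.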

@hypnoglow
Owner

Wow, that's a real number. I agree that the plugin was never designed to handle such volumes, and there is definitely room for improvement.

Out of curiosity, why do you have so many charts? Have you considered cleaning up, i.e. removing unused versions?

@drewwells
Author

We have, actually, but that is itself a challenge. Each delete (like each upload) takes a considerable amount of time, and probably 99% of the references need to be cleaned up. The criteria I can think of are: remove any chart older than {time period}, while keeping an allow list of chart versions that are currently in use. I thought about it for a while, and forking helm-s3 is probably the easiest way for us to fine-tune reindex with a prune option that applies those criteria.
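
The pruning criteria described above might look something like this. It is a sketch only, written in Python for brevity; `ChartVersion` and `select_for_pruning` are hypothetical names, and it assumes each index entry carries a creation timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class ChartVersion(NamedTuple):
    name: str
    version: str
    created: datetime  # assumed to be timezone-aware


def select_for_pruning(
    entries: Iterable[ChartVersion],
    max_age: timedelta,
    allow: Set[Tuple[str, str]],
    now: Optional[datetime] = None,
) -> List[ChartVersion]:
    """Pick chart versions older than max_age that are not on the allow list."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [
        e for e in entries
        # Old enough to prune, and not explicitly marked as still in use.
        if e.created < cutoff and (e.name, e.version) not in allow
    ]
```

The function only selects candidates; deleting the objects and rewriting the index would be a separate step.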

@hypnoglow changed the title from "reindex is too slow" to "Reindex is too slow in repos with thousands of charts" on Dec 1, 2023
@Makeshift

My guess is that deletion takes a while because helm-s3 downloads, edits, and uploads the index on every delete. With large repos the index gets pretty beefy, so this adds a lot of time. Would it be possible to support deleting multiple versions in one go?
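
To illustrate the batching idea: if the index is downloaded once, all requested versions are removed in memory, and the result is uploaded once, the index round trip is paid per batch rather than per delete. The sketch below is hypothetical Python, not helm-s3's code; it models the index's `entries` map as a dict of chart name to a list of version records, and `remove_versions` is an invented helper name.

```python
from typing import Dict, List, Set, Tuple

Entries = Dict[str, List[dict]]


def remove_versions(entries: Entries, to_delete: Set[Tuple[str, str]]) -> Entries:
    """Return a new entries map without the given (name, version) pairs."""
    result: Entries = {}
    for name, versions in entries.items():
        kept = [v for v in versions if (name, v["version"]) not in to_delete]
        if kept:
            # Charts with no remaining versions are dropped from the index entirely.
            result[name] = kept
    return result
```

A caller would download the index, apply `remove_versions` once for the whole batch, upload the result, and then delete the corresponding chart objects.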
