
index.yaml regenerating every request even without any package updated #152

Closed · IMBlues opened this issue Jul 26, 2018 · 17 comments · Fixed by #314

Comments

@IMBlues commented Jul 26, 2018

index.yaml is regenerated on every request to /index.yaml, even though no package has changed.

When we compare the LastModified timestamps between objects from the cache and from storage:
https://github.com/helm/chartmuseum/blob/master/pkg/storage/storage.go#L64

The objects from the cache come back in arbitrary order, because they are collected by ranging over a map:
https://github.com/helm/chartmuseum/blob/master/pkg/chartmuseum/server/multitenant/index.go#L107

func (server *MultiTenantServer) getRepoObjectSlice(entry *cacheEntry) []cm_storage.Object {
	var objects []cm_storage.Object
	// Ranging over RepoIndex.Entries (a map) yields entries in a
	// randomized order, so the returned slice is not stably ordered.
	for _, entry := range entry.RepoIndex.Entries {
		for _, chartVersion := range entry {
			object := cm_repo.StorageObjectFromChartVersion(chartVersion)
			objects = append(objects, object)
		}
	}
	return objects
}

And the LastModified timestamp of the object from storage seems to have higher precision (I have only tested this on the S3 backend):

LastModified from cache:   2018-06-11T03:31:52.000Z
LastModified from storage: 2018-06-11T03:31:52.094Z

The above causes index.yaml to be regenerated on every request.
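To make the failure mode concrete, here is a minimal standalone Go sketch (not ChartMuseum code) that reproduces the comparison with the two timestamps above:

package main

import (
	"fmt"
	"time"
)

func main() {
	// The cache holds the timestamp truncated to whole seconds, while the
	// storage backend reports millisecond precision.
	fromCache, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.000Z")
	fromStorage, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.094Z")

	// Prints "false": the 94ms difference makes the diff treat the object
	// as updated, so index.yaml is regenerated on every request.
	fmt.Println(fromCache.Equal(fromStorage))
}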

@jdolitsky (Contributor)

@IMBlues thanks for the detailed issue and PR! A few things:

  1. How are you determining that "index.yaml will regenerate every request"? I have not experienced this, and there should be tests confirming it doesn't. The intent is that the check for regeneration happens on every request, but the regeneration itself only happens when necessary: it should happen only once between any number of object uploads and the next request to /index.yaml.

  2. If you run chartmuseum with --debug, there should be something in the logs such as "Change detected between cache and storage" to indicate this event. Are you seeing this occur more than once after uploading a package?

  3. Why does the order returned by getRepoObjectSlice() matter? Is this an optimization for calculating the diff?

@IMBlues (Author) commented Jul 27, 2018

1 & 2. I can confirm this behavior because "Change detected between cache and storage" showed every request to /index.yaml with --debug
3. I reviewed the code and found it's none of business about order of objects, but LastModified of the object really matters:

The timestamp from storage (an S3 backend provided by Ceph RGW) has higher precision than the one from the cache, so this comparison always returns true:

if !o1.LastModified.Equal(o2.LastModified) {
    diff.Updated = append(diff.Updated, o2)
}

Printing them:

o1.LastModified: 2018-06-11T03:31:52.000Z (from cache)
o2.LastModified: 2018-06-11T03:31:52.094Z (from storage)

I have not tested this on Amazon S3, so I'm not sure whether it's related to Ceph RGW.
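For illustration, truncating both sides to whole seconds before comparing (roughly what the PR does, as I understand it) makes the two timestamps above compare equal. A sketch with a hypothetical helper, not the actual ChartMuseum code:

package main

import (
	"fmt"
	"time"
)

// timestampsEqualToSecond is a hypothetical helper: it drops sub-second
// precision on both sides so that a backend reporting milliseconds compares
// equal to a seconds-only cache entry.
func timestampsEqualToSecond(a, b time.Time) bool {
	return a.Truncate(time.Second).Equal(b.Truncate(time.Second))
}

func main() {
	fromCache, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.000Z")
	fromStorage, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.094Z")
	fmt.Println(timestampsEqualToSecond(fromCache, fromStorage)) // true
}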

@jdolitsky (Contributor)

@IMBlues sorry we left this unfinished. Were you able to determine if this is related to Ceph?

@IMBlues (Author) commented Sep 18, 2018

No, I didn't test it on other S3 backends, and the PR does solve my problem. I think we should close this for now; I'll reopen it if I find time to investigate further.

@jdolitsky (Contributor)

@IMBlues if the fix works for you, then we should add a config option that exposes the "ignore milliseconds" behavior. Maybe something like --timestamp-comparison=ms (the default)?

You would then use --timestamp-comparison=sec to fix your issue. Let me know what you think.
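A rough sketch of how such a flag could gate the comparison precision; the flag name and values follow the suggestion above, and none of this is the actual ChartMuseum implementation:

package main

import (
	"flag"
	"fmt"
	"time"
)

// --timestamp-comparison=ms keeps the current exact comparison;
// --timestamp-comparison=sec truncates both sides to whole seconds first.
var timestampComparison = flag.String("timestamp-comparison", "ms",
	"precision for comparing cache vs. storage timestamps (ms|sec)")

func timestampsEqual(a, b time.Time) bool {
	if *timestampComparison == "sec" {
		a = a.Truncate(time.Second)
		b = b.Truncate(time.Second)
	}
	return a.Equal(b)
}

func main() {
	flag.Parse()
	fromCache, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.000Z")
	fromStorage, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.094Z")
	fmt.Println(timestampsEqual(fromCache, fromStorage))
}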

@IMBlues (Author) commented Sep 18, 2018

@jdolitsky I think adding a flag is better than what I did (ignoring milliseconds directly in the code). Maybe I'll try it later this week and open another PR.

@mateuszkwiatkowski

Hello,

We also see this problem when using Ceph as a backend for chartmuseum.

@airadier

We also hit this issue with a Dell ECS S3 storage backend. The PR previously provided by @IMBlues apparently has the subtraction operation reversed: o1 (from the cache) can never be newer than o2 (from S3), so a new file is never detected. I am providing a new PR with the same fix but with the subtraction order reversed.

@jopfeffe

Same issue with an OTC S3 backend. I would love to see a fix in the upcoming release; otherwise I have to build chartmuseum myself, just because of ~3 lines of code.

@jdolitsky (Contributor)

@IMBlues @mateuszkwiatkowski @airadier @jopfeffe Would one of you be able to test out the master branch with the new option --storage-amazon-nearestsecond / STORAGE_AMAZON_NEARESTSECOND=1? This should fix the issue.

You can use Docker image chartmuseum/chartmuseum:latest@sha256:75dd03cc7111cd5bc6fac87429a09c9f91e30d0acaeaf2cb2d7ac3568344d83d

Once I can get confirmation that this is finally fixed (as well as #280), I'll go ahead and release chartmuseum v0.12.0. Thanks in advance!

@rajiteh (Contributor) commented Feb 24, 2020

We are experiencing the same issue with our OpenStack Swift based storage. The index is always rebuilt and takes around 30 seconds to complete on every helm dep update. @jdolitsky is it possible to make this option affect Swift backends too?

EDIT:
As suggested earlier, I propose implementing a new CLI flag --timestamp-comparison=<sec|msec> in a more generic way, to account for backends that report timestamps with reduced precision/accuracy.

@jdolitsky (Contributor)

@rajiteh - please see https://github.com/chartmuseum/storage/blob/master/amazon.go#L109

We will need to add a NearestSecond option to the openstack backend similarly.

And yes, you're probably right about this (--timestamp-comparison). Please feel free to propose this change in chartmuseum/storage.
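For reference, here is a sketch of what an analogous NearestSecond option on the OpenStack backend might look like, modeled loosely on the Amazon backend linked above; the field and helper names are assumptions, not the actual chartmuseum/storage code:

package storage

import "time"

// OpenstackOSBackend sketch: only the assumed NearestSecond field is shown;
// the backend's existing fields are elided.
type OpenstackOSBackend struct {
	NearestSecond bool
}

// normalizeTimestamp rounds an object's LastModified down to whole seconds
// when NearestSecond is enabled, so seconds-precision caches compare equal.
func (b OpenstackOSBackend) normalizeTimestamp(t time.Time) time.Time {
	if b.NearestSecond {
		return t.Truncate(time.Second)
	}
	return t
}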

@rajiteh (Contributor) commented Feb 25, 2020

It turns out the issue I had with OpenStack is fixed by this change: chartmuseum/storage@a49401b

So my issue in particular should be resolved in the v0.12.0 release.

However, I already went ahead and implemented the timestamp argument.

@jdolitsky (Contributor)

@rajiteh thank you very much, looking at your change

@jopfeffe commented Feb 27, 2020

> You can use Docker image chartmuseum/chartmuseum:latest@sha256:75dd03cc7111cd5bc6fac87429a09c9f91e30d0acaeaf2cb2d7ac3568344d83d

Works in my case.

Thank you very much for the effort. Looking forward to the v0.12.0 release.

@jdolitsky (Contributor)

@jopfeffe thanks for trying this out! Glad to hear it.

@jdolitsky (Contributor)

Hi all, a new version of chartmuseum has been released (v0.12.0) with a solution for this:

--storage-timestamp-tolerance=1s

Big thanks to @rajiteh for putting this together! Thanks all
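Since the released flag takes a duration, the comparison is presumably tolerance-based rather than truncation-based: two timestamps count as equal when they differ by no more than the configured tolerance. A sketch of that idea (the helper below is illustrative, not the shipped code):

package main

import (
	"fmt"
	"time"
)

// equalWithinTolerance treats two timestamps as equal when the absolute
// difference between them is at most the given tolerance (e.g. 1s).
func equalWithinTolerance(a, b time.Time, tolerance time.Duration) bool {
	diff := a.Sub(b)
	if diff < 0 {
		diff = -diff
	}
	return diff <= tolerance
}

func main() {
	fromCache, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.000Z")
	fromStorage, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.094Z")
	fmt.Println(equalWithinTolerance(fromCache, fromStorage, time.Second)) // true
}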
