Composer metadata is not being rebuilt when a package is reuploaded #20

Closed
fjmilens3 opened this issue Jun 12, 2018 · 6 comments
Labels
wontfix This will not be worked on

Comments

@fjmilens3
Contributor

@TheBay0r has noticed that there may be issues with regenerating provider-level metadata when uploading a new build of an existing package and version. Details are part of the comments thread in PR #14 and would have entered the codebase as part of PR #18.

@fjmilens3 fjmilens3 self-assigned this Jun 12, 2018
@fjmilens3
Contributor Author

@TheBay0r, when you're testing this, are you modifying the file in any way between uploads? (For example, are you reuploading the exact same file without any changes at all, or are you changing it in some way before reuploading?)

@TheBay0r
Contributor

@fjmilens3 I wasn't changing the files in between. I'll test the case where the content of the zip is slightly changed, to see whether that has an impact on the generated JSON.

@TheBay0r
Contributor

@fjmilens3 So I tested this case. When the zip file contains a change it seems that the json is rebuilt

@fjmilens3
Contributor Author

@TheBay0r:

"So I tested this case. When the zip file contains a change it seems that the json is rebuilt"

This is something we're going to have to live with because of other factors related to suppressing unneeded rebuilds, but fortunately it shouldn't be much of a problem.

Explanation

Every time a new upload comes in for an existing artifact, we generate an update event within the data layer:

https://github.com/sonatype/nexus-public/blob/e9668b4f9aeff4a19c263c40121ad40e2780182a/components/nexus-orient/src/main/java/org/sonatype/nexus/orient/entity/EntityHook.java#L272

We then receive this event and if certain conditions hold, we use that to generate a rebuild event for the metadata:

https://github.com/sonatype-nexus-community/nexus-repository-composer/blob/master/src/main/java/org/sonatype/nexus/repository/composer/internal/ComposerHostedMetadataFacetImpl.java#L82

However, since these are database-level events, other events, notably downloads, can also cause the same record to be updated (as we have to increment the last downloaded time, etc.).

We don't want to rebuild metadata in that case, and until we have a better application-level solution, our preferred workaround is to check whether the blob was updated within a short period of time.

If it was, we assume that the event was the result of the blob changing. If it was not, we infer that the event was the result of a download (or other operation) that touched the asset but should not force a rebuild of any associated metadata:

https://github.com/sonatype-nexus-community/nexus-repository-composer/blob/master/src/main/java/org/sonatype/nexus/repository/composer/internal/ComposerHostedMetadataFacetImpl.java#L116
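
As a rough illustration of the heuristic described above (not the plugin's actual code), the check could look something like the sketch below. The class name, method name, and the 10-second window are all hypothetical; the real threshold and field names are whatever `ComposerHostedMetadataFacetImpl` uses.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the "was the blob updated recently?" heuristic.
// None of these names come from the plugin source.
public class RebuildHeuristic {
    // Assumed window; chosen only for illustration.
    static final Duration WINDOW = Duration.ofSeconds(10);

    // Returns true when the blob-updated timestamp is close enough to the
    // asset-updated event that we attribute the event to a blob change.
    static boolean shouldRebuild(Instant blobUpdated, Instant eventTime) {
        if (blobUpdated == null) {
            return false; // no blob attached yet, nothing to rebuild from
        }
        Duration gap = Duration.between(blobUpdated, eventTime).abs();
        return gap.compareTo(WINDOW) <= 0;
    }

    public static void main(String[] args) {
        Instant event = Instant.parse("2018-06-12T12:00:00Z");
        // Event fired right after an upload replaced the blob.
        System.out.println(shouldRebuild(event.minusSeconds(2), event));
        // Event fired by a download long after the last blob change.
        System.out.println(shouldRebuild(event.minusSeconds(3600), event));
    }
}
```

The trade-off is that a download that happens to land within the window of a real upload would also trigger a rebuild, which is harmless, just redundant.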

Along with the above, Nexus Repository Manager tries to deduplicate blobs (for the same asset), such that if we receive a blob for an asset that's identical to the blob we already have for that asset, we don't have any churn at the storage level. As a practical matter, that means that the blob was not updated, so the blob updated timestamp is not updated either:

https://github.com/sonatype/nexus-public/blob/729ac4987d99f581e6ff95a2c1b92945057107aa/components/nexus-repository/src/main/java/org/sonatype/nexus/repository/storage/StorageTxImpl.java#L722
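
To make the consequence concrete, here is a small self-contained sketch of the deduplication effect. It assumes that "identical" is decided by comparing a content hash (SHA-1 here), which is my simplification; the `Asset` class and `upload` method are hypothetical and are not the `StorageTxImpl` API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Arrays;

// Hypothetical model of blob deduplication for a single asset.
public class DedupSketch {
    static final class Asset {
        byte[] sha1;          // hash of the currently attached blob
        Instant blobUpdated;  // only moves when the blob is actually replaced
    }

    static byte[] sha1(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the blob was replaced, false if the upload was a duplicate.
    static boolean upload(Asset asset, byte[] content, Instant now) {
        byte[] incoming = sha1(content);
        if (asset.sha1 != null && Arrays.equals(asset.sha1, incoming)) {
            return false; // identical content: keep old blob and old timestamp
        }
        asset.sha1 = incoming;
        asset.blobUpdated = now;
        return true;
    }

    public static void main(String[] args) {
        Asset asset = new Asset();
        byte[] zip = "same archive bytes".getBytes(StandardCharsets.UTF_8);
        Instant first = Instant.parse("2018-06-12T12:00:00Z");
        upload(asset, zip, first);
        boolean replaced = upload(asset, zip, first.plusSeconds(600));
        System.out.println("replaced=" + replaced + " blobUpdated=" + asset.blobUpdated);
    }
}
```

Because the second upload leaves `blobUpdated` untouched, a timestamp-based check like the one described earlier sees nothing recent and skips the rebuild.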

The end result is that no metadata is rebuilt, because the blob has not changed. Of course, there are changes within the repository manager itself that could be used to handle this, or we could broadcast custom events from within the content facet; however, I'm trying to minimize the divergence between the approach we have here and the approach we have in our supported/proprietary format implementations.

Conclusions

Under normal circumstances this won't matter: if the blob hasn't changed, the generated metadata would not change either (at least not in any meaningful way), because the metadata is extracted from the content of the blob (in our case, the composer.json file in the archive).

However, under unusual circumstances it can be advantageous to have a scheduled task within Nexus that can rebuild metadata for all or part of a repository's contents. This is typically useful either to mitigate the effects of some breaking change or to recover from some unexpected situation where the generated metadata is inaccurate or incomplete. A special case of the latter is the scenario you first encountered, where I'd made breaking changes to the metadata generation in that PR, but you didn't see the metadata regenerate when reuploading the same artifact.

If you have no objections (and are satisfied with the above explanation), I would like to consider this "closed" and implement the aforementioned scheduled task in #21 before we someday promote this to 1.0.0.

@TheBay0r
Contributor

Ah, wow! Thank you for the detailed explanation. My naive assumption was that this would just be hooked up to the POST request, so that whenever a POST request comes in an update is triggered no matter what 🤔
But this approach makes sense of course! 🙂

From my point of view this one can be closed, thanks.

@fjmilens3
Contributor Author

Closing based on conversation with @TheBay0r.
