
Generate static provider.json content using composer.json files #18

Merged
9 commits merged into master on May 25, 2018

Conversation

fjmilens3
Contributor

Fixes #17 by rebuilding the provider.json for a particular vendor and project whenever an asset associated with that vendor/project pair is created, updated, or deleted.

Additional fields for autoload, require, require-dev, and suggest are included in the generated provider.json in order to correctly support autoload and dependency resolution for hosted artifacts.

Much of this implementation is essentially a stripped-down form of what we do for our Yum implementation, so this design choice should hopefully make it easier to keep running and to apply the same shared lessons in the future. There are also opportunities for improvement here.

Note that if this PR goes in before the outstanding group PR, then we'll need to update group merge. If the group PR goes in before this, then we should also update this to correctly account for the new fields on group merge.
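For illustration, a rough sketch of how the extra composer.json fields might be copied into each package entry of the generated provider.json (the method and field names here are illustrative, not the actual implementation):

import static java.util.Arrays.asList;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: carry selected composer.json fields over into a provider.json package entry.
private static final List<String> EXTRA_FIELDS = asList("autoload", "require", "require-dev", "suggest");

private Map<String, Object> buildPackageEntry(final Map<String, Object> composerJson) {
  Map<String, Object> entry = new LinkedHashMap<>();
  entry.put("name", composerJson.get("name"));
  entry.put("version", composerJson.get("version"));
  for (String field : EXTRA_FIELDS) {
    if (composerJson.containsKey(field)) {
      entry.put(field, composerJson.get(field));
    }
  }
  return entry;
}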

@fjmilens3 fjmilens3 added the wip Work in progress label May 4, 2018
@fjmilens3 fjmilens3 self-assigned this May 4, 2018
@fjmilens3 fjmilens3 removed the wip Work in progress label May 17, 2018
Asset asset = getOrCreateAsset(path, assetKind);
Asset asset = getOrCreateAsset(path);

asset.formatAttributes().set(P_ASSET_KIND, assetKind.toString());

Question: is there a reason this has been moved out other than avoiding passing in the assetKind? I think (most?) formats pass the attribute map into getOrCreateAsset and iterate through it, adding to the formatAttributes.

Contributor

@fjmilens3 Does this imply that the asset kind can change on a redeploy?

Contributor Author

Question: is there a reason this has been moved out other than avoiding passing in the assetKind? I think (most?) formats pass the attribute map into getOrCreateAsset and iterate through it, adding to the formatAttributes.

You do have a point. It's been long enough since I wrote this that I'm not totally sure myself, but my suspicion was that I wanted to consolidate all the setting of format attributes in one place, and make that similar across both implementations in the content facet. I don't think there was a technical reason as such, other than that perhaps I wanted to clear out the existing format attributes on update, so we'd lose the asset kind set in the previous operation if it was a get rather than a create. From there I'm guessing I moved it in both places for the sake of symmetry.

Contributor Author

Does this imply that the asset kind can change on a redeploy?

No, that's just where we're setting it (see my other note above). Also note that since the caller determines the asset kind (which itself comes from attributes set via the route), we should always be passing in the same AssetKind for the same path anyway, even in a redeploy scenario.

@Subscribe
@Guarded(by = STARTED)
public void on(final ComposerHostedMetadataInvalidationEvent event) throws IOException {
if (getRepository().getName().equals(event.getRepositoryName())) {

Can you use matchesRepository(event) here instead?

Contributor Author

Can you use matchesRepository(event) here instead?

Nope, there's no shared superclass, and I doubt we want to introduce a new interface to handle this (nor could we without touching the main NXRM codebase). So in this case it's more of a typing issue than anything else (I'm not sure how we could imagine this being an AssetEvent, for example).

@AllowConcurrentEvents
public void on(final AssetDeletedEvent deleted) {
if (matchesRepository(deleted) && isEventRelevant(deleted)) {
invalidateMetadata(deleted);

Is there a concern of multiple deletes/adds/updates happening here? For Yum I think there is a configurable hold-off period to try to avoid invalidation happening constantly.

Contributor Author

Is there a concern of multiple deletes/adds/updates happening here? For Yum I think there is a configurable hold-off period to try to avoid invalidation happening constantly.

For the Yum implementation we do have a hold-off period that's implemented within the invalidateMetadata() method.

However, I'd like to avoid doing that here until we find out that we need to, in order to avoid the additional complexity. Unlike something that's being shipped as part of the main NXRM codebase, I think we can be more flexible in what we do until we find out someone is encountering a problem of that nature at scale, and then resolve it.

Reasons I'm less concerned include:

  • The rebuild metadata operation here should be a lot less onerous in that it's scoped to a particular vendor/project pair, so I would expect the number of assets to load and process to be on the order of tens rather than hundreds or thousands.
  • While I can see lots of new builds being pushed into a large Yum repo and triggering rebuilds, in this case (much as in my previous point) the impact is limited to a particular vendor/project. So I suspect we'll have fewer problems with that (and fewer back-to-back rebuilds) than we would globally for a Yum repository.

On the other hand, if we turn out to need that optimization, I would definitely do something similar to what was done for the Yum implementation as a place to start.
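If that optimization ever becomes necessary, a very rough sketch of a hold-off check is below (purely illustrative and not taken from the Yum code; the field and method names are made up):

import java.util.concurrent.ConcurrentHashMap;

// Illustrative hold-off: skip a rebuild if one already ran for this vendor/project within
// the last HOLD_OFF_MILLIS. A real implementation would also need to schedule a deferred
// rebuild so the last events in a burst are not lost; this only shows the rate-limit check.
private static final long HOLD_OFF_MILLIS = 60_000L;

private final ConcurrentHashMap<String, Long> lastRebuild = new ConcurrentHashMap<>();

private boolean shouldRebuildNow(final String vendor, final String project) {
  String key = vendor + "/" + project;
  long now = System.currentTimeMillis();
  Long previous = lastRebuild.get(key);
  if (previous != null && now - previous < HOLD_OFF_MILLIS) {
    return false;
  }
  lastRebuild.put(key, now);
  return true;
}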

Contributor

The time buffer in Yum reduces the number of metadata generations when multiple RPMs are being uploaded (i.e. if you upload 60 RPMs, one a second for a minute, you get one regeneration rather than 60). However, there is another mechanism buried in that logic that avoids firing any more events than we need to. There was a problem whereby, if the invalidation event was fired too many times, we depleted the thread pool in the event manager. That was likely more visible in Yum because we were waiting. So it's something to be aware of here, but not necessarily change.

Given that the metadata is vendor- and project-scoped and this will be quicker, I think it would take some serious effort to cause any issues (you'd likely need to upload thousands of packages a minute).

As a side note, we had a conversation recently about making it clearer what features are implemented/missing from this kind of work and also what kind of scale it has been tested up to.

@@ -37,14 +37,17 @@
@Nonnull
@Override
public Response handle(@Nonnull final Context context) throws Exception {
String path = ComposerPathUtils.buildZipballPath(context);
String vendor = ComposerPathUtils.getVendorToken(context);

You could statically import here. I don't think it loses context.

Contributor Author

You could statically import here. I don't think it loses context.

I just don't like static imports, but since we use them all over NXRM, I'm glad you pointed it out. :) There's something to be said for consistency within a codebase (or related codebases).

Static imports added in 66f75d4.
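For reference, the change amounts to something like the following (the package name for ComposerPathUtils is assumed here, not confirmed by this thread):

import static org.sonatype.nexus.repository.composer.internal.ComposerPathUtils.buildZipballPath;
import static org.sonatype.nexus.repository.composer.internal.ComposerPathUtils.getVendorToken;

// ...

String path = buildZipballPath(context);
String vendor = getVendorToken(context);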

String name = entry.getName();
int filenameIndex = name.indexOf("/composer.json");
int separatorIndex = name.indexOf("/");
if (filenameIndex >= 0 && filenameIndex == separatorIndex) {

It's a little difficult to follow what this is doing; maybe extract it to another, more readable method?

Contributor Author

It's a little difficult to follow what this is doing; maybe extract it to another, more readable method?

Extracted in fd17562.
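For readers following along, the extracted check presumably looks something like the sketch below (the method name is illustrative and may not match what fd17562 actually does):

// True when the archive entry is the composer.json directly under the archive's
// top-level directory, i.e. the entry name has the form "<topdir>/composer.json".
private boolean isRootComposerJsonEntry(final String name) {
  int filenameIndex = name.indexOf("/composer.json");
  int separatorIndex = name.indexOf("/");
  return filenameIndex >= 0 && filenameIndex == separatorIndex;
}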

Map<String, Map<String, Object>> packages = new LinkedHashMap<>();
for (Component component : components) {
Asset asset = storageTx.firstAsset(component);

Question: Is it possible that there are no assets associated with the component? I think the answer is that this can't happen because the component will be deleted when it has no assets, but I thought it was worth checking :-)

Contributor Author

Question: Is it possible that there are no assets associated with the component? I think the answer is that this can't happen because the component will be deleted when it has no assets, but I thought it was worth checking :-)

You're correct. That would be an error condition, as it shouldn't happen. There might be an argument for handling it more gracefully and logging the associated component that was missing an asset (unfortunately we rarely do this sort of thing in the formats we support, and I do wonder if a requireFirstAsset() in the codebase might be beneficial).

In a larger sense, since this evolved from a POC, I think it might be better to go through and add some debug-level logging throughout the code before we make an official release of it.
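A requireFirstAsset() along the lines mentioned above could be as simple as this sketch (hypothetical; Asset, Component and StorageTx are assumed to be the org.sonatype.nexus.repository.storage types):

// Fail loudly instead of returning null when a component unexpectedly has no assets.
private Asset requireFirstAsset(final StorageTx storageTx, final Component component) {
  Asset asset = storageTx.firstAsset(component);
  if (asset == null) {
    throw new IllegalStateException("Component " + component.name() + " has no assets");
  }
  return asset;
}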

Contributor

@j-s-3 j-s-3 left a comment

I think the approach here is great especially as it takes a lot from some of the hard lessons we've already learned around this kind of metadata regeneration. Great work 👍


return Collections.emptyMap();
}
catch (ArchiveException e) {
throw new IOException("Error reading from archive", e);
Contributor

Question: Should we fail all metadata generation if only a single package is corrupted?

Contributor Author

Question: Should we fail all metadata generation if only a single package is corrupted?

As a general rule, I would say no; you don't want to block regeneration of everything on the basis of one flawed entry in the group. In this particular case, however, this same code path is invoked on the initial upload of the content, so if the archive is bad, it shouldn't be in the system in the first place.

So unless something very bad has happened within the repo we shouldn't be seeing this under normal circumstances when rebuilding metadata. If we wanted to deal with that we'd want to add a try-catch within the code iterating over the components and log it there before proceeding.

There's also the more selfish reason that right now I'm more writing things to fail fast so that we can get better feedback from our early adopters. I'd rather something blow up with a lot of smoke and fire so we can surface any underlying problems accordingly.

(But I do agree that by the time this ever reached something along the lines of a beta quality, even, it would be nice to add the try/catch further up the call chain to deal with the concern you mentioned.)
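For illustration, the per-component try/catch mentioned above might look roughly like this (a sketch only; it assumes an SLF4J-style log field is available and elides the actual metadata-building body):

for (Component component : components) {
  try {
    Asset asset = storageTx.firstAsset(component);
    // ... read composer.json from the zipball and add its entry to the packages map ...
  }
  catch (IOException e) {
    // Skip this package rather than failing the whole rebuild, but make the corruption visible.
    log.warn("Skipping component {} while rebuilding metadata", component.name(), e);
  }
}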

@fjmilens3
Contributor Author

@TheBay0r, I think we're in a good place on our end to merge this down, but I'd be curious if this solves your autoload problem before I do. Most of this work is something I wanted to do later on as a technical improvement, but the main reason I did this now was to see if including this additional information solves the autoload problem.

@TheBay0r
Contributor

@fjmilens3 I'll have a look at it. It's been some busy weeks on my end. I'll let you know about my findings! 👍

@TheBay0r
Contributor

Hey @fjmilens3. I just saw this duplicated structure in the repository viewer, which looks to me like all packages are now /vendor/package and all the JSONs are /p/vendor/package. Is this intentional?

[screenshot: repository browser showing the duplicated structure, 2018-05-22 15:54]

@fjmilens3
Contributor Author

I just saw this duplicated structure in the repository viewer, which looks to me like all packages are now /vendor/package and all the JSONs are /p/vendor/package. Is this intentional?

@TheBay0r, these changes shouldn't have changed the path in any way. These changes are focused on building the JSON content statically rather than dynamically, and including the additional fields for things like autoload as part of that process.

For what it's worth, I think they've always been that way, or at least have been for some time; I checked out commit d31c2fd and downloaded a package via proxy and had the following setup:

[screenshot: repository browser after checking out d31c2fd, 2018-05-22 10:22]

(That said, there's still some room for improvement on my end here; before we go "live" with this it'd be nice to move all the downloads under their own subpath to avoid even the chance of a collision. On the other hand, even something that simple constitutes a "breaking change" because we conflate asset paths and asset names in our data model. So I've been holding off on those kinds of changes until we're feature-complete enough to at least declare this as alpha quality.)

@TheBay0r
Contributor

Ah yea, I saw that in the proxy before and didn't question it. Thanks.

And sorry that it took me this long. I built a couple of services using Nexus as a Composer repository with this branch and deployed those builds to our test environments. It looks very good there. I also checked the composer.lock, and it now contains all the data.

From my point of view, your branch resolves #17

@fjmilens3 fjmilens3 merged commit ef0079f into master May 25, 2018
@fjmilens3
Contributor Author

Thanks, @TheBay0r! No worries about the delay.

As I said, you should at least consider opening up a PR to add your name to the contributors list (or let me know so that I can do so for you). Good QA counts for a lot in getting this to a releasable state.

@sonatypecla

sonatypecla bot commented Jul 28, 2020

Thanks for the contribution! Unfortunately we can't verify if the committer(s), Frederick John Milens III fmilens@sonatype.com, signed the CLA because they have not associated their commits with their GitHub user. Please follow these instructions to associate your commits with your GitHub user. Then sign the Sonatype Contributor License Agreement and this Pull Request will be revalidated.

Successfully merging this pull request may close issue #17: autoload.php isn't created correctly.