
Generate static provider.json content using composer.json files #18

Merged
9 commits merged into master on May 25, 2018

Conversation

fjmilens3
Contributor

Fixes #17 by rebuilding the provider.json for a particular vendor and project whenever an asset associated with that vendor/project pair is created, updated, or deleted.

Additional fields for autoload, require, require-dev, and suggest are included in the generated provider.json in order to correctly support autoload and dependency resolution for hosted artifacts.

Much of this implementation is essentially a stripped-down form of what we do for our Yum implementation, so this design choice should hopefully make it easier to keep running and to apply the same shared lessons in the future. There are also opportunities for improvement here.

Note that if this PR goes in before the outstanding group PR, then we'll need to update group merge. If the group PR goes in before this, then we should also update this to correctly account for the new fields on group merge.
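For illustration, a rough sketch of how the extra composer.json fields might be copied into each package entry of the generated provider.json (the method and field names here are illustrative, not the actual implementation):

import static java.util.Arrays.asList;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: carry selected composer.json fields over into a provider.json package entry.
private static final List<String> EXTRA_FIELDS = asList("autoload", "require", "require-dev", "suggest");

private Map<String, Object> buildPackageEntry(final Map<String, Object> composerJson) {
  Map<String, Object> entry = new LinkedHashMap<>();
  entry.put("name", composerJson.get("name"));
  entry.put("version", composerJson.get("version"));
  for (String field : EXTRA_FIELDS) {
    if (composerJson.containsKey(field)) {
      entry.put(field, composerJson.get(field));
    }
  }
  return entry;
}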

@fjmilens3 fjmilens3 added the wip Work in progress label May 4, 2018
@fjmilens3 fjmilens3 self-assigned this May 4, 2018
@fjmilens3 fjmilens3 removed the wip Work in progress label May 17, 2018
Asset asset = getOrCreateAsset(path, assetKind);
Asset asset = getOrCreateAsset(path);

asset.formatAttributes().set(P_ASSET_KIND, assetKind.toString());

Question: is there a reason this has been moved out other than avoiding passing in the assetKind? I think (most?) formats pass the attribute map into getOrCreateAsset and iterate through it, adding to the formatAttributes.

Contributor

@fjmilens3 Does this imply that the asset kind can change on a redeploy?

Contributor Author

Question: is there a reason this has been moved out other than avoiding passing in the assetKind? I think (most?) formats pass the attribute map into getOrCreateAsset and iterate through it, adding to the formatAttributes.

You do have a point. It's been long enough since I wrote this that I'm not totally sure myself, but my suspicion was that I wanted to consolidate all the setting of format attributes in one place, and make that similar across both implementations in the content facet. I don't think there was a technical reason as such, other than that perhaps I wanted to clear out the existing format attributes on update, so we'd lose the asset kind set in the previous operation if it was a get rather than a create. From there I'm guessing I moved it in both places for the sake of symmetry.

Contributor Author

Does this imply that the asset kind can change on a redeploy?

No, that's just where we're setting it (see my other note above). Also note that since the caller determines the asset kind (which itself comes from attributes set via the route), we should always be passing in the same AssetKind for the same path anyway, even in a redeploy scenario.

@Subscribe
@Guarded(by = STARTED)
public void on(final ComposerHostedMetadataInvalidationEvent event) throws IOException {
if (getRepository().getName().equals(event.getRepositoryName())) {

Can you use matchesRepository(event) here instead?

Contributor Author

Can you use matchesRepository(event) here instead?

Nope, there's no shared superclass, and I doubt we want to introduce a new interface to handle this (nor could we without touching the main NXRM codebase). So in this case it's more of a typing issue than anything else (I'm not sure how we could imagine this being an AssetEvent, for example).

@AllowConcurrentEvents
public void on(final AssetDeletedEvent deleted) {
if (matchesRepository(deleted) && isEventRelevant(deleted)) {
invalidateMetadata(deleted);

Is there a concern of multiple deletes/adds/updates happening here? For Yum I think there is a configurable hold-off period to try to avoid invalidation happening constantly.

Contributor Author

Is there a concern of multiple deletes/adds/updates happening here? For Yum I think there is a configurable hold-off period to try to avoid invalidation happening constantly.

For the Yum implementation we do have a hold-off period that's implemented within the invalidateMetadata() method.

However, I'd like to avoid doing that here until we find out that we need to, in order to avoid the additional complexity. Unlike something that's being shipped as part of the main NXRM codebase, I think we can be more flexible in what we do until we find out someone is encountering a problem of that nature at scale, and then resolve it.

Reasons I'm less concerned include:

  • The rebuild metadata operation here should be a lot less onerous in that it's scoped to a particular vendor/project pair, so I would expect the number of assets to load and process to be on the order of tens rather than hundreds or thousands.
  • While I can see lots of new builds being pushed into a large Yum repo and triggering rebuilds, in this case (much as in my previous point) the impact is limited to a particular vendor/project. So I suspect we'll have fewer problems with that (and fewer back-to-back rebuilds) than we would globally for a Yum repository.

On the other hand, if we turn out to need that optimization, I would definitely do something similar to what was done for the Yum implementation as a place to start.
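If that optimization ever becomes necessary, a very rough sketch of a hold-off check is below (purely illustrative and not taken from the Yum code; the field and method names are made up):

import java.util.concurrent.ConcurrentHashMap;

// Illustrative hold-off: skip a rebuild if one already ran for this vendor/project within
// the last HOLD_OFF_MILLIS. A real implementation would also need to schedule a deferred
// rebuild so the last events in a burst are not lost; this only shows the rate-limit check.
private static final long HOLD_OFF_MILLIS = 60_000L;

private final ConcurrentHashMap<String, Long> lastRebuild = new ConcurrentHashMap<>();

private boolean shouldRebuildNow(final String vendor, final String project) {
  String key = vendor + "/" + project;
  long now = System.currentTimeMillis();
  Long previous = lastRebuild.get(key);
  if (previous != null && now - previous < HOLD_OFF_MILLIS) {
    return false;
  }
  lastRebuild.put(key, now);
  return true;
}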

Contributor

The time buffer in Yum reduces the number of metadata generations when multiple RPMs are being uploaded (i.e. if you upload 60 RPMs, one a second for a minute, you get one regeneration rather than 60). However, there is another mechanism buried in that logic that avoids firing any more events than we need to. There was a problem whereby, if the invalidation event was fired too many times, we depleted the thread pool in the event manager. That was likely more visible in Yum because we were waiting. So it's something to be aware of here, but not necessarily change.

Given that the metadata is vendor- and project-scoped and this will be quicker, I think it would take some serious effort to cause any issues (you'd likely need to upload thousands of packages a minute).

As a side note, we had a conversation recently about making it clearer what features are implemented/missing from this kind of work and also what kind of scale it has been tested up to.

@@ -37,14 +37,17 @@
@Nonnull
@Override
public Response handle(@Nonnull final Context context) throws Exception {
String path = ComposerPathUtils.buildZipballPath(context);
String vendor = ComposerPathUtils.getVendorToken(context);

You could statically import here. I don't think it loses context.

Contributor Author

You could statically import here. I don't think it loses context.

I just don't like static imports, but since we use them all over NXRM, I'm glad you pointed it out. :) There's something to be said for consistency within a codebase (or related codebases).

Static imports added in 66f75d4.
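For reference, the change amounts to something like the following (the package name for ComposerPathUtils is assumed here, not confirmed by this thread):

import static org.sonatype.nexus.repository.composer.internal.ComposerPathUtils.buildZipballPath;
import static org.sonatype.nexus.repository.composer.internal.ComposerPathUtils.getVendorToken;

// ...

String path = buildZipballPath(context);
String vendor = getVendorToken(context);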

String name = entry.getName();
int filenameIndex = name.indexOf("/composer.json");
int separatorIndex = name.indexOf("/");
if (filenameIndex >= 0 && filenameIndex == separatorIndex) {

It's a little difficult to follow what this is doing; maybe extract it to another, more readable method?

Contributor Author

It's a little difficult to follow what this is doing; maybe extract it to another, more readable method?

Extracted in fd17562.
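For readers following along, the extracted check presumably looks something like the sketch below (the method name is illustrative and may not match what fd17562 actually does):

// True when the archive entry is the composer.json directly under the archive's
// top-level directory, i.e. the entry name has the form "<topdir>/composer.json".
private boolean isRootComposerJsonEntry(final String name) {
  int filenameIndex = name.indexOf("/composer.json");
  int separatorIndex = name.indexOf("/");
  return filenameIndex >= 0 && filenameIndex == separatorIndex;
}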

Map<String, Map<String, Object>> packages = new LinkedHashMap<>();
for (Component component : components) {
Asset asset = storageTx.firstAsset(component);

Question: Is it possible that there are no assets associated with the component? I think the answer is that this can't happen because the component will be deleted when it has no assets, but I thought it was worth checking :-)

Contributor Author

Question: Is it possible that there are no assets associated with the component? I think the answer is that this can't happen because the component will be deleted when it has no assets, but I thought it was worth checking :-)

You're correct. That would be an error condition, as it shouldn't happen. There might be an argument for handling it more gracefully and logging the associated component that was missing an asset (unfortunately we rarely do this sort of thing in the formats we support, and I do wonder if a requireFirstAsset() in the codebase might be beneficial).

In a larger sense, since this evolved from a POC, I think it might be better to go through and add some debug-level logging throughout the code before we make an official release of it.
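A requireFirstAsset() along the lines mentioned above could be as simple as this sketch (hypothetical; Asset, Component and StorageTx are assumed to be the org.sonatype.nexus.repository.storage types):

// Fail loudly instead of returning null when a component unexpectedly has no assets.
private Asset requireFirstAsset(final StorageTx storageTx, final Component component) {
  Asset asset = storageTx.firstAsset(component);
  if (asset == null) {
    throw new IllegalStateException("Component " + component.name() + " has no assets");
  }
  return asset;
}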

Contributor

@j-s-3 j-s-3 left a comment

I think the approach here is great especially as it takes a lot from some of the hard lessons we've already learned around this kind of metadata regeneration. Great work 👍


return Collections.emptyMap();
}
catch (ArchiveException e) {
throw new IOException("Error reading from archive", e);
Contributor

Question: Should we fail all metadata generation if only a single package is corrupted?

Contributor Author

Question: Should we fail all metadata generation if only a single package is corrupted?

As a general rule, I would say no; you don't want to block regeneration of everything on the basis of one flawed entry in the group. In this particular case, however, this same code path is invoked on the initial upload of the content, so if the archive is bad, it shouldn't be in the system in the first place.

So unless something very bad has happened within the repo we shouldn't be seeing this under normal circumstances when rebuilding metadata. If we wanted to deal with that we'd want to add a try-catch within the code iterating over the components and log it there before proceeding.

There's also the more selfish reason that right now I'm more writing things to fail fast so that we can get better feedback from our early adopters. I'd rather something blow up with a lot of smoke and fire so we can surface any underlying problems accordingly.

(But I do agree that by the time this ever reached something along the lines of a beta quality, even, it would be nice to add the try/catch further up the call chain to deal with the concern you mentioned.)
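For illustration, the per-component try/catch mentioned above might look roughly like this (a sketch only; it assumes an SLF4J-style log field is available and elides the actual metadata-building body):

for (Component component : components) {
  try {
    Asset asset = storageTx.firstAsset(component);
    // ... read composer.json from the zipball and add its entry to the packages map ...
  }
  catch (IOException e) {
    // Skip this package rather than failing the whole rebuild, but make the corruption visible.
    log.warn("Skipping component {} while rebuilding metadata", component.name(), e);
  }
}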

@fjmilens3
Contributor Author

@TheBay0r, I think we're in a good place on our end to merge this down, but I'd be curious if this solves your autoload problem before I do. Most of this work is something I wanted to do later on as a technical improvement, but the main reason I did this now was to see if including this additional information solves the autoload problem.

@TheBay0r
Contributor

@fjmilens3 I'll have a look at it. It's been some busy weeks on my end. I'll let you know about my findings! 👍

@TheBay0r
Contributor

Hey @fjmilens3. I just saw this duplicated structure in the repository viewer, which looks to me like all packages are now /vendor/package and all the JSONs are /p/vendor/package. Is this intentional?

[screenshot: repository browser showing the duplicated structure, 2018-05-22 15:54]

@fjmilens3
Contributor Author

I just saw this duplicated structure in the repository viewer, which looks to me like all packages are now /vendor/package and all the JSONs are /p/vendor/package. Is this intentional?

@TheBay0r, these changes shouldn't have changed the path in any way. These changes are focused on building the JSON content statically rather than dynamically, and including the additional fields for things like autoload as part of that process.

For what it's worth, I think they've always been that way, or at least have been for some time; I checked out commit d31c2fd and downloaded a package via proxy and had the following setup:

[screenshot: repository browser after checking out d31c2fd, 2018-05-22 10:22]

(That said, there's still some room for improvement on my end here; before we go "live" with this it'd be nice to move all the downloads under their own subpath to avoid even the chance of a collision. On the other hand, even something that simple constitutes a "breaking change" because we conflate asset paths and asset names in our data model. So I've been holding off on those kinds of changes until we're feature-complete enough to at least declare this as alpha quality.)

@TheBay0r
Contributor

Ah yea, I saw that in the proxy before and didn't question it. Thanks.

And sorry that it took me this long. I built a couple of services using Nexus as a Composer repository with this branch and deployed those builds to our test environments. It looks very good there. I also checked the composer.lock, and it now contains all the data.

From my point of view, your branch resolves #17

@fjmilens3 fjmilens3 merged commit ef0079f into master May 25, 2018
@fjmilens3
Contributor Author

Thanks, @TheBay0r! No worries about the delay.

As I said, you should at least consider opening up a PR to add your name to the contributors list (or let me know so that I can do so for you). Good QA counts for a lot in getting this to a releasable state.

@sonatypecla

sonatypecla bot commented Jul 28, 2020

Thanks for the contribution! Unfortunately we can't verify if the committer(s), Frederick John Milens III fmilens@sonatype.com, signed the CLA because they have not associated their commits with their GitHub user. Please follow these instructions to associate your commits with your GitHub user. Then sign the Sonatype Contributor License Agreement and this Pull Request will be revalidated.

Successfully merging this pull request may close issue #17: autoload.php isn't created correctly.