
Add an option to disable asset git/GitHub API requests #251

Closed
schmunk42 opened this issue Aug 25, 2016 · 23 comments
@schmunk42
Contributor

Continued from yiisoft/yii2#8688 (comment)

Proposal: Add an option, like --skip-asset-info-update or ENV variable FXP_SKIP_ASSET_INFO_UPDATE to disable git and/or GitHub API requests, which query for new versions/tags, since this might take a while.

The option should only have an effect, when local information about the repository already exists. If not the plugin should download the information.

@francoispluchino
Member

Adding a command option is not possible, so only an environment variable is possible.

@schmunk42
Contributor Author

Ah sorry, I mixed that up with new commands, but I think it does not make sense (or maybe it's not even possible) to have something like composer update-skip-asset-info.

@francoispluchino
Member

That, however, would be possible.

@schmunk42
Contributor Author

So I looked into this a bit: if you skip updating the remote git server, the performance gain is about 30-50% (!), i.e. by commenting out this line.

Updates still worked as long as the cache was filled.

Another huge improvement would be to load bower.json files from the cache as well; this would shave another 0.5-1 seconds per bower package off the total time taken for an update:

[89.2MB/9.38s] Downloading https://bower.herokuapp.com/packages/select2
[89.2MB/10.05s] Writing /Users/tobias/.composer/cache/repo/https---bower.herokuapp.com-packages/select2-ca22abe1ad1684dd62ec875a0b5c25a533cf287a-package.json into cache

Looks to me like the file is always downloaded and then written to the cache.

Maybe this would also require some changes in composer itself. Do you think it's worth creating an issue like "Feature: Allow updates from cache only"?

Last but not least, it would be awesome if Git updates could run in parallel, like downloads with prestissimo. I think this would make the above changes obsolete anyway.

@schmunk42
Contributor Author

This could be combined with functionality that checks an expiry setting, like: "Warning: asset package definitions have not been updated for 7 days".

FXP_SKIP_ASSET_INFO_UPDATE_DAYS=7
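A staleness check like that could be sketched in shell against Composer's repo cache (the variable name is taken from the proposal above; the cache path is Composer's default and may differ on your machine):

```shell
#!/bin/sh
# Warn when no file in the asset repo cache has been refreshed within
# $FXP_SKIP_ASSET_INFO_UPDATE_DAYS days (proposed variable, 7 by default).
DAYS="${FXP_SKIP_ASSET_INFO_UPDATE_DAYS:-7}"
CACHE="${COMPOSER_HOME:-$HOME/.composer}/cache/repo"
if [ -d "$CACHE" ] && [ -z "$(find "$CACHE" -type f -mtime -"$DAYS" -print -quit)" ]; then
    echo "Warning: asset package definitions have not been updated for $DAYS days"
fi
```

This is only a sketch of the expiry idea; the real check would live inside the plugin, not in a wrapper script.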

While I'd also like a separate command, the ENV variable is the way to set a default.

@francoispluchino Any advice on where to start, or do you have plans for this feature? It would be a great improvement.


related: yiisoft/yii2#13064

@francoispluchino
Member

The bower.json and package.json files are cached. Only the files in the main branch are not cached.

Regarding the update-skip-asset-info command: maybe prefer a namespaced fxp:update (shorter), which would only configure the FXP_SKIP_ASSET_INFO_UPDATE_DAYS variable (via an option, 7 by default) and otherwise use the same code and options as the update command.

You can make a pull request for this feature!

@schmunk42
Contributor Author

I like the syntax fxp:update - I've been using it in phd for a while :) The only thing is that you don't actually do an fxp update, rather the opposite, so maybe something like update-fast or fxp:update-quick.

@schmunk42
Contributor Author

schmunk42 commented Feb 17, 2017

So, @handcode and I dug a bit on this ...

To really get a huge performance gain, we'd need to address two things.

One is this https://github.com/fxpio/composer-asset-plugin/blob/master/Repository/BowerRepository.php#L58
Here the asset plugin tells composer where it can find the git repo for an extension.

We hacked it like so:

    // Fake a local URL if a cached clone of the repository already exists:
    // slashes in the URL become dashes, and the "https:" prefix is mapped
    // into Composer's cache/vcs directory, e.g.
    // "https://github.com/twbs/bootstrap.git"
    //   -> "/path/to/user/.composer/cache/vcs/https---github.com-twbs-bootstrap.git"
    $local = str_replace('/', '-', $data['url']);
    $local = str_replace('https:', '/path/to/user/.composer/cache/vcs/https-', $local);
    if (is_dir($local)) {
        $data['url'] = $local;
    }

This effectively skips the update of the git repository, if a cached clone exists, by pointing the URL at a local path.
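The URL-to-cache-path mapping the hack performs can be reproduced from the shell; the resulting directory name matches the cache/vcs entries visible in the log output elsewhere in this thread (sketch, assuming that observed naming scheme):

```shell
# Derive the cache directory name the hack above constructs from a VCS URL:
# slashes become dashes, then the "https:" prefix is rewritten.
url='https://github.com/twbs/bootstrap.git'
dir=$(printf '%s' "$url" | sed -e 's|/|-|g' -e 's|^https:|https-|')
echo "$dir"   # https---github.com-twbs-bootstrap.git
```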


The other one is the URL of the repo itself, see https://github.com/fxpio/composer-asset-plugin/blob/master/Repository/BowerRepository.php#L34

return 'file:///path/to/bower-definition.json';

This one is harder to fake, because we don't know the hash of the repo. The files we need would be in ~/.composer/cache/repo/https---bower.herokuapp.com-packages.

[update] It would also be great if we could just delegate this to prestissimo; that might also be sufficient.


So, both points mentioned above would essentially skip the update of asset repositories.
Some benchmarks with a really huge project: 336 packages (55 of them are from bower):

  • current default behavior, including GitHub API requests: ~300-500 secs (honestly, I didn't even let it finish)
  • without API requests: ~80 secs
  • with both hotfixes from above: <20 secs (!!!)

Would love to hear some feedback about this.

@francoispluchino
Member

@schmunk42 Do not forget NPM Repository. Otherwise, your idea is interesting.

Regarding the manipulation of the plugin options: I have already moved the plugin configuration (formerly extra.asset-*) into the config.fxp-asset.* section, because the config section doesn't change the hash of the composer.lock file, which is not the case for the extra section.

Furthermore, I will add support for global configuration (#273), as well as support for environment variables to override the configuration, with the format FXP_ASSET__<VARIABLE_NAME_WITH_UPPERCASES_AND_UNDERSCORES> (#274).

In this way you will be able to propose your solution to disable the analysis of the asset dependencies.

@francoispluchino francoispluchino added this to the 1.3.0 milestone Feb 18, 2017
@schmunk42
Contributor Author

Do not forget NPM Repository.

I haven't tested this, but it looks to me like this would also be the way npm works with fxp, simply because composer works that way.

@francoispluchino
Member

Your last comment changes the behavior directly in the BowerRepository class, so it's independent of Composer, and the change only applies to Bower in this plugin.

@schmunk42
Contributor Author

So, I played around a bit with npm-assets ...

One thing I noticed is that although npm hands out a lot more information than bower, it takes only about 0.2 secs to get that information, compared to 0.6 secs for bower (which is only the repo info) - I am speaking of Downloading https://bower.herokuapp.com/packages/bootstrap vs. Downloading https://registry.npmjs.org/bootstrap - these are pretty rough numbers, but they add up if you have a lot of asset packages.

The rest of the process is the same; pruning and updating the repo takes the most time, usually ~1-3 secs per asset repository (depending on its size):

Executing command (/Users/tobias/.composer/cache/vcs/https---github.com-twbs-bootstrap.git/): git remote set-url origin 'https://github.com/twbs/bootstrap.git' && git remote update --prune origin

This could also be almost completely eliminated by skipping that check and using a local repo. Another option would be to optimize that step, but that would also involve composer itself.

CC: @tonydspaniard @cebe @samdark @mikehaertl

@schmunk42
Contributor Author

Some more findings ...

The line in composer which runs the update is here.
If that line were missing origin (or if it were configurable), we could do something like this:

[screenshot: excerpt from the git-remote documentation]

Taken from https://git-scm.com/docs/git-remote - just configure origin so that it is not updated, or do that manually.

I know that this somewhat defeats the purpose of "update", but on the other hand you don't have to update & prune your local mirror every time - other package managers warn you if this hasn't been done for some time (e.g. two weeks on MacPorts).
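For reference, git-config does provide a per-remote switch, remote.<name>.skipDefaultUpdate, which makes git remote update skip a remote by default. A .git/config sketch (whether this still applies when the remote is named explicitly, as in Composer's git remote update --prune origin, would need to be verified):

```ini
[remote "origin"]
	url = https://github.com/twbs/bootstrap.git
	skipDefaultUpdate = true
```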

I hardly dare to CC @Seldaek - but I'd really like to hear his opinion about it.

@francoispluchino
Member

@schmunk42 We can avoid making the 5-6 queries by using the cache and checking the update date on the first query (only for the GitHub driver).

Currently, each repository addition systematically performs at least 5 requests (more if there are several pages of tags and/or branches). Example for jQuery with Bower:

Downloading https://api.github.com/repos/jquery/jquery-dist
Downloading https://api.github.com/repos/jquery/jquery-dist/contents/bower.json?ref=master
Downloading https://api.github.com/repos/jquery/jquery-dist/commits/master
Downloading https://api.github.com/repos/jquery/jquery-dist/tags?per_page=100
Downloading https://api.github.com/repositories/28825109/tags?per_page=100&page=2
Downloading https://api.github.com/repos/jquery/jquery-dist/git/refs/heads?per_page=100

The result of:

Downloading https://api.github.com/repos/jquery/jquery-dist

contains the created_at, updated_at, and pushed_at fields. It's therefore possible to keep the update date and check whether or not it has changed. Everything happens in the GitHub driver, so it's compatible with both Bower and NPM.

With this cache, each repository addition performs a minimum of 1 request instead of 5.

Of course, this optimization can come in addition to your proposal.
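The check described above could look roughly like this (a sketch only: the JSON fragment is a simulated response, and a real implementation would live in the plugin's GitHub driver rather than in shell):

```shell
#!/bin/sh
# Simulated fragment of the GET /repos/jquery/jquery-dist response:
response='{"full_name":"jquery/jquery-dist","pushed_at":"2017-02-18T21:04:00Z"}'

# pushed_at value stored from a previous run:
cached='2017-02-18T21:04:00Z'

# Extract pushed_at from the fresh response and compare with the cache;
# if unchanged, the tags/branches/contents requests can be skipped.
fresh=$(printf '%s' "$response" | sed -n 's/.*"pushed_at":"\([^"]*\)".*/\1/p')
if [ "$fresh" = "$cached" ]; then
    echo "repository unchanged: skip tags/branches/contents requests"
fi
```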

@schmunk42
Contributor Author

All my tests already have the github-no-api option enabled; that was actually the first performance improvement, because updating takes so much longer when using the API, especially for repos with a huge number of tags. (E.g. Angular has thousands of tags, resulting in dozens of requests.)

IMHO decent performance can only be achieved by disabling the API requests entirely and reading the tags from local git clones. I think bower, npm, yarn, ... do the same, right? (not 100% sure)

@dxops

dxops commented Feb 20, 2017

Does it impact composer outdated? Right now that command downloads every tag for every bower dependency, which makes it unusable.

@francoispluchino
Member

francoispluchino commented Feb 20, 2017

As already mentioned in issue #226, Bower uses a shallow clone to download the history truncated to the first level (--depth=1) for all branches (e.g. for Angular 1.x the clone size is 8 MiB versus 575 MiB, and the resolution of deltas is much faster), or uses git ls-remote to retrieve the list of tags and branches if the shallow clone is not supported by the server or the protocol (http).

Bower can use a 'slow clone' (full clone) or a 'fast clone' (shallow clone); see the code here. The 'slow clone' is used if the resolution uses a commit id (sha).

Bower verifies whether the server supports shallow cloning. This is done according to the rules found in the following links:

Summary of the rules: (extracted from the source)

  • Protocols like ssh or git always support shallow cloning
  • HTTP-based protocols can be verified by sending a HEAD or GET request to the URI (appended to the URL of the Git repo):
    /info/refs?service=git-upload-pack
  • If the server responds with a Content-Type header of application/x-git-upload-pack-advertisement, the server supports shallow cloning ("smart server")
  • If the server responds with a different content type, the server does not support shallow cloning ("dumb server")
  • Instead of doing the HEAD or GET request using an HTTP client, we're letting Git and Curl do the heavy lifting.
    Calling Git with the GIT_CURL_VERBOSE=2 env variable will provide the Git and Curl output, which includes the content type. This has the advantage that Git will take care of using stored credentials and any additional negotiation that needs to take place.

The above should cover most cases, including BitBucket.
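The content-type rule above can be sketched against a captured response header instead of a live request (the sample header is an assumption of what a smart server returns; capture your own with curl -sI <repo>/info/refs?service=git-upload-pack or GIT_CURL_VERBOSE=2):

```shell
#!/bin/sh
# Sample response headers from a smart Git HTTP server (assumed capture):
headers='HTTP/1.1 200 OK
Content-Type: application/x-git-upload-pack-advertisement'

# Classify the server by the Content-Type of /info/refs?service=git-upload-pack.
ctype=$(printf '%s\n' "$headers" | tr -d '\r' | sed -n 's/^[Cc]ontent-[Tt]ype: *//p')
if [ "$ctype" = "application/x-git-upload-pack-advertisement" ]; then
    echo "smart server: shallow clone supported"
else
    echo "dumb server: fall back to git ls-remote"
fi
```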

IMHO, this optimization should be integrated directly into Composer, because it would benefit all VCS repositories.

@samdark

samdark commented Feb 20, 2017

That could be a huge improvement in install times for PHP packages as well.

@francoispluchino
Member

@schmunk42 Could your proposal be ready for the 1.3.0 release?

@schmunk42
Contributor Author

I played around a bit with shallow clones, but the operation mentioned in #251 (comment) doesn't seem faster on a shallow clone. Sure, the initial download is faster, but you usually pay that penalty only once.

I can't really dig it up right now, but I remember a discussion on CocoaPods that shallow clones might even be slower under some circumstances. Anyhow ... I don't think they will bring much improvement overall.

What costs so much time is that fxp creates vcs repositories, which are fully scanned by composer on every update. If you have 20 asset repos, you need about 10 seconds for downloading the repo info (bower) and ~30 secs for updating the repos (syncMirror) - the rest of the update, like downloading info from packagist and running the SAT solver, usually takes less than 10 seconds.

The former two tasks are usually not needed on every update; maybe once a day would be totally fine (or on demand). The (cache) info would be shared across projects on your host.

But when you need to run update, let's say, 5 times because you need to find a matching set of extensions, this becomes really annoying - even more so with API requests.

@francoispluchino
Member

I was re-examining this problem and trying to remember why the discussion on this subject had stopped. The shallow clone may become slower if, for example, you have to download each branch and tag to retrieve the .json file. The analysis and downloading could then take longer than a full clone (there may be other cases, but I have never encountered them).

We cannot do this with Composer because the SAT solver retrieves the full list of dependencies with all versions, even if the version of the package doesn't match, so lazy loading is not used. And if we implemented the shallow clone, to retrieve the .json file from each version we would have to clone each branch and each tag analyzed. In that case it is better to download the archive directly.

So this option is not possible without a modification of the SAT solver (and they do not want to touch it).

To return to your proposal to "disable" the update check of assets, which in the case you described is actually interesting: do you have a PR?

@schmunk42
Contributor Author

To return to your proposal to "disable" the update check of assets, which in the case you described is actually interesting: do you have a PR?

Not yet :) One thing would be to skip syncMirror in composer - could this be solved in a plugin?

@francoispluchino francoispluchino modified the milestones: 1.4.0, 1.3.0 Mar 17, 2017
@francoispluchino
Member

Added by #289
