Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI: Further improve startup performance #1476

Merged
merged 2 commits into from
Aug 23, 2020
Merged

CLI: Further improve startup performance #1476

merged 2 commits into from
Aug 23, 2020

Conversation

Krinkle
Copy link
Member

@Krinkle Krinkle commented Aug 21, 2020

When using a glob or directory argument, we currently recursively traverse the entire working directory (except for .git and node_modules) and then filter out results later with Minimatch. This is inefficient to say the least, and is especially noticable on large repositories.

I looked into a bunch of glob libraries but couldn't find any that uphold our security and simplicity principles, in so far that most of them bring in far too many dependencies that I can't feasibly audit or keep up with and (unfortunately) most likely the library author hasn't audited or trusts deeply either to e.g. not get compromised or taken over in subtle ways.

Except for one, tiny-glob, this seems to fit the bill nicely and would be a great library to boost through the QUnit community. I instrumented it with some checks and confirmed that it isn't just very fast (which some globbers achieve by spawning processes rather than being efficient), but indeed it is also efficient in not traversing directories that it knows cannot match any part of the pattern. e.g. src/ will be entered for **/*.js but will not be entered for test/**/*.js.

@SparshithNR @rwjblue I tried this out on a few small code basis and an artificially bloated directory tree and it seems to work fine. I'd be great if you could give this a try and let me know if you find any issues and/or could measure improvements. Thanks!

@rwjblue
Copy link
Contributor

rwjblue commented Aug 22, 2020

I'll try to test this branch out this weekend or early in the week, but JIC I don't get to it broccoli-typescript-compiler was a good OSS repo to test the speed out in. I think I made a comment over in @SparshithNR's PR with instructions for reproducing a pathological case...

4 of 7 dependencies for the QUnit CLI are 'minimatch' and its
three dependencies. A fairly popular alternative is micromatch's
picomatch library, which is a drop-in replacement for what we
need.

Ref <http://npm.broofa.com/?q=qunit@2.11.0>.
This follows-up 837af39, which introduced the ignore list to
the `--watch` mode when `run.restart()` calls `findFiles()`.

This was a big improvement for watch mode, but it still left two
major issues behind:

1. It did not apply to when running qunit normally (without --watch).
   This could be easily fixed by passing IGNORED_GLOBS in run()
   down to getFilesFromArgs/findFiles the same way.

2. We are still scanning all other top-level directories that are
   not ignored, even if they do not match any part of the glob pattern.

I investigated numerous matching libraries (picomatch, minimatch, etc.)
but it seems none of of them offer a way to determine whether
a given directory could contain a matching file. E.g. something
that returns false for `src/` when given `test/*.js`, but returns
true for `src/` if given `**/*.js`.

So rather than approaching it from the angle of a matching library,
I went looking for various "proper" glob libraries instead which
handle both the pattern matching and the directory scanning
responsibilities together.

I considered:

* isaacs/node-glob <http://npm.broofa.com/?q=glob@7.1.6>.
  10 dependencies, including 3 for minimatch. Not an option.

* mrmlnc/fast-glob <http://npm.broofa.com/?q=fast-glob@3.2.4>
  This uses picomatch, which is promising as being a popular
  dependency-free alternative to minimatch.
  But, it unfortunately does add up to 16 dependencies in total.

* Crafity/node-glob <http://npm.broofa.com/?q=node-glob@1.2.0>
  A mostly self-contained implementation with 2 dependencies
  ('async' and 'glob-to-regexp'). Unfortunately, it seems to
  not do a limited traversal but rather

* terkelg/tiny-glob <http://npm.broofa.com/?q=tiny-glob@0.2.6>
  Another self-contained implementation, with two local
  dependencies by the same author. Claims to be much faster
  than both fast-glob and node-glob. I think we have a winner.

Ref #1384.
@Krinkle
Copy link
Member Author

Krinkle commented Aug 23, 2020

@rwjblue Thanks, I forgot about that one.

# Before
time node ~/Development/qunit/bin/qunit.js 'dist/tests/path-utils-*.js'
real	0m1.097s

time node ~/Development/qunit/bin/qunit.js 'dist/tests/path-utils-*.js'
real	0m1.159s

time node ~/Development/qunit/bin/qunit.js 'dist/tests/path-utils-*.js'
real	0m1.117s
# After
time node ~/Development/qunit/bin/qunit.js 'dist/tests/path-utils-*.js'
real	0m0.100s

time node ~/Development/qunit/bin/qunit.js 'dist/tests/path-utils-*.js'
real	0m0.082s

time node ~/Development/qunit/bin/qunit.js 'dist/tests/path-utils-*.js'
real	0m0.091s

Quite the gain :)

@Krinkle Krinkle merged commit b7fb112 into master Aug 23, 2020
@Krinkle Krinkle deleted the cli-glob branch August 23, 2020 16:53
@Krinkle
Copy link
Member Author

Krinkle commented Aug 26, 2020

4 of 7 dependencies for the QUnit CLI are minimatch and its three dependencies. A fairly popular alternative is micromatch's picomatch library, which is a drop-in replacement for what we need, and with a decent npm score. I've also audited the code and its tests, and it seems pretty solid.

http://npm.broofa.com/?q=qunit@2.11.0
NPMGraph: qunit 2.11.0

@gabrielcsapo
Copy link
Contributor

I just wanted to say thank you so much for this! the CLI is so much faster!!!

} );

watchedFiles.forEach( file => delete require.cache[ path.resolve( file ) ] );
changedPendingPurge.forEach( file => delete require.cache[ path.resolve( file ) ] );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think we also have to prune its parent modules reference. I know that isn't related to this PR, I'll try to make another issue and / or PR for it specifically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants