Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance #50

Open
sindresorhus opened this issue Oct 22, 2017 · 16 comments
Open

Improve performance #50

sindresorhus opened this issue Oct 22, 2017 · 16 comments

Comments

@sindresorhus
Copy link
Owner

sindresorhus commented Oct 22, 2017

This issue is dedicated to gathering ideas on how we can improve performance.

Some things that might help is to do filtering in the correct order to ensure minimal work is done, and cache and memoize as much as possible. The most important thing is that we skip ignored directories like node_modules as early as possible. It's the source of a lot of performance problems. The hard drive is the bottleneck, so we should ensure we interact with it as little as possible.

Some more ideas in #44 and mrmlnc/fast-glob#15

This Node.js feature request would also help a lot: nodejs/node#15699

@zachleat
Copy link

FWIW, the biggest performance bottleneck in my testing on eleventy (for whatever reason) was the gitignore: true option. Removing this and just looping over .gitignore myself and adding the entries as ! lines was much faster, for whatever reason.

(Love globby—thanks!)

@yaodingyd
Copy link

About filtering out .gitignore patterns before globbing, is translating gitignore rules and adding them to fast-glob ignore the right direction to go?

@fabiospampinato
Copy link

I'm adding this here as it's somewhat related: right now just requiring globby on my system often takes anywhere between 100ms and 250ms, and just installing it on a new directory downloads more than 4MB of stuff.

I've switched to tiny-glob in a project of mine, which takes less than 10ms to require on my system and install less than 100kb of stuff.

Perhaps some heavy dependencies could be removed/replaced.

@sindresorhus
Copy link
Owner Author

About filtering out .gitignore patterns before globbing, is translating gitignore rules and adding them to fast-glob ignore the right direction to go?

I think so, yes.

@sindresorhus
Copy link
Owner Author

I'm adding this here as it's somewhat related: right now just requiring globby on my system often takes anywhere between 100ms and 250ms, and just installing it on a new directory downloads more than 4MB of stuff.

Yeah, that is quite unfortunate. It's because of micromatch, a dependency of fast-glob. See: mrmlnc/fast-glob#154

screen shot 2019-02-15 at 18 41 10

// @mrmlnc

@dwelle
Copy link

dwelle commented Feb 27, 2019

I concur with @zachleat that the gitignore: true has a huge perf cost for whatever reason.

On our codebase:

  1. using gitignore: true:

    ~8sec

    await globby([ `**` ], { gitignore: true });
  2. custom:

    ~100ms

    const ignored = fs.readFileSync(".gitignore", { encoding: "utf8" })
        .split("\n")
        .filter( line => !/^\s*$/.test(line) && !/^\s*#/.test(line))
        .map( line => "!" + line.trim().replace(/^\/+|\/+$/g, ""));
    
    const files = await globby([
        `**`,
        `!.git`,
        ...ignored
    ]);

~8000% difference is a lot...

@janat08
Copy link

janat08 commented May 1, 2019

rollup is said to be good at removing dead code, I don't know if it goes as far as to repackage packages.

@kangax
Copy link

kangax commented Mar 3, 2020

Just ran into this as well in our codebase. Commenting out gitignore leads to an improvement from 500 sec to 2 sec.

@DavidAnson
Copy link

For folks complaining about the performance hit of the Options.gitignore, my understanding of the current implementation is enabling that option adds a blocking preprocessing step that enumerates and parses potentially every .gitignore file in the directory tree regardless of the specified patterns value. It looks like Options.ignore is respected, but for many (perhaps most?) inputs to globby, the scope of this additional work may be vastly larger than what's done when gitignore is not specified.

Some of this additional work is unnecessary in some conditions, but I don't know offhand how difficult it is to determine that at runtime given the recursive nature of .gitignore and patterns.

Reference:

@DavidAnson
Copy link

On our codebase:

  1. using gitignore: true:
    ~8sec
  2. custom:
    ~100ms
    ~8000% difference is a lot...

FYI, this is not an apples-to-apples comparison as the gitignore option for globby honors every .gitignore file in the tree while the custom code honors only the one at the root.

@DavidAnson
Copy link

For scenarios where only the root .gitignore file matters, setting Options.ignoreFiles to ".gitignore" should be very fast as it should avoid the (correct by specification!) ** part of the default behavior.

@fabiospampinato
Copy link

fabiospampinato commented Apr 1, 2024

By coincidence I have recently written a little benchmark comparing globby against tiny-readdir-glob-gitignore, here's the result:

bench

Basically as mentioned in the original post indeed skipping ignored directories as early as possible is significant. Additionally a much faster implementation than ignore is possible, and handling all .gitignore-like files found shouldn't be a problem. All in all when optimized enabling handling of .gitignore-like files should speed things up rather than slow things down in many real world scenarios, I think.

@danielbayley
Copy link

danielbayley commented Apr 2, 2024

By coincidence I have recently written …

@fabiospampinato Nice! I was just in the process of writing my own ideal glob function—based on fs.readdir and picomatch—before coming across tiny-readdir-glob[-gitignore]… So maybe now I don’t have to :)

But just have a couple of questions:

  1. How does performance compare to fdir?
  2. Why the need for tiny-readdir, rather than import {readdir} from 'node:fs/promises'?

@fabiospampinato
Copy link

@danielbayley could you open an issue about that in the other repo? That seems unrelated to globby.

@danielbayley
Copy link

@danielbayley could you open an issue about that in the other repo? That seems unrelated to globby.

@fabiospampinato Fair point. Sure, see fabiospampinato/tiny-readdir#2 and fabiospampinato/tiny-readdir-glob#2.

@LastDragon-ru
Copy link

LastDragon-ru commented Jul 18, 2024

I our case enabling gitignore increase processing time from <1min to ~40min 😱 (because of a lot of php/npm dependencies which added to .gitignore)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants