Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Add 'onRebuildStart' to Watch mode options #1409

Closed
wessberg opened this issue Jun 30, 2021 · 6 comments
Closed

Feature Request: Add 'onRebuildStart' to Watch mode options #1409

wessberg opened this issue Jun 30, 2021 · 6 comments

Comments

@wessberg
Copy link

wessberg commented Jun 30, 2021

Hey there.

Watch mode works great in the current version of esbuild.
However, I would very much like a way to provide a callback that will be invoked immediately before an incremental rebuild, after the file watcher has detected a change.

Ideally, I'd like the WatchMode options to extended with a onRebuildStart hook that can be used like this:

build({
  // ...
  watch: {
    onRebuildStart({changes}) {
      console.log("Files that triggered rebuild:", changes);
    },
    onRebuild(error, result) {
      if (err != null) {
        console.error("build failed:", error);
      } else {
        console.log("build succeeded:", result);
      }
    }
  }
});

Ideally, takes as argument an object with a property, changes, that is the Set of files that changed and triggered a rebuild. This is useful, for example, for build pipelines in which some assets are cached and the end-user may want to invalidate those caches as the files change. Having a start hook is also useful for logging to the end-user that a build is currently active.

For consistency, it may be best if there's an onRebuildEnd callback hook too, and that onRebuild is as an alias for it.

@evanw
Copy link
Owner

evanw commented Jun 30, 2021

I'm actually planning on removing onRebuild since it's sort of redundant with the onStart and onEnd callbacks that already exist for plugins. I'd like for there to be one obvious way of doing things that works the same way across different modes (both watching and non-watching). This unification of the API will also help enable other use cases as well such as using watch mode in combination with other modes, which is something people have requested (such as #805).

So I recommend doing what you are asking for by just creating a small plugin with onStart and onEnd callbacks. You can do this today without needing any changes to esbuild:

build({
  // ...
  plugins: [{
    name: 'start/end',
    setup(build) {
      build.onStart(() => { /* ... */ })
      build.onEnd(result => { /* ... */ })
    },
  }],
});

In the future I may add a shortcut for doing this where you can add top-level onStart and onEnd callbacks inline in the options object that basically have the same effect as making a simple plugin. But that shortcut doesn't currently exist yet.

Ideally, takes as argument an object with a property, changes, that is the Set of files that changed and triggered a rebuild.

This set of files that changed and triggered a rebuild cannot be provided because it's never calculated. A rebuild is triggered immediately when the first change is detected, so there is only ever one file change detected per build. For performance, esbuild doesn't continue scanning all files for changes before starting a build (since watch mode is polling-based instead of change-based). See also #1165.

A similar-ish thing that could be provided is the set of input files that ended up being different from the previous build after the current build finishes. For example the metafile data could potentially have all such files marked with changed: true. This is something that I have thought about potentially adding. It would also be independent of whether esbuild is currently in watch mode or not, which is also a better design IMO.

@wessberg
Copy link
Author

wessberg commented Jul 1, 2021

Hey there,
Thanks for the quick follow-up.

It makes perfect sense to keep this functionality in plugins, for coherence.

A similar-ish thing that could be provided is the set of input files that ended up being different from the previous build after the current build finishes.

The reason why I'm interested in the files before the next rebuild is that I want to be able to clean up caches prior to compilation.

Let me explain my use-case.

Because esbuild doesn't only invoke plugin onLoad hooks with the changed files (which as you say isn't calculated), but rather all of the files matching the filter, the user may implement their own caching layer to avoid unnecessarily redundant computations for unchanged files.

For example, consider a simple plugin that compiles .scss files:

{
	name: "styles",

	setup(build) {
		build.onLoad({ filter: /.*\.scss/ }, async args => {
                         // Check for a cache hit
			if (cache.has(args.path)) {
				return { contents: cache.get(args.path) };
			}

                        // Otherwise, recompile
                        const contents = await doThings(readFileSync(args.path, "utf8"));
                        cache.set(args.path, contents);
                        return {contents};

		});
	}
}

In this example, because the onLoad hook is invoked for every file with a .scss extension inside the compilation unit, for every rebuild, a good optimization is a little cache. But we'll need to know when to remove a from the cache. We could either read the files from disk and compare their contents/checksums, which will require a lot of IO operations, but ideally we can leverage the File System watcher to remove the file(s) from the cache before the plugin hooks are called.

Currently, I have to run my own watcher, building upon the incremental option, because I need to know which files changed. Do you see a way forward with esbuilds native watch mode where something like this might be achieved at some point? I realize this may warrant its own issue instead at this point.

@evanw
Copy link
Owner

evanw commented Jul 1, 2021

We could either read the files from disk and compare their contents/checksums, which will require a lot of IO operations

That's the way you are supposed to do this. There's example code here if you need it: https://esbuild.github.io/plugins/#caching-your-plugin. The built-in watch mode implementation in esbuild works via polling so you actually wouldn't be avoiding any I/O by having esbuild determine the set of changed files instead of you doing it yourself. Polling is used both because it's maximally portable and because I've heard that native file system change notification APIs can be flaky (so relying on them for cache invalidation decisions can lead to incorrect behavior).

If you really don't want to use that approach, you are welcome to call into the native system watcher APIs for your operating system yourself and use that to invalidate your cache. In that case you would enable esbuild's incremental build API instead of esbuild's watch mode API and you would call rebuild() when the system file watcher detects a change. More info here: https://esbuild.github.io/api/#incremental.

@wessberg
Copy link
Author

wessberg commented Jul 1, 2021

I see. While it makes sense to offload this responsibility to plugins, I'm thinking about how this may perform in cases where there's multiple plugins in the chain, some of which the end-user doesn't control, and several of them are generating and comparing checksums for the files their respective onLoad hooks are invoked with, rather than them having a way to rely on some shared data, for example on the metafile, such that this cost is only "paid once". One might think of a scenario where maybe there's a "styles" plugin, and a "json" plugin, and maybe some third esoteric thing, and they are all implementing their own caching strategies. What are your thoughts on this? For context, I'm coming from Rollup, where the transform plugin hooks are called only with those files that Rollup decides have changed/need recompilation, which means that plugins always work on the maximally minimal set of input files.

@evanw
Copy link
Owner

evanw commented Jul 1, 2021

several of them are generating and comparing checksums for the files their respective onLoad hooks are invoked with

The I/O overhead probably still only happens once per file because different plugins should usually apply to different files. Styles plugins will probably apply to .css files and JSON plugins will probably apply to .json files. And esbuild doesn't check the contents of the file first before running plugins on it since for correctness reasons, esbuild doesn't make any assumptions about what plugins do. So each file is generally only read once.

It would be incorrect for esbuild to skip running a plugin when the input file hasn't changed. For example, you might have a SASS plugin that bundles CSS @import rules using package path resolution that implicitly depends on the contents of a Yarn 2 .pnp.js file. In that case esbuild has no way of knowing about this implicit relationship the plugin has with the .pnp.js file and it would be incorrect to skip plugin evaluation if the input file hasn't changed. In this case the plugin also probably doesn't know about this implicit relationship because of how Yarn 2 monkey-patches everything, so it's not a matter of the plugin just telling esbuild about its dependencies. Cache invalidation can get very complicated and is best handled by the plugin itself. In this case the plugin would likely need to always re-run require.resolve (which may be monkey-patched by Yarn 2) on each @import and only reuse the cache entry if all of those resolved paths and their file contents are also the same.

they are all implementing their own caching strategies

It seems most appropriate to me for plugins to all implement their own caching strategies because plugins can do completely different things that may require custom caching strategies. A plugin could embed the current date into file every build and should never be cached. The result of a plugin may involve making a HTTP request and evaluating cache-control headers. Plugins for languages that are extremely slow to compile such as Rust may even want to look up into a distributed build cache.

For context, I'm coming from Rollup, where the transform plugin hooks are called only with those files that Rollup decides have changed/need recompilation

I'm not familiar with Rollup but it looks like the transform plugin hook takes the contents of the file instead of the path. That means Rollup has already done the I/O for the file to read the file contents. The equivalent in esbuild would be an onLoad callback that reads the contents of the file and then only does the transform step if the contents of the file are different than last time.

@wessberg
Copy link
Author

wessberg commented Jul 1, 2021

The I/O overhead probably still only happens once per file because different plugins should usually apply to different files. Styles plugins will probably apply to .css files and JSON plugins will probably apply to .json files.

It's a great point you're making here, though I will say ultimately I expect the fastest approach to be not to read the files at all, but rather react to file watcher events and clean up caches as a reaction. In projects with thousands of files, it feels like a bit much to read and diff thousands of files for every change to a single file. Especially because early adopters of esbuild I would assume are quite often those, like my team, who have very large code bases that takes a very long time to build, comparably, with alternative bundlers. And in my case at least, I'm operating with code bases with a substantial amount of files that needs to be diffed for every change.

In my case, it has been the faster approach to build a custom watch mode that cleans caches as a reaction to file changes reported by chokidar, and that leverages the incremental option and manually calls rebuild when a change is detected. But naturally I would love to stop doing that and just build upon esbuild's built-in watch mode.

I'm not familiar with Rollup but it looks like the transform plugin hook takes the contents of the file instead of the path.

Correct, transform hooks are invoked with both the path and the resolved contents on that path which may or may not come from a file system lookup, or from another plugin.

It would be incorrect for esbuild to skip running a plugin when the input file hasn't changed. For example, you might have a SASS plugin that bundles CSS @import rules using package path resolution that implicitly depends on the contents of a Yarn 2 .pnp.js file. In that case esbuild has no way of knowing about this implicit relationship the plugin has with the .pnp.js file and it would be incorrect to skip plugin evaluation if the input file hasn't changed

So, the way these things are solved in Rollup is, first, that the dependency graphs of files are discovered on the fly and then added to Rollup's internal module graph and file watcher. And for implicit dependencies, plugins can notify Rollup that a file depends on another file via the plugin API such that when the file watcher detects a change for file A, then file B is marked as dirty and the transform hooks are called with both files.

Ultimately, I think this issue can be closed, as you've answered my questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants