Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin cache #2389

Merged
merged 20 commits into from
Aug 12, 2018
Merged

Plugin cache #2389

merged 20 commits into from
Aug 12, 2018

Conversation

guybedford
Copy link
Contributor

@guybedford guybedford commented Aug 8, 2018

This provides a plugin cache on plugin contexts with the API:

class PluginCache {
  get (key: string): any;
  set (key: string, value: any): void;
  has (key: string): boolean;
  delete (key: string): boolean;
}

The cache is stored in bundle.cache.plugins[cacheKey] as a plain object, which is not directly accessible.

Each value in the cache is stored as [number, value], where the first number indicates the access count. This is zero if it was accessed in the last build, and is incremented each build. When it reaches 10 (10 builds without any access through get, set, or has) we automatically evict the cache item.

The cache key for a plugin is plugin.cacheKey || plugin.name. When using plugin.name this will throw if on duplicate or missing plugin names as soon as cache access is attempted (a get / set / has / delete call).

Two instances of the same plugin can support sharing a cache instance in the same build provided they use a plugin.cacheKey and not a plugin.name.

There are some complexities around supporting the transform caching and transform dependencies stuff. We basically detect this.cache usage in transform and then fully opt-out of the automatic transform caching - no transform dependencies invalidations, no asset reemissions etc.

The hope is to eventually deprecate the automatic transform caching along with its invalidations and asset reemissions. To prepare for this watchFiles has been refactored in the build as well, with the goal to just always rerunning all hooks and relying on the plugin internal cache.

Edit - New additions:

  • cacheExpires option: allows setting an integer representing how many builds should run before unused keys are cleared from the plugin cache
  • this.addWatchFile in plugins - allows a more generic feature for plugin watcher dependencies, as previously tracked in Consider supporting transformDependencies for load #2363. Again this aligns with moving away from special transform handling and deprecating both transform dependencies and auto caching.


let hasLoadersOrTransforms = false;

const pluginContexts = plugins.map(plugin => {
let cacheable = true;
if (typeof plugin.name === 'string') {
if (existingPluginKeys.indexOf(plugin.name) === -1) cacheable = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Do you mean !== -1 here? I'm a little confused at this block of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, did not mean to imply this was bug-free just yet :)

if (typeof plugin.name === 'string') {
if (existingPluginKeys.indexOf(plugin.name) !== -1) cacheable = false;
existingPluginKeys.push(plugin.name);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block might be better to start with cacheable = false, then setting it to true if it is unique. Currently, if I'm following the logic correctly, plugins with no name will be left in a state of cachable = true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was half-baked and missed the logic I intended wrt cacheKey allowing shared instance caches between multiple instances of the same plugin in the build.

@guybedford guybedford force-pushed the plugin-cache branch 2 times, most recently from 67ae9d3 to fad9921 Compare August 8, 2018 12:54
@shellscape
Copy link
Contributor

This is shaping up nicely 👍

@guybedford
Copy link
Contributor Author

Ok the tests are up, ready for review.


let hasLoadersOrTransforms = false;

const pluginContexts = plugins.map(plugin => {
let cacheable = true;
if (typeof plugin.cacheKey === 'string') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an empty condition. Is it necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a ruling-out case for the others to avoid a repeated check. Could be done better certainly, but that was the logic that made sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've "conventionalized" it :P

@guybedford
Copy link
Contributor Author

I've updated the description in the PR now as well so it is clear what is being provided here.

src/Graph.ts Outdated
@@ -34,6 +36,9 @@ import relativeId, { getAliasName } from './utils/relativeId';
import { timeEnd, timeStart } from './utils/timers';
import transform from './utils/transform';

// clear plugin cache items after 10 builds unnaccessed
const PLUGIN_CACHE_EVICTION_EXPIRY = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to see this number configurable; CI machines might build dozens of branches per day may benefit from larger numbers so they aren't evicting cache quite as much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keithamus would that be better as an environment variable, or configuration option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a configuration option - considering pretty much everything else is. With a rollup.config.js it's relatively easy to expose it as a environment variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a cacheExpiry configuration option.

Copy link
Member

@lukastaegert lukastaegert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The only thing that bothers me somewhat is the cache expiry: What use should this have?

If we want to prevent invalid caches by resetting the cache after n runs, why would there be no invalid cache after n-1 runs? Or is the idea that if the watcher does not trigger, people should just keep making changes until they hit the expiry limit?

I would rather have we double down on making sure that the cache is always cleared in all necessary situations. Otherwise this will also mask errors in the caching that people will probably not report to us because they consider them to be "slip-ups" while making it slower for everyone.

@@ -353,9 +364,14 @@ export interface OutputChunk {
map?: SourceMap;
}

export interface SerialisablePluginCache {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking about rebelliously European spelling...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to double down on that stuff :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah serialize is British English - will change.

@@ -96,6 +97,11 @@ export function createAssetPluginHooks(
const asset: Asset = { name, source, fileName: undefined, dependencies, transform: null };
if (outputBundle && source !== undefined) finaliseAsset(asset, outputBundle, assetFileNames);
assetsById.set(assetId, asset);
if (!asset.transform && asset.dependencies && asset.dependencies.length) {
for (const depId of asset.dependencies) {
if (watchFiles.indexOf(depId) === -1) watchFiles.push(depId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use an object to store the names of watched files for slightly better performance? But probably only a micro optimization

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also provide automatic deduplication, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, added.

cache[id] = [0, value];
},
delete(id: string) {
return delete cache[id];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we only check for thruthiness in .has, maybe overwrite deleted keys with null instead of actually deleting to avoid lots of hidden class transformations and deoptimizations in the engine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects with lots of key changes are treated as "dictionary objects" with hash table keys (which is exactly what we need here). In that case the delete cost is the delete cost of the hash table which shouldn't be a problem. Also we do want to evict unused keys to avoid cache bloat.

@guybedford
Copy link
Contributor Author

If we want to prevent invalid caches by resetting the cache after n runs, why would there be no invalid cache after n-1 runs? Or is the idea that if the watcher does not trigger, people should just keep making changes until they hit the expiry limit?

The cache expiry is used to avoid memory bloat, not to avoid false positives. That is, when we persist the cache to disk, then we need to ensure that the size of the cache is not going to be monotonic. So every time an entry is touched we bump its counter, and if it hasn't been touched for 10 builds, we evict it to handle this case.

Definitely worth making sure this is properly documented though, given the ease of confusion here.

@guybedford
Copy link
Contributor Author

Included feedback, and rebased to plugin-driver.

@lukastaegert
Copy link
Member

Definitely worth making sure this is properly documented though, given the ease of confusion here

Yes indeed. In that case, I would strongly suggest to make this an experimental option for now to signal we might still change how this is configured. Does not feel like something people should play around with usually and in case there are reasons to change it, maybe there is a better way to configure it than with a number but we do not know yet

@tivac
Copy link
Contributor

tivac commented Aug 12, 2018

How will using this.addWatchFile work in practice? Currently with transform dependencies if a dependency changes the build will restart and transform will be called for the module that registered that dependency. If that approach is now deprecated I'd like to better understand the replacement since I'll need to port over to it and the previous behavior, while not ideal, eventually worked for my needs.

@lukastaegert lukastaegert changed the base branch from plugin-driver to master August 12, 2018 08:54
@lukastaegert
Copy link
Member

Sorry, forgot to change the base of this branch before merge.

@lukastaegert lukastaegert reopened this Aug 12, 2018
@guybedford
Copy link
Contributor Author

Just rebasing, will merge shortly.

@guybedford
Copy link
Contributor Author

@tivac

How will using this.addWatchFile work in practice? Currently with transform dependencies if a dependency changes the build will restart and transform will be called for the module that registered that dependency. If that approach is now deprecated I'd like to better understand the replacement since I'll need to port over to it and the previous behavior, while not ideal, eventually worked for my needs.

So the idea is to deprecate both the current transform dependencies and transform cache, eventually, but the deprecation path hasn't even started yet so no need to worry here things will continue to work fine even in 1.0 most likely for a 2.0 removal.

The new approach is:

  • addWatchFile adds a file for the watcher to invalidate the build on. This invalidation does not affect the build caching at all though, unlike currently where the transform dependencies invalidate the transform cache.
  • Instead of the transform hook by default being cached, all hooks will always be re-run with cache information at this.cache. The plugin has its own responsibility to emit assets and check this.cache to handle cached running.
  • The full separation between addWatchFile and transform cache will provide more flexible fine-grained cache behaviours.

@guybedford guybedford merged commit 4d491a4 into master Aug 12, 2018
@guybedford guybedford deleted the plugin-cache branch August 12, 2018 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants