experimentalCodeSplitting chunkNaming #1995

jthoms1 · 2018-02-20T14:26:08Z

I would like to have a way of adding hashes to chunkNames to have the ability to set long caches on files in production use cases. Currently chunkNames are generated based on a counter and I do not see a good way of tying into this for changing fileNames.

Without experimentalCodeSplitting you could use a plugin that would use the onwrite lifecycle function. Now that files are referenced between each other this is no longer an option.

I assume resolution would either be a change to lifecycle events to allow for filename changes or to provide some other mechanism to change chunk names.

The text was updated successfully, but these errors were encountered:

guybedford · 2018-02-20T15:28:54Z

Opening up the chunk aliasing hook to plugins sounds like it could be sensible.

guybedford · 2018-02-20T15:30:04Z

One thing to note is that even with a hook, determining the correct hash would need to know all the details of the modules and their transformations.... so perhaps this is something we should implement as an option internally for that reason?

jthoms1 · 2018-02-20T15:39:25Z

That would be reasonable as well.

lukastaegert · 2018-02-20T16:43:06Z

I had some thoughts about this as well and I actually agree with you, @guybedford. My suggestion would be to combine this feature with a way to specify output chunk names. If we provide a way to specify these names (and paths?) that allows for a "hash" placeholder (+ possibly a length for the hash), we would not even need a new config option.

As for the hashing itself, I think an algorithm similar to the chunking algorithm might do the trick:

Calculate a content hash for each chunk. This initial hash should only depend on the content of this chunk itself, not on its imports
Combine the content hash of each file + all content hashes of its dependencies into a final hash
Make sure all imports are adjusted accordingly

To achieve this, we could first render all chunks (without adjusting import files), then calculate the content hashes and the resulting hashes, then traverse the top level statements again to adjust the import files.

Just some vague ideas, no work has been done in this direction by myself.

lukastaegert · 2018-02-20T16:44:37Z

Also I am now convinced this is something that is much easier to do in core rollup than in a plugin.

guybedford · 2018-03-06T14:02:10Z

The approach described in #1995 (comment) sounds great to me. One of the issues I'm running into here is that we create chunk names as part of the bundle.chunks object that is created before the write process has even started.

So the API is:

rollup.rollup(...): {
  chunks: {
    [name: chunkName]: {
      imports: string[],
      exports: string[],
      modules: string[]
    }
  },
  generate: () => {
    [chunkName: string]: { code, map, ast }
  };
  write: () => void
}

where the import ids include external Ids, but also other chunk ids, which may in turn be dependent on hashing or chunk naming.

What we probably need to do is flip this API around something like:

rollup.rollup(...): {
  generate: () => {
    [name: chunkName]: {
      code: string;
      map: SourceMap;
      ast: Ast;
      imports: string[];
      exports: string[];
      modules: string[];
    }
  },
  write: () => <as above>
}

so that the chunk information is only provided after the write call is run.

Let me know how that sounds!

guybedford · 2018-03-10T11:37:25Z

@lukastaegert any feedback on this one? Happy to get to work on it if it seems good.

lukastaegert · 2018-03-10T16:44:45Z

Will need to think a little more about this (only have a phone available right now). So you still want the .rollup() call to do the actual analysis/tree-shaking so it is still possible to create several bundles from the same analysis result by calling generate several times. But the chunk information should only be available after each call to generate? That seems like a really good idea as it may open doors to other generation features later!

My only wish is to consider not closing any doors to finally making the non-chunking code a special case of the chunking one as the current code duplication here is painful.

Also, might be helpful to put some thoughts into what the config file API for providing chunk names could look like before we go too deep into this. My initial thoughts were to enable a third form of the input property where the user provides an object mapping entry points to output file names (this would be easy to understand for users as it is also very similar to how other tools like webpack handle this). With your suggestion it would make more sense to put the chunk file names into the output properties. In that case, what would be an easy API for specifying names?

Otherwise, please go ahead!

guybedford · 2018-03-12T12:06:22Z

Will need to think a little more about this (only have a phone available right now). So you still want the .rollup() call to do the actual analysis/tree-shaking so it is still possible to create several bundles from the same analysis result by calling generate several times. But the chunk information should only be available after each call to generate? That seems like a really good idea as it may open doors to other generation features later!

Yes exactly

My only wish is to consider not closing any doors to finally making the non-chunking code a special case of the chunking one as the current code duplication here is painful.

We can potentially make the corresponding API change in the non-code splitting case so that there remains a rough compatbility. Perhaps with the previous bundle imports etc being deprecated for the return values of generate and write eventually. So basically - to do the same thing for single-file builds with backwards compatibility and a deprecation path.

Also, might be helpful to put some thoughts into what the config file API for providing chunk names could look like before we go too deep into this. My initial thoughts were to enable a third form of the input property where the user provides an object mapping entry points to output file names (this would be easy to understand for users as it is also very similar to how other tools like webpack handle this). With your suggestion it would make more sense to put the chunk file names into the output properties. In that case, what would be an easy API for specifying names?

I really like this suggestion, the only issue is how to use it to name chunks. Perhaps we could have a separate chunkNames output option which can provide the chunk pattern. So something like -

rollup.rollup({
  input: {
    'entryOne.js': 'path/to/entry1.js',
    'entryTwo.js': 'path/to/entry2.js'
  }, // while still supporting the array form
  output: {
    dir: 'dist',
    chunkNames: 'chunk-[hash].js' // or a function hook
  }
});

Let me know how that seems.

lukastaegert · 2018-03-12T12:55:22Z

The current non-code-splitting API is

rollup.rollup(...): {
  imports: string[],
  exports: string[],
  modules: ModuleJSON[], 
  generate: () => {
    [chunkName: string]: { code, map, ast }
  };
  write: () => void
}

imports and exports do not really make sense in the code-splitting case but modules actually does so we could consider adding it to this case as well. Then the unified backward-compatible API could be

rollup.rollup(...): {
  imports?: string[], // only added if we have a single entry point
  exports?: string[], // only added if we have a single entry point
  modules: ModuleJSON[], 
  
generate: () => {
    code?: string // only added if we have a single entry point
    map?: SourceMap // only added if we have a single entry point
    [name: chunkName]: { // for single entry points, this just reflects the one created chunk
      code: string;
      map: SourceMap;
      imports: string[];
      exports: string[];
      modules: string[];
    }
  },
  write: () => void
}

I really like this suggestion, the only issue is how to use it to name chunks. Perhaps we could have a separate chunkNames output option which can provide the chunk pattern.

Yes, that seems nice!

The only thing where we are wasting flexibility is if we specify the file names for the chunks in the input options instead of the output options. If we create several versions in one run (e.g. an ESM and an CJS version), the entry chunk names would still need to be the same (even though you may want to have e.g. .mjs extensions for the ESM output)

guybedford · 2018-03-13T13:01:34Z

@lukastaegert note that imports, exports and modules are actually exactly the same thing for both chunks and single-file builds. Imports are just externals when there are no chunks, and include other chunks when code splitting.

I also mistyped the modules item as string[] instead of ModuleJSON[] as it actually is.

Looking more closely at the watcher API it seems like the watcher does use this ModuleJSON before write, but I don't see why the code at https://github.com/rollup/rollup/blob/master/src/watch/index.ts#L200 couldn't be moved to after write at https://github.com/rollup/rollup/blob/master/src/watch/index.ts#L223?

The ModuleJSON does contain ID information, so we'd need to provide this after write/generate for code splitting, and it would be nice to keep the correlation between the APIs.

If so, my suggestion would be for the API:

Single-file case:

interface OuptputChunk {
  code: string;
  map: SourceMap;
  imports: string[];
  exports: string[];
  modules: ModuleJSON[];
};

rollup.rollup(...): {
  // --- these to be deprecated: ---
  imports: string[];
  exports: string[];
  modules: ModuleJSON[];

  generate: () => OutputChunk;
  write: () => OutputChunk;
}

Code-splitting case:

rollup.rollup(...): {
  generate: () => {
    [chunkName: string]: OutputChunk;
  };
  write: () => {
    [chunkName: string]: OutputChunk;
  };
}

Let me know if that seems like it might be an ok approach?

As a reminder - all this is necessary because the hash calculations in chunk dependency trees mean that the rendered output of one chunk affects the import ids of another.

lukastaegert · 2018-03-13T13:25:10Z

I still think it might be useful to have a list of included modules available after .rollup in both cases as it can give you a glimpse into the analysis results. Why do you want to deprecate that? Or do you expect this list to change depending on the output? Or is it performance concerns?

Otherwise I am ok with this API. We just need to communicate well that once the experimentalCodeSplitting flag is abandoned, the format of the return value will depend on whether input is an array or not. Or however we want to distinguish between the two cases. The less difference the better IMO.

guybedford · 2018-03-13T13:35:40Z

Why do you want to deprecate that? Or do you expect this list to change depending on the output? Or is it performance concerns?

To quote the important bits from my previous comment...

As a reminder - all this is necessary because the hash calculations in chunk dependency trees mean that the rendered output of one chunk affects its ID, and hence the IDs of all its parent importing chunks.
The ModuleJSON does contain output chunk ID information, so we'd need to provide this after write/generate for code splitting as that is the only point we know the IDs if using hashes, and it would be nice to keep the correlation between the APIs.

Otherwise we can diverge between code-splitting and non-code-splitting as well here.

There could also be the option to return an object when not code splitting too to get the APIs on the same footing.

lukastaegert · 2018-03-13T14:03:22Z

The ModuleJSON does contain output chunk ID information

It does? Looks only like module ids to me which are basically the names of source files, aren't they?

Module.toJSON(): ModuleJSON {
		return {
			id: this.id,
			dependencies: this.dependencies.map(module => module.id),
			code: this.code,
			originalCode: this.originalCode,
			originalSourcemap: this.originalSourcemap,
			ast: this.astClone,
			sourcemapChain: this.sourcemapChain,
			resolvedIds: this.resolvedIds
		};
	}

But really, I'm not attached to this. We could just leave out this information until someone tells us why it was better it was included or which were the important parts they relied on.

There could also be the option to return an object when not code splitting too to get the APIs on the same footing.

Yes, that was the direction I was thinking of.

guybedford · 2018-03-13T14:16:15Z

@lukastaegert you're right yes, its specifically in how to name the chunks to list this information in the code splitting case that is the problem, not in the ModuleJSON[] output itself.

We could always choose not to deprecate imports, exports and modules in the non-code-splitting case certainly. Will leave things like that for now anyway.

I must admit I do still like returning the single chunk in the single-file case for ease of use in the API to not have to think about output file names in pure in-memory generation cases.

Think we've discussed this enough though - will try to get to actually writing code :P Thanks for feedback

lukastaegert · 2018-03-13T15:22:55Z

👍

guybedford · 2018-03-19T17:14:47Z

PR created at #2068.

guybedford · 2018-03-26T06:49:53Z

Merged in e96f3fa.

morganney · 2023-08-10T19:27:21Z

Is it possible to change the extension to .mjs by using output.entryFileNames, for instance?

jthoms1 mentioned this issue Feb 20, 2018

Code splitting - chunk file names should be versioned ionic-team/stencil#486

Closed

1 task

guybedford mentioned this issue Mar 6, 2018

Allow for system format modules to be named. Fixes #2027 #2028

Merged

guybedford mentioned this issue Mar 12, 2018

Code-Splitting Features Feedback #2049

Closed

guybedford mentioned this issue Mar 19, 2018

Chunk naming with hashing #2068

Merged

guybedford added the code splitting label Mar 20, 2018

guybedford closed this as completed Mar 26, 2018

gregmagolan mentioned this issue Jul 5, 2018

Rollup code splitting bazelbuild/rules_nodejs#243

Merged

pzuraq mentioned this issue Jan 25, 2019

[FEAT] Adds initial implementation ember-cli/ember-cli-babel-polyfills#1

Merged

philipwalton mentioned this issue May 5, 2019

Chunk hashes should be based on the output #2839

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

experimentalCodeSplitting chunkNaming #1995

experimentalCodeSplitting chunkNaming #1995

jthoms1 commented Feb 20, 2018

guybedford commented Feb 20, 2018

guybedford commented Feb 20, 2018

jthoms1 commented Feb 20, 2018

lukastaegert commented Feb 20, 2018

lukastaegert commented Feb 20, 2018

guybedford commented Mar 6, 2018

guybedford commented Mar 10, 2018

lukastaegert commented Mar 10, 2018

guybedford commented Mar 12, 2018

lukastaegert commented Mar 12, 2018

guybedford commented Mar 13, 2018

lukastaegert commented Mar 13, 2018

guybedford commented Mar 13, 2018 •

edited

Loading

lukastaegert commented Mar 13, 2018

guybedford commented Mar 13, 2018

lukastaegert commented Mar 13, 2018

guybedford commented Mar 19, 2018

guybedford commented Mar 26, 2018

morganney commented Aug 10, 2023

experimentalCodeSplitting chunkNaming #1995

experimentalCodeSplitting chunkNaming #1995

Comments

jthoms1 commented Feb 20, 2018

guybedford commented Feb 20, 2018

guybedford commented Feb 20, 2018

jthoms1 commented Feb 20, 2018

lukastaegert commented Feb 20, 2018

lukastaegert commented Feb 20, 2018

guybedford commented Mar 6, 2018

guybedford commented Mar 10, 2018

lukastaegert commented Mar 10, 2018

guybedford commented Mar 12, 2018

lukastaegert commented Mar 12, 2018

guybedford commented Mar 13, 2018

lukastaegert commented Mar 13, 2018

guybedford commented Mar 13, 2018 • edited Loading

lukastaegert commented Mar 13, 2018

guybedford commented Mar 13, 2018

lukastaegert commented Mar 13, 2018

guybedford commented Mar 19, 2018

guybedford commented Mar 26, 2018

morganney commented Aug 10, 2023

guybedford commented Mar 13, 2018 •

edited

Loading