Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimentalCodeSplitting chunkNaming #1995

Closed
jthoms1 opened this issue Feb 20, 2018 · 19 comments
Closed

experimentalCodeSplitting chunkNaming #1995

jthoms1 opened this issue Feb 20, 2018 · 19 comments

Comments

@jthoms1
Copy link

jthoms1 commented Feb 20, 2018

I would like to have a way of adding hashes to chunkNames to have the ability to set long caches on files in production use cases. Currently chunkNames are generated based on a counter and I do not see a good way of tying into this for changing fileNames.

Without experimentalCodeSplitting you could use a plugin that would use the onwrite lifecycle function. Now that files are referenced between each other this is no longer an option.

I assume resolution would either be a change to lifecycle events to allow for filename changes or to provide some other mechanism to change chunk names.

@guybedford
Copy link
Contributor

Opening up the chunk aliasing hook to plugins sounds like it could be sensible.

@guybedford
Copy link
Contributor

One thing to note is that even with a hook, determining the correct hash would need to know all the details of the modules and their transformations.... so perhaps this is something we should implement as an option internally for that reason?

@jthoms1
Copy link
Author

jthoms1 commented Feb 20, 2018

That would be reasonable as well.

@lukastaegert
Copy link
Member

I had some thoughts about this as well and I actually agree with you, @guybedford. My suggestion would be to combine this feature with a way to specify output chunk names. If we provide a way to specify these names (and paths?) that allows for a "hash" placeholder (+ possibly a length for the hash), we would not even need a new config option.

As for the hashing itself, I think an algorithm similar to the chunking algorithm might do the trick:

  1. Calculate a content hash for each chunk. This initial hash should only depend on the content of this chunk itself, not on its imports
  2. Combine the content hash of each file + all content hashes of its dependencies into a final hash
  3. Make sure all imports are adjusted accordingly

To achieve this, we could first render all chunks (without adjusting import files), then calculate the content hashes and the resulting hashes, then traverse the top level statements again to adjust the import files.

Just some vague ideas, no work has been done in this direction by myself.

@lukastaegert
Copy link
Member

Also I am now convinced this is something that is much easier to do in core rollup than in a plugin.

@guybedford
Copy link
Contributor

The approach described in #1995 (comment) sounds great to me. One of the issues I'm running into here is that we create chunk names as part of the bundle.chunks object that is created before the write process has even started.

So the API is:

rollup.rollup(...): {
  chunks: {
    [name: chunkName]: {
      imports: string[],
      exports: string[],
      modules: string[]
    }
  },
  generate: () => {
    [chunkName: string]: { code, map, ast }
  };
  write: () => void
}

where the import ids include external Ids, but also other chunk ids, which may in turn be dependent on hashing or chunk naming.

What we probably need to do is flip this API around something like:

rollup.rollup(...): {
  generate: () => {
    [name: chunkName]: {
      code: string;
      map: SourceMap;
      ast: Ast;
      imports: string[];
      exports: string[];
      modules: string[];
    }
  },
  write: () => <as above>
}

so that the chunk information is only provided after the write call is run.

Let me know how that sounds!

@guybedford
Copy link
Contributor

@lukastaegert any feedback on this one? Happy to get to work on it if it seems good.

@lukastaegert
Copy link
Member

Will need to think a little more about this (only have a phone available right now). So you still want the .rollup() call to do the actual analysis/tree-shaking so it is still possible to create several bundles from the same analysis result by calling generate several times. But the chunk information should only be available after each call to generate? That seems like a really good idea as it may open doors to other generation features later!

My only wish is to consider not closing any doors to finally making the non-chunking code a special case of the chunking one as the current code duplication here is painful.

Also, might be helpful to put some thoughts into what the config file API for providing chunk names could look like before we go too deep into this. My initial thoughts were to enable a third form of the input property where the user provides an object mapping entry points to output file names (this would be easy to understand for users as it is also very similar to how other tools like webpack handle this). With your suggestion it would make more sense to put the chunk file names into the output properties. In that case, what would be an easy API for specifying names?

Otherwise, please go ahead!

@guybedford
Copy link
Contributor

Will need to think a little more about this (only have a phone available right now). So you still want the .rollup() call to do the actual analysis/tree-shaking so it is still possible to create several bundles from the same analysis result by calling generate several times. But the chunk information should only be available after each call to generate? That seems like a really good idea as it may open doors to other generation features later!

Yes exactly

My only wish is to consider not closing any doors to finally making the non-chunking code a special case of the chunking one as the current code duplication here is painful.

We can potentially make the corresponding API change in the non-code splitting case so that there remains a rough compatbility. Perhaps with the previous bundle imports etc being deprecated for the return values of generate and write eventually. So basically - to do the same thing for single-file builds with backwards compatibility and a deprecation path.

Also, might be helpful to put some thoughts into what the config file API for providing chunk names could look like before we go too deep into this. My initial thoughts were to enable a third form of the input property where the user provides an object mapping entry points to output file names (this would be easy to understand for users as it is also very similar to how other tools like webpack handle this). With your suggestion it would make more sense to put the chunk file names into the output properties. In that case, what would be an easy API for specifying names?

I really like this suggestion, the only issue is how to use it to name chunks. Perhaps we could have a separate chunkNames output option which can provide the chunk pattern. So something like -

rollup.rollup({
  input: {
    'entryOne.js': 'path/to/entry1.js',
    'entryTwo.js': 'path/to/entry2.js'
  }, // while still supporting the array form
  output: {
    dir: 'dist',
    chunkNames: 'chunk-[hash].js' // or a function hook
  }
});

Let me know how that seems.

@lukastaegert
Copy link
Member

The current non-code-splitting API is

rollup.rollup(...): {
  imports: string[],
  exports: string[],
  modules: ModuleJSON[], 
  generate: () => {
    [chunkName: string]: { code, map, ast }
  };
  write: () => void
}

imports and exports do not really make sense in the code-splitting case but modules actually does so we could consider adding it to this case as well. Then the unified backward-compatible API could be

rollup.rollup(...): {
  imports?: string[], // only added if we have a single entry point
  exports?: string[], // only added if we have a single entry point
  modules: ModuleJSON[], 
  
generate: () => {
    code?: string // only added if we have a single entry point
    map?: SourceMap // only added if we have a single entry point
    [name: chunkName]: { // for single entry points, this just reflects the one created chunk
      code: string;
      map: SourceMap;
      imports: string[];
      exports: string[];
      modules: string[];
    }
  },
  write: () => void
}

I really like this suggestion, the only issue is how to use it to name chunks. Perhaps we could have a separate chunkNames output option which can provide the chunk pattern.

Yes, that seems nice!

The only thing where we are wasting flexibility is if we specify the file names for the chunks in the input options instead of the output options. If we create several versions in one run (e.g. an ESM and an CJS version), the entry chunk names would still need to be the same (even though you may want to have e.g. .mjs extensions for the ESM output)

@guybedford
Copy link
Contributor

@lukastaegert note that imports, exports and modules are actually exactly the same thing for both chunks and single-file builds. Imports are just externals when there are no chunks, and include other chunks when code splitting.

I also mistyped the modules item as string[] instead of ModuleJSON[] as it actually is.

Looking more closely at the watcher API it seems like the watcher does use this ModuleJSON before write, but I don't see why the code at https://github.com/rollup/rollup/blob/master/src/watch/index.ts#L200 couldn't be moved to after write at https://github.com/rollup/rollup/blob/master/src/watch/index.ts#L223?

The ModuleJSON does contain ID information, so we'd need to provide this after write/generate for code splitting, and it would be nice to keep the correlation between the APIs.

If so, my suggestion would be for the API:

Single-file case:

interface OuptputChunk {
  code: string;
  map: SourceMap;
  imports: string[];
  exports: string[];
  modules: ModuleJSON[];
};

rollup.rollup(...): {
  // --- these to be deprecated: ---
  imports: string[];
  exports: string[];
  modules: ModuleJSON[];

  generate: () => OutputChunk;
  write: () => OutputChunk;
}

Code-splitting case:

rollup.rollup(...): {
  generate: () => {
    [chunkName: string]: OutputChunk;
  };
  write: () => {
    [chunkName: string]: OutputChunk;
  };
}

Let me know if that seems like it might be an ok approach?

As a reminder - all this is necessary because the hash calculations in chunk dependency trees mean that the rendered output of one chunk affects the import ids of another.

@lukastaegert
Copy link
Member

I still think it might be useful to have a list of included modules available after .rollup in both cases as it can give you a glimpse into the analysis results. Why do you want to deprecate that? Or do you expect this list to change depending on the output? Or is it performance concerns?

Otherwise I am ok with this API. We just need to communicate well that once the experimentalCodeSplitting flag is abandoned, the format of the return value will depend on whether input is an array or not. Or however we want to distinguish between the two cases. The less difference the better IMO.

@guybedford
Copy link
Contributor

guybedford commented Mar 13, 2018

Why do you want to deprecate that? Or do you expect this list to change depending on the output? Or is it performance concerns?

To quote the important bits from my previous comment...

  • As a reminder - all this is necessary because the hash calculations in chunk dependency trees mean that the rendered output of one chunk affects its ID, and hence the IDs of all its parent importing chunks.
  • The ModuleJSON does contain output chunk ID information, so we'd need to provide this after write/generate for code splitting as that is the only point we know the IDs if using hashes, and it would be nice to keep the correlation between the APIs.

Otherwise we can diverge between code-splitting and non-code-splitting as well here.

There could also be the option to return an object when not code splitting too to get the APIs on the same footing.

@lukastaegert
Copy link
Member

The ModuleJSON does contain output chunk ID information

It does? Looks only like module ids to me which are basically the names of source files, aren't they?

Module.toJSON(): ModuleJSON {
		return {
			id: this.id,
			dependencies: this.dependencies.map(module => module.id),
			code: this.code,
			originalCode: this.originalCode,
			originalSourcemap: this.originalSourcemap,
			ast: this.astClone,
			sourcemapChain: this.sourcemapChain,
			resolvedIds: this.resolvedIds
		};
	}

But really, I'm not attached to this. We could just leave out this information until someone tells us why it was better it was included or which were the important parts they relied on.

There could also be the option to return an object when not code splitting too to get the APIs on the same footing.

Yes, that was the direction I was thinking of.

@guybedford
Copy link
Contributor

@lukastaegert you're right yes, its specifically in how to name the chunks to list this information in the code splitting case that is the problem, not in the ModuleJSON[] output itself.

We could always choose not to deprecate imports, exports and modules in the non-code-splitting case certainly. Will leave things like that for now anyway.

I must admit I do still like returning the single chunk in the single-file case for ease of use in the API to not have to think about output file names in pure in-memory generation cases.

Think we've discussed this enough though - will try to get to actually writing code :P Thanks for feedback

@lukastaegert
Copy link
Member

👍

@guybedford
Copy link
Contributor

PR created at #2068.

@guybedford
Copy link
Contributor

Merged in e96f3fa.

@morganney
Copy link

Is it possible to change the extension to .mjs by using output.entryFileNames, for instance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants