[RFC] Automatically extract modules shared across multiple bundles into their own bundle #885

Open
jamiebuilds opened this issue Feb 23, 2018 · 11 comments

8 participants
@jamiebuilds
Member

commented Feb 23, 2018

Given the following output from today:

- bundle-1
  - module-1
  - module-2
  - module-3
- bundle-2
  - module-1
  - module-3
  - module-4
- bundle-3
  - module-4
  - module-5

Parcel should extract shared modules into their own bundles, which should be side-linked with the bundles that require them:

- bundle-1
  - (link shared-bundle-1 for module-1 and module-3)
  - module-2
- bundle-2
  - (link shared-bundle-1 for module-1 and module-3)
  - (link shared-bundle-2 for module-4)
- bundle-3
  - (link shared-bundle-2 for module-4)
  - module-5
- shared-bundle-1
  - module-1
  - module-3
- shared-bundle-2
  - module-4

Having linked bundles should modify the corresponding async import sites to include both the imported bundle and its linked bundles.

<script src="bundle-1"></script>

becomes

<script src="shared-bundle-1"></script>
<script src="bundle-1"></script>
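
The rewrite above can be sketched as a small helper that lists a bundle's linked shared bundles ahead of the bundle itself. This is only an illustration: `scriptTagsFor` and the `links` map are hypothetical names, not part of Parcel, and bundle names are assumed to be safe to embed in an attribute.

```typescript
// Hypothetical helper, not a Parcel API: returns the <script> tags needed
// to load `bundle`, with its linked shared bundles listed first.
function scriptTagsFor(
  bundle: string,
  links: Record<string, string[]>
): string[] {
  // Only trust keys that were set explicitly on the links map.
  const linked = Object.prototype.hasOwnProperty.call(links, bundle)
    ? links[bundle]
    : [];
  return linked
    .concat([bundle])
    .map((src) => `<script src="${src}"></script>`);
}
```

A bundle with no entry in `links` simply yields its own tag, so import sites without shared bundles are unchanged.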

This can produce much more efficient bundles with zero duplication, and potentially much better cacheability:

- bundle-1
  - module-2
- bundle-2 (empty)
- bundle-3
  - module-5
- shared-bundle-1
  - module-1
  - module-3
- shared-bundle-2
  - module-4

The initial implementation can come in two stages:

  1. If a module is seen in more than one bundle, extract it into its own bundle
  2. If two modules are always seen together, put them in the same extracted bundle
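
A minimal sketch of the two stages above, assuming the bundler already knows which modules each bundle contains. The function name, types, and the `shared-bundle-N` naming are hypothetical and only mirror the example output in this issue; this is not Parcel's implementation.

```typescript
// Input shape (assumed): bundle name -> names of the modules it contains.
type BundleMap = Record<string, string[]>;

interface SharedBundle {
  name: string;         // hypothetical naming scheme: shared-bundle-N
  modules: string[];
  linkedFrom: string[]; // bundles that must side-load this shared bundle
}

interface SplitResult {
  bundles: BundleMap;   // modules that stay in each original bundle
  shared: SharedBundle[];
}

function extractSharedBundles(input: BundleMap): SplitResult {
  // Record which bundles contain each module, in first-seen order.
  // Object.create(null) avoids collisions with names like "constructor".
  const owners: Record<string, string[]> = Object.create(null);
  const moduleOrder: string[] = [];
  const bundles: BundleMap = Object.create(null);
  for (const bundle of Object.keys(input)) {
    bundles[bundle] = [];
    for (const mod of input[bundle]) {
      if (!owners[mod]) {
        owners[mod] = [];
        moduleOrder.push(mod);
      }
      if (owners[mod].indexOf(bundle) === -1) owners[mod].push(bundle);
    }
  }

  const groups: Record<string, SharedBundle> = Object.create(null);
  const shared: SharedBundle[] = [];
  for (const mod of moduleOrder) {
    const linkedFrom = owners[mod];
    if (linkedFrom.length === 1) {
      // Used by a single bundle: leave it where it is.
      bundles[linkedFrom[0]].push(mod);
      continue;
    }
    // Stage 1: the module is seen in more than one bundle, so extract it.
    // Stage 2: modules seen in exactly the same set of bundles are
    // grouped into the same extracted bundle.
    const key = JSON.stringify(linkedFrom.slice().sort());
    if (!groups[key]) {
      groups[key] = {
        name: `shared-bundle-${shared.length + 1}`,
        modules: [],
        linkedFrom: linkedFrom.slice(),
      };
      shared.push(groups[key]);
    }
    groups[key].modules.push(mod);
  }
  return { bundles, shared };
}
```

Run on the three-bundle example at the top of this issue, the sketch leaves module-2 in bundle-1, empties bundle-2, leaves module-5 in bundle-3, and produces shared-bundle-1 (module-1, module-3) and shared-bundle-2 (module-4).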

This should all happen automatically without creating a "commons chunk" or anything manually configured. I'm convinced there is an ideal solution here that won't need to be configured any further. But it will probably take us a little while to get it right.


Note: Implementing an equivalent to Webpack's CommonsChunkPlugin has come up a few times. I think we should avoid ever creating something like that.

Creating one giant "vendor" bundle is actually the worst possible solution to this problem. It makes for an extremely low cache hit rate across multiple builds and adds the maximum amount of overhead to every bundle that needs it.

@jamiebuilds

Member Author

commented Feb 23, 2018

Here is some example output from what happens today.

[Image: screenshot of current bundle output, "Screen Shot 2018-02-22 at 5.50.36 PM copy"]

@starkwang

Contributor

commented Feb 23, 2018

Big +1 on this. The feature would be very useful for many multi-page applications.

But one of my concerns is how we choose the threshold for creating a common chunk (like the minChunks option in CommonsChunkPlugin). That threshold matters a lot for the strategy that generates shared bundles.

> If two modules are always seen together, put them in the same extracted bundle

"Two" is not the best threshold for large multi-page applications that contain 1000+ modules and lots of shared modules. In the worst case it will generate something like this:

- bundle-1
  - (link shared-bundle-1 for module-1 and module-3)
  - (link shared-bundle-2 for module-5 and module-7)
  - (link shared-bundle-3 for module-9 and module-11)
  - (link shared-bundle-4 for module-4 and module-6)
  - (link shared-bundle-5 for module-8 and module-10)
  ...
  - module-2

Just note that too many shared bundles should also be avoided.

@jamiebuilds

Member Author

commented Feb 23, 2018

Long-term I think it will work out as a good strategy. Those bundles will tend to align with individual packages and their exclusive sub-dependencies. This will cache really well because they will only be invalidated when you upgrade that package.

I spoke to @addyosmani about initial load. Right now V8 kicks in streaming scripts at 30kb, but with parallelized parsing of scripts on the way, small modules loaded asynchronously will likely be the bigger win long term. Firefox is already starting to do some of this work with their Quantum project.

Additionally, with the web packaging format (which I hope will change its mind about WICG/webpackage#6), cache manifests, HTTP/2 server push, and more, I expect many small bundles to win out over the massive bundles we're creating today.

Parcel should be aiming for what the web will be long term rather than refilling the same space that other bundlers are already filling today. Addy is scheduling a meeting with members of the Chrome team to talk about the ideal future and how we can align together over the coming years.

@davidnagli davidnagli added this to Discussion in RFC via automation Mar 1, 2018

@KyleAMathews


commented Mar 5, 2018

@stevefan1999-personal


commented Mar 29, 2018

This wouldn't work out. Dynamically importing every module independently is not a silver bullet. Header cost aside, it will erode the performance of the asset server/CDN, since you will have to load a lot of packages over HTTP one by one as they are needed. I'm not particularly familiar with static servers, but my instinct tells me that handling so many connections at once is not a good thing.
It is also not worth having so many cache entries in the client-side browser. As far as I know, the browser cache is implemented as a hash table, and access time will gradually degrade as the table grows and more buckets/links are required.
Even though the download itself is asynchronous, parsing cost is still a huge penalty.

@jamiebuilds

Member Author

commented Mar 29, 2018

@stevefan1999 All of those concerns are things browsers/specs are working to solve right now. We should be working with them towards that goal.

@devongovett

Member

commented May 6, 2018

I think we will need to add a few more heuristics here. I implemented this strategy and tried it on a fairly large app, which already had 6 split points defined with import(). This resulted in 66 files being produced, basically a JS + CSS + map file for most of the combinations of these 6 (22 JS files, for example). So, in order to import one of the 6 split points, something like 11 JS files would need to be loaded in parallel. I guess some of those should probably get combined together in some way.
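
To show why the file count grows this quickly, here is a rough upper-bound calculation. It assumes shared modules are grouped strictly by the exact set of split points that contain them, as in stage 2 of the proposal; the function names are hypothetical and the numbers are bounds, not measurements from the app described above.

```typescript
// Upper bound on extracted shared bundles for n split points: every
// subset of two or more split points can own one shared bundle.
function maxSharedBundles(splitPoints: number): number {
  if (splitPoints < 2) return 0;
  return Math.pow(2, splitPoints) - splitPoints - 1;
}

// Upper bound on shared bundles linked from a single split point: every
// subset that contains it plus at least one other split point.
function maxLinkedPerSplitPoint(splitPoints: number): number {
  if (splitPoints < 2) return 0;
  return Math.pow(2, splitPoints - 1) - 1;
}
```

With 6 split points these bounds are 57 shared bundles in total and 31 linked from any one split point. The 22 JS files and roughly 11 parallel loads reported above sit well below the bounds, but both bounds double with each added split point, which is why some way of combining groups seems necessary.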

@devongovett

Member

commented May 6, 2018

The duplication you were seeing was a bug in the logic for hoisting modules. See #1310. Now the modules will be properly hoisted up to the parent bundle and deduped.

This issue should still be left open though since that generally results in one giant bundle of all the common deps at the top rather than splitting things out in parallel.

@devongovett

Member

commented Dec 17, 2018

This is implemented in #2401 and will be part of Parcel 2!

@aputinski


commented Dec 31, 2018

Is this related to the following scenario?

I have an entry point that contains several dynamic imports. Each of those imports shares several large dependencies. When I build my project, all of those large shared dependencies are hoisted into my app.js, which effectively defeats the purpose of code splitting. I apologize if I'm misunderstanding.

@dnagir


commented Apr 28, 2019

Just wondering if there is a workaround until Parcel 2 arrives?
