-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: deduplicate assets with buffer source #4712
feat: deduplicate assets with buffer source #4712
Conversation
Note: I would need some help with wiring up the new sample test |
Thanks a lot! Will have a look as soon as I find time. |
Honestly I do not think we need an option and should just leave it on by default. I do not think anyone would turn this option off as I still expect the performance overhead to be small compared to overall build time and as you said, usually you will want to use the hash anyway. |
Codecov Report
@@ Coverage Diff @@
## master #4712 +/- ##
=======================================
Coverage 99.02% 99.02%
=======================================
Files 215 215
Lines 7597 7599 +2
Branches 2106 2104 -2
=======================================
+ Hits 7523 7525 +2
Misses 24 24
Partials 50 50
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Yes, I'm with you here too. And maybe the potential to break someone is small enough to add this in a minor. The only problem could be that a new asset is deduplicated that wasn't before, but I don't expect anyone to be relying on this. I can remove the option and simplify the code if you would like to go in that direction. |
Definitely, especially since we are already duplicating string assets.
Yes, I would. There are already complaints that we have too many options. |
The option has been removed and the code is now a lot easier to follow. We could avoid the need to pregenerate the file name to find the hash if the type of @lukastaegert I still would need help with the new test assertions, although now the new paths are being tested by several other tests too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks so much! Just one minor comment about when the hashes are generated, and a suggestion for the test.
@@ -0,0 +1,18 @@ | |||
const source = Buffer.from('abc'); | |||
|
|||
module.exports = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it, this test does not really have any assertions and a function
test may not be the best fit for asset deduplication as it does not verify _expected
. Why not just extend the existing chunking-form
test as it was before Rollup 3: https://github.com/rollup/rollup/blob/69ff4181e701a0fe0026d0ba147f31bc86beffa8/test/chunking-form/samples/emit-file/deduplicate-assets/_config.js
We had a test for buffer deduplication, so it should just be green again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also include cross Buffer/string deduplication as well in the test by emitting each asset a third time using the other format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice! I missed the deduplicate-assets
test before. Extended it in rollup/rollup@9dbffe1
(#4712) and also added a different buffer source for completeness.
I also didn't know that Rollup 2 was already deduplicating binary sources. So then there is even less of a reason for the option if this was already how things worked before. Is my understanding correct that cross-buffer/string deduplication didn't happen in Rollup 2 though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is my understanding correct that cross-buffer/string deduplication didn't happen in Rollup 2 though?
Yes, that is indeed a new feature!
src/utils/FileEmitter.ts
Outdated
); | ||
sourceHash ??= getSourceHash(source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought: Why not just const sourceHash = getSourceHash(source)
at the top? But then I noticed that we allow specifying a hash size. Now this raises a problem, though an extremely edge case one: If for some reason with a dynamic asset file name function two assets with the same source are given file names with different hash length, they are not recognized as identical.
Going even deeper, why would we even be calling the assetFileNames
function twice in such a situation? What we actually want is that we first calculate the hash, then compare the hash, and only if it does not match we call the function for the name. For this to work reliably, the first hash needs to have the maximum hash length (e.g. by passing Infinity
as the size, which means we do not slice). Then during the actual file name generation we can slice the hash to the required size. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the contortions in the code were because of trying to reuse the dynamic hash size.
Your idea is great, implemented it in refactor: precompute hash then slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot easier to directly check the final changelog, as the code is now a lot closer to the original https://github.com/rollup/rollup/pull/4712/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
This PR has been released as part of rollup@3.3.0. You can test it via |
This PR contains:
Are tests included?
Breaking Changes?
List any relevant issue numbers:
resolves #4711
Description
See description at #4711
Some questions:
[hash]
inassetFileNames
, so I assume this should be optional.deduplicateBinaryAssets
is a good name here, I added it to start the discussion. It could also be an enum: 'only-strings' | 'best-try' (only deduplicate if there is a[hash]
, not implemented here) | 'always'