
enhance mangle #5359

Merged
merged 1 commit into mishoo:master on Feb 19, 2022

Conversation

alexlamsl
Collaborator

No description provided.

@kzc
Contributor

kzc commented Feb 19, 2022

I spent a half hour studying this:

UglifyJS/lib/scope.js

Lines 659 to 673 in ef0fcfd

```js
if (to_mangle.length > cutoff) {
    var indices = to_mangle.map(function(def, index) {
        return index;
    }).sort(function(i, j) {
        return to_mangle[j].references.length - to_mangle[i].references.length || i - j;
    });
    to_mangle = indices.slice(0, cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return to_mangle[index];
    }).concat(indices.slice(cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return to_mangle[index];
    }));
}
```

So if I'm not mistaken: within each block you move the variables with the most references to the front of the mangle list; those within the cutoff are then sorted by their initial index in that list, and the remaining variables after the cutoff are likewise sorted by their initial indexes and appended. The rationale being that frequently referenced variables ought to receive the shortest mangled names. Did I get the gist of it?
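To make the effect concrete, here is a self-contained toy model of the reordering above (not UglifyJS itself; `reorder`, `defs` and the sample reference counts are invented for illustration):

```js
// Toy model of the reordering in lib/scope.js: each def carries a
// references array; only its length matters here.
function reorder(toMangle, cutoff) {
    var indices = toMangle.map(function(def, index) {
        return index;
    }).sort(function(i, j) {
        return toMangle[j].references.length - toMangle[i].references.length || i - j;
    });
    return indices.slice(0, cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return toMangle[index];
    }).concat(indices.slice(cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return toMangle[index];
    }));
}

var defs = [
    { name: "x", references: new Array(1) },
    { name: "y", references: new Array(5) },
    { name: "z", references: new Array(3) },
    { name: "w", references: new Array(4) },
];
// cutoff = 2: the two most-referenced defs (y, w) come first, restored to
// their original relative order, then the rest (x, z) in original order.
console.log(reorder(defs, 2).map(function(def) { return def.name; }));
// → ["y", "w", "x", "z"]
```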

I wonder if proximity between instances of the same variable and contiguous runs of uses of a given variable without intervening differing variables should also be taken into account, as it impacts the effectiveness of the sliding LZW compression window.

@alexlamsl
Collaborator Author

The reason being that often repeated variables ought to receive the shortest mangled names. Did I get the gist of it?

Basically yes − it's a fight between locality vs. total byte count, which is amplified by the recent advances in inline.

With one exception (pending further investigation), the uglified size monotonically decreases as cutoff goes from 0 to 54, i.e. the number of single-character variable names.
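For reference, the 54 falls out of the identifier grammar: a JavaScript identifier may begin with a letter, `$` or `_`. A quick sketch (the actual `base54()` in UglifyJS additionally orders these characters by measured frequency):

```js
// 26 lowercase + 26 uppercase + "$" + "_" = 54 legal one-character names.
var FIRST_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_";
console.log(FIRST_CHARS.length); // → 54
```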

I wonder if proximity between instances of the same variable and contiguous runs of uses of a given variable without intervening differing variables should also be taken into account, as it impacts the effectiveness of the sliding LZW compression window.

That's part of the consideration when brainstorming for this, with the view that restoring (declared) order is a good approximation, since we don't have direct access to information about "contiguous runs of uses" in mangle_names().

Any ideas as to how to retrieve said metrics? We do perform a full tree traversal in compute_char_frequency(), so we may be able to do so efficiently over there.
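One cheap possibility (a hypothetical sketch, not existing UglifyJS code — `countRuns` and its input format are invented here): while traversing, record symbol uses in source order, then count how many contiguous runs each name forms; fewer runs per use suggests better locality for the compression window.

```js
// Given symbol uses in source order, count contiguous runs per name.
function countRuns(uses) {
    var runs = {};
    var prev = null;
    uses.forEach(function(name) {
        if (name !== prev) runs[name] = (runs[name] || 0) + 1;
        prev = name;
    });
    return runs;
}

// "a" appears 3 times in 2 runs; "b" appears 3 times in 2 runs.
console.log(countRuns([ "a", "a", "b", "a", "b", "b" ]));
// → { a: 2, b: 2 }
```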

@alexlamsl
Collaborator Author

OT: I'm encountering these "skipped" macOS jobs with increasing frequency, more than half a dozen per day for the past week or so 😕

@alexlamsl alexlamsl merged commit a7d0616 into mishoo:master Feb 19, 2022
@alexlamsl alexlamsl deleted the mangle-cutoff branch February 19, 2022 06:02
@kzc
Contributor

kzc commented Feb 19, 2022

Any ideas as to how to retrieve said metrics? We do perform a full tree traversal in compute_char_frequency(), so we may be able to do so efficiently over there.

I'll think about it and try to put some code together through trial and error. Even seemingly promising theories can fail miserably in practice. Inlining, as you pointed out, plays havoc with locality. Terser often has better gzipped sizes for large bundles despite producing larger uncompressed output.

OT: I'm encountering these "skipped" macOS jobs with increasing frequency, more than half a dozen per day for the past week or so

macOS is becoming less of a server OS and more of a desktop OS from what I've observed. With every OS upgrade my machine gets slower and less reliable. Just last week it locked up hard and it took me a half hour to figure out how to reboot my machine by holding down the power button for 5 seconds. Perhaps it's common knowledge, but I never had to do that before.

@kzc
Contributor

kzc commented Feb 19, 2022

By the way, does this PR generally improve gzip results on the benchmark bundles?

@alexlamsl
Collaborator Author

I'll think about it and try to put some code together through trial and error. Even seemingly promising theories can fail miserably in practice.

Thanks in advance 😉

By the way, does this PR generally improve gzip results on the benchmark bundles?

Yup, that's how I perform verifications 👻

reboot my machine by holding down the power button for 5 seconds

Thanks to ACPI standards − no thanks to Apple 👿

@kzc
Contributor

kzc commented Feb 19, 2022

My mangle idea was computationally impractical. A sliding window of symbol characters is not uniform throughout the input, so just the act of choosing the next available symbol based on the local window character frequencies at each occurrence of the symbol becomes a major ordeal compared to the simple base54 system. Never mind the chaotic feedback: each previously chosen symbol name would itself alter the sliding window. Back to the drawing board.

@alexlamsl
Collaborator Author

Have 🧁 for effort − your attempt might turn out to be even slower than me trying out the full range of cutoff followed by gzip:

[chart: UglifyJS-mangle — x-axis: cutoff; y-axis: size in bytes; left/blue: uglified, right/red: uglified+gzipped]

@kzc
Contributor

kzc commented Feb 19, 2022

That's pretty cool.

I see you chose 36, where uglify+gzip begins to level out. Any particular reason why you didn't go with 53, which is the minimum for both uglify and uglify+gzip? Is it slower?

With any heuristic dependent on statistics you have to be aware of overfitting to specific test case(s). How did it fare against the larger input test cases in https://github.com/privatenumber/minification-benchmarks ?

@alexlamsl
Collaborator Author

alexlamsl commented Feb 19, 2022

I see you chose 36, where uglify+gzip begins to level out. Any particular reason why you didn't go with 53, which is the minimum for both uglify and uglify+gzip? Is it slower?

Oh, that is just an example − specifically antd.js, which seems to have suffered a noticeable increase in gzipped size since uglify-js@3.14.5 on that benchmark site.

I went through all ten inputs to test/benchmark.js to arrive at the current value − here's the one for html-minifier:

[chart: uglify-html]

Here's the one for mathjs:

[chart: uglify-math]

As you can see, the previous value of cutoff=10 was an overfit to that last case, as I only paid attention to the sum of the sizes of the nine inputs we had before.

@kzc
Contributor

kzc commented Feb 19, 2022

Nice.

As I recall, math.js has hundreds of small functions, so it might not be typical.
