
enhance mangle #5359

Merged
merged 1 commit into mishoo:master on Feb 19, 2022

Conversation

alexlamsl
Collaborator

No description provided.

@kzc
Contributor

kzc commented Feb 19, 2022

I spent a half hour studying this:

UglifyJS/lib/scope.js

Lines 659 to 673 in ef0fcfd

```js
if (to_mangle.length > cutoff) {
    var indices = to_mangle.map(function(def, index) {
        return index;
    }).sort(function(i, j) {
        return to_mangle[j].references.length - to_mangle[i].references.length || i - j;
    });
    to_mangle = indices.slice(0, cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return to_mangle[index];
    }).concat(indices.slice(cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return to_mangle[index];
    }));
}
```

So if I'm not mistaken: within each block you move the variables with the most references to the front of the mangle list; those within the cutoff are then sorted by their initial index in that list, and the remaining variables after the cutoff are likewise sorted by their initial indexes and appended. The rationale being that frequently referenced variables ought to receive the shortest mangled names. Did I get the gist of it?
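To make the effect concrete, here is a self-contained toy model of the reordering above (not UglifyJS itself; `reorder`, `defs` and the sample reference counts are invented for illustration):

```js
// Toy model of the reordering in lib/scope.js: each def carries a
// references array; only its length matters here.
function reorder(toMangle, cutoff) {
    var indices = toMangle.map(function(def, index) {
        return index;
    }).sort(function(i, j) {
        return toMangle[j].references.length - toMangle[i].references.length || i - j;
    });
    return indices.slice(0, cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return toMangle[index];
    }).concat(indices.slice(cutoff).sort(function(i, j) {
        return i - j;
    }).map(function(index) {
        return toMangle[index];
    }));
}

var defs = [
    { name: "x", references: new Array(1) },
    { name: "y", references: new Array(5) },
    { name: "z", references: new Array(3) },
    { name: "w", references: new Array(4) },
];
// cutoff = 2: the two most-referenced defs (y, w) come first, restored to
// their original relative order, then the rest (x, z) in original order.
console.log(reorder(defs, 2).map(function(def) { return def.name; }));
// → ["y", "w", "x", "z"]
```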

I wonder if proximity between instances of the same variable and contiguous runs of uses of a given variable without intervening differing variables should also be taken into account, as it impacts the effectiveness of the sliding LZW compression window.

@alexlamsl
Collaborator Author

The reason being that often repeated variables ought to receive the shortest mangled names. Did I get the gist of it?

Basically yes − it's a fight between locality vs. total byte count, which is amplified by the recent advances in inline.

With one exception (pending further investigation), the uglified size monotonically decreases as cutoff goes from 0 to 54, i.e. the number of single-character variable names.
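For reference, the 54 falls out of the identifier grammar: a JavaScript identifier may begin with a letter, `$` or `_`. A quick sketch (the actual `base54()` in UglifyJS additionally orders these characters by measured frequency):

```js
// 26 lowercase + 26 uppercase + "$" + "_" = 54 legal one-character names.
var FIRST_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_";
console.log(FIRST_CHARS.length); // → 54
```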

I wonder if proximity between instances of the same variable and contiguous runs of uses of a given variable without intervening differing variables should also be taken into account, as it impacts the effectiveness of the sliding LZW compression window.

That's part of the consideration when brainstorming for this, with the view that restoring (declared) order is a good approximation, since we don't have direct access to information about "contiguous runs of uses" in mangle_names().

Any ideas as to how to retrieve said metrics? We do perform a full tree traversal in compute_char_frequency(), so we may be able to do so efficiently over there.
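One cheap possibility (a hypothetical sketch, not existing UglifyJS code — `countRuns` and its input format are invented here): while traversing, record symbol uses in source order, then count how many contiguous runs each name forms; fewer runs per use suggests better locality for the compression window.

```js
// Given symbol uses in source order, count contiguous runs per name.
function countRuns(uses) {
    var runs = {};
    var prev = null;
    uses.forEach(function(name) {
        if (name !== prev) runs[name] = (runs[name] || 0) + 1;
        prev = name;
    });
    return runs;
}

// "a" appears 3 times in 2 runs; "b" appears 3 times in 2 runs.
console.log(countRuns([ "a", "a", "b", "a", "b", "b" ]));
// → { a: 2, b: 2 }
```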

@alexlamsl
Collaborator Author

OT: I'm encountering these "skipped" macOS jobs with increasing frequency, more than half a dozen per day for the past week or so 😕

@alexlamsl alexlamsl merged commit a7d0616 into mishoo:master Feb 19, 2022
@alexlamsl alexlamsl deleted the mangle-cutoff branch February 19, 2022 06:02
@kzc
Contributor

kzc commented Feb 19, 2022

Any ideas as to how to retrieve said metrics? We do perform a full tree traversal in compute_char_frequency(), so we may be able to do so efficiently over there.

I'll think about it and try to put some code together through trial and error. Even seemingly promising theories can fail miserably in practice. Inlining, as you pointed out, plays havoc with locality. Terser often has better gzipped sizes for large bundles despite producing larger uncompressed output.

OT: I'm encountering these "skipped" macOS jobs with increasing frequency, more than half a dozen per day for the past week or so

macOS is becoming less of a server OS and more of a desktop OS from what I've observed. With every OS upgrade my machine gets slower and less reliable. Just last week it locked up hard and it took me a half hour to figure out how to reboot my machine by holding down the power button for 5 seconds. Perhaps it's common knowledge, but I never had to do that before.

@kzc
Contributor

kzc commented Feb 19, 2022

By the way, does this PR generally improve gzip results on the benchmark bundles?

@alexlamsl
Collaborator Author

I'll think about it and try to put some code together through trial and error. Even seemingly promising theories can fail miserably in practice.

Thanks in advance 😉

By the way, does this PR generally improve gzip results on the benchmark bundles?

Yup, that's how I perform verifications 👻

reboot my machine by holding down the power button for 5 seconds

Thanks to ACPI standards − no thanks to Apple 👿

@kzc
Contributor

kzc commented Feb 19, 2022

My mangle idea was computationally impractical. A sliding window of symbol characters is not uniform throughout the input, so just the act of choosing the next available symbol based on the local window character frequencies at each occurrence of the symbol becomes a major ordeal compared to the simple base54 system. Never mind the chaotic feedback: each previously chosen symbol name would itself alter the sliding window. Back to the drawing board.

@alexlamsl
Collaborator Author

Have 🧁 for effort − your attempt might turn out to be even slower than me trying out the full range of cutoff followed by gzip:

[chart: UglifyJS-mangle — x-axis: cutoff; y-axis: size in bytes; left/blue: uglified, right/red: uglified+gzipped]

@kzc
Contributor

kzc commented Feb 19, 2022

That's pretty cool.

I see you chose 36, where uglify+gzip begins to level out. Any particular reason why you didn't go with 53, which is the minimum for both uglify and uglify+gzip? Is it slower?

With any heuristic dependent on statistics you have to be aware of overfitting to specific test case(s). How did it fare against the larger input test cases in https://github.com/privatenumber/minification-benchmarks ?

@alexlamsl
Collaborator Author

alexlamsl commented Feb 19, 2022

I see you chose 36, where uglify+gzip begins to level out. Any particular reason why you didn't go with 53, which is the minimum for both uglify and uglify+gzip? Is it slower?

Oh, that is just an example − specifically antd.js, which seems to have suffered a noticeable increase in gzipped size since uglify-js@3.14.5 on that benchmark site.

I went through all ten inputs to test/benchmark.js to arrive at the current value − here's the one for html-minifier:

[chart: uglify-html]

Here's the one for mathjs:

[chart: uglify-math]

As you can see, the previous value of cutoff=10 was an overfit to that last case, as I only paid attention to the sum of the sizes of the nine inputs we had before.

@kzc
Contributor

kzc commented Feb 19, 2022

Nice.

As I recall, math.js has hundreds of small functions, so it might not be typical.
