Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

new-audit(unminified-javascript): detect savings from minifcation #3950

Merged
merged 4 commits into from Dec 20, 2017

Conversation

patrickhulce
Copy link
Collaborator

@patrickhulce patrickhulce commented Nov 29, 2017

addresses js part of #3459 using the 3rd approach outlined

strategy: estimate minification savings by determining the ratio of the length of js tokens to overall string length

this was surprisingly accurate at identifying if a script was already minified, but is only ~6 lines using esprima and takes ~30ms/MB to tokenize compared to ~2000ms/MB for uglify and even longer for babel-minify

below is a table outlining the observed savings, w/gzip denotes the % savings after accounting for gzip, which is usually lower because minification tends to remove things that compress well

library uglify uglify w/gzip babel-minify babel-minify w/gzip our estimate old calculation
angular 86.1% 80.3% 85.8% 80.3% 74.5% (74.5 + 88.8 ) / 2
caltrainschedule.io -- -- 50% 50% 31.3% (31.3 + 63.1) / 2
react 71.4% 50% 71.4% 50% 49.4% (49.4% + 75.2%) / 2
lodash 86.3% 75% 86.3% 70.8% 73.3% (73.3% + 87.9%) / 2
jquery 66.6% 60% 66.6% 60% 49.1% (49.1% + 74.4%) / 2
modernizr 76.9% 50% 76.9% 50% 67.9% (67.9% + 82.1%) / 2

identifying the already minified scripts is as easy as checking if the savings is low as no production minified script had more than 5% savings

if this all looks good, I'll go ahead and add tests, remove WIP 馃憤

relevant code section for estimating minification is

let tokenLength = 0;
let tokenLengthWithMangling = 0;
const tokens = esprima.tokenize(scriptContent);
for (const token of tokens) {
tokenLength += token.value.length;
// assume all identifiers could be reduced to a single character
tokenLengthWithMangling += token.type === 'Identifier' ? 1 : token.value.length;
}

@patrickhulce
Copy link
Collaborator Author

patrickhulce commented Nov 29, 2017

results from cnn.com flag some unminified js from turner cdn, and possibly a bug of theirs since the files even have "ugly" in the name 馃槅

image

@addyosmani
Copy link
Member

I was surprised how close the margin of error was using your simplified lexer for minified savings here. Nice work, @patrickhulce!

tokenLengthWithMangling += token.type === 'Identifier' ? 1 : token.value.length;
}

if (1 - tokenLength / contentLength < IGNORE_THRESHOLD_IN_PERCENT) return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a comment to indicate this is for handling pre-minified code. \o/

@patrickhulce patrickhulce changed the title WIP: new-audit: add unminified javascript audit new-audit: add unminified javascript audit Dec 7, 2017
@patrickhulce patrickhulce changed the title new-audit: add unminified javascript audit new-audit(unminified-javascript): detect savings from minifying javascript Dec 7, 2017
@patrickhulce patrickhulce changed the title new-audit(unminified-javascript): detect savings from minifying javascript new-audit(unminified-javascript): detect savings from minifcation Dec 8, 2017
@patrickhulce
Copy link
Collaborator Author

aw, it finally happened :(

image

image

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. The change in fd547cd was the big tweak I think this needed, making it more conservative and reducing our chance of false positives.

Naturally this does increase our bundle size? bundlesize isn't describing how much though.. do you know what the delta is? (Just want to have a record of it)

};
}

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here explaining the basic approach? using the description from this PR works for me.

let's also call out that inline scripts are not evaluated. (i dont think it matters in practice, but since other audits of ours include inline scripts we should just be explicit)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sounds good 馃憤

done

.catch(_ => null)
.then(content => {
if (!content) return;
scriptContentMap.set(record.url, content);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're definitely dealing with networkRecords i'd rather be using requestIds here as the key.

over in the audit we could could then use WebInspector.NetworkLog.requestForId() to grab the request and pull the URL off that. wdyt?

i suppose this'll make testing slightly harder, so curious what you think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's fair, if same URL was requested multiple times with different content we should surface that 馃憤

@@ -8,9 +8,19 @@
const ByteEfficiencyAudit = require('./byte-efficiency-audit');
const esprima = require('esprima');

const IGNORE_THRESHOLD_IN_PERCENT = .1;
const IGNORE_THRESHOLD_IN_PERCENT = 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol looks like past audits use a mix: image

would be nice to have types for this and milliseconds v seconds. damn.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, yeah I ended up switching because the wastedPercent value is x/100, it's the wastedRatio thats x/1

@paulirish paulirish merged commit 017c9c1 into master Dec 20, 2017
@paulirish paulirish deleted the js_minified branch December 20, 2017 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants