Server-rendered HTML minification #1432

tigt · 2019-10-19T02:16:22Z

Description

Marko’s compilation to server-side templates could perform provably-safe HTML minification during compilation. (Runtime minification is probably too expensive to be worth it.)

Why

Perf matters! Running Marko’s current home page through kangax’s minifier saves 2,551 bytes. Not too bad after compression, but it’s still overhead, and fewer characters speed up parsing.

I used the relative URLs option, kept comments because I suspected Marko uses them for rehydration, and turned off the Minify CSS/JS options.

Existing HTML minifiers can be plugged into a Marko app’s render pipeline, but they require buffering the stream, defeating any performance improvements.

Possible Implementation & Open Questions

In development mode, probably don’t bother running this compilation step. But in production…

Normalize output for better compression
- Normalize known tag/attribute name casing (might do this already?)
- Sort attributes (probably alphabetically, but frequency would be even better)
- Sort properties in style
- Sort tokens in class, rel, and other TokenLists/similar
Collapse whitespace (Marko largely already does this, but it’s worth reexamining to ensure its behavior is as unsurprising and efficient as it could be)
- Collapse all whitespace characters to a single space, except inside <pre> and marko-preserve-whitespace
Omit the closing slash on known void elements
Optional tags
- Omit optional opening tags without attributes like <head> or <body>
- Omit optional closing tags where safe — </body></html>, consecutive <p>s, </option> since </select> autocloses it
Omit attributes with default values: method="get", <input type="text">, <script type="text/javascript">, etc. This would also make client-side rendering slightly more efficient
Attribute values
- ~~Truthy booleans should omit =""~~ Misc perf improvements #1535 ✅
- ~~Literals not containing whitespace should omit quotes~~ perf: minify runtime comments, remove unnecessary attr quotes #1557 ✅
- ~~Literals containing " or ' should use the other delimiter for the value~~ perf: minify runtime comments, remove unnecessary attr quotes #1557 ✅
- ~~If the attribute value is known not to stringify with forbidden characters (Number, for example), it should omit quotes~~ perf: minify runtime comments, remove unnecessary attr quotes #1557 ✅
- ~~Everything else should fall back to wrapping double quotes~~ perf: minify runtime comments, remove unnecessary attr quotes #1557 ✅
Reencode character references
- Non-HTML syntax characters like → should be dereferenced
- Unnecessary encoding should be dereferenced
  - Unambiguous ampersands
  - Quotes and > outside opening tag contexts
  - = outside attribute values
- Some character references could be reencoded to be shorter:
  - &#60; → &lt; (pretty sure only < and > qualify)
  - &#x27; → &#39;, since hex numbers are only more compact for characters we’re already decoding
  - &quot; → &#34;
Turn same-origin attribute values with URLs (href, src, srcset) into relative URLs
Encode indices/offsets inside Marko :scoped identifiers and  boundary comments with something more space-efficient than decimal, like number.toString(36)
Use self-closing syntax for foreign elements (SVG & MathML, mostly), instead of the current approach: <path></path> → <path />
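
The attribute-value rules in the list above can be sketched as a single serializer. This is a minimal illustration, not Marko's actual implementation: `serializeAttr` and `NEEDS_QUOTES` are hypothetical names, and for simplicity it always escapes `&` rather than detecting unambiguous ampersands.

```javascript
// Hypothetical helper for illustration; not Marko's real code.
// Characters that may not appear in an unquoted HTML attribute value.
const NEEDS_QUOTES = /[\s"'=<>`]/;

function serializeAttr(name, value) {
  // Falsy booleans and missing values produce no attribute at all.
  if (value === false || value == null) return "";
  // Truthy booleans omit the ="" part.
  if (value === true) return name;

  // Simplification: escape every ampersand.
  const text = String(value).replace(/&/g, "&amp;");

  // Non-empty values without forbidden characters need no quotes.
  if (text !== "" && !NEEDS_QUOTES.test(text)) return `${name}=${text}`;

  // A value containing only double quotes can use single-quote delimiters.
  if (text.includes('"') && !text.includes("'")) return `${name}='${text}'`;

  // Everything else falls back to double quotes.
  return `${name}="${text.replace(/"/g, "&quot;")}"`;
}

console.log(serializeAttr("disabled", true));      // disabled
console.log(serializeAttr("width", 100));          // width=100
console.log(serializeAttr("class", "a b"));        // class="a b"
console.log(serializeAttr("title", 'say "hi"'));   // title='say "hi"'
```

Numbers fall out of the unquoted branch naturally, because their string form never contains a forbidden character.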

Is this something you're interested in working on?

Yes

tigt · 2019-10-25T00:04:01Z

[moved into own issue]

tigt · 2020-05-13T21:05:01Z

Some updated thoughts on my proposals above:

> Normalize output for better compression
> - Normalize known tag/attribute name casing (might do this already?)
> - Sort attributes (probably alphabetically, but frequency would be even better)
> - Sort properties in style
> - Sort tokens in class, rel, and other TokenLists/similar

While DOMTokenLists produced from attributes like class and rel are order-agnostic, author CSS and JS may rely on a specific order (mostly through attribute selectors like [class^=…]). The compression gains from sorting these tokens are unlikely to be large enough to justify that risk.
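
To make the risk concrete, here is a small sketch of how sorting class tokens could break a prefix selector. `sortTokens` is a hypothetical normalizer, and `matchesClassPrefix` is only a rough stand-in for how a selector like `[class^="btn"]` tests the raw attribute string.

```javascript
// Hypothetical normalizer: sort whitespace-separated tokens alphabetically.
function sortTokens(value) {
  return value.trim().split(/\s+/).sort().join(" ");
}

// Rough stand-in for [class^="prefix"], which matches against the
// attribute's raw string value rather than its token set.
function matchesClassPrefix(value, prefix) {
  return value.startsWith(prefix);
}

const authored = "btn btn-primary active";
const sorted = sortTokens(authored);

console.log(sorted);                             // active btn btn-primary
console.log(matchesClassPrefix(authored, "btn")); // true
console.log(matchesClassPrefix(sorted, "btn"));   // false
```

The token set is unchanged, so `classList.contains` still behaves the same; only string-based matching is affected.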

> Omit optional closing tags where safe: </body></html>, consecutive <p>s, </option> since </select> autocloses it

> Omit attributes with default values: method="get", <input type="text">, <script type="text/javascript">, etc. This would also make client-side rendering slightly more efficient

After some more investigation, these two seem like they have the best reduction:effort ratio. Lotta </option>s in most <select>s, and HTML minification doubling as less JS is always nice.
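
A rough sketch of the default-attribute half, assuming a compile step that sees each tag name and a plain object of literal attribute values. `DEFAULT_ATTRS` and `omitDefaultAttrs` are hypothetical, and the table only covers the three examples named above.

```javascript
// Hypothetical table of attribute defaults, limited to the examples above.
const DEFAULT_ATTRS = new Map([
  ["form", new Map([["method", "get"]])],
  ["input", new Map([["type", "text"]])],
  ["script", new Map([["type", "text/javascript"]])],
]);

// Returns a copy of attrs without attributes whose literal value equals
// the default for that tag. Only string literals are considered; dynamic
// values would have to be checked at render time instead.
function omitDefaultAttrs(tagName, attrs) {
  const defaults = DEFAULT_ATTRS.get(tagName.toLowerCase());
  if (!defaults) return { ...attrs };

  const kept = {};
  for (const [name, value] of Object.entries(attrs)) {
    const isDefault =
      typeof value === "string" &&
      defaults.get(name.toLowerCase()) === value.toLowerCase();
    if (!isDefault) kept[name] = value;
  }
  return kept;
}

console.log(omitDefaultAttrs("input", { type: "text", name: "q" }));
console.log(omitDefaultAttrs("input", { type: "email", name: "q" }));
```

One caveat, similar to the token-sorting concern: an author selector such as `input[type=text]` does not match an `<input>` with no `type` attribute, so this transform is only safe when nothing depends on the attribute being present.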

> Turn same-origin attribute values with URLs (href, src, srcset) into relative URLs

This one seems like the most difficult, but it also can pay off the most. I’m undecided.
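
For a sense of what this involves, here is a minimal sketch using the standard `URL` class. It assumes the page's own URL is known when the attribute is written, which is an assumption and not something this issue establishes. `toRootRelative` is a hypothetical name; it produces root-relative URLs only and does not handle `srcset`, which holds a comma-separated list of candidates.

```javascript
// Hypothetical helper: shorten a same-origin URL to a root-relative one.
// pageUrl is assumed to be the absolute URL of the page being rendered.
function toRootRelative(value, pageUrl) {
  const page = new URL(pageUrl);
  let url;
  try {
    url = new URL(value, page);
  } catch {
    return value; // Unparseable values are left untouched.
  }

  // Cross-origin URLs (and schemes like mailto:) must stay absolute.
  if (url.origin !== page.origin) return value;

  const relative = url.pathname + url.search + url.hash;
  // Only rewrite when it actually saves bytes.
  return relative.length < value.length ? relative : value;
}

console.log(toRootRelative("https://example.com/a/b?x=1#y", "https://example.com/page"));
// /a/b?x=1#y
console.log(toRootRelative("https://other.example/a", "https://example.com/page"));
// https://other.example/a
```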

alexnewmannn · 2020-08-04T10:11:09Z

@tigt Hey, I've just been looking through this issue and a few MRs around the unquoted attributes. We upgraded to the latest version and tracked our issues to the unquoted attributes. I don't feel like this requires its own issue, but let me know if you'd prefer it in that format. Is there a way to remove this behaviour or toggle it on/off? We use an HTML-to-PDF converter on HTML generated by Marko, and the converter doesn't parse anything with unquoted attributes because of the spec it adheres to, so right now we are stuck at 4.20.2 for the foreseeable future.

tigt · 2020-08-04T16:31:10Z

@alexnewmannn I don’t believe as it stands this feature has a config toggle. (I didn’t implement it, just yammered about it in this issue.)

For a fix that would solve your problem right now, you could round-trip the HTML Marko produces into and then out of an HTML parsing library, then pass the result to your PDF converter. Doing that with the parse5 package would look something like:

import { parse, serialize } from 'parse5'

const minifiedHtml = markoTemplate.render(…) // or however you do it

const normalizedHtml = serialize(parse(minifiedHtml))

tigt · 2020-11-01T20:45:00Z

Removed the stuff about smaller internal ID encodings since any element index between 30 and ~50 would encode as forbidden Latin-1 Supplement control characters. (Aw beans.)

tigt · 2021-03-07T01:59:09Z

Ran some of the remaining suggestions against the eBay home page:

| Minification | Bytes | Δ (bytes) | Δ (%) |
|---|---|---|---|
| (none) | 388,024 | n/a | n/a |
| Omit default attributes (also shrinks JS) | 387,661 | −363 | −0.09% |
| Decode HTML entities | 387,600 | −424 | −0.11% |
| Omit optional tags (also shrinks JS, DOM size via fewer text nodes) | 386,250 | −1,774 | −0.46% |
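
The deltas above are consistent with each row being measured independently against the unminified baseline, not cumulatively. A quick check of that arithmetic (the `savings` helper is just for illustration):

```javascript
// Byte counts taken from the table above.
const baseline = 388024;
const measured = {
  "Omit default attributes": 387661,
  "Decode HTML entities": 387600,
  "Omit optional tags": 386250,
};

// Difference from the baseline, in bytes and as a percentage.
function savings(bytes) {
  const delta = bytes - baseline;
  return { delta, percent: ((delta / baseline) * 100).toFixed(2) };
}

for (const [name, bytes] of Object.entries(measured)) {
  const { delta, percent } = savings(bytes);
  console.log(`${name}: ${delta} bytes (${percent}%)`);
}
// Omit default attributes: -363 bytes (-0.09%)
// Decode HTML entities: -424 bytes (-0.11%)
// Omit optional tags: -1774 bytes (-0.46%)
```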

DylanPiercey mentioned this issue on Apr 21, 2020: perf: minify runtime comments, remove unnecessary attr quotes #1557 (merged)

tigt mentioned this issue on Sep 8, 2020: Attributes are missing the quotes from version 4.21.0 #1602 (closed)
