Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4930

Closed
wants to merge 5 commits into from

Conversation

sjrd
Copy link
Member

@sjrd sjrd commented Jan 24, 2024

When emitting ES modules, we cannot use Closure because it does not support ES modules the way we need it to. This results in files that are much larger than with other module kinds.

Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.

We therefore add our own property name compressor. When enabled, the emitter computes the frequency of every field and method name in the entire program. It then uses those frequencies to allocate short names to them, with the shortest ones allocated to the most used properties.

Obviously, this breaks any sort of incremental behavior, so we also invalidate all the emitter caches on every run when the new minifier is enabled. This should not be a problem as it is only intended to be used for fullLink.

Since we have to walk the entire codebase to compute property frequencies, we take the opportunity to also compute the set of dangerous global refs. This way, when we minify, we can avoid the optimistic first attempt, and always guarantee that a single attempt avoids all the referenced global refs.

We automatically enable the new minifier under fullLink when GCC is disabled. This can be overridden with a scalaJSLinkerConfig setting.


Overall, when used in the context of a Scala.js + Vite application, these change bring a 50% size to the npm run build output of Vite. So the size of the output is half that of before.

To be clear, this is comparing

  • fullLink without GCC without Minify but with Vite's minification (through rollup), versus
  • fullLink without GCC with Minify and also Vite's minification.

When comparing on our test suite without Vite's minification, the reduction is "only" of 20%. But this is not really the important measure; we definitely expect this feature to be used in conjunction with a JS-only minifier.

@ekrich
Copy link
Contributor

ekrich commented Jan 25, 2024

I was just wondering if you expect the size without GCC (SJS minimization) to be similar to GCC's?

@sjrd
Copy link
Member Author

sjrd commented Jan 25, 2024

Not if you only apply the changes here. But if you combine that with any off-the-shelf JS minifier, such as rollup which is automatically part of vite, then I'm hoping we can get close to what GCC offers. If yes, over time we could phase out GCC entirely in favor of this new solution.

@sjrd sjrd changed the title WiP Minify property names ourselves in fullLink when we don't use GCC. Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. Jan 25, 2024
@sjrd sjrd linked an issue Jan 25, 2024 that may be closed by this pull request
@sjrd sjrd marked this pull request as ready for review January 25, 2024 16:40
@sjrd sjrd requested a review from gzm0 January 25, 2024 16:40
@sjrd
Copy link
Member Author

sjrd commented Jan 25, 2024

This is now ready for review. There are further things we could do (for example, "ambiguating" different property names that are used in a disjoint set of classes to have the same name) but this covers the essentials. Indeed, we do not emit any property name that is not compressed anymore, except:

  • $classData, because we use it to identify Scala.js objects (maybe not the best idea, but I'm not touching that now),
  • public properties of $TypeData that are read by the code of java.lang.Class, and
  • the fields of $linkingInfo, which are publicly specified as well.

@sjrd sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from 447dbd2 to 0892dcf Compare January 26, 2024 11:05
Copy link
Contributor

@gzm0 gzm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.

Just to check my understanding: We do not need to be concerned that aggressive property renaming leads to less opportunity in renaming local identifiers since their scopes are completely separate, right?

Higher level comments:

  • The first commit might be split off into a separate PR (IMHO valuable on it's own)
  • Note the comment regarding using a full pass to collect names, rather than re-building name collection.

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

private val nameCompressor: Option[NameCompressor] =
if (minify) Some(new NameCompressor(config))
else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be part of the state as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, it probably should. But to create the state we need the lastMentionedDangerousGlobalRefs, and to compute those, we need the nameCompressor. So that would be tricky to manipulate. It seems easier to manage if it is outside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So let's table that discussion until we're settled on how to collect the names we want to minify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the compressor isn't incremental, could we model it's result as a return value (whether or not it's the same object is irrelevant for the interface).

Then the beginning of emit could look like this:

if (minify) {
  val (compressedNames, dangerousGlobalRefs) = NameCompressor.compress(moduleSet, logger)
  state = new State(dangerousGlobalRefs, Some(compressedNames))
} else if (state == null) {
  state = new State(dangerousGlobalRefs = Set.empty, compressedNames = None)
}

This would also ensure that the state = null after emission releases a maximum amount of memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going along with this: The duplication of the logic in the NameCompressor and the rest of the Emitter still makes me quite uneasy (even with the usage validation, IMHO it is not very easy to verify that the counting logic is correct).

If we use a return type, we could return different implementations that track or do not track the exact count.

I realize you were concern about not being able to approximate, but IIUC the only place where we do this right now is for Apply to hijacked classes. But this seems to be easy to do correctly:

  • isMaybeHijackedClass is already factored in a helper.
  • We can use the uncachedGlobalKnowledge to determine this in the NameCompressor.
  • Updating the global knowledge needs to be moved to emit from emitInternal but that does not seem to be an issue.

@sjrd
Copy link
Member Author

sjrd commented Jan 28, 2024

Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.

Just to check my understanding: We do not need to be concerned that aggressive property renaming leads to less opportunity in renaming local identifiers since their scopes are completely separate, right?

Indeed. These are completely separate namespaces. And indeed I see Vite/rollup generate single-letter local variable names even after this transformation, which shows that it knows. ;)

@sjrd
Copy link
Member Author

sjrd commented Jan 28, 2024

Addressed the easy comments so far. I need to think more about the one-pass/two-pass thing.

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

private val nameCompressor: Option[NameCompressor] =
if (minify) Some(new NameCompressor(config))
else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So let's table that discussion until we're settled on how to collect the names we want to minify.

@sjrd sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from 7b0362c to ca33790 Compare January 29, 2024 17:17
@sjrd
Copy link
Member Author

sjrd commented Jan 29, 2024

I rebased on top of everything, and added a commit that checks that every allocated name is used at least one.

@sjrd sjrd requested a review from gzm0 January 29, 2024 17:21
if (semantics.strictFloats) genCallPolyfillableBuiltin(FroundBuiltin, expr)
else wg(UnaryOp(irt.JSUnaryOp.+, expr))
if (semantics.strictFloats)
genCallPolyfillableBuiltin(FroundBuiltin, List(expr), keepOnlyTrackedGlobalRefs = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this: I'm wondering if we should make this an implicit parameter.

Basically the rule we want is:

  • When called from ClassEmitter, always set to true.
  • When called from FunctionEmitter, always set to false.
  • Otherwise, forward.

This call site violates the forward rule, I assume simply to avoid syntactic bloat. An implicit could solve this and would also make the call sites in ClassEmitter and FunctionEmitter cleaner.

@@ -71,7 +71,7 @@ class LibrarySizeTest {

testLinkedSizes(
expectedFastLinkSize = 150063,
expectedFullLinkSizeWithoutClosure = 130664,
expectedFullLinkSizeWithoutClosure = 95734,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also have checksizes for the minifier. Otherwise I think just with this, our test coverage is a bit low.

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

private val nameCompressor: Option[NameCompressor] =
if (minify) Some(new NameCompressor(config))
else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the compressor isn't incremental, could we model it's result as a return value (whether or not it's the same object is irrelevant for the interface).

Then the beginning of emit could look like this:

if (minify) {
  val (compressedNames, dangerousGlobalRefs) = NameCompressor.compress(moduleSet, logger)
  state = new State(dangerousGlobalRefs, Some(compressedNames))
} else if (state == null) {
  state = new State(dangerousGlobalRefs = Set.empty, compressedNames = None)
}

This would also ensure that the state = null after emission releases a maximum amount of memory.

val generator = new NameGenerator(namesToAvoid)

for (entry <- orderedEntries)
entry.allocatedName = generator.nextString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you compared this (performance-wise) with an approach where we create one or two new maps just mapping to the generated name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't compared performance. Now that we track usages, though, we need more mutability after NameCompressor.compress is done anyway, so I don't think that would make anything simpler at this point.

}

/** Tie break for stability. */
private def tieBreak(x: PropertyNameEntry, y: PropertyNameEntry): Int = (x, y) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered making this the natural order of PropertyNameEntry? Then, allocatePropertyNames could look like this:

val comparator: Comparator[PropertyNameEntry] = {
  Comparator.comparingInt(_.occurrences).reversed()
    .andThen(Comparator.naturalOrder()))
}

java.util.Arrays.sort(orderedEntries, comparator)

IMO this improves readability, since it keeps the (relevant) ordering condition more local. (FWIW, it might have a performance impact, IDK).

}

def traverseModuleSet(moduleSet: ModuleSet): Unit = {
interfaceClassNames = (for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having a var for this, make a factory/helper method? This will make it more explicit, that it is only created once and never modified again.

case JSMethodDef(_, StringLiteral(name), _, _, _) =>
propertyNamesToAvoid += name
case JSPropertyDef(_, StringLiteral(name), _, _) =>
propertyNamesToAvoid += name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop a comment about how collisions on computed names are accepted (in JS minifiers in general)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded the comment on the declaration of propertyNamesToAvoid.

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

private val nameCompressor: Option[NameCompressor] =
if (minify) Some(new NameCompressor(config))
else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going along with this: The duplication of the logic in the NameCompressor and the rest of the Emitter still makes me quite uneasy (even with the usage validation, IMHO it is not very easy to verify that the counting logic is correct).

If we use a return type, we could return different implementations that track or do not track the exact count.

I realize you were concern about not being able to approximate, but IIUC the only place where we do this right now is for Apply to hijacked classes. But this seems to be easy to do correctly:

  • isMaybeHijackedClass is already factored in a helper.
  • We can use the uncachedGlobalKnowledge to determine this in the NameCompressor.
  • Updating the global knowledge needs to be moved to emit from emitInternal but that does not seem to be an issue.


for (expected <- expectedSequenceStart)
assertEquals(expected, generator.nextString())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably OK (given the amount of boilerplate otherwise), but I recommend this read:
https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree with not to put logic in tests, but here it was not reasonably possible.

This is compensated by the fact that the logic in the test should be as declarative as it gets, whereas the logic in the implementation is as imperative as it gets. The algorithms used are very different, so if they agree on the results, it's a good indicator that they're correct.

private sealed abstract class BaseEntry {
private var _occurrences: Int = 0
private var allocatedName: String = null
private var actuallyUsed: Boolean = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to keep the "usage only" check with this high amount of mutability, we might want to re-use _occurrences to store usage (set it to 0 when used).

@sjrd
Copy link
Member Author

sjrd commented Feb 14, 2024

Still to do:

  • add checksizes with the minifier
  • verify exact number of occurrences

@sjrd sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from c027782 to 6a00a87 Compare February 14, 2024 16:06
@sjrd sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from 2dc3af7 to ef4ce58 Compare February 15, 2024 12:57
@sjrd sjrd force-pushed the own-short-names-emitter branch 3 times, most recently from 9909a78 to b7341d8 Compare February 17, 2024 16:48
It had been factored out in `SJSGen` because of one condition that
happens to be repeated. However, it is clearer that it does the
right thing at each of its call sites. Given the amount of
information that needed to be passed to this helper, inlining it
twice actually removes additional checks. Ultimately, it seems
simpler this way.
When emitting ES modules, we cannot use Closure because it does not
support ES modules the way we need it to. This results in files
that are much larger than with other module kinds.

Off-the-shelf JavaScript bundlers/minifier can compensate for that
to a large extent for local and file-local variables, but they do
not have enough semantic information to do it on property names.

We therefore add our own property name compressor. When enabled,
the emitter computes the frequency of every field and method name
in the entire program. It then uses those frequencies to allocate
short names to them, with the shortest ones allocated to the most
used properties.

Obviously, this breaks any sort of incremental behavior, so we
also invalidate all the emitter caches on every run when the new
minifier is enabled. This should not be a problem as it is only
intended to be used for fullLink.

Since we have to walk the entire codebase to compute property
frequencies, we take the opportunity to also compute the set of
dangerous global refs. This way, when we minify, we can avoid
the optimistic first attempt, and always guarantee that a single
attempt avoids all the referenced global refs.

We automatically enable the new minifier under fullLink when
GCC is disabled. This can be overridden with a `scalaJSLinkerConfig`
setting.
@sjrd
Copy link
Member Author

sjrd commented Feb 26, 2024

Superseded by #4945.

@sjrd sjrd closed this Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FullLink Scala.js-specific minifier
3 participants