
Compress SourceMapWriter.Fragment data #4965

Merged: 1 commit into scala-js:main on Mar 31, 2024

Conversation

@gzm0 (Contributor) commented Mar 17, 2024

We use a compression strategy similar to the one source maps themselves use, to reduce the in-memory footprint.

After linking the test suite, residual memory usage is as follows:

| what              | main [MB] | PR [MB] |
|-------------------|----------:|--------:|
| sbt overall heap  |       842 |     772 |
| backend retained  |       147 |      77 |
| frontend retained |       215 |     215 |
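
For context, source maps themselves delta-encode their mapping fields and write each delta as Base64 VLQ. A minimal, self-contained sketch of that encoding (illustrative only, not this PR's actual Fragment format):

```scala
// Base64 VLQ as used by the source map format: the sign goes into the lowest
// bit, then the value is emitted in 5-bit groups, each carrying a
// continuation bit (0x20). Small deltas (the common case) are one character.
object Base64VlqSketch {
  private val Base64Chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

  def encode(value: Int): String = {
    var v = if (value < 0) ((-value) << 1) | 1 else value << 1
    val sb = new StringBuilder
    var more = true
    while (more) {
      var digit = v & 0x1f
      v >>>= 5
      more = v != 0
      if (more) digit |= 0x20 // continuation bit: another digit follows
      sb += Base64Chars(digit)
    }
    sb.toString
  }
}
```

Note that `encode(0)` yields `"A"`, which is why the fast path in one of the diff hunks below writes a literal `'A'` for a zero source-index delta.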

@gzm0 (Contributor, Author) commented Mar 17, 2024

Draft because:

  • Needs to wait for the 1.16.0 release.
  • Needs performance benchmarking (touches performance sensitive code).
  • Needs to produce correct source maps :P

@sjrd (Member) commented Mar 17, 2024

Those numbers look awesome!

> Needs to produce correct source maps :P

🤣

@gzm0 (Contributor, Author) commented Mar 18, 2024

> Needs to produce correct source maps :P

It was just an inverted condition. In fact, it produced no source maps :P

@gzm0 (Contributor, Author) commented Mar 23, 2024

Seems like this is also faster :) (attn: cropped plot, 4 outliers dropped).

EDIT: These are incremental timings (I have not measured batch):

[attached: plot, logger-timings.csv]

```diff
@@ -41,9 +41,11 @@ final class BasicLinkerBackend(config: LinkerBackendImpl.Config)
   private[this] var totalModules = 0
   private[this] val rewrittenModules = new AtomicInteger(0)
 
+  private[this] val fragmentIndex = new SourceMapWriter.Index
```
@gzm0 (Contributor, Author):

Note: We accept leaking old sources / names in incremental runs here (similar to how NameGen leaks as well).

@sjrd (Member):

Are we sure that this is only used (for writing) from a single thread at a time? Could you add a comment about that?

@gzm0 (Contributor, Author):

Yes, because the Emitter is in fact single-threaded. This is probably also something we should fix. I'll make Index thread-safe and see if it affects the benchmarks. If it doesn't, we should probably keep the thread-safe version :).
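
A hypothetical sketch of what a thread-safe Index could look like (names and structure are illustrative, not necessarily what this PR ends up doing):

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Illustrative only: assigns a stable index per source on first use;
// safe to call from multiple writer threads concurrently.
final class ThreadSafeIndexSketch {
  private val sourceIndices = new ConcurrentHashMap[String, Integer]
  private val nextIndex = new AtomicInteger(0)

  def sourceToIndex(source: String): Int =
    sourceIndices
      .computeIfAbsent(source, _ => Integer.valueOf(nextIndex.getAndIncrement()))
      .intValue
}
```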

@gzm0 gzm0 marked this pull request as ready for review March 23, 2024 15:52
@gzm0 gzm0 requested a review from sjrd March 23, 2024 15:52
@sjrd (Member) left a comment:

The approach definitely looks good overall. I have a few localized comments.

```diff
-    if (pendingColumnInGenerated >= 0)
-      doWriteSegment(pendingColumnInGenerated, pendingPos, pendingName)
+    if (pendingColumnInGenerated >= 0) {
+      // Allocate a name string *before* we write a fragment so we can cache it.
```
@sjrd (Member):

I don't understand what this comment is trying to say.

@gzm0 (Contributor, Author):

Obsolete with the revert. The idea was that we convert the OriginalName to a String before we create a fragment, so that when we write a fragment we already have a String and are therefore faster.

```diff
@@ -39,18 +40,18 @@ object SourceMapWriter {
   private final class NodePosStack {
     private var topIndex: Int = -1
     private var posStack: Array[Position] = new Array(128)
-    private var nameStack: Array[String] = new Array(128)
+    private var nameStack: Array[OriginalName] = new Array(128)
```
@sjrd (Member):

This is potentially problematic from a performance point of view. An array of some AnyVal class is an array of its boxed instances. So at the bytecode level we need to allocate instances of OriginalName to put them in this array.

The benchmarks suggest that this is not such a big deal in this case, or that it is well compensated by the other improvements. If it is the latter, there is maybe more to gain by avoiding this.
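
To illustrate the point, a minimal sketch (Wrapper is a stand-in; this assumes OriginalName is likewise an AnyVal wrapper):

```scala
// A value class stored in an Array is boxed, so each store can allocate.
object BoxingDemo {
  final class Wrapper(val underlying: String) extends AnyVal

  val boxed: Array[Wrapper] = new Array(128) // element type erases to the boxed Wrapper class
  def storeBoxed(): Unit =
    boxed(0) = new Wrapper("foo") // allocates a Wrapper box at the bytecode level

  val unboxed: Array[String] = new Array(128) // storing the underlying type avoids the box
  def storeUnboxed(): Unit =
    unboxed(0) = "foo" // no allocation beyond the string itself
}
```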

@gzm0 (Contributor, Author):

Ah, ah. I did not realize this. I've reverted the first commit that turned this into an original name.

It probably does not show up in the performance profile because this happens pre-cache (almost exclusively). So in an incremental run, we only hit this code path for very little code.

```diff
   // Source index field
   if (source eq lastSource) { // highly likely
     buffer(offset) = 'A' // 0 in Base64VLQ
     offset += 1
   } else {
-    val sourceIndex = sourceToIndex(source)
+    val sourceIndex = outIndex.sourceToIndex(source)
```
@sjrd (Member):

There is probably a huge performance win to get here, if we reuse the integer index from the global Index as the key for outIndex. Basically making outIndex a pair of Int -> Int maps rather than of String -> Int maps. And because the keys of such maps would be indices of a reified data structure, we can probably make those Int -> Int maps raw Array[Int]s. IIRC, these map lookups were still significant in the profiles after all the optimizations, so getting rid of them could have a large impact.

This does not have to be in this PR, though, of course.
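
A hypothetical sketch of that idea, with made-up names: the global Index hands out dense Int ids, and the per-output index becomes a growable Array[Int], so each lookup is an array load:

```scala
// Maps a global id to a local (per-output) index via a plain array lookup
// instead of a String-keyed hash map. -1 marks "not yet seen in this output".
final class LocalIndexSketch {
  private var globalToLocal: Array[Int] = Array.fill(128)(-1)
  private var size = 0

  def localIndexOf(globalId: Int): Int = {
    if (globalId >= globalToLocal.length) {
      // Grow, keeping the -1 sentinel for unseen slots.
      val grown = Array.fill(math.max(globalToLocal.length * 2, globalId + 1))(-1)
      System.arraycopy(globalToLocal, 0, grown, 0, globalToLocal.length)
      globalToLocal = grown
    }
    val existing = globalToLocal(globalId)
    if (existing >= 0) {
      existing
    } else {
      val local = size
      size += 1
      globalToLocal(globalId) = local
      local
    }
  }
}
```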


```scala
var nameIndex: Int = 0

while (buf.hasRemaining()) {
  (buf.get(): @unchecked) match {
```
@sjrd (Member):

@unchecked is not necessary for Int matches, is it?

@switch would be relevant, though.
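
For illustration, the difference between the two annotations on an Int match (the case values here are made up):

```scala
import scala.annotation.switch

// @unchecked would only silence exhaustivity warnings, which Int matches
// never produce; @switch instead asks scalac to compile the match to a JVM
// tableswitch/lookupswitch and to warn if it cannot.
def fieldKind(tag: Int): String = (tag: @switch) match {
  case 0 => "column"
  case 1 => "source"
  case 2 => "line"
  case 3 => "name"
  case _ => "unknown"
}
```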

@gzm0 (Contributor, Author):

Indeed. Fixed.

Comment on lines 306 to 309
```scala
if ((value & ~127) != 0)
  buffer(offset) = ((value & 127) | 128).toByte
else
  buffer(offset) = (value & 127).toByte
```
@sjrd (Member):

Consider using hexadecimal notation when we use numbers for their bit patterns:

Suggested change:

```diff
-if ((value & ~127) != 0)
-  buffer(offset) = ((value & 127) | 128).toByte
-else
-  buffer(offset) = (value & 127).toByte
+if ((value & ~0x7f) != 0)
+  buffer(offset) = ((value & 0x7f) | 0x80).toByte
+else
+  buffer(offset) = (value & 0x7f).toByte
```
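
The snippet above looks like one step of a LEB128-style varint. For reference, a self-contained sketch of the full unsigned write loop under that reading (buffer and offset handling are illustrative, not the PR's actual code):

```scala
// Writes `value` in 7-bit groups, least significant first; 0x80 is the
// continuation bit, so the final byte has its high bit clear.
// Returns the new offset.
def writeUnsignedVarInt(buffer: Array[Byte], start: Int, value: Int): Int = {
  var v = value
  var offset = start
  while ((v & ~0x7f) != 0) {
    buffer(offset) = ((v & 0x7f) | 0x80).toByte // more groups follow
    offset += 1
    v >>>= 7
  }
  buffer(offset) = (v & 0x7f).toByte // last group, continuation bit clear
  offset + 1
}
```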

```scala
value >>>= 1

if (!neg) value
else if (value == 0) Int.MinValue
```
@sjrd (Member):

Given what the numbers we read represent, Int.MinValue is not a realistic input for writeRawVLQ. Consider getting rid of this case to avoid one branch. (It might even remove two branches at the assembly level since it will basically become a SELECT.)
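
A minimal sketch of that simplification, assuming the sign travels in the lowest bit (as the surrounding code implies) and that Int.MinValue never occurs as an input (names illustrative):

```scala
// Decode a sign-in-lowest-bit value. With the Int.MinValue special case
// dropped, the conditional is a plain negate that the JIT can turn into a
// branchless select.
def decodeSigned(raw: Int): Int = {
  val neg = (raw & 1) != 0
  val value = raw >>> 1
  if (neg) -value else value
}
```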

@gzm0 (Contributor, Author) commented Mar 31, 2024

Huh, the updated measurements look suspiciously like the baseline. (I'm willing to attribute the spread to the locking introduced.) I'll benchmark just the original first commit (changing String to OriginalName); maybe it makes things faster?

[attached: plot, logger-timings.csv]

@gzm0 (Contributor, Author) commented Mar 31, 2024

Ok, this is a bit annoying: it seems my benchmarking setup is insufficiently consistent; the spread I observe appears to be caused by the environment.

However, I think it does not matter for this PR: we have sufficient evidence that it does not make matters worse, and the main goal is to reduce memory consumption, which it does.

[attached: plot]

legend:

  • pr: original PR (old)
  • main: original baseline (old)
  • pr-v2: this PR
  • orig-name: just the first commit of the original PR
  • pr-v2-not-thread-safe: this PR with the thread safety of Index reverted
  • main-v2: baseline (again)

[attached: logger-timings.csv]

@gzm0 gzm0 requested a review from sjrd March 31, 2024 12:23
@sjrd sjrd merged commit 3dfcb9a into scala-js:main Mar 31, 2024
3 checks passed
@gzm0 gzm0 deleted the compress-sm-frags branch March 31, 2024 13:44