
Tweak `Span` encoding. #58458

Merged
merged 1 commit into from Apr 3, 2019

@nnethercote
Contributor

commented Feb 14, 2019

Failing to fit `base` is more common than failing to fit `len`.
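To make the trade-off concrete, here is a minimal sketch of a tagged 32-bit inline span, assuming the 1 tag bit + 25-bit `base` + 6-bit `len` split this PR moves to (previously 24 + 7), and assuming the inline form is only used when the syntax context is zero. This is not the rustc `span_encoding` code: the function names, constants, and bit positions are hypothetical.

```rust
// Hypothetical layout (bit 0 is the tag):
//   inline:   [ base: 25 bits | len: 6 bits | tag = 0 ]
//   interned: [ index into a side table: 31 bits | tag = 1 ]
const TAG_INLINE: u32 = 0;
const TAG_INTERNED: u32 = 1;
const LEN_BITS: u32 = 6; // was 7 before this PR
const BASE_BITS: u32 = 25; // was 24 before this PR
const LEN_SHIFT: u32 = 1;
const BASE_SHIFT: u32 = LEN_SHIFT + LEN_BITS;

/// Fast path: pack the span inline if `base` and `len` fit and `ctxt` is zero.
/// `None` means the caller must take the slow path and intern the span.
fn encode_inline(base: u32, len: u32, ctxt: u32) -> Option<u32> {
    if ctxt == 0 && base >> BASE_BITS == 0 && len >> LEN_BITS == 0 {
        Some((base << BASE_SHIFT) | (len << LEN_SHIFT) | TAG_INLINE)
    } else {
        None
    }
}

/// Slow-path representation: an index into some interner (not modelled here).
fn encode_interned(index: u32) -> u32 {
    (index << 1) | TAG_INTERNED
}

/// Fast decode for inline spans; `None` means an interner lookup is needed.
fn decode_inline(raw: u32) -> Option<(u32, u32)> {
    if raw & 1 != TAG_INLINE {
        return None;
    }
    let len = (raw >> LEN_SHIFT) & ((1 << LEN_BITS) - 1);
    let base = raw >> BASE_SHIFT;
    Some((base, len))
}

fn main() {
    let raw = encode_inline(1_000_000, 42, 0).unwrap();
    assert_eq!(decode_inline(raw), Some((1_000_000, 42)));
    // A base of 2^24 needs 25 bits: it fits now but would not have fit in 24.
    assert!(encode_inline(1 << 24, 5, 0).is_some());
    // A length of 64 needs 7 bits, so it no longer fits inline.
    assert!(encode_inline(10, 64, 0).is_none());
    assert_eq!(decode_inline(encode_interned(7)), None);
}
```

Moving one bit between the two fields only changes the shift and mask constants; the cost of a miss is the interning slow path.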

@rust-highfive

Collaborator

commented Feb 14, 2019

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@nnethercote

Contributor Author

commented Feb 14, 2019

Local measurements indicate that this is a 3-5% instruction-count win for style-servo and makes little difference for anything else. I'll double-check whether CI agrees with those results.

@bors try

@bors

Contributor

commented Feb 14, 2019

⌛️ Trying commit b67352a with merge 46701e6...

bors added a commit that referenced this pull request Feb 14, 2019

Auto merge of #58458 - nnethercote:tweak-Span-encoding, r=<try>
Tweak `Span` encoding.

Failing to fit `base` is more common than failing to fit `len`.
@petrochenkov

Contributor

commented Feb 14, 2019

cc #44646
(The PR that introduced the current encoding; it contains some further references and benchmarks.)

@bors

Contributor

commented Feb 14, 2019

☀️ Test successful - checks-travis
State: approved= try=True

@nnethercote

Contributor Author

commented Feb 14, 2019

@rust-timer


commented Feb 14, 2019

Success: Queued 46701e6 with parent c67d474, comparison URL.

@rust-timer


commented Feb 14, 2019

Finished benchmarking try commit 46701e6

@nnethercote

Contributor Author

commented Feb 15, 2019

The good results for style-servo were replicated:

style-servo-check
        avg: -3.9%      min: -5.7%      max: -2.8%
style-servo-opt
        avg: -1.3%      min: -5.4%      max: -0.2%
style-servo-debug
        avg: -2.0%      min: -4.7%      max: -0.9%

Other changes were minor and hard to distinguish from noise.

@nnethercote

Contributor Author

commented Feb 15, 2019

@petrochenkov

Contributor

commented Feb 15, 2019

This is certainly a tradeoff.
Giving one bit from `len` to `base` benefits large crates at the expense of small crates.
Apparently style-servo is huge, so it gets the benefits.

@petrochenkov

Contributor

commented Feb 15, 2019

To put things into perspective.

A 24-bit base is enough to cover 2^24 bytes, about 16MB of code.
I suspect that this includes all the dependent crates used in one compilation session with the primary crate, since spans from other crates can also be reported.
So, a 25-bit base now covers about 32MB of code.
(libstd + libcore is ~5.8MB, for example.)

A 7-bit len is 127 bytes of code:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

That is a small function, or a modest `if` condition in a larger function.
Most items probably don't fit, but most expressions and patterns do.

A 6-bit len is 63 bytes of code:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

That is a non-block expression or a pattern; items and larger expressions probably don't fit.
Most notably, identifier spans still almost certainly fit.
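The capacities above follow from the field widths: a field of n bits holds values up to 2^n - 1. A quick check, where `max_value` is a hypothetical helper and "MiB" means 2^20 bytes:

```rust
/// Largest value that fits in `bits` bits.
fn max_value(bits: u32) -> u64 {
    (1u64 << bits) - 1
}

fn main() {
    // (base_bits, len_bits) before and after this PR.
    for &(base_bits, len_bits) in &[(24u32, 7u32), (25, 6)] {
        let base_mib = (max_value(base_bits) + 1) as f64 / (1024.0 * 1024.0);
        println!(
            "{}-bit base: {} MiB of source; {}-bit len: up to {} bytes",
            base_bits,
            base_mib,
            len_bits,
            max_value(len_bits)
        );
    }
}
```

This reports 16 MiB / 127 bytes for the old split and 32 MiB / 63 bytes for the new one.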

@petrochenkov

Contributor

commented Feb 15, 2019

I'd be really interested to look at the distribution of bases/lengths/ctxts during both span decoding and encoding, since they should be quite different.
We looked only at encoding when introducing the span compression, since that is what determines memory consumption, but not at decoding, which is probably more common and determines instruction counts.

@petrochenkov

Contributor

commented Feb 15, 2019

@nnethercote
Could you collect the base/len/ctxt distribution on some code you think is interesting, e.g. style-servo?
This should be as simple as dumping the input numbers into a file in `span_encoding::encode` and `span_encoding::decode`.

(The most convenient way to do it is `static mut *mut libc::FILE` and `libc::printf`, sigh.)

@nnethercote

Contributor Author

commented Feb 15, 2019

> Could you collect the base/len/ctxt distribution on some code you think is interesting, e.g. style-servo?
> This should be as simple as dumping the input numbers into a file in `span_encoding::encode` and `span_encoding::decode`.

I did that (with eprintln! and counts) before making this change. The machine holding the numbers is turned off for the weekend, but I found that too-long bases were more common than too-long lengths, hence the change. Lengths up to 63 are quite common and tend to drop off a lot after that; reducing the length to 5 bits would make things quite a bit worse. I can paste full numbers on Monday if you like.
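The "L<n>" and "B<n>" buckets in the tables in this thread are bit widths: the number of bits needed to store each value. A minimal sketch of that bucketing, using hypothetical helpers `bits_needed` and `histogram` (the actual measurements used ad hoc `eprintln!` output aggregated with `counts`, not this code):

```rust
use std::collections::BTreeMap;

/// Number of bits needed to store `x` (0 needs 0 bits, 63 needs 6, 64 needs 7).
fn bits_needed(x: u32) -> u32 {
    32 - x.leading_zeros()
}

/// Bucket a stream of values by bit width.
fn histogram(values: &[u32]) -> BTreeMap<u32, usize> {
    let mut counts = BTreeMap::new();
    for &v in values {
        *counts.entry(bits_needed(v)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // A hypothetical instrumentation point inside encode/decode might print
    // `eprintln!("L{}", bits_needed(len))` and feed the output to a line-counting tool.
    let lens = [0, 1, 3, 7, 42, 63, 64, 200];
    for (bits, n) in histogram(&lens) {
        println!("L{}: {}", bits, n);
    }
}
```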

And you are right that this trade-off favours large crates. I view this as a good thing... compile times of small crates are already low :) Meanwhile large crates, which are slow, pay an extra price for their size; this change mitigates that a little.

It's a shame that script-servo is broken at the moment; I suspect it would benefit too, because it's even larger than style-servo.

@petrochenkov

Contributor

commented Feb 15, 2019

> I can paste full numbers on Monday if you like.

Yes, please.
Looking at the decode statistics will make me more comfortable about approving this change.

@Centril

Contributor

commented Feb 23, 2019

Ping from triage @nnethercote :)

@Dylan-DPC

Member

commented Mar 11, 2019

Ping from triage @nnethercote. Unfortunately we haven't heard from you on this in a while, so I'm closing the PR to keep things tidy. Don't worry though: if you have time again in the future, please reopen this PR; we'll be happy to review it again!

@nnethercote

Contributor Author

commented Apr 2, 2019

helloworld results are less good:

old
20600 counts:
(  1)    18555 (90.1%, 90.1%): fast encode
(  2)     1416 ( 6.9%, 96.9%): fast decode
(  3)      324 ( 1.6%, 98.5%): slow decode
(  4)      303 ( 1.5%,100.0%): slow encode

new
20600 counts:
(  1)    14695 (71.3%, 71.3%): fast encode
(  2)     4163 (20.2%, 91.5%): slow encode
(  3)     1414 ( 6.9%, 98.4%): fast decode
(  4)      326 ( 1.6%,100.0%): slow decode

Fast paths drop from 96.9% to 78.2%.
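The fast-path percentages are the fast-encode and fast-decode rows added together and divided by the total count. A quick check, where `fast_fraction` is a hypothetical helper:

```rust
/// Fraction of span operations that took a fast path.
fn fast_fraction(fast_encode: u64, fast_decode: u64, total: u64) -> f64 {
    (fast_encode + fast_decode) as f64 / total as f64
}

fn main() {
    // helloworld counts from the tables above (20600 operations each).
    let old = fast_fraction(18555, 1416, 20600);
    let new = fast_fraction(14695, 1414, 20600);
    println!("old: {:.1}%, new: {:.1}%", old * 100.0, new * 100.0);
}
```

Almost all of the drop comes from slow encodes rising from 303 to 4163, i.e. spans whose length no longer fits in 6 bits.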


@nnethercote nnethercote force-pushed the nnethercote:tweak-Span-encoding branch from b67352a to ff94fea Apr 2, 2019

@nnethercote

Contributor Author

commented Apr 2, 2019

@bors try

@bors

Contributor

commented Apr 2, 2019

⌛️ Trying commit ff94fea with merge b56778a...

bors added a commit that referenced this pull request Apr 2, 2019

Auto merge of #58458 - nnethercote:tweak-Span-encoding, r=<try>
@bors

Contributor

commented Apr 3, 2019

☀️ Try build successful - checks-travis
Build commit: b56778a

@nnethercote

Contributor Author

commented Apr 3, 2019

@rust-timer


commented Apr 3, 2019

Success: Queued b56778a with parent 428943c, comparison URL.

@nnethercote

Contributor Author

commented Apr 3, 2019

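Here are some `len` distributions (covering both encoding and decoding) for spans with a zero `SyntaxContext`. The number after each "L" indicates how many bits were required to store each length. Each row gives the rank, the count, and (in parentheses) the percentage and the cumulative percentage.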

helloworld
20214 counts:
(  1)     4205 (20.8%, 20.8%): L6
(  2)     3862 (19.1%, 39.9%): L7
(  3)     3004 (14.9%, 54.8%): L2
(  4)     2385 (11.8%, 66.6%): L3
(  5)     1995 ( 9.9%, 76.4%): L5
(  6)     1817 ( 9.0%, 85.4%): L1
(  7)     1477 ( 7.3%, 92.7%): L0
(  8)     1226 ( 6.1%, 98.8%): L4
(  9)       88 ( 0.4%, 99.2%): L9
( 10)       84 ( 0.4%, 99.6%): L8
( 11)       42 ( 0.2%, 99.9%): L10
( 12)       16 ( 0.1%, 99.9%): L11
( 13)        9 ( 0.0%,100.0%): L12
( 14)        3 ( 0.0%,100.0%): L14
( 15)        1 ( 0.0%,100.0%): L13

ripgrep
4513315 counts:
(  1)  1331930 (29.5%, 29.5%): L3
(  2)   745711 (16.5%, 46.0%): L4
(  3)   669557 (14.8%, 60.9%): L0
(  4)   594187 (13.2%, 74.0%): L1
(  5)   488818 (10.8%, 84.9%): L2
(  6)   315629 ( 7.0%, 91.9%): L5
(  7)   202941 ( 4.5%, 96.4%): L6
(  8)   143324 ( 3.2%, 99.5%): L7
(  9)    10364 ( 0.2%, 99.8%): L8
( 10)     5862 ( 0.1%, 99.9%): L9
( 11)     2840 ( 0.1%,100.0%): L10
( 12)     1289 ( 0.0%,100.0%): L11
( 13)      540 ( 0.0%,100.0%): L12
( 14)      150 ( 0.0%,100.0%): L13
( 15)       96 ( 0.0%,100.0%): L14
( 16)       37 ( 0.0%,100.0%): L15
( 17)       18 ( 0.0%,100.0%): L17
( 18)       14 ( 0.0%,100.0%): L24
( 19)        8 ( 0.0%,100.0%): L16

style-servo
52961408 counts:
(  1) 13960255 (26.4%, 26.4%): L3
(  2)  9857544 (18.6%, 45.0%): L4
(  3)  8476040 (16.0%, 61.0%): L5
(  4)  8100731 (15.3%, 76.3%): L0
(  5)  6131785 (11.6%, 87.8%): L1
(  6)  3616192 ( 6.8%, 94.7%): L2
(  7)  1993894 ( 3.8%, 98.4%): L6
(  8)   676393 ( 1.3%, 99.7%): L7
(  9)    70194 ( 0.1%, 99.9%): L8
( 10)    40349 ( 0.1%, 99.9%): L9
( 11)    18673 ( 0.0%,100.0%): L10
( 12)     9843 ( 0.0%,100.0%): L11
( 13)     3749 ( 0.0%,100.0%): L12
( 14)     2253 ( 0.0%,100.0%): L13
( 15)     1632 ( 0.0%,100.0%): L17
( 16)      778 ( 0.0%,100.0%): L14
( 17)      318 ( 0.0%,100.0%): L15
( 18)      248 ( 0.0%,100.0%): L16
( 19)      209 ( 0.0%,100.0%): L24
( 20)      152 ( 0.0%,100.0%): L22
( 21)       94 ( 0.0%,100.0%): L23
( 22)       25 ( 0.0%,100.0%): L18
( 23)       25 ( 0.0%,100.0%): L21
( 24)       23 ( 0.0%,100.0%): L19
( 25)        7 ( 0.0%,100.0%): L20
( 26)        2 ( 0.0%,100.0%): L25

rustc
370386914 counts:
(  1) 71681717 (19.4%, 19.4%): L0
(  2) 69652930 (18.8%, 38.2%): L4
(  3) 69366057 (18.7%, 56.9%): L3
(  4) 62368764 (16.8%, 73.7%): L1
(  5) 37265248 (10.1%, 83.8%): L5
(  6) 29597104 ( 8.0%, 91.8%): L2
(  7) 19322268 ( 5.2%, 97.0%): L6
(  8) 10155160 ( 2.7%, 99.7%): L7
(  9)   417426 ( 0.1%, 99.8%): L8
( 10)   260701 ( 0.1%, 99.9%): L9
( 11)   152713 ( 0.0%,100.0%): L10
( 12)    72886 ( 0.0%,100.0%): L11
( 13)    37076 ( 0.0%,100.0%): L12
( 14)    21746 ( 0.0%,100.0%): L13
( 15)     8930 ( 0.0%,100.0%): L14
( 16)     2142 ( 0.0%,100.0%): L22
( 17)     1386 ( 0.0%,100.0%): L15
( 18)     1013 ( 0.0%,100.0%): L16
( 19)      560 ( 0.0%,100.0%): L23
( 20)      350 ( 0.0%,100.0%): L17
( 21)      293 ( 0.0%,100.0%): L24
( 22)      150 ( 0.0%,100.0%): L20
( 23)      108 ( 0.0%,100.0%): L25
( 24)       65 ( 0.0%,100.0%): L21
( 25)       64 ( 0.0%,100.0%): L19
( 26)       57 ( 0.0%,100.0%): L18

Here are the same results for `base` (the number after each "B" is the number of bits required):

helloworld
20214 counts:
(  1)     8353 (41.3%, 41.3%): B23
(  2)     4133 (20.4%, 61.8%): B21
(  3)     3455 (17.1%, 78.9%): B22
(  4)     1929 ( 9.5%, 88.4%): B20
(  5)     1495 ( 7.4%, 95.8%): B0
(  6)      193 ( 1.0%, 96.8%): B15
(  7)      182 ( 0.9%, 97.7%): B19
(  8)      107 ( 0.5%, 98.2%): B24
(  9)      105 ( 0.5%, 98.7%): B18
( 10)      103 ( 0.5%, 99.2%): B5
( 11)       45 ( 0.2%, 99.4%): B14
( 12)       42 ( 0.2%, 99.6%): B16
( 13)       32 ( 0.2%, 99.8%): B6
( 14)       25 ( 0.1%, 99.9%): B4
( 15)        8 ( 0.0%,100.0%): B3
( 16)        7 ( 0.0%,100.0%): B2

ripgrep
4513315 counts:
(  1)   748159 (16.6%, 16.6%): B16
(  2)   721945 (16.0%, 32.6%): B18
(  3)   711070 (15.8%, 48.3%): B17
(  4)   644334 (14.3%, 62.6%): B0
(  5)   523121 (11.6%, 74.2%): B15
(  6)   280247 ( 6.2%, 80.4%): B24
(  7)   236609 ( 5.2%, 85.6%): B14
(  8)   219443 ( 4.9%, 90.5%): B23
(  9)   142768 ( 3.2%, 93.7%): B13
( 10)    64378 ( 1.4%, 95.1%): B21
( 11)    61704 ( 1.4%, 96.5%): B12
( 12)    50680 ( 1.1%, 97.6%): B22
( 13)    44914 ( 1.0%, 98.6%): B20
( 14)    30196 ( 0.7%, 99.3%): B11
( 15)    17740 ( 0.4%, 99.6%): B19
( 16)     8809 ( 0.2%, 99.8%): B10
( 17)     3173 ( 0.1%, 99.9%): B8
( 18)     2588 ( 0.1%,100.0%): B9
( 19)     1020 ( 0.0%,100.0%): B7
( 20)      164 ( 0.0%,100.0%): B6
( 21)      109 ( 0.0%,100.0%): B5
( 22)       71 ( 0.0%,100.0%): B4
( 23)       40 ( 0.0%,100.0%): B2
( 24)       23 ( 0.0%,100.0%): B3
( 25)       10 ( 0.0%,100.0%): B1

style-servo
52961408 counts:
(  1) 14743079 (27.8%, 27.8%): B24
(  2) 10758107 (20.3%, 48.2%): B25
(  3)  7817248 (14.8%, 62.9%): B0
(  4)  5507943 (10.4%, 73.3%): B21
(  5)  2837391 ( 5.4%, 78.7%): B20
(  6)  2065167 ( 3.9%, 82.6%): B19
(  7)  1957924 ( 3.7%, 86.3%): B17
(  8)  1917016 ( 3.6%, 89.9%): B18
(  9)  1551566 ( 2.9%, 92.8%): B15
( 10)  1166388 ( 2.2%, 95.0%): B16
( 11)   973654 ( 1.8%, 96.9%): B23
( 12)   613856 ( 1.2%, 98.0%): B14
( 13)   511654 ( 1.0%, 99.0%): B13
( 14)   216395 ( 0.4%, 99.4%): B12
( 15)   198358 ( 0.4%, 99.8%): B22
( 16)    86184 ( 0.2%, 99.9%): B11
( 17)    27818 ( 0.1%,100.0%): B10
( 18)     9600 ( 0.0%,100.0%): B9
( 19)     1434 ( 0.0%,100.0%): B8
( 20)      222 ( 0.0%,100.0%): B4
( 21)      215 ( 0.0%,100.0%): B7
( 22)       87 ( 0.0%,100.0%): B5
( 23)       70 ( 0.0%,100.0%): B3
( 24)       28 ( 0.0%,100.0%): B2
( 25)        3 ( 0.0%,100.0%): B1
( 26)        1 ( 0.0%,100.0%): B6

rustc
370386883 counts:
(  1) 67350517 (18.2%, 18.2%): B0
(  2) 46673993 (12.6%, 30.8%): B24
(  3) 46242997 (12.5%, 43.3%): B22
(  4) 40759067 (11.0%, 54.3%): B23
(  5) 37617157 (10.2%, 64.4%): B21
(  6) 30556919 ( 8.3%, 72.7%): B20
(  7) 30546917 ( 8.2%, 80.9%): B19
(  8) 19659648 ( 5.3%, 86.2%): B18
(  9) 19027989 ( 5.1%, 91.4%): B17
( 10) 12393685 ( 3.3%, 94.7%): B16
( 11)  7367651 ( 2.0%, 96.7%): B15
( 12)  4140107 ( 1.1%, 97.8%): B14
( 13)  3308420 ( 0.9%, 98.7%): B13
( 14)  2539587 ( 0.7%, 99.4%): B25
( 15)  1319804 ( 0.4%, 99.8%): B12
( 16)   515782 ( 0.1%, 99.9%): B11
( 17)   229554 ( 0.1%,100.0%): B10
( 18)    75830 ( 0.0%,100.0%): B9
( 19)    39175 ( 0.0%,100.0%): B8
( 20)    10555 ( 0.0%,100.0%): B7
( 21)     5048 ( 0.0%,100.0%): B6
( 22)     2138 ( 0.0%,100.0%): B4
( 23)     2074 ( 0.0%,100.0%): B5
( 24)     1428 ( 0.0%,100.0%): B3
( 25)      728 ( 0.0%,100.0%): B2
( 26)      113 ( 0.0%,100.0%): B1

Most lengths are short, with a hump at 3 and 4 bits. Lengths drop off quite a bit going from 6 bits to 7, and a lot more going from 7 bits to 8.

Bases are spread more evenly between zero and the maximum base. Because the above tables show the number of bits, their entries are biased towards the larger sizes: each additional bit doubles the range of values covered.
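The bias is easy to see with a toy model: if bases were spread uniformly over 0..2^k, half of them would need all k bits, a quarter k-1 bits, and so on. A sketch, where the uniform distribution is an assumption for illustration rather than a measured one and `bit_width_counts` is a hypothetical helper:

```rust
/// For every value in 0..2^max_bits, count how many need each bit width
/// (index = bits needed, so index 0 counts only the value 0).
fn bit_width_counts(max_bits: u32) -> Vec<u64> {
    let mut counts = vec![0u64; max_bits as usize + 1];
    for value in 0u32..(1 << max_bits) {
        counts[(32 - value.leading_zeros()) as usize] += 1;
    }
    counts
}

fn main() {
    let counts = bit_width_counts(20);
    // Print the share of the five widest buckets, widest first.
    for (bits, n) in counts.iter().enumerate().rev().take(5) {
        println!("B{}: {:.1}%", bits, *n as f64 * 100.0 / (1u64 << 20) as f64);
    }
}
```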

So, the effects of too few bits are quite different for length vs. base. For length, all programs will be affected roughly equally: e.g. dropping from 7 to 6 bits makes things a bit worse, dropping from 6 to 5 bits would be a lot worse, etc. For base, it depends on the crate size; any program big enough to greatly exceed the base maximum will face a performance cliff.

So this PR makes things slightly worse for lengths in all programs, but much better for bases in large crates.

@nnethercote

Contributor Author

commented Apr 3, 2019

I thought about using another tag bit and then having two compression regimes, perhaps 23base/7len and 26base/4len... but it doesn't seem worth it.
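For illustration, one way such a scheme could look: bit 0 distinguishes inline from interned, and a second tag bit picks between a 23-bit base/7-bit len regime and a 26-bit base/4-bit len regime, so each regime still fills exactly 32 bits. This is a hypothetical sketch, not code from this PR or from #44646; the regime check in both `encode` and `decode` is the kind of extra work such a scheme adds.

```rust
/// (base_bits, len_bits) for each inline regime, selected by bit 1.
const REGIMES: [(u32, u32); 2] = [(23, 7), (26, 4)];

fn fits(value: u32, bits: u32) -> bool {
    value >> bits == 0
}

/// Try each regime in turn; `None` means the span would have to be interned.
fn encode(base: u32, len: u32) -> Option<u32> {
    for (regime, &(base_bits, len_bits)) in REGIMES.iter().enumerate() {
        if fits(base, base_bits) && fits(len, len_bits) {
            let regime = regime as u32;
            // Layout: [ base | len | regime bit | inline tag = 0 ]
            return Some((base << (2 + len_bits)) | (len << 2) | (regime << 1));
        }
    }
    None
}

/// `None` for interned spans (bit 0 set), which would need a table lookup.
fn decode(raw: u32) -> Option<(u32, u32)> {
    if raw & 1 != 0 {
        return None;
    }
    let (_, len_bits) = REGIMES[((raw >> 1) & 1) as usize];
    let len = (raw >> 2) & ((1 << len_bits) - 1);
    let base = raw >> (2 + len_bits);
    Some((base, len))
}

fn main() {
    // Small base, medium span: regime 0.
    assert_eq!(decode(encode(100, 100).unwrap()), Some((100, 100)));
    // Large base, short span: regime 1.
    assert_eq!(decode(encode(1 << 25, 10).unwrap()), Some((1 << 25, 10)));
    // Large base and a longer span fit in neither regime.
    assert!(encode(1 << 25, 100).is_none());
}
```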

@nnethercote

Contributor Author

commented Apr 3, 2019

script-servo is even bigger than style-servo. Here is the important part of its base distribution:

130151231 counts:
(  1) 26758797 (20.6%, 20.6%): B26
(  2) 18808123 (14.5%, 35.0%): B0
(  3) 13406089 (10.3%, 45.3%): B21
(  4) 12769437 ( 9.8%, 55.1%): B22
(  5)  8856470 ( 6.8%, 61.9%): B20
(  6)  8474469 ( 6.5%, 68.4%): B25
(  7)  8405568 ( 6.5%, 74.9%): B24
(  8)  5744849 ( 4.4%, 79.3%): B18
(  9)  5051649 ( 3.9%, 83.2%): B16
( 10)  4888619 ( 3.8%, 86.9%): B17
( 11)  4404563 ( 3.4%, 90.3%): B19
( 12)  4011167 ( 3.1%, 93.4%): B15
( 13)  2885705 ( 2.2%, 95.6%): B14
( 14)  2079133 ( 1.6%, 97.2%): B23
( 15)  1572516 ( 1.2%, 98.4%): B13
( 16)   916387 ( 0.7%, 99.1%): B12
( 17)   873029 ( 0.7%, 99.8%): B11
( 18)   189718 ( 0.1%,100.0%): B10

20.6% require 26 bits, more than even the new 25-bit base can hold...

@rust-timer


commented Apr 3, 2019

Finished benchmarking try commit b56778a

@petrochenkov

Contributor

commented Apr 3, 2019

> I thought about using another tag bit and then having two compression regimes

We tried that in the original PR (with a perf run); it was slightly slower, apparently due to the more complex encoding/decoding.

@bors r+

@bors

Contributor

commented Apr 3, 2019

📌 Commit ff94fea has been approved by petrochenkov

@petrochenkov

Contributor

commented Apr 3, 2019

Future directions, mostly to satisfy curiosity I guess, since it's hard to expect significant perf gains:

  • Remove `repr(packed)` from `Span` and see what happens with perf.
  • Increase the `Span` size to 64 bits (no padding) or 40 bits (a lot of padding expected unless packed, and possibly even if packed) and see what happens with perf (bit layouts optimized for span encoding in rustc: #44646 (comment), #44646 (comment)). Increasing the size could become more useful with parallel rustc, where the slow path would mean going through locks or atomics.
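The padding question for these sizes can be checked with `std::mem::size_of` and `std::mem::align_of`. The structs below are hypothetical stand-ins, not rustc's `Span`; the 40-bit one is modelled as a `u32` plus a `u8`.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-ins for spans of different sizes (fields never read).
#[allow(dead_code)]
struct Span32(u32);

#[allow(dead_code)]
struct Span40 {
    lo: u32,
    hi: u8,
}

#[allow(dead_code)]
#[repr(packed)]
struct Span40Packed {
    lo: u32,
    hi: u8,
}

#[allow(dead_code)]
struct Span64 {
    base: u32,
    len: u16,
    ctxt_and_tag: u16,
}

fn main() {
    println!("32-bit:        size {} align {}", size_of::<Span32>(), align_of::<Span32>());
    println!("40-bit:        size {} align {}", size_of::<Span40>(), align_of::<Span40>());
    println!("40-bit packed: size {} align {}", size_of::<Span40Packed>(), align_of::<Span40Packed>());
    println!("64-bit:        size {} align {}", size_of::<Span64>(), align_of::<Span64>());
}
```

On typical targets the unpacked 40-bit struct is padded to 8 bytes, the same as the 64-bit layout, while the packed one is 5 bytes with alignment 1, at the cost of unaligned field access.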
@petrochenkov

Contributor

commented Apr 3, 2019

Actually, a good layout for 64-bit spans (if the span size is bumped) would probably be simply

```rust
struct Span {
    base: u32,
    len: u16,
    ctxt_and_tag: u16,
}
```

which is both machine- and compiler-friendly, and covers the statistics found in this thread and corner cases like #36799 (comment).
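A sketch of how inline encoding might work with that layout. The convention that the top bit of `ctxt_and_tag` is the interned tag (leaving a 15-bit inline context) is an assumption for illustration; the comment above does not say how the 16 bits are split, and the interned case is not modelled.

```rust
/// The 64-bit layout suggested above.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    base: u32,
    len: u16,
    ctxt_and_tag: u16,
}

/// Hypothetical convention: the top bit of `ctxt_and_tag` marks an interned span,
/// leaving 15 bits for an inline syntax context.
const INTERNED_TAG: u16 = 1 << 15;

/// Inline encoding; `None` means the span would have to be interned.
fn encode(base: u32, len: u32, ctxt: u32) -> Option<Span> {
    if len <= u16::MAX as u32 && ctxt < INTERNED_TAG as u32 {
        Some(Span { base, len: len as u16, ctxt_and_tag: ctxt as u16 })
    } else {
        None
    }
}

/// Returns (base, len, ctxt) for inline spans, `None` for interned ones.
fn decode(span: Span) -> Option<(u32, u32, u32)> {
    if span.ctxt_and_tag & INTERNED_TAG != 0 {
        None
    } else {
        Some((span.base, span.len as u32, span.ctxt_and_tag as u32))
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<Span>(), 8);
    // A 1 GiB base and a 1000-byte span still fit inline.
    let span = encode(1 << 30, 1000, 5).unwrap();
    assert_eq!(decode(span), Some((1 << 30, 1000, 5)));
    // A 70,000-byte span does not fit in a 16-bit length.
    assert!(encode(0, 70_000, 0).is_none());
}
```

With a full 32-bit base there is no crate-size cliff at all, and 16 bits of length cover every bucket up to L16 in the tables above.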

@bors

Contributor

commented Apr 3, 2019

⌛️ Testing commit ff94fea with merge 0ba7d41...

bors added a commit that referenced this pull request Apr 3, 2019

Auto merge of #58458 - nnethercote:tweak-Span-encoding, r=petrochenkov
@bors

Contributor

commented Apr 3, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: petrochenkov
Pushing 0ba7d41 to master...

@bors bors added the merged-by-bors label Apr 3, 2019

@bors bors merged commit ff94fea into rust-lang:master Apr 3, 2019


@nnethercote nnethercote deleted the nnethercote:tweak-Span-encoding branch Apr 4, 2019
