Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: use -64 instead of 0 for "different file" position encoding in binary exporter #20080

Closed
josharian opened this issue Apr 22, 2017 · 4 comments

Comments

Projects
None yet
3 participants
@josharian
Copy link
Contributor

commented Apr 22, 2017

The int 0 is used in the binary import/export format to indicate "different file". If the files are the same, an offset of 0 lines is represented by adding a second int of -1. See exporter.pos.

However, over 50% of same-file-different-line position deltas have a line delta of 0, so this representation is inefficient. A simple fix is to use the magic value -64 for "different file". That line offset is rare. -64 is the smallest value that fits in a single byte with varint encoding.

The implementation is trivial--it consists of changing 0 to -64 in a few places.

The savings are small, though, so it's not worth doing unless we are changing the export format for some other reason as well--thus an issue instead of a CL.

cc @mdempsky @griesemer

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Apr 22, 2017

I was wondering about that when I was thinking about the encoding. Do you have concrete numbers?

@griesemer griesemer self-assigned this Apr 22, 2017

@griesemer griesemer added this to the Go1.9Maybe milestone Apr 22, 2017

@josharian josharian added the ToolSpeed label Apr 23, 2017

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2017

I instrumented bexport's p.pos, and ran 'go build -a std cmd'. Of the calls to pos, here's everything at >1%:

 47.78% 96566 line delta 0
 14.93% 30182 different file
 13.28% 26834 line delta 1
  4.96% 10021 line delta -1
  2.03%  4095 line delta -2
  1.94%  3927 line delta 2
  1.19%  2410 line delta -4
  1.04%  2095 line delta 3
  1.03%  2077 line delta 4

Results on the compilebench set (requires CL 41053):

name       old obj-bytes     new obj-bytes     delta
Template          382k ± 0%         381k ± 0%   -0.23%  (p=0.002 n=6+6)
Unicode           203k ± 0%         203k ± 0%   -0.00%  (p=0.002 n=6+6)
GoTypes          1.18M ± 0%        1.17M ± 0%   -0.08%  (p=0.002 n=6+6)
Compiler         3.99M ± 0%        3.99M ± 0%   -0.08%  (p=0.002 n=6+6)
SSA              8.28M ± 0%        8.28M ± 0%   -0.02%  (p=0.002 n=6+6)
Flate             230k ± 0%         230k ± 0%   -0.05%  (p=0.002 n=6+6)
GoParser          287k ± 0%         287k ± 0%   -0.16%  (p=0.002 n=6+6)
Reflect          1.00M ± 0%        1.00M ± 0%   -0.01%  (p=0.002 n=6+6)
Tar               190k ± 0%         189k ± 0%   -0.24%  (p=0.002 n=6+6)
XML               416k ± 0%         415k ± 0%   -0.16%  (p=0.002 n=6+6)

name       old export-bytes  new export-bytes  delta
Template         19.0k ± 0%        18.2k ± 0%   -4.55%  (p=0.002 n=6+6)
Unicode          4.45k ± 0%        4.44k ± 0%   -0.11%  (p=0.002 n=6+6)
GoTypes          29.7k ± 0%        28.8k ± 0%   -3.12%  (p=0.002 n=6+6)
Compiler         75.6k ± 0%        72.5k ± 0%   -4.03%  (p=0.002 n=6+6)
SSA              76.2k ± 0%        74.8k ± 0%   -1.72%  (p=0.002 n=6+6)
Flate            4.98k ± 0%        4.87k ± 0%   -2.29%  (p=0.002 n=6+6)
GoParser         8.81k ± 0%        8.34k ± 0%   -5.30%  (p=0.002 n=6+6)
Reflect          6.25k ± 0%        6.16k ± 0%   -1.49%  (p=0.002 n=6+6)
Tar              9.49k ± 0%        9.03k ± 0%   -4.82%  (p=0.002 n=6+6)
XML              16.0k ± 0%        15.4k ± 0%   -4.03%  (p=0.002 n=6+6)

So looks like it shrinks export data by 1-5% and object files sizes by a tiny amount.

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2017

If we do this, another thing to consider at the same time is switching from using commonPrefixLen for filenames to just writing dir and file separately. A quick hack (diff below) shows another minor win from this, due entirely to the elimination of the need to encode of the length of the common prefix.

name        old export-bytes  new export-bytes  delta
Template          19.0k ± 0%        18.9k ± 0%   -0.82%  (p=0.002 n=6+6)
Unicode           4.45k ± 0%        4.41k ± 0%   -0.97%  (p=0.002 n=6+6)
GoTypes           29.7k ± 0%        29.1k ± 0%   -1.90%  (p=0.002 n=6+6)
Compiler          75.6k ± 0%        74.1k ± 0%   -2.01%  (p=0.002 n=6+6)
SSA               76.2k ± 0%        75.2k ± 0%   -1.31%  (p=0.002 n=6+6)
Flate             4.98k ± 0%        4.90k ± 0%   -1.67%  (p=0.002 n=6+6)
GoParser          8.81k ± 0%        8.75k ± 0%   -0.65%  (p=0.002 n=6+6)
Reflect           6.25k ± 0%        6.20k ± 0%   -0.72%  (p=0.002 n=6+6)
Tar               9.49k ± 0%        9.41k ± 0%   -0.79%  (p=0.002 n=6+6)
XML               16.0k ± 0%        15.9k ± 0%   -0.62%  (p=0.002 n=6+6)
[Geo mean]        15.1k             14.9k        -1.15%
diff --git a/src/cmd/compile/internal/gc/bexport.go b/src/cmd/compile/internal/gc/bexport.go
index 3637804a12..7adcbdde28 100644
--- a/src/cmd/compile/internal/gc/bexport.go
+++ b/src/cmd/compile/internal/gc/bexport.go
@@ -118,6 +118,7 @@ import (
 	"encoding/binary"
 	"fmt"
 	"math/big"
+	"path/filepath"
 	"sort"
 	"strings"
 )
@@ -532,17 +533,12 @@ func (p *exporter) pos(n *Node) {
 	} else {
 		// different file
 		p.int(0)
-		// Encode filename as length of common prefix with previous
-		// filename, followed by (possibly empty) suffix. Filenames
-		// frequently share path prefixes, so this can save a lot
-		// of space and make export data size less dependent on file
-		// path length. The suffix is unlikely to be empty because
-		// file names tend to end in ".go".
-		n := commonPrefixLen(p.prevFile, file)
-		p.int(n)           // n >= 0
-		p.string(file[n:]) // write suffix only
+		p.int(line) // line >= 0
+		// Encode filenames as dir+path, so that common dirs can be shared.
+		// TODO: many dirs are strict prefixes of others. Can we use this?
+		d, f := filepath.Split(file)
+		p.string(d)
+		p.string(f)
 		p.prevFile = file
-		p.int(line)
 	}
 	p.prevLine = line
 }
diff --git a/src/cmd/compile/internal/gc/bimport.go b/src/cmd/compile/internal/gc/bimport.go
index 7766a5617a..c24e216956 100644
--- a/src/cmd/compile/internal/gc/bimport.go
+++ b/src/cmd/compile/internal/gc/bimport.go
@@ -387,9 +387,9 @@ func (p *importer) pos() src.XPos {
 		line += delta
 	} else if n := p.int(); n >= 0 {
 		// file changed
-		file = p.prevFile[:n] + p.string()
+		file = p.string() + p.string()
+		line = n
 		p.prevFile = file
-		line = p.int()
 		p.posBase = src.NewFileBase(file, file)
 	}
 	p.prevLine = line
@gopherbot

This comment has been minimized.

Copy link

commented Apr 25, 2017

CL https://golang.org/cl/41619 mentions this issue.

@gopherbot gopherbot closed this in a6b16e0 Apr 25, 2017

@golang golang locked and limited conversation to collaborators Apr 25, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.