Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate compressed tables #28

Closed
awalterschulze opened this issue Jul 21, 2016 · 14 comments
Closed

generate compressed tables #28

awalterschulze opened this issue Jul 21, 2016 · 14 comments
Assignees

Comments

@awalterschulze
Copy link
Collaborator

The generated tables can become quite large
https://raw.githubusercontent.com/katydid/katydid/master/relapse/parser/actiontable.go
and this is the reason they are generated and not typed by a human :)

The current tables are great output for debugging, but in production we don't really care how pretty they are printed.
A bigger concern is how long it takes to generate the code and the size of the generated code.

I have cut down my code generation time in gogoprotobuf by 4 times just by rather generating a compressed FileDescriptorProtoSet object. The generated function in that case still returns the uncompressed version, so the API has not changed.

I suspect we can cut down on the gocc code generation time by quite a bit by simply printing a compressed table. Something like the following.

// generated by gocc; DO NOT EDIT.

package parser

type (
    actionTable [numStates]actionRow
    actionRow   struct {
        canRecover bool
        actions    [numSymbols]action
    }
)

var actionTab actionTable

func init() {
  compressedActTab := []byte{
    0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
    0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
    ...
  }
  actionTab = uncompress(compressedActTab)
}

Maybe I am wrong and this is not the bottle neck, but then at least we still get a smaller code size.

I still think the current table output should be kept as the default.
We can add a flag (-min) to gocc which will then generate the compressed tables.

What do you think @goccmack @mewmew

@goccmack
Copy link
Owner

I have not objections to giving it a try.

On 21 Jul 2016, at 11:36, Walter Schulze notifications@github.com wrote:

The generated tables can become quite large
https://raw.githubusercontent.com/katydid/katydid/master/relapse/parser/actiontable.go
and this is the reason they are generated and not typed by a human :)

The current tables are great output for debugging, but in production we don't really care how pretty they are printed.
A bigger concern is how long it takes to generate the code and the size of the generated code.

I have cut down my code generation time in gogoprotobuf by 4 times just by rather generating a compressed FileDescriptorProtoSet object. The generated function in that case still returns the uncompressed version, so the API has not changed.

I suspect we can cut down on the gocc code generation time by quite a bit by simply printing a compressed table. Something like the following.

// generated by gocc; DO NOT EDIT.

package parser

type (
actionTable [numStates]actionRow
actionRow struct {
canRecover bool
actions [numSymbols]action
}
)

var actionTab actionTable

func init() {
compressedActTab := []byte{
0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
...
}
actionTab = uncompress(compressedActTab)
}

Maybe I am wrong and this is not the bottle neck, but then at least we still get a smaller code size.

I still think the current table output should be kept as the default.
We can add a flag (-min) to gocc which will then generate the compressed tables.

What do you think @goccmack @mewmew


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@mewmew
Copy link
Collaborator

mewmew commented Jul 22, 2016

The generated tables can become quite large
https://raw.githubusercontent.com/katydid/katydid/master/relapse/parser/actiontable.go
and this is the reason they are generated and not typed by a human :)

Indeed :)

A bigger concern is how long it takes to generate the code and the size of the generated code.

Just to clarify, are you referring to (1) the time it takes for Gocc to produces the source files, (2) the time it takes for the Go compiler to compile the generated source files, or (3) the time it takes to run the generated parsers?

In the case of (2), this is a bug in the later versions of the Go compiler, and is being tracked upstream at golang/go#14934 and golang/go#16407.

Maybe I am wrong and this is not the bottle neck, but then at least we still get a smaller code size.

I'm still interested to understand precisely which bottle neck you are referring to, (1), (2) or (3) or something else?

I still think the current table output should be kept as the default.

Definitely keep the current one default.

We can add a flag (-min) to gocc which will then generate the compressed tables.
What do you think @goccmack @mewmew

As part of this effort to generate compressed tables, may we take a look at adding proper support for David Pager's "Practical General Method" as motivated by Marius in #13 (comment)

When I have the time again I would like to add Pager's PGM again because it generates smaller tables for full LR(1) grammars -- similar in size to LALR for the same grammar. I also think that the original paper is beautiful and that it is a pity that LALR became popular instead of this technique. The PGM parse table generator was working in gocc3 but not extensively tested.

Then perhaps the command line flag could be -pager.

@goccmack
Copy link
Owner

I really like the Pager option.

@awalterschulze
Copy link
Collaborator Author

I am proposing to do the compression of the tables, on my own time (which shouldn't take too long), but I won't have time to do pager as well, since this will be a very time consuming effort requiring time to understand the theory and time for extra testing.
I also think even if someone has time to do pager, this is a parallel option and both can be used in combination.

On your question of "which time". I had to compress my gogoprotobuf generated code because of issue (2), but then I gained (1) and this has really sped up my development time. Obviously I also want smaller source files and faster compilation. But the main purpose is faster code generation, since this has become my main bottleneck when developing. I love the quick feedback loop that golang gives me and I want to keep that high even if I am doing some code generation.

Given that I am proposing to do this work, I just want to know whether you,@goccmack and @mewmew support this work without me starting a journey down pager theory?

@goccmack
Copy link
Owner

I do

@mewmew
Copy link
Collaborator

mewmew commented Jul 22, 2016

Given that I am proposing to do this work, I just want to know whether you,@goccmack and @mewmew support this work without me starting a journey down pager theory?

Sure, go for it. We may always evaluate the performance of -min tables against -pager later. Would it be possible to document that -min is considered experimental for now, so that once pager is implemented and we may properly evaluate them against one another, we have the option to remove -min, if motivated without breaking users.

But the main purpose is faster code generation, since this has become my main bottleneck when developing. I love the quick feedback loop that golang gives me and I want to keep that high even if I am doing some code generation.

As a work around, you may try using Go 1.4. It does speed up compilation quite a bit for katydid.

Go 1.4

[~]$ ~/go1.4/bin/go version
go version go1.4.3 linux/amd64
[~]$ export GOPATH=/tmp/go14
[~]$ ~/go1.4/bin/go get github.com/gogo/protobuf/protoc-gen-gogo
[~]$ ~/go1.4/bin/go get github.com/goccmack/gocc
[~]$ ~/go1.4/bin/go get -d github.com/katydid/katydid
[~]$ export PATH=${GOPATH}/bin:${PATH}
[~]$ cd ${GOPATH}/src/github.com/katydid/katydid
[/tmp/go14/src/github.com/katydid/katydid]$ time ~/go1.4/bin/go install ./...

real    0m4.940s
user    0m7.877s
sys 0m0.973s
[/tmp/go14/src/github.com/katydid/katydid]$ time ~/go1.4/bin/go install -a ./...

real    0m5.421s
user    0m8.423s
sys 0m1.117s

Go 1.6

[~]$ ~/go1.6/bin/go version
go version go1.6.3 linux/amd64
[~]$ export GOPATH=/tmp/go16
[~]$ ~/go1.6/bin/go get github.com/gogo/protobuf/protoc-gen-gogo
[~]$ ~/go1.6/bin/go get github.com/goccmack/gocc
[~]$ ~/go1.6/bin/go get -d github.com/katydid/katydid
[~]$ export PATH=${GOPATH}/bin:${PATH}
[~]$ cd ${GOPATH}/src/github.com/katydid/katydid
[/tmp/go16/src/github.com/katydid/katydid]$ time ~/go1.6/bin/go install ./...

real    0m11.661s
user    0m28.207s
sys 0m1.043s
[/tmp/go16/src/github.com/katydid/katydid]$ time ~/go1.6/bin/go install -a ./...

real    0m17.792s
user    0m44.480s
sys 0m1.737s

Go tip

[~]$ ~/go/bin/go version
go version devel +243d51f Thu Jul 21 18:17:31 2016 +0000 linux/amd64
[~]$ export GOPATH=/tmp/gotip
[~]$ ~/go/bin/go get github.com/gogo/protobuf/protoc-gen-gogo
[~]$ ~/go/bin/go get github.com/goccmack/gocc
[~]$ ~/go/bin/go get -d github.com/katydid/katydid
[~]$ export PATH=${GOPATH}/bin:${PATH}
[~]$ cd ${GOPATH}/src/github.com/katydid/katydid
[/tmp/gotip/src/github.com/katydid/katydid]$ time ~/go/bin/go install ./...

real    0m11.166s
user    0m19.103s
sys 0m0.847s
[/tmp/gotip/src/github.com/katydid/katydid]$ time ~/go/bin/go install -a ./...

real    0m16.454s
user    0m31.083s
sys 0m1.337s

@awalterschulze
Copy link
Collaborator Author

awalterschulze commented Jul 22, 2016

Yes totally. Having it as an experimental flag sounds perfect :)
I also want to see if it really makes a difference.

Wow that is a huge difference in compilation speed. Thanks for the effort.
I am a bit more worried about the total speed of running make all and not just go install.
Compilation speed is issue (2) and Code generation (1) happens during make regenerate

@mewmew
Copy link
Collaborator

mewmew commented Jul 24, 2016

Yes totally. Having it as an experimental flag sounds perfect :)
I also want to see if it really makes a difference.

Perfect. Go for it.

@awalterschulze
Copy link
Collaborator Author

Thank you for both of your approval.
Now I just need to create some time :)

@mewmew
Copy link
Collaborator

mewmew commented Oct 5, 2016

For reference, issue golang/go#14934 and golang/go#16407 have now been fixed, resulting in a tremendous improvement in compile time.

For the Gocc generated code of golang/go#14934, a 38x slowdown was noticed between Go 1.4 and Go tip (at rev golang/go@0659cf6). This has been brought down to roughly 2x at the latest Go tip (rev golang/go@da6b9ec). The main commit dealing with this issue was golang/go@5a6e511, which introduced a new phi placement algorithm (the Sreedhar+Gao phi building algorithm).

@awalterschulze, you may not have to generate the compressed tables if the new compile times are good enough. Then we can focus on introducing Pager at some point in the future.

@awalterschulze
Copy link
Collaborator Author

Interesting :)
Just to make sure, this will be in go 1.8 and is not in go 1.7 yet right?

@mewmew
Copy link
Collaborator

mewmew commented Oct 5, 2016

Just to make sure, this will be in go 1.8 and is not in go 1.7 yet right?

That's correct.

@mewmew
Copy link
Collaborator

mewmew commented Dec 3, 2016

@awalterschulze Do you want to keep this issue open? Or should we close it since #31 has been merged. We could then open a dedicated issue for Pager support.

@awalterschulze
Copy link
Collaborator Author

awalterschulze commented Dec 3, 2016 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants