runtime: need a better itab table #20505

Closed
randall77 opened this Issue May 26, 2017 · 15 comments

@randall77
Contributor

randall77 commented May 26, 2017

The runtime keeps a map from [interface type, concrete type] to the itab for that pair. Most interface->interface conversions need to look in this table to find the itab to use. (concrete type->interface conversions don't need this table any more - see CL 20901, CL 20902, and others.)
It is implemented as a hash table with 1009 entries and chained (singly-linked list) buckets.

We've run into servers inside Google that have 43,540 entries in this table. That leads to lots of CPU getting wasted in walking long linked lists in convI2I calls.

Part of the problem is that we're now loading this table at startup with all the itabs that we build at compile time (introduced in the CLs referenced above), instead of just keeping dynamically-used entries in this table. Part of the problem may just be the sheer size of the code for the server in question (lots of protobufs, I think).

We should come up with a better way to organize the itab table. Maybe it would be enough to just implement move-to-front on the bucket lists? Or maybe keep a hot table and a cold table, put the itabs in the cold table at startup, and only move them to the hot table when referenced dynamically? Other ideas welcome.

The current implementation is ~append-only, which makes the multithreading easy. I don't want to lose that property by locking on every call.

@walken-google @cherrymui
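
For concreteness, here is a minimal model of the table described above (ordinary Go, not the runtime's actual code; names like `interfacetype`, `bucket`, and `insert` are illustrative): a fixed array of 1009 buckets whose entries are chained through a `link` field, with lock-free lookups and a mutex only around insertion.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Illustrative stand-ins for the runtime's type descriptors; the real table
// is keyed on the descriptors themselves and the real itab also carries the
// method pointers.
type interfacetype struct{ name string }
type concretetype struct{ name string }

// itab pairs an interface type with a concrete type; link chains entries
// that land in the same bucket.
type itab struct {
	inter *interfacetype
	typ   *concretetype
	link  *itab
}

const hashSize = 1009 // the bucket count described above

var (
	table [hashSize]*itab
	mu    sync.Mutex // insertions only; lookups walk the chains without locking
)

func bucket(inter *interfacetype, typ *concretetype) int {
	h := fnv.New32a()
	h.Write([]byte(inter.name))
	h.Write([]byte(typ.name))
	return int(h.Sum32() % hashSize)
}

// lookup has the shape of the conversion path: hash once, then walk the
// chain. With ~43,540 entries in 1009 buckets the chains average ~43 nodes,
// which is where the reported CPU time goes.
func lookup(inter *interfacetype, typ *concretetype) *itab {
	for m := table[bucket(inter, typ)]; m != nil; m = m.link {
		if m.inter == inter && m.typ == typ {
			return m
		}
	}
	return nil
}

// insert is append-only: the new itab is pushed onto the front of its chain.
// (The real runtime publishes the new chain head atomically so lock-free
// readers never see a broken list; the mutex here only serializes writers.)
func insert(m *itab) {
	mu.Lock()
	b := bucket(m.inter, m.typ)
	m.link = table[b]
	table[b] = m
	mu.Unlock()
}

func main() {
	rd, f := &interfacetype{"io.Reader"}, &concretetype{"os.File"}
	insert(&itab{inter: rd, typ: f})
	fmt.Println(lookup(rd, f) != nil) // true
}
```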

randall77 added this to the Go1.10 milestone May 26, 2017

@bcmills

Member

bcmills commented May 26, 2017

> Other ideas welcome.

This sounds like pretty much the same use-case that sync.Map is optimized for, although I suspect you'd have to reimplement sync.Map at a much lower level to be able to use it that deeply within the runtime.
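
For illustration, here is the read-mostly shape that sync.Map optimizes, applied to an itab-style cache in ordinary Go (the names and the copy-on-write `Store` are made up for the sketch, not a proposal for the runtime): hits go through an atomically loaded, never-mutated map and take no lock; only misses lock and republish.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// key stands in for the (interface type, concrete type) pair.
type key struct{ iface, typ string }
type itab struct{ note string }

// cache keeps a read-only map behind an atomic.Value, so the common case
// (a hit) never takes a lock -- roughly the property sync.Map provides.
type cache struct {
	clean atomic.Value // holds a map[key]*itab, replaced wholesale on insert
	mu    sync.Mutex
}

func newCache() *cache {
	c := &cache{}
	c.clean.Store(map[key]*itab{})
	return c
}

func (c *cache) Load(k key) (*itab, bool) {
	m := c.clean.Load().(map[key]*itab)
	v, ok := m[k]
	return v, ok
}

// Store copies the clean map and atomically swaps in the new one. That is
// acceptable if insertions are rare (each pair misses only once); sync.Map
// avoids the full copy with a dirty map that gets promoted after misses.
func (c *cache) Store(k key, v *itab) {
	c.mu.Lock()
	defer c.mu.Unlock()
	old := c.clean.Load().(map[key]*itab)
	next := make(map[key]*itab, len(old)+1)
	for kk, vv := range old {
		next[kk] = vv
	}
	next[k] = v
	c.clean.Store(next)
}

func main() {
	c := newCache()
	c.Store(key{"io.Reader", "os.File"}, &itab{"os.File/io.Reader"})
	v, ok := c.Load(key{"io.Reader", "os.File"})
	fmt.Println(ok, v.note)
}
```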

@huguesb

Contributor

huguesb commented May 26, 2017

> Other ideas welcome

Could the compiler generate a perfect hash for this data? That would save CPU by avoiding walks altogether and initialization at startup. The downside being longer compile time, larger object size, and it might not work well with plugins.
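
To make the shape of the idea concrete, a toy sketch in Go (not what the compiler or linker would actually emit): brute-forcing a seed only works for tiny key sets, and a real generator would use something like hash-and-displace, but the payoff is the same either way — one hash, one probe, no chain to walk.

```go
package main

import (
	"fmt"
	"math/rand"
)

// key stands in for the (interface type, concrete type) pair; the hash
// constants are placeholders for whatever the toolchain would use.
type key struct{ iface, typ uint64 }

func mix(seed uint64, k key) uint64 {
	h := seed ^ k.iface*0x9e3779b97f4a7c15 ^ k.typ*0xbf58476d1ce4e5b9
	h ^= h >> 33
	h *= 0xff51afd7ed558ccd
	h ^= h >> 33
	return h
}

// buildPerfectHash searches for a seed under which every key maps to a
// distinct slot of a table sized to the key count. Brute force like this is
// only feasible for a handful of keys; tens of thousands of itabs would need
// a scheme such as hash-and-displace. The lookup side is the point:
// table[mix(seed, k)%n] with no collision handling at all.
func buildPerfectHash(keys []key) (seed uint64, ok bool) {
	n := uint64(len(keys))
	for attempt := 0; attempt < 1<<20; attempt++ {
		seed = rand.Uint64()
		used := make([]bool, n)
		collision := false
		for _, k := range keys {
			slot := mix(seed, k) % n
			if used[slot] {
				collision = true
				break
			}
			used[slot] = true
		}
		if !collision {
			return seed, true
		}
	}
	return 0, false
}

func main() {
	keys := []key{{1, 10}, {1, 11}, {2, 10}, {3, 12}}
	seed, ok := buildPerfectHash(keys)
	fmt.Println(seed, ok)
}
```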

@randall77

Contributor

randall77 commented May 26, 2017

@huguesb It is possible to perfect hash the startup data, but that seems overkill. Just sizing the table appropriately would be enough. We'd need a dynamic table anyway, as not all of the data is available at startup. If we move itabs to the dynamic table on first lookup (the hot/cold table idea), the speed of the static tables is ~irrelevant.
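
One possible shape of the hot/cold idea, sketched in ordinary Go (all names are made up, and the RWMutex is only to keep the sketch short; a runtime version would want a lock-free read path for the hot table, e.g. the existing append-only chains):

```go
package main

import (
	"fmt"
	"sync"
)

// key stands in for the (interface type, concrete type) pair.
type key struct{ iface, typ string }
type itab struct{}

var (
	// cold holds the compile-time itabs, filled once at startup and then
	// treated as read-only, so it never needs a lock after init.
	cold = map[key]*itab{}

	// hot holds only the pairs that have actually been looked up at run time.
	hot = struct {
		sync.RWMutex
		m map[key]*itab
	}{m: map[key]*itab{}}
)

// find checks the small hot table first; only a miss falls back to the big
// cold table, and a cold hit is promoted so each pair pays the cost once.
func find(k key) *itab {
	hot.RLock()
	m := hot.m[k]
	hot.RUnlock()
	if m != nil {
		return m
	}
	if m = cold[k]; m != nil {
		hot.Lock()
		hot.m[k] = m
		hot.Unlock()
	}
	return m
}

func main() {
	k := key{"fmt.Stringer", "time.Duration"}
	cold[k] = &itab{}
	fmt.Println(find(k) != nil, find(k) != nil) // the second call hits the hot table
}
```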

@dr2chase

Contributor

dr2chase commented May 26, 2017

How do you feel about graph coloring? For a type T that implements interfaces I1, I2, I3, create a clique on I1, I2, I3, to indicate that they must have different colors.

Color the resulting graph, and construct tables (slices) mapping colors (now integers) to itabs.
Each itab/table entry also needs to indicate the interface it is for.

To convert Ix to Iy, extract Cx from Ix, then
(itab, iface) := Cx.iitable[color(Iy)]
if iface = Iy, success, otherwise failure.

Problem with this is that the graph coloring could be large-ish, I don't really know. It could happen at link time, might be interesting to figure out if there was a way to do it on-demand.

Yes, I know it's an NP-complete problem. So is register allocation.
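
A hedged sketch of the coloring step in ordinary Go (the type/interface sets, the greedy heuristic, and the per-type tables are all illustrative, not a worked-out linker design): two interfaces interfere when some type implements both; color the graph; then each concrete type gets a small slice indexed by color, and a conversion is one index plus one identity check.

```go
package main

import "fmt"

// implements lists, per concrete type, the interfaces it satisfies. In a real
// system this information would come from the compiler and linker.
var implements = map[string][]string{
	"os.File":       {"io.Reader", "io.Writer", "io.Closer"},
	"bytes.Buffer":  {"io.Reader", "io.Writer", "fmt.Stringer"},
	"time.Duration": {"fmt.Stringer"},
}

// colorInterfaces builds the interference graph (two interfaces interfere if
// some type implements both) and colors it greedily. Greedy coloring is not
// optimal, but as with register allocation it tends to be good enough.
func colorInterfaces() map[string]int {
	adj := map[string]map[string]bool{}
	for _, ifaces := range implements {
		for _, a := range ifaces {
			if adj[a] == nil {
				adj[a] = map[string]bool{}
			}
			for _, b := range ifaces {
				if a != b {
					adj[a][b] = true
				}
			}
		}
	}
	color := map[string]int{}
	for iface, nbrs := range adj {
		used := map[int]bool{}
		for n := range nbrs {
			if c, ok := color[n]; ok {
				used[c] = true
			}
		}
		c := 0
		for used[c] {
			c++
		}
		color[iface] = c
	}
	return color
}

func main() {
	color := colorInterfaces()
	// Each concrete type gets a slice indexed by color; converting to Iy is
	// table[color(Iy)] plus a check that the stored interface really is Iy.
	for typ, ifaces := range implements {
		maxColor := 0
		for _, i := range ifaces {
			if color[i] > maxColor {
				maxColor = color[i]
			}
		}
		table := make([]string, maxColor+1)
		for _, i := range ifaces {
			table[color[i]] = i
		}
		fmt.Println(typ, table)
	}
}
```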

@randall77

Contributor

randall77 commented May 26, 2017

@dr2chase I totally don't understand what you are suggesting. What is this graph of? Interfaces? Why are two different interfaces interfering? Is this just to make a small index out of an interface name?

We need to do this lookup even when the compiler never sees any of these types together (they are in different packages, say). So I don't think we could construct a complete interference graph at compile time anyway. Maybe link time, but then plugins break it.

@dr2chase

Contributor

dr2chase commented May 26, 2017

Goal is to make a small index out of an interface name. Two interfaces I1 and I2 interfere if type T implements both of them (they can't both have the same index in T's table).

With plugins, the risk is that a new type TP makes two previously non-colliding interfaces collide, thus extending all the tables by one. A new interface IP also potentially extends all the tables.

Conceivably, "failure" can continue the probe by some other means.

How much of an event is loading a plugin?

More on the graph:

Graph nodes are interfaces, and graph edges indicate that two interfaces interfere (cannot share the same color) because they are both implemented by the same type. This is much the same as graph-coloring register allocation, where nodes are variables and an edge exists whenever two variables have overlapping lifetimes. In both cases, coloring is used to find and reuse small indices.

@ianlancetaylor

Contributor

ianlancetaylor commented May 26, 2017

I think it would be worth doing some dynamic examination of real programs to see what dynamic types actually appear during specific interface-to-interface conversions. Where are all the interface-to-interface conversions coming from? How many are the fmt package checking for Formatter, GoStringer, and Stringer? Is it worth optimizing those specific cases, by, for example, having special hash tables just for those types?

I note that a type with methods always has an uncommonType struct as part of its type descriptor. I note that uncommonType has 48 unused bits. That gives us plenty of room to identify the method index of the standard String method, if that is indeed one of the most common methods to look up.

In general, my guess is that the great majority of interface-to-interface conversions are converting to an interface with a single method. Perhaps we could add a pointer to the type descriptor of an interface type with a single method that points to a hash table mapping concrete types to the appropriate itab or nil if the type does not support the interface. The idea here is to simply split up the hash table.
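
A sketch of the "split up the hash table" idea in plain Go (the itabs field is hypothetical — today's interfacetype descriptor has no such pointer — and the locking is only for the sketch): each interface type carries its own small table from concrete type to itab, so a Stringer lookup never walks past entries for unrelated interfaces.

```go
package main

import (
	"fmt"
	"sync"
)

type concretetype struct{ name string }
type itab struct{}

// interfacetype is a stand-in for the runtime's descriptor; the itabs map
// hanging off it is the hypothetical addition suggested above. A nil value
// can be recorded to mean "this type does not implement the interface".
type interfacetype struct {
	name  string
	mu    sync.Mutex
	itabs map[*concretetype]*itab
}

// lookup consults only this interface's private table, so the many
// proto.Message entries never lengthen the probe for fmt.Stringer.
func (i *interfacetype) lookup(t *concretetype) (*itab, bool) {
	i.mu.Lock()
	defer i.mu.Unlock()
	m, ok := i.itabs[t]
	return m, ok
}

func (i *interfacetype) store(t *concretetype, m *itab) {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.itabs == nil {
		i.itabs = map[*concretetype]*itab{}
	}
	i.itabs[t] = m
}

func main() {
	stringer := &interfacetype{name: "fmt.Stringer"}
	dur := &concretetype{name: "time.Duration"}
	stringer.store(dur, &itab{})
	_, ok := stringer.lookup(dur)
	fmt.Println(ok)
}
```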

@josharian

Contributor

josharian commented May 28, 2017

Once we understand the nature of the table use (per Ian), I think a good next step would be to construct a benchmark program that reproduces the problem at different scales, so we can easily profile and experiment.
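
As a starting point, a minimal benchmark shape (the package and benchmark names are hypothetical; a faithful reproduction of the problem would also need to register thousands of concrete types, e.g. via generated code, to lengthen the chains): it just times the interface-to-interface assertion that goes through the itab lookup.

```go
package itabbench

import (
	"fmt"
	"testing"
	"time"
)

var sink fmt.Stringer

// BenchmarkI2I times a conversion from an empty interface to fmt.Stringer,
// which goes through the runtime's itab lookup. To reproduce the issue at
// scale, the table would additionally have to be populated with many
// (interface, concrete type) pairs, which likely means generated types.
func BenchmarkI2I(b *testing.B) {
	var e interface{} = time.Duration(1) // time.Duration implements fmt.Stringer
	for i := 0; i < b.N; i++ {
		s, ok := e.(fmt.Stringer)
		if !ok {
			b.Fatal("conversion failed")
		}
		sink = s
	}
}
```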

@randall77

Contributor

randall77 commented May 29, 2017

Of the static entries, there were more than 30,000 proto.Message, plus another 3000 or so helpers, like the additional Oneof types.
I don't have any dynamic counts, unfortunately. That requires being able to run the binary, which isn't easy.
I do have a benchmark which demonstrates the problem. I'll send it out tomorrow.

@walken-google

Contributor

walken-google commented May 30, 2017

I think it would make sense to dynamically resize the itab hash table so that we can address both the dynamic and static itab entries cases in one go.

I think this can be simpler than sync.Map: the hash table would need to be a slice (instead of a fixed size array) and there would be an atomic pointer to the hash table slice. Lookups would follow the same "look twice - once without lock, once with" pattern as today. If the entry is not found the second time, we proceed to the insertion case which could be made to resize the table when above a given load factor.

During runtime init we may not be able to handle memory allocation (?); if that is a problem we can just disable hash resizing during early init.

Another issue is that the itab hash modulo would not be a constant anymore. We may want to avoid having an integer divide on that path. Is there a strong requirement for the prime hash size? I suspect power-of-two hash sizes could be made to work too...
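
A rough sketch of the resizing scheme in ordinary Go (everything here, from the bucket layout to the load factor, is illustrative rather than a runtime patch; in particular a real version would have to publish chain heads atomically and deal with allocation during early init): readers atomically load the current table and probe without locking; a writer re-checks under the lock, inserts, and publishes a doubled, power-of-two table (mask instead of modulo) once the load factor is exceeded.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

// key stands in for the (interface type, concrete type) pair.
type key struct{ iface, typ string }
type itab struct{ k key }

type entry struct {
	m    *itab
	next *entry
}

// table is the snapshot readers see: a power-of-two bucket slice, so the
// bucket index is a mask rather than a modulo by a prime.
type table struct {
	mask    uint32
	buckets []*entry
	count   int
}

var (
	cur atomic.Value // holds the current *table
	mu  sync.Mutex   // serializes writers only
)

func hash(k key) uint32 {
	h := fnv.New32a()
	h.Write([]byte(k.iface))
	h.Write([]byte(k.typ))
	return h.Sum32()
}

func lookup(t *table, k key) *itab {
	for e := t.buckets[hash(k)&t.mask]; e != nil; e = e.next {
		if e.m.k == k {
			return e.m
		}
	}
	return nil
}

// find is the "look twice" pattern: once without the lock, once with it.
// A real runtime version would publish new chain heads atomically so the
// lock-free first look stays safe; the sketch glosses over that.
func find(k key) *itab {
	if m := lookup(cur.Load().(*table), k); m != nil {
		return m
	}
	mu.Lock()
	defer mu.Unlock()
	t := cur.Load().(*table)
	if m := lookup(t, k); m != nil {
		return m
	}
	m := &itab{k: k} // in the runtime, this is where the itab would be built
	if (t.count+1)*4 > len(t.buckets)*3 {
		t = grow(t) // over a 75% load factor: build a doubled table
	}
	b := hash(k) & t.mask
	t.buckets[b] = &entry{m: m, next: t.buckets[b]}
	t.count++
	cur.Store(t) // republish (a new pointer only if we grew)
	return m
}

func grow(old *table) *table {
	nt := &table{
		mask:    uint32(2*len(old.buckets) - 1),
		buckets: make([]*entry, 2*len(old.buckets)),
	}
	for _, head := range old.buckets {
		for e := head; e != nil; e = e.next {
			b := hash(e.m.k) & nt.mask
			nt.buckets[b] = &entry{m: e.m, next: nt.buckets[b]}
			nt.count++
		}
	}
	return nt
}

func main() {
	cur.Store(&table{mask: 7, buckets: make([]*entry, 8)})
	a := find(key{"io.Reader", "os.File"})
	fmt.Println(a == find(key{"io.Reader", "os.File"})) // true: second lookup hits
}
```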

@rasky

Member

rasky commented May 30, 2017

Another option is to precalculate the prime sequence and the magic factors (libdivide style).

@gopherbot

gopherbot commented May 30, 2017

CL https://golang.org/cl/44338 mentions this issue.

@gopherbot

gopherbot commented May 30, 2017

CL https://golang.org/cl/44339 mentions this issue.

@gopherbot

gopherbot commented May 30, 2017

CL https://golang.org/cl/44341 mentions this issue.

@gopherbot

gopherbot commented May 31, 2017

CL https://golang.org/cl/44472 mentions this issue.

gopherbot closed this in 3d1699e Aug 15, 2017

gopherbot pushed a commit that referenced this issue Aug 15, 2017

cmd/compile: set itab function pointers at compile time
I noticed that we don't set an itab's function pointers at compile
time. Instead, we currently do it at executable startup.

Set the function pointers at compile time instead. This shortens
startup time. It has no effect on normal binary size. Object files
will have more relocations, but that isn't a big deal.

For PIE there are additional pointers that will need to be adjusted at
load time. There are already other pointers in an itab that need to be
adjusted, so the cache line will already be paged in. There might be
some binary size overhead to mark these pointers. The "go test -c
-buildmode=pie net/http" binary is 0.18% bigger.

Update #20505

Change-Id: I267c82489915b509ff66e512fc7319b2dd79b8f7
Reviewed-on: https://go-review.googlesource.com/44341
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>

golang locked and limited conversation to collaborators Aug 15, 2018
