I guess there are really two pieces here:
a) That call should just be a fully direct call, since it's reading a function pointer from a known itab.
b) Then it should be inlined because the function is below the inlining threshold.
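For readers without the original source at hand, here is a minimal sketch of the kind of code the two points are about. Only the name int32FuncTable.Cmp comes from the -m output quoted in this thread; the Comparer interface, the method body, and the helper functions are assumptions made up for illustration. The sketch makes no claim about what any particular toolchain version emits for it.

```go
package main

import "fmt"

// Comparer is a hypothetical interface; the real one is not shown in this thread.
type Comparer interface {
	Cmp(a, b int32) int
}

// int32FuncTable borrows its name from the -m output; the body is an assumption.
type int32FuncTable struct{}

// Cmp is small enough that it would plausibly be under the inlining threshold.
func (int32FuncTable) Cmp(a, b int32) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// viaInterface calls Cmp through an interface value whose concrete type is
// visible at the call site, so the itab is statically known. Piece (a) asks
// for this to become a direct call, piece (b) for that call to be inlined.
func viaInterface(a, b int32) int {
	var c Comparer = int32FuncTable{}
	return c.Cmp(a, b)
}

// direct is the hand-written form the optimized call would be equivalent to.
func direct(a, b int32) int {
	return int32FuncTable{}.Cmp(a, b)
}

func main() {
	fmt.Println(viaInterface(1, 2), direct(1, 2))
}
```

Building with go build -gcflags=-m prints the compiler's inlining decisions, which is one way to check how a given toolchain handles each of the two call sites.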
Inlining in the compiler happens far before devirtualization. Changing that is difficult, unfortunately. Missed optimization opportunities due to compiler ordering problems are among the most difficult to address.
Yes agreed. You basically end up needing to run some variant of passes 2x. I can imagine an architecture which tries to minimize costs looking something like:
Inlining phase tags functions which are inlineable (this must already be happening, -m shows <source>:26:6: can inline int32FuncTable.Cmp).
When devirtualization devirtualizes a call, if the callee was tagged as an inlineable function, it gets stored somewhere (I don't know gc's IR well, so this may require solving the part (a) problem I flagged, where gc is still emitting an indirect call even after it was devirtualized).
Post-devirtualization, a mini-inlining pass occurs which inlines everything that was stored by the devirtualization pass.
Someone will inevitably be back a month after this and ask you to run devirtualization again after the second inlining, because unless you just run the whole optimizer stack again and again until you reach a fixed point, you'll always miss something, but this approach seems tractable and like it probably balances compiler performance with emitted code performance.
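To make the three steps concrete, here is a toy model of the proposed ordering. Everything in it is hypothetical: the Call type, the cost table, the budget value, and the function names are invented for the sketch and do not correspond to gc's IR or to any real compiler API. It also assumes a single representation shared by all three steps, which is a simplification.

```go
package main

import "fmt"

// Call is a toy stand-in for a call site. It is not gc's IR.
type Call struct {
	KnownType string // concrete receiver type if statically known, else ""
	Method    string // method name being called
	Callee    string // resolved direct callee, "" while the call is indirect
	Inlined   bool   // set once the toy "inliner" has handled the call
}

// tagInlineable models step 1: record which functions fit the inlining budget.
func tagInlineable(costs map[string]int, budget int) map[string]bool {
	tags := make(map[string]bool)
	for name, cost := range costs {
		if cost <= budget {
			tags[name] = true
		}
	}
	return tags
}

// devirtualize models step 2: turn calls with a known concrete type into
// direct calls, and queue the ones whose callee was tagged in step 1.
func devirtualize(calls []*Call, inlineable map[string]bool) []*Call {
	var queue []*Call
	for _, c := range calls {
		if c.Callee != "" || c.KnownType == "" {
			continue // already direct, or nothing known about the receiver
		}
		c.Callee = c.KnownType + "." + c.Method
		if inlineable[c.Callee] {
			queue = append(queue, c)
		}
	}
	return queue
}

// miniInline models step 3: visit only the queued calls, not the whole program.
func miniInline(queue []*Call) {
	for _, c := range queue {
		c.Inlined = true
	}
}

func main() {
	// Costs and budget are made-up numbers for the toy.
	costs := map[string]int{"int32FuncTable.Cmp": 10, "bigTable.Cmp": 500}
	calls := []*Call{
		{KnownType: "int32FuncTable", Method: "Cmp"},
		{KnownType: "bigTable", Method: "Cmp"},
		{Method: "Cmp"}, // receiver type unknown, stays indirect
	}
	queue := devirtualize(calls, tagInlineable(costs, 80))
	miniInline(queue)
	for _, c := range calls {
		fmt.Printf("callee=%q inlined=%v\n", c.Callee, c.Inlined)
	}
}
```

The point of the queue is that the second inlining step only touches call sites the devirtualization step produced, which is where the hoped-for savings in compile time would come from.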
It is unfortunately far more complicated than that. The internal representation in the compiler changes dramatically between those passes. You can’t just run a second “mini-inlining” pass later. I’d like to move inlining later in the compiler in general, but that’s a major undertaking and raises a number of thorny issues.