Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add first proper benchmarks #17

Merged
merged 1 commit into from Mar 21, 2023
Merged

Add first proper benchmarks #17

merged 1 commit into from Mar 21, 2023

Conversation

bbrehm
Copy link
Contributor

@bbrehm bbrehm commented Mar 20, 2023

Some changes in graph accessors and schemagen. Add JMH benchmarks. Unify benchmarks into a single subproject.

The opsPerInvocation annoyed me enough to try to upstream it at openjdk/jmh#97

I think we now can start discussing about whether to try to devirtualize neighbor and edge access.

@bbrehm bbrehm requested review from ml86 and mpollmeier March 20, 2023 16:25
Copy link
Contributor

@mpollmeier mpollmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devirtualize neighbor and edge access

can you please elaborate?

@bbrehm
Copy link
Contributor Author

bbrehm commented Mar 21, 2023

devirtualize neighbor and edge access

can you please elaborate?

Sure! Timings look like

Benchmark                                       (shuffled)  Mode  Cnt    Score    Error  Units
joernBench.JoernGenerated.orderSum                    true  avgt    5   82.620 ±  2.324  ns/op
joernBench.JoernGenerated.orderSum                   false  avgt    5   34.356 ±  1.342  ns/op
joernBench.JoernLegacy.orderSum                       true  avgt    5   58.692 ±  0.553  ns/op
joernBench.JoernLegacy.orderSum                      false  avgt    5   29.085 ±  0.511  ns/op
joernBench.Odb2Generated.orderSumDevirtualized        true  avgt    5   49.812 ±  2.499  ns/op
joernBench.Odb2Generated.orderSumDevirtualized       false  avgt    5    5.504 ±  0.082  ns/op
joernBench.Odb2Generated.orderSumVirtual              true  avgt    5   99.325 ±  2.615  ns/op
joernBench.Odb2Generated.orderSumVirtual             false  avgt    5   10.972 ±  0.449  ns/op

So we can have a 2x speedup from the "devirtualized" optimization. The slowdown for the unshuffled case is presumably due to preventing inlining. The slowdown for the shuffled case is presumably due to branch misses that mess up the pipeline (due to the long reorder buffer, a branch miss can be much more expensive than the 20 cycle sticker price). Some relevant perf stats:

Benchmark                                                               (shuffled)  Mode  Cnt     Score     Error   Units
joernBench.Odb2Generated.orderSumDevirtualized:branch-misses:u                true  avgt          0.008              #/op
joernBench.Odb2Generated.orderSumDevirtualized:branches:u                     true  avgt         13.411              #/op
joernBench.Odb2Generated.orderSumDevirtualized:instructions:u                 true  avgt         78.990              #/op
joernBench.Odb2Generated.orderSumVirtual:branch-misses:u                      true  avgt          0.870              #/op
joernBench.Odb2Generated.orderSumVirtual:branches:u                           true  avgt         34.907              #/op
joernBench.Odb2Generated.orderSumVirtual:instructions:u                       true  avgt        174.542              #/op
joernBench.Odb2Generated.orderSumVirtual:branch-misses:u                     false  avgt          0.004              #/op
joernBench.Odb2Generated.orderSumVirtual:branches:u                          false  avgt         33.225              #/op
joernBench.Odb2Generated.orderSumVirtual:instructions:u                      false  avgt        170.058              #/op

In other words, we see that the devirtualized variant has no branch misses, even in the shuffled case; and due to inlining it can run with only ~80 instructions. In contrast, the virtual variant has a megamorphic call site that prevents inlining and balloons instruction count to ~170; more to the point, it has ~0.9 mis-predicted branches. These are indirect calls (i.e. via branch target buffer, not via branch history table) which is even worse. Unfortunately perfasm output doesn't do what I want for this specific case (I'm probably holding it wrong?).

The benchmark in question looks like

  @Benchmark
  def orderSumVirtual(blackhole: Blackhole): Int = {
    var sumOrder = 0
    for (node <- nodeStart.iterator.asInstanceOf[Iterator[v2.nodes.AstNode]]) {
      sumOrder += node.order
    }
    if (blackhole != null) blackhole.consume(sumOrder)
    sumOrder
  }

  @Benchmark
  def orderSumDevirtualized(blackhole: Blackhole): Int = {
    var sumOrder = 0
    val prop     = nodeStart.head.graph.schema.getPropertyIdByLabel("ORDER")
    for (node <- nodeStart) {
      sumOrder += odb2.Accessors.getNodePropertySingle(node.graph, node.nodeKind, prop, node.seq(), -1)
    }
    if (blackhole != null) blackhole.consume(sumOrder)
    sumOrder
  }

The actual code for .order is similar to what we do in odbv1:

trait AstNode extends StoredNode with AstNodeBase {
  def order: Int
...
}
class Annotation(graph_4762: odb2.Graph, seq_4762: Int) extends StoredNode(graph_4762, 0.toShort, seq_4762) with Expression {
  def order: Int                   = odb2.Accessors.getNodePropertySingle(graph, nodeKind, 41, seq, -1: Int)
...
}

This means that we need to do an invokeinterface in order to grab the property (invokevirtual would be almost as bad).

I'd like to note that the implementation of order is completely independent of the node type. But our class hierarchy contains diamonds where:

  1. Two abstract node types provide the same property
  2. There exist concrete node types that derive from both, leading to a potential conflict.

@bbrehm bbrehm merged commit e73cb77 into master Mar 21, 2023
1 check passed
@delete-merged-branch delete-merged-branch bot deleted the bbrehm/moreBench branch March 21, 2023 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants