Skip to content

Conversation

jbj
Copy link
Contributor

@jbj jbj commented Nov 9, 2020

A performance regression in definitionByReferenceNodeFromArgument#ff was ultimately caused by a join on parameter indexes in DefinitionByReferenceNode.getArgument. Joining on numbers in QL is always fragile, and somehow the changes in #4432 had caused the join order here to break.

Instead of tweaking the join order in the slow predicate itself, I added pragma[noinline] to one of the predicates involved in the join on parameter indexes. This should prevent us from getting similar performance problems in the future when we write code that joins on parameter numbers. Joining on indexes is always risky, but it's even more risky when one of the predicates in the join is inlined by the compiler and expands to further joins.

I tested performance by running CgiXss.ql on a ChakraCore snapshot. Tuple counts before (I interrupted execution after five minutes or so):

(626s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff:
58162      ~0%     {3} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0>
26934      ~0%     {2} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<2>
26934      ~1%     {2} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
26850      ~1%     {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r3.<1>
26850      ~0%     {2} r5 = JOIN r4 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>
26850      ~1%     {2} r6 = JOIN r5 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r5.<1>
58162      ~0%     {2} r7 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0>
58162      ~4%     {3} r8 = JOIN r7 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>, r7.<0>
4026581120 ~0%     {4} r9 = JOIN r8 WITH Instruction::CallInstruction::getPositionalArgumentOperand_dispred#fff_102#join_rhs AS R ON FIRST 1 OUTPUT r8.<2>, R.<1>, r8.<1>, R.<2>
31154      ~4%     {2} r10 = JOIN r9 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 2 OUTPUT r9.<3>, r9.<2>
31154      ~8%     {2} r11 = JOIN r10 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1>
31154      ~0%     {2} r12 = JOIN r11 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r11.<1>
58004      ~0%     {2} r13 = r6 \/ r12
                   return r13

Tuple counts after:

(0s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff:
385785  ~6%     {2} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0>
385785  ~0%     {3} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>
385785  ~1%     {3} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<2>, r2.<1>
198736  ~4%     {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getPositionalArgument#fff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2>
198736  ~0%     {2} r5 = JOIN r4 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>
385785  ~1%     {3} r6 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0>
186891  ~1%     {2} r7 = JOIN r6 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<2>
186891  ~2%     {2} r8 = JOIN r7 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>
183201  ~3%     {2} r9 = JOIN r8 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r8.<1>
183201  ~0%     {2} r10 = JOIN r9 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r9.<1>
175449  ~8%     {2} r11 = JOIN r10 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1>
374185  ~3%     {2} r12 = r5 \/ r11
                return r12

A performance regression in `definitionByReferenceNodeFromArgument#ff`
was ultimately caused by a join on parameter indexes in
`DefinitionByReferenceNode.getArgument`. Joining on numbers in QL is
always fragile, and somehow the changes in github#4432 had caused the join
order here to break.

Instead of tweaking the join order in the slow predicate itself, I added
`pragma[noinline]` to one of the predicates involved in the join on
parameter indexes. This should prevent us from getting similar
performance problems in the future when we write code that joins on
parameter numbers. Joining on indexes is always risky, but it's even
more risky when one of the predicates in the join is inlined by the
compiler and expands to further joins.

I tested performance by running `CgiXss.ql` on a ChakraCore snapshot.
Tuple counts before (I interrupted execution after five minutes or so):

    (626s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff:
    58162      ~0%     {3} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0>
    26934      ~0%     {2} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<2>
    26934      ~1%     {2} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
    26850      ~1%     {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r3.<1>
    26850      ~0%     {2} r5 = JOIN r4 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>
    26850      ~1%     {2} r6 = JOIN r5 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r5.<1>
    58162      ~0%     {2} r7 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0>
    58162      ~4%     {3} r8 = JOIN r7 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>, r7.<0>
    4026581120 ~0%     {4} r9 = JOIN r8 WITH Instruction::CallInstruction::getPositionalArgumentOperand_dispred#fff_102#join_rhs AS R ON FIRST 1 OUTPUT r8.<2>, R.<1>, r8.<1>, R.<2>
    31154      ~4%     {2} r10 = JOIN r9 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 2 OUTPUT r9.<3>, r9.<2>
    31154      ~8%     {2} r11 = JOIN r10 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1>
    31154      ~0%     {2} r12 = JOIN r11 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r11.<1>
    58004      ~0%     {2} r13 = r6 \/ r12
                       return r13

Tuple counts after:

    (0s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff:
    385785  ~6%     {2} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0>
    385785  ~0%     {3} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>
    385785  ~1%     {3} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<2>, r2.<1>
    198736  ~4%     {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getPositionalArgument#fff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2>
    198736  ~0%     {2} r5 = JOIN r4 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>
    385785  ~1%     {3} r6 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0>
    186891  ~1%     {2} r7 = JOIN r6 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<2>
    186891  ~2%     {2} r8 = JOIN r7 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>
    183201  ~3%     {2} r9 = JOIN r8 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r8.<1>
    183201  ~0%     {2} r10 = JOIN r9 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r9.<1>
    175449  ~8%     {2} r11 = JOIN r10 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1>
    374185  ~3%     {2} r12 = r5 \/ r11
                    return r12
@jbj jbj added the C++ label Nov 9, 2020
@jbj jbj requested review from a team as code owners November 9, 2020 08:02
@github-actions github-actions bot added the C# label Nov 9, 2020
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP
Copy link
Contributor

Merging now to avoid having a bad main for too long.

@MathiasVP MathiasVP merged commit 25ba6ca into github:main Nov 9, 2020
@jbj
Copy link
Contributor Author

jbj commented Nov 9, 2020

I've started https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1545/ to check if we're back to baseline performance. This job compares the last good CPP-Changes reference (5f1c91f04/92494441a) with the latest internal/external main branches.

@@ -1585,6 +1586,7 @@ class CallInstruction extends Instruction {
/**
* Gets the argument at the specified index.
*/
pragma[noinline]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to add the pragma[noinline] to this predicate? Is annotating the other predicate not enough to avoid the problem if this predicate is inlined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel more confident claiming that the other predicate, getPositionalArgumentOperand, can do without pragma[noinline]. The slow version of definitionByReferenceNodeFromArgument has getPositionalArgumentOperand in its tuple counts, so that predicate seems to not have been inlined.

In any case, I don't think it was strictly necessary to put the pragma on both predicates. I did it anyway because I worried we'd otherwise have a similar performance problem the next time we try and do a slightly different thing with these predicates. Joining on parameter indexes is always risky, and I wanted to reduce that risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants