C++: Fix join-order in HttpStringLiteral charpred #7426

MathiasVP
Contributor

The charpred (characteristic predicate) of HttpStringLiteral performed well on almost every project on LGTM. However, its join order turned out not to be the best possible one. This caused a single outlier on LGTM: on https://github.com/e-ago/bitcracker, the predicate took over 45 minutes to evaluate.

This PR fixes the join order.
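To make the join-order point concrete, here is a minimal Python sketch on made-up relations. It is not CodeQL's evaluator, and it simplifies the real predicates. The names `literals`, `values` and `taint` and the shortened regex are hypothetical stand-ins for Literal::StringLiteral#f, Expr::Expr::getValue_dispred#ff and the local-taint relation. Both orders compute the same answer, but only the filter-first order keeps the intermediate join small.

```python
import re
from collections import defaultdict

# Hypothetical, shortened stand-in for the private-host regex in the tuple counts.
PRIVATE_HOST = re.compile(r"(?i)localhost(?:[:/?#].*)?|127\.0\.0\.1(?:[:/?#].*)?")


def index_by_source(pairs):
    """Hash-index a binary relation on its first column (as a hash join would)."""
    idx = defaultdict(list)
    for src, dst in pairs:
        idx[src].append(dst)
    return idx


def taint_then_filter(literals, values, taint):
    """Old shape: join all literals with taint edges, apply the regex last."""
    taint_idx = index_by_source(taint)
    joined = [(src, dst) for src in literals for dst in taint_idx[src]]
    result = {dst for src, dst in joined if PRIVATE_HOST.fullmatch(values[src])}
    return result, len(joined)


def filter_then_taint(literals, values, taint):
    """New shape: apply the selective regex first, then join with taint edges."""
    taint_idx = index_by_source(taint)
    matching = [src for src in literals if PRIVATE_HOST.fullmatch(values[src])]
    joined = [(src, dst) for src in matching for dst in taint_idx[src]]
    return {dst for _, dst in joined}, len(joined)


def toy_relations(n_literals=1000, fanout=100):
    """Made-up data: one private-host literal; every literal taints `fanout` nodes."""
    literals = list(range(n_literals))
    values = {i: "https://example.com/%d" % i for i in literals}
    values[0] = "localhost:8080"
    taint = [(i, n_literals + i * fanout + j) for i in literals for j in range(fanout)]
    return literals, values, taint


if __name__ == "__main__":
    rels = toy_relations()
    old_result, old_size = taint_then_filter(*rels)
    new_result, new_size = filter_then_taint(*rels)
    assert old_result == new_result
    print("intermediate tuples: old", old_size, "new", new_size)
```

Running it prints `intermediate tuples: old 100000 new 100`. The regex check is cheap and very selective, so running it before the taint join is what shrinks the work.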

On main (on https://github.com/e-ago/bitcracker):

Tuple counts for UseOfHttp::HttpStringLiteral#class#f#antijoin_rhs/2@d95908b5 after 45m50s:
  65980      ~0%     {3} r1 = JOIN quick_eval#f#shared WITH Expr::Expr::getValue_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.0 'arg0', Lhs.1 'arg1'
                      
  131968     ~1%     {3} r2 = JOIN quick_eval#f#shared#1 WITH #Expr::Expr::getParent_dispredPlus#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'arg0', Lhs.1 'arg1'
                      
  197948     ~0%     {3} r3 = r1 UNION r2
  4295033318 ~0%     {3} r4 = JOIN r3 WITH TaintTrackingUtil::localExprTaint#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1'
  4295033313 ~0%     {3} r5 = JOIN r4 WITH Literal::StringLiteral#f ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'arg0', Lhs.2 'arg1'
  4295033313 ~0%     {4} r6 = JOIN r5 WITH Expr::Expr::getValue_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Rhs.1, "(?i)localhost(?:[:/?#].*)?|127\\.0\\.0\\.1(?:[:/?#].*)?|10(?:\\.[0-9]+){3}(?:[:/?#].*)?|172\\.16(?:\\.[0-9]+){2}(?:[:/?#].*)?|192.168(?:\\.[0-9]+){2}(?:[:/?#].*)?|\\[?0:0:0:0:0:0:0:1\\]?(?:[:/?#].*)?|\\[?::1\\]?(?:[:/?#].*)?"
  0          ~0%     {4} r7 = SELECT r6 ON In.2 matches_constant "(?i)localhost(?:[:/?#].*)?|127\.0\.0\.1(?:[:/?#].*)?|10(?:\.[0-9]+){3}(?:[:/?#].*)?|172\.16(?:\.[0-9]+){2}(?:[:/?#].*)?|192.168(?:\.[0-9]+){2}(?:[:/?#].*)?|\[?0:0:0:0:0:0:0:1\]?(?:[:/?#].*)?|\[?::1\]?(?:[:/?#].*)?"
  0          ~0%     {2} r8 = SCAN r7 OUTPUT In.0 'arg0', In.1 'arg1'
  return r8
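For scale, here is some quick arithmetic on the counts above (derived from the log, not printed by the evaluator). r3 is the union of r1 and r2. Joining it with the local-taint relation multiplies it by roughly twenty thousand on average, and then the regex SELECT in r7 discards every one of those tuples:

```latex
\begin{aligned}
|r_3| &= |r_1| + |r_2| = 65\,980 + 131\,968 = 197\,948 \\
\frac{|r_4|}{|r_3|} &= \frac{4\,295\,033\,318}{197\,948} \approx 2.17 \times 10^{4} \\
|r_7| &= |r_8| = 0
\end{aligned}
```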

On this PR:

Tuple counts for UseOfHttp::privateHostNameFlowsToExpr#f/1@8e3d0b6e after 122ms:
  65980 ~0%     {3} r1 = JOIN Literal::StringLiteral#f WITH Expr::Expr::getValue_dispred#ff ON FIRST 1 OUTPUT Lhs.0, Rhs.1, "(?i)localhost(?:[:/?#].*)?|127\\.0\\.0\\.1(?:[:/?#].*)?|10(?:\\.[0-9]+){3}(?:[:/?#].*)?|172\\.16(?:\\.[0-9]+){2}(?:[:/?#].*)?|192.168(?:\\.[0-9]+){2}(?:[:/?#].*)?|\\[?0:0:0:0:0:0:0:1\\]?(?:[:/?#].*)?|\\[?::1\\]?(?:[:/?#].*)?"
  0     ~0%     {3} r2 = SELECT r1 ON In.1 matches_constant "(?i)localhost(?:[:/?#].*)?|127\.0\.0\.1(?:[:/?#].*)?|10(?:\.[0-9]+){3}(?:[:/?#].*)?|172\.16(?:\.[0-9]+){2}(?:[:/?#].*)?|192.168(?:\.[0-9]+){2}(?:[:/?#].*)?|\[?0:0:0:0:0:0:0:1\]?(?:[:/?#].*)?|\[?::1\]?(?:[:/?#].*)?"
  0     ~0%     {1} r3 = SCAN r2 OUTPUT In.0
  0     ~0%     {1} r4 = JOIN r3 WITH DataFlowUtil::TExprNode#ff ON FIRST 1 OUTPUT Rhs.1
  0     ~0%     {1} r5 = JOIN r4 WITH TaintTrackingUtil::localTaint#bf ON FIRST 1 OUTPUT Rhs.1
  0     ~0%     {1} r6 = JOIN r5 WITH DataFlowUtil::TExprNode#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e'
  return r6
...
Inferred that UseOfHttp::HttpStringLiteral#class#f#antijoin_rhs/1@f2b613a7 is empty, due to UseOfHttp::privateHostNameFlowsToExpr#f/1@8e3d0b6e.

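Well, that fixed it. But I'm not super happy with this as a validation of the performance yet 😂. On this project privateHostNameFlowsToExpr is empty, so everything after the regex SELECT sees 0 tuples.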

Here's another project, picked at random (php), showing that the join order is also fine when UseOfHttp::privateHostNameFlowsToExpr#f is non-empty:

Tuple counts for UseOfHttp::privateHostNameFlowsToExpr#f/1@2b0c936l after 41ms:
  40478 ~0%     {3} r1 = JOIN Literal::StringLiteral#f WITH Expr::Expr::getValue_dispred#ff ON FIRST 1 OUTPUT Lhs.0, Rhs.1, "(?i)localhost(?:[:/?#].*)?|127\\.0\\.0\\.1(?:[:/?#].*)?|10(?:\\.[0-9]+){3}(?:[:/?#].*)?|172\\.16(?:\\.[0-9]+){2}(?:[:/?#].*)?|192.168(?:\\.[0-9]+){2}(?:[:/?#].*)?|\\[?0:0:0:0:0:0:0:1\\]?(?:[:/?#].*)?|\\[?::1\\]?(?:[:/?#].*)?"
  4     ~0%     {3} r2 = SELECT r1 ON In.1 matches_constant "(?i)localhost(?:[:/?#].*)?|127\.0\.0\.1(?:[:/?#].*)?|10(?:\.[0-9]+){3}(?:[:/?#].*)?|172\.16(?:\.[0-9]+){2}(?:[:/?#].*)?|192.168(?:\.[0-9]+){2}(?:[:/?#].*)?|\[?0:0:0:0:0:0:0:1\]?(?:[:/?#].*)?|\[?::1\]?(?:[:/?#].*)?"
  4     ~0%     {1} r3 = SCAN r2 OUTPUT In.0
  4     ~0%     {1} r4 = JOIN r3 WITH DataFlowUtil::TExprNode#ff ON FIRST 1 OUTPUT Rhs.1
  16    ~0%     {1} r5 = JOIN r4 WITH TaintTrackingUtil::localTaint#bf ON FIRST 1 OUTPUT Rhs.1
  16    ~6%     {1} r6 = JOIN r5 WITH DataFlowUtil::TExprNode#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e'
  return r6
...
Tuple counts for UseOfHttp::HttpStringLiteral#class#f#antijoin_rhs/1@237e621b after 11ms:
  40478  ~0%       {2} r1 = JOIN Literal::StringLiteral#f WITH Literal::StringLiteral#f ON FIRST 1 OUTPUT Rhs.0 'arg0', Lhs.0 'arg0'
  40478  ~0%       {2} r2 = JOIN r1 WITH Literal::StringLiteral#f ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'arg0'
                      
  4      ~0%       {1} r3 = JOIN r2 WITH UseOfHttp::privateHostNameFlowsToExpr#f ON FIRST 1 OUTPUT Lhs.1 'arg0'
                      
  127947 ~2%       {2} r4 = JOIN r2 WITH #Expr::Expr::getParent_dispredPlus#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0'
  4      ~100%     {1} r5 = JOIN r4 WITH UseOfHttp::privateHostNameFlowsToExpr#f ON FIRST 1 OUTPUT Lhs.1 'arg0'
                      
  8      ~100%     {1} r6 = r3 UNION r5
  return r6
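Reading r1 to r6 above, the antijoin relation appears to collect every string literal that meets either of two conditions. Either the literal is itself in privateHostNameFlowsToExpr (r3), or one of its transitive parents (getParent+) is (r4, r5). Those literals are then excluded from HttpStringLiteral. The following Python sketch models that reading on a hypothetical toy AST, with dicts and sets standing in for the QL relations. It is an interpretation of the evaluator plan, not the query source.

```python
def transitive_parents(expr, parent):
    """Return every ancestor of `expr` (analogous to getParent+()).

    Assumes `parent` encodes a tree (AST parents are acyclic); a cycle
    would make this loop forever.
    """
    result = []
    while expr in parent:
        expr = parent[expr]
        result.append(expr)
    return result


def antijoin_rhs(string_literals, parent, private_flow_targets):
    """Literals that are, or are nested below, a private-host taint target."""
    excluded = set()
    for lit in string_literals:
        itself = lit in private_flow_targets  # corresponds to r3
        via_parent = any(                     # corresponds to r4/r5
            a in private_flow_targets for a in transitive_parents(lit, parent)
        )
        if itself or via_parent:              # r6 = r3 UNION r5
            excluded.add(lit)
    return excluded


# Hypothetical toy AST: literal 1 sits under 10, which sits under 20;
# literal 2 sits under 30; literal 3 has no parent.
parent = {1: 10, 10: 20, 2: 30}
print(sorted(antijoin_rhs([1, 2, 3], parent, {20, 3})))  # [1, 3]
```

This also shows why the php numbers stay small. The expensive getParent+ expansion (r4) only ever meets the handful of tuples in privateHostNameFlowsToExpr.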

@MathiasVP MathiasVP added the C++ label Dec 16, 2021
@MathiasVP MathiasVP requested a review from a team as a code owner December 16, 2021 17:54
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Dec 16, 2021
@aschackmull aschackmull merged commit 3adc0b5 into github:main Dec 17, 2021
@geoffw0
Contributor

geoffw0 commented Jan 4, 2022

Thanks for fixing this while I was away!
