Skip to content

Conversation

MathiasVP
Copy link
Contributor

The gist of this PR is super simple (it's the first commit) but it sent me on a wild ride of bugfixing. So now I invite you, my dear reviewer, to join me on this wild ride!

Please review this PR on a commit-by-commit basis.

@github-actions github-actions bot added the C++ label Feb 26, 2023
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Feb 26, 2023
@MathiasVP MathiasVP force-pushed the no-taint-indirect-direct-conflation branch from 4ce148d to e9aa017 Compare February 27, 2023 00:01
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In summary (I'm writing this mostly for my own benefit):

  • we no longer flow from op with *op and instr to *instr in IR taint tracking.
  • but we do in the default taint tracking layer instead, to preserve current query results.
  • we do better flow through smart pointers, using the existing PointerWrapper models.
  • arrays are now understood better by SSA, rather than being special cased in several places.
  • fixed an issue with partial definitions (I understand why you did it, but not how it works).
  • fixed some specific cases of indirections we didn't get right.
  • fixed some performance issues.

Very thorough changes. I appreciate the extra test cases and work on areas where there were result and performance regressions.

I have not reviewed the overall test changes (yet). --- now done.

Definitely want to see a DCA run on this!

…direct sources. This means we'll get a lot of duplication since there'll be flow from indirect source -> indirect sink and direct source -> direct sink (which both map to the same expressions). This commit changes the testing so that we only report a duplication when they're at different locations.
for a value of type `void*` (i.e., `p`, `*p` and `**p`) we need to decide
on a type for the `**p` value. We will do this in the next commit.
taint-tracking we need to make sure the affected sink definitions also
handle indirect flow.
```
Evaluated relational algebra for predicate _CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs_SsaInternals#50208335::DefO__#shared@ec353boa with tuple counts:
  459594    ~0%     {2} r1 = JOIN _IRVariable#e9bf30b2::IRVariable::getAst#0#dispred#ff_Parameter#ed81dd8f::Parameter#f#shared WITH SsaInternalsCommon#3c4fa02d::BaseIRVariable::getIRVariable#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg1', Rhs.1
  461383    ~2%     {3} r2 = JOIN r1 WITH Variable#7a968d4e::Variable::getUnspecifiedType#0#dispred#ff ON FIRST 1 OUTPUT Lhs.0 'arg1', Lhs.1, Rhs.1
  477945    ~6%     {4} r3 = JOIN r2 WITH Variable#7a968d4e::Variable::getUnspecifiedType#0#dispred#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.0 'arg1', Lhs.1, Rhs.1
  346338    ~2%     {4} r4 = JOIN r3 WITH SsaInternalsCommon#3c4fa02d::Indirection::getNumberOfIndirections#0#dispred#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1 'arg1', Lhs.2, Rhs.1 'arg2'
  178593374 ~0%     {4} r5 = JOIN r4 WITH CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'arg1', Lhs.3 'arg2', Rhs.1 'arg3'
  934806228 ~0%     {4} r6 = JOIN r5 WITH SsaInternals#50208335::DefOrUse::getSourceVariable#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Lhs.2 'arg2', Lhs.3 'arg3'
                    return r6

Tuple counts for _DataFlowPrivate#fbdd7bd7::InstructionNode0#class#ff_SsaInternals#50208335::Def#ff_SsaInternals#5020__#antijoin_rhs/4@305d42l5 after 25.6s:
  180185672 ~0%     {4} r1 = JOIN _CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs_SsaInternals#50208335::DefO__#shared WITH SsaInternals#50208335::Def#ff ON FIRST 1 OUTPUT Lhs.0 'arg3', Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
  180185672 ~0%     {5} r2 = JOIN r1 WITH SsaInternals#50208335::Def::getValue#0#dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.0 'arg3'
  180185672 ~0%     {5} r3 = JOIN r2 WITH DataFlowPrivate#fbdd7bd7::InstructionNode0#class#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3'
  178459578 ~1%     {4} r4 = JOIN r3 WITH project#Instruction#577b6a83::InitializeParameterInstruction#ff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3'
                    return r4

Tuple counts for SsaInternals#7b362d2f::TFinalParameterUse#dom#ff/2@9ff4dbcg after 7.9s:
  180185672 ~1%         {4} r1 = JOIN _CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs_SsaInternals#50208335::DefO__#shared WITH SsaInternals#50208335::Def#ff ON FIRST 1 OUTPUT Lhs.1 'p', Lhs.2, Lhs.3, Lhs.0
  1726094   ~0%         {4} r2 = r1 AND NOT _DataFlowPrivate#fbdd7bd7::InstructionNode0#class#ff_SsaInternals#50208335::Def#ff_SsaInternals#5020__#antijoin_rhs(Lhs.0 'p', Lhs.1, Lhs.2, Lhs.3)
  1726094   ~54%        {4} r3 = SCAN r2 OUTPUT In.0 'p', In.1, In.2, 1
  1769636   ~54%        {5} r4 = JOIN r3 WITH PRIMITIVE range#bbf ON Lhs.3,Lhs.1

  1769636   ~45%        {4} r5 = SCAN r4 OUTPUT In.2, (In.4 'indirectionIndex' + 1), In.0 'p', In.4 'indirectionIndex'
  591253    ~11541%     {2} r6 = JOIN r5 WITH SsaInternalsCommon#3c4fa02d::isModifiableAtImpl#2#ff ON FIRST 2 OUTPUT Lhs.2 'p', Lhs.3 'indirectionIndex'

  1769636   ~52%        {4} r7 = SCAN r4 OUTPUT In.2, In.0 'p', In.4 'indirectionIndex', (In.4 'indirectionIndex' + 1)
  1724893   ~41%        {5} r8 = JOIN r7 WITH CppType#d1355c92::CppType::hasType#2#dispred#fff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'p', Lhs.0, Lhs.2 'indirectionIndex', Lhs.3
  1718843   ~46%        {5} r9 = JOIN r8 WITH Type#2e8eb3ef::Type::stripType#0#dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'p', Lhs.2, Lhs.3 'indirectionIndex', Lhs.4
  8608      ~0%         {5} r10 = JOIN r9 WITH SmartPointer#917721ba::SmartPtr#f ON FIRST 1 OUTPUT Lhs.1 'p', Lhs.2, Lhs.3 'indirectionIndex', Lhs.4, Lhs.0
  8608      ~0%         {5} r11 = r10 AND NOT PointerWrapper#7cc81d2d::PointerWrapper::pointsToConst#0#dispred#f(Lhs.4)
  8608      ~4986%      {2} r12 = SCAN r11 OUTPUT In.0 'p', In.2 'indirectionIndex'

  599861    ~11711%     {2} r13 = r6 UNION r12
                        return r13
```
@MathiasVP MathiasVP force-pushed the no-taint-indirect-direct-conflation branch from e9aa017 to 31f3504 Compare February 27, 2023 14:57
@MathiasVP MathiasVP force-pushed the no-taint-indirect-direct-conflation branch from c69ae7f to 04b8432 Compare February 28, 2023 00:06
@MathiasVP MathiasVP marked this pull request as ready for review February 28, 2023 00:07
@MathiasVP MathiasVP requested a review from a team as a code owner February 28, 2023 00:07
@MathiasVP
Copy link
Contributor Author

DCA and tests look good. I've verified that the following query result changes are good:

  • cpp/cleartext-transmission on SAMATE
  • cpp/unbounded-write on vim/vim
  • cpp/tainted-format-string on vim/vim

Based on that sample I'm fairly confident in this PR now 🎉.

@jketema
Copy link
Contributor

jketema commented Feb 28, 2023

CI still shows a test failure for Likely Bugs/Format/NonConstantFormat/NonConstantFormat.qlref. I also see more failures on internal tests - I think - than I saw before, did you look at those?

@geoffw0
Copy link
Contributor

geoffw0 commented Feb 28, 2023

LGTM apart from the test failures. I also took a look at the DCA run - performance is fine - there are a lot of result changes! Verified:

  • cpp/cleartext-storage-file for git-linux
  • cpp/cleartext-storage-file for systemd-linux
  • cpp/non-constant-format for vim__vim
  • cpp/path-injection for openjdk-jdk
  • cpp/tainted-format-string-through-global for vim__vim (first two results only; it's fairly likely the various conditions in this extremely complex function make them safe in all actual cases, but the results are plausible).
  • cpp/uncontrolled-allocation-size for git-linux (a lost result; I assume taint was coming in through the argument refs into snapshot->refs, so snapshot->eof and snapshot->start are indeed clean and we're happy to lose this result?)

@MathiasVP
Copy link
Contributor Author

CI still shows a test failure for Likely Bugs/Format/NonConstantFormat/NonConstantFormat.qlref.

Oh, yes. I searched for FAILED(RESULT) to correct CI failures, but didn't search for FAILED(EXECUTION). I'll take a look 👍.

I also see more failures on internal tests - I think - than I saw before, did you look at those?

I see 19 FAILED(RESULT) failures on CI, which I think is consistent with other PRs?

@jketema
Copy link
Contributor

jketema commented Feb 28, 2023

I see 19 FAILED(RESULT) failures on CI, which I think is consistent with other PRs?

That seems the correct number indeed. I apparently mis-counted.

```
Tuple counts for SsaInternals#7b362d2f::getAPriorDefinition#1#ff/2@bfabfc7o after 11.4s:
  1000      ~4%     {2} r1 = SCAN Ssa#da392372::Make#SsaInternals#7b362d2f::SsaInput#::Definition::definesAt#3#dispred#ffff OUTPUT In.1, In.0
  474321529 ~0%     {4} r2 = JOIN r1 WITH SsaInternals#7b362d2f::DefOrUse::hasIndexInBlock#3#dispred#ffff_3012#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.2, Rhs.3, Rhs.1
  0         ~0%     {2} r3 = JOIN r2 WITH SsaInternals#7b362d2f::SsaCached::lastRefRedef#4#ffff ON FIRST 3 OUTPUT Lhs.3, Rhs.3
  0         ~0%     {2} r4 = JOIN r3 WITH SsaInternals#7b362d2f::nodeToDefOrUse#3#fff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1 'result'
  0         ~0%     {2} r5 = JOIN r4 WITH SsaInternals#7b362d2f::ssaDefinition#1#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'defOrUse', Lhs.1 'result'
                    return r5
```
@MathiasVP
Copy link
Contributor Author

I fixed CI in 85c7116 (it was actually a bad join that made a stress test fail. Good job, tests! 🎉), which then revealed some new FPs (see 1e5b235), which I then fixed in d93d22b.

TLDR: This PR should be good to go now. I've started a new DCA to make sure my last change didn't do anything surprising.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (but wait for DCA + @jketema 's 👍 as well before merging)

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (if DCA is ok)

@MathiasVP
Copy link
Contributor Author

DCA still looks good 🎉

@MathiasVP MathiasVP merged commit 484f761 into github:mathiasvp/replace-ast-with-ir-use-usedataflow Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants