-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Remove indirect -> direct taint-flow #12316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Remove indirect -> direct taint-flow #12316
Conversation
4ce148d
to
e9aa017
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In summary (I'm writing this mostly for my own benefit):
- we no longer flow from
op
with*op
andinstr
to*instr
in IR taint tracking. - but we do in the default taint tracking layer instead, to preserve current query results.
- we do better flow through smart pointers, using the existing
PointerWrapper
models. - arrays are now understood better by SSA, rather than being special cased in several places.
- fixed an issue with partial definitions (I understand why you did it, but not how it works).
- fixed some specific cases of indirections we didn't get right.
- fixed some performance issues.
Very thorough changes. I appreciate the extra test cases and work on areas where there were result and performance regressions.
I have not reviewed the overall test changes (yet). --- now done.
Definitely want to see a DCA run on this!
cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-ir-consistency.expected
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaInternals.qll
Outdated
Show resolved
Hide resolved
…direct sources. This means we'll get a lot of duplication since there'll be flow from indirect source -> indirect sink and direct source -> direct sink (which both map to the same expressions). This commit changes the testing so that we only report a duplication when they're at different locations.
for a value of type `void*` (i.e., `p`, `*p` and `**p`) we need to decide on a type for the `**p` value. We will do this in the next commit.
an `UnknownType`.
taint-tracking we need to make sure the affected sink definitions also handle indirect flow.
``` Evaluated relational algebra for predicate _CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs_SsaInternals#50208335::DefO__#shared@ec353boa with tuple counts: 459594 ~0% {2} r1 = JOIN _IRVariable#e9bf30b2::IRVariable::getAst#0#dispred#ff_Parameter#ed81dd8f::Parameter#f#shared WITH SsaInternalsCommon#3c4fa02d::BaseIRVariable::getIRVariable#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg1', Rhs.1 461383 ~2% {3} r2 = JOIN r1 WITH Variable#7a968d4e::Variable::getUnspecifiedType#0#dispred#ff ON FIRST 1 OUTPUT Lhs.0 'arg1', Lhs.1, Rhs.1 477945 ~6% {4} r3 = JOIN r2 WITH Variable#7a968d4e::Variable::getUnspecifiedType#0#dispred#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.0 'arg1', Lhs.1, Rhs.1 346338 ~2% {4} r4 = JOIN r3 WITH SsaInternalsCommon#3c4fa02d::Indirection::getNumberOfIndirections#0#dispred#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1 'arg1', Lhs.2, Rhs.1 'arg2' 178593374 ~0% {4} r5 = JOIN r4 WITH CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'arg1', Lhs.3 'arg2', Rhs.1 'arg3' 934806228 ~0% {4} r6 = JOIN r5 WITH SsaInternals#50208335::DefOrUse::getSourceVariable#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Lhs.2 'arg2', Lhs.3 'arg3' return r6 Tuple counts for _DataFlowPrivate#fbdd7bd7::InstructionNode0#class#ff_SsaInternals#50208335::Def#ff_SsaInternals#5020__#antijoin_rhs/4@305d42l5 after 25.6s: 180185672 ~0% {4} r1 = JOIN _CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs_SsaInternals#50208335::DefO__#shared WITH SsaInternals#50208335::Def#ff ON FIRST 1 OUTPUT Lhs.0 'arg3', Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2' 180185672 ~0% {5} r2 = JOIN r1 WITH SsaInternals#50208335::Def::getValue#0#dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.0 'arg3' 180185672 ~0% {5} r3 = JOIN r2 WITH DataFlowPrivate#fbdd7bd7::InstructionNode0#class#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3' 178459578 ~1% {4} r4 = JOIN r3 WITH project#Instruction#577b6a83::InitializeParameterInstruction#ff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3' return r4 Tuple counts for SsaInternals#7b362d2f::TFinalParameterUse#dom#ff/2@9ff4dbcg after 7.9s: 180185672 ~1% {4} r1 = JOIN _CppType#d1355c92::CppType::hasUnspecifiedType#2#dispred#fff_10#join_rhs_SsaInternals#50208335::DefO__#shared WITH SsaInternals#50208335::Def#ff ON FIRST 1 OUTPUT Lhs.1 'p', Lhs.2, Lhs.3, Lhs.0 1726094 ~0% {4} r2 = r1 AND NOT _DataFlowPrivate#fbdd7bd7::InstructionNode0#class#ff_SsaInternals#50208335::Def#ff_SsaInternals#5020__#antijoin_rhs(Lhs.0 'p', Lhs.1, Lhs.2, Lhs.3) 1726094 ~54% {4} r3 = SCAN r2 OUTPUT In.0 'p', In.1, In.2, 1 1769636 ~54% {5} r4 = JOIN r3 WITH PRIMITIVE range#bbf ON Lhs.3,Lhs.1 1769636 ~45% {4} r5 = SCAN r4 OUTPUT In.2, (In.4 'indirectionIndex' + 1), In.0 'p', In.4 'indirectionIndex' 591253 ~11541% {2} r6 = JOIN r5 WITH SsaInternalsCommon#3c4fa02d::isModifiableAtImpl#2#ff ON FIRST 2 OUTPUT Lhs.2 'p', Lhs.3 'indirectionIndex' 1769636 ~52% {4} r7 = SCAN r4 OUTPUT In.2, In.0 'p', In.4 'indirectionIndex', (In.4 'indirectionIndex' + 1) 1724893 ~41% {5} r8 = JOIN r7 WITH CppType#d1355c92::CppType::hasType#2#dispred#fff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'p', Lhs.0, Lhs.2 'indirectionIndex', Lhs.3 1718843 ~46% {5} r9 = JOIN r8 WITH Type#2e8eb3ef::Type::stripType#0#dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'p', Lhs.2, Lhs.3 'indirectionIndex', Lhs.4 8608 ~0% {5} r10 = JOIN r9 WITH SmartPointer#917721ba::SmartPtr#f ON FIRST 1 OUTPUT Lhs.1 'p', Lhs.2, Lhs.3 'indirectionIndex', Lhs.4, Lhs.0 8608 ~0% {5} r11 = r10 AND NOT PointerWrapper#7cc81d2d::PointerWrapper::pointsToConst#0#dispred#f(Lhs.4) 8608 ~4986% {2} r12 = SCAN r11 OUTPUT In.0 'p', In.2 'indirectionIndex' 599861 ~11711% {2} r13 = r6 UNION r12 return r13 ```
e9aa017
to
31f3504
Compare
cpp/ql/test/query-tests/Security/CWE/CWE-311/semmle/tests/CleartextBufferWrite.expected
Show resolved
Hide resolved
…taint-indirect-direct-conflation
…taint-indirect-direct-conflation
c69ae7f
to
04b8432
Compare
DCA and tests look good. I've verified that the following query result changes are good:
Based on that sample I'm fairly confident in this PR now 🎉. |
CI still shows a test failure for |
LGTM apart from the test failures. I also took a look at the DCA run - performance is fine - there are a lot of result changes! Verified:
|
Oh, yes. I searched for
I see 19 |
That seems the correct number indeed. I apparently mis-counted. |
``` Tuple counts for SsaInternals#7b362d2f::getAPriorDefinition#1#ff/2@bfabfc7o after 11.4s: 1000 ~4% {2} r1 = SCAN Ssa#da392372::Make#SsaInternals#7b362d2f::SsaInput#::Definition::definesAt#3#dispred#ffff OUTPUT In.1, In.0 474321529 ~0% {4} r2 = JOIN r1 WITH SsaInternals#7b362d2f::DefOrUse::hasIndexInBlock#3#dispred#ffff_3012#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.2, Rhs.3, Rhs.1 0 ~0% {2} r3 = JOIN r2 WITH SsaInternals#7b362d2f::SsaCached::lastRefRedef#4#ffff ON FIRST 3 OUTPUT Lhs.3, Rhs.3 0 ~0% {2} r4 = JOIN r3 WITH SsaInternals#7b362d2f::nodeToDefOrUse#3#fff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1 'result' 0 ~0% {2} r5 = JOIN r4 WITH SsaInternals#7b362d2f::ssaDefinition#1#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'defOrUse', Lhs.1 'result' return r5 ```
I fixed CI in 85c7116 (it was actually a bad join that made a stress test fail. Good job, tests! 🎉), which then revealed some new FPs (see 1e5b235), which I then fixed in d93d22b. TLDR: This PR should be good to go now. I've started a new DCA to make sure my last change didn't do anything surprising. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (but wait for DCA + @jketema 's 👍 as well before merging)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (if DCA is ok)
DCA still looks good 🎉 |
484f761
into
github:mathiasvp/replace-ast-with-ir-use-usedataflow
The gist of this PR is super simple (it's the first commit) but it sent me on a wild ride of bugfixing. So now I invite you, my dear reviewer, to join me on this wild ride!
Please review this PR on a commit-by-commit basis.