C++: Promote insecure removal of memset query #5256

MathiasVP · 2021-02-24T16:10:56Z

Fixes https://github.com/github/codeql-c-analysis-team/issues/236.

A couple of LGTM links are in order:

This run shows the results on the usual LGTM projects we consider. All these results look like true positives to me.
This run compares the new query with the results from @dbartol's old version of the query.
It seems like that query has a high false-positive rate, so I only used it to check whether we flag something that the old query didn't flag (i.e., we obviously missed something).
This run compares the new query with the experimental version of this query. The new results are neither a subset nor a superset of the old ones. For instance:
- We sadly lose some true-positives encryption results on bitcoin/bitcoin and redis/redis.
- Interestingly, we gain a true positive on alibaba/AliSQL, and lose a true positive on the very next line.

I think all of these can all be recovered by teaching variableAddressEscapesTree about AliasFunction's parameterNeverEscapes predicate. But I think the query is good enough for promotion as it is now.

While the issue says that we should:

[...] only focus on memsets that appear as the very last thing in a function body.

my initial strategy for doing this was to specify that there are no more mentions of the variable in the function body. This turned out to also catch a lot of good results where the memset isn't the last thing in the body, but merely the last mention of the variable. I think this strategy:

Catches more results that we care about, and
is also the more concise than other ways of specifying "happens as the last thing in the function".

CPP-differences: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1755/

… from the other memset PRs.

geoffw0

Good qhelp and fantastic tests! I've yet to review the query itself...

cpp/change-notes/2021-02-24-memset-may-be-deleted.md

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.qhelp

geoffw0 · 2021-02-25T10:08:32Z

cpp/ql/test/query-tests/Security/CWE/CWE-014/test.c

+
+// x86-64 gcc 9.2: deleted
+// x86-64 clang 9.0.0: deleted
+// x64 msvc v19.14 (WINE): deleted


Love these annotations!

cpp/ql/test/query-tests/Security/CWE/CWE-014/test.c

geoffw0 · 2021-02-25T10:25:01Z

cpp/ql/test/query-tests/Security/CWE/CWE-014/test.cpp

+void nobadFunc2_0_1(){
+	unsigned char buff1[PW_SIZE];
+	memset(buff1, '\0', sizeof(buff1));
+	memset(buff1, 0, PW_SIZE); // GOOD


What are we trying to test here? Theoretically either (but not both) could be deleted, right?

Yeah, I think this was my attempt at porting a test from the experimental query in a way that didn't overlap with the tests we already had through a combination of the draft query + the tests added by the experimental-query author. Indeed, one of them (but not both) could be deleted.

cpp/ql/test/query-tests/Security/CWE/CWE-014/test.cpp

geoffw0

The query looks like a good, conservative starting point appropriate for code scanning. It would be nice to revisit some of the simpler bad test cases we do not flag at some point, provided we can maintain the accuracy we have now.

geoffw0 · 2021-02-25T10:33:27Z

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

+    or
+    this.hasGlobalOrStdName("wmemset")
+    or
+    this.hasGlobalName(["bzero", "__builtin_memset"])


Stuff like this makes me wonder if the private models like MemsetFunction (models of clearly defined sets of functions that is) should in fact be public, or have public interfaces?

Indeed, I think we should have a public interface expressing something like: "A function that writes a value x to the first n entries of a given pointer".

This case is a bit special because there are functions like bzero_explicit which fits exactly this interface, but which we don't want to include in this query.

Related: https://github.com/github/codeql-c-analysis-team/issues/225

This case is a bit special because there are functions like bzero_explicit which fits exactly this interface, but which we don't want to include in this query.

I think that's a strong reason not to use shared models in this case.

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

…and add a testcase for this.

MathiasVP · 2021-02-25T12:44:03Z

CPP-differences: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1755/

Performance looks good. The new results also look like true positives. No sensitive data as far as I can tell, though. It's all just zero'ing buffers after they're declared, but afterward are never actually used for anything.

geoffw0 · 2021-02-25T13:48:54Z

cpp-differences results have arrived and LGTM. They're all results we've seen on LGTM already I think.

I appreciate the effort you've put into gathering real world results and diffs by the way. They appear to be a mix of potential security issues, and clears of buffers that are simply unused. There's a difference in severity, but I'm happy that both types of results are reported. I didn't see any false positives.

jbj

First pass: I've looked at the QL file only.

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

jbj · 2021-02-25T13:59:51Z

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

+    or
+    this.hasGlobalOrStdName("wmemset")
+    or
+    this.hasGlobalName(["bzero", "__builtin_memset"])


This case is a bit special because there are functions like bzero_explicit which fits exactly this interface, but which we don't want to include in this query.

I think that's a strong reason not to use shared models in this case.

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

jbj · 2021-02-25T14:03:54Z

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

+  // There is no later use of `v`.
+  not v.getAnAccess() = call.getASuccessor*() and


I'm nervous about the performance of TC'ing the successor relation in the CFG. Using basic blocks would help here. But if the optimizer pushes enough context into the starting points of this TC, perhaps it's fine.

I was nervous about that too, but it looks like magic saved us:

m##ControlFlowGraph::ControlFlowNode::getASuccessor_dispred#bfPlus(unique int this) :- {3} r1 = SCAN Access::Access::getTarget_dispred#ff OUTPUT 0, In.0, In.1 {2} r2 = JOIN r1 WITH Call::Call::getArgument_dispred#fff_120#join_rhs ON FIRST 2 OUTPUT Lhs.2, Rhs.2 'this' {2} r3 = JOIN r2 WITH Variable::Variable::getUnspecifiedType_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'this' {1} r4 = JOIN r3 WITH Type::ArrayType#class#f ON FIRST 1 OUTPUT Lhs.1 'this' {2} r5 = SCAN Variable::Variable::getUnspecifiedType_dispred#ff OUTPUT In.1, In.0 {1} r6 = JOIN r5 WITH Struct::Struct#class#f ON FIRST 1 OUTPUT Lhs.1 {1} r7 = JOIN r6 WITH project#Access::Access::getTarget_dispred#ff ON FIRST 1 OUTPUT Lhs.0 {2} r8 = JOIN r7 WITH Expr::AddressOfExpr::getAddressable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT 0, Rhs.1 {1} r9 = JOIN r8 WITH Call::Call::getArgument_dispred#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this' {1} r10 = r4 UNION r9 return r10 EVALUATE RECURSIVE LAYER: rec #ControlFlowGraph::ControlFlowNode::getASuccessor_dispred#bfPlus(int this, int result) :- [[[ BASE CASE ]]] {2} r1 = JOIN m##ControlFlowGraph::ControlFlowNode::getASuccessor_dispred#bfPlus WITH ConstantExprs::Cached::successors_adapted#ff@staged_ext ON FIRST 1 OUTPUT Lhs.0 'this', Rhs.1 'result' return r1 [[[ SEMINAIVE VARIANT - STANDARD ORDER]]] {2} r2 = SCAN #ControlFlowGraph::ControlFlowNode::getASuccessor_dispred#bfPlus#prev_delta OUTPUT In.1, In.0 'this' {2} r3 = JOIN r2 WITH ConstantExprs::Cached::successors_adapted#ff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'this', Rhs.1 'result' {2} r4 = r3 AND NOT #ControlFlowGraph::ControlFlowNode::getASuccessor_dispred#bfPlus#prev(Lhs.0 'this', Lhs.1 'result') return r4

I can rewrite this to use basic blocks if we don't want to rely on optimizations.

jbj · 2021-02-25T14:11:32Z

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

+  // `v` only escapes as the argument to `memset`.
+  forall(Expr escape | variableAddressEscapesTree(v.getAnAccess(), escape) |
+    call.getArgument(0) = escape.getUnconverted()
+  ) and


I think this requirement is too strong, and it's likely the reason why CPP-Differences only shows memset calls to variables that aren't used at all. Most buffers will be arguments to memcpy or printf or something like that, and their address will escape there.

A simple solution could be to assume that every call to functions other than memset will read from v if it escapes. This can be improved by using our models library.

I think this requirement is too strong, and it's likely the reason why CPP-Differences only shows memset calls to variables that aren't used at all. Most buffers will be arguments to memcpy or printf or something like that, and their address will escape there.

That's a good point. I realized that this was a very strict requirement, but I hadn't made the connection to this affecting the results on CPP-differences.

A simple solution could be to assume that every call to functions other than memset will read from v if it escapes. This can be improved by using our models library.

Yeah, I actually considered this approach. But I figured that it would be better to just teach variableAddressEscapesTree about the AliasFunction model after getting this query merged. But I see that there's not much value in the query without this additional model information.

Here's an LGTM run that compares the query in the PR with a version that uses the solution you proposed (at least I think it is) with support from the models library: https://lgtm.com/query/2490354227721723019/ It produces actual good (i.e., sensitive-data related memsets) results on 3 projects.

But I figured that it would be better to just teach variableAddressEscapesTree about the AliasFunction model after getting this query merged.

That eventual enhancement should probably go in a new variableAddressEscapes predicate (without the Tree) since some callers of the original variableAddressEscapesTree don't use it as an escape analysis but are concerned precisely with the question of whether the variable address ends up at the top of some expression tree. An example of such a caller is the one I proposed in #5256 (comment).

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

Co-authored-by: Jonas Jensen <jbj@github.com>

…he local variable being used in memset.

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

Co-authored-by: intrigus-lgtm <60750685+intrigus-lgtm@users.noreply.github.com>

…le for something. Otherwise it looks like a zero-initialization of a buffer, which the query now tries to exclude.

jbj

The query finds good results now and seems robust. I'd like to merge it after another CPP-Differences run.

cpp/change-notes/2021-02-24-memset-may-be-deleted.md

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted-bad.c

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.qhelp

MathiasVP · 2021-02-26T08:47:50Z

The query finds good results now and seems robust. I'd like to merge it after another CPP-Differences run.

Should be ready within a few hours: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1761/

Co-authored-by: Jonas Jensen <jbj@github.com>

jbj · 2021-02-26T09:17:55Z

@github/docs-content-codeql, please review the qhelp, name and description of this new query.

MathiasVP · 2021-02-26T09:53:19Z

Should be ready within a few hours: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1761/

CPP-differences looks good:

No performance concerns.
The query now only flags two calls to memset (both in Wireshark), and they both look like true positives to me.

docs-bot · 2021-02-26T10:06:30Z

📚 Thanks for the docs ping! 🛎️ This was added to our docs first-responder project board. A team member will be along shortly to respond. To request changes to the docs you can also open a CodeQL docs issue.

geoffw0 · 2021-02-26T11:23:31Z

There's been a lot of discussion above, so for clarity: all of my comments above have been addressed and/or should not be considered blocking.

shati-patel

A few small editorial suggestions, otherwise LGTM from a docs perspective 📝

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.qhelp

cpp/change-notes/2021-02-24-memset-may-be-deleted.md

Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>

shati-patel

Re-approved 👍🏽

MathiasVP added the C++ label Feb 24, 2021

MathiasVP requested a review from jbj February 24, 2021 16:10

MathiasVP requested a review from a team as a code owner February 24, 2021 16:10

github-actions bot added the documentation label Feb 24, 2021

MathiasVP force-pushed the promote-insecure-memset-query branch from 8604784 to cbeeed2 Compare February 24, 2021 16:43

MathiasVP added 3 commits February 24, 2021 18:01

C++: Promote memset query out of experimental.

c44fbaa

C++: Move tests out of experimental and merge with old existing tests…

ef8b734

… from the other memset PRs.

C++: Add change-note.

70a953b

MathiasVP force-pushed the promote-insecure-memset-query branch from cbeeed2 to 70a953b Compare February 24, 2021 17:02

geoffw0 reviewed Feb 25, 2021

View reviewed changes

C++: Fix test annotations. Also exclude static locals from the query …

d332093

…and add a testcase for this.

jbj reviewed Feb 25, 2021

View reviewed changes

MathiasVP and others added 2 commits February 25, 2021 15:48

Update cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

3f26b29

Co-authored-by: Jonas Jensen <jbj@github.com>

C++: Respond to review comments. Relax the escaping requirements on t…

9e7c9d0

…he local variable being used in memset.

intrigus-lgtm reviewed Feb 25, 2021

View reviewed changes

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql Outdated Show resolved Hide resolved

MathiasVP dismissed a stale review via 2777ca4 February 25, 2021 18:49

MathiasVP and others added 3 commits February 25, 2021 19:49

Update cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.ql

2777ca4

Co-authored-by: intrigus-lgtm <60750685+intrigus-lgtm@users.noreply.github.com>

C++: Exclude memsets that clear a variable that has no other uses.

faadcd9

C++: Make the tests more realistic by actually using the local variab…

72daf2e

…le for something. Otherwise it looks like a zero-initialization of a buffer, which the query now tries to exclude.

jbj reviewed Feb 26, 2021

View reviewed changes

cpp/change-notes/2021-02-24-memset-may-be-deleted.md Outdated Show resolved Hide resolved

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted-bad.c Outdated Show resolved Hide resolved

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.qhelp Show resolved Hide resolved

MathiasVP and others added 2 commits February 26, 2021 09:48

Update cpp/change-notes/2021-02-24-memset-may-be-deleted.md

4e4ffbd

Co-authored-by: Jonas Jensen <jbj@github.com>

C++: Respond to review comments.

42d2a67

shati-patel previously approved these changes Feb 26, 2021

View reviewed changes

cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.qhelp Outdated Show resolved Hide resolved

cpp/change-notes/2021-02-24-memset-may-be-deleted.md Outdated Show resolved Hide resolved

MathiasVP dismissed shati-patel’s stale review via 0f72567 February 26, 2021 18:16

MathiasVP and others added 2 commits February 26, 2021 19:16

Update cpp/ql/src/Security/CWE/CWE-014/MemsetMayBeDeleted.qhelp

0f72567

Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>

Update cpp/change-notes/2021-02-24-memset-may-be-deleted.md

d4f7fab

Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>

shati-patel approved these changes Feb 26, 2021

View reviewed changes

jbj merged commit 208a374 into github:main Mar 1, 2021

smowton mentioned this pull request Mar 3, 2021

Java: Add backward dataflow edges through fluent function invocations. #5219

Merged

MathiasVP mentioned this pull request Apr 20, 2021

C++: Fix false positive in cpp/return-stack-allocated-memory #5724

Closed

		// There is no later use of `v`.
		not v.getAnAccess() = call.getASuccessor*() and

C++: Promote insecure removal of memset query #5256

C++: Promote insecure removal of memset query #5256

Uh oh!

Conversation

MathiasVP commented Feb 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MathiasVP Feb 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MathiasVP commented Feb 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geoffw0 commented Feb 25, 2021

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MathiasVP commented Feb 26, 2021

Uh oh!

jbj commented Feb 26, 2021

Uh oh!

MathiasVP commented Feb 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

docs-bot commented Feb 26, 2021

Uh oh!

geoffw0 commented Feb 26, 2021

Uh oh!

MathiasVP commented Feb 24, 2021 •

edited

Loading

MathiasVP Feb 25, 2021 •

edited

Loading

MathiasVP commented Feb 25, 2021 •

edited

Loading

MathiasVP commented Feb 26, 2021 •

edited

Loading