-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Add some additional uncontrolled format string tests #11958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These duplicate the `i9` and `i91` tests slightly earlier in the same file, but use an explicit `if` instead of the ternary operator.
{ | ||
char b[64]; | ||
char *bp = &b[0]; | ||
char *t; | ||
if (0) { | ||
t = 0; | ||
} else { | ||
t = bp; | ||
} | ||
memcpy(t, argv[1] + 1, 1); | ||
printf(bp); | ||
printWrapper(bp); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the reason we don't catch these is because dataflow doesn't do a good job at alias analysis: We're never really good at catching flows like:
// establish alias between p and q
// p = source();
// sink(q);
since dataflow only considers paths starting at source
and onwards. So that's probably why we don't catch the flow in this example.
With that said, these are some excellent testcases!
It would be relatively simple to get the simple cases of this: Since the IR does a proper must-alias analysis we could incorporate this information and have an SSA definition of p
be an SSA definition of q
when the IR can deduce that p
and q
alias. This, however, won't get us very far due to the shortcomings of the IR's alias analysis for dataflow (i.e., we'd really like to have a may-alias analysis for dataflow, but the IR implements a must-alias analysis).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do work with def-use dataflow, so that's a bit "funny" to say the least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's because the def-use dataflow does use the IR's alias analysis in some cases 🤔.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Let me double check this to make sure I'm not lying.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there is a couple of cases of reliance on the IR alias analysis:
- In dataflow, when we're finding the input for a model that specifies indirect flow into the model: https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll#L978 (and the same happens for taint flow).
- At function call boundaries we conflate a pointer and its indirection: https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/TaintTrackingUtil.qll#L98
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have something similar in the case of use-use dataflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would certainly be good to make use of this information (hopefully in a more structured way than how we do it on main
currently). https://github.com/github/codeql-c-team/issues/714 describes a(n old) version of this problem, but it should probably be updated to reflect the state of the use-use flow branch.
Merging this. Please don't let this stop anyone from continuing the above discussion. |
These duplicate the
i9
andi91
tests slightly earlier in the same file, but use an explicitif
instead of the ternary operator.These demonstrate that the missing results for the
i9
andi91
tests on the use-use dataflow feature branch are not due to the use of the ternary operator. As these tests might be useful in general, this PR targetsmain
.