-
Notifications
You must be signed in to change notification settings - Fork 125
Incorrect integer conversion #275
Incorrect integer conversion #275
Conversation
898398f
to
a4c2986
Compare
6144629
to
0eac6ff
Compare
Thanks @intrigus-lgtm , lots of good points. I have addressed most of them with a new commit. I'm not sure what to do about the problem that 2^63 - 1 isn't exactly representable in a 32-bit float. |
qhelp preview https://jenkins.internal.semmle.com/job/Pull-Request-Check/job/QHelp-Preview/3098/ (though it doesn't work for me) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall, thanks for tackling this! I have a few comments here and there.
@@ -0,0 +1,142 @@ | |||
/** | |||
* @name Incorrect conversion between integer types | |||
* @description Converting the result of strconv.Atoi, strconv.ParseInt and strconv.ParseUint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest putting backticks around the names of the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➕
Could you also add a comma after "strconv.ParseInt"? We try to consistently use Oxford commas.
sourceBitSize in [0, 16, 32, 64] and | ||
sinkBitSize in [0, 8, 16, 32] and | ||
not (sourceBitSize = 0 and sinkBitSize = 0) and | ||
exists(int source, int sink | | ||
(if sourceBitSize = 0 then source = 64 else source = sourceBitSize) and | ||
if sinkBitSize = 0 then sink = 32 else sink = sinkBitSize | ||
| | ||
source > sink | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little hard to follow. Would it perhaps make sense to split out the treatment of 0
into separate disjuncts?
sourceBitSize in [0, 16, 32, 64] and | |
sinkBitSize in [0, 8, 16, 32] and | |
not (sourceBitSize = 0 and sinkBitSize = 0) and | |
exists(int source, int sink | | |
(if sourceBitSize = 0 then source = 64 else source = sourceBitSize) and | |
if sinkBitSize = 0 then sink = 32 else sink = sinkBitSize | |
| | |
source > sink | |
) | |
sourceBitSize in [16, 32, 64] and | |
sinkBitSize in [8, 16, 32] and | |
sourceBitSize > sinkBitSize | |
or | |
sourceBitSize = 0 and | |
sinkBitSize in [8, 16, 32] | |
or | |
sourceBitSize = 64 and | |
sinkBitSize = 0 |
sourceIsSigned in [true, false] and | ||
isIncorrectIntegerConversion(sourceBitSize, sinkBitSize) and | ||
this = | ||
sourceBitSize.toString() + sourceIsSigned.toString() + sinkBitSize.toString() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put "ConversionWithoutBoundsCheckConfig"
first, you can get rid of the toString()
s, which reads slightly nicer, I think.
string getParserName() { result = self.getParserName() } | ||
|
||
/** Gets a string describing the size of the integer parsed. */ | ||
string getBitSizeString() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer this to be called getBitSizeDescription
or even describeBitSize
, just to emphasise that this isn't simply the toString()
of getTargetBitSize()
.
ParseIntCall() { exists(StrConv::ParseInt parseInt | this = parseInt.getACall()) } | ||
|
||
override int getTargetBitSize() { | ||
if exists(StrConv::IntSize intSize | this.getArgument(2).(DataFlow::ReadNode).reads(intSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if exists(StrConv::IntSize intSize | this.getArgument(2).(DataFlow::ReadNode).reads(intSize)) | |
if this.getArgument(2) = any(IntSize is).getARead() |
} | ||
|
||
/** A call to `strconv.Atoi`. */ | ||
class AtoiCall extends DataFlow::CallNode, ParserCall::Range { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class AtoiCall extends DataFlow::CallNode, ParserCall::Range { | |
class AtoiCall extends ParserCall::Range { |
(ParserCall::Range
already extends DataFlow::CallNode
)
|
||
override predicate targetIsSigned() { any() } | ||
|
||
override string getParserName() { result = "strconv.Atoi" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you just use getTarget().getQualifiedName()
? Then you wouldn't even need a separate predicate for it.
} | ||
|
||
/** Provides a class for modeling calls to number-parsing functions. */ | ||
module ParserCall { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it wouldn't be neater to describe this at the level of functions instead of function calls. So you'd have
module NumberParser {
class Range extends Function {
abstract FunctionInput getTargetBitSize();
abstract predicate isTargetSigned();
}
}
The various functions from StrConv
above then extend this Range
class. You'd have to move the logic for checking whether a bit size is architecture-dependent into the query, but that seems to be the right place for it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does seem neater. How would getTargetBitSize()
work for Atoi
, which doesn't have a parameter corresponding to the bit size? Implement it as none()
and then treat that case specially when calculating the bit size? Or return an algebraic data type which has branches for int
and FunctionInput
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say implement as none()
and have clients do what's appropriate for their context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qhelp preview https://jenkins.internal.semmle.com/job/Pull-Request-Check/job/QHelp-Preview/3098/ (though it doesn't work for me)
Also doesn't work for me for some reason, but the query help looks good (from looking at the source) 😊
@@ -0,0 +1,142 @@ | |||
/** | |||
* @name Incorrect conversion between integer types | |||
* @description Converting the result of strconv.Atoi, strconv.ParseInt and strconv.ParseUint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➕
Could you also add a comma after "strconv.ParseInt"? We try to consistently use Oxford commas.
Adds classes for some integer-parsing functions and a constant from strconv, plus a class for calls to integer-parsing functions.
c11ab56
to
89eae10
Compare
|
||
/** | ||
* Gets the `FunctionInput` containing the maximum bit size of the | ||
* return value, if this makes sense, where 0 represents the bit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "where 0 represents" bit intentional? I can't immediately see how that would apply in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if the value of the FunctionInput
is 0 then that means the bit size of int
and uint
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions only, looks good
int sinkBitSize; | ||
|
||
ConversionWithoutBoundsCheckConfig() { | ||
sourceIsSigned in [true, false] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No kidding ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently the compiler considers boolean
to be an unbounded type for historical reasons...
c.getTarget() = ip and source = c.getResult(0) | ||
| | ||
( | ||
if ip.getResultType(0) instanceof SignedIntegerType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess sourceIsSigned = (ip.getResultType(0) instanceof SignedIntegerType)
doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly not. "Expected an expression but found a term instead".
) and | ||
( | ||
bitSize = ip.getTargetBitSize() or | ||
if exists(StrConv::IntSize intSize | ip.getTargetBitSizeInput().getNode(c).(DataFlow::ReadNode).reads(intSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
might be worth a brief explanatory comment
override predicate isSink(DataFlow::Node sink) { isSink(sink, sinkBitSize) } | ||
|
||
override predicate isSanitizerGuard(DataFlow::BarrierGuard guard) { | ||
exists(int bitSize | if sinkBitSize != 0 then bitSize = sinkBitSize else bitSize = 32 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment on why we assume 32 for arch-specific?
* return value, if this makes sense. Note that if the value of the | ||
* input is 0 then it means the bit size of `int` and `uint`. | ||
*/ | ||
FunctionInput getTargetBitSizeInput() { none() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be overkill, but perhaps get-result should be a member here, rather than assuming in the query that getResult(0) is always the place to find the output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's overkill. We can always change in future it if we want to add an integer parsing function that returns the output in a different place.
panic(err) | ||
} | ||
_ = byte(parsed) // OK | ||
_ = byte(parsed >> 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible false-positive to consider: masking before/after the shift, e.g. parsed & 0xff00 >> 8
? I don't know if these are likely in reality, perhaps worth a leaf through lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. 17 repos on lgtm have the pattern byte(parsed & 0xff00 >> 8)
and 83 have the pattern byte(parsed >> 8 & 0xff)
(though not all necessarily relate to this query). I've made the query catch the former and added tests for it - I wouldn't expect it to impact performance noticeably. We already caught the latter but I've added tests for it too.
af75756
to
30f1762
Compare
da08b83
to
c7a8730
Compare
@@ -76,6 +85,8 @@ class ConversionWithoutBoundsCheckConfig extends TaintTracking::Configuration { | |||
( | |||
bitSize = ip.getTargetBitSize() | |||
or | |||
// If we are reading a variable, check if it is | |||
// `strconv.IntSize`, and use 0 if it is. | |||
if | |||
exists(StrConv::IntSize intSize | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
condition is a little too complex for my taste. As you know, the compiler transforms if A then B else C
into (A and B) or (not A and C)
, and if A
is complex then not A
can have unexpectedly bad performance.
In general, I would suggest pulling out parts of the condition into a surrounding exists
wherever possible. In this specific case, the following rewrite may make sense:
exists(DataFlow::Node rawBitSize | rawBitSize = ip.getTargetBitSizeInput().getNode(c) |
if rawBitSize = any(StrConv::IntSize intSize).getARead() then
bitSize = 0
else
bitSize = rawBitSize.getIntValue()
)
You might also consider pulling out the any(StrConv::IntSize intSize).getARead()
bit into a separate predicate to make the intention even clearer.
I've pushed a commit dealing with build constraints. We now run the tests in this folder again with |
cd3ce65
to
f4d9fae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for tackling that final piece of the puzzle. As usual I have a few nitpicks. I know that you've been hammering away at this for a while, but it seems to me like there are a few generally useful bits of functionality in here that are worth pulling out.
Also, we'll probably need a performance evaluation of this after all, just to make sure the reasoning about build constraints is handled well.
if file.hasConstrainedIntBitSize(32) | ||
then result = 32 | ||
else | ||
if file.hasConstrainedIntBitSize(64) | ||
then result = 64 | ||
else result = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if file.hasConstrainedIntBitSize(32) | |
then result = 32 | |
else | |
if file.hasConstrainedIntBitSize(64) | |
then result = 64 | |
else result = 0 | |
file.hasConstrainedIntBitSize(result) | |
or | |
not file.hasConstrainedIntBitSize(_) and | |
result = 0 |
ql/src/semmle/go/Files.qll
Outdated
* Holds if this file contains build constraints that ensure that it | ||
* is only built on architectures of bit size `bitSize`. | ||
*/ | ||
predicate hasConstrainedIntBitSize(int bitSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about simplifying the name slightly to constrainsBitSize
and similarly explicitlyConstrainsBitSize
and implicitlyConstrainsBitSize
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(In particular I don't think the "Architecture" part included in the name of those two other predicates adds much.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, perhaps expose a more general predicate that determines whether a file is only built on a particular architecture, and then derive the bit-size information from that.
We might even want
class Architecture extends string {
int bitSize;
Architecture() {
this = "386" and bitSize = 32 or
...
}
int getBitSize() { result = bitSize }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the architecture class. Which file do you think it should go in? Files.qll
? Its own file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't immediately think of a good place to put it, so I think its own file is fine.
ql/src/semmle/go/Files.qll
Outdated
|
||
/** | ||
* Holds if this file contains build constraints that ensure that it | ||
* is only built on architectures of bit size `bitSize`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth explicitly stating that bitSize
is either 32 or 64 here.
ql/src/semmle/go/Files.qll
Outdated
exists(BuildConstraintComment bcc, string bc | | ||
this = bcc.getFile() and bc = bcc.getText().splitAt("+build ", 1) | ||
| | ||
forex(string disjunct | disjunct = bc.splitAt(" ") | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic for pulling out individual build constraints deserves to be moved into BuildConstraintComment
, it looks independently useful.
772da47
to
79b5cc6
Compare
Note that build constraints can be explicit (comments at the top of the file) or implicit (part of the file name)
ql/src/semmle/go/Files.qll
Outdated
*/ | ||
predicate explicitlyConstrainsIntBitSize(int bitSize) { | ||
exists(BuildConstraintComment bcc, string bc | | ||
this = bcc.getFile() and bc = bcc.getText().splitAt("+build ", 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need bc
anymore.
ql/src/semmle/go/Files.qll
Outdated
this | ||
.getStem() | ||
.regexpMatch(".*_" + any(Architecture arch | arch.getBitSize() = bitSize) + "(_test)?") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be slightly more readable to pull arch
out into an exists
this | |
.getStem() | |
.regexpMatch(".*_" + any(Architecture arch | arch.getBitSize() = bitSize) + "(_test)?") | |
exists(Architecture arch | arch.getBitSize() = bitsize | | |
this.getStem().regexpMatch("(?i).*_\\Q" + arch + "\\E(_test)?") | |
) |
I've added (?i)
to do a case-insensitive match (I'm guessing that's required for non-case preserving file systems), and \\Q
and \\E
to quote any regex metacharacters in arch
(I don't think there can be any at the moment, but better safe than sorry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some final minor nitpicks, but overall this looks very good now! I'll do a quick dist-compare as a sanity check.
DataFlow::PathNode source, DataFlow::PathNode sink, ConversionWithoutBoundsCheckConfig cfg, | ||
DataFlow::CallNode call | ||
where cfg.hasFlowPath(source, sink) and call.getResult(0) = source.getNode() | ||
select source.getNode(), source, sink, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two suggestions about the select
:
- Since the message mentions the conversion, we should probably select
sink.getNode()
rather thansource.getNode()
. - To provide a reference to
source
, we can turn the relevant part of the alert message into a link:
" from $@ to a lower (...)", source, call.getTarget().getQualifiedName()
Benchmarking suggests that performance is acceptable for a data-flow query. (It's a little on the slow side on large databases, but the evaluation log doesn't show anything that looks like an easy win.) |
79b5cc6
to
e2eb932
Compare
e2eb932
to
951d597
Compare
Promoting this query (formerly called "Incorrect numeric conversion" because it also dealt with floats) from experimental status. There are three things I am aware I still need to do:
Run dist-compare on lgtm to check performance. It takes a long time on cockroachdb because there are a lot of sources and sinks.(Didn't do - decided to do 4 instead.)Add tests for the library code that has been addedDeal with build constraints specifying 32-bit or 64-bit architectures, which could lead to false positives. Note that file names ending in the name of an architecture act as implicit build constraints as well.Run on lgtm and check results.(Actually decided to do 1 after all.)Note that we only look for upper bounds checks as sanitizers. It isn't possible to track upper and lower bounds checks with the taint tracking framework. This means we may miss some results, but it is probably quite rare to remember to check the upper bound and not the lower bound. (And with an unsigned integer you don't need a lower bound check.)