Skip to content
This repository was archived by the owner on Jan 5, 2023. It is now read-only.

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Aug 5, 2020

Promoting this query (formerly called "Incorrect numeric conversion" because it also dealt with floats) from experimental status. There are three things I am aware I still need to do:

  1. Run dist-compare on lgtm to check performance. It takes a long time on cockroachdb because there are a lot of sources and sinks. (Didn't do - decided to do 4 instead.)
  2. Add tests for the library code that has been added
  3. Deal with build constraints specifying 32-bit or 64-bit architectures, which could lead to false positives. Note that file names ending in the name of an architecture act as implicit build constraints as well.
  4. Run on lgtm and check results. (Actually decided to do 1 after all.)

Note that we only look for upper bounds checks as sanitizers. It isn't possible to track upper and lower bounds checks with the taint tracking framework. This means we may miss some results, but it is probably quite rare to remember to check the upper bound and not the lower bound. (And with an unsigned integer you don't need a lower bound check.)

@owen-mc owen-mc requested a review from a team August 5, 2020 13:36
@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch 2 times, most recently from 898398f to a4c2986 Compare August 5, 2020 14:52
@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch from 6144629 to 0eac6ff Compare August 5, 2020 15:50
@owen-mc
Copy link
Contributor Author

owen-mc commented Aug 5, 2020

Thanks @intrigus-lgtm , lots of good points. I have addressed most of them with a new commit. I'm not sure what to do about the problem that 2^63 - 1 isn't exactly representable in a 32-bit float.

@owen-mc
Copy link
Contributor Author

owen-mc commented Aug 5, 2020

qhelp preview https://jenkins.internal.semmle.com/job/Pull-Request-Check/job/QHelp-Preview/3098/ (though it doesn't work for me)

@owen-mc owen-mc requested a review from shati-patel August 6, 2020 10:21
Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall, thanks for tackling this! I have a few comments here and there.

@@ -0,0 +1,142 @@
/**
* @name Incorrect conversion between integer types
* @description Converting the result of strconv.Atoi, strconv.ParseInt and strconv.ParseUint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest putting backticks around the names of the functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a comma after "strconv.ParseInt"? We try to consistently use Oxford commas.

Comment on lines 36 to 44
sourceBitSize in [0, 16, 32, 64] and
sinkBitSize in [0, 8, 16, 32] and
not (sourceBitSize = 0 and sinkBitSize = 0) and
exists(int source, int sink |
(if sourceBitSize = 0 then source = 64 else source = sourceBitSize) and
if sinkBitSize = 0 then sink = 32 else sink = sinkBitSize
|
source > sink
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little hard to follow. Would it perhaps make sense to split out the treatment of 0 into separate disjuncts?

Suggested change
sourceBitSize in [0, 16, 32, 64] and
sinkBitSize in [0, 8, 16, 32] and
not (sourceBitSize = 0 and sinkBitSize = 0) and
exists(int source, int sink |
(if sourceBitSize = 0 then source = 64 else source = sourceBitSize) and
if sinkBitSize = 0 then sink = 32 else sink = sinkBitSize
|
source > sink
)
sourceBitSize in [16, 32, 64] and
sinkBitSize in [8, 16, 32] and
sourceBitSize > sinkBitSize
or
sourceBitSize = 0 and
sinkBitSize in [8, 16, 32]
or
sourceBitSize = 64 and
sinkBitSize = 0

sourceIsSigned in [true, false] and
isIncorrectIntegerConversion(sourceBitSize, sinkBitSize) and
this =
sourceBitSize.toString() + sourceIsSigned.toString() + sinkBitSize.toString() +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put "ConversionWithoutBoundsCheckConfig" first, you can get rid of the toString()s, which reads slightly nicer, I think.

string getParserName() { result = self.getParserName() }

/** Gets a string describing the size of the integer parsed. */
string getBitSizeString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer this to be called getBitSizeDescription or even describeBitSize, just to emphasise that this isn't simply the toString() of getTargetBitSize().

ParseIntCall() { exists(StrConv::ParseInt parseInt | this = parseInt.getACall()) }

override int getTargetBitSize() {
if exists(StrConv::IntSize intSize | this.getArgument(2).(DataFlow::ReadNode).reads(intSize))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if exists(StrConv::IntSize intSize | this.getArgument(2).(DataFlow::ReadNode).reads(intSize))
if this.getArgument(2) = any(IntSize is).getARead()

}

/** A call to `strconv.Atoi`. */
class AtoiCall extends DataFlow::CallNode, ParserCall::Range {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class AtoiCall extends DataFlow::CallNode, ParserCall::Range {
class AtoiCall extends ParserCall::Range {

(ParserCall::Range already extends DataFlow::CallNode)


override predicate targetIsSigned() { any() }

override string getParserName() { result = "strconv.Atoi" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just use getTarget().getQualifiedName()? Then you wouldn't even need a separate predicate for it.

}

/** Provides a class for modeling calls to number-parsing functions. */
module ParserCall {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it wouldn't be neater to describe this at the level of functions instead of function calls. So you'd have

module NumberParser {
  class Range extends Function {
    abstract FunctionInput getTargetBitSize();
    abstract predicate isTargetSigned();
  }
}

The various functions from StrConv above then extend this Range class. You'd have to move the logic for checking whether a bit size is architecture-dependent into the query, but that seems to be the right place for it anyway.

Copy link
Contributor Author

@owen-mc owen-mc Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem neater. How would getTargetBitSize() work for Atoi, which doesn't have a parameter corresponding to the bit size? Implement it as none() and then treat that case specially when calculating the bit size? Or return an algebraic data type which has branches for int and FunctionInput?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say implement as none() and have clients do what's appropriate for their context.

Copy link
Contributor

@shati-patel shati-patel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qhelp preview https://jenkins.internal.semmle.com/job/Pull-Request-Check/job/QHelp-Preview/3098/ (though it doesn't work for me)

Also doesn't work for me for some reason, but the query help looks good (from looking at the source) 😊

@@ -0,0 +1,142 @@
/**
* @name Incorrect conversion between integer types
* @description Converting the result of strconv.Atoi, strconv.ParseInt and strconv.ParseUint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a comma after "strconv.ParseInt"? We try to consistently use Oxford commas.

@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch from c11ab56 to 89eae10 Compare August 10, 2020 10:07

/**
* Gets the `FunctionInput` containing the maximum bit size of the
* return value, if this makes sense, where 0 represents the bit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "where 0 represents" bit intentional? I can't immediately see how that would apply in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if the value of the FunctionInput is 0 then that means the bit size of int and uint.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions only, looks good

int sinkBitSize;

ConversionWithoutBoundsCheckConfig() {
sourceIsSigned in [true, false] and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No kidding ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently the compiler considers boolean to be an unbounded type for historical reasons...

c.getTarget() = ip and source = c.getResult(0)
|
(
if ip.getResultType(0) instanceof SignedIntegerType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess sourceIsSigned = (ip.getResultType(0) instanceof SignedIntegerType) doesn't work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly not. "Expected an expression but found a term instead".

) and
(
bitSize = ip.getTargetBitSize() or
if exists(StrConv::IntSize intSize | ip.getTargetBitSizeInput().getNode(c).(DataFlow::ReadNode).reads(intSize))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if might be worth a brief explanatory comment

override predicate isSink(DataFlow::Node sink) { isSink(sink, sinkBitSize) }

override predicate isSanitizerGuard(DataFlow::BarrierGuard guard) {
exists(int bitSize | if sinkBitSize != 0 then bitSize = sinkBitSize else bitSize = 32 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on why we assume 32 for arch-specific?

* return value, if this makes sense. Note that if the value of the
* input is 0 then it means the bit size of `int` and `uint`.
*/
FunctionInput getTargetBitSizeInput() { none() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be overkill, but perhaps get-result should be a member here, rather than assuming in the query that getResult(0) is always the place to find the output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's overkill. We can always change in future it if we want to add an integer parsing function that returns the output in a different place.

panic(err)
}
_ = byte(parsed) // OK
_ = byte(parsed >> 8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible false-positive to consider: masking before/after the shift, e.g. parsed & 0xff00 >> 8? I don't know if these are likely in reality, perhaps worth a leaf through lgtm

Copy link
Contributor Author

@owen-mc owen-mc Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. 17 repos on lgtm have the pattern byte(parsed & 0xff00 >> 8) and 83 have the pattern byte(parsed >> 8 & 0xff) (though not all necessarily relate to this query). I've made the query catch the former and added tests for it - I wouldn't expect it to impact performance noticeably. We already caught the latter but I've added tests for it too.

@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch from af75756 to 30f1762 Compare August 10, 2020 14:21
@max-schaefer max-schaefer changed the base branch from master to main August 10, 2020 15:57
@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch from da08b83 to c7a8730 Compare August 11, 2020 06:25
@@ -76,6 +85,8 @@ class ConversionWithoutBoundsCheckConfig extends TaintTracking::Configuration {
(
bitSize = ip.getTargetBitSize()
or
// If we are reading a variable, check if it is
// `strconv.IntSize`, and use 0 if it is.
if
exists(StrConv::IntSize intSize |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if condition is a little too complex for my taste. As you know, the compiler transforms if A then B else C into (A and B) or (not A and C), and if A is complex then not A can have unexpectedly bad performance.

In general, I would suggest pulling out parts of the condition into a surrounding exists wherever possible. In this specific case, the following rewrite may make sense:

exists(DataFlow::Node rawBitSize | rawBitSize = ip.getTargetBitSizeInput().getNode(c) |
  if rawBitSize = any(StrConv::IntSize intSize).getARead() then
    bitSize = 0
  else
    bitSize = rawBitSize.getIntValue()
)

You might also consider pulling out the any(StrConv::IntSize intSize).getARead() bit into a separate predicate to make the intention even clearer.

@owen-mc
Copy link
Contributor Author

owen-mc commented Aug 11, 2020

I've pushed a commit dealing with build constraints. We now run the tests in this folder again with GOARCH=386 and check we get the same results. Currently I've just hard-coded the path to this folder, but if we want to reuse the same functionality for other folders in future then we can set up something nicer. All the tests in files with architecture-specific build constraints have to produce no results, so that we get the same results no matter which architecture we are testing, but I have checked that they really are being run.

@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch 2 times, most recently from cd3ce65 to f4d9fae Compare August 12, 2020 09:12
Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for tackling that final piece of the puzzle. As usual I have a few nitpicks. I know that you've been hammering away at this for a while, but it seems to me like there are a few generally useful bits of functionality in here that are worth pulling out.

Also, we'll probably need a performance evaluation of this after all, just to make sure the reasoning about build constraints is handled well.

Comment on lines 36 to 41
if file.hasConstrainedIntBitSize(32)
then result = 32
else
if file.hasConstrainedIntBitSize(64)
then result = 64
else result = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if file.hasConstrainedIntBitSize(32)
then result = 32
else
if file.hasConstrainedIntBitSize(64)
then result = 64
else result = 0
file.hasConstrainedIntBitSize(result)
or
not file.hasConstrainedIntBitSize(_) and
result = 0

* Holds if this file contains build constraints that ensure that it
* is only built on architectures of bit size `bitSize`.
*/
predicate hasConstrainedIntBitSize(int bitSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simplifying the name slightly to constrainsBitSize and similarly explicitlyConstrainsBitSize and implicitlyConstrainsBitSize below?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In particular I don't think the "Architecture" part included in the name of those two other predicates adds much.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, perhaps expose a more general predicate that determines whether a file is only built on a particular architecture, and then derive the bit-size information from that.

We might even want

class Architecture extends string {
  int bitSize;

  Architecture() {
    this = "386" and bitSize = 32 or
    ...
  }

  int getBitSize() { result = bitSize }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of the architecture class. Which file do you think it should go in? Files.qll? Its own file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't immediately think of a good place to put it, so I think its own file is fine.


/**
* Holds if this file contains build constraints that ensure that it
* is only built on architectures of bit size `bitSize`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth explicitly stating that bitSize is either 32 or 64 here.

exists(BuildConstraintComment bcc, string bc |
this = bcc.getFile() and bc = bcc.getText().splitAt("+build ", 1)
|
forex(string disjunct | disjunct = bc.splitAt(" ") |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic for pulling out individual build constraints deserves to be moved into BuildConstraintComment, it looks independently useful.

@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch 2 times, most recently from 772da47 to 79b5cc6 Compare August 12, 2020 13:54
Note that build constraints can be explicit (comments at the top of the
file) or implicit (part of the file name)
*/
predicate explicitlyConstrainsIntBitSize(int bitSize) {
exists(BuildConstraintComment bcc, string bc |
this = bcc.getFile() and bc = bcc.getText().splitAt("+build ", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need bc anymore.

Comment on lines 237 to 239
this
.getStem()
.regexpMatch(".*_" + any(Architecture arch | arch.getBitSize() = bitSize) + "(_test)?")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be slightly more readable to pull arch out into an exists

Suggested change
this
.getStem()
.regexpMatch(".*_" + any(Architecture arch | arch.getBitSize() = bitSize) + "(_test)?")
exists(Architecture arch | arch.getBitSize() = bitsize |
this.getStem().regexpMatch("(?i).*_\\Q" + arch + "\\E(_test)?")
)

I've added (?i) to do a case-insensitive match (I'm guessing that's required for non-case preserving file systems), and \\Q and \\E to quote any regex metacharacters in arch (I don't think there can be any at the moment, but better safe than sorry).

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final minor nitpicks, but overall this looks very good now! I'll do a quick dist-compare as a sanity check.

DataFlow::PathNode source, DataFlow::PathNode sink, ConversionWithoutBoundsCheckConfig cfg,
DataFlow::CallNode call
where cfg.hasFlowPath(source, sink) and call.getResult(0) = source.getNode()
select source.getNode(), source, sink,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions about the select:

  • Since the message mentions the conversion, we should probably select sink.getNode() rather than source.getNode().
  • To provide a reference to source, we can turn the relevant part of the alert message into a link:
      " from $@ to a lower (...)", source, call.getTarget().getQualifiedName()

@max-schaefer
Copy link
Contributor

Benchmarking suggests that performance is acceptable for a data-flow query. (It's a little on the slow side on large databases, but the evaluation log doesn't show anything that looks like an easy win.)

@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch from 79b5cc6 to e2eb932 Compare August 13, 2020 15:22
@owen-mc owen-mc force-pushed the incorrect-integer-conversion branch from e2eb932 to 951d597 Compare August 13, 2020 17:23
@max-schaefer max-schaefer merged commit fe6cf8c into github:main Aug 13, 2020
@owen-mc owen-mc deleted the incorrect-integer-conversion branch August 13, 2020 21:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants