Incorrect integer conversion #275

owen-mc · 2020-08-05T13:36:34Z

Promoting this query (formerly called "Incorrect numeric conversion" because it also dealt with floats) from experimental status. There are three things I am aware I still need to do:

~~Run dist-compare on lgtm to check performance. It takes a long time on cockroachdb because there are a lot of sources and sinks.~~ ~~(Didn't do - decided to do 4 instead.)~~
~~Add tests for the library code that has been added~~
Deal with build constraints specifying 32-bit or 64-bit architectures, which could lead to false positives. Note that file names ending in the name of an architecture act as implicit build constraints as well.
~~Run on lgtm and check results.~~ (Actually decided to do 1 after all.)

Note that we only look for upper bounds checks as sanitizers. It isn't possible to track upper and lower bounds checks with the taint tracking framework. This means we may miss some results, but it is probably quite rare to remember to check the upper bound and not the lower bound. (And with an unsigned integer you don't need a lower bound check.)

ql/src/Security/CWE-681/IncorrectIntegerConversion.qhelp

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

ql/src/semmle/go/frameworks/Stdlib.qll

owen-mc · 2020-08-05T15:52:21Z

Thanks @intrigus-lgtm , lots of good points. I have addressed most of them with a new commit. I'm not sure what to do about the problem that 2^63 - 1 isn't exactly representable in a 32-bit float.

owen-mc · 2020-08-05T20:47:43Z

qhelp preview https://jenkins.internal.semmle.com/job/Pull-Request-Check/job/QHelp-Preview/3098/ (though it doesn't work for me)

max-schaefer

Looks great overall, thanks for tackling this! I have a few comments here and there.

ql/src/Security/CWE-681/IncorrectIntegerConversion.qhelp

max-schaefer · 2020-08-07T08:20:26Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

@@ -0,0 +1,142 @@
+/**
+ * @name Incorrect conversion between integer types
+ * @description Converting the result of strconv.Atoi, strconv.ParseInt and strconv.ParseUint


I'd suggest putting backticks around the names of the functions.

➕

Could you also add a comma after "strconv.ParseInt"? We try to consistently use Oxford commas.

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

max-schaefer · 2020-08-07T08:31:02Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+  sourceBitSize in [0, 16, 32, 64] and
+  sinkBitSize in [0, 8, 16, 32] and
+  not (sourceBitSize = 0 and sinkBitSize = 0) and
+  exists(int source, int sink |
+    (if sourceBitSize = 0 then source = 64 else source = sourceBitSize) and
+    if sinkBitSize = 0 then sink = 32 else sink = sinkBitSize
+  |
+    source > sink
+  )


This is a little hard to follow. Would it perhaps make sense to split out the treatment of 0 into separate disjuncts?

Suggested change

sourceBitSize in [0, 16, 32, 64] and

sinkBitSize in [0, 8, 16, 32] and

not (sourceBitSize = 0 and sinkBitSize = 0) and

exists(int source, int sink |

(if sourceBitSize = 0 then source = 64 else source = sourceBitSize) and

if sinkBitSize = 0 then sink = 32 else sink = sinkBitSize

|

source > sink

)

sourceBitSize in [16, 32, 64] and

sinkBitSize in [8, 16, 32] and

sourceBitSize > sinkBitSize

or

sourceBitSize = 0 and

sinkBitSize in [8, 16, 32]

or

sourceBitSize = 64 and

sinkBitSize = 0

max-schaefer · 2020-08-07T08:32:00Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+    sourceIsSigned in [true, false] and
+    isIncorrectIntegerConversion(sourceBitSize, sinkBitSize) and
+    this =
+      sourceBitSize.toString() + sourceIsSigned.toString() + sinkBitSize.toString() +


If you put "ConversionWithoutBoundsCheckConfig" first, you can get rid of the toString()s, which reads slightly nicer, I think.

max-schaefer · 2020-08-07T08:43:21Z

ql/src/semmle/go/frameworks/Stdlib.qll

+  string getParserName() { result = self.getParserName() }
+
+  /** Gets a string describing the size of the integer parsed. */
+  string getBitSizeString() {


I think I'd prefer this to be called getBitSizeDescription or even describeBitSize, just to emphasise that this isn't simply the toString() of getTargetBitSize().

max-schaefer · 2020-08-07T08:45:47Z

ql/src/semmle/go/frameworks/Stdlib.qll

+  ParseIntCall() { exists(StrConv::ParseInt parseInt | this = parseInt.getACall()) }
+
+  override int getTargetBitSize() {
+    if exists(StrConv::IntSize intSize | this.getArgument(2).(DataFlow::ReadNode).reads(intSize))


Suggested change

if exists(StrConv::IntSize intSize | this.getArgument(2).(DataFlow::ReadNode).reads(intSize))

if this.getArgument(2) = any(IntSize is).getARead()

max-schaefer · 2020-08-07T08:46:10Z

ql/src/semmle/go/frameworks/Stdlib.qll

+}
+
+/** A call to `strconv.Atoi`. */
+class AtoiCall extends DataFlow::CallNode, ParserCall::Range {


Suggested change

class AtoiCall extends DataFlow::CallNode, ParserCall::Range {

class AtoiCall extends ParserCall::Range {

(ParserCall::Range already extends DataFlow::CallNode)

max-schaefer · 2020-08-07T08:47:01Z

ql/src/semmle/go/frameworks/Stdlib.qll

+
+  override predicate targetIsSigned() { any() }
+
+  override string getParserName() { result = "strconv.Atoi" }


Couldn't you just use getTarget().getQualifiedName()? Then you wouldn't even need a separate predicate for it.

max-schaefer · 2020-08-07T08:54:15Z

ql/src/semmle/go/frameworks/Stdlib.qll

+}
+
+/** Provides a class for modeling calls to number-parsing functions. */
+module ParserCall {


I wonder whether it wouldn't be neater to describe this at the level of functions instead of function calls. So you'd have

module NumberParser { class Range extends Function { abstract FunctionInput getTargetBitSize(); abstract predicate isTargetSigned(); } }

The various functions from StrConv above then extend this Range class. You'd have to move the logic for checking whether a bit size is architecture-dependent into the query, but that seems to be the right place for it anyway.

That does seem neater. How would getTargetBitSize() work for Atoi, which doesn't have a parameter corresponding to the bit size? Implement it as none() and then treat that case specially when calculating the bit size? Or return an algebraic data type which has branches for int and FunctionInput?

I'd say implement as none() and have clients do what's appropriate for their context.

shati-patel

qhelp preview https://jenkins.internal.semmle.com/job/Pull-Request-Check/job/QHelp-Preview/3098/ (though it doesn't work for me)

Also doesn't work for me for some reason, but the query help looks good (from looking at the source) 😊

shati-patel · 2020-08-07T10:16:36Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

@@ -0,0 +1,142 @@
+/**
+ * @name Incorrect conversion between integer types
+ * @description Converting the result of strconv.Atoi, strconv.ParseInt and strconv.ParseUint


➕

Could you also add a comma after "strconv.ParseInt"? We try to consistently use Oxford commas.

Adds classes for some integer-parsing functions and a constant from strconv, plus a class for calls to integer-parsing functions.

ql/src/Security/CWE-681/IncorrectIntegerConversion.qhelp

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

ql/src/semmle/go/frameworks/Stdlib.qll

max-schaefer · 2020-08-10T10:29:33Z

ql/src/semmle/go/frameworks/Stdlib.qll

+
+    /**
+     * Gets the `FunctionInput` containing the maximum bit size of the
+     * return value, if this makes sense, where 0 represents the bit


Is the "where 0 represents" bit intentional? I can't immediately see how that would apply in this case.

I mean if the value of the FunctionInput is 0 then that means the bit size of int and uint.

ql/src/semmle/go/frameworks/Stdlib.qll

smowton

Suggestions only, looks good

smowton · 2020-08-10T13:23:43Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+  int sinkBitSize;
+
+  ConversionWithoutBoundsCheckConfig() {
+    sourceIsSigned in [true, false] and


No kidding ;)

Apparently the compiler considers boolean to be an unbounded type for historical reasons...

smowton · 2020-08-10T13:25:42Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+      c.getTarget() = ip and source = c.getResult(0)
+    |
+      (
+        if ip.getResultType(0) instanceof SignedIntegerType


I guess sourceIsSigned = (ip.getResultType(0) instanceof SignedIntegerType) doesn't work?

Sadly not. "Expected an expression but found a term instead".

smowton · 2020-08-10T13:27:50Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+      ) and
+      (
+        bitSize = ip.getTargetBitSize() or
+        if exists(StrConv::IntSize intSize | ip.getTargetBitSizeInput().getNode(c).(DataFlow::ReadNode).reads(intSize))


This if might be worth a brief explanatory comment

smowton · 2020-08-10T13:42:57Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+  override predicate isSink(DataFlow::Node sink) { isSink(sink, sinkBitSize) }
+
+  override predicate isSanitizerGuard(DataFlow::BarrierGuard guard) {
+    exists(int bitSize | if sinkBitSize != 0 then bitSize = sinkBitSize else bitSize = 32 |


Comment on why we assume 32 for arch-specific?

smowton · 2020-08-10T14:15:01Z

ql/src/semmle/go/frameworks/Stdlib.qll

+     * return value, if this makes sense. Note that if the value of the
+     * input is 0 then it means the bit size of `int` and `uint`.
+     */
+    FunctionInput getTargetBitSizeInput() { none() }


Might be overkill, but perhaps get-result should be a member here, rather than assuming in the query that getResult(0) is always the place to find the output?

I think it's overkill. We can always change in future it if we want to add an integer parsing function that returns the output in a different place.

smowton · 2020-08-10T14:18:59Z

ql/test/query-tests/Security/CWE-681/IncorrectIntegerConversion.go

+			panic(err)
+		}
+		_ = byte(parsed) // OK
+		_ = byte(parsed >> 8)


Possible false-positive to consider: masking before/after the shift, e.g. parsed & 0xff00 >> 8? I don't know if these are likely in reality, perhaps worth a leaf through lgtm

Good point. 17 repos on lgtm have the pattern byte(parsed & 0xff00 >> 8) and 83 have the pattern byte(parsed >> 8 & 0xff) (though not all necessarily relate to this query). I've made the query catch the former and added tests for it - I wouldn't expect it to impact performance noticeably. We already caught the latter but I've added tests for it too.

max-schaefer · 2020-08-11T08:26:37Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

@@ -76,6 +85,8 @@ class ConversionWithoutBoundsCheckConfig extends TaintTracking::Configuration {
      (
        bitSize = ip.getTargetBitSize()
        or
+        // If we are reading a variable, check if it is
+        // `strconv.IntSize`, and use 0 if it is.
        if
          exists(StrConv::IntSize intSize |


This if condition is a little too complex for my taste. As you know, the compiler transforms if A then B else C into (A and B) or (not A and C), and if A is complex then not A can have unexpectedly bad performance.

In general, I would suggest pulling out parts of the condition into a surrounding exists wherever possible. In this specific case, the following rewrite may make sense:

exists(DataFlow::Node rawBitSize | rawBitSize = ip.getTargetBitSizeInput().getNode(c) | if rawBitSize = any(StrConv::IntSize intSize).getARead() then bitSize = 0 else bitSize = rawBitSize.getIntValue() )

You might also consider pulling out the any(StrConv::IntSize intSize).getARead() bit into a separate predicate to make the intention even clearer.

owen-mc · 2020-08-11T15:20:11Z

I've pushed a commit dealing with build constraints. We now run the tests in this folder again with GOARCH=386 and check we get the same results. Currently I've just hard-coded the path to this folder, but if we want to reuse the same functionality for other folders in future then we can set up something nicer. All the tests in files with architecture-specific build constraints have to produce no results, so that we get the same results no matter which architecture we are testing, but I have checked that they really are being run.

max-schaefer

Nice! Thanks for tackling that final piece of the puzzle. As usual I have a few nitpicks. I know that you've been hammering away at this for a while, but it seems to me like there are a few generally useful bits of functionality in here that are worth pulling out.

Also, we'll probably need a performance evaluation of this after all, just to make sure the reasoning about build constraints is handled well.

max-schaefer · 2020-08-12T10:40:08Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+  if file.hasConstrainedIntBitSize(32)
+  then result = 32
+  else
+    if file.hasConstrainedIntBitSize(64)
+    then result = 64
+    else result = 0


Suggested change

if file.hasConstrainedIntBitSize(32)

then result = 32

else

if file.hasConstrainedIntBitSize(64)

then result = 64

else result = 0

file.hasConstrainedIntBitSize(result)

or

not file.hasConstrainedIntBitSize(_) and

result = 0

max-schaefer · 2020-08-12T10:44:24Z

ql/src/semmle/go/Files.qll

+   * Holds if this file contains build constraints that ensure that it
+   * is only built on architectures of bit size `bitSize`.
+   */
+  predicate hasConstrainedIntBitSize(int bitSize) {


How about simplifying the name slightly to constrainsBitSize and similarly explicitlyConstrainsBitSize and implicitlyConstrainsBitSize below?

(In particular I don't think the "Architecture" part included in the name of those two other predicates adds much.)

Alternatively, perhaps expose a more general predicate that determines whether a file is only built on a particular architecture, and then derive the bit-size information from that.

We might even want

class Architecture extends string { int bitSize; Architecture() { this = "386" and bitSize = 32 or ... } int getBitSize() { result = bitSize } }

I like the idea of the architecture class. Which file do you think it should go in? Files.qll? Its own file?

I can't immediately think of a good place to put it, so I think its own file is fine.

max-schaefer · 2020-08-12T10:44:38Z

ql/src/semmle/go/Files.qll

+
+  /**
+   * Holds if this file contains build constraints that ensure that it
+   * is only built on architectures of bit size `bitSize`.


I think it might be worth explicitly stating that bitSize is either 32 or 64 here.

max-schaefer · 2020-08-12T10:46:35Z

ql/src/semmle/go/Files.qll

+    exists(BuildConstraintComment bcc, string bc |
+      this = bcc.getFile() and bc = bcc.getText().splitAt("+build ", 1)
+    |
+      forex(string disjunct | disjunct = bc.splitAt(" ") |


I think the logic for pulling out individual build constraints deserves to be moved into BuildConstraintComment, it looks independently useful.

Note that build constraints can be explicit (comments at the top of the file) or implicit (part of the file name)

max-schaefer · 2020-08-12T21:26:46Z

ql/src/semmle/go/Files.qll

+   */
+  predicate explicitlyConstrainsIntBitSize(int bitSize) {
+    exists(BuildConstraintComment bcc, string bc |
+      this = bcc.getFile() and bc = bcc.getText().splitAt("+build ", 1)


I don't think you need bc anymore.

max-schaefer · 2020-08-12T21:31:48Z

ql/src/semmle/go/Files.qll

+    this
+        .getStem()
+        .regexpMatch(".*_" + any(Architecture arch | arch.getBitSize() = bitSize) + "(_test)?")


I think it might be slightly more readable to pull arch out into an exists

Suggested change

this

.getStem()

.regexpMatch(".*_" + any(Architecture arch | arch.getBitSize() = bitSize) + "(_test)?")

exists(Architecture arch | arch.getBitSize() = bitsize |

this.getStem().regexpMatch("(?i).*_\\Q" + arch + "\\E(_test)?")

)

I've added (?i) to do a case-insensitive match (I'm guessing that's required for non-case preserving file systems), and \\Q and \\E to quote any regex metacharacters in arch (I don't think there can be any at the moment, but better safe than sorry).

max-schaefer

Some final minor nitpicks, but overall this looks very good now! I'll do a quick dist-compare as a sanity check.

max-schaefer · 2020-08-13T07:00:45Z

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql

+  DataFlow::PathNode source, DataFlow::PathNode sink, ConversionWithoutBoundsCheckConfig cfg,
+  DataFlow::CallNode call
+where cfg.hasFlowPath(source, sink) and call.getResult(0) = source.getNode()
+select source.getNode(), source, sink,


Two suggestions about the select:

Since the message mentions the conversion, we should probably select sink.getNode() rather than source.getNode().

To provide a reference to source, we can turn the relevant part of the alert message into a link:

" from $@ to a lower (...)", source, call.getTarget().getQualifiedName()

max-schaefer · 2020-08-13T07:07:56Z

Benchmarking suggests that performance is acceptable for a data-flow query. (It's a little on the slow side on large databases, but the evaluation log doesn't show anything that looks like an easy win.)

owen-mc requested a review from a team August 5, 2020 13:36

owen-mc force-pushed the incorrect-integer-conversion branch 2 times, most recently from 898398f to a4c2986 Compare August 5, 2020 14:52

intrigus-lgtm reviewed Aug 5, 2020

View reviewed changes

ql/src/Security/CWE-681/IncorrectIntegerConversion.qhelp Outdated Show resolved Hide resolved

ql/src/Security/CWE-681/IncorrectIntegerConversion.ql Show resolved Hide resolved

intrigus-lgtm reviewed Aug 5, 2020

View reviewed changes

owen-mc force-pushed the incorrect-integer-conversion branch from 6144629 to 0eac6ff Compare August 5, 2020 15:50

owen-mc requested a review from shati-patel August 6, 2020 10:21

max-schaefer suggested changes Aug 7, 2020

View reviewed changes

shati-patel reviewed Aug 7, 2020

View reviewed changes

owen-mc added 8 commits August 10, 2020 11:04

Fix typo in QLDoc

3a6aa58

Delete experimental query and tests for it

ac49aa2

Add modeling to Stdlib.qll

34fa072

Adds classes for some integer-parsing functions and a constant from strconv, plus a class for calls to integer-parsing functions.

Add query for incorrect integer conversion

329888e

Add tests for incorrect integer conversion

06d1eb9

Add change note

681ca90

Address review comments 1

4bfb2b4

Address review comments 2

89eae10

owen-mc force-pushed the incorrect-integer-conversion branch from c11ab56 to 89eae10 Compare August 10, 2020 10:07

owen-mc assigned owen-mc and max-schaefer Aug 10, 2020

max-schaefer suggested changes Aug 10, 2020

View reviewed changes

smowton approved these changes Aug 10, 2020

View reviewed changes

Address review comments 3

30f1762

owen-mc force-pushed the incorrect-integer-conversion branch from af75756 to 30f1762 Compare August 10, 2020 14:21

max-schaefer changed the base branch from master to main August 10, 2020 15:57

owen-mc added 3 commits August 10, 2020 17:32

Fix mistake in test

ed469a3

Address review comments 4

4907f65

Improve tests of paths with more than one sink

c7a8730

owen-mc force-pushed the incorrect-integer-conversion branch from da08b83 to c7a8730 Compare August 11, 2020 06:25

max-schaefer reviewed Aug 11, 2020

View reviewed changes

Address review comments 5

1e0b9cc

owen-mc force-pushed the incorrect-integer-conversion branch 2 times, most recently from cd3ce65 to f4d9fae Compare August 12, 2020 09:12

max-schaefer suggested changes Aug 12, 2020

View reviewed changes

owen-mc force-pushed the incorrect-integer-conversion branch 2 times, most recently from 772da47 to 79b5cc6 Compare August 12, 2020 13:54

owen-mc added 2 commits August 12, 2020 17:07

Deal with build constraints

69212b9

Note that build constraints can be explicit (comments at the top of the file) or implicit (part of the file name)

Address review comments 6

2e60d40

max-schaefer reviewed Aug 12, 2020

View reviewed changes

max-schaefer reviewed Aug 13, 2020

View reviewed changes

owen-mc force-pushed the incorrect-integer-conversion branch from 79b5cc6 to e2eb932 Compare August 13, 2020 15:22

Address review comments 7

951d597

owen-mc force-pushed the incorrect-integer-conversion branch from e2eb932 to 951d597 Compare August 13, 2020 17:23

max-schaefer approved these changes Aug 13, 2020

View reviewed changes

max-schaefer merged commit fe6cf8c into github:main Aug 13, 2020

owen-mc deleted the incorrect-integer-conversion branch August 13, 2020 21:29

	if exists(StrConv::IntSize intSize \| this.getArgument(2).(DataFlow::ReadNode).reads(intSize))
	if this.getArgument(2) = any(IntSize is).getARead()

	class AtoiCall extends DataFlow::CallNode, ParserCall::Range {
	class AtoiCall extends ParserCall::Range {


		override predicate targetIsSigned() { any() }

		override string getParserName() { result = "strconv.Atoi" }

Incorrect integer conversion #275

Incorrect integer conversion #275

Uh oh!

Conversation

owen-mc commented Aug 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

owen-mc commented Aug 5, 2020

Uh oh!

owen-mc commented Aug 5, 2020

Uh oh!

max-schaefer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

owen-mc Aug 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shati-patel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

smowton left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

owen-mc commented Aug 5, 2020 •

edited

Loading

owen-mc Aug 7, 2020 •

edited

Loading

owen-mc Aug 11, 2020 •

edited

Loading