-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
I've implemented a number of SimpleRangeAnalysisExprs and I'd like to discuss if any of them could be merged upstream into the main library, or if they could be included as optional "extensions" that could be included in individual queries if needed. We've been running queries with these extensions on a number of big (internal) projects and they've proven useful.
The files below may be missing some included files or helper functions (some are here), they're just for reference.
Supporting more operators in more cases
I've improved what is supported for &, &=, <<, <<=, >>, >>=, \, and \=. (Sidenote: I'm wondering if we could support implementing only the operator and having the assignment version of that operator share that same implementation. I did this myself using this class, but it could be part of the library itself). I'm not going to discuss all the differences here, but as an example, the built in & operator only supports cases where both arguments are either unsigned or a non-negative constant. My changes support "constants" that come from variables, i.e. unsigned a = 7; 42 & a, as well as signed operands where the bounds are non-negative. The other operands "expand" the cases where we compute bounds similar to this.
Shift operators
For the shift operators, there are some interesting considerations around what to do in certain cases of undefined or implementation-defined behavior. I've noted some of these cases in the code, and it seems for the cases CodeQL supports we don't treat this as undefined behavior but do what most common compilers do. This is fine, but just worth calling out.
Division
For \, I chose to not consider division by zero in the bounds, as CodeQL only supports continuous ranges, so even if the code verifies zero isn't a possible value, it's quite common for zero to be in the range. And since this case isn't important unless you're looking for DoS (which we're not), we assume division by zero doesn't happen. This could be make configurable depending on the usecase.
Signed multiplication
I've also implemented support for signed multiplication for * and *=. This does have a significant performance hit, but that hit is acceptable in our use cases. It would be nice if an extension like this was included in the library and then could be imported into queries that were willing to take that performance hit. (Sidenote: I'm unsure why there's a performance hit here. Taking the max/min of four different values doesn't seem like something that should be slow).
std::min and std::max
We found implementing the resultant range for min/max was useful for reducing some FPs and not a huge performance hit: https://github.com/gsingh93/codeql/blob/b7d39f7aa41258927c1d7259f9ddd283554e8123/cpp/ql/src/experimental/semmle/code/cpp/rangeanalysis/rangeextensions/ContainerMethodRange.qll#L217-L261
Container ranges
Probably not something we can merge as-is, but might be worth a discussion. A common FP we've seen is str.size() + 1, or something like that with any method that returns a value relating to the container's size. In practice, this will never overflow on a 64-bit system, as no 64-bit system can support a container with this many elements. Since we run on 64-bit systems, it was helpful to limit the max size for many of these methods to some arbitrary number (we chose INT_MAX - 1). You can see these changes here. Maybe it's worth allowing users to customize this somehow?
I will point out though that std::array can be properly modeled since the size is fixed, the implementation from the above file could be merged upstream.
Handling basic guards
I haven't tested this out yet, but it would reduce real world FP's we've seen and @lcartey has provided some code to test out. The basic goal is to filter out issues like this:
// Example 1
uint32_t a = std::min(b, c);
sink(c - a); // No underflow, a is guaranteed to be <= c
// Example 2
if (n <= size_function()) {
sink(size_function() - n);
}
The untested implementation to filter this out is here, but we can implement a proper range analysis expression if #4273 is merged.
Interprocedural analysis
I have some extensions here that do some rudimentary interprocedural analysis. These range analysis expressions propagate from callers to the function parameters, and also support assigning to reference parameters (thanks @lcartey). This extension propagates the return value of a function to the caller only if the return value is tainted by user controlled data. The restriction of the data being user controlled is to improve performance, but I don't think this should be hardcoded in a general implementation. An Extension class could be provided, and users could override methods there to customize when to propagate arguments or return values.
There's a significant performance hit from these, but we're running them on a number of large internal projects, they're very helpful, and we haven't seen any timeouts. This would be another example of something that could be not included by default, but enabled after including a particular file.
Displaying Ranges in Paths
Triaging path queries that rely on range analysis is much easier when you can see the ranges at each step of the path. Here I've extended DataFlow::ExprNode, DataFlow::ExplicitParameterNode, and DataFlow::DefinitionByReferenceNode to provide range information when viewing the path in vscode. This has made triaging issues much faster. This is something that wouldn't be enabled by default, but would be enabled after including a file.