This workshop is adapted from this material.
In this workshop we will explore integer conversion, how it is represented by the CodeQL standard library, and how it relates to type conversion security vulnerabilities.
- Install Visual Studio Code.
- Install the CodeQL extension for Visual Studio Code.
- You do not need to install the CodeQL CLI: the extension will handle this for you.
- Clone this repository:
  git clone https://github.com/rvermeulen/codeql-workshop-integer-conversion.git
- Install the CodeQL pack dependencies using the command CodeQL: Install Pack Dependencies and select exercises, exercises-tests, solutions, and solutions-tests.
The workshop is split into multiple exercises introducing integer conversion. In these exercises you will learn:
- About integer conversion.
- How integer conversion is represented in QL.
- How integer conversion relates to integer overflow vulnerabilities.
Most security-related type conversion issues involve implicit conversions from signed integers to unsigned integers. When a signed integer is converted to an unsigned integer of the same size, the underlying bit pattern remains the same, but the value is potentially interpreted differently. The opposite conversion, from unsigned to signed, is implementation-defined when the value does not fit in the signed type, but implementations typically leave the underlying bit pattern unchanged.
In CodeQL all conversions are modeled by the class Conversion and its sub-classes.
The implicit conversion becomes relevant in function calls such as in the following example where there is an implicit conversion from int to size_t (defined as unsigned int).
int get_len(int fd);

void buffer_overflow(int fd) {
    int len;
    char buf[128];

    len = get_len(fd);
    if (len > 128) {    /* signed comparison, upper bound only */
        return;
    }
    read(fd, buf, len); /* implicit conversion from int to size_t */
}
In the following exercise we are going to implement a basic query to find the above problematic implicit conversion. Why does the conversion pose a security risk?
Next are the exercises used to further explore integer conversion.
Create a class SignedInt that represents that specific IntType. Then write a query that uses the class to return all occurrences of that type in the source code. Implement this in Exercise1.ql.
Hints
- The class keyword is used to write a user-defined QL class.
- C/C++ provides ways such as typedef and using to create type aliases. The predicate getUnderlyingType gets the type after resolving typedefs.
A solution can be found in the query Exercise1.ql
Create a class UnsignedInt that represents that specific IntType. Then write a query that uses the class to return all occurrences of that type in the source code. Implement this in Exercise2.ql.
Hints
- This is very similar to Exercise 1.
A solution can be found in the query Exercise2.ql
For signed int to unsigned int conversions we are interested in the IntegralConversion class, which models implicit and explicit conversions from one integral type to another.
Create the class SignedToUnsignedConversion that models a signed int to unsigned int conversion. Use the classes SignedInt and UnsignedInt defined in Exercise1.ql and Exercise2.ql.
Place all relevant classes (and a query that selects from that class) in Exercise3.ql.
A solution can be found in the query Exercise3.ql
Now that we have modeled the signed int to unsigned int conversion, write a query in Exercise4.ql that finds the vulnerable conversion.
A solution can be found in the query Exercise4.ql
Solution Note
- Note that this solution uses a VariableAccess as an argument of the call. This excludes direct uses of literal values.
Alternative Solution
import cpp

from FunctionCall call, int idx, Expr arg
where
  call.getArgument(idx) = arg and
  arg.getUnspecifiedType().(IntType).isSigned() and
  not arg.isConstant() and
  call.getTarget().getParameter(idx).getUnspecifiedType().(IntType).isUnsigned()
select call, arg
On a real-world database our current query produces many results, so it is key to turn them into a manageable list that can be audited. Implement a heuristic that can meaningfully reduce the list of results in Exercise5.ql.
Hints
- Look for parameters whose names contain the sub-string len, size, or nbyte.
A solution can be found in the query Exercise5.ql
Implement another possible heuristic that can meaningfully reduce the list of results in Exercise6.ql.
Hints
- Look for parameters of type size_t.
A solution can be found in the query Exercise6.ql
In the opposite direction, an unsigned to signed conversion can result in an out-of-bounds access when the signed value is used in a pointer computation. CVE-2021-33909 is an example of this, discussed by Qualys and in the Sequoia variant analysis. The latter discusses a CodeQL query similar to the production query used as inspiration, which can be found at UnsignedToSignedPointerArith.ql.
Consider the following example:
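char *out_of_bounds(char *c, int n) {
    char *ptr = c + n;  /* n may be negative here */
    return ptr;
}

#define INT_MAX 2147483647

int main(void) {
    unsigned int n = INT_MAX + 1u;      /* does not fit in a 32-bit int */
    char buf[1024];
    char *ptr = out_of_bounds(buf, n);  /* implicit unsigned int to int conversion */
}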
The parameter n can range from -2147483648 to 2147483647 (assuming 32-bit integers). Passing an unsigned integer, which can range from 0 to 4294967295, to a call to out_of_bounds can result in a pointer that is out of bounds because n can become negative.
To find the above vulnerable case, start by writing the class UnsignedToSigned that identifies conversions from unsigned int to signed int and put it in Exercise7.ql.
Hints
- This is similar to what we did in Exercises 1-3.
A solution can be found in the query Exercise7.ql
The second requirement for the vulnerable case is the participation in a computation that results in a pointer.
Complete the query by establishing that the parameter n is used to compute a pointer and put it in Exercise8.ql.
You can run your solution on a prebuilt database of the Linux kernel v5.12 and see whether it finds the conversion that is part of CVE-2021-33909.
Hints
- Pointer arithmetic operations are modeled by the class PointerArithmeticOperation.
- Dataflow analysis can help with determining if a value is used somewhere. For local dataflow analysis you can use DataFlow::localFlow.
- The dataflow library provides helper predicates such as DataFlow::parameterNode and DataFlow::exprNode to relate AST elements to their dataflow graph counterparts.
A solution can be found in the query Exercise8.ql.