C/C++ question: taintTracking can not identify indirect use of Array pointer in a structure #11093

@iiins0mn1a

This issue is mainly about the global taint analysis officially implemented by CodeQL for C/C++.

Libraries I use:

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

Here is a C code sample:

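#include <stdlib.h>

/* Assumed declarations; the original sample does not show them. */
void *source(void);
void sink(void *);

struct packets{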
    unsigned int something;
    void *objects[0x10];
};

void build_packets(struct packets *ptr0)
{
    struct packets *ptr1;
    void **ptr2;
    void **ptr3;

    ptr1 = ptr0;
    ptr2 = ptr0->objects+2;
    ptr3 = &ptr0->objects[3];

    ptr0->objects[0] = source(); // 0
    ptr1->objects[1] = source(); // 1
    *ptr2 = source(); // 2
    *ptr3 = source(); // 3

    return;
}

void f(void)
{
    struct packets *pkts;
    pkts = (struct packets *)malloc(sizeof(struct packets));
    build_packets(pkts);
    for(int i = 0; i<= 3; i++)
        sink(pkts->objects[i]);
    return;
}

And this is my TaintTracking configuration:

class TestConfig extends TaintTracking::Configuration {
    TestConfig() { this = "TaintTracking test configuration..." }
    
    override predicate isSource(DataFlow::Node node) {
        exists(
            FunctionCall fc|
            fc.getTarget().hasName("source") and
            node.asExpr() = fc
        )
    }
    
    override predicate isSink(DataFlow::Node node) {
        exists(
            FunctionCall fc|
            fc.getTarget().hasName("sink") and
            node.asExpr() = fc.getArgument(0)
        )
    }

    override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
        // `struct A { type B[0x10] };`  `A->B` should be considered as an access to struct A (implicit Array pointer in a struct)
        exists(
            FieldAccess access | 
            access.getTarget().getType() instanceof ArrayType and
            not access.getParent() instanceof ArrayExpr and
            node2.asExpr()  = access.getQualifier() and
            node1.asExpr() = access
        )
        or
        isAdditionalTaintStep(node2, node1)
    }

    override int fieldFlowBranchLimit() { result = 5000 }
}

Only the source() calls at index 0 and 1 are identified.
The path for index 0 is explained like this:
call to source() -> objects [inner post update] -> ptr0 [post update] [objects] -> ref arg pkts [post update] [objects] -> ...
The most critical edge is ptr0 [post update] [objects] -> ref arg pkts [post update] [objects], which propagates the update from the inner parameter ptr0 to the outer ref arg pkts (a.k.a. node.asDefiningArgument()).

The path for index 1 looks very similar to the one for index 0; ptr1 [post update] [objects] -> ref arg pkts [post update] [objects] is the critical edge in its explanation.

When I debug index 2 and 3, it is clear that a critical edge like the one above is missing: the relation between ptr2/ptr3 and ptr0 is not handled correctly.
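The missing relation is a plain aliasing fact in C: ptr0->objects+2 and &ptr0->objects[3] point into the same struct that ptr0 points to, so writes through ptr2 and ptr3 do land in pkts->objects at run time. The following is only a minimal sketch to confirm that expectation; the marker-returning source(), count_marked_slots() and demo_alias_writes() are hypothetical helpers that are not part of the original sample.

```c
#include <assert.h>
#include <stddef.h>

struct packets {
    unsigned int something;
    void *objects[0x10];
};

/* Hypothetical stand-in for source(): each call returns the address of
 * the next marker, so the four results can be told apart afterwards. */
static int markers[4];
static int next_marker;

void *source(void)
{
    return &markers[next_marker++ % 4];
}

/* Same statements as build_packets in the sample above. */
void build_packets(struct packets *ptr0)
{
    struct packets *ptr1;
    void **ptr2;
    void **ptr3;

    ptr1 = ptr0;
    ptr2 = ptr0->objects + 2;
    ptr3 = &ptr0->objects[3];

    ptr0->objects[0] = source(); /* 0 */
    ptr1->objects[1] = source(); /* 1 */
    *ptr2 = source();            /* 2 */
    *ptr3 = source();            /* 3 */
}

/* Counts how many of slots 0..3 hold the marker written for that index. */
int count_marked_slots(const struct packets *pkts)
{
    int n = 0;
    for (int i = 0; i <= 3; i++)
        if (pkts->objects[i] == (void *)&markers[i])
            n++;
    return n;
}

/* Builds one packet and reports how many slots received a source() value. */
int demo_alias_writes(void)
{
    struct packets pkts = {0};
    next_marker = 0;

    /* Pointer arithmetic and address-of-element name the same slot. */
    assert(pkts.objects + 2 == &pkts.objects[2]);

    build_packets(&pkts);
    return count_marked_slots(&pkts);
}
```

Calling demo_alias_writes() is expected to report that all four slots were written, which is why I expect the analysis to report all four flows to sink() as well.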

So, what should I do to overcome this problem? I'm reading the source of DataFlowUtil.qll and FlowVar.qll and have found some related implementation such as PartialDefinition, but the predicates that maintain the related edges are private, so it seems impossible to add new PartialDefinition edges with isAdditionalFlowStep?

I'm just a rookie with CodeQL, and I would really appreciate it if someone could offer me some help.


Labels: question (Further information is requested)
