Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect source value for Phase II(i) value computation #14

Open
jtoman opened this issue Jun 25, 2015 · 5 comments
Open

Incorrect source value for Phase II(i) value computation #14

jtoman opened this issue Jun 25, 2015 · 5 comments

Comments

@jtoman
Copy link
Contributor

jtoman commented Jun 25, 2015

I believe I've run across a bug in the implementation of Phase II(i) in IDESolver. The problem seems arise when a fact besides the zero fact is used for an initial seeds.

The problem occurs in propagateValueAtStart:

private void propagateValueAtStart(Pair<N, D> nAndD, N n) {
    D d = nAndD.getO2();        
    M p = icfg.getMethodOf(n);
    for(N c: icfg.getCallsFromWithin(p)) {                  
        Set<Entry<D, EdgeFunction<V>>> entries; 
        synchronized (jumpFn) {
            entries = jumpFn.forwardLookup(d,c).entrySet();
            for(Map.Entry<D,EdgeFunction<V>> dPAndFP: entries) {
                D dPrime = dPAndFP.getKey();
                EdgeFunction<V> fPrime = dPAndFP.getValue();
                N sP = n;
                propagateValue(c,dPrime,fPrime.computeTarget(val(sP,d)));
            }
        }
    }
}

In this case, d is a non-zero fact that was an initial seed. The forwardLookup call returns that there are no destination (fact,function) pairs for the source fact d. This is because most (all?) jump functions have the zero fact as their source fact. The analysis seems to work if I explicitly replace the second element of the tuple passed to propagateValueAtStart with the zero fact, but this fix is a hack at best and I'm not sure if it is correct in general.

@ericbodden
Copy link
Member

Hello. I think that there actually could be a semantic problem with using non-zero seeds. Remember that jump functions are really functions, of the form "in any context in which x holds initially, y holds at the target statement". I think if you use a non-zero seed then you are saying that the initial value only holds for certain contexts, namely those in which your non-zero seed applies. Could you maybe explain a bit more about your client analysis so that I can understand what it does and how it behaves?

@jtoman
Copy link
Contributor Author

jtoman commented Jun 25, 2015

Sure. For certain functions, I want to track all possible heap allocations that may reach that function call. (For reasons specific to the analysis there is a reason I'm not simply using Soot's built into PTA and then extracting the AllocNodes...) For example, given a function f(x, 2) I want to find all heap allocations that reach x. There may be multiple invocations of f; I want to find not simply which allocations reach an invocation of f, but which allocations reach which invocations of f. For that reason, I assign a unique identifier to each invocation; the values computed along edges are taken from the powerset of all identifiers.

The initial seeds for this backwards-may analysis are the arguments of interest at the each invocation of f. So for example, if there are two invocations f(x,3) and f(y,4) then my initial seeds are (f(x,3), x) and (f(y,4), y) respectively.

Upon further digging it appears that initial seeds (n,d) that are not of the form (sP, zero) are simply seeded into the analysis as jump functions from (sP, zero) -> (n,d) with the corresponding path function being the identity function. Thus, a jump function never originates from (n,d), however the Phase II(i) computation appears to assume that the initial seeds are the source of jump functions; I believe that this mismatch of assumptions causes the error.

@ericbodden
Copy link
Member

Hi @jtoman
Interestingly, your analysis sounds very similar in spirit to DART, an on-demand points-to analysis that we have implemented on top of heros a few months ago. We had submitted it to a conference but did not make it, are re-submitting it now, with additional experiments. Let me know in case this could be useful to you. It might be an opportunity to collaborate... My PhD student @johspaeth is working on this.

@johspaeth or @johanneslerch should also be able to best comment on your assessment. @johanneslerch can you comment on this? I think you are currently looking in this code anyway...

@johanneslerch
Copy link
Contributor

I agree, this looks like a bug to me as well. The initial seed edges are propagated using zero->customValue, but note that to jumpFn an edge zero->zero is added.

It is questionable if the initial edge should be zero->customValue or customValue->customValue. For the former case the line
jumpFn.addFunction(zeroValue, startPoint, zeroValue, EdgeIdentity.<V>v());
in submitInitialSeeds should be changed to
jumpFn.addFunction(zeroValue, startPoint, val, EdgeIdentity.<V>v());

If choosing the former case, then in addition computeValues has to be changed.
Instead of starting with the given value by the seed specification as in the current code:

Pair<N, D> superGraphNode = new Pair<N,D>(startPoint, val); 
scheduleValueProcessing(new ValuePropagationTask(superGraphNode));

the computation should start for the zero value:

Pair<N, D> superGraphNode = new Pair<N,D>(startPoint, zeroValue); 
scheduleValueProcessing(new ValuePropagationTask(superGraphNode));

@ericbodden which version do you prefer? An initial edge from zero to the custom specified seed value or a self-loop edge using only the specified seed (i am not sure if the latter case may cause other assumptions to fail)

@jtoman can you try the sketched bug fix and check if it solves your problem?

I have some general remarks addressing your analysis design. I am not sure if scalability is an issue in your case, but if you encounter problems here you might want to rethink if it is a good choice to have different facts D for each initial seed. Different facts will never allow you to reuse computed summaries. Basically, your approach is as good as running the analysis for each initial seed in isolation.

I immediately see two alternatives allowing you to reuse summaries.

  1. Model the problem as an IDE problem. Use IFDS facts D that do not depend on the initial seed. Use at the initial seeds EdgeFunctions mapping Bottom to your invocation site, e.g., f(x,2). Note that the x value is represented by the IFDS fact, therefore, is not included in the EdgeFunction values.

  2. Use the PathTrackingIFDSSolver or JoinHandlingNodesIFDSSolver and create your facts D such that they have a reference to their predecessor fact from which they were generated. Note that facts still have to be equals independent of their predecessor. The Solver implementations require you to implement addNeighbor, handleJoin respectively, which will take care of facts being merged without loosing the ability to reconstruct were they came from. In the end, you are able to traverse the chain of predecessors and neighbors to find all origins, i.e, in your case all invocation sites of f. You can find more details on this approach in [1].

[1] http://www.bodden.de/pubs/lhbm14flowtwist.pdf

@jtoman
Copy link
Contributor Author

jtoman commented Jun 29, 2015

Hi @johanneslerch, as you suggested, my analysis is actually an IDE problem, the IDE values L are what track invocation sites and the facts D are access paths that are invocation site agnostic.

I will try the sketched bug-fix later today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants