
Add model call tests #53

Merged: 9 commits merged into master from addModelCallTests on Oct 23, 2023
Conversation

tatianacv

Adding tests for calling a model. We add four tests that include:

  1. Calling the model indirectly using call and __call__ (2 tests)
  2. Calling the model directly using call and __call__ (2 tests)

Related to wala#24
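For context, here is a minimal sketch of the kind of client code these tests exercise. It assumes a Keras-style SequentialModel subclass and is illustrative only, not the exact contents of the test files:

import tensorflow as tf

class SequentialModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.dense(x)

model = SequentialModel()
input_data = tf.random.uniform((4, 28 * 28))

result = model(input_data)           # indirect: goes through tf.keras.Model.__call__
result = model.call(input_data)      # direct call to call()
result = model.__call__(input_data)  # direct call to __call__()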

@khatchad

This is looking good. Thanks. For each of the model calls, what does the IR look like?

@khatchad

I think there are two problems here:

  1. I am going to guess that the IR will treat model() as a function call with no definition. In other words, there will be no call graph node corresponding to the called function.
  2. The superclass' __call__() method will invoke the subclass' call() method. Since the superclass isn't part of the client code, I am guessing that's missing as well, and would need to be somehow modeled using the summaries, if possible.
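To illustrate the second point, here is a plain-Python sketch of the dispatch being described; the real tf.keras.Model.__call__ is far more involved, and the class names below are only stand-ins:

# Library code, not part of the client script:
class Model:  # stands in for tf.keras.Model
    def __call__(self, *args, **kwargs):
        # The superclass' __call__() invokes the subclass' call().
        return self.call(*args, **kwargs)

# Client code seen by the call graph builder:
class SequentialModel(Model):
    def call(self, x):
        return x

model = SequentialModel()
print(model(42))  # model() -> Model.__call__ -> SequentialModel.call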


@khatchad left a comment


Thanks!

@khatchad merged commit a05986d into master on Oct 23, 2023
3 checks passed
@khatchad deleted the addModelCallTests branch on October 23, 2023 at 19:56
@khatchad

Can you also throw this up to http://github.com/wala/ML?

@tatianacv

Can you also throw this up to http://github.com/wala/ML?

wala#97

@tatianacv

tatianacv commented Oct 24, 2023

This is looking good. Thanks. For each of the model calls, what does the IR look like?

For test_model_call2.py (where it is not working), the model call does not show up as a node.

For test_model_call_3.py (which is the case where it is working), the node for the call is the following: Node: synthetic < PythonLoader, L$script tf2_test_model_call3.py/SequentialModel/call, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call3.py.do()LRoot;@114 ], and the IR is:

synthetic < PythonLoader, L$script tf2_test_model_call3.py/SequentialModel/call, trampoline2()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB5
BB1[1..1]
    -> BB2
    -> BB5
BB2[2..2]
    -> BB3
    -> BB5
BB3[3..3]
    -> BB4
    -> BB5
BB4[4..4]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
0   v3 = getfield < PythonLoader, LRoot, $function, <PythonLoader,LRoot> > v1
BB1
1   v4 = checkcast <PythonLoader,Lscript tf2_test_model_call3.py/SequentialModel/call>v3
BB2
2   v5 = getfield < PythonLoader, LRoot, $self, <PythonLoader,LRoot> > v1
BB3
3   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v4,v5,v2 @2 exception:v7
BB4
4   return v6                                
BB5

and for the call body, the node is: Node: <Code body of function Lscript tf2_test_model_call3.py/SequentialModel/call> Context: CallStringContext: [ $script tf2_test_model_call3.py.SequentialModel.call.trampoline2()LRoot;@2 ], and the IR is:

<Code body of function Lscript tf2_test_model_call3.py/SequentialModel/call>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..5]
    -> BB2
    -> BB9
BB2[6..7]
    -> BB3
BB3[8..12]
    -> BB6
    -> BB4
BB4[13..17]
    -> BB5
    -> BB9
BB5[18..19]
    -> BB3
BB6[20..21]
    -> BB7
    -> BB9
BB7[22..24]
    -> BB8
    -> BB9
BB8[25..26]
    -> BB9
BB9[-1..-2]
Instructions:
BB0
BB1
0   v4 = new <PythonLoader,Lsuperfun>@0      tf2_test_model_call3.py [21:2] -> [30:12] [4=[super]]
1   v6 = lexical:SequentialModel@Lscript tf2_test_model_call3.pytf2_test_model_call3.py [21:2] -> [30:12]
2   fieldref v4.v7:#$class = v6 = v6         tf2_test_model_call3.py [21:2] -> [30:12] [4=[super]]
3   fieldref v4.v8:#$self = v2 = v2          tf2_test_model_call3.py [21:2] -> [30:12] [4=[super]2=[self]]
4   v10 = fieldref v2.v11:#flatten           tf2_test_model_call3.py [22:8] -> [22:12] [2=[self]]
5   v9 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v10,v3 @5 exception:v12tf2_test_model_call3.py [22:8] -> [22:23] [9=[x]3=[x]]
BB2
7   v13 = fieldref v2.v14:#my_layers         tf2_test_model_call3.py [24:17] -> [24:31] [13=[temp 3]2=[self]]
BB3
           v36 = phi  v23,v9
8   v18 = global:global layer                tf2_test_model_call3.py [24:8] -> [24:13]
9   v19 = a property name of v13             <no information> [13=[temp 3]]
10   global:global layer = v19               tf2_test_model_call3.py [21:2] -> [30:12]
11   v15 = binaryop(ne) v16:#null , v19      tf2_test_model_call3.py [21:2] -> [30:12]
12   conditional branch(eq, to iindex=20) v15,v20:#0tf2_test_model_call3.py [21:2] -> [30:12]
BB4
13   v22 = global:global layer               tf2_test_model_call3.py [24:8] -> [24:13]
14   v21 = fieldref v13.v22                  tf2_test_model_call3.py [21:2] -> [30:12] [13=[temp 3]]
15   global:global layer = v21               tf2_test_model_call3.py [21:2] -> [30:12]
16   v24 = global:global layer               tf2_test_model_call3.py [25:10] -> [25:15]
17   v23 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v24,v36 @17 exception:v25tf2_test_model_call3.py [25:10] -> [25:18] [23=[x]36=[x]]
BB5
19   goto (from iindex= 19 to iindex = 8)    tf2_test_model_call3.py [21:2] -> [30:12]
BB6
20   v29 = fieldref v2.v30:#dropout          tf2_test_model_call3.py [27:8] -> [27:12] [2=[self]]
21   v28 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v29,v36 @21 exception:v31tf2_test_model_call3.py [27:8] -> [27:23] [28=[x]36=[x]]
BB7
23   v33 = fieldref v2.v34:#dense_2          tf2_test_model_call3.py [28:8] -> [28:12] [2=[self]]
24   v32 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v33,v28 @24 exception:v35tf2_test_model_call3.py [28:8] -> [28:23] [32=[x]28=[x]]
BB8
26   return v32                              tf2_test_model_call3.py [21:2] -> [30:12] [32=[x]]
BB9

@khatchad

This is looking good. Thanks. For each of the model calls, what does the IR look like?

For test_model_call2.py (where it is not working), the model call does not show up as a node.

We need the IR for this node.

For test_model_call_3.py (which is the case where it is working), the node for the call is the following: Node: synthetic < PythonLoader, L$script tf2_test_model_call3.py/SequentialModel/call, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call3.py.do()LRoot;@114 ], and the IR is

Let's focus on the __call__ case here, since that is simpler.

...
and for call body

We don't need the body, just the client code. Actually, the node name alone would be helpful:

<Code body of function Lscript tf2_test_model_call3.py/SequentialModel/call>


@tatianacv

For test_model_call.py and test_model_call2.py, there are no SequentialModel/call or SequentialModel/__call__ nodes in the call graph. Please refer to them here: CG for test_model_call.py and CG for test_model_call2.py. These are the tests that do not work.

For test_model_call_3.py, I added the IR above, which is the call case.

For test_model_call_4.py, for the node Node: synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@114 ] the IR is the following:

synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB5
BB1[1..1]
    -> BB2
    -> BB5
BB2[2..2]
    -> BB3
    -> BB5
BB3[3..3]
    -> BB4
    -> BB5
BB4[4..4]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
0   v3 = getfield < PythonLoader, LRoot, $function, <PythonLoader,LRoot> > v1
BB1
1   v4 = checkcast <PythonLoader,Lscript tf2_test_model_call4.py/SequentialModel/__call__>v3
BB2
2   v5 = getfield < PythonLoader, LRoot, $self, <PythonLoader,LRoot> > v1
BB3
3   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v4,v5,v2 @2 exception:v7
BB4
4   return v6                                
BB5

@khatchad

khatchad commented Oct 26, 2023

For test_model_call.py and test_model_call2.py, there are no SequentialModel/call or SequentialModel/__call__ in the callgraph.

We need the IR of the calling functions (callers).

@tatianacv

For Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ] in test_model_call.py the IR is

synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB14
BB1[1..1]
    -> BB2
    -> BB14
BB2[2..2]
    -> BB3
    -> BB14
BB3[3..3]
    -> BB4
    -> BB14
BB4[4..4]
    -> BB5
    -> BB14
BB5[5..5]
    -> BB6
    -> BB14
BB6[6..6]
    -> BB7
    -> BB14
BB7[7..7]
    -> BB8
    -> BB14
BB8[8..8]
    -> BB9
    -> BB14
BB9[9..9]
    -> BB10
    -> BB14
BB10[10..10]
    -> BB11
    -> BB14
BB11[11..11]
    -> BB12
    -> BB14
BB12[12..12]
    -> BB13
    -> BB14
BB13[13..13]
    -> BB14
BB14[-1..-2]
Instructions:
BB0
0   v4 = new <PythonLoader,Lobject>@0        
BB1
1   v5 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__init__>@1
BB2
2   putfield v5.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB3
3   v6 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB4
4   putfield v5.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v6
BB5
5   putfield v4.< PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > = v5
BB6
6   v7 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__call__>@6
BB7
7   putfield v7.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB8
8   v8 = getfield < PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > v1 [1=[self]]
BB9
9   putfield v7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v8
BB10
10   putfield v4.< PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > = v7
BB11
11   v9 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB12
12   v10 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v9,v4 @12 exception:v11
BB13
13   return v4                               
BB14

For Node: synthetic < PythonLoader, Lscript tf2_test_model_call2.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call2.py.do()LRoot;@111 ] in test_model_call2.py the IR is

synthetic < PythonLoader, Lscript tf2_test_model_call2.py/SequentialModel, do()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB14
BB1[1..1]
    -> BB2
    -> BB14
BB2[2..2]
    -> BB3
    -> BB14
BB3[3..3]
    -> BB4
    -> BB14
BB4[4..4]
    -> BB5
    -> BB14
BB5[5..5]
    -> BB6
    -> BB14
BB6[6..6]
    -> BB7
    -> BB14
BB7[7..7]
    -> BB8
    -> BB14
BB8[8..8]
    -> BB9
    -> BB14
BB9[9..9]
    -> BB10
    -> BB14
BB10[10..10]
    -> BB11
    -> BB14
BB11[11..11]
    -> BB12
    -> BB14
BB12[12..12]
    -> BB13
    -> BB14
BB13[13..13]
    -> BB14
BB14[-1..-2]
Instructions:
BB0
0   v4 = new <PythonLoader,Lobject>@0        
BB1
1   v5 = new <PythonLoader,L$script tf2_test_model_call2.py/SequentialModel/__init__>@1
BB2
2   putfield v5.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB3
3   v6 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB4
4   putfield v5.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v6
BB5
5   putfield v4.< PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > = v5
BB6
6   v7 = new <PythonLoader,L$script tf2_test_model_call2.py/SequentialModel/call>@6
BB7
7   putfield v7.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB8
8   v8 = getfield < PythonLoader, LRoot, call, <PythonLoader,LRoot> > v1 [1=[self]]
BB9
9   putfield v7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v8
BB10
10   putfield v4.< PythonLoader, LRoot, call, <PythonLoader,LRoot> > = v7
BB11
11   v9 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB12
12   v10 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v9,v4 @12 exception:v11
BB13
13   return v4                               
BB14

@tatianacv

Comparison of the IR of all the nodes for the model.__call__() (test 4) and model() (test 1) cases.

Please see this gist linked here.

@khatchad

Thanks. Here is test 4 that works:

113   v269 = fieldref v265.v270:#__call__    tf2_test_model_call4.py [36:9] -> [36:14] [265=[model]]
114   v268 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v269,v252 @114 exception:v271tf2_test_model_call4.py [36:9] -> [36:35] [268=[result]252=[input_data]]

@khatchad

Here is test 1 that doesn't work:

113   v268 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v265,v252 @113 exception:v269tf2_test_model_call.py [34:9] -> [34:26] [268=[result]265=[model]252=[input_data]]

@khatchad

In test 4, the function being invoked is in v269, while in test 1 it is in v265.

@khatchad

In test 4, we can see that v269 is assigned in instruction 113. But in test 1, v265 is assigned in instruction 111, which is as follows:

111   v265 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v249 @111 exception:v266tf2_test_model_call4.py [35:8] -> [35:25] [265=[model]249=[SequentialModel]]

Thus, in test 1, the "function" being invoked is model (265=[model] above), but we know that model isn't a function; it's an object reference. In fact, this line also exists in test 4.
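For reference, this matches Python's own semantics: calling an instance dispatches through the class's __call__, so model is never itself a function. A small sketch (names are illustrative):

import types

class SequentialModel:
    def __call__(self, x):
        return x * 2

model = SequentialModel()

# model(x) is resolved through type(model).__call__, not through model as a function:
assert model(3) == type(model).__call__(model, 3) == 6
assert not isinstance(model, types.FunctionType)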

@tatianacv

I suppose that __call__() has one parameter (self), which explains why we are not entering that method. It seems that method is only reached for functions that have no parameters, because there is parametric polymorphism resolution in the else clause:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1170-L1173

Correct, we are falling into that else branch because params has the value {0}.

@tatianacv

That method, getTargetsForCall(...), is called for an instruction; its result is then used by getTargetForCall(...), which returns the CGNode for the particular call that should be dispatched. Therefore, getTargetsForCall(...) is invoked for a specific instruction in the CGNode.

OK. Which instructions are those? From your earlier comment, there was an invokeFunction instruction corresponding to a ctor for which you said control enters the method.

It enters that method for instructions that, in visitInvokeInternal, have empty params, or that have params but for which the contentsAreInvariant method returns true.

Therefore, when I say that it doesn't enter for those two, I mean that those instructions are not passed as a parameter in the call to getTargetsForCall. This is because, in visitInvokeInternal, the params are not empty; therefore, getTargetsForCall is not called on that instruction. https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1160-L1168.

Ah, so it only calls that method if there are no parameters?

Also when it has params but contentsAreInvariant returns true. (https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1186)

Also, when it goes into the else branch for non-empty params, there is an if that checks whether the contents are invariant; that also returns false, so we don't go into the method getTargetsForCall(...).

So, it does this for both the working and non-working cases? How then do the targets get resolved in the working case?

Yes, in both cases it has the same behavior in visitInvokeInternal.

@khatchad

khatchad commented Nov 3, 2023

I suppose that __call__() has one parameter (self), which explains why we are not entering that method. It seems that method is only reached for functions that have no parameters, because there is parametric polymorphism resolution in the else clause:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1170-L1173

Correct, we are falling into that else branch because params has the value {0}.

So what is going on in that else clause? How are the targets being resolved?

@tatianacv

In https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1181, test 1 gets 265, which is the instruction of the ctor, while test 4 (the working test) gets instruction 269, which is v269 = fieldref v265.v270:#__call__ tf2_test_model_call4.py [36:9] -> [36:14] [265=[model]]. Then, for the pointer keys we have (https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1197-L1203):

  • Test 1: [[Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v265]]
  • Test 4: [[Node: <Code body of function Lscript tf2_test_model_call4.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v269]]

@khatchad

khatchad commented Nov 4, 2023

Yeah, but what's going on with the side effects? Is that how the edges are eventually created? See this log:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1194

You are hitting that for both cases, right? If so, what is going on here:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1208

Is that adding side effects that eventually get turned into edges in the call graph?

@khatchad

khatchad commented Nov 4, 2023

The working case should create an edge, while the non-working case should not.

@khatchad

Looking over this again, it seems you want to handle code like:

model = SequentialModel()
result = model(input_data)

where model is not a function but an object. If I understand correctly, at a high level you might be able to handle this by just tweaking the call graph builder / pointer analysis. You want some kind of conditional constraint for a call f(...), like, "If a function value flows to f, then that is the target. But if an object value o flows to f, then for each function value in o.__call__, invoke that." I'm not 100% sure this would work, but it's what I would look into first.

Hey @msridhar, thanks again for this idea. The problem we are having here is that, in the latter case above (...if an object value o flows to f, for each function value in o.__call__, invoke that), because there is no (explicit) call to __call__(), the CG node doesn't even exist for that method. Basically, we are at the point where we can see where the edge should be added but we don't have anything to connect the edge to.

I understand that the CG construction creates nodes on-the-fly, so maybe it's a chicken and egg problem. Thanks again for your help!
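To restate the proposed constraint in plain Python terms, here is a hypothetical illustration of the intended semantics; this is not WALA code, and the helpers points_to and call_attribute_values are made up for this sketch:

def resolve_call_targets(callee, points_to, call_attribute_values):
    # callee: the value position at a call site f(...)
    # points_to(v): abstract values that flow to v (hypothetical pointer-analysis lookup)
    # call_attribute_values(o): values that flow to o.__call__ (hypothetical lookup)
    targets = set()
    for value in points_to(callee):
        if value.is_function:
            # A function value flowed to f: it is a target directly.
            targets.add(value)
        else:
            # An object value o flowed to f: every function value that
            # flows to o.__call__ becomes a target instead.
            targets.update(w for w in call_attribute_values(value) if w.is_function)
    return targets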

@khatchad

Sorry for the noise, @msridhar. I think what we are missing is that we need to add the invocation statement, and later in the worklist-based algorithm, that will be picked up and the node will then be created.

@tatianacv

The CG of the non-working case is the following:

Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V > Context: Everywhere
 - invokestatic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V >@0
     -> Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere
 - invokevirtual < PythonLoader, Lscript tf2_test_model_call.py, do()LRoot; >@2
     -> Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]

Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere

Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
 - invokestatic < PythonLoader, Ltensorflow, import()Ltensorflow; >@90
     -> Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@90 ]
 - JSCall@109
     -> Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@109 ]
 - JSCall@111
     -> Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]

Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@90 ]

Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]
 - JSCall@12
     -> Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__> Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.do()LRoot;@12 ]

Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__> Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.do()LRoot;@12 ]
 - JSCall@5
     -> Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@5 ]]
 - JSCall@24
     -> Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@24 ]
 - JSCall@25
     -> Node: synthetic < PythonLoader, LCodeBody, __Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@25 ]

Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@5 ]]

Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@24 ]

Node: synthetic < PythonLoader, LCodeBody, __Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@25 ]
 - JSCall@2
     -> Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]

Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]

Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@109 ]
 - invokevirtual < PythonLoader, LRoot, read_data()LRoot; >@0
     -> Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]

Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]

@khatchad

Thanks, @tatianacv. Can you find in the call graph construction algorithm where it is processing Node: <Code body of function Lscript tf2_test_model_call.py>? What does it do once it hits model(data)? Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

@tatianacv

Thanks, @tatianacv. Can you find in the call graph construction algorithm where it is processing Node: <Code body of function Lscript tf2_test_model_call.py>?

Yes, we start processing that node in unconditionallyAddConstraintsFromNode from SSAPropagationCallGraphBuilder in WALA.

What does it do once it hits model(data)?

It goes to visitPythonInvoke and proceeds to have the same behavior we have seen.

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

What I found, and the reason why the call references v265 (the result of the ctor), is that in addBypassLogic, the code sets the options for the method selector as seen below:

options.setSelector(
    new PythonTrampolineTargetSelector(
        new PythonConstructorTargetSelector(
            new PythonComprehensionTrampolines(options.getMethodTargetSelector()))));

Therefore, when we are at v265 (the result of the ctor), the receiver Core[script tf2_test_model_call.py/SequentialModel], which is used to get the target (https://github.com/ponder-lab/ML/blob/386512a3d439213f1c64de48473db1baaa006735/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/callgraph/PythonConstructorTargetSelector.java), has in the methodTypes: [< PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__init__, __init__()LRoot; >, < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__call__, __call__()LRoot; >]. This gets processed in PythonConstructorTargetSelector.getCalleeTarget(...), which creates the statements for __call__ that we see in the instructions and that reference v265 (the result of the ctor), but it does not add a call to __call__. Since __call__ is in the methodTypes, we process __call__ in this code snippet:
for (TypeReference r : x.getInnerReferences()) {
  int orig_t = v++;
  String typeName = r.getName().toString();
  typeName = typeName.substring(typeName.lastIndexOf('/') + 1);
  FieldReference inner =
      FieldReference.findOrCreate(
          PythonTypes.Root, Atom.findOrCreateUnicodeAtom(typeName), PythonTypes.Root);
  ctor.addStatement(insts.GetInstruction(pc, orig_t, 1, inner));
  pc++;
  ctor.addStatement(insts.PutInstruction(pc, inst, orig_t, inner));
  pc++;
}
for (MethodReference r : x.getMethodReferences()) {
  int f = v++;
  ctor.addStatement(
      insts.NewInstruction(
          pc,
          f,
          NewSiteReference.make(
              pc,
              PythonInstanceMethodTrampoline.findOrCreate(
                  r.getDeclaringClass(), receiver.getClassHierarchy()))));
  pc++;
  ctor.addStatement(
      insts.PutInstruction(
          pc,
          f,
          inst,
          FieldReference.findOrCreate(
              PythonTypes.Root,
              Atom.findOrCreateUnicodeAtom("$self"),
              PythonTypes.Root)));
  pc++;
  int orig_f = v++;
  ctor.addStatement(
      insts.GetInstruction(
          pc,
          orig_f,
          1,
          FieldReference.findOrCreate(PythonTypes.Root, r.getName(), PythonTypes.Root)));
  pc++;
  ctor.addStatement(
      insts.PutInstruction(
          pc,
          f,
          orig_f,
          FieldReference.findOrCreate(
              PythonTypes.Root,
              Atom.findOrCreateUnicodeAtom("$function"),
              PythonTypes.Root)));
  pc++;
  ctor.addStatement(
      insts.PutInstruction(
          pc,
          inst,
          f,
          FieldReference.findOrCreate(PythonTypes.Root, r.getName(), PythonTypes.Root)));
  pc++;
}

The results of the added statements for __call__ are:

  • 8 = getfield < PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > 1
  • putfield 7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = 8
  • putfield 4.< PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > = 7

Just for reference all statements are [4 = new <PythonLoader,Lobject>@0, 5 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__init__>@1, putfield 5.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = 4, 6 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > 1, putfield 5.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = 6, putfield 4.< PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > = 5, 7 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__call__>@6, putfield 7.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = 4, 8 = getfield < PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > 1, putfield 7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = 8, putfield 4.< PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > = 7]

@khatchad

Thanks, @tatianacv. Can you find in the call graph construction algorithm where it is processing Node: <Code body of function Lscript tf2_test_model_call.py>?

Yes, we start processing that node in unconditionallyAddConstraintsFromNode from SSAPropagationCallGraphBuilder in WALA.

Link?

What does it do once it hits model(data)?

It goes to visitPythonInvoke and proceeds to have the same behavior we have seen.

Link?

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

What I found, and the reason why the call references v265 (the result of the ctor)

I don't think this was ever in question.

@khatchad

The variable v265 is the same in both cases; it references the object created by the ctor invocation. Why v265 references an object is easy to see.

So, to answer this question:

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

I am seeing this in the pointer analysis:

[Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v265] --> [SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:Lobject in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@112 ]}]

That means that v265 is pointing to an object. In other words, the function being invoked is, well, an object, which is why this is a callable case. We can't say that the constructor is being called again because of this (as I think you are alluding to in your response), because we aren't seeing extra nodes in the call graph in test 1.

@khatchad

I wonder if the MethodSelector is a bit too late in the game to determine callables. Instead, I am wondering if the CallSiteReference given to getCalleeTarget should have a declaredTarget that refers to the correct function. But I am unsure what a declared target means in Python.

@khatchad

So, when com.ibm.wala.ipa.callgraph.MethodTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass) is called in the working test (test 4), receiver is Trampoline[$script tf2_test_model_call4.py/SequentialModel/__call__]. That is what I mean in #53 (comment). Perhaps in the non-working test (test 1), that information should also be present.

@khatchad

BTW, the receiver here is an instance of com.ibm.wala.cast.python.ipa.summaries.PythonInstanceMethodTrampoline.

@khatchad

The receiver's reference is the TypeReference <PythonLoader,L$script tf2_test_model_call4.py/SequentialModel/__call__>.

@khatchad

The corresponding MethodReference is < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; >.

@khatchad

In the non-working case, the receiver is an instance of com.ibm.wala.cast.loader.CAstAbstractModuleLoader.CoreClass with the value of Core[object].

@khatchad

I should have mentioned that in the working case, com.ibm.wala.cast.python.ipa.callgraph.PythonTrampolineTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass) returns an IMethod with the value synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; >. Its actual type is com.ibm.wala.cast.python.ipa.summaries.PythonSummarizedFunction.

@khatchad

In the non-working case, that method returns null (note that the trampoline is the outer MethodSelector; there are inner ones like the constructor one that @tatianacv pointed out, but this is the "last" one). So, we now have the answer to our original question:

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

It's in: https://github.com/wala/WALA/blob/5aa300b2d0f0c672d027d9d896792985e1a69424/core/src/main/java/com/ibm/wala/ipa/summaries/BypassMethodTargetSelector.java#L139. The variable target is null there because the target cannot be found (i.e., there is no function named model()).

@khatchad

Thanks for finding the com.ibm.wala.ipa.callgraph.MethodTargetSelector, @tatianacv. Good work.

@khatchad

In the working case, the instance key is:

SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call4.py/SequentialModel, do()LRoot; >:L$script tf2_test_model_call4.py/SequentialModel/__call__ in CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@112 ]}

In the non-working case, it is:

SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:Lobject in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@112 ]}

@khatchad

The problem I'm seeing now is that the pointer analysis is unavailable in com.ibm.wala.cast.python.ipa.callgraph.PythonTrampolineTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass); it's not built yet. There is, however, another method, namely, com.ibm.wala.cast.python.client.PytestAnalysisEngine.PytestTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass), that does use the pointer analysis. But, it uses it for the summaries, which happens later. Thus, I'm wondering if callable processing should happen at a later time like the call to com.ibm.wala.cast.python.ipa.summaries.PythonSuper.handleSuperCalls(SSAPropagationCallGraphBuilder, AnalysisOptions), which is the last thing that happens in com.ibm.wala.cast.python.client.PythonAnalysisEngine.getCallGraphBuilder(IClassHierarchy, AnalysisOptions, IAnalysisCacheView).

@khatchad

Looks like com.ibm.wala.client.AbstractAnalysisEngine.pointerAnalysis is null, but there is also com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.getPointerAnalysis(), which may be available in com.ibm.wala.cast.python.client.PythonAnalysisEngine.getCallGraphBuilder(IClassHierarchy, AnalysisOptions, IAnalysisCacheView).

@khatchad

khatchad commented Nov 22, 2023

If I run com.ibm.wala.cast.python.jython3.test.TestAnnotations.testAnnotation2(), com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.getPointerAnalysis() returns non-null in com.ibm.wala.cast.python.client.PytestAnalysisEngine.PytestTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass).

@khatchad

has in the methodTypes: [< PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__init__, __init__()LRoot; >, < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__call__, __call__()LRoot; >].

Yeah, but it shouldn't even matter whether __call__ is defined or not. If it's not defined, then we should return null as usual.
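That also matches the runtime behavior: an instance whose class defines no __call__ is simply not callable, so having no target there is expected. A small sketch (class name is illustrative):

class SequentialModel:
    pass  # neither call() nor __call__() defined

model = SequentialModel()
try:
    model(42)
except TypeError as e:
    print(e)  # "'SequentialModel' object is not callable"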

@khatchad

we process the __call__ in this code snippet:

Actually, I think that the com.ibm.wala.cast.python.ipa.callgraph.PythonConstructorTargetSelector is a good example to look at; it also adds a call at the end of com.ibm.wala.cast.python.ipa.callgraph.PythonConstructorTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass):

CallSiteReference cref = new DynamicCallSiteReference(site.getDeclaredTarget(), pc);
ctor.addStatement(
    new PythonInvokeInstruction(2, result, except, cref, cps, new Pair[0]));

@khatchad

Move the conversation over to wala#24.
