
Add model call tests #53

Merged: 9 commits merged into master from addModelCallTests on Oct 23, 2023
Conversation

tatianacv

Adding tests for calling a model. We add four tests that include:

  1. Calling the model indirectly using call and __call__ (2 tests)
  2. Calling the model directly using call and __call__ (2 tests)

Related to wala#24
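For context, here is a minimal sketch of the kind of client code these tests exercise. It assumes a Keras-style SequentialModel subclass and is illustrative only, not the exact contents of the test files:

import tensorflow as tf

class SequentialModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.dense(x)

model = SequentialModel()
input_data = tf.random.uniform((4, 28 * 28))

result = model(input_data)           # indirect: goes through tf.keras.Model.__call__
result = model.call(input_data)      # direct call to call()
result = model.__call__(input_data)  # direct call to __call__()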

@khatchad

This is looking good. Thanks. For each of the model calls, what does the IR look like?

@khatchad

I think there are two problems here:

  1. I am going to guess that the IR will treat model() as a function call with no definition. In other words, there will be no call graph node corresponding to the called function.
  2. The superclass' __call__() method will invoke the subclass' call() method. Since the superclass isn't part of the client code, I am guessing that's missing as well, and would need to be somehow modeled using the summaries, if possible.
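To illustrate the second point, here is a plain-Python sketch of the dispatch being described; the real tf.keras.Model.__call__ is far more involved, and the class names below are only stand-ins:

# Library code, not part of the client script:
class Model:  # stands in for tf.keras.Model
    def __call__(self, *args, **kwargs):
        # The superclass' __call__() invokes the subclass' call().
        return self.call(*args, **kwargs)

# Client code seen by the call graph builder:
class SequentialModel(Model):
    def call(self, x):
        return x

model = SequentialModel()
print(model(42))  # model() -> Model.__call__ -> SequentialModel.call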


@khatchad left a comment


Thanks!

@khatchad merged commit a05986d into master on Oct 23, 2023
3 checks passed
@khatchad deleted the addModelCallTests branch on October 23, 2023 at 19:56
@khatchad

Can you also throw this up to http://github.com/wala/ML?

@tatianacv

Can you also throw this up to http://github.com/wala/ML?

wala#97

@tatianacv

tatianacv commented Oct 24, 2023

This is looking good. Thanks. For each of the model calls, what does the IR look like?

For test_model_call2.py (where it is not working), the model call does not show up as a node.

For test_model_call_3.py (which is the case where it is working), the node for the call is the following: Node: synthetic < PythonLoader, L$script tf2_test_model_call3.py/SequentialModel/call, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call3.py.do()LRoot;@114 ], and the IR is:

synthetic < PythonLoader, L$script tf2_test_model_call3.py/SequentialModel/call, trampoline2()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB5
BB1[1..1]
    -> BB2
    -> BB5
BB2[2..2]
    -> BB3
    -> BB5
BB3[3..3]
    -> BB4
    -> BB5
BB4[4..4]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
0   v3 = getfield < PythonLoader, LRoot, $function, <PythonLoader,LRoot> > v1
BB1
1   v4 = checkcast <PythonLoader,Lscript tf2_test_model_call3.py/SequentialModel/call>v3
BB2
2   v5 = getfield < PythonLoader, LRoot, $self, <PythonLoader,LRoot> > v1
BB3
3   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v4,v5,v2 @2 exception:v7
BB4
4   return v6                                
BB5

and for the call body, the node is: Node: <Code body of function Lscript tf2_test_model_call3.py/SequentialModel/call> Context: CallStringContext: [ $script tf2_test_model_call3.py.SequentialModel.call.trampoline2()LRoot;@2 ], and the IR is:

<Code body of function Lscript tf2_test_model_call3.py/SequentialModel/call>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..5]
    -> BB2
    -> BB9
BB2[6..7]
    -> BB3
BB3[8..12]
    -> BB6
    -> BB4
BB4[13..17]
    -> BB5
    -> BB9
BB5[18..19]
    -> BB3
BB6[20..21]
    -> BB7
    -> BB9
BB7[22..24]
    -> BB8
    -> BB9
BB8[25..26]
    -> BB9
BB9[-1..-2]
Instructions:
BB0
BB1
0   v4 = new <PythonLoader,Lsuperfun>@0      tf2_test_model_call3.py [21:2] -> [30:12] [4=[super]]
1   v6 = lexical:SequentialModel@Lscript tf2_test_model_call3.pytf2_test_model_call3.py [21:2] -> [30:12]
2   fieldref v4.v7:#$class = v6 = v6         tf2_test_model_call3.py [21:2] -> [30:12] [4=[super]]
3   fieldref v4.v8:#$self = v2 = v2          tf2_test_model_call3.py [21:2] -> [30:12] [4=[super]2=[self]]
4   v10 = fieldref v2.v11:#flatten           tf2_test_model_call3.py [22:8] -> [22:12] [2=[self]]
5   v9 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v10,v3 @5 exception:v12tf2_test_model_call3.py [22:8] -> [22:23] [9=[x]3=[x]]
BB2
7   v13 = fieldref v2.v14:#my_layers         tf2_test_model_call3.py [24:17] -> [24:31] [13=[temp 3]2=[self]]
BB3
           v36 = phi  v23,v9
8   v18 = global:global layer                tf2_test_model_call3.py [24:8] -> [24:13]
9   v19 = a property name of v13             <no information> [13=[temp 3]]
10   global:global layer = v19               tf2_test_model_call3.py [21:2] -> [30:12]
11   v15 = binaryop(ne) v16:#null , v19      tf2_test_model_call3.py [21:2] -> [30:12]
12   conditional branch(eq, to iindex=20) v15,v20:#0tf2_test_model_call3.py [21:2] -> [30:12]
BB4
13   v22 = global:global layer               tf2_test_model_call3.py [24:8] -> [24:13]
14   v21 = fieldref v13.v22                  tf2_test_model_call3.py [21:2] -> [30:12] [13=[temp 3]]
15   global:global layer = v21               tf2_test_model_call3.py [21:2] -> [30:12]
16   v24 = global:global layer               tf2_test_model_call3.py [25:10] -> [25:15]
17   v23 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v24,v36 @17 exception:v25tf2_test_model_call3.py [25:10] -> [25:18] [23=[x]36=[x]]
BB5
19   goto (from iindex= 19 to iindex = 8)    tf2_test_model_call3.py [21:2] -> [30:12]
BB6
20   v29 = fieldref v2.v30:#dropout          tf2_test_model_call3.py [27:8] -> [27:12] [2=[self]]
21   v28 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v29,v36 @21 exception:v31tf2_test_model_call3.py [27:8] -> [27:23] [28=[x]36=[x]]
BB7
23   v33 = fieldref v2.v34:#dense_2          tf2_test_model_call3.py [28:8] -> [28:12] [2=[self]]
24   v32 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v33,v28 @24 exception:v35tf2_test_model_call3.py [28:8] -> [28:23] [32=[x]28=[x]]
BB8
26   return v32                              tf2_test_model_call3.py [21:2] -> [30:12] [32=[x]]
BB9

@khatchad

This is looking good. Thanks. For each of the model calls, what does the IR look like?

For test_model_call2.py (where it is not working), the model call does not show up as a node.

We need the IR for this node.

For test_model_call_3.py (which is the case where it is working), the node for the call is the following: Node: synthetic < PythonLoader, L$script tf2_test_model_call3.py/SequentialModel/call, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call3.py.do()LRoot;@114 ], and the IR is

Let's focus on the __call__ case here, since that is simpler.

...
and for call body

We don't need the body, just the client code. Actually, the node name alone would be helpful:

<Code body of function Lscript tf2_test_model_call3.py/SequentialModel/call>


@tatianacv

For test_model_call.py and test_model_call2.py, there are no SequentialModel/call or SequentialModel/__call__ nodes in the call graph. Please refer to them here: CG for test_model_call.py and CG for test_model_call2.py. These are the tests that do not work.

For test_model_call_3.py, I added the IR above, which is the call case.

For test_model_call_4.py, for the node Node: synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@114 ] the IR is the following:

synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB5
BB1[1..1]
    -> BB2
    -> BB5
BB2[2..2]
    -> BB3
    -> BB5
BB3[3..3]
    -> BB4
    -> BB5
BB4[4..4]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
0   v3 = getfield < PythonLoader, LRoot, $function, <PythonLoader,LRoot> > v1
BB1
1   v4 = checkcast <PythonLoader,Lscript tf2_test_model_call4.py/SequentialModel/__call__>v3
BB2
2   v5 = getfield < PythonLoader, LRoot, $self, <PythonLoader,LRoot> > v1
BB3
3   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v4,v5,v2 @2 exception:v7
BB4
4   return v6                                
BB5

@khatchad

khatchad commented Oct 26, 2023

For test_model_call.py and test_model_call2.py, there are no SequentialModel/call or SequentialModel/__call__ in the callgraph.

We need the IR of the calling functions (callers).

@tatianacv

For Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ] in test_model_call.py the IR is

synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB14
BB1[1..1]
    -> BB2
    -> BB14
BB2[2..2]
    -> BB3
    -> BB14
BB3[3..3]
    -> BB4
    -> BB14
BB4[4..4]
    -> BB5
    -> BB14
BB5[5..5]
    -> BB6
    -> BB14
BB6[6..6]
    -> BB7
    -> BB14
BB7[7..7]
    -> BB8
    -> BB14
BB8[8..8]
    -> BB9
    -> BB14
BB9[9..9]
    -> BB10
    -> BB14
BB10[10..10]
    -> BB11
    -> BB14
BB11[11..11]
    -> BB12
    -> BB14
BB12[12..12]
    -> BB13
    -> BB14
BB13[13..13]
    -> BB14
BB14[-1..-2]
Instructions:
BB0
0   v4 = new <PythonLoader,Lobject>@0        
BB1
1   v5 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__init__>@1
BB2
2   putfield v5.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB3
3   v6 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB4
4   putfield v5.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v6
BB5
5   putfield v4.< PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > = v5
BB6
6   v7 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__call__>@6
BB7
7   putfield v7.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB8
8   v8 = getfield < PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > v1 [1=[self]]
BB9
9   putfield v7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v8
BB10
10   putfield v4.< PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > = v7
BB11
11   v9 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB12
12   v10 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v9,v4 @12 exception:v11
BB13
13   return v4                               
BB14

For Node: synthetic < PythonLoader, Lscript tf2_test_model_call2.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call2.py.do()LRoot;@111 ] in test_model_call2.py the IR is

synthetic < PythonLoader, Lscript tf2_test_model_call2.py/SequentialModel, do()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB14
BB1[1..1]
    -> BB2
    -> BB14
BB2[2..2]
    -> BB3
    -> BB14
BB3[3..3]
    -> BB4
    -> BB14
BB4[4..4]
    -> BB5
    -> BB14
BB5[5..5]
    -> BB6
    -> BB14
BB6[6..6]
    -> BB7
    -> BB14
BB7[7..7]
    -> BB8
    -> BB14
BB8[8..8]
    -> BB9
    -> BB14
BB9[9..9]
    -> BB10
    -> BB14
BB10[10..10]
    -> BB11
    -> BB14
BB11[11..11]
    -> BB12
    -> BB14
BB12[12..12]
    -> BB13
    -> BB14
BB13[13..13]
    -> BB14
BB14[-1..-2]
Instructions:
BB0
0   v4 = new <PythonLoader,Lobject>@0        
BB1
1   v5 = new <PythonLoader,L$script tf2_test_model_call2.py/SequentialModel/__init__>@1
BB2
2   putfield v5.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB3
3   v6 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB4
4   putfield v5.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v6
BB5
5   putfield v4.< PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > = v5
BB6
6   v7 = new <PythonLoader,L$script tf2_test_model_call2.py/SequentialModel/call>@6
BB7
7   putfield v7.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = v4
BB8
8   v8 = getfield < PythonLoader, LRoot, call, <PythonLoader,LRoot> > v1 [1=[self]]
BB9
9   putfield v7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = v8
BB10
10   putfield v4.< PythonLoader, LRoot, call, <PythonLoader,LRoot> > = v7
BB11
11   v9 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > v1 [1=[self]]
BB12
12   v10 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v9,v4 @12 exception:v11
BB13
13   return v4                               
BB14

@tatianacv

Comparison of the IR of all the nodes for the model.__call__() (test 4) and model() (test 1) cases.

Please see this gist linked here.

@khatchad

Thanks. Here is test 4 that works:

113   v269 = fieldref v265.v270:#__call__    tf2_test_model_call4.py [36:9] -> [36:14] [265=[model]]
114   v268 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v269,v252 @114 exception:v271tf2_test_model_call4.py [36:9] -> [36:35] [268=[result]252=[input_data]]

@khatchad

Here is test 1 that doesn't work:

113   v268 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v265,v252 @113 exception:v269tf2_test_model_call.py [34:9] -> [34:26] [268=[result]265=[model]252=[input_data]]

@khatchad

In test 4, the function being invoked is in v269, while in test 1 it is in v265.

@khatchad

In test 4, we can see that v269 is assigned in instruction 113. But in test 1, v265 is assigned in instruction 111, which is as follows:

111   v265 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v249 @111 exception:v266tf2_test_model_call4.py [35:8] -> [35:25] [265=[model]249=[SequentialModel]]

Thus, in test 1, the "function" being invoked is model (265=[model] above), but we know that model isn't a function; it's an object reference. In fact, this line also exists in test 4.
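For reference, this matches Python's own semantics: calling an instance dispatches through the class's __call__, so model is never itself a function. A small sketch (names are illustrative):

import types

class SequentialModel:
    def __call__(self, x):
        return x * 2

model = SequentialModel()

# model(x) is resolved through type(model).__call__, not through model as a function:
assert model(3) == type(model).__call__(model, 3) == 6
assert not isinstance(model, types.FunctionType)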

@tatianacv

I suppose that __call__() has one parameter (self), which explains why we are not entering that method. It seems that method is only reached for functions that have no parameters, because there is parametric polymorphism resolution in the else clause:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1170-L1173

Correct, we are falling into that else branch because params has the value {0}.

@tatianacv

That method, getTargetsForCall(...), is called for an instruction; its result is then used by getTargetForCall(...), which returns the CGNode for the particular call that should be dispatched. Therefore, getTargetsForCall(...) is invoked for a specific instruction in the CGNode.

OK. Which instructions are those? From your earlier comment, there was an invokeFunction instruction corresponding to a ctor for which you said control enters the method.

It enters that method for instructions that, in visitInvokeInternal, have empty params, or that have params but for which the contentsAreInvariant method returns true.

Therefore, when I say that it doesn't enter for those two, I mean that those instructions are not passed as a parameter in the call to getTargetsForCall. This is because, in visitInvokeInternal, the params are not empty; therefore, getTargetsForCall is not called on that instruction. https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1160-L1168.

Ah, so it only calls that method if there are no parameters?

Also when it has params but contentsAreInvariant returns true. (https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1186)

Also, when it goes into the else branch for non-empty params, there is an if that checks whether the contents are invariant; that also returns false, so we don't go into the method getTargetsForCall(...).

So, it does this for both the working and non-working cases? How then do the targets get resolved in the working case?

Yes, in both cases it has the same behavior in visitInvokeInternal.

@khatchad

khatchad commented Nov 3, 2023

I suppose that __call__() has one parameter (self), which explains why we are not entering that method. It seems that method is only reached for functions that have no parameters, because there is parametric polymorphism resolution in the else clause:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1170-L1173

Correct, we are falling into that else branch because params has the value {0}.

So what is going on in that else clause? How are the targets being resolved?

@tatianacv

In https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1181, test 1 gets 265, which is the instruction of the ctor, while test 4 (the working test) gets instruction 269, which is v269 = fieldref v265.v270:#__call__ tf2_test_model_call4.py [36:9] -> [36:14] [265=[model]]. Then, for the pointer keys we have (https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1197-L1203):

  • Test 1: [[Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v265]]
  • Test 4: [[Node: <Code body of function Lscript tf2_test_model_call4.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v269]]

@khatchad

khatchad commented Nov 4, 2023

Yeah, but what's going on with the side effects? Is that how the edges are eventually created? See this log:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1194

You are hitting that for both cases, right? If so, what is going on here:

https://github.com/wala/WALA/blob/d7b408901e928aaf63c0d9f9b24a46403eb55d60/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1208

Is that adding side effects that eventually get turned into edges in the call graph?

@khatchad

khatchad commented Nov 4, 2023

The working case should create an edge, while the non-working case should not.

@khatchad

Looking over this again, it seems you want to handle code like:

model = SequentialModel()
result = model(input_data)

where model is not a function but an object. If I understand correctly, at a high level you might be able to handle this by just tweaking the call graph builder / pointer analysis. You want some kind of conditional constraint for a call f(...), like, "If a function value flows to f, then that is the target. But if an object value o flows to f, then for each function value in o.__call__, invoke that." I'm not 100% sure this would work, but it's what I would look into first.

Hey @msridhar, thanks again for this idea. The problem we are having here is that, in the latter case above (...if an object value o flows to f, for each function value in o.__call__, invoke that), because there is no (explicit) call to __call__(), the CG node doesn't even exist for that method. Basically, we are at the point where we can see where the edge should be added but we don't have anything to connect the edge to.

I understand that the CG construction creates nodes on-the-fly, so maybe it's a chicken and egg problem. Thanks again for your help!
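To restate the proposed constraint in plain Python terms, here is a hypothetical illustration of the intended semantics; this is not WALA code, and the helpers points_to and call_attribute_values are made up for this sketch:

def resolve_call_targets(callee, points_to, call_attribute_values):
    # callee: the value position at a call site f(...)
    # points_to(v): abstract values that flow to v (hypothetical pointer-analysis lookup)
    # call_attribute_values(o): values that flow to o.__call__ (hypothetical lookup)
    targets = set()
    for value in points_to(callee):
        if value.is_function:
            # A function value flowed to f: it is a target directly.
            targets.add(value)
        else:
            # An object value o flowed to f: every function value that
            # flows to o.__call__ becomes a target instead.
            targets.update(w for w in call_attribute_values(value) if w.is_function)
    return targets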

@khatchad

Sorry for the noise, @msridhar. I think what we are missing is that we need to add the invocation statement, and later in the worklist-based algorithm, that will be picked up and the node will then be created.

@tatianacv

The CG of the non-working case is the following:

Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V > Context: Everywhere
 - invokestatic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V >@0
     -> Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere
 - invokevirtual < PythonLoader, Lscript tf2_test_model_call.py, do()LRoot; >@2
     -> Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]

Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere

Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
 - invokestatic < PythonLoader, Ltensorflow, import()Ltensorflow; >@90
     -> Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@90 ]
 - JSCall@109
     -> Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@109 ]
 - JSCall@111
     -> Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]

Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@90 ]

Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]
 - JSCall@12
     -> Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__> Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.do()LRoot;@12 ]

Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__> Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.do()LRoot;@12 ]
 - JSCall@5
     -> Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@5 ]]
 - JSCall@24
     -> Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@24 ]
 - JSCall@25
     -> Node: synthetic < PythonLoader, LCodeBody, __Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@25 ]

Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@5 ]]

Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@24 ]

Node: synthetic < PythonLoader, LCodeBody, __Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.SequentialModel.__init__.do()LRoot;@25 ]
 - JSCall@2
     -> Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]

Node: <Code body of function Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript tf2_test_model_call.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]

Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@109 ]
 - invokevirtual < PythonLoader, LRoot, read_data()LRoot; >@0
     -> Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]

Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]

@khatchad

Thanks, @tatianacv. Can you find in the call graph construction algorithm where it is processing Node: <Code body of function Lscript tf2_test_model_call.py>? What does it do once it hits model(data)? Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

@tatianacv

Thanks, @tatianacv. Can you find in the call graph construction algorithm where it is processing Node: <Code body of function Lscript tf2_test_model_call.py>?

Yes, we start processing that node in unconditionallyAddConstraintsFromNode from SSAPropagationCallGraphBuilder in WALA.

What does it do once it hits model(data)?

It goes to visitPythonInvoke and proceeds to have the same behavior we have seen.

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

What I found, and the reason why the call references v265 (the result of the ctor), is that in addBypassLogic, the code sets the options for the method selector as seen below:

options.setSelector(
    new PythonTrampolineTargetSelector(
        new PythonConstructorTargetSelector(
            new PythonComprehensionTrampolines(options.getMethodTargetSelector()))));

Therefore, when we are at v265 (the result of the ctor), the receiver Core[script tf2_test_model_call.py/SequentialModel], which is used to get the target (https://github.com/ponder-lab/ML/blob/386512a3d439213f1c64de48473db1baaa006735/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/callgraph/PythonConstructorTargetSelector.java), has in the methodTypes: [< PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__init__, __init__()LRoot; >, < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__call__, __call__()LRoot; >]. This gets processed in PythonConstructorTargetSelector.getCalleeTarget(...), which creates the statements for __call__ that we see in the instructions and that reference v265 (the result of the ctor), but it does not add a call to __call__. Since __call__ is in the methodTypes, we process __call__ in this code snippet:
for (TypeReference r : x.getInnerReferences()) {
  int orig_t = v++;
  String typeName = r.getName().toString();
  typeName = typeName.substring(typeName.lastIndexOf('/') + 1);
  FieldReference inner =
      FieldReference.findOrCreate(
          PythonTypes.Root, Atom.findOrCreateUnicodeAtom(typeName), PythonTypes.Root);
  ctor.addStatement(insts.GetInstruction(pc, orig_t, 1, inner));
  pc++;
  ctor.addStatement(insts.PutInstruction(pc, inst, orig_t, inner));
  pc++;
}
for (MethodReference r : x.getMethodReferences()) {
  int f = v++;
  ctor.addStatement(
      insts.NewInstruction(
          pc,
          f,
          NewSiteReference.make(
              pc,
              PythonInstanceMethodTrampoline.findOrCreate(
                  r.getDeclaringClass(), receiver.getClassHierarchy()))));
  pc++;
  ctor.addStatement(
      insts.PutInstruction(
          pc,
          f,
          inst,
          FieldReference.findOrCreate(
              PythonTypes.Root,
              Atom.findOrCreateUnicodeAtom("$self"),
              PythonTypes.Root)));
  pc++;
  int orig_f = v++;
  ctor.addStatement(
      insts.GetInstruction(
          pc,
          orig_f,
          1,
          FieldReference.findOrCreate(PythonTypes.Root, r.getName(), PythonTypes.Root)));
  pc++;
  ctor.addStatement(
      insts.PutInstruction(
          pc,
          f,
          orig_f,
          FieldReference.findOrCreate(
              PythonTypes.Root,
              Atom.findOrCreateUnicodeAtom("$function"),
              PythonTypes.Root)));
  pc++;
  ctor.addStatement(
      insts.PutInstruction(
          pc,
          inst,
          f,
          FieldReference.findOrCreate(PythonTypes.Root, r.getName(), PythonTypes.Root)));
  pc++;
}

The results of the added statements for __call__ are:

  • 8 = getfield < PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > 1
  • putfield 7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = 8
  • putfield 4.< PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > = 7

Just for reference all statements are [4 = new <PythonLoader,Lobject>@0, 5 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__init__>@1, putfield 5.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = 4, 6 = getfield < PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > 1, putfield 5.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = 6, putfield 4.< PythonLoader, LRoot, __init__, <PythonLoader,LRoot> > = 5, 7 = new <PythonLoader,L$script tf2_test_model_call.py/SequentialModel/__call__>@6, putfield 7.< PythonLoader, LRoot, $self, <PythonLoader,LRoot> > = 4, 8 = getfield < PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > 1, putfield 7.< PythonLoader, LRoot, $function, <PythonLoader,LRoot> > = 8, putfield 4.< PythonLoader, LRoot, __call__, <PythonLoader,LRoot> > = 7]

@khatchad

Thanks, @tatianacv. Can you find in the call graph construction algorithm where it is processing Node: <Code body of function Lscript tf2_test_model_call.py>?

Yes, we start processing that node in unconditionallyAddConstraintsFromNode from SSAPropagationCallGraphBuilder in WALA.

Link?

What does it do once it hits model(data)?

It goes to visitPythonInvoke and proceeds to have the same behavior we have seen.

Link?

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

What I found, and the reason why the call references v265 (the result of the ctor)

I don't think this was ever in question.

@khatchad

The variable v265 is the same in both cases; it references the object created by the ctor invocation. Why v265 references an object is easy to see.

So, to answer this question:

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

I am seeing this in the pointer analysis:

[Node: <Code body of function Lscript tf2_test_model_call.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ], v265] --> [SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:Lobject in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@112 ]}]

That means that v265 is pointing to an object. In other words, the function being invoked is, well, an object, which is why this is a callable case. We can't say that the constructor is being called again because of this (as I think you are alluding to in your response), because we aren't seeing extra nodes in the call graph in test 1.

@khatchad

I wonder if the MethodSelector is a bit too late in the game to determine callables. Instead, I am wondering if the CallSiteReference given to getCalleeTarget should have a declaredTarget that refers to the correct function. But I am unsure what a declared target means in Python.

@khatchad

So, when com.ibm.wala.ipa.callgraph.MethodTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass) is called in the working test (test 4), receiver is Trampoline[$script tf2_test_model_call4.py/SequentialModel/__call__]. That is what I mean in #53 (comment). Perhaps in the non-working test (test 1), that information should also be present.

@khatchad

BTW, the receiver here is an instance of com.ibm.wala.cast.python.ipa.summaries.PythonInstanceMethodTrampoline.

@khatchad

The receiver's reference is the TypeReference <PythonLoader,L$script tf2_test_model_call4.py/SequentialModel/__call__>.

@khatchad

The corresponding MethodReference is < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; >.

@khatchad

In the non-working case, the receiver is an instance of com.ibm.wala.cast.loader.CAstAbstractModuleLoader.CoreClass with the value of Core[object].

@khatchad

I should have mentioned that in the working case, com.ibm.wala.cast.python.ipa.callgraph.PythonTrampolineTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass) returns an IMethod with the value synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; >. Its actual type is com.ibm.wala.cast.python.ipa.summaries.PythonSummarizedFunction.

@khatchad

In the non-working case, that method returns null (note that the trampoline is the outer MethodSelector; there are inner ones like the constructor one that @tatianacv pointed out, but this is the "last" one). So, we now have the answer to our original question:

Where does it look for a function called model() and then fail when it doesn't find it? Can you find in the code where that is happening?

It's in: https://github.com/wala/WALA/blob/5aa300b2d0f0c672d027d9d896792985e1a69424/core/src/main/java/com/ibm/wala/ipa/summaries/BypassMethodTargetSelector.java#L139. The variable target is null there because the target cannot be found (i.e., there is no function named model()).

@khatchad

Thanks for finding the com.ibm.wala.ipa.callgraph.MethodTargetSelector, @tatianacv. Good work.

@khatchad

In the working case, the instance key is:

SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call4.py/SequentialModel, do()LRoot; >:L$script tf2_test_model_call4.py/SequentialModel/__call__ in CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@112 ]}

In the non-working case, it is:

SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:Lobject in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@112 ]}

@khatchad

The problem I'm seeing now is that the pointer analysis is unavailable in com.ibm.wala.cast.python.ipa.callgraph.PythonTrampolineTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass); it's not built yet. There is, however, another method, namely, com.ibm.wala.cast.python.client.PytestAnalysisEngine.PytestTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass), that does use the pointer analysis. But, it uses it for the summaries, which happens later. Thus, I'm wondering if callable processing should happen at a later time like the call to com.ibm.wala.cast.python.ipa.summaries.PythonSuper.handleSuperCalls(SSAPropagationCallGraphBuilder, AnalysisOptions), which is the last thing that happens in com.ibm.wala.cast.python.client.PythonAnalysisEngine.getCallGraphBuilder(IClassHierarchy, AnalysisOptions, IAnalysisCacheView).

@khatchad

Looks like com.ibm.wala.client.AbstractAnalysisEngine.pointerAnalysis is null, but there is also com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.getPointerAnalysis(), which may be available in com.ibm.wala.cast.python.client.PythonAnalysisEngine.getCallGraphBuilder(IClassHierarchy, AnalysisOptions, IAnalysisCacheView).

@khatchad

khatchad commented Nov 22, 2023

If I run com.ibm.wala.cast.python.jython3.test.TestAnnotations.testAnnotation2(), com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.getPointerAnalysis() returns non-null in com.ibm.wala.cast.python.client.PytestAnalysisEngine.PytestTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass).

@khatchad

has in the methodTypes: [< PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__init__, __init__()LRoot; >, < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel/__call__, __call__()LRoot; >].

Yeah, but it shouldn't even matter whether __call__ is defined or not. If it's not defined, then we should return null as usual.
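That also matches the runtime behavior: an instance whose class defines no __call__ is simply not callable, so having no target there is expected. A small sketch (class name is illustrative):

class SequentialModel:
    pass  # neither call() nor __call__() defined

model = SequentialModel()
try:
    model(42)
except TypeError as e:
    print(e)  # "'SequentialModel' object is not callable"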

@khatchad

we process the __call__ in this code snippet:

Actually, I think that the com.ibm.wala.cast.python.ipa.callgraph.PythonConstructorTargetSelector is a good example to look at; it also adds a call at the end of com.ibm.wala.cast.python.ipa.callgraph.PythonConstructorTargetSelector.getCalleeTarget(CGNode, CallSiteReference, IClass):

CallSiteReference cref = new DynamicCallSiteReference(site.getDeclaredTarget(), pc);
ctor.addStatement(
    new PythonInvokeInstruction(2, result, except, cref, cps, new Pair[0]));

@khatchad

Move the conversation over to wala#24.
