Skip to content

Python: Test improvements in preparation for new call-graph PR #11208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Nov 22, 2022

Conversation

RasmusWL
Copy link
Member

While working on the new call graph, I improved some tests. To easier to be able to see the changes by the new call-graph, here is a PR with just these test improvements (then the total diff for the call-graph PR will only contain changes in inline-expectations)

And adopt argument passing tests as well.

turns out that `C.staticmethod.__func__` doesn't actually work :O
The selected edges is covered by `NormalDataflowTest.ql` now... and
reading the test-output changes in `edges` is just going to make commits
larger while not providing any real value.
I don't see the value from this, so just going to outright delete it.
(it actually stayed alive for quite some time in the original git history,
but never seemed to be that useful.)
We never had a showcase of how keyword arguments were handled
Copy link
Member Author

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore this review.. contained a few fixes, but I've since then squashed some commits and force pushed

This adds inline expectations, next commit will remove old annotations
code... but I thought it would be easier to review like this.
I ported the predicates showing difference between points-to and
type-tracking, since it's helpful to see the list of differences,
instead of having to parse expectations!
No longer needed since we're using an established testing framework now
to the new call-graph test setup. Nice that we can write `MISSING:` now!
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, these are very nice changes!

Some comments are trying to retain a systematic structure of the tests, but I can sympathise if you did not find this structure very obvious..

The dataflow callgraph tests were scoped from the dataflow point of view, "are method calls hooked up properly?", including special methods which are currently handled in the dataflow layer (but perhaps the callgraph should handle this?). They have sometimes been useful in the past, but, as the comment shows, were never written properly. I think it is fine to delete them now, we can create something more appropriate if the need arises.

foo = Foo(SOURCE)
SINK(foo.x) # $ flow="SOURCE, l:-1 -> foo.x"
foo.update_x(None)
SINK_F(foo.x) # $ flow="SOURCE, l:-3 -> foo.x"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SINK_F(foo.x) # $ flow="SOURCE, l:-3 -> foo.x"
SINK_F(foo.x) # $ SPURIOUS: flow="SOURCE, l:-3 -> foo.x"

Is this not enforced by some sink-annotation predicate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not enforced by some sink-annotation predicate?

not currently. maybe it should be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is of course specific to SINK_F, that we know there should not be flow. Yes, we should enforce this, but not as part of this PR.

SINK_F(objy.y) # $ MISSING: flow="SOURCE, l:-5 -> objy.y"

SINK(obj.x) # $ flow="SOURCE, l:-7 -> obj.x"
SINK_F(obj.y) # $ flow="SOURCE, l:-8 -> obj.y"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SINK_F(obj.y) # $ flow="SOURCE, l:-8 -> obj.y"
SINK(obj.y) # $ flow="SOURCE, l:-8 -> obj.y"

I believe yo want the flow here. Otherwise, it should be marked SPURIOUS...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I do want flow, but we cannot use SINK since that would make the runtime checks unhappy 😞 I guess we could have a SINK(..., no_present_at_runtime=True) for these situations -- let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! No, I think the annotations (by which I mean SINK and SINK_F) should agree with the runtime. Perhaps we could rewrite to else if cond1:?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had the following code. I noticed that adding these if-then-else statements made the cfg-splitting kick in, so the problem this is testing was solved trivially. I guess I should have made a comment about this, since it really isn't apparent from the test-case.

if cond:
    SINK(objx.x) # $ MISSING: flow="SOURCE, l:-2 -> objx.x"
    SINK_F(objx.y)
else:
    SINK_F(objy.x)
    SINK(objy.y) # $ MISSING: flow="SOURCE, l:-5 -> objy.y"

if cond:
    SINK(obj.x) # $ flow="SOURCE, l:-7 -> obj.x"
else:
    SINK(obj.y) # $ flow="SOURCE, l:-8 -> obj.y"

I agree that SINK(..., no_present_at_runtime=True) isn't pretty, but I also agree that my currently solution of using SINK_F where SINK should have been used is really bad... so I like the SINK(..., no_present_at_runtime=True) solution better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but I wanted to rewrite line 293. Would that avoid the problem with extra splitting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see the problem, I have found a potential solution, which I will encode in suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I cannot actually make suggestions in unchanged code, so here is a description of the diff, then you can evaluate if it makes sense :-)

  • Change SINK to accept an optional expected argument
def SINK(x, expected=None):
    if expected is None and is_source(x):
        print("OK")
    elif expected is not None and x == expected:
        print("OK")
    else:
        print("Unexpected flow", x)

This makes None illegal as an expected value, though, so

  • Change the default values for CrosstalkTestX and CrosstalkTestY away from None
class CrosstalkTestX:
    def __init__(self):
        self.x = NONSOURCE
        self.y = NONSOURCE

    def setx(self, value):
        self.x = value

    def setvalue(self, value):
        self.x = value


class CrosstalkTestY:
    def __init__(self):
        self.x = NONSOURCE
        self.y = NONSOURCE

    def sety(self ,value):
        self.y = value

    def setvalue(self, value):
        self.y = value
  • Use an inline if expression to avoid splitting. Lines 303 and 304 become (hopefully using better line reference numbers)
    SINK(obj.x, SOURCE if cond else NONSOURCE) # $ flow="SOURCE, l:-7 -> obj.x" flow="SOURCE -> IfExp"
    SINK(obj.y, NONSOURCE if cond else SOURCE) # $ flow="SOURCE, l:-8 -> obj.y" flow="SOURCE -> IfExp"

This makes both validTest and NormalDataflowTest happy in my Codespace (although UnresolvedCalls and dataflow-consistency both have some issues?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using the not_present_at_runtime approach, since with SINK(obj.y, expected=NONSOURCE) it might be confusing for the non-initiated people whether we expect obj.y to be SOURCE (as it usually is for SINK arguments) or NONSOURCE (as indicated by the expected argument).

With not_present_at_runtime, it's more clear that the intention is that data-flow finds the flow, but the flow is not present at runtime.

I've also added comments explaining why we can't just use if-then-else -- I should have done this initially, so I'm happy we've had this conversation, so I could add least get those added 😓

Use an inline if expression to avoid splitting. Lines 303 and 304 become (hopefully using better line reference numbers)

This won't work. Turns out I had forgotten to pull out another test-case from the call-graph branch, so I've also added this as a new commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree that overriding the expected value is confusing. What annoys me a bit about not_present_at_runtime is that it goes against the spirit of the tests: That the runtime behaviour is our source of truth. I can see that we might at times need to express the approximations of our analysis, but I hope that not_present_at_runtime will be used seldomly and only as a last resort..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that not_present_at_runtime will be used seldomly and only as a last resort..

Agreed on that part 👍

@RasmusWL RasmusWL requested a review from yoff November 15, 2022 10:17
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only change requested is the SINK_F annotation. All other changes are good. Nice improvements to the tests, and interesting to have the added test of enclosing callable. I wonder if we actually get that predicate wrong currently, when classes are callable...

@RasmusWL
Copy link
Member Author

I wonder if we actually get that predicate wrong currently, when classes are callable...

For Python, this should really have been named enclosingScope, but 🤷 Yes, we get it wrong right now, since classes have their own scope, and we currently don't model that. I'll postpone fixing this for until after the call-graph PR, so we can properly investigate what that change would actually do.

@RasmusWL RasmusWL requested a review from yoff November 22, 2022 10:44
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice that you got the compound argument test in there, I remember reading about that problem some time ago.

@RasmusWL RasmusWL merged commit b281cc8 into github:main Nov 22, 2022
@RasmusWL RasmusWL deleted the call-graph-tests branch November 22, 2022 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants