Ruby: Add basic subclassing support to API Graphs #7663

hmac · 2022-01-19T23:44:18Z

Given the code

class A; end
class B < A; end
class C < A; end

You can find uses of B and C with the expression

API::getTopLevelMember("A").getASubclass()

We do this by adding edges from the use of A in class B < A to all uses of B.

API Graph paths that use getASubclass() will, I think, always be longer than an equivalent "canonical" path which doesn't use getASubclass(). For example, a use of the class B above is accessible via API::getTopLevelMember("B") and API::getTopLevelMember("A").getASubclass(). Therefore when it comes to testing getASubclass(), we're interested in non-canonical paths, for example:

class A; end
class B < A; end

B # API::getTopLevelMember("B") [canonical], API::getTopLevelMember("A").getASubclass()
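The two paths in the annotation above can be reproduced with a toy graph. This is only a sketch in plain Ruby: the node names (`:use_A`, `:use_B`, `:use_C`), the `EDGES` table and the `paths_to` helper are made up for illustration and are not how the CodeQL library represents API graphs.

```ruby
# Toy model of an API graph: each edge is [from, label, to].
# Node names and the edge table are illustrative assumptions.
EDGES = [
  [:root, 'getTopLevelMember("A")', :use_A],
  [:root, 'getTopLevelMember("B")', :use_B],
  [:root, 'getTopLevelMember("C")', :use_C],
  # Subclass edges, as added for `class B < A` and `class C < A`:
  [:use_A, 'getASubclass()', :use_B],
  [:use_A, 'getASubclass()', :use_C]
].freeze

# All label paths from :root to `target` that use at most `max_len` edges,
# shortest first.
def paths_to(target, max_len)
  frontier = [[:root, []]]
  found = []
  max_len.times do
    # Extend every partial path by one outgoing edge.
    frontier = frontier.flat_map do |node, labels|
      EDGES.select { |from, _, _| from == node }
           .map { |_, label, to| [to, labels + [label]] }
    end
    hits = frontier.select { |node, _| node == target }
    found.concat(hits.map { |_, labels| labels.join(".") })
  end
  found
end

B_PATHS = paths_to(:use_B, 2)
puts B_PATHS
```

The first path found is the canonical one; the path through `getASubclass()` only shows up one edge later, which is why a shortest-path test never reports it.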

To test subclass support using inline tests, I've extended the InlineTest framework to support optional results, which are matched against annotations but do not trigger a failure if there is no matching annotation. This allows us to add annotations for non-canonical paths where we want to test subclassing, but leave existing tests alone.

The idea behind optional results is that there may be instances where each line of source code has many results and you don't want to annotate all of them, but you still want to ensure that any annotations you do have are correct. This change makes that possible by exposing a new predicate `hasOptionalResult`, which has the same signature as `hasResult`. Results produced by `hasOptionalResult` will be matched against any annotations, but the lack of a matching annotation will not cause a failure. We will use this in the inline tests for the API edge getASubclass, because for each API path that uses getASubclass there is always a shorter path that does not use it, and thus we can't use the normal shortest-path matching approach that works for other API Graph tests.
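The matching rule described above can be sketched in a few lines of Ruby. This is a toy model, not the InlineTest implementation: `check_line`, its arguments and the failure messages are hypothetical, and real annotations carry tags and locations that are ignored here.

```ruby
# Toy model of matching results against annotations on one source line.
#   annotations: strings written as expectations on the line
#   results:     strings a test reports via hasResult (required)
#   optional:    strings a test reports via hasOptionalResult
# Returns a list of failure messages; empty means the line passes.
def check_line(annotations, results, optional)
  failures = []
  # A required result with no annotation is a failure.
  (results - annotations).each { |r| failures << "missing annotation: #{r}" }
  # An annotation must be backed by a required or an optional result.
  (annotations - results - optional).each { |a| failures << "unexpected annotation: #{a}" }
  # An optional result with no annotation is accepted silently.
  failures
end

canonical = 'getTopLevelMember("B")'
via_subclass = 'getTopLevelMember("A").getASubclass()'

# Annotating the non-canonical path is allowed...
p check_line([canonical, via_subclass], [canonical], [via_subclass])
# ...and so is leaving it out, which keeps existing tests unchanged.
p check_line([canonical], [canonical], [via_subclass])
```

A wrong annotation still fails, because it matches neither a required nor an optional result.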

aibaars · 2022-01-27T11:04:08Z

ruby/ql/lib/codeql/ruby/frameworks/ActionController.qll

+        // In Rails applications `ApplicationController` typically extends `ActionController::Base`, but we
+        // treat it separately in case the `ApplicationController` definition is not in the database.
+        API::getTopLevelMember("ApplicationController")
+      ].getASubclass*().getAUse().asExpr().getExpr()


I think it would be nice to have a method named getADirectSubclass and define getASubclass to be getADirectSubclass*. I expect users will need the transitive closure in almost all cases, and they are likely to forget to add the *. Having the nicely named method "just do the right thing in most cases" would be helpful.

hvitved

I don't have much experience with sub classes in API graphs, but @max-schaefer mentioned that it might be a good principle for API graphs to adhere to the substitution principle. The way I understand this, it would mean that when asking, say, for getMember(C), we should already then be getting C and all sub classes thereof. However, whether we would also like to be able to get just C, I don't know.

hvitved · 2022-01-27T12:29:31Z

ruby/ql/lib/codeql/ruby/ApiGraphs.qll

@@ -174,8 +174,7 @@ module API {
          // avoid producing strings longer than 1MB
          result.length() < 1000 * 1000
        )
-      ) and
-      length in [1 .. Impl::distanceFromRoot(this)]


I don't think we should change this just so we can use it in a test. I think it would be OK to replicate the above in the test itself instead.

I removed this because I noticed that we were only calling this predicate in

string getPath() { result = min(string p | p = this.getAPath(Impl::distanceFromRoot(this)) | p) }

which already passes in a max length of Impl::distanceFromRoot(this), so I thought it was redundant. Am I mistaken?

[...] which already passes in a max length of Impl::distanceFromRoot(this), so I thought it was redundant. Am I mistaken?

Careful! This is "top-down" reasoning ("passes in"), but the evaluation is actually bottom-up.

Consider how getAPath is evaluated in isolation. The lines you deleted would prevent this predicate from containing a result and length where length is greater than the shortest path to the given node. Without those lines, there's no such restriction (except the fact that result can't be more than 1000000 characters, which is likely something that happens much further out). In particular, if you have a loop in the API graph (which is almost certain), this predicate will keep going round and round in that loop, producing path strings of greater and greater length, until finally hitting the 1M limit.

In general, it's useful to consider the question "which tuples will be in this relation" regardless of any context that may limit that set. If we're lucky, that context will be automatically magicked in, but it might not (and so it's better to just limit the number of tuples in advance).
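To make the "which tuples are in this relation" question concrete, here is a toy Ruby sketch. It is not QL and not the real getAPath: the graph `SUCC`, the distances `DIST` and the helper `path_tuples` are invented for illustration, and `cap` stands in for the 1MB string limit.

```ruby
# Toy cyclic graph: root -> a -> b -> a -> ...
SUCC = { root: [:a], a: [:b], b: [:a] }.freeze
# Shortest distance of each node from :root.
DIST = { root: 0, a: 1, b: 2 }.freeze

# Build every (node, length) tuple of the path relation bottom-up, with no
# knowledge of which lengths a caller will later ask for.
# With bounded: true, a tuple is kept only if length <= DIST[node].
def path_tuples(cap, bounded:)
  tuples = [[:root, 0]]
  frontier = tuples
  until frontier.empty?
    # Extend every known path by one edge.
    frontier = frontier.flat_map do |node, len|
      SUCC.fetch(node, []).map { |nxt| [nxt, len + 1] }
    end
    # Apply the global cap and, optionally, the per-node distance bound.
    frontier = frontier.select do |node, len|
      len <= cap && (!bounded || len <= DIST[node])
    end
    tuples += frontier
  end
  tuples
end

UNBOUNDED = path_tuples(1000, bounded: false)
BOUNDED = path_tuples(1000, bounded: true)
puts "unbounded: #{UNBOUNDED.size} tuples, bounded: #{BOUNDED.size} tuples"
```

With the bound, the relation stops at 3 tuples; without it, the loop between `a` and `b` keeps producing tuples until the cap, giving 1001. A caller that only asks for the shortest length gets the same answer either way, but the unbounded relation is still computed in full.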

Thanks for explaining! I'm definitely still getting used to thinking in the bottom-up evaluation mindset :)

I've reverted this change in a followup commit - the test now uses a copy of getAPath without the length restriction.

hvitved · 2022-01-27T12:30:13Z

ruby/ql/lib/codeql/ruby/ApiGraphs.qll

+        use(succ, b) and
+        resolveConstant(b.asExpr().getExpr()) = resolveConstantWriteAccess(c) and
+        c.getSuperclassExpr() = a.asExpr().getExpr() and
+        lbl = Label::subclass()


I wonder if this should include the name of the sub class.

That sounds reasonable. I can't immediately see what benefit it would bring though. Does it make anything in particular easier/harder?

Let's just skip it for now.

max-schaefer · 2022-01-27T13:44:13Z

@max-schaefer mentioned that it might be a good principle for API graphs to adhere to the substitution principle

Yeah, that was admittedly a bit of a throw-away line. I haven't implemented this yet.

My intuition was that if I ask the API graph to give me all instances of C, it would be nice if that included instances of subtypes, since (by the substitution principle) they are also instances of C. Similarly, if I ask for method m of C (perhaps specifying a signature for languages where that makes sense), it would seem reasonable for that to include overrides of m in subclasses of C.

I remember the @github/codeql-python folks remarking how their API-graph code ended up using getASubclass*() all over the place, which could perhaps be avoided by this scheme.
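For reference, this is what the substitution-principle intuition looks like at the Ruby level. The class names `Parent` and `Child` are placeholders standing in for C and one of its subtypes; nothing here is API-graph code.

```ruby
class Parent
  def m
    "Parent#m"
  end
end

class Child < Parent
  # Override of m in a subclass.
  def m
    "Child#m"
  end
end

# An instance of the subclass is also an instance of the superclass...
CHILD_IS_A_PARENT = Child.new.is_a?(Parent)
# ...and calling m on it runs the override.
CHILD_M = Child.new.m

puts CHILD_IS_A_PARENT
puts CHILD_M
```

Under the scheme described above, a query for "instances of Parent" or "method m of Parent" would also cover `Child.new` and `Child#m`.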

tausbn · 2022-01-27T14:55:22Z

I remember the @github/codeql-python folks remarking how their API-graph code ended up using getASubclass*() all over the place, which could perhaps be avoided by this scheme.

Indeed. When I originally implemented support for subclassing, you mentioned this issue and I discussed it with the other members of the Python team (as I was eager to avoid peppering our code with getASubclass*()).

If memory serves, one of the concerns raised was that if some_class_api_node implicitly represents not just itself but all of its subclasses, then we lose the ability to distinguish between these cases (which might be relevant at some point). So instead we opted for the more explicit (and in turn perhaps more Pythonic) getASubclass, to make it explicitly clear that we're now including all subclasses. In practice, this isn't terribly much of a nuisance. (Also, checking our libraries just now, most instances use getASubclass*, but there is one instance where we do getASubclass+ instead.)

max-schaefer · 2022-01-27T15:01:25Z

If memory serves, one of the concerns raised was that if some_class_api_node implicitly represents not just itself but all of its subclasses, then we lose the ability to distinguish between these cases (which might be relevant at some point)

Absolutely! That's why my proposal was to only bring in the substitution principle when referring to instances or members (not when referring to the class itself), but there are still things you can't express that way, and it's entirely possible that they are practically relevant.

aibaars · 2022-01-27T16:31:26Z

If memory serves, one of the concerns raised was that if some_class_api_node implicitly represents not just itself but all of its subclasses, then we lose the ability to distinguish between these cases (which might be relevant at some point)

Absolutely! That's why my proposal was to only bring in the substitution principle when referring to instances or members (not when referring to the class itself), but there are still things you can't express that way, and it's entirely possible that they are practically relevant.

I think we should make the features that people need 90% of the time as convenient as possible. Ideally we'd still offer some predicates for the curious corner cases where one really needs the distinction. For example let getAnInstance/getASubClass return the transitive result, while having things like getAnImmediateInstance/getAnImmediateSubClass to be able to implement the special cases.

max-schaefer · 2022-01-27T16:40:05Z

I like that suggestion! It's similar to how we have getAUse and getAnImmediateUse that do/don't follow interprocedural flow: you usually want the former but the latter is available for the occasional case where you don't.

class A; end
class B < A; end
class C < B; end

In the example above, `getMember("A").getAnImmediateSubclass()` will select only uses of B, whereas `getMember("A").getASubclass()` will select uses of A, B and C. This is usually the behaviour you want.

hmac · 2022-02-01T23:18:22Z

Thanks everyone for your comments and the insightful discussion! I've pushed a change to the exposed API such that:

getAnImmediateSubclass() returns direct subclasses of the receiver
getASubclass() returns the reflexive, transitive closure of the above (getAnImmediateSubclass*()), so it includes the receiver itself

As a result, to get a class A and all its subclasses you still have to explicitly call getASubclass().
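The difference between the two predicates can be mimicked on real Ruby classes. This is a runtime analogy only, under the assumption that the class list `ALL` is known up front; the helper names `immediate_subclasses` and `subclasses_reflexive` are made up and do not exist in the API graph library.

```ruby
class A; end
class B < A; end
class C < B; end

# The classes this sketch knows about.
ALL = [A, B, C].freeze

# Analogue of getAnImmediateSubclass(): classes whose direct superclass is klass.
def immediate_subclasses(klass)
  ALL.select { |k| k.superclass == klass }
end

# Analogue of getASubclass() = getAnImmediateSubclass*(): zero or more
# subclass steps, so klass itself is part of the result.
def subclasses_reflexive(klass)
  [klass] + immediate_subclasses(klass).flat_map { |k| subclasses_reflexive(k) }
end

p immediate_subclasses(A)
p subclasses_reflexive(A)
```

The first call yields only B; the second yields A, B and C.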

If you want to match calls to Foo::Bar.baz whilst also including any subclasses of Foo and subclasses of Foo::Bar, you will have to use

API::getTopLevelMember("Foo").getASubclass().getMember("Bar").getASubclass().getMethodCall("baz")

However I think this is a rare case. It is more common for Foo to be a module, in which case you can't create subclasses of it.
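As an illustration of the Ruby code that expression is aimed at, here is a small sketch. `SubFoo` and `SubBar` are hypothetical names, and whether each call is actually matched depends on how the library resolves constants, so treat the comments as intent rather than a guarantee.

```ruby
class Foo
  class Bar
    def self.baz
      :baz
    end
  end
end

class SubFoo < Foo; end       # a subclass of Foo
class SubBar < Foo::Bar; end  # a subclass of Foo::Bar

Foo::Bar.baz     # the plain case
SubFoo::Bar.baz  # Bar looked up through a subclass of Foo
SubBar.baz       # baz called on a subclass of Foo::Bar
```

In plain Ruby, `SubFoo::Bar` resolves to the same class object as `Foo::Bar`, because scoped constant lookup also searches the ancestors of `SubFoo`.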

I think this is a reasonable middle ground, and I think it would be nice to merge this PR and try using it in our queries etc, and then make further changes/improvements as we encounter the need. How does that sound?

MathiasVP · 2022-02-01T23:29:26Z

To test subclass support using inline tests, I've extended the InlineTest framework to support optional results, which are matched against annotations but do not trigger a failure if there is no matching annotation. This allows us to add annotations for non-canonical paths where we want to test subclassing, but leave existing tests alone.

I've wanted this feature myself for a while ❤️. In fact, I did it the wrong way in #5417 and closed it again following the discussion with @aschackmull. But I like the API you went with here a lot more 👍.

hmac · 2022-02-02T00:23:30Z

(Sorry about the mass ping, language teams! Changes to the shared inline test framework caused GitHub to request reviews from everyone.)

erik-krogh · 2022-02-02T09:22:21Z

I'm currently implementing def nodes for the API graph implementation in Python, and I also encountered similar testing issues.
I ended up porting the API-graph testing framework from JS, but using the same syntax as the inline expectation tests.

You can see my implementation here.
A clear advantage is that the error messages you get are usually very good, and there is no potential performance issue from computing a basically unbounded number of paths.

E.g. I think your test will blow up if you try a test like this one.
(Some quick math suggests that there are at least 2^(1000000/17)≈10^17707 paths of length less than 1 million to the API-nodes in that function).
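The order of magnitude in that estimate is easy to check. The snippet below only redoes the arithmetic from the comment; it takes the divisor 17 as given and says nothing about how many paths a real database has.

```ruby
# 2^(1000000/17) expressed as a power of ten: log10(2^x) = x * log10(2).
EXPONENT = 1_000_000 / 17.0
LOG10_PATHS = EXPONENT * Math.log10(2)

puts LOG10_PATHS.floor  # 17707
```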

hvitved · 2022-02-02T09:58:07Z

ruby/ql/lib/codeql/ruby/ApiGraphs.qll

     */
-    Node getASubclass() { result = this.getASuccessor(Label::subclass()) }
+    Node getASubclass() { result = this.getAnImmediateSubclass*() }


Should we update

Node getInstance() { result = this.getASuccessor(Label::instance()) }

to

Node getInstance() { result = this.getASubclass().getASuccessor(Label::instance()) }

@asgerf , @aibaars I think this is related to our discussion yesterday

I agree that's probably better (though I still don't have a good intuition about what it's like to model Ruby frameworks).

I think that makes sense. If we do so, we should probably also do the same for getReturn. Together that should cover both instance and class methods.

Added this in 704b585

Change the behaviour of `API::getInstance()` and `API::getReturn()` to include results on subclasses of the current API node.

hmac · 2022-02-03T18:46:03Z

I'm currently implementing def nodes for the API graph implementation in Python, and I also encountered similar testing issues. I ended up porting the API-graph testing framework from JS, but using the same syntax as the inline expectation tests.

You can see my implementation here. A clear advantage is that the error messages you get are usually very good, and there is no potential performance issue from computing a basically unbounded number of paths.

E.g. I think your test will blow up if you try a test like this one. (Some quick math suggests that there are at least 2^(1000000/17)≈10^17707 paths of length less than 1 million to the API-nodes in that function).

Thanks @erik-krogh! I think for now I'm going to stick with the simple approach in this PR but if we encounter performance problems in future then it's good to know there's a better implementation we can switch to 👍

github-actions bot added the Ruby label Jan 19, 2022

hmac force-pushed the hmac/api-graph-subclass branch from a2928f4 to cfcbca0 Compare January 20, 2022 23:35

github-actions bot added C# C++ Java Python labels Jan 20, 2022

Ruby: Add subclassing support to API Graphs

8419daa

Given the code

class A; end
class B < A; end
class C < A; end

You can find uses of B and C with the expression

API::getTopLevelMember("A").getASubclass()

hmac force-pushed the hmac/api-graph-subclass branch from cfcbca0 to e225d9d Compare January 23, 2022 23:25

hmac added 4 commits January 25, 2022 16:40

Ruby: Use API graph subclassing in Rails modelling

5e7a29a

Now that API graphs have basic subclassing support, we can simplify some of the ActiveRecord and ActionController code.

Use API graph subclassing in GraphQL modelling

d0a274c

This simplifies some of the code.

Add inline tests for API Graph subclassing

c5904b7

hmac force-pushed the hmac/api-graph-subclass branch from e225d9d to c5904b7 Compare January 25, 2022 03:41

aibaars reviewed Jan 27, 2022

hvitved reviewed Jan 27, 2022

hmac added 2 commits January 28, 2022 16:44

Use modified getAPath predicate for test

b01f81a

hmac marked this pull request as ready for review February 1, 2022 23:18

hmac requested review from a team as code owners February 1, 2022 23:18

MathiasVP previously approved these changes Feb 1, 2022

hvitved reviewed Feb 2, 2022

asgerf mentioned this pull request Feb 2, 2022

Ruby: add def-nodes and separate method/return steps to API graphs #7819

Merged

Ruby: Include subclasses in more API calls

704b585

Change the behaviour of `API::getInstance()` and `API::getReturn()` to include results on subclasses of the current API node.

hmac dismissed MathiasVP’s stale review via 704b585 February 2, 2022 22:37

hvitved approved these changes Feb 3, 2022

hmac merged commit ab7fd89 into main Feb 3, 2022

hmac deleted the hmac/api-graph-subclass branch February 3, 2022 21:19
