-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Ruby: Add basic subclassing support to API Graphs #7663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a2928f4
to
cfcbca0
Compare
Given the code class A; end class B < A; end class C < A; end You can find uses of B and C with the expression API::getTopLevelMember("A").getASubclass()
cfcbca0
to
e225d9d
Compare
Now that API graphs have basic subclassing support, we can simplify some of the ActiveRecord and ActionController code.
This simplifies some of the code.
The idea behind optional results is that there may be instances where each line of source code has many results and you don't want to annotate all of them, but you still want to ensure that any annotations you do have are correct. This change makes that possible by exposing a new predicate `hasOptionalResult`, which has the same signature as `hasResult`. Results produced by `hasOptionalResult` will be matched against any annotations, but the lack of a matching annotation will not cause a failure. We will use this in the inline tests for the API edge getASubclass, because for each API path that uses getASubclass there is always a shorter path that does not use it, and thus we can't use the normal shortest-path matching approach that works for other API Graph tests.
e225d9d
to
c5904b7
Compare
// In Rails applications `ApplicationController` typically extends `ActionController::Base`, but we | ||
// treat it separately in case the `ApplicationController` definition is not in the database. | ||
API::getTopLevelMember("ApplicationController") | ||
].getASubclass*().getAUse().asExpr().getExpr() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to have a method named getADirectSubclass
and define getASubclass
to be getADirectSubclass*
. I expect users in almost all cases need the transitive closure and they are likely to forget adding *
. Having the nicely named method "just do the right thing in most cases" would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much experience with sub classes in API graphs, but @max-schaefer mentioned that it might be a good principle for API graphs to adhere to the substitution principle. The way I understand this, it would mean that when asking, say, for getMember(C)
, we should already then be getting C
and all sub classes thereof. However, whether we would also like to be able to get just C
, I don't know.
@@ -174,8 +174,7 @@ module API { | |||
// avoid producing strings longer than 1MB | |||
result.length() < 1000 * 1000 | |||
) | |||
) and | |||
length in [1 .. Impl::distanceFromRoot(this)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should change this just so we can use it in a test. I think it would be OK to replicate the above in the test itself instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this because I noticed that we were only calling this predicate in
string getPath() {
result = min(string p | p = this.getAPath(Impl::distanceFromRoot(this)) | p)
}
which already passes in a max length of Impl::distanceFromRoot(this)
, so I thought it was redundant. Am I mistaken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] which already passes in a max length of
Impl::distanceFromRoot(this)
, so I thought it was redundant. Am I mistaken?
Careful! This is "top-down" reasoning ("passes in"), but the evaluation is actually bottom-up.
Consider how getAPath
is evaluated in isolation. The lines you deleted would prevent this predicate from containing a result
and length
where length
is greater than the shortest path to the given node. Without those lines, there's no such restriction (except the fact that result
can't be more than 1000000 characters, which is likely something that happens much further out). In particular, if you have a loop in the API graph (which is almost certain), this predicate will keep going round and round in that loop, producing path strings of greater and greater length, until finally hitting the 1M limit.
In general, it's useful to consider the question "which tuples will be in this relation" regardless of any context that may limit that set. If we're lucky, that context will be automatically magicked in, but it might not (and so it's better to just limit the number of tuples in advance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining! I'm definitely still getting used to thinking in the bottom-up evaluation mindset :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted this change in a followup commit - the test now uses a copy of getAPath
without the length restriction.
use(succ, b) and | ||
resolveConstant(b.asExpr().getExpr()) = resolveConstantWriteAccess(c) and | ||
c.getSuperclassExpr() = a.asExpr().getExpr() and | ||
lbl = Label::subclass() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should include the name of the sub class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable. I can't immediately see what benefit it would bring though. Does it make anything particular easier/harder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just skip it for now.
Yeah, that was admittedly a bit of a throw-away line. I haven't implemented this yet. My intuition was that if I ask the API graph to give me all instances of I remember the @github/codeql-python folks remarking how their API-graph code ended up using |
Indeed. When I originally implemented support for subclassing, you mentioned this issue and I discussed it with the other members of the Python team (as I was eager to avoid peppering our code with If memory serves, one of the concerns raised was that if |
Absolutely! That's why my proposal was to only bring in the substitution principle when referring to instances or members (not when referring to the class itself), but there are still things you can't express that way, and it's entirely possible that they are practically relevant. |
I think we should make the features that people need 90% of the time as convenient as possible. Ideally we'd still offer some predicates for the curious corner cases where one really needs the distinction. For example let |
I like that suggestion! It's similar to how we have |
class A; end class B < A; end class C < B; end In the example above, `getMember("A").getAnImmediateSubclass()` will select only uses of B, whereas `getMember("A").getASubclass()` will select uses of A, B and C. This is usually the behaviour you want.
Thanks everyone for your comments and the insightful discussion! I've pushed a change to the exposed API such that:
As a result, to get a class If you want to match calls to API::getTopLevelMember("Foo").getASubclass().getMember("Bar").getASubclass().getMethodCall("baz") However I think this is a rare case. It is more common for I think this a reasonable middle ground, and I think it would be nice to merge this PR and try using it in our queries etc, and then make further changes/improvements as we encounter the need. How does that sound? |
I've wanted this feature myself for a while ❤️. In fact, I did it the wrong way in #5417 and closed it again following the discussion with |
(Sorry about the mass ping, language teams! Changes to the shared inline test framework caused GitHub to request reviews from everyone.) |
I'm currently implementing def nodes for the API graph implementation in Python, and I also encountered similar testing issues. You can see my implementation here. E.g. I think your test will blow up if you try a test like this one. |
*/ | ||
Node getASubclass() { result = this.getASuccessor(Label::subclass()) } | ||
Node getASubclass() { result = this.getAnImmediateSubclass*() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that's probably better (though I still don't have a good intuition about what it's like to model Ruby frameworks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense. If we do so, we should probably also do the same for getReturn
. Together that should cover both instance and class methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this in 704b585
Change the behaviour of `API::getInstance()` and `API::getReturn()` to include results on subclasses of the current API node.
Thanks @erik-krogh! I think for now I'm going to stick with the simple approach in this PR but if we encounter performance problems in future then it's good to know there's a better implementation we can switch to 👍 |
Given the code
You can find uses of B and C with the expression
We do this by adding edges from the use of
A
inclass B < A
to all uses ofB
.API Graph paths that use
getASubclass()
will, I think, always be longer than an equivalent "canonical" path which doesn't usegetASubclass()
. For example, a use of the classB
above is accessible viaAPI::getTopLevelMember("B")
andAPI::getTopLevelMember("A").getASubclass()
. Therefore when it comes to testinggetASubclass()
, we're interested in non-canonical paths, for example:To test subclass support using inline tests, I've extended the
InlineTest
framework to support optional results, which are matched against annotations but do not trigger a failure if there is no matching annotation. This allows us to add annotations for non-canonical paths where we want to test subclassing, but leave existing tests alone.