Ruby: Add partial support for working with RBI (Ruby Interface) files #8845

alexrford · 2022-04-24T22:08:01Z

RBI files are used by Sorbet to glean type information about Ruby code. These are valid Ruby files that can contain additional information which Sorbet can use to perform both static and runtime type checking. I've focused mostly on the static elements of this in the current PR.

It's possible to define the source code of a program directly in RBI files, but it generally seems more common to have the source of a program in regular .rb files and to use separate corresponding RBI files to define type signatures. This approach is useful in defining type signatures for gems, collected into a repository at https://github.com/sorbet/sorbet-typed/tree/master/lib - similar to the https://github.com/DefinitelyTyped/DefinitelyTyped project for TypeScript. External type definitions are the focus of this initial library implementation. One possible use case is to use existing type definitions from sorbet-typed to generate a list of type signatures for an API, which could serve as a first step in modelling that API.

As RBI files are "just" Ruby, we just need to tell the extractor to look for .rbi files as well in order to extract them. I've not enabled this by default in this PR, partly because I'm concerned that methods with multiple definitions (one real, and one "prototype" for typing purposes) could cause problems with resolving call targets.

This library is marked as experimental for the moment. It includes support for a chunk of RBI, but it's definitely not comprehensive. Some major missing features of the current implementation are:

- Inheritance and mixins
  - Sorbet has some more advanced features in this area such as sealed classes, final methods, abstract classes and methods, etc.
- Some generic types like T::Enumerable, T::Set, T::Range
- Anything relating to runtime typing information, such as T.reveal_type(x) to get the type of a variable x. These seemed less relevant for just extracting type signatures.
- Sorbet's typed T::Struct and T::Enum classes, which let code that uses these concepts be written in a more type-safe way

See https://sorbet.org/docs/rbi for reference

This reverts commit ba9342e.

hmac · 2022-04-28T01:05:23Z

> As RBI files are "just" Ruby, we just need to tell the extractor to look for .rbi files as well in order to extract them. I've not enabled this by default in this PR, partly because I'm concerned that methods with multiple definitions (one real, and one "prototype" for typing purposes) could cause problems with resolving call targets.

This is a really good point. I'm not sure what the best approach is. We could completely separate .rbi files from normal Ruby files, creating new dbscheme entries for them and extracting them separately. I think we'd then need to copy a whole bunch of QL code to model the AST all over again, though maybe we could model only the restricted subset that .rbi files typically use.

Another option is to extract them as normal Ruby code, but try to exclude them from call resolution and other stuff that we don't want them included in. This might be as simple as adding result.getLocation().getFile().getExtension() != "rbi" in various places, but it seems a little messy and we will probably miss some things. I also don't know if there would be performance implications to applying this kind of filtering so high up, vs early on when constructing the AST.

hmac · 2022-04-28T04:42:31Z

Another thing that comes to mind: do we want to be working at the DataFlow layer here, or should/can we stick to the AST layer? It seems to me that we're effectively parsing a DSL, so it should be enough to work with the AST to extract the information we need.

aibaars · 2022-04-28T08:12:09Z

> As RBI files are "just" Ruby, we just need to tell the extractor to look for .rbi files as well in order to extract them. I've not enabled this by default in this PR, partly because I'm concerned that methods with multiple definitions (one real, and one "prototype" for typing purposes) could cause problems with resolving call targets.

It may indeed cause problems, although most likely we'd resolve a call to both targets. The prototype likely has no body, so its effect on dataflow etc. should be minimal.

In normal Ruby files you can also define the same method multiple times. At runtime the last definition "wins". So the problem with multiple definitions is not new to RBI; it's just more likely to happen with RBI files.
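The "last definition wins" behaviour can be seen in a small sketch. The `Greeter` class and `greet` method are hypothetical names used only for illustration, and the `sig` line that an .rbi prototype would normally carry is left out so the snippet runs without sorbet-runtime.

```ruby
class Greeter
  # Real implementation, as it might appear in a regular .rb file.
  def greet(name)
    "Hello, #{name}"
  end
end

# Call the method while only the real definition exists.
first_result = Greeter.new.greet("Ada")

class Greeter
  # Body-less "prototype", as it might appear in an .rbi file.
  # Reopening the class and redefining the method replaces the earlier one.
  def greet(name); end
end

# The same call now reaches the empty prototype, which returns nil.
second_result = Greeter.new.greet("Ada")

puts first_result.inspect
puts second_result.inspect
```

At runtime only one definition is live at a time, whereas a static analysis that sees both files has two candidate targets for the same call, which is the situation described above.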

ruby/ql/lib/codeql/ruby/controlflow/CfgNodes.qll

ruby/ql/lib/codeql/ruby/experimental/Rbi.qll

Co-authored-by: Arthur Baars <aibaars@github.com>

alexrford · 2022-05-05T17:49:04Z

> Another thing that comes to mind: do we want to be working at the DataFlow layer here, or should/can we stick to the AST layer? It seems to me that we're effectively parsing a DSL, so it should be enough to work with the AST to extract the information we need.

Yeah, this is a really good point. I went with dataflow out of convenience more than anything else, mostly to use API graphs to get method calls/constant accesses from the T module. The main use case that I can think of where dataflow might be useful is in tracking transitive type aliases, so something like:

StringOrSymbol = T.type_alias { T.any(String, Symbol) }
...
Identifier = T.type_alias { StringOrSymbol }

It's perhaps not a very relevant use case though. Type aliases seem uncommon, at least in the sorbet-typed repo, and for models-as-data I think we could just handle the transitive case in parsing of the type definition rows.
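As a rough illustration of handling the transitive case without dataflow, here is a minimal sketch that follows alias-to-alias links in a lookup table. The `ALIASES` table and the `resolve_alias` helper are hypothetical; they stand in for whatever form the parsed type definition rows would take and are not part of the library in this PR.

```ruby
# Hypothetical alias table: alias name => the type expression it was defined with.
ALIASES = {
  "StringOrSymbol" => "T.any(String, Symbol)",
  "Identifier" => "StringOrSymbol"
}.freeze

# Follow alias-to-alias links until reaching something that is not itself
# an alias. `seen` guards against cyclic definitions.
def resolve_alias(name, aliases, seen = [])
  raise ArgumentError, "cyclic alias: #{name}" if seen.include?(name)

  target = aliases[name]
  return name if target.nil?

  resolve_alias(target, aliases, seen + [name])
end

puts resolve_alias("Identifier", ALIASES)
```

Names that are not in the table, such as `String`, are returned unchanged, so the same helper can be applied to every type name in a signature.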

I've replaced the dataflow version with an AST based version in the latest commit at time of writing. The implementation is of very similar complexity overall. As a side note, I'm still using the CFG to determine things like:

1. which method or attr_reader/attr_accessor is associated with a given signature
2. which type a type alias resolves to

It may be possible to cover these cases purely using the AST, e.g. by looking at successive statements in a StmtSequence after the signature definition until you find a statement that looks like a method definition or attr_{reader,accessor} call for the first case, rather than looking at successors in the CFG. At this point I'm not really sure if there are any cases that we cover with the CFG search that we wouldn't cover with an AST search, or vice-versa.
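The AST-only search for the first case could look roughly like the sketch below. It uses Ruby's bundled Ripper parser as a stand-in for the CodeQL AST, only looks at top-level statements, and the helper names (`first_ident`, `classify`, `associate_sigs`) are made up for this example. It is an assumption-laden illustration of the idea, not the QL implementation in this PR.

```ruby
require "ripper"

ATTR_CALLS = %w[attr_reader attr_accessor].freeze

# Depth-first search for the first identifier token inside a Ripper node.
def first_ident(node)
  return nil unless node.is_a?(Array)
  return node[1] if node[0] == :@ident

  node.each do |child|
    found = first_ident(child)
    return found if found
  end
  nil
end

# Classify a statement as [:sig], [:def, name], [:attr, name] or [:other].
def classify(stmt)
  case stmt[0]
  when :method_add_block
    first_ident(stmt[1]) == "sig" ? [:sig] : [:other]
  when :def
    [:def, stmt[1][1]]
  when :command
    ATTR_CALLS.include?(stmt[1][1]) ? [:attr, first_ident(stmt[2])] : [:other]
  else
    [:other]
  end
end

# For each sig, scan the following statements until a method definition
# or attr_reader/attr_accessor call is found, and return its name.
def associate_sigs(source)
  stmts = Ripper.sexp(source)[1].map { |stmt| classify(stmt) }
  sig_indexes = stmts.each_index.select { |i| stmts[i][0] == :sig }
  sig_indexes.map do |i|
    target = stmts[(i + 1)..].find { |kind, _| kind == :def || kind == :attr }
    target && target[1]
  end
end

SOURCE = <<~RUBY
  sig { params(l: MyList).returns(Integer) }
  def len(l); end
  sig { returns(String) }
  attr_reader :name
RUBY

p associate_sigs(SOURCE)
```

This sketch does not stop at an intervening `sig`, so two consecutive signatures would both be paired with the same definition; a fuller version would need to decide how to treat that case.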

aibaars

This looks fine to me. We might want to have a generalized API that covers RBI and RBS when we add support for RBS files.

aibaars · 2022-05-11T08:36:24Z

ruby/ql/lib/codeql/ruby/experimental/Rbi.qll

+     */
+    abstract class RbiType extends Expr { }
+
+    class ConstantReadAccessAsRbiType extends RbiType {


This class has no qldoc and its name isn't very descriptive. Is it supposed to represent a named class or module type?

Forgot to add qldoc for this. It's intended to represent cases like read accesses to the Integer and MyList classes and the MyList2 constant in:

MyList2 = T.type_alias(MyList)
sig { params(l: MyList2).returns(Integer) }
def len(l); end

Generally I think that these types would always represent some Ruby class, possibly via some intermediate type aliases, but the class definitions might not be available in the database (Integer and String are common examples). I ended up making this QL class very broad to avoid missing cases like these.

Separately, I've noticed that I can simplify this as just class ConstantReadAccessAsRbiType extends RbiType, ConstantReadAccess { } after moving this library to the AST layer.
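To make the "constant read access as a type" idea concrete, the sketch below collects every constant that is read inside a signature, again using Ruby's bundled Ripper parser as a stand-in for the CodeQL AST. The `constant_reads` helper is hypothetical and only handles unqualified constants; it is not the QL class discussed in this thread.

```ruby
require "ripper"

# Collect the names of constants that are read, which Ripper represents
# as a :var_ref node wrapping a :@const token.
def constant_reads(node, acc = [])
  return acc unless node.is_a?(Array)

  if node[0] == :var_ref && node[1].is_a?(Array) && node[1][0] == :@const
    acc << node[1][1]
  else
    node.each { |child| constant_reads(child, acc) }
  end
  acc
end

SIG = "sig { params(l: MyList2).returns(Integer) }"
type_names = constant_reads(Ripper.sexp(SIG))

p type_names
```

Like the broad QL class, this treats any constant read as a candidate type, whether or not a definition for it (for example `Integer`) is available.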

alexrford · 2022-05-11T14:05:24Z

> This looks fine to me. We might want to have a generalized API that covers RBI and RBS when we add support for RBS files.

Makes sense to me - the design of this API should probably be based around working with other models-as-data tooling.

alexrford added 5 commits April 24, 2022 22:27

Ruby: Add ExprNodes::CallableCfgNode and ExprNodes::MethodBaseCfgNode

e3e02c9

Ruby: add experimental library to support RBI files

e03ce8f

Ruby: test files for RBI library

ad3a9b1

Ruby: extract rbi files

de35bd9

Revert "Ruby: extract rbi files"

869d827

This reverts commit ba9342e.

alexrford added the Ruby label Apr 24, 2022

alexrford marked this pull request as ready for review April 25, 2022 08:41

alexrford requested a review from a team as a code owner April 25, 2022 08:41

Ruby: fix alert

b956616

alexrford added the no-change-note-required This PR does not need a change note label Apr 25, 2022

aibaars reviewed Apr 29, 2022

ruby/ql/lib/codeql/ruby/controlflow/CfgNodes.qll (outdated, resolved)

ruby/ql/lib/codeql/ruby/experimental/Rbi.qll (outdated, resolved)

alexrford and others added 8 commits May 4, 2022 14:04

ruby: drop unnecessary getExpr

4210973

Co-authored-by: Arthur Baars <aibaars@github.com>

ruby: drop a TODO

687602b

ruby: drop the CallableCfgNode classes

1af5c68

ruby: new rbi test case

08fa397

ruby: tidy up methodSignatureSuccessorNodeRanked predicate

1e3ab52

Ruby: fix getAssociatedMethod predicate to include class methods

961f867

ruby: Add AST layer version of the RBI library

bedb1d4

ruby: replace the dataflow layer RBI library with the AST layer version

4844e4f

aibaars reviewed May 11, 2022

Ruby: document ConstantReadAccessAsRbiType class

a114050

Merge remote-tracking branch 'origin/main' into ruby/rbi-lib

196c68b

aibaars approved these changes May 27, 2022

alexrford merged commit 5d4473b into github:main May 27, 2022

alexrford deleted the ruby/rbi-lib branch September 23, 2022 09:04
