Skip to content

Ruby: Add partial support for working with RBI (Ruby Interface) files #8845

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
May 27, 2022

Conversation

alexrford
Copy link
Contributor

RBI files are used by Sorbet to glean type information about Ruby code. These are valid Ruby files that can contain additional information which Sorbet can use to perform both static and runtime type checking. I've focused mostly on the static elements of this in the current PR.

It's possible to define the source code of a program directly in RBI files, but it generally seems more common to have the source of a program in regular .rb files and to use separate corresponding RBI files to define type signatures. This approach is useful in defining type signatures for gems, collected into a repository at https://github.com/sorbet/sorbet-typed/tree/master/lib - similar to the https://github.com/DefinitelyTyped/DefinitelyTyped project for TypeScript. External type definitions are the focus of this initial library implementation. One possible use case is to use existing type definitions from sorbet-typed to generate a list of type signatures for an API, which could serve as a first step in modelling that API.

As RBI files are "just" Ruby, we just need to tell the extractor to look for .rbi files as well in order to extract them. I've not enabled this by default in this PR, partly because I'm concerned that methods with multiple definitions (one real, and one "prototype" for typing purposes) could cause problems with resolving call targets.

This library is marked as experimental for the moment. It includes support for a chunk of RBI, but it's definitely not comprehensive. Some major missing features of the current implementation are:

  • Inheritance and mixins
    • Sorbet has some more advanced features in this area such as sealed classes, final methods, abstract classes and methods, etc.
  • Some generic types like T::Enumerable, T::Set, T::Range
  • Anything relating to runtime typing information, such as T.reveal_type(x) to get the type of a variable x. These seemed less relevant for just extracting type signatures.
  • Sorbet's typed T::Struct and T::Enum classes which lets code that uses these concepts be written in a more type-safe way

See https://sorbet.org/docs/rbi for reference

@alexrford alexrford added the Ruby label Apr 24, 2022
@alexrford alexrford marked this pull request as ready for review April 25, 2022 08:41
@alexrford alexrford requested a review from a team as a code owner April 25, 2022 08:41
@alexrford alexrford added the no-change-note-required This PR does not need a change note label Apr 25, 2022
@hmac
Copy link
Contributor

hmac commented Apr 28, 2022

As RBI files are "just" Ruby, we just need to tell the extractor to look for .rbi files as well in order to extract them. I've not enabled this by default in this PR, partly because I'm concerned that methods with multiple definitions (one real, and one "prototype" for typing purposes) could cause problems with resolving call targets.

This is a really good point. I'm not sure what the best approach is. We could completely separate .rbi files from normal Ruby files, creating new dbscheme entries for them and extracting them separately. Though I think we'd then need to copy a whole bunch of QL code to model the AST all over again (though maybe we could model the restricted subset that rbi files typically use).

Another option is to extract them as normal Ruby code, but try to exclude them from call resolution and other stuff that we don't want them included in. This might be as simple as adding result.getLocation().getFile().getExtension() != "rbi" in various places, but it seems a little messy and we will probably miss some things. I also don't know if there would be performance implications to applying this kind of filtering so high up, vs early on when constructing the AST.

@hmac
Copy link
Contributor

hmac commented Apr 28, 2022

Another thing that comes to mind: do we want to be working at the DataFlow layer here, or should/can we stick to the AST layer? It seems to me that we're effectively parsing a DSL, so it should be enough to work with the AST to extract the information we need.

@aibaars
Copy link
Contributor

aibaars commented Apr 28, 2022

As RBI files are "just" Ruby, we just need to tell the extractor to look for .rbi files as well in order to extract them. I've not enabled this by default in this PR, partly because I'm concerned that methods with multiple definitions (one real, and one "prototype" for typing purposes) could cause problems with resolving call targets.

It may indeed cause problem, although most likely we'd resolve a call to both targets. The prototype likely has no body so its effect on dataflow etc should be minimal.

In normal Ruby files you can also define the same method multiple times. At runtime the last definition "wins". So the problem with multiple definitions is not new to RBI; it's just more likely to happen with RBI files.

@alexrford
Copy link
Contributor Author

Another thing that comes to mind: do we want to be working at the DataFlow layer here, or should/can we stick to the AST layer? It seems to me that we're effectively parsing a DSL, so it should be enough to work with the AST to extract the information we need.

Yeah, this is a really good point. I went with dataflow out of convenience more than anything else, mostly to use API graphs for get method calls/constant accesses from the T module. The main use case that I can think of where dataflow might be useful is in tracking transitive type aliases, so something like:

StringOrSymbol = T.type_alias { T.Any(String, Symbol) }
...
Identifier = T.type_alias { StringOrSymbol }

It's perhaps not a very relevant use case though. Type aliases seem uncommon, at least in the sorbet-typed repo, and for models-as-data I think we could just handle the transitive case in parsing of the type definition rows.

I've replaced the dataflow version with an AST based version in the latest commit at time of writing. The implementation is of very similar complexity overall. As a side note, I'm still using the CFG to determine things like:

  1. which method or attr_reader/attr_accessor is associated with a given signature
  2. which type a type alias resolves to

It may be possible to cover these cases purely using the AST, e.g. by looking at successive statements in a StmtSequence after the signature definition until you find a statement that looks like a method definition or attr_{reader,accessor} call for 1, rather than looking at successors in the CFG. At this point I'm not really sure if there are any cases that we cover with the CFG search that we wouldn't cover with an AST search, or vice-versa.

Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me. We might want to have a generalized API that covers RBI and RBS when we add support for RBS files.

*/
abstract class RbiType extends Expr { }

class ConstantReadAccessAsRbiType extends RbiType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class has no qldoc and its name isn't very descriptive. Is it supposed to represent a named class or module type?

Copy link
Contributor Author

@alexrford alexrford May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to add qldoc for this. It's intended to represent cases like read accesses to the Integer and MyList classes and the MyList2 constant in:

  MyList2 = T.type_alias(MyList)

  sig { params(l: MyList2).returns(Integer) }
  def len(l); end

Generally I think that these types would always represent some Ruby class, possibly via some intermediate type aliases, but the class definitions might be not be available in the database (Integer and String are common examples). I ended up making this QL class very broad to avoid missing cases like these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, I've noticed that I can simplify this as just class ConstantReadAccessAsRbiType extends RbiType, ConstantReadAccess { } after moving this library to the AST layer.

@alexrford
Copy link
Contributor Author

This looks fine to me. We might want to have a generalized API that covers RBI and RBS when we add support for RBS files.

Makes sense to me - the design of this API should probably be based around working with other models-as-data tooling.

@alexrford alexrford merged commit 5d4473b into github:main May 27, 2022
@alexrford alexrford deleted the ruby/rbi-lib branch September 23, 2022 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Ruby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants