Switch to using YARP as the parser #41

kddnewton · 2023-08-14T16:35:27Z

This parser is going to be backend for Syntax Tree soon anyway, so we may as well start using it. It comes with much increased speed and much reduced memory.

As a part of this commit, I've changed a little bit and fixed a couple of bugs:

Comments were being grabbed up even if they had empty newlines between blocks. I'm not sure if that's desirable. If it is, we can definitely revert to get that behavior back. Also, inline comments on expressions (like foo + bar # comment) were being grabbed up, so I removed that.
Namespaces weren't being correctly calculated when you had nested constants in the name, as in class Foo::Bar; end. That's been fixed.
Top-level constant lookup wasn't being handled properly, as in module Foo; class ::Bar::Baz; end; end, so that's been fixed.
Superclasses that were expressions resulted in some weird behavior. For example class Foo < bar; end. At the moment I changed it to just return nil for a superclass, but you might want a placeholder that says "we couldn't figure it out" or something to that effect.
All methods were being attached as instance methods, even if there was a receiver where that didn't make sense. I've changed that to ignore those methods for now, but of course we can revisit that later.
The qualified names for included and extended modules was incorrect in a lot of cases. In order to get this right you need to replicate Ruby's runtime behavior, which isn't exactly possible in a static context. However, we can approximate it, which is what I've done in this commit.
I've ignored included and extended arguments that cannot be resolved as constants. This is because you could have include foo.

A couple of things I noticed that would be good for future work:

I think it would be more ergonomic to use a hash on qualified name => class/module as opposed to the sets. Right now whenever you define a class or module, it scans the entire list of classes and modules to find a matching one, which can be quite costly in a large codebase. Since qualified names have to be unique anyway, you can make this O(1) instead of O(n).
Singleton classes aren't handled (class << self), so a lot of your methods are going to be attached to the wrong object.
To get the order of things right, you should probably build a tree of requires from the main gem require as opposed to looping through every file. Otherwise you might end up with a lack of information about necessary includes/extends. A topographical sort would be your friend here.
The fact that the analyzer is the top-level node for your methods if you don't have a current context makes it difficult to test. It might be nicer to have a Root object that you can attach those things to instead.

Sorry this is such a big PR. I tried to keep things 1:1 as much as I could to make it easier to review. Happy to discuss or explain anything you might have questions about.

This parser is going to be backend for Syntax Tree soon anyway, so we may as well start using it. It comes with much increased speed and much reduced memory.

…mespaces Now, that `Class/ModuleDefiniton#namespace` is an array of constants we need another way of comparing and matching namespaces. That's why this commit introcudes `Class/ModuleDefiniton#qualified_namespace` so we can compare and match the namespaces as strings.

marcoroth · 2023-08-15T12:53:40Z

Wow... this is great! Thank you so much, Kevin!

Now I kinda feel sorry that you had to refactor my undocumented and untested Hackathon code 🙈

I manually tested most of the app-functionality, and the speed difference is actually crazy as it's really noticeable while browsing. Super awesome!

Going through your changes, I really appreciate the fixes you've implemented, they are spot on! 🙏🏼 Also the way you've revamped the Visitor/Analyzer with the added comments is great, feels much better compared to what I had!

Your insights for future improvements are gold!

At the moment I changed it to just return nil for a superclass, but you might want a placeholder that says "we couldn't figure it out" or something to that effect.

I think returning nil for unprocessable superclasses is good enough for now, but I'm going to open an issue for it so we can revisit that later (=> #43)

All methods were being attached as instance methods, even if there was a receiver where that didn't make sense. I've changed that to ignore those methods for now, but of course we can revisit that later.

I would be interested to know if you happen to have an example at hand to show how that might look like in Ruby code. I would assume something like the case described in #20?

I think it would be more ergonomic to use a hash on qualified name => class/module as opposed to the sets.

Agreed, this is a good point. Right now the use of Set is also not super consistent throughout the codebase.

Singleton classes aren't handled (class << self), so a lot of your methods are going to be attached to the wrong object.

Yeah, that's a tricky one. I guess you could detect the << method call on class with self as an argument and set a flag on the current context to indicate that we are currently in a class << self block. And, when determining if a method definition is a class or instance method we could check for the presence of that flag (Tracking it here => #7)

To get the order of things right, you should probably build a tree of requires from the main gem require as opposed to looping through every file.

Yeah, I wasn't 100% sure how to approach this. I left this out for now, especially since we also have zeitwerk powered gems which make this approach not viable from my understanding. Maybe we would just need two modes to be able to handle both options.

It might be nicer to have a Root object that you can attach those things to instead.

Oh yeah, I've also been thinking about this. The solution with storing it on the Analyzer was a quick-win to make it work, but definitely isn't really the "right solution".

I'm super happy with the outcome of this pull request! From my perspective this is mergeable as-is, thank you!

kddnewton · 2023-08-15T13:30:20Z

I would be interested to know if you happen to have an example at hand to show how that might look like in Ruby code. I would assume something like the case described in #20?

You can have really (almost) anything as the receiver of a method definition. So for example:

def foo.bar; end
def @foo.bar; end
def @@foo.bar; end
def $foo.bar; end
def Foo.bar; end
def ::Foo.bar; end
def Foo::Foo.bar; end
def __FILE__.bar; end
def self.bar; end
def (foo + foo).bar; end

Yeah, that's a tricky one. I guess you could detect the << method call on class with self as an argument and set a flag on the current context to indicate that we are currently in a class << self block. And, when determining if a method definition is a class or instance method we could check for the presence of that flag (Tracking it here => #7)

Here as well, you can have almost anything with a class << self syntax. I would suggest handling self and constants, and bailing on the rest. Here are some examples:

class << foo; end
class << @foo; end
class << @@foo; end
class << $foo; end
class << Foo; end
class << ::Foo; end
class << Foo::Foo; end
class << __FILE__; end
class << self.foo; end
class << (foo + foo); end

Yeah, I wasn't 100% sure how to approach this. I left this out for now, especially since we also have zeitwerk powered gems which make this approach not viable from my understanding. Maybe we would just need two modes to be able to handle both options.

I don't think having zeitwerk makes this impossible. If anything it's helpful to have zeitwerk find all of the files for you. Either way to build the graph you'd have to walk through each file. It's difficult because require is just a method call, but you can probably take the same approach I took here with include/extend and just guess. You would probably need require_relative and load as well.

marcoroth · 2023-08-15T14:15:00Z

Thanks for the added context! All of this seems reasonable and I kinda doubt that YARD/RDoc are able to handle all of these cases as well.

I guess focusing on the most common use-cases should be good enough for 95%+ of the cases. There is anyway no way to statically analyze all the meta-programming one could do in Ruby.

paracycle · 2023-08-15T19:28:33Z

To get the order of things right, you should probably build a tree of requires from the main gem require as opposed to looping through every file.

Yeah, I wasn't 100% sure how to approach this. I left this out for now, especially since we also have zeitwerk powered gems which make this approach not viable from my understanding. Maybe we would just need two modes to be able to handle both options.

I am working on something like this in Tapioca. I have it roughly working, but it is not fully usable yet. If it does end up working, it would not matter if a gem is using Zeitwerk or not, and the implementation could be reusable so that gem.sh can use it as well.

Btw, Tapioca, as a concept, does a lot of things that are similar to what gem.sh does. After all, at the core of it, it is a tool that takes in a gem and extracts a Ruby interface file for it, which, if you squint is basically documentation for the gem. It would be nice to find other ways to collaborate between the two projects, and I am happy to share more about some of our experiences trying to get runtime code loading from gems just right (hint: in the general, it is impossible to load all of a gem's code).

bradgessler · 2023-08-23T22:22:45Z

I'd be interested in a gem that exposes a RubyGem as a model that I can pull into a documentation project I have in the works similar to gem.sh. @marcoroth and I briefly talked about this on Twitter. @paracycle how far off do you feel your project is from landing in a gem? It's tempting to move all the Gem parsing/modeling code from this repo into ./lib/foo and then releasing those bits as a gem since the code seems pretty well factored in this project.

If we're keeping count, that's 3 projects that would want to share this functionality 🙃

kddnewton and others added 3 commits August 14, 2023 12:29

Switch to using YARP as the parser

d24877b

This parser is going to be backend for Syntax Tree soon anyway, so we may as well start using it. It comes with much increased speed and much reduced memory.

Fix undefined variable in Analyzer#analyze

201efdd

marcoroth mentioned this pull request Aug 15, 2023

Support superclasses with any expression #43

Open

marcoroth merged commit 8a02fb3 into marcoroth:main Aug 15, 2023
2 checks passed

marcoroth mentioned this pull request Aug 15, 2023

handle method definitions with constants #26

Open

kddnewton deleted the yarp branch August 26, 2023 15:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to using YARP as the parser #41

Switch to using YARP as the parser #41

kddnewton commented Aug 14, 2023

marcoroth commented Aug 15, 2023

kddnewton commented Aug 15, 2023

marcoroth commented Aug 15, 2023

paracycle commented Aug 15, 2023

bradgessler commented Aug 23, 2023

Switch to using YARP as the parser #41

Switch to using YARP as the parser #41

Conversation

kddnewton commented Aug 14, 2023

marcoroth commented Aug 15, 2023

kddnewton commented Aug 15, 2023

marcoroth commented Aug 15, 2023

paracycle commented Aug 15, 2023

bradgessler commented Aug 23, 2023