Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to using YARP as the parser #41

Merged
merged 3 commits into from Aug 15, 2023
Merged

Conversation

kddnewton
Copy link
Contributor

This parser is going to be backend for Syntax Tree soon anyway, so we may as well start using it. It comes with much increased speed and much reduced memory.

As a part of this commit, I've changed a little bit and fixed a couple of bugs:

  • Comments were being grabbed up even if they had empty newlines between blocks. I'm not sure if that's desirable. If it is, we can definitely revert to get that behavior back. Also, inline comments on expressions (like foo + bar # comment) were being grabbed up, so I removed that.
  • Namespaces weren't being correctly calculated when you had nested constants in the name, as in class Foo::Bar; end. That's been fixed.
  • Top-level constant lookup wasn't being handled properly, as in module Foo; class ::Bar::Baz; end; end, so that's been fixed.
  • Superclasses that were expressions resulted in some weird behavior. For example class Foo < bar; end. At the moment I changed it to just return nil for a superclass, but you might want a placeholder that says "we couldn't figure it out" or something to that effect.
  • All methods were being attached as instance methods, even if there was a receiver where that didn't make sense. I've changed that to ignore those methods for now, but of course we can revisit that later.
  • The qualified names for included and extended modules was incorrect in a lot of cases. In order to get this right you need to replicate Ruby's runtime behavior, which isn't exactly possible in a static context. However, we can approximate it, which is what I've done in this commit.
  • I've ignored included and extended arguments that cannot be resolved as constants. This is because you could have include foo.

A couple of things I noticed that would be good for future work:

  • I think it would be more ergonomic to use a hash on qualified name => class/module as opposed to the sets. Right now whenever you define a class or module, it scans the entire list of classes and modules to find a matching one, which can be quite costly in a large codebase. Since qualified names have to be unique anyway, you can make this O(1) instead of O(n).
  • Singleton classes aren't handled (class << self), so a lot of your methods are going to be attached to the wrong object.
  • To get the order of things right, you should probably build a tree of requires from the main gem require as opposed to looping through every file. Otherwise you might end up with a lack of information about necessary includes/extends. A topographical sort would be your friend here.
  • The fact that the analyzer is the top-level node for your methods if you don't have a current context makes it difficult to test. It might be nicer to have a Root object that you can attach those things to instead.

Sorry this is such a big PR. I tried to keep things 1:1 as much as I could to make it easier to review. Happy to discuss or explain anything you might have questions about.

kddnewton and others added 3 commits August 14, 2023 12:29
This parser is going to be backend for Syntax Tree soon anyway, so
we may as well start using it. It comes with much increased speed
and much reduced memory.
…mespaces

Now, that `Class/ModuleDefiniton#namespace` is an array of constants we need another way of comparing and matching namespaces. That's why this commit introcudes `Class/ModuleDefiniton#qualified_namespace` so we can compare and match the namespaces as strings.
@marcoroth
Copy link
Owner

Wow... this is great! Thank you so much, Kevin!

Now I kinda feel sorry that you had to refactor my undocumented and untested Hackathon code 🙈

I manually tested most of the app-functionality, and the speed difference is actually crazy as it's really noticeable while browsing. Super awesome!

Going through your changes, I really appreciate the fixes you've implemented, they are spot on! 🙏🏼 Also the way you've revamped the Visitor/Analyzer with the added comments is great, feels much better compared to what I had!

Your insights for future improvements are gold!


At the moment I changed it to just return nil for a superclass, but you might want a placeholder that says "we couldn't figure it out" or something to that effect.

I think returning nil for unprocessable superclasses is good enough for now, but I'm going to open an issue for it so we can revisit that later (=> #43)


All methods were being attached as instance methods, even if there was a receiver where that didn't make sense. I've changed that to ignore those methods for now, but of course we can revisit that later.

I would be interested to know if you happen to have an example at hand to show how that might look like in Ruby code. I would assume something like the case described in #20?


I think it would be more ergonomic to use a hash on qualified name => class/module as opposed to the sets.

Agreed, this is a good point. Right now the use of Set is also not super consistent throughout the codebase.


Singleton classes aren't handled (class << self), so a lot of your methods are going to be attached to the wrong object.

Yeah, that's a tricky one. I guess you could detect the << method call on class with self as an argument and set a flag on the current context to indicate that we are currently in a class << self block. And, when determining if a method definition is a class or instance method we could check for the presence of that flag (Tracking it here => #7)


To get the order of things right, you should probably build a tree of requires from the main gem require as opposed to looping through every file.

Yeah, I wasn't 100% sure how to approach this. I left this out for now, especially since we also have zeitwerk powered gems which make this approach not viable from my understanding. Maybe we would just need two modes to be able to handle both options.


It might be nicer to have a Root object that you can attach those things to instead.

Oh yeah, I've also been thinking about this. The solution with storing it on the Analyzer was a quick-win to make it work, but definitely isn't really the "right solution".


I'm super happy with the outcome of this pull request! From my perspective this is mergeable as-is, thank you!

@kddnewton
Copy link
Contributor Author

I would be interested to know if you happen to have an example at hand to show how that might look like in Ruby code. I would assume something like the case described in #20?

You can have really (almost) anything as the receiver of a method definition. So for example:

def foo.bar; end
def @foo.bar; end
def @@foo.bar; end
def $foo.bar; end
def Foo.bar; end
def ::Foo.bar; end
def Foo::Foo.bar; end
def __FILE__.bar; end
def self.bar; end
def (foo + foo).bar; end

Yeah, that's a tricky one. I guess you could detect the << method call on class with self as an argument and set a flag on the current context to indicate that we are currently in a class << self block. And, when determining if a method definition is a class or instance method we could check for the presence of that flag (Tracking it here => #7)

Here as well, you can have almost anything with a class << self syntax. I would suggest handling self and constants, and bailing on the rest. Here are some examples:

class << foo; end
class << @foo; end
class << @@foo; end
class << $foo; end
class << Foo; end
class << ::Foo; end
class << Foo::Foo; end
class << __FILE__; end
class << self.foo; end
class << (foo + foo); end

Yeah, I wasn't 100% sure how to approach this. I left this out for now, especially since we also have zeitwerk powered gems which make this approach not viable from my understanding. Maybe we would just need two modes to be able to handle both options.

I don't think having zeitwerk makes this impossible. If anything it's helpful to have zeitwerk find all of the files for you. Either way to build the graph you'd have to walk through each file. It's difficult because require is just a method call, but you can probably take the same approach I took here with include/extend and just guess. You would probably need require_relative and load as well.

@marcoroth
Copy link
Owner

Thanks for the added context! All of this seems reasonable and I kinda doubt that YARD/RDoc are able to handle all of these cases as well.

I guess focusing on the most common use-cases should be good enough for 95%+ of the cases. There is anyway no way to statically analyze all the meta-programming one could do in Ruby.

@marcoroth marcoroth merged commit 8a02fb3 into marcoroth:main Aug 15, 2023
2 checks passed
@paracycle
Copy link

To get the order of things right, you should probably build a tree of requires from the main gem require as opposed to looping through every file.

Yeah, I wasn't 100% sure how to approach this. I left this out for now, especially since we also have zeitwerk powered gems which make this approach not viable from my understanding. Maybe we would just need two modes to be able to handle both options.

I am working on something like this in Tapioca. I have it roughly working, but it is not fully usable yet. If it does end up working, it would not matter if a gem is using Zeitwerk or not, and the implementation could be reusable so that gem.sh can use it as well.

Btw, Tapioca, as a concept, does a lot of things that are similar to what gem.sh does. After all, at the core of it, it is a tool that takes in a gem and extracts a Ruby interface file for it, which, if you squint is basically documentation for the gem. It would be nice to find other ways to collaborate between the two projects, and I am happy to share more about some of our experiences trying to get runtime code loading from gems just right (hint: in the general, it is impossible to load all of a gem's code).

@bradgessler
Copy link

I'd be interested in a gem that exposes a RubyGem as a model that I can pull into a documentation project I have in the works similar to gem.sh. @marcoroth and I briefly talked about this on Twitter. @paracycle how far off do you feel your project is from landing in a gem? It's tempting to move all the Gem parsing/modeling code from this repo into ./lib/foo and then releasing those bits as a gem since the code seems pretty well factored in this project.

If we're keeping count, that's 3 projects that would want to share this functionality 🙃

@kddnewton kddnewton deleted the yarp branch August 26, 2023 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants