Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Named captures syntax in NodePattern #6724

Open
baweaver opened this issue Feb 3, 2019 · 11 comments
Open

Named captures syntax in NodePattern #6724

baweaver opened this issue Feb 3, 2019 · 11 comments
Labels
feature request stale Issues that haven't been active in a while

Comments

@baweaver
Copy link

baweaver commented Feb 3, 2019

Describe the solution you'd like

Currently captures are returned in an Array in the order that they're processed walking the tree.

As this syntax feels loosely based on Regex, a named capture would potentially be within the realm of features such a language could express:

def parse(s)
  RuboCop::ProcessedSource.new(s, RUBY_VERSION.to_f)
end

def pattern(s)
  NodePattern.new(s)
end

pattern('(send $<a>(...) :+ $<b>(...))').match(parse('1 + 2').ast)
# => { a: 1, b: 2 }

I believe this would add to the expressiveness of the language, and make code-rewriting based on the results of a captured match easier to work with.

Psuedo-Implementation

I'd made a quick pass at implementation of this, not working with the actual source:

require 'rubocop'

class MetaNodePattern
  def initialize(string)
    @capture_tags = []

    ast_string = string.gsub(/\$<(.+?)>/) do |m|
      @capture_tags << m[2..-2].to_sym
      '$'
    end

    @pattern = RuboCop::NodePattern.new(ast_string)
  end

  def match(node)
    matches = @pattern.match(node)
    return unless matches

    @capture_tags.empty? ? matches : @capture_tags.zip(matches).to_h
  end
end

def parse(s)
  RuboCop::ProcessedSource.new(s, RUBY_VERSION.to_f)
end

def mp(s)
  MetaNodePattern.new(s)
end

meta_match = '(send $<a>(...) :+ $<b>(...))'
ast = parse('1 + 2').ast

p mp(meta_match).match(ast)
# => {:a=>s(:int, 1), :b=>s(:int, 2)}

I'd looked at the source defining the capture, but I'd have to read over it a few more times to really get a grip on how it'd be implemented.

Additional context

This idea is heavily inspired by named captures in Regex ( /(?<name>.+)/ ), and that idea has helped make more complicated Regex queries more expressive through giving names to various sections of captured content.

Potential Issues

Granted changing the syntax comes with a few potential issues.

Forked Return Types

By introducing this it would effectively fork the return from being solely {nil, Array[Node]} to {nil, Array[Node], Hash[Symbol, Node]} depending on the string inputted.

As it would be an additive type it should not affect current matches, mitigating some of the risk.

Mixed Captures

The first potential is dealing with a mix of named and unnamed captures:

mp '(send $<a>(...) :+ $(...))'

In this case I'd consider raising an exception, but don't have a good idea of how to best deal with it at the moment.

Other Issues

There are some minor other issues, but those are mostly due to the nature of hashes involving duplicate keys and validating the syntax, which is done for other constructs as is.

Thoughts?

I'd be curious to get people's thoughts on this. I believe that named captures would make the NodePattern language more expressive, and be a substantial win when dealing with cop clarity, especially around autocorrect code.

Thanks for reading!

@Drenmi
Copy link
Collaborator

Drenmi commented Feb 4, 2019

Hi, @baweaver!

Firstly, thank you for taking the time to think this through. The suggestion is clear and well worded. 🙇

I believe this would add to the expressiveness of the language, and make code-rewriting based on the results of a captured match easier to work with.

The benefit would be limited to the expression itself (since the result will either commonly be destructured or yielded to named variables), but there it could add a great deal of clarity.

The first potential is dealing with a mix of named and unnamed captures [...] In this case I'd consider raising an exception, but don't have a good idea of how to best deal with it at the moment.

Yes. I agree. Either all unnamed, or all named. 🙂

There are some minor other issues, but those are mostly due to the nature of hashes involving duplicate keys and validating the syntax, which is done for other constructs as is.

These are good considerations to make. Another one is that node matchers can yield to a block, e.g.:

my_matcher(node) do |capture|
  ...
end

but I think in the case of the extended syntax, the matcher could yield keyword arguments. WDYT?

I'd be curious to get people's thoughts on this.

I think it is a great suggestion! Would you be interested in working on this? 🙂

@baweaver
Copy link
Author

baweaver commented Feb 4, 2019

These are good considerations to make. Another one is that node matchers can yield to a block [...]

I think in the case of the extended syntax, the matcher could yield keyword arguments. WDYT?

I've done similar things when destructuring hashes, and it makes for some nice syntax, especially considering you know exactly what keys are going to be present preventing any of the usual issues with key presence mismatches versus hashes:

node = parse '1 + 2'

NodePattern
  .new('(send $<a>(...) :+ $<b>(...))')
  .match(node) do |a:, b:|
    # a is 1, b is 2
  end

The benefit would be limited to the expression itself (since the result will either commonly be destructured or yielded to named variables), but there it could add a great deal of clarity.

One thing that I'd noticed for autocorrect is that if you want to extract the values you'd have to run the matcher twice to do so. Not sure if that's correct or not, but if this is circumvented it may be really useful.

I'd been using autocorrect with extracted s-expression trees to change code around, and named captures would make it clearer in those specific contexts:

module RuboCop
  module Cop
    module Tests
      class RailsActionCableWebsocket < RuboCop::Cop::Cop
        MSG = 'Deprecated!'

        # Original
        def_node_search :websocket_set, <<~AST
          (send
            (const nil? :ActionCable) :WebSocket=
            $(...))
        AST

        # Proposed
        def_node_search :websocket_set_alt, <<~AST
          (send
            (const nil? :ActionCable) :WebSocket=
            $<websocket_handler>(...))
        AST

        # Unchanged
        def on_send(node)
          matching_nodes = websocket_set(node)
          add_offense(node, location: :expression) if matching_nodes.any?
        end

        def autocorrect(node)
          lambda do |corrector|
            # Original
            matching_nodes = websocket_set(node)
            content = matching_nodes.first.source

            corrector.replace(
              node.loc.expression,
              "ActionCable.adapters.WebSocket = #{content}"
            )

            # Proposed
            matching_nodes = websocket_set_alt(node)
            content = matching_nodes[:websocket_handler].source

            corrector.replace(
              node.loc.expression,
              "ActionCable.adapters.WebSocket = #{content}"
            )
          end
        end
      end
    end
  end
end

For this specific code it's not a massive change, but it does make it clear exactly what content it is you're substituting in instead of relying on the index and giving it positional (connascence)[http://connascence.io/pages/about.html] (I still love Jim's talk on that concept).

I'd just be very careful not to let it go into method-based captures (e.g. matches.websocket_handler) as that's a good way to get all types of unintended collisions with the language itself.

I think it is a great suggestion! Would you be interested in working on this? 🙂

I'd certainly consider it, though I think my first priority in contribution may be more towards documenting the existing content and exposing how it works in more of a guide-based format.

It took me a fair amount of time to grok what was what and how exactly to do some of this, so I think there'd be substantial gain in focusing on that first. Noted that's a separate issue though, and I still have quite a bit to learn about how this all works. :)

Definitely up for chatting more, are you all active on the Gitter channel?

@rrosenblum
Copy link
Contributor

Somewhat related to this. I have had some issues with the current implementation of named captures when using the same name multiple times across optional sections.

For example, (send _name :== _name) will only match when the value of _name is the same on both sides.

(done from memory so this may not show off the exact issue)
Something like

(send
  {
    (send _name :== _)
    (send _name :!= _)
  }
  :== _name)

may not wind up matching. My theory is that during a first pass, _name gets registered to a variable partially matching the pattern (foo == bar == baz). The value for name is not reset, or held onto, if the second part of the matcher is invoked, foo != bar == foo.

I bring this up because I assume the same area of the code will be touched when looking into this feature.

@Drenmi
Copy link
Collaborator

Drenmi commented Feb 6, 2019

For example, (send _name :== _name) will only match when the value of _name is the same on both sides.

This is intentional behavior. It is mentioned in the code file as “unification”. 🙂

@rrosenblum
Copy link
Contributor

rrosenblum commented Feb 6, 2019

I don't think I did the best job explaining the issue that I have run into. The functionality of matching named captures via (send _name :== _name) works great. I have run into issues with it when working with more complex patterns that have multiple options for a pattern.

Given the code foo == bar && foo == baz and the pattern (and (send $_name :== _) (send $_name :== _)), the pattern will only match when the left hand sides of both == methods are the same variable. This works exactly as I would expect, and want, it to.

However, given the code foo && foo == baz and the pattern

(and {    
      (send $_ name :!= _)    
      $_name    
     }    
     (send $_name :== _)    
)

This will fail to match the code even though we have a direct match in the second option of the left hand side of the && pattern.

For clarity, the following pattern will match the example code. This will force you to have to check that the variables match within the code rather than being able to handle it directly in pattern.

(and {    
      (send $_ :!= _)    
      $_    
     }    
     (send $_ :== _)    
)

It seems like when there are multiple parts to a pattern, the named captures are unable to hang onto multiple potential matches. My assumption is that when the first part of the group (send $_name :!= _) fails to match, it winds up registering _name as nil, or something similar, and prevents the more generic $_name from registering anything onto _name.

I hope this clarifies the issue that I was trying to convey. I realize that this is tangentially related to the issue being reported, and I will gladly open a new issue to move this conversation to.

@stale
Copy link

stale bot commented May 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!

@stale stale bot added the stale Issues that haven't been active in a while label May 8, 2019
@baweaver
Copy link
Author

baweaver commented May 9, 2019

I'll be looking more into this and a few ideas, though I'd noted the any order groupings taking <> so I might need to rethink that a bit.

@stale stale bot removed the stale Issues that haven't been active in a while label May 9, 2019
@stale
Copy link

stale bot commented Aug 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!

@stale stale bot added the stale Issues that haven't been active in a while label Aug 7, 2019
@stale
Copy link

stale bot commented Sep 6, 2019

This issues been automatically closed due to lack of activity. Feel free to re-open it if you ever come back to it.

@stale stale bot closed this as completed Sep 6, 2019
@andrykonchin
Copy link
Contributor

+1 for this feature

@Drenmi
Copy link
Collaborator

Drenmi commented May 4, 2023

Re-opening this, as I think it's still worth tracking.

@Drenmi Drenmi reopened this May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request stale Issues that haven't been active in a while
Projects
None yet
Development

No branches or pull requests

4 participants