Named captures syntax in NodePattern #6724

baweaver · 2019-02-03T07:27:19Z

Describe the solution you'd like

Currently captures are returned in an Array in the order that they're processed walking the tree.

As this syntax feels loosely based on Regex, a named capture would potentially be within the realm of features such a language could express:

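require 'rubocop'

def parse(s)
  RuboCop::ProcessedSource.new(s, RUBY_VERSION.to_f)
end

def pattern(s)
  RuboCop::NodePattern.new(s)
end

# Proposed syntax (not supported today): name each capture with $<name>
pattern('(send $<a>(...) :+ $<b>(...))').match(parse('1 + 2').ast)
# => { a: s(:int, 1), b: s(:int, 2) }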

I believe this would add to the expressiveness of the language, and make code-rewriting based on the results of a captured match easier to work with.

Pseudo-Implementation

I made a quick pass at implementing this as a wrapper, without modifying the actual NodePattern source:

require 'rubocop'

class MetaNodePattern
  def initialize(string)
    @capture_tags = []

    ast_string = string.gsub(/\$<(.+?)>/) do |m|
      @capture_tags << m[2..-2].to_sym
      '$'
    end

    @pattern = RuboCop::NodePattern.new(ast_string)
  end

  def match(node)
    matches = @pattern.match(node)
    return unless matches

    @capture_tags.empty? ? matches : @capture_tags.zip(matches).to_h
  end
end

def parse(s)
  RuboCop::ProcessedSource.new(s, RUBY_VERSION.to_f)
end

def mp(s)
  MetaNodePattern.new(s)
end

meta_match = '(send $<a>(...) :+ $<b>(...))'
ast = parse('1 + 2').ast

p mp(meta_match).match(ast)
# => {:a=>s(:int, 1), :b=>s(:int, 2)}

I'd looked at the source defining the capture, but I'd have to read over it a few more times to really get a grip on how it'd be implemented.

Additional context

This idea is heavily inspired by named captures in Regex ( /(?<name>.+)/ ), and that idea has helped make more complicated Regex queries more expressive through giving names to various sections of captured content.
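For readers less familiar with the Regex feature referenced here, a small plain-Ruby illustration of named groups follows. The pattern and sample string are made up for the example; only `Regexp#match`, `MatchData#captures`, `MatchData#named_captures` and `MatchData#[]` are used.

```ruby
# Illustrative only: a date pattern with three named groups.
date_pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

match = date_pattern.match('2019-02-03')

# Positional access depends on the order of the groups in the pattern...
positional = match.captures

# ...while named access documents intent at the call site.
named = match.named_captures
year  = match[:year]
```

The proposal is essentially asking for the NodePattern equivalent of `named_captures`.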

Potential Issues

Granted, changing the syntax comes with a few potential issues.

Forked Return Types

Introducing this would effectively fork the return type from being solely {nil, Array[Node]} to {nil, Array[Node], Hash[Symbol, Node]}, depending on the pattern string passed in.

As it would be an additive type it should not affect current matches, mitigating some of the risk.
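To make the forked return type concrete, here is a rough sketch of what calling code would have to distinguish. `describe_match` is a hypothetical helper, and plain symbols stand in for AST nodes; it is not part of RuboCop.

```ruby
# Hypothetical caller-side helper; symbols stand in for AST nodes.
def describe_match(result)
  case result
  when nil   then 'no match'
  when Hash  then "named captures: #{result.keys.join(', ')}"
  when Array then "positional captures: #{result.size}"
  else 'unexpected result'
  end
end

no_match   = describe_match(nil)
positional = describe_match(%i[lhs rhs])
named      = describe_match({ a: :lhs, b: :rhs })
```

Existing patterns never produce the Hash branch, which is why the change is additive for current callers.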

Mixed Captures

The first potential issue is dealing with a mix of named and unnamed captures:

mp '(send $<a>(...) :+ $(...))'

In this case I'd consider raising an exception, but don't have a good idea of how to best deal with it at the moment.
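One way the "raise an exception" option could look, as a minimal sketch: `validate_capture_style!` is a hypothetical helper that scans the raw pattern string, assuming the proposed `$<name>` syntax and ignoring any other meaning `<>` may have in a pattern.

```ruby
# Hypothetical validation, assuming named captures are written as $<name>.
def validate_capture_style!(pattern)
  named   = pattern.scan(/\$<(\w+)>/).flatten
  unnamed = pattern.scan(/\$(?!<\w+>)/).size

  if named.any? && unnamed.positive?
    raise ArgumentError,
          "cannot mix named (#{named.join(', ')}) and unnamed captures"
  end

  named.map(&:to_sym)
end

all_named   = validate_capture_style!('(send $<a>(...) :+ $<b>(...))')
all_unnamed = validate_capture_style!('(send $(...) :+ $(...))')
```

Passing the mixed pattern above raises `ArgumentError` at construction time, before any node is matched.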

Other Issues

There are some minor other issues, but those are mostly due to the nature of hashes involving duplicate keys and validating the syntax, which is done for other constructs as is.
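The duplicate-key concern is easy to reproduce in plain Ruby: building a Hash from pairs silently keeps the last value for a repeated key. The sketch below shows that, plus a hypothetical `duplicate_capture_names` check that assumes the proposed `$<name>` syntax.

```ruby
# Hash construction keeps only the last value for a repeated key.
collapsed = %i[a a].zip(%w[first second]).to_h

# Hypothetical check: list capture names that appear more than once.
def duplicate_capture_names(pattern)
  names = pattern.scan(/\$<(\w+)>/).flatten
  names.select { |name| names.count(name) > 1 }.uniq.map(&:to_sym)
end

duplicates = duplicate_capture_names('(send $<a>(...) :+ $<a>(...))')
unique     = duplicate_capture_names('(send $<a>(...) :+ $<b>(...))')
```

A non-empty result could be rejected up front rather than letting one capture overwrite another.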

Thoughts?

I'd be curious to get people's thoughts on this. I believe that named captures would make the NodePattern language more expressive, and be a substantial win when dealing with cop clarity, especially around autocorrect code.

Thanks for reading!


Drenmi · 2019-02-04T00:51:47Z

Hi, @baweaver!

Firstly, thank you for taking the time to think this through. The suggestion is clear and well worded. 🙇

I believe this would add to the expressiveness of the language, and make code-rewriting based on the results of a captured match easier to work with.

The benefit would be limited to the expression itself (since the result will either commonly be destructured or yielded to named variables), but there it could add a great deal of clarity.

The first potential issue is dealing with a mix of named and unnamed captures [...] In this case I'd consider raising an exception, but don't have a good idea of how to best deal with it at the moment.

Yes. I agree. Either all unnamed, or all named. 🙂

There are some minor other issues, but those are mostly due to the nature of hashes involving duplicate keys and validating the syntax, which is done for other constructs as is.

These are good considerations to make. Another one is that node matchers can yield to a block, e.g.:

my_matcher(node) do |capture|
  ...
end

but I think in the case of the extended syntax, the matcher could yield keyword arguments. WDYT?
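A minimal sketch of the matcher side of that idea, in plain Ruby: `yield_named_captures` is a hypothetical stand-in for a generated matcher, and integers stand in for captured nodes.

```ruby
# Hypothetical stand-in for a matcher that has already produced named captures.
def yield_named_captures(captures)
  # Without a block, hand back the Hash as-is.
  return captures unless block_given?

  # With a block, splat the Hash so the block can declare keyword parameters.
  yield(**captures)
end

sum   = yield_named_captures(a: 1, b: 2) { |a:, b:| a + b }
plain = yield_named_captures(a: 1, b: 2)
```

The double splat requires Symbol keys, which matches the `Hash[Symbol, Node]` shape suggested in the proposal.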

I'd be curious to get people's thoughts on this.

I think it is a great suggestion! Would you be interested in working on this? 🙂

baweaver · 2019-02-04T01:13:45Z

These are good considerations to make. Another one is that node matchers can yield to a block [...]

I think in the case of the extended syntax, the matcher could yield keyword arguments. WDYT?

I've done similar things when destructuring hashes, and it makes for some nice syntax. Because you know exactly which keys will be present, you avoid the usual key-mismatch issues that come with passing plain hashes around:

node = parse '1 + 2'

NodePattern
  .new('(send $<a>(...) :+ $<b>(...))')
  .match(node) do |a:, b:|
    # a is 1, b is 2
  end

The benefit would be limited to the expression itself (since the result will either commonly be destructured or yielded to named variables), but there it could add a great deal of clarity.

One thing that I'd noticed for autocorrect is that if you want to extract the values you'd have to run the matcher twice to do so. Not sure if that's correct or not, but if this is circumvented it may be really useful.

I'd been using autocorrect with extracted s-expression trees to change code around, and named captures would make it clearer in those specific contexts:

module RuboCop
  module Cop
    module Tests
      class RailsActionCableWebsocket < RuboCop::Cop::Cop
        MSG = 'Deprecated!'

        # Original
        def_node_search :websocket_set, <<~AST
          (send
            (const nil? :ActionCable) :WebSocket=
            $(...))
        AST

        # Proposed
        def_node_search :websocket_set_alt, <<~AST
          (send
            (const nil? :ActionCable) :WebSocket=
            $<websocket_handler>(...))
        AST

        # Unchanged
        def on_send(node)
          matching_nodes = websocket_set(node)
          add_offense(node, location: :expression) if matching_nodes.any?
        end

        def autocorrect(node)
          lambda do |corrector|
            # Original
            matching_nodes = websocket_set(node)
            content = matching_nodes.first.source

            corrector.replace(
              node.loc.expression,
              "ActionCable.adapters.WebSocket = #{content}"
            )

            # Proposed
            matching_nodes = websocket_set_alt(node)
            content = matching_nodes[:websocket_handler].source

            corrector.replace(
              node.loc.expression,
              "ActionCable.adapters.WebSocket = #{content}"
            )
          end
        end
      end
    end
  end
end

For this specific code it's not a massive change, but it does make it clear exactly what content you're substituting in, instead of relying on the index and introducing positional connascence (see http://connascence.io/pages/about.html; I still love Jim's talk on that concept).

I'd just be very careful not to let it go into method-based captures (e.g. matches.websocket_handler) as that's a good way to get all types of unintended collisions with the language itself.
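To illustrate the kind of collision meant here, a small plain-Ruby sketch: `MethodCaptures` is a hypothetical wrapper that exposes captures through `method_missing`, with strings standing in for nodes. A capture whose name matches an existing method is never reached.

```ruby
# Hypothetical wrapper exposing captures as methods; not a proposed API.
class MethodCaptures
  def initialize(captures)
    @captures = captures
  end

  def method_missing(name, *args)
    @captures.key?(name) ? @captures[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @captures.key?(name) || super
  end
end

captures = { websocket_handler: 'MyHandler', class: 'Foo' }
wrapped  = MethodCaptures.new(captures)

by_method  = wrapped.websocket_handler # reaches the capture
collision  = wrapped.class             # Object#class wins; the capture is hidden
by_hash    = captures[:class]          # Hash access has no such ambiguity
```

Hash-style access keeps capture names in their own namespace, which is the safer default.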

I think it is a great suggestion! Would you be interested in working on this? 🙂

I'd certainly consider it, though I think my first priority in contribution may be more towards documenting the existing content and exposing how it works in more of a guide-based format.

It took me a fair amount of time to grok what was what and how exactly to do some of this, so I think there'd be substantial gain in focusing on that first. Noted that's a separate issue though, and I still have quite a bit to learn about how this all works. :)

Definitely up for chatting more, are you all active on the Gitter channel?

rrosenblum · 2019-02-05T16:10:44Z

Somewhat related to this. I have had some issues with the current implementation of named captures when using the same name multiple times across optional sections.

For example, (send _name :== _name) will only match when the value of _name is the same on both sides.

(done from memory so this may not show off the exact issue)
Something like

(send
  {
    (send _name :== _)
    (send _name :!= _)
  }
  :== _name)

may not wind up matching. My theory is that during a first pass, _name gets registered to a variable partially matching the pattern (foo == bar == baz). The value for name is not reset, or held onto, if the second part of the matcher is invoked, foo != bar == foo.

I bring this up because I assume the same area of the code will be touched when looking into this feature.

Drenmi · 2019-02-06T05:15:55Z

For example, (send _name :== _name) will only match when the value of _name is the same on both sides.

This is intentional behavior. It is mentioned in the code file as “unification”. 🙂

rrosenblum · 2019-02-06T14:09:32Z

I don't think I did the best job explaining the issue that I have run into. The functionality of matching named captures via (send _name :== _name) works great. I have run into issues with it when working with more complex patterns that have multiple options for a pattern.

Given the code foo == bar && foo == baz and the pattern (and (send $_name :== _) (send $_name :== _)), the pattern will only match when the left hand sides of both == methods are the same variable. This works exactly as I would expect, and want, it to.

However, given the code foo && foo == baz and the pattern

(and {
      (send $_name :!= _)
      $_name
     }
     (send $_name :== _)
)

This will fail to match the code even though we have a direct match in the second option of the left hand side of the && pattern.

For clarity, the following pattern will match the example code. It forces you to check that the variables match in Ruby code rather than handling it directly in the pattern.

(and {    
      (send $_ :!= _)    
      $_    
     }    
     (send $_ :== _)    
)

It seems like when there are multiple parts to a pattern, the named captures are unable to hang onto multiple potential matches. My assumption is that when the first part of the group (send $_name :!= _) fails to match, it winds up registering _name as nil, or something similar, and prevents the more generic $_name from registering anything onto _name.
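The behaviour being asked for can be sketched with a toy unifier over nested arrays. This is not RuboCop's compiler and says nothing about where the real problem lies; `unify`, the `{ any: [...] }` union notation and the array "nodes" are all made up for illustration. The point is only that each alternative starts from the same incoming bindings, so a branch that fails partway cannot leak a binding into the next one.

```ruby
# Toy unifier: Symbols starting with "_" are named wildcards, :_ is anonymous,
# { any: [...] } is a union, Arrays match element by element.
def unify(pattern, value, bindings = {})
  case pattern
  when Hash
    # Every alternative receives the same incoming bindings.
    pattern.fetch(:any).each do |alternative|
      result = unify(alternative, value, bindings)
      return result if result
    end
    nil
  when Array
    return nil unless value.is_a?(Array) && value.size == pattern.size

    pattern.zip(value).reduce(bindings) do |acc, (sub_pattern, sub_value)|
      acc && unify(sub_pattern, sub_value, acc)
    end
  when :_
    bindings
  when Symbol
    return (pattern == value ? bindings : nil) unless pattern.to_s.start_with?('_')
    # merge returns a new Hash, so failed branches never mutate shared state.
    return bindings.merge(pattern => value) unless bindings.key?(pattern)

    bindings[pattern] == value ? bindings : nil
  else
    pattern == value ? bindings : nil
  end
end

pattern = [:and,
           { any: [[:send, :_name, :!=, :_], :_name] },
           [:send, :_name, :==, :_]]

# Simplified stand-in for `foo && foo == baz`.
matched    = unify(pattern, [:and, :foo, [:send, :foo, :==, :baz]])
# Simplified stand-in for `bar && foo == baz`.
mismatched = unify(pattern, [:and, :bar, [:send, :foo, :==, :baz]])
```

In this toy the first alternative fails on `:foo`, the second binds `_name`, and the final `send` is checked against that binding.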

I hope this clarifies the issue that I was trying to convey. I realize that this is tangentially related to the issue being reported, and I will gladly open a new issue to move this conversation to.

stale · 2019-05-08T19:43:24Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!

baweaver · 2019-05-09T05:41:11Z

I'll be looking more into this and a few related ideas, though I noticed that any-order groupings already use <>, so I might need to rethink the syntax a bit.

stale · 2019-08-07T05:53:05Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!

stale · 2019-09-06T06:07:51Z

This issue has been automatically closed due to lack of activity. Feel free to re-open it if you ever come back to it.

andrykonchin · 2023-04-25T13:55:36Z

+1 for this feature

Drenmi · 2023-05-04T01:58:50Z

Re-opening this, as I think it's still worth tracking.

Drenmi added the feature request label Feb 5, 2019

stale bot added the stale label (Issues that haven't been active in a while) May 8, 2019

stale bot removed the stale label May 9, 2019

stale bot added the stale label Aug 7, 2019

stale bot closed this as completed Sep 6, 2019

Drenmi reopened this May 4, 2023
