Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sub, sub!, gsub, and gsub! should set back references #34405

Merged
merged 4 commits into from Mar 28, 2019

Conversation

Projects
None yet
8 participants
@shugo
Copy link
Contributor

shugo commented Nov 8, 2018

Summary

ActiveSupport::SafeBuffer has a trap that back references are unavailable:

"foo123".html_safe.sub(/([a-z]+)([0-9]+)/) {
  $2 + $1 #=> undefined method `+' for nil:NilClass (NoMethodError)
}

This PR sets back references using Proc#binding.

@rails-bot

This comment has been minimized.

Copy link

rails-bot commented Nov 8, 2018

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @kamipo (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

@rails-bot rails-bot bot added the activesupport label Nov 8, 2018

@fxn

This comment has been minimized.

Copy link
Member

fxn commented Nov 8, 2018

There is a technical difficulty here in that, despite the leading dollar, $1 and friends are lexically scoped.

Existing discussions: #10598, #33395, #1555.

@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 8, 2018

@fxn

There is a technical difficulty here in that, despite the leading dollar, $1 and friends are lexically scoped.

Yes, that's why Proc#binding is used in this PR.
Proc#binding represents the context of a block, and Binding#eval evaluates code in the context.

@fxn

This comment has been minimized.

Copy link
Member

fxn commented Nov 8, 2018

@shugo Oh! Sorry, totally missed this was a PR instead of an issue, and also missed it was you, who of course is aware of my remark.

@amatsuda

This comment has been minimized.

Copy link
Member

amatsuda commented Nov 8, 2018

Huge 👍 from me.
This elegantly solves the well-known Rails pitfall with a very simple Ruby magic. This may be the best PR that I saw this year!

@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 8, 2018

@fxn

@shugo Oh! Sorry, totally missed this was a PR instead of an issue, and also missed it was you, who of course is aware of my remark.

No problem. I found the following article in Issue #10598 you mentioned:
https://makandracards.com/makandra/11171-how-to-fix-gsub-on-safebuffer-objects

It seems that I'm one or more laps behind.
Thanks anyway.

@matthewd

This comment has been minimized.

Copy link
Member

matthewd commented Nov 8, 2018

Does this have a race condition if block is stored in a [possibly global / shared across threads] variable, and is not a literal?

It sounds like two threads could each call set_block_back_references, with the same block, before they both then hit block.call. Does $~'s scoping magic make that okay?

@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 8, 2018

@matthewd

Does this have a race condition if block is stored in a [possibly global / shared across threads] variable, and is not a literal?

It sounds like two threads could each call set_block_back_references, with the same block, before they both then hit block.call. Does $~'s scoping magic make that okay?

Two threads can call set_block_back_references with the same block, but there is no condition because:

  1. There is no context switch in set_block_back_references.
  2. set_block_back_references temporally stores match_data in a thread/fiber-local storage (Thread.current[:__active_support_safe_buffer_backref]).
  3. The variable binding of $_ is different among each call of block in the same way as the variable binding of a local variable is different among each call of a method.
@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 8, 2018

  1. There is no context switch in set_block_back_references.

Ah, this is wrong, but I believe 2 and 3 are enough.

@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 8, 2018

  1. The variable binding of $_ is different among each call of block in the same way as the variable binding of a local variable is different among each call of a method.

I realized that this doesn't apply to orphan Proc objects.
For example, the following program works as expected:

require "active_support"
require "active_support/core_ext"

def run
  block = Proc.new {
    10.times do
      p $~
      sleep(0.1)
    end
    $&.upcase
  }

  t = Thread.start {
    p "bar".html_safe.sub(/\w+/, &block)
  }
  p "baz".html_safe.sub(/\w+/, &block)
  t.join
end

run

However, the following program doesn't work:

require "active_support"
require "active_support/core_ext"

def make_block
  Proc.new {
    10.times do
      p $~
      sleep(0.1)
    end
    $&.upcase
  }
end

block = make_block
t = Thread.start {
  p "bar".html_safe.sub(/\w+/, &block)
}
p "baz".html_safe.sub(/\w+/, &block) #=> "BAR"!!!
t.join

In the latter case, block is orphan and the same variable binding of $_ is closed in block.

However, String#sub does not support back references in orphan blocks either.

def make_block
  Proc.new {
    10.times do
      p $~ #=> nil
      sleep(0.1)
    end
    $&.upcase #=> undefined method `upcase' for nil:NilClass (NoMethodError)
  }
end

block = make_block
t = Thread.start {
  p "bar".sub(/\w+/, &block)
}
p "baz".sub(/\w+/, &block)
t.join
@fxn

This comment has been minimized.

Copy link
Member

fxn commented Nov 9, 2018

@shugo what is an orphaned block? Are there different scoping rules or something that explain the difference in behavior?

@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 9, 2018

@fxn

what is an orphaned block? Are there different scoping rules or something that explain the difference in behavior?

If a method call which created a block has already been returned, such a block is called an orphan block.
But I was wrong. The difference in behavior is not directly related to orphan blocks.

  1. The variable binding of $_ is different among each call of block in the same way as the variable binding of a local variable is different among each call of a method.

This is wrong.

Basically, the variable binding of $_ is same if block is same.
However, $_ in the root environment of a thread has a different binding in CRuby.
If block and a thread are created in the same context, $_ of block is resolved to $_ of the root environment of that thread.

In the following code, svar means special variables such as $_, and ec->root_lep means root local environment pointer of a thread, and ec->root_svar means root specials variables of a thread.

static inline struct vm_svar *
lep_svar(const rb_execution_context_t *ec, const VALUE *lep)
{
    VALUE svar;

    if (lep && (ec == NULL || ec->root_lep != lep)) {
        svar = lep[VM_ENV_DATA_INDEX_ME_CREF];
    }
    else {
        svar = ec->root_svar;
    }

    VM_ASSERT(svar == Qfalse || vm_svar_valid_p(svar));

    return (struct vm_svar *)svar;
}

It seems that JRuby doesn't have this feature.

Anyway, use of $_ in the same block is not thread safe, but it's a problem of Ruby itself, and String#sub has the same issue.
The best solution is a new version of String#sub which passes MatchData as a block parameter: https://bugs.ruby-lang.org/issues/12745

@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Nov 9, 2018

Anyway, use of $_ in the same block is not thread safe, but it's a problem of Ruby itself, and String#sub has the same issue.

Strictly speaking, it's not exactly the same issue.
String#sub sets $_ in the context where it's called, not in the context of a given block, so $_ is unavailable if the block is created in a different context.

The summary of thread safety of $_ is below:

CRuby:

method name block and thread in the same context block and thread in a different context
String#sub OK (thread safe) NG ($_ is unavaiable)
SafeBuffer#sub OK (thread safe) NG (not thread safe)

JRuby:

method name block and thread in the same context block and thread in a different context
String#sub NG (not thread safe) NG ($_ is unavaiable)
SafeBuffer#sub NG (not thread safe) NG (not thread safe)
@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Dec 12, 2018

@kamipo @matthewd Do you still have any worries?

private

def html_escape_interpolated_argument(arg)
(!html_safe? || arg.html_safe?) ? arg : CGI.escapeHTML(arg.to_s)
end

def set_block_back_references(block, match_data)
Thread.current[:__active_support_safe_buffer_backref] = match_data

This comment has been minimized.

Copy link
@nobu

nobu Feb 13, 2019

Contributor

block.binding.eval("proc {|m| $~ = m}").call(match_data) could eliminate the thread local variable.

This comment has been minimized.

Copy link
@shugo

shugo Feb 14, 2019

Author Contributor

@nobu Thanks for your suggestion. I've fixed it.

@kamipo

kamipo approved these changes Mar 27, 2019

@rafaelfranca rafaelfranca requested a review from matthewd Mar 28, 2019

@matthewd

This comment has been minimized.

Copy link
Member

matthewd commented Mar 28, 2019

Does the proc version change any of the table in #34405 (comment)?

I think this is the situation I was still worried about:

def run
  block = Proc.new {
    n = $&.to_i
    if n > 1
      p "987654321".sub(/#{n - 1}/, &block)
    end
    "(#{$&}#{n})"
  }

  p "987654321".sub(/9/, &block)
end

run

... which doesn't actually work as I thought it would, even in plain Ruby. So not a worry after all. 😄

@matthewd matthewd merged commit 78ace9c into rails:master Mar 28, 2019

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@shugo

This comment has been minimized.

Copy link
Contributor Author

shugo commented Mar 28, 2019

@matthewd
You are right. It doesn't work in plain Ruby either.

Thank you all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.