Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use string index in String #sub/#gsub when String pattern passed without creating a Regexp #5952

Merged
merged 2 commits into from Nov 11, 2019

Conversation

fidothe
Copy link
Contributor

@fidothe fidothe commented Oct 30, 2019

This PR addresses #5905. It's still pretty naive, and I'm really not a Java programmer, so there's probably a few dumb mistakes and style things aside from the big MatchData problem.

The big known problem remaining is how to populate the $~ MatchData. In MRI the MatchData from a string-only String#sub or #gsub will return a Regexp from MatchData#regexp. I don't think we currently have a way of doing that, so that means changes to MatchData.

MRI is clearly doing something odd here, because if you call #inspect on the MatchData returned from a string pattern #sub, it's in a different form to what you get from invoking it with a Regexp pattern (this from MRI 2.6.3):

~ $ irb
irb(main):001:0> "a".sub("a", "b")
=> "b"
irb(main):002:0> $~
=> #<MatchData: a>
irb(main):003:0> "a".sub(/a/, "b")
=> "b"
irb(main):004:0> $~
=> #<MatchData "a">

Once I know what the best approach to handling that is, I can fill the hole in the #sub implementation and #gsub's returned MatchDatas will behave as expected.

@fidothe
Copy link
Contributor Author

@fidothe fidothe commented Oct 31, 2019

Question for @enebo / @headius. We're failing 4 Rubyspec specs for #sub and 4 for #gsub. (They're failing for me on master too, so it looks like no new problems there.) Are those failures expected?

@fidothe
Copy link
Contributor Author

@fidothe fidothe commented Oct 31, 2019

I can provide a gsubFast method overload that takes a RubyString and not a RubyRegexp. Is that worth doing for symmetry, will anyone use it? (It's only user is in RubyDate, where it is very definitely using a Regexp.)

@fidothe
Copy link
Contributor Author

@fidothe fidothe commented Oct 31, 2019

(Also, I'm going out on a limb and assuming that we don't want to replicate the inconsistency in output from MRI's MatchData#inspect)

fidothe added 2 commits Nov 8, 2019
`String#sub` and `#gsub` called with a String arg should use string indexing
and not compile a Regexp.

This requires:

* Constructing MatchData instances appropriately so that `$~` et al get
populated correctly (but only in those situations where it would
happen).

* Using the `Regexp` static method `regsub()` to perform the replacement
string parsing so that special sequences (`\0`, `\'`, etc) are correctly
expanded and inserted.

@enebo wrote a new `StringSupport.index()` method that makes the
position indexing easier, which we use here.

As a style refactor, the `gsubCommon19()` methods were renamed
`gsubCommon()`, since there's no 1.9 distinction anymore and the methods
are all private.
@fidothe fidothe force-pushed the use-string-matching-5905 branch from 68e7c90 to 33bb5ff Compare Nov 11, 2019
@fidothe fidothe changed the title WIP: Use string index in String #sub/#gsub when String pattern passed without creating a Regexp Use string index in String #sub/#gsub when String pattern passed without creating a Regexp Nov 11, 2019
@fidothe
Copy link
Contributor Author

@fidothe fidothe commented Nov 11, 2019

Squashed the code changes down, left the new benchmark as its own commit to make before/after running easier for others.

Benchmark results before/after from my machine https://gist.github.com/fidothe/7d76f60cf2adbb8a2aab8adf21f40192

@fidothe
Copy link
Contributor Author

@fidothe fidothe commented Nov 11, 2019

As I said before, this is the first Java code I've written, so I imagine it's full of style problems. (I found redundant casting and stuff, but assume there's more stuff like that I didn't recognise.) Let me know if there's stuff you want changed (and why, so I learn...) and I will happily oblige.

@enebo enebo added this to the JRuby 9.2.10.0 milestone Nov 11, 2019
@enebo enebo merged commit 0e4a2a6 into jruby:master Nov 11, 2019
5 checks passed
@enebo
Copy link
Member

@enebo enebo commented Nov 11, 2019

@fidothe Awesome job! Getting some more perf out of a common method hopefully is a pretty satisfying start in hacking JRuby.

As far as style I made a few very minor changes: 457d12b. Unfortunately, RubyString is not so consistent from a single-style perspective. Some people more heavily favor marking anything which will not change as final whereas others (myself included) feel final is more useful as a marker when that finalness important or aids in helping signal intent. We have some other oddities in this file like following MRI naming vs using more obvious full word variable names.

There was 3 places with the else was not joined with the previous { in that commit above. For conditionals we use {} always and they are on same line as if/else. For single line ifs they require {} if you want if and body on different lines or no {} if the body is on the same line as the if.

@kares
Copy link
Member

@kares kares commented Dec 5, 2019

Matt, your Java is very nice. Would certainly assumed you're an experienced programmer based on this!

@fidothe
Copy link
Contributor Author

@fidothe fidothe commented Dec 5, 2019

Matt, your Java is very nice. Would certainly assumed you're an experienced programmer based on this!

Aw, thanks 😊 @kares

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants