Support for String#scrub #2912

Merged
merged 2 commits into master from string-scrub on Mar 10, 2014

3 participants

@YorickPeterse
Rubinius member

Note that this is currently still a work in progress, so expect plenty of Git rebases.

In the current implementation it uses Encoding::Converter (thanks to @headius for that one) which isn't exactly webscale. The options we have are either tuning the Converter API or moving the code to C++. Another option is to see if we can port over MRI's C logic to pure Ruby, but I'm not sure how well that would work.

@YorickPeterse
Rubinius member

Related issue: #2901

@YorickPeterse
Rubinius member

Corresponding MRI code can be found here: https://github.com/ruby/ruby/blob/trunk/string.c#L8022
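As a rough illustration of a pure-Ruby take on this logic, the sketch below walks the string byte by byte. At each position it copies the shortest slice that is a valid character, and it replaces one byte where no valid character starts. It follows MRI's choice of default replacement (U+FFFD for Unicode encodings, `?` otherwise). The `UNICODE_ENCODINGS` list, the 4-byte maximum character length, and the `scrub_bytewise` name are assumptions. This is not a port of the linked C code: MRI replaces a truncated multi-byte sequence as a single unit, whereas this sketch emits one replacement per byte.

```ruby
# Assumption: a hand-picked list approximating MRI's "is this a Unicode
# encoding?" check; MRI itself consults an internal encoding flag.
UNICODE_ENCODINGS = [
  Encoding::UTF_8, Encoding::UTF_16LE, Encoding::UTF_16BE,
  Encoding::UTF_32LE, Encoding::UTF_32BE
].freeze

# Assumption: no character in the supported encodings exceeds 4 bytes.
MAX_CHAR_BYTES = 4

# Hypothetical pure-Ruby scrub that walks the string byte by byte.
def scrub_bytewise(str, replacement = nil)
  enc = str.encoding
  replacement ||= (UNICODE_ENCODINGS.include?(enc) ? "\uFFFD".encode(enc) : "?".encode(enc))
  bytes = str.b                           # binary copy, so slicing is purely by byte
  out = String.new(encoding: enc)
  i = 0

  while i < bytes.bytesize
    # The shortest slice starting at i that is valid on its own is one character.
    len = (1..MAX_CHAR_BYTES).find { |n|
      i + n <= bytes.bytesize &&
        bytes.byteslice(i, n).force_encoding(enc).valid_encoding?
    }

    if len
      out << bytes.byteslice(i, len).force_encoding(enc)
      i += len
    else
      out << replacement                  # no valid character starts here: replace one byte
      i += 1
    end
  end

  out
end
```

In MRI, `"abc\u3042\xE3\x80".scrub("*")` returns `"abc\u3042*"`, while `scrub_bytewise("abc\u3042\xE3\x80", "*")` returns `"abc\u3042**"`, which shows the grouping difference described above.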

@YorickPeterse
Rubinius member

@brixen / @dbussink So in the current setup this uses the Encoding conversion API cooked up mostly by @headius. Having run some basic benchmarks, I noticed that this API is waaaay slower than String#scrub in MRI. Are there any alternative APIs that we can use, or do we have to resort to using C++ for this?

@YorickPeterse
Rubinius member

To clarify, it would be nice if we could keep this in Ruby (both because it's Ruby and as a future reference for other implementations). Having said that, if performance is an issue, I can understand the need to move it to C++.

@YorickPeterse
Rubinius member

Actually, that implementation is quite different from what MRI does (as stated in its README). Some basic benchmarks also showed that it was even slower than the current implementation here. It was, however, a source of information on the behaviour of this method when I initially started looking into it.

@headius added a commit to jruby/jruby that referenced this pull request Mar 5, 2014
@headius Implement String#scrub (issues remain in jcodings).
* From my impl (modified) at rubinius/rubinius#2912
* Fails due to ArrayStoreException in jcodings
b287f70
YorickPeterse added some commits Jan 28, 2014
@YorickPeterse Experimental String#scrub with Encoding::Converter
Massive thanks to @headius for coming up with the idea of using
Encoding::Converter and providing a proof of concept. I mainly adapted this for
Rbx with some minor changes.

Note that this particular implementation is *really* slow compared to MRI; it
is, however, faster than the String#chars version used before. Although I'd
prefer not to, we might want to move this over to C++ land if high performance
is required.
e467bb8
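The earlier `String#chars` version is not shown in this thread, so `scrub_with_chars` below is a hypothetical reconstruction. It replaces every character that fails `valid_encoding?`. The `measure` helper is a minimal timing harness, also assumed for illustration, of the kind one might use for the basic benchmarks mentioned above. Actual timings depend on the Ruby implementation and the input.

```ruby
# Hypothetical reconstruction of a String#chars-based scrub: each
# character that is not valid in the string's encoding is replaced.
def scrub_with_chars(str, replacement = "\uFFFD")
  str.each_char.map { |char| char.valid_encoding? ? char : replacement }.join
end

# Returns the elapsed seconds for the block, using a monotonic clock.
def measure
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Sample input: ASCII and multi-byte text mixed with stray 0xFF bytes.
input = "abc\u3042\xFF" * 10_000

chars_seconds   = measure { scrub_with_chars(input) }
builtin_seconds = measure { input.scrub }
puts format("String#chars sketch:   %.4fs", chars_seconds)
puts format("built-in String#scrub: %.4fs", builtin_seconds)
```

Because `each_char` yields stray bytes one at a time, this version also replaces a truncated multi-byte sequence once per byte rather than once per sequence.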
@YorickPeterse Added specs for String#scrub. c0ae661
@headius added a commit to jruby/jruby that referenced this pull request Mar 8, 2014
@headius Implement String#scrub (issues remain in jcodings).
* From my impl (modified) at rubinius/rubinius#2912
* Fails due to ArrayStoreException in jcodings
29d8a05
@brixen merged commit 49b6cb9 into master Mar 10, 2014

1 check passed

The Travis CI build passed
@YorickPeterse deleted the string-scrub branch Mar 10, 2014