Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve split #138

Merged
merged 2 commits into from
Jun 27, 2022
Merged

Improve split #138

merged 2 commits into from
Jun 27, 2022

Conversation

sjamesr
Copy link
Contributor

@sjamesr sjamesr commented May 31, 2021

Omit leading empty matches from Pattern.split, improve performance
Fixes #131.

This change modifies Pattern.split to omit a leading empty match. This
behavior was specified in JDK8 and brings RE2/J split into line with
more recent JDK implementations.

Furthermore, the split function no longer needs determine the number of
matches before assembling the result. The upshot is that the number of
find() calls is halved in many cases. The benchmark in the previous
change shows a significant improvement.

Reference impl (JDK):
BenchmarkSplit.benchmarkSplit     JDK  avgt    5  14.217 ± 0.410  us/op

RE2J (before):
BenchmarkSplit.benchmarkSplit    RE2J  avgt    5  95.807 ± 6.737  us/op

RE2J (after):
BenchmarkSplit.benchmarkSplit    RE2J  avgt    5  49.092 ± 0.717  us/op

@google-cla google-cla bot added the cla: yes label May 31, 2021
@sjamesr sjamesr force-pushed the improve_split branch 2 times, most recently from 79203fc to 354f528 Compare May 31, 2021 07:50
@sjamesr sjamesr force-pushed the improve_split branch 4 times, most recently from 6e16977 to c1aec38 Compare June 2, 2021 04:05
@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2021

Codecov Report

Merging #138 (b598dec) into master (49b30f4) will decrease coverage by 0.02%.
The diff coverage is 95.83%.

@@            Coverage Diff             @@
##           master     #138      +/-   ##
==========================================
- Coverage   89.23%   89.20%   -0.03%     
==========================================
  Files          19       19              
  Lines        3027     3029       +2     
  Branches      609      612       +3     
==========================================
+ Hits         2701     2702       +1     
  Misses        187      187              
- Partials      139      140       +1     
Impacted Files Coverage Δ
java/com/google/re2j/Pattern.java 80.00% <95.83%> (-0.77%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 49b30f4...b598dec. Read the comment docs.

Fixes google#131.

This change modifies Pattern.split to omit a leading empty match. This
behavior was specified in JDK8 and brings RE2/J split into line with
more recent JDK implementations.

Furthermore, the split function no longer needs determine the number of
matches before assembling the result. The upshot is that the number of
find() calls is halved in many cases. The benchmark in the previous
change shows a significant improvement.

Reference impl (JDK):
BenchmarkSplit.benchmarkSplit     JDK  avgt    5  14.217 ± 0.410  us/op

RE2J (before):
BenchmarkSplit.benchmarkSplit    RE2J  avgt    5  95.807 ± 6.737  us/op

RE2J (after):
BenchmarkSplit.benchmarkSplit    RE2J  avgt    5  49.092 ± 0.717  us/op
@sjamesr sjamesr merged commit 7bf197f into google:master Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect behavior for Pattern::split
2 participants