Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RubyDateParser in Java #4635

Merged
merged 7 commits into from Jun 13, 2017

Conversation

@muga
Copy link
Contributor

@muga muga commented May 30, 2017

This PR introduces RubyDateParser in Java to improve the performance for strptime. We discussed about this implementation on #4591.

@muga muga force-pushed the muga:implement_rubydateparser_in_java branch from 8551d09 to 745b70f May 30, 2017
@headius
Copy link
Member

@headius headius commented May 30, 2017

@muga The failures in Travis appear to be related to your changes. Can you look into them?

To reproduce the MRI failure, run the associated file like this:

$ jruby test/mri/runner.rb <path to file> -n <failing test>

The failure in the "-Ptest" suite doesn't appear to be related.

@muga
Copy link
Contributor Author

@muga muga commented May 31, 2017

Sure, I will take a look later.

@muga
Copy link
Contributor Author

@muga muga commented Jun 7, 2017

I found a bug in our strptime implementation. Will fix it.

@muga
Copy link
Contributor Author

@muga muga commented Jun 8, 2017

Since cwyear, year, sec_fraction, seconds and _cent are variable-length integers, their data types need to be changed with BigInteger.

@muga
Copy link
Contributor Author

@muga muga commented Jun 8, 2017

I need to care of taint mark in objects.

@headius
Copy link
Member

@headius headius commented Jun 8, 2017

Don't sweat the tainting too much, but if we regressed we should at least try. I would be surprised if tainting and trusting survive into Ruby 3.

@muga muga force-pushed the muga:implement_rubydateparser_in_java branch from 2a4e3fb to 886deb6 Jun 8, 2017
@muga
Copy link
Contributor Author

@muga muga commented Jun 8, 2017

I did git rebase master and push -f to pass CI tests.

@kares
Copy link
Member

@kares kares commented Jun 8, 2017

this should now pass 💚

those BigInteger pieces look ugly-sh ... guess we need them to pass the specs 🐼
probably common for sec/sec_fract but even for year/cent ... crazy, who would use that 🙈

Copy link
Member

@headius headius left a comment

This looks great! Can you add some documentation -- perhaps in the .flex file -- explaining how to generate the lexer?

@enebo
Copy link
Member

@enebo enebo commented Jun 8, 2017

@headius if this is documented for this or incorporated into the Maven part of the buid then we should also hook up:

./core/src/main/java/org/jruby/lexer/StrftimeLexer.flex
./core/src/main/java/org/jruby/lexer/JavaSignatureLexer.flex

I can tell at one point these were in our build (see evidence of jflex.bin but no references). Obviously we can manually generate these like our main parser but jflex is something I am positive maven can generate.

@headius
Copy link
Member

@headius headius commented Jun 8, 2017

@enebo I don't have any problem with generating the source every time, except that if we want these sources in javadoc or available for browsing online it would be preferable to regenerate them infrequently. I doubt anyone cares about the javadoc or source for these though.

It would, however, be very nice if we at least had the option to generate them as a build target or tool script.

@enebo
Copy link
Member

@enebo enebo commented Jun 8, 2017

@headius yeah I don't care if it is hidden behind a -P or generated every time. I guess we have two other lexers that use jflex so +1 to hooking it up in maven somewhere...

@headius headius merged commit 616daac into jruby:master Jun 13, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@headius
Copy link
Member

@headius headius commented Jun 13, 2017

@muga If you get around to writing up some docs, let us know (or if you'd like to help us address #4665).

@headius headius added this to the JRuby 9.1.12.0 milestone Jun 13, 2017
@enebo
Copy link
Member

@enebo enebo commented Jun 13, 2017

Is there any benchmarking on this yet?

@muga
Copy link
Contributor Author

@muga muga commented Jun 14, 2017

@headius I will take it soon.

@enebo Not yet. let me try that as much as possible.

@muga muga changed the title [WIP] Implement RubyDateParser in Java Implement RubyDateParser in Java Jun 14, 2017
@muga
Copy link
Contributor Author

@muga muga commented Jun 14, 2017

Removed "[WIP]" from the title of this PR but, too late..

@muga
Copy link
Contributor Author

@muga muga commented Jun 17, 2017

@headius I filed #4674 and added how to generate .java file on the .flex file. Please take a look when you have time.

@muga
Copy link
Contributor Author

@muga muga commented Jun 17, 2017

@enebo I did check the performance of this Java strptime by micro bench based on https://github.com/macournoyer/tinyrb.

This Java implementation is basically faster than the current version (9.1.9.0) of strptime. But, it seems that it depends on the complexity of the format specified by an user.

If you have any questions, please let me know.

1,000,000 times strptime('Thu Jul 29 16:39:41 EST 1999', '%a %b %d %H:%M:%S %Z %Y')
                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.770000 (  0.967933)
jruby 9.1.9.0    0.010000   0.010000  36.310000 ( 32.148263)
jruby (master)   0.000000   0.000000  13.350000 (  7.906198)

1,000,000 times strptime('2001-02-03T23:59:60', '%Y-%m-%dT%H:%M:%S')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.290000 (  0.332614)
jruby 9.1.9.0    0.030000   0.000000  23.330000 ( 17.393525)
jruby (master)   0.010000   0.000000  12.430000 (  7.507632)

1,000,000 times strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.200000 (  0.230139)
jruby 9.1.9.0    0.000000   0.000000  18.390000 ( 11.724120)
jruby (master)   0.010000   0.010000  11.370000 (  6.513374)

The raw data:

$ bench/run
======================environment======================
mri
  command: ~/.rbenv/versions/2.4.1/bin/ruby
  version: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
jruby9.1.9.0
  command: ~/.rbenv/versions/jruby-9.1.9.0/bin/ruby
  version: jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [darwin-x86_64]
jruby (master)
  command: /Users/muga/works/sources/jruby-jruby/bin/ruby
  version: jruby 9.1.11.0-SNAPSHOT (2.3.3) 2017-05-26 83098f6 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [darwin-x86_64]
===========bench/strptime_a_b_d_H:M:S_Z_Y.rb===========
Rehearsal --------------------------------------------------
mri 2.4.1        0.580000   0.020000   1.390000 (  0.965282)
jruby 9.1.9.0    0.020000   0.010000  32.490000 ( 27.140128)
jruby (master)   0.020000   0.000000  12.950000 (  8.379499)
---------------------------------------- total: 46.830000sec

                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.770000 (  0.967933)
jruby 9.1.9.0    0.010000   0.010000  36.310000 ( 32.148263)
jruby (master)   0.000000   0.000000  13.350000 (  7.906198)
================bench/strptime_Y-m-d.rb================
Rehearsal --------------------------------------------------
mri 2.4.1        0.000000   0.010000   0.230000 (  0.259136)
jruby 9.1.9.0    0.010000   0.000000  17.650000 ( 11.693753)
jruby (master)   0.000000   0.000000  11.370000 (  6.598751)
---------------------------------------- total: 29.250000sec

                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.200000 (  0.230139)
jruby 9.1.9.0    0.000000   0.000000  18.390000 ( 11.724120)
jruby (master)   0.010000   0.010000  11.370000 (  6.513374)
=============bench/strptime_Y-m-dTH:M:S.rb=============
Rehearsal --------------------------------------------------
mri 2.4.1        0.000000   0.000000   0.260000 (  0.332353)
jruby 9.1.9.0    0.080000   0.000000  23.090000 ( 18.208028)
jruby (master)   0.010000   0.010000  13.000000 (  7.417241)
---------------------------------------- total: 36.350000sec

                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.290000 (  0.332614)
jruby 9.1.9.0    0.030000   0.000000  23.330000 ( 17.393525)
jruby (master)   0.010000   0.000000  12.430000 (  7.507632)
@enebo
Copy link
Member

@enebo enebo commented Jun 17, 2017

The speed up is great. Do you know why MRI is so much faster though? Even with changing this to Java and we are an order of magnitude slower?

@muga
Copy link
Contributor Author

@muga muga commented Jun 18, 2017

@enebo Since MRI's Date._strptime is all native code, MRI is much faster. I did check the performance by method call itegrations basis. With more method call iterations, Java strptime is more faster. It might be improved by JIT compiler.

1,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.110000 (  0.261414)
jruby 9.1.9.0    0.000000   0.000000   8.840000 (  4.920246)
jruby (master)   0.000000   0.000000   7.410000 (  3.804355)

10,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.080000 (  0.082622)
jruby 9.1.9.0    0.000000   0.000000  12.260000 (  5.507443)
jruby (master)   0.000000   0.000000   8.500000 (  4.096224)

100,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.220000 (  0.252380)
jruby 9.1.9.0    0.000000   0.000000  18.020000 ( 11.152461)
jruby (master)   0.000000   0.000000  12.880000 (  6.405134)

1,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   1.150000 (  1.177237)
jruby 9.1.9.0    0.000000   0.000000  56.590000 ( 50.715796)
jruby (master)   0.000000   0.000000  18.680000 ( 13.563366)

10,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000  11.090000 ( 11.629846)
jruby 9.1.9.0    0.000000   0.000000 467.210000 (476.611497)
jruby (master)   0.000000   0.000000  94.060000 ( 93.278737)

The raw data: https://gist.github.com/muga/7b61e9284841c222275f9636dd9b7808

@enebo
Copy link
Member

@enebo enebo commented Jun 19, 2017

In general, I very very rarely find C an order of a magnitude faster than Java. Usually they are comparable (notable exception is penalty of transcoding internally to/from UTF16-LE in Java which then can drop performance in 1/2 or a little more). Actually, I did not review this so I did not notice it uses java.lang.String for parsing/lexing. That is one 2+x speed issue so we can entertain using ByteList and a Stream over a Reader.

I just realized you are opening up some more work on this but I will also look at some profiling and see if we can get some more wins out of this. Part of me anticipates being able to beat MRI for this...I am optimistic! :)

@enebo
Copy link
Member

@enebo enebo commented Jun 19, 2017

Ah just a comment since I did not realize this last night...this does actually perform Ruby per call whereas MRI does do this all in C. So that will be some penalty as well.

@muga
Copy link
Contributor Author

@muga muga commented Jun 19, 2017

@enebo @kares Thank you for improving it. I built 65e1495 and measured the performance with my benchmark again.

1,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.060000 (  0.077963)
jruby 9.1.9.0    0.000000   0.000000   8.010000 (  4.363465)
jruby (master)   0.000000   0.000000   6.970000 (  4.222631)

10,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.070000 (  0.104304)
jruby 9.1.9.0    0.000000   0.000000  12.020000 (  6.102424)
jruby (master)   0.000000   0.000000   7.520000 (  3.981396)

100,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.190000 (  0.225209)
jruby 9.1.9.0    0.000000   0.000000  17.780000 ( 12.446274)
jruby (master)   0.000000   0.010000   9.340000 (  5.049573)

1,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   1.200000 (  1.367820)
jruby 9.1.9.0    0.000000   0.000000  62.540000 ( 66.204895)
jruby (master)   0.000000   0.000000  14.170000 ( 11.629003)

10,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000  11.790000 ( 13.940100)
jruby 9.1.9.0    0.000000   0.000000 527.610000 (652.799233)
jruby (master)   0.000000   0.000000  54.140000 ( 63.306614)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants