New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RubyDateParser in Java #4635

Merged
merged 7 commits into from Jun 13, 2017

Conversation

Projects
None yet
4 participants
@muga
Contributor

muga commented May 30, 2017

This PR introduces RubyDateParser in Java to improve the performance for strptime. We discussed about this implementation on #4591.

@headius

This comment has been minimized.

Member

headius commented May 30, 2017

@muga The failures in Travis appear to be related to your changes. Can you look into them?

To reproduce the MRI failure, run the associated file like this:

$ jruby test/mri/runner.rb <path to file> -n <failing test>

The failure in the "-Ptest" suite doesn't appear to be related.

@muga

This comment has been minimized.

Contributor

muga commented May 31, 2017

Sure, I will take a look later.

@muga

This comment has been minimized.

Contributor

muga commented Jun 7, 2017

I found a bug in our strptime implementation. Will fix it.

@muga

This comment has been minimized.

Contributor

muga commented Jun 8, 2017

Since cwyear, year, sec_fraction, seconds and _cent are variable-length integers, their data types need to be changed with BigInteger.

@muga

This comment has been minimized.

Contributor

muga commented Jun 8, 2017

I need to care of taint mark in objects.

@headius

This comment has been minimized.

Member

headius commented Jun 8, 2017

Don't sweat the tainting too much, but if we regressed we should at least try. I would be surprised if tainting and trusting survive into Ruby 3.

@muga

This comment has been minimized.

Contributor

muga commented Jun 8, 2017

I did git rebase master and push -f to pass CI tests.

@kares

This comment has been minimized.

Member

kares commented Jun 8, 2017

this should now pass 💚

those BigInteger pieces look ugly-sh ... guess we need them to pass the specs 🐼
probably common for sec/sec_fract but even for year/cent ... crazy, who would use that 🙈

@headius

This looks great! Can you add some documentation -- perhaps in the .flex file -- explaining how to generate the lexer?

@enebo

This comment has been minimized.

Member

enebo commented Jun 8, 2017

@headius if this is documented for this or incorporated into the Maven part of the buid then we should also hook up:

./core/src/main/java/org/jruby/lexer/StrftimeLexer.flex
./core/src/main/java/org/jruby/lexer/JavaSignatureLexer.flex

I can tell at one point these were in our build (see evidence of jflex.bin but no references). Obviously we can manually generate these like our main parser but jflex is something I am positive maven can generate.

@headius

This comment has been minimized.

Member

headius commented Jun 8, 2017

@enebo I don't have any problem with generating the source every time, except that if we want these sources in javadoc or available for browsing online it would be preferable to regenerate them infrequently. I doubt anyone cares about the javadoc or source for these though.

It would, however, be very nice if we at least had the option to generate them as a build target or tool script.

@enebo

This comment has been minimized.

Member

enebo commented Jun 8, 2017

@headius yeah I don't care if it is hidden behind a -P or generated every time. I guess we have two other lexers that use jflex so +1 to hooking it up in maven somewhere...

@headius headius merged commit 616daac into jruby:master Jun 13, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@headius

This comment has been minimized.

Member

headius commented Jun 13, 2017

@muga If you get around to writing up some docs, let us know (or if you'd like to help us address #4665).

@headius headius added this to the JRuby 9.1.12.0 milestone Jun 13, 2017

@enebo

This comment has been minimized.

Member

enebo commented Jun 13, 2017

Is there any benchmarking on this yet?

@muga

This comment has been minimized.

Contributor

muga commented Jun 14, 2017

@headius I will take it soon.

@enebo Not yet. let me try that as much as possible.

@muga muga changed the title from [WIP] Implement RubyDateParser in Java to Implement RubyDateParser in Java Jun 14, 2017

@muga

This comment has been minimized.

Contributor

muga commented Jun 14, 2017

Removed "[WIP]" from the title of this PR but, too late..

@enebo enebo modified the milestones: JRuby 9.1.12.0, JRuby 9.2.0.0 Jun 15, 2017

@muga

This comment has been minimized.

Contributor

muga commented Jun 17, 2017

@headius I filed #4674 and added how to generate .java file on the .flex file. Please take a look when you have time.

@muga

This comment has been minimized.

Contributor

muga commented Jun 17, 2017

@enebo I did check the performance of this Java strptime by micro bench based on https://github.com/macournoyer/tinyrb.

This Java implementation is basically faster than the current version (9.1.9.0) of strptime. But, it seems that it depends on the complexity of the format specified by an user.

If you have any questions, please let me know.

1,000,000 times strptime('Thu Jul 29 16:39:41 EST 1999', '%a %b %d %H:%M:%S %Z %Y')
                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.770000 (  0.967933)
jruby 9.1.9.0    0.010000   0.010000  36.310000 ( 32.148263)
jruby (master)   0.000000   0.000000  13.350000 (  7.906198)

1,000,000 times strptime('2001-02-03T23:59:60', '%Y-%m-%dT%H:%M:%S')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.290000 (  0.332614)
jruby 9.1.9.0    0.030000   0.000000  23.330000 ( 17.393525)
jruby (master)   0.010000   0.000000  12.430000 (  7.507632)

1,000,000 times strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.200000 (  0.230139)
jruby 9.1.9.0    0.000000   0.000000  18.390000 ( 11.724120)
jruby (master)   0.010000   0.010000  11.370000 (  6.513374)

The raw data:

$ bench/run
======================environment======================
mri
  command: ~/.rbenv/versions/2.4.1/bin/ruby
  version: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
jruby9.1.9.0
  command: ~/.rbenv/versions/jruby-9.1.9.0/bin/ruby
  version: jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [darwin-x86_64]
jruby (master)
  command: /Users/muga/works/sources/jruby-jruby/bin/ruby
  version: jruby 9.1.11.0-SNAPSHOT (2.3.3) 2017-05-26 83098f6 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [darwin-x86_64]
===========bench/strptime_a_b_d_H:M:S_Z_Y.rb===========
Rehearsal --------------------------------------------------
mri 2.4.1        0.580000   0.020000   1.390000 (  0.965282)
jruby 9.1.9.0    0.020000   0.010000  32.490000 ( 27.140128)
jruby (master)   0.020000   0.000000  12.950000 (  8.379499)
---------------------------------------- total: 46.830000sec

                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.770000 (  0.967933)
jruby 9.1.9.0    0.010000   0.010000  36.310000 ( 32.148263)
jruby (master)   0.000000   0.000000  13.350000 (  7.906198)
================bench/strptime_Y-m-d.rb================
Rehearsal --------------------------------------------------
mri 2.4.1        0.000000   0.010000   0.230000 (  0.259136)
jruby 9.1.9.0    0.010000   0.000000  17.650000 ( 11.693753)
jruby (master)   0.000000   0.000000  11.370000 (  6.598751)
---------------------------------------- total: 29.250000sec

                     user     system      total        real
mri 2.4.1        0.010000   0.000000   0.200000 (  0.230139)
jruby 9.1.9.0    0.000000   0.000000  18.390000 ( 11.724120)
jruby (master)   0.010000   0.010000  11.370000 (  6.513374)
=============bench/strptime_Y-m-dTH:M:S.rb=============
Rehearsal --------------------------------------------------
mri 2.4.1        0.000000   0.000000   0.260000 (  0.332353)
jruby 9.1.9.0    0.080000   0.000000  23.090000 ( 18.208028)
jruby (master)   0.010000   0.010000  13.000000 (  7.417241)
---------------------------------------- total: 36.350000sec

                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.290000 (  0.332614)
jruby 9.1.9.0    0.030000   0.000000  23.330000 ( 17.393525)
jruby (master)   0.010000   0.000000  12.430000 (  7.507632)
@enebo

This comment has been minimized.

Member

enebo commented Jun 17, 2017

The speed up is great. Do you know why MRI is so much faster though? Even with changing this to Java and we are an order of magnitude slower?

@muga

This comment has been minimized.

Contributor

muga commented Jun 18, 2017

@enebo Since MRI's Date._strptime is all native code, MRI is much faster. I did check the performance by method call itegrations basis. With more method call iterations, Java strptime is more faster. It might be improved by JIT compiler.

1,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.110000 (  0.261414)
jruby 9.1.9.0    0.000000   0.000000   8.840000 (  4.920246)
jruby (master)   0.000000   0.000000   7.410000 (  3.804355)

10,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.080000 (  0.082622)
jruby 9.1.9.0    0.000000   0.000000  12.260000 (  5.507443)
jruby (master)   0.000000   0.000000   8.500000 (  4.096224)

100,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.220000 (  0.252380)
jruby 9.1.9.0    0.000000   0.000000  18.020000 ( 11.152461)
jruby (master)   0.000000   0.000000  12.880000 (  6.405134)

1,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   1.150000 (  1.177237)
jruby 9.1.9.0    0.000000   0.000000  56.590000 ( 50.715796)
jruby (master)   0.000000   0.000000  18.680000 ( 13.563366)

10,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000  11.090000 ( 11.629846)
jruby 9.1.9.0    0.000000   0.000000 467.210000 (476.611497)
jruby (master)   0.000000   0.000000  94.060000 ( 93.278737)

The raw data: https://gist.github.com/muga/7b61e9284841c222275f9636dd9b7808

@enebo

This comment has been minimized.

Member

enebo commented Jun 19, 2017

In general, I very very rarely find C an order of a magnitude faster than Java. Usually they are comparable (notable exception is penalty of transcoding internally to/from UTF16-LE in Java which then can drop performance in 1/2 or a little more). Actually, I did not review this so I did not notice it uses java.lang.String for parsing/lexing. That is one 2+x speed issue so we can entertain using ByteList and a Stream over a Reader.

I just realized you are opening up some more work on this but I will also look at some profiling and see if we can get some more wins out of this. Part of me anticipates being able to beat MRI for this...I am optimistic! :)

@enebo

This comment has been minimized.

Member

enebo commented Jun 19, 2017

Ah just a comment since I did not realize this last night...this does actually perform Ruby per call whereas MRI does do this all in C. So that will be some penalty as well.

@muga

This comment has been minimized.

Contributor

muga commented Jun 19, 2017

@enebo @kares Thank you for improving it. I built 65e1495 and measured the performance with my benchmark again.

1,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.060000 (  0.077963)
jruby 9.1.9.0    0.000000   0.000000   8.010000 (  4.363465)
jruby (master)   0.000000   0.000000   6.970000 (  4.222631)

10,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.070000 (  0.104304)
jruby 9.1.9.0    0.000000   0.000000  12.020000 (  6.102424)
jruby (master)   0.000000   0.000000   7.520000 (  3.981396)

100,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   0.190000 (  0.225209)
jruby 9.1.9.0    0.000000   0.000000  17.780000 ( 12.446274)
jruby (master)   0.000000   0.010000   9.340000 (  5.049573)

1,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000   1.200000 (  1.367820)
jruby 9.1.9.0    0.000000   0.000000  62.540000 ( 66.204895)
jruby (master)   0.000000   0.000000  14.170000 ( 11.629003)

10,000,000 times Date._strptime('2001-02-03', '%Y-%m-%d')
                     user     system      total        real
mri 2.4.1        0.000000   0.000000  11.790000 ( 13.940100)
jruby 9.1.9.0    0.000000   0.000000 527.610000 (652.799233)
jruby (master)   0.000000   0.000000  54.140000 ( 63.306614)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment