Date library using Joda-Time (v2) #890

merged 100 commits into from Sep 22, 2013



eregon commented Jul 16, 2013

The branch was moved to my own fork so I can happily push --force if I need to rework the first commits.

  • Date/DateTime constructors are very liberal about their arguments: they accept negative values and the like. Replacing them entirely with Joda is hard, since joda-time only handles the civil format (y, m, d, h, m, s) at creation. The civil constructor has a fast path for positive args.
  • ajd is computed lazily and used in only a couple of places: constructors not supported by joda-time, #ajd and #amjd, #<=>(Integer), and marshaling (since we want MRI compat).
  • Arbitrary precision is kept with a @sub_millis field, which is stored as 0 when unused and as a Rational otherwise.
  • {Time,Date}#strftime has been improved a lot, with a proper lexer and better support.
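The lazy ajd and @sub_millis points above can be illustrated with a minimal Ruby sketch. It is not the code from this branch: the class name `LazyAjdDate`, its constructor and the `ajd_computed?` helper are hypothetical, and a plain Integer of milliseconds since the Unix epoch stands in for the Joda DateTime.

```ruby
require 'date'

# Hypothetical sketch: an Integer of epoch milliseconds stands in for the
# Joda DateTime, and sub_millis holds whatever precision is below 1 ms.
class LazyAjdDate
  MILLIS_PER_DAY = 86_400_000
  # Astronomical Julian Day of 1970-01-01T00:00:00Z (JD 2440587.5).
  UNIX_EPOCH_IN_AJD = Rational(4_881_175, 2)

  attr_reader :millis, :sub_millis

  # sub_millis is 0 in the common case, a Rational only when needed.
  def initialize(millis, sub_millis = 0)
    @millis = millis
    @sub_millis = sub_millis
    @ajd = nil
  end

  # The Rational arithmetic only runs the first time ajd is requested.
  def ajd
    @ajd ||= UNIX_EPOCH_IN_AJD + Rational(@millis + @sub_millis, MILLIS_PER_DAY)
  end

  def ajd_computed?
    !@ajd.nil?
  end
end

d = LazyAjdDate.new(86_400_000)          # 1970-01-02T00:00:00Z
puts d.ajd_computed?                     # no Rational work done yet
puts d.ajd == DateTime.new(1970, 1, 2).ajd
```

Methods that only need the millisecond value never pay for the Rational computation, which is the motivation the description gives for making ajd lazy.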

Improve perf of:

  • strptime
  • other parsing methods

Could you rebase and try again? There were problems with the master's build.

eregon commented Jul 18, 2013

Rebased. This is a work in progress though; I will merge it when it is much more complete.
The PR is mainly for discussion and code review, sorry if I was not clear.
(The build should show the failure mentioned above.)

eregon added some commits Jul 13, 2013
@eregon eregon start using Joda's DateTime in lib/date.rb
* Tests are unaffected by this change.
* Zone offset and Gregorian cutover are computed from the DateTime.
* They are still copied into instance variables for ease of compatibility with current code.
* @ajd needs to be preserved for now, as DateTime only has millisecond precision,
  while date.rb has arbitrary precision via Rational (although precision beyond ns is reportedly not expected).
* Joda methods are used for year,week,day,hour,min and second
  (but not fractions since it might be imprecise).
@eregon eregon use RubyDateFormat to format Date, yeah! b86f3b2
@eregon eregon add formatting letters in javadoc to RubyDateFormat formats 16452f5
@eregon eregon add missing format %g (short weekyear) to RubyDateFormat 18dfb67
@eregon eregon fix RubyDateFormat %c, blank-padded day, not zero-padded 3d1d480
@eregon eregon fix RubyDateFormat FORMAT_CENTURY: use getYear() so it does not depend on Chronology

* In ISOChronology, it is fine, but in GJChronology, it's century + 1.
* Still a problem for the year right now with GJChronology (but no test!)
@eregon eregon RubyDateFormat: handle ignored POSIX modifiers as MRI be55dc2
@eregon eregon handle extra date formats in RubyDateFormat 6fb734f
@eregon eregon raise an ArgumentError when millis is a Bignum as Joda cannot support such "times"

* that is earlier than 292 275 054 BC or later than 292 278 993 AD ...
@eregon eregon fix logic for Date#year: should add 1 only during JulianChronology (before @sg)
@eregon eregon Date#rfc3339 is now defined as strftime('%FT%T%:z') 5d82115
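A quick check of the equivalence this commit relies on, using only the standard `date` library; the sample timestamp is arbitrary.

```ruby
require 'date'

dt = DateTime.new(2013, 7, 16, 10, 30, 0, '+02:00')

# %F is %Y-%m-%d, %T is %H:%M:%S, and %:z is the offset with a colon.
puts dt.strftime('%FT%T%:z')                 # 2013-07-16T10:30:00+02:00
puts dt.rfc3339 == dt.strftime('%FT%T%:z')   # true
```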
@eregon eregon %v in Time is %e-%^b-%Y and is %e-%b-%Y for Date/DateTime 01e299d
@eregon eregon handle %L as a special case of %N and fix 1.8/1.9 conditions b8b60de
@eregon eregon separate millis and nsec nicely with setNSec() only ms to ns
* I so wish millis of DateTime was always 0 and nsec be nanoseconds
@eregon eregon do not use DateTimeZone.forOffsetHoursMinutes() since it can be problematic in non-latest joda-time

* also, Ruby handles division of a negative in a way not conforming to this usage
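The remark about negative division can be made concrete. Ruby's Integer division floors toward negative infinity, so splitting a negative offset into hours and minutes with `divmod` does not give the "sign, hours, minutes" shape one might expect. The helper `split_offset_minutes` below is hypothetical and only illustrates one way around it; it is not the code from this commit.

```ruby
# Floored division: -90 minutes becomes -2 hours and +30 minutes.
p((-90).divmod(60))   # [-2, 30]

# Hypothetical helper: split the magnitude, carry the sign separately.
def split_offset_minutes(total_minutes)
  sign = total_minutes < 0 ? -1 : 1
  hours, minutes = total_minutes.abs.divmod(60)
  [sign, hours, minutes]
end

p split_offset_minutes(-90)   # [-1, 1, 30]
p split_offset_minutes(330)   # [1, 5, 30]
```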
@eregon eregon remove all unused code for Date#strftime. Best commit ever! 275d52b
@eregon eregon handle Astronomical year numbering within RubyDateFormat too
* from logic in date.rb (which is way simpler)
* tests coming in RubySpec
@eregon eregon Date#initialize: do not compute @of and @sg, much faster to save them directly for now

ruby 2.0.0p247 924.9 ns/i ± 13.60 ( 1.5%) <=>   1081082 ips

jruby: before 20.72 µs/i ± 0.337 ( 1.6%) <=>     48260 ips

jruby: after 10.61 µs/i ± 0.102 ( 1.0%) <=>     94207 ips

jruby: without creation of the DateTime 7.483 µs/i ± 0.212 ( 2.8%) <=>    133643 ips

jruby: without computation of millis from ajd (through Rationals) 4.980 µs/i ± 0.068 ( 1.4%) <=>    200788 ips

* ideal scenario is creating the DateTime from civil parameters directly (year, month, day)
  and never rely on @ajd, but it means rewriting many methods of Date
@eregon eregon Optimize Date._iso8601
* Passes test__iso8601

* ruby 2.0.0p247
  Date._iso8601 9.046 µs/i ± 0.053 ( 0.6%) <=>    110548 ips
* before
  Date._iso8601 72.35 µs/i ± 24.68 (34.1%) <=>     13821 ips
* after
  Date._iso8601 5.185 µs/i ± 0.065 ( 1.3%) <=>    192847 ips

* Date.iso8601 is twice as fast; the bottleneck now seems to be #complete_frags
@eregon eregon forgot a private' eb44861
@eregon eregon Date._* parsing methods return an empty hash in 2.0, do it since we intend to pass those tests

* Actually the C code simply allocates the Hash upfront and
  passes it to sub-matchers, leaving it untouched if nothing matches.
@eregon eregon optimize Date.valid_date_frags?: use simple return and clear checks instead of generic and throw/catch
@eregon eregon add a fast path for civil date creation from parsing [CHEAT]
* but MRI does it too in d_new_by_frags()!
* Takes 21.08µs vs 33.78µs for Date.iso8601("2011-03-08"),
  might not be worth it and get removed.
@eregon eregon rewrite and optimize Date.complete_frags
* single #max_by instead of #map + #select + #sort_by + #last
* simple #count instead of #values_at + #compact
* no use in counting if :hour, :min and :sec match for every format
* do not bother setting those for simple Dates (not DateTimes)
* remove sec = min(sec, 59), this is nonsense!
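The first two bullets of this commit describe a common refactoring: replace a chain of intermediate arrays with a single pass. The sketch below shows the general shape only. The table `SKETCH_FORMATS` and both method names are hypothetical, and the real Date.complete_frags handles more formats and then fills in default values.

```ruby
# Hypothetical table: which parsed fragments each date format needs.
SKETCH_FORMATS = {
  civil:      [:year, :mon, :mday],
  ordinal:    [:year, :yday],
  commercial: [:cwyear, :cweek, :cwday]
}.freeze

# Chained style: several intermediate arrays per call.
def best_format_chained(frags)
  best = SKETCH_FORMATS
         .map { |name, keys| [name, frags.values_at(*keys).compact.size] }
         .select { |_, n| n > 0 }
         .sort_by { |_, n| n }
         .last
  best && best.first
end

# Single pass: one #max_by with a plain #count.
def best_format_single(frags)
  name, keys = SKETCH_FORMATS.max_by { |_, ks| ks.count { |k| frags.key?(k) } }
  keys.any? { |k| frags.key?(k) } ? name : nil
end

frags = { year: 2011, mon: 3, mday: 8 }
p best_format_chained(frags)   # :civil
p best_format_single(frags)    # :civil
```

One caveat with this kind of rewrite: when two formats match the same number of fragments, `max_by` and `sort_by { ... }.last` are not guaranteed to pick the same one, so any tie-breaking rule has to be made explicit.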
@eregon eregon set offset to 0 if out of range [-day, +day] when parsing and warn as…
@eregon eregon default width of year is varying between 4 and 5 depending on the sign in MRI
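The MRI behaviour this commit follows can be observed with the standard `date` library: a negative year is zero-padded to four digits and then gets its sign, so the field is five characters wide.

```ruby
require 'date'

puts Date.new(2013, 1, 1).strftime('%Y')   # 2013  (4 characters)
puts Date.new(-1, 1, 1).strftime('%Y')     # -0001 (5 characters)
```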
@eregon eregon handle ':' after width and other format flags in strftime
* proper handling would be a parser
@eregon eregon handle correctly padding of negative numbers and their padder in form…

* `"-" + Long.toString(-value)` is simply `Long.toString(value)`
@eregon eregon can now use the same named capture with the recent bugfix d3ab83b
@eregon eregon Date#iso8601_timediv should round down the precision given 0a049e2
@eregon eregon use a more readable style with less return for case statements f2f66c7
@eregon eregon Date#chronology is already private by keyword above.
* Removes useless return statement
@eregon eregon add a fast path for Date.civil, building the joda.DateTime directly e950c05
@eregon eregon delegate to legacy code if d <= 0 for clarity 40b4f39
@eregon eregon allow passing sub_millis to Date#initialize, so we can use joda DateTime for +- months and year

* still arbitrary precision, but common case optimized (sub_millis = 0)
* @ajd should go away soon, or at least be lazy evaluated
@eregon eregon fast path for UTC default GJChronology: 3.109 vs 4.741 µs for 094f50f
@eregon eregon call Date#+ in Date#- if argument is Numeric 1e9005a
@eregon eregon compute @ajd lazily in Date: 1.825 vs 3.109 µs for a208c73
@eregon eregon use @sub_millis for Date#sec_fraction, and get rid of now unused #time() helper
@eregon eregon remove other unused helpers b310096
@eregon eregon remove more unused private methods: Date#wnum{0,1} and Date#weeknum{0,1} d5b216b
@eregon eregon use joda commercial date methods and remove now unused helper #commer…
@eregon eregon remove old commented code 8797179
@eregon eregon avoid computing jd for checking Date#julian?
* weird case of JI/JVM opti? Instant#isAfter() is `a.getMillis() > b.getMillis()`
  Date#julian?     1.124 µs/i ± 0.024 ( 2.1%) <=>    889477 ips         2.430 µs/i ± 0.043 ( 1.8%) <=>    411488 ips 6.830 µs/i ± 0.269 ( 3.9%) <=>    146417 ips

 direct comparison
  Date#julian?     1.160 µs/i ± 0.038 ( 3.3%) <=>    862193 ips         2.318 µs/i ± 0.036 ( 1.6%) <=>    431411 ips 3.961 µs/i ± 0.060 ( 1.5%) <=>    252462 ips
@eregon eregon use getDayOfWeek() for Date#wday 909ebcf
@eregon eregon remove extra line between methods and modifiers (private, once, aliases, ...)

... so one can see easily they belong together.
* also remove an old "require 'enumerator'"
@eregon eregon use Date.julian_leap?/Date.gregorian_leap? for Date#leap? 072ac37
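A rough idea of what delegating to the class-level predicates looks like, written against the standard `date` library. The method name `leap_sketch?` is hypothetical, and the actual commit may differ in how it picks the calendar.

```ruby
require 'date'

# Hypothetical sketch: choose the leap rule from the calendar in effect
# for this particular date (Julian before the cutover, Gregorian after).
def leap_sketch?(date)
  if date.julian?
    Date.julian_leap?(date.year)
  else
    Date.gregorian_leap?(date.year)
  end
end

p leap_sketch?(Date.new(1500, 1, 1))   # true:  1500 is a leap year in the Julian calendar
p leap_sketch?(Date.new(1900, 1, 1))   # false: 1900 is not a Gregorian leap year
p leap_sketch?(Date.new(2000, 1, 1))   # true
```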
@eregon eregon Ruby 1.9 always supports Rational(Rational(...), ...) 2a6c3f0
@eregon eregon remove unused constants (no more present in MRI) 1d07c06
@eregon eregon use @dt and @sub_millis instead of @ajd for creating new Dates fa735dc
@eregon eregon toJulianDayNumber already returns jd in Date#start 34d48d1
@eregon eregon constant access style 1dde2f1
@eregon eregon add getters for @dt and @sub_millis in Date 1aa5675
@eregon eregon use DateTime#compareTo instead of computing two #ajd
Date#== 586.2 ns/i ± 5.836 ( 1.0%) <=>   1705839 ips

with cached #ajd
Date#== 752.8 ns/i ± 21.67 ( 2.9%) <=>   1328355 ips

uncached #ajd
Date#== 2.931 µs/i ± 0.028 ( 1.0%) <=>    341210 ips
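The idea behind this commit, comparing the stored instants directly instead of deriving two Rational ajd values, can be sketched in plain Ruby. `SketchInstant` is a hypothetical stand-in for the (DateTime, sub_millis) pair; the sketch assumes sub_millis is always in the range 0...1 so that a lexicographic comparison is valid.

```ruby
# Hypothetical stand-in for the (Joda DateTime millis, sub_millis) pair.
SketchInstant = Struct.new(:millis, :sub_millis) do
  include Comparable

  # Compare whole milliseconds first, then the sub-millisecond remainder.
  # No Rational ajd has to be built for either side.
  def <=>(other)
    [millis, sub_millis] <=> [other.millis, other.sub_millis]
  end
end

a = SketchInstant.new(1_000, 0)
b = SketchInstant.new(1_000, Rational(1, 2))
p a < b                              # true
p a == SketchInstant.new(1_000, 0)   # true
```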
@eregon eregon save ajd in cache if given in Date#initialize 90e4151
@eregon eregon use joda DateTime millis for Date#hash
* Joda-Time says one should use
  ((int) (getMillis() ^ (getMillis() >>> 32))) + (getChronology().hashCode())
  which can be (as Fixnum is a long)
  (getMillis() ^ getChronology().hashCode())
  but Date used simply the ajd so we do the most similar option
@eregon eregon distribute #once for each method for clarity in Date 1a88876
@eregon eregon keep #once only for expensive methods in Date
* The first 3 are only an offset of #ajd
* The last 2 are so simple they should never be the bottleneck
@eregon eregon add a fast path for DateTime.{new,civil}: 2 vs 13 µs
old: 13.64 µs/i ± 0.182 ( 1.3%) <=>     73296 ips
now: 2.081 µs/i ± 0.116 ( 5.6%) <=>    480493 ips

* for now, only used if max ms precision
@eregon eregon improve Time#to_date and Time#to_datetime by calling ::civil directly
Time#to_date     10.41 µs/i ± 0.047 ( 0.5%) <=>     96020 ips
Time#to_datetime 21.15 µs/i ± 0.084 ( 0.4%) <=>     47270 ips
Time#to_date     5.605 µs/i ± 0.172 ( 3.1%) <=>    178412 ips
Time#to_datetime 13.97 µs/i ± 0.029 ( 0.2%) <=>     71534 ips
@eregon eregon indent more continued condition in Date.valid_date_frags? for readabi…
@eregon eregon use the joda DateTime constructors for Date{Time,}#to_{time,date,date…
@eregon eregon improve Time#to_datetime with knowledge of implementation (both are E…

Time#to_datetime 13.97 µs/i ± 0.029 ( 0.2%) <=>     71534 ips
Time#to_datetime 8.256 µs/i ± 0.124 ( 1.5%) <=>    121125 ips
@eregon eregon optimize
old: 9.128 µs/i ± 0.125 ( 1.4%) <=>    109552 ips
new: 3.613 µs/i ± 0.057 ( 1.6%) <=>    276757 ips

* trying to get directly the DateTime and call withTimeAtStartOfDay()
  seems to not be any better
@eregon eregon optimize simply use Time#to_datetime
old: 20.43 µs/i ± 0.294 ( 1.4%) <=>     48930 ips
new: 8.900 µs/i ± 0.152 ( 1.7%) <=>    112355 ips
@eregon eregon do not create a Rational if we have nothing under ms in Time#to_datetime 7cb5fe2
@eregon eregon rename argument ajd to dt_or_ajd in Date#initialize 6a64b70
@eregon eregon support Date#+ directly with DateTime and sub_millis (no ajd conversion) e54e4bc
@eregon eregon support Date - Date directly with DateTime and sub_millis (no ajd con…
@eregon eregon implement Date#{jd,day_fraction} without needing ajd 5b9d544
@eregon eregon adopt the new inspect format from MRI
* it does not depend on ajd, yeah!
@eregon eregon do not pretend to fit RubyDateFormat in Java's DateFormat
* we want ByteList support
* Locale is not a concern
* having the pattern and data as internal state is bad for caching and usage
* only depends on ruby_1_9, might be worth moving into Runtime or even making a singleton if the context is passed
@eregon eregon use a Java enum instead of self-defined constants in RubyDateFormat
* reorder them alphabetically by their %<format> notation
* remove unused FORMAT_DATE_1
@eregon eregon remove serialVersionUID since RubyDateFormat is no more Serializable 0e76f6e
@eregon eregon use ByteList in RubyDateFormat and TimeOutputFormatter 6c50ba5
@eregon eregon faster way to append String to ByteList: list.append(str.charAt(i))
Date#strftime("%Y-%m-%d") 5.300 µs/i ± 0.020 ( 0.4%) <=>    188668 ips
Date#strftime("%Y-%m-%d") 3.624 µs/i ± 0.009 ( 0.3%) <=>    275922 ips
@eregon eregon add convenience and efficient method RubyDateFormat.compileAndFormat()
* compileAndFormat() improves from 8.198 µs to 5.321 µs for Date#strftime("%Y-%m-%d")
@eregon eregon use Token.str() to create FORMAT_STRING Token in RubyDateFormat 7b57571
@eregon eregon handle the fact newer Joda-Time can not have offset of 24h
* makes some tests fail unfortunately, MRI accepts offset of +24 or -24
@eregon eregon add a proper lexer with JFlex for {Time,Date}#strftime !
* Tokenize exactly as the old one but in a much more proper way
* Should handle every edge case, with only a grammar of a dozen lines
* Remove massive amount of code from TimeOutputFormatter and RubyDateFormat
* Add a helper for composed formats to enhance readability
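To give a feel for what tokenizing a strftime pattern involves, here is a small Ruby sketch built on the standard `strscan` library. It is not the JFlex grammar from this commit: the `DIRECTIVE` regexp, the token layout and the method name `tokenize_strftime` are assumptions made for illustration, and the sketch ignores the POSIX `E`/`O` modifiers.

```ruby
require 'strscan'

# Simplified directive shape: % flags width colons conversion
DIRECTIVE = /%([-0^#_]*)(\d+)?(:{0,3})([A-Za-z%+])/

# Returns [:directive, flags, width, colons, conversion] and
# [:literal, text] tokens, in pattern order.
def tokenize_strftime(pattern)
  scanner = StringScanner.new(pattern)
  tokens = []
  until scanner.eos?
    if scanner.scan(DIRECTIVE)
      tokens << [:directive, scanner[1], scanner[2], scanner[3], scanner[4]]
    elsif (literal = scanner.scan(/[^%]+/))
      tokens << [:literal, literal]
    else
      # A '%' that does not start a valid directive is kept as literal text.
      tokens << [:literal, scanner.getch]
    end
  end
  tokens
end

p tokenize_strftime('%Y-%m-%d')
p tokenize_strftime('%-5N %:z')
```

Tokenizing once and formatting from the token list also separates "what the pattern means" from "what the values are", which fits the later commits about reusing the lexer and compiling patterns.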
@eregon eregon rename RubyDateFormat.Token.fmt() to format() 87ee632
@eregon eregon reuse StrftimeLexer in RubyDateFormat, creating a new one is expensive ce87c2c
@eregon eregon Allow multiple flags in Flags pattern
* also escape the '+' inside the character class
@eregon eregon fix bug due to what seems a look-ahead bug in JFlex 1.4.3
* the look-ahead part (or a substring) would be captured in the current expression
* JFlex 1.4 seems fine
@eregon eregon fix DateTime#jisx0301, it was calling DateTime#iso8601 which is not D…
@eregon eregon stop depending on Ruby version in RubyDateFormat
* nsec will be 0 in 1.8 anyway
@eregon eregon use Date arbitrary precision in #strftime (%N,%L) output
* need ThreadContext, use constructor to pass it
* pass all related Date/DateTime tests
* more arguments, but casting from Object (long versus Rational) seems bad
@eregon eregon give Date#strftime's output the same encoding as the format String ae6188d
@eregon eregon taint resulting String in Date#strftime 497eae4
@eregon eregon fix Date#inspect to match MRI
* infinite sg is represented specially
* ns rational is simplified if possible
* s is expressed in UTC
@eregon eregon handle all known formats in Date#marshal_load cc2fd27
@eregon eregon add our own marshal format for Date
ruby 2.0.0p247
dump   8.563 µs/i ± 0.169 ( 2.0%) <=>    116785 ips
load   7.091 µs/i ± 0.446 ( 6.3%) <=>    141015 ips
reload 16.93 µs/i ± 0.346 ( 2.0%) <=>     59059 ips

jruby before with cached ajd
dump   5.991 µs/i ± 0.033 ( 0.6%) <=>    166926 ips
load   15.55 µs/i ± 1.890 (12.1%) <=>     64275 ips
reload 24.53 µs/i ± 0.461 ( 1.9%) <=>     40757 ips

jruby before without cached ajd
dump   8.306 µs/i ± 1.438 (17.3%) <=>    120389 ips
load   15.61 µs/i ± 0.384 ( 2.5%) <=>     64062 ips
reload 27.76 µs/i ± 1.315 ( 4.7%) <=>     36014 ips

jruby using [@dt, @of, @sg, @sub_millis]
dump   30.56 µs/i ± 0.059 ( 0.2%) <=>     32722 ips
load   123.4 µs/i ± 0.482 ( 0.4%) <=>      8099 ips
reload 165.6 µs/i ± 4.235 ( 2.6%) <=>      6038 ips

jruby using [@dt.getMillis, @of, @sg, @sub_millis]
dump   5.154 µs/i ± 0.044 ( 0.9%) <=>    194039 ips
load   10.85 µs/i ± 0.062 ( 0.6%) <=>     92154 ips
reload 19.41 µs/i ± 0.082 ( 0.4%) <=>     51505 ips
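The benchmarks above suggest that dumping plain values is far cheaper than dumping the Joda object itself. A minimal Ruby sketch of that layout follows. `MarshalSketchDate` is a hypothetical class with an Integer standing in for `@dt.getMillis`, so it only illustrates the `marshal_dump`/`marshal_load` protocol, not the actual JRuby code.

```ruby
require 'date'

# Hypothetical sketch of a date that marshals as four plain values.
class MarshalSketchDate
  attr_reader :millis, :of, :sg, :sub_millis

  def initialize(millis, of = 0, sg = Date::ITALY, sub_millis = 0)
    @millis = millis
    @of = of
    @sg = sg
    @sub_millis = sub_millis
  end

  # Dump only primitives: no object graph for Marshal to walk.
  def marshal_dump
    [@millis, @of, @sg, @sub_millis]
  end

  def marshal_load(fields)
    @millis, @of, @sg, @sub_millis = fields
  end

  def ==(other)
    other.is_a?(MarshalSketchDate) && marshal_dump == other.marshal_dump
  end
end

original = MarshalSketchDate.new(1_379_808_000_000, 0, Date::ITALY, Rational(1, 4))
copy = Marshal.load(Marshal.dump(original))
p copy == original   # true
```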
@eregon eregon remove caching in Date, should no more be needed
* ajd is only used in #<=> for comparison with ajd, pretty rare
* #day_fraction is rarely used
* #julian? must be optimized anyway since the off-by-one problem with joda DateTime negative years
@eregon eregon improve Date#julian? perf 0b58ecc
@eregon eregon use the Date::UNIX_EPOCH_IN_AJD constant instead of literal 8ab37b0
@eregon eregon improve greatly perf of 1.324 µs vs 8.816 µs
* take advantage of the fact the default TZ should not change
@eregon eregon move CHRONO_ITALY_UTC constant with other Chronology constants 8729336
@eregon eregon follow MRI on parsing DateTime: accept a 60th leap second f3e9b9b
@eregon eregon adapt a couple lines to Date tests to run almost all assertions
* avoid some irrelevant assertions
* using minitest/exclude by method would exclude way too much
@eregon eregon remove all Date tests exclusions for 1.9 and 2.0 except one! 5ab0268
@eregon eregon merged commit f831eaa into jruby:master Sep 22, 2013

eregon commented Sep 22, 2013

@headius Merged!

atambo commented Sep 22, 2013

I think your modifications to the tests in the /test/externals/ will get blown away the next time they are sync'd with mri. Maybe you should just extract those tests out into the /spec/regression/ directory?

Also, I wonder if these patches fix some of these rails/jruby issues about Time marshalling:

eregon commented Sep 22, 2013

> I think your modifications to the tests in the /test/externals/ will get blown away the next time they are sync'd with mri. Maybe you should just extract those tests out into the /spec/regression/ directory?

@atambo Yes, I fear they will indeed get blown away. There is only one minor additional test. I used that technique to comment out uninteresting failing assertions while still testing the other assertions within the same method (since minitest/exclude can only exclude by method, I think). Blindly excluding these methods would reduce coverage too much for my taste. I guess we could copy the passing/interesting assertions to some other suite, but that seems like the wrong solution.

What if I merged the test changes in jruby/ruby ?

I fixed Date (and also old Rational) marshaling but not Time.

headius commented Sep 22, 2013

If you make the test changes to jruby/ruby on the appropriate branch (jruby-ruby_1_9_3 most likely) that is probably an ok way to keep them around. We probably need a better system here, but it is what it is.

Once we're caught up, I can see about merging some of our test changes to MRI proper.
