Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider merging into Base #12

Closed
StefanKarpinski opened this issue Aug 23, 2013 · 16 comments
Closed

consider merging into Base #12

StefanKarpinski opened this issue Aug 23, 2013 · 16 comments

Comments

@StefanKarpinski
Copy link
Contributor

Date and time functionality is so basic that I'm starting to feel like having it in Julia Base would be best. The main problem is that timezone data and leap second data need to be updated more often than one may want to update Julia itself. Maybe those can be included in point releases while ensuring that point releases are non-disruptive enough that people are willing to make those updates even for production systems.

@StefanKarpinski
Copy link
Contributor Author

cc: @nolta, @aviks, @JeffBezanson, @ViralBShah.

@aviks
Copy link
Contributor

aviks commented Aug 24, 2013

I would love for Base to have a DateTime implementation, and I'm sure no one is surprised when I say that :)

Plenty of systems do point releases when timezone data changes. Its quite the done thing.

There are a few uses for nolta's Calendar. How onerous is the migration if this becomes Base? Should we make an effort to make the API similar?

@StefanKarpinski
Copy link
Contributor Author

There are a few uses for nolta's Calendar. How onerous is the migration if this becomes Base? Should we make an effort to make the API similar?

These should definitely be unified. I'm not clear at this point what the relationship between Datetime and Calendar is. Maybe @nolta or @karbarcca could expand on that? Since Datetime handles timezones and leap seconds, it seems to me like it might obviate the need to rely on an external package like ICU, which I believe Calendar does.

@quinnj
Copy link
Owner

quinnj commented Aug 24, 2013

Here are the main functionality differences from what I understand:

-Calendar inherits more mature formatting and IO capabilities, DateTime has date(::String) which does a best guess for string parsing, but no formal format() function for Date/Datetime creation from strings through a specified format pattern (i.e. "YY-DD-MM")
--Calendar also inherits localization features that allow the displaying of dates/timezones according to system locale (i.e. China: 2012-12-6 GMT-0500下午4时09分23秒, France: 6 déc. 2012 16:07:56 UTC-05:00)
-I'm not quite clear on the low-level details, but Calendar datetimes, at least on the surface, seem to be mutable ("setter" functions are provided), whereas all Date/DateTimes in Datetime are immutable bitstypes
-I'm also not quite clear how Calendar (or ICU rather) handles the usage of different calendars (i.e. Gregorian, ISO, TAI, etc.), but Datetime types take a Calendar parameter which allows new calendars to define attributes/behaviors for date types
-Datetime provides a "day-precision" Date type which allows for smaller memory footprint and faster calculations due to the lack of overhead from timezones and leapseconds
-Datetime handles leap seconds
-Datetime provides a DateRange1 type in addition to the regular DateRange (which is similar to Calendar's CalendarTimeRange). DateRange1 requies the use of an "inclusion" boolean function as the step which allows for temporal expression creation, similar to Ruby's runt (i.e. give me the dates of the 1st and 3rd Tuesdays of every other month between 2001 and 2005)

In terms of performance, last time I ran benchmarks Datetime and Calendar were equal when dealing with datetimes/timestamps with specified timezones, but for UTC and the day-precision Date type, Datetime has significant advantages. Datetime also has a fairly extensive test suite.

In terms of the API, I'm still working on tweaking the inter-Period conversions for non-fixed durations (i.e. Year-Day). Stefan mentioned taking a "distribution" approach; we could also consider Calendar.jl's Duration type, but I personally don't like having Period types as well as Durations because of the conceptual overlap and I think it limits overall functionality. Currently in Java's dev channel, JSR-310, Joda-Time's successor, is taking the approach of providing "resolver" functions so the user can specify exactly what to do when a calculation or conversion involves any ambiguity.

@aviks
Copy link
Contributor

aviks commented Aug 24, 2013

Jacob has explained the differences in detail, but at a high level,

Calendar uses ICU to handle date (and timezone) arithmetic and formatting, while DateTime implements date arithmetic in pure Julia.

@StefanKarpinski
Copy link
Contributor Author

Currently in Java's dev channel, JSR-310, Joda-Time's successor, is taking the approach of providing "resolver" functions so the user can specify exactly what to do when a calculation or conversion involves any ambiguity.

It strikes me as usually a bad idea to let the user specify things like this since they tend not to know the best thing to do. I still rather like the distribution idea.

@StefanKarpinski
Copy link
Contributor Author

It would be great if the common functionality between Calendar and Datetime had the same interface so that you could swap one out for the other. Regarding specific items:

  • Calendar inherits more mature formatting and IO capabilities, DateTime has date(::String) which does a best guess for string parsing, but no formal format() function for Date/Datetime creation from strings through a specified format pattern (i.e. "YY-DD-MM")

This functionality should be added to Datetime, although it may be hard to make the parsing identical. Of course, Calendar and Datetime could use the same pure-Julia parsing once that exists, while Calendar could continue to use ICU for other features.

  • Calendar also inherits localization features that allow the displaying of dates/timezones according to system locale (i.e. China: 2012-12-6 GMT-0500下午4时09分23秒, France: 6 déc. 2012 16:07:56 UTC-05:00)

This seems much more optional than parsing and formatting, although it would be nice to have. Is localization on by default in Calendar or opt-in? If it's opt-in, it could just be an unsupported feature of a common API for Datetime.

  • I'm not quite clear on the low-level details, but Calendar datetimes, at least on the surface, seem to be mutable ("setter" functions are provided), whereas all Date/DateTimes in Datetime are immutable bitstypes

I really think that date and time values should be immutable. Even if the underlying library has mutable types, we may want to present an immutable API like we do for BigInts and BigFloats, while sometimes using mutation internally for performance. The Calendar type could remain mutable and the mutators could just not be part of the common API shared by Datetime and Calendar – if you use setters, then you're locking yourself into Calendar.

  • I'm also not quite clear how Calendar (or ICU rather) handles the usage of different calendars (i.e. Gregorian, ISO, TAI, etc.), but Datetime types take a Calendar parameter which allows new calendars to define attributes/behaviors for date types

Perhaps @nolta can explain how Calendar handles different calendars (it's possible it doesn't).

  • Datetime provides a "day-precision" Date type which allows for smaller memory footprint and faster calculations due to the lack of overhead from timezones and leap seconds

This is a nice feature, although I was under the impression that this was precisely what Calendar provided.

  • Datetime handles leap seconds

Honestly, the fact that ICU doesn't handle leap seconds and is slower than Calendar makes me think that maybe we should just go with Datetime and maybe keep ICU as a package for its date/time parsing and formatting functionality.

  • Datetime provides a DateRange1 type in addition to the regular DateRange (which is similar to Calendar's CalendarTimeRange). DateRange1 requires the use of an "inclusion" boolean function as the step which allows for temporal expression creation, similar to Ruby's runt (i.e. give me the dates of the 1st and 3rd Tuesdays of every other month between 2001 and 2005)

I'm not sure I follow this.

@quinnj
Copy link
Owner

quinnj commented Aug 25, 2013

About the "day-precision" Date type. The only type Calendar provides is the CalendarTime which is millisecond precision. Datetime provides the DateTime type which is millisecond precision, but also the Date type which is "day-precision".

W.r.t the DateRange1 type, both Calendar and Datetime provide a range interface for dates which work how one familiar with Julia ranges would expect (i.e. start:step:stop, [range] expands to all elements for a given range). These are useful when the "step" you're looking to use is a fixed period type, e.g. every 2 years, or every 5 days, or every 6 months. They are not ideal, however, for expressing more exotic ranges like employee work schedules or holidays (e.g. Thanksgiving is the 4th Thursday of November). There is a Ruby package called runt that is solely devoted to this.

I think Datetime's solution is much simpler and will be natural for Julia users because it utilizes ranges. The usage pattern is instead of a period type for the step in "start:step:stop", an anonymous function is used which indicates whether a given date should be "included" or not (hence the term "inclusion function"). We'll take the Thanksgiving example from the manual:

julia> dt = date(2009,1,1)
2009-01-01

julia> dt2 = date(2013,1,1)
2013-01-01

julia> Thanksgiving = x->dayofweek(x)==Thursday && month(x)==November && dayofweekinmonth(x)==4
# function

julia> f = dt:Thanksgiving:dt2
2009-11-26:# function:2012-11-22

julia> [f]
4-element Date{ISOCalendar} Array:
 2009-11-26
 2010-11-25
 2011-11-24
 2012-11-22

This is a trivial example, but you can imagine the power. You could, for example, create a HOLIDAYS function which would return all national U.S. holidays and say for any day of the week between Monday and Friday and !HOLIDAYS to get legal business days.

@ViralBShah
Copy link

+1 on merging into Base. On leap second and timezone data - we could put the data in a package that can be updated independently of the DateTime functionality in Base.

@quinnj
Copy link
Owner

quinnj commented Aug 27, 2013

I just pushed a few changes that put all leap second and timezone data into the "tzdata" folder, so basically a user could replace this folder with updated data and it should all work fine. I guess we could add a path-setter function of some sort, but I think this is fine for now.

@StefanKarpinski
Copy link
Contributor Author

The main use case that we need to make sure to address is very conservative production installations where even making a point upgrade may be too risky to do regularly. For those users, if we make it relatively easy to only swap out the timezone and leap second data, they should be ok. That sort of installation tends to be managed by a team of well-informed professional sysadmins who should be able to handle that if it's well documented. DateTime should also degrade reasonably if an installation decides not to update those files. I suspect this isn't hard – most timezones don't change so most things should keep working if the file gets stale. Likewise, if you don't update leap seconds but you don't care about leap second accuracy, then everything should be fine. Perhaps it should be possible to turn off leap second and/or time zone support entirely.

@quinnj
Copy link
Owner

quinnj commented Aug 28, 2013

All timezone-related functionality currently defaults to UTC which is effectively "turning it off" (alternatively, a user could specify the even more generic Offset{0}), so I'm not worried about timezone data.

For leap seconds, I actually mocked up the ability to turn it off, but ultimately scrapped it because I figured it wouldn't really be that useful. I think in all reality, a user is going to bump into leap seconds pretty rarely (how many people even know about them?), and those that do care, will almost certainly be worried about keeping it up to date. The changes I pushed today and mentioned above allow a user to simply replace the tzdata folder with a fresh set of data and everything "just works".

Before merging into Base, I would request an IRC sprint or something similar with a select few (those originally cc'd?) or even just one other dev to go over the code. I really did about 99% of this solo and don't feel comfortable (as no one should) without some peer review and feedback. I'm positive there are some needed changes and particularly with the Timezone/Leap Second data API I would love some help ironing it out.

@ViralBShah
Copy link

How about making a PR, so that everyone can take a look and comment?

@StefanKarpinski
Copy link
Contributor Author

I do think this should probably not go into 0.2 since we're trying to get that stable and this might be a big piece of unstable changes. But getting things into Base is generally a great way to get them reviewed carefully ;-)

@gitfoxi
Copy link

gitfoxi commented Dec 25, 2013

+1

At-least DataFrames.readtable would be a good start with readcsv after it gets merged into Base.

@quinnj
Copy link
Owner

quinnj commented Aug 26, 2014

😄

@quinnj quinnj closed this as completed Aug 26, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants