New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduced dates #7

Closed
jpmckinney opened this Issue Jan 26, 2013 · 6 comments

Comments

Projects
None yet
2 participants
@jpmckinney
Member

jpmckinney commented Jan 26, 2013

PopIt uses MongoDB's native Date objects, so it cannot use ISO 8601:2004 reduced dates.

@evdb

This comment has been minimized.

evdb commented Mar 1, 2013

The approach in PopIt is (was :) ) to treat all dates as a range. This works well in that it can represent a specific date like 2 Mar 2012 (by using a range from midnight to the following midnight). It can also represent arbitrary ranges like 1 Jan 2012 to 30 April 2012.

However it has the following problems:

  • It is clunky - for the cases where the date is known it feels silly to have two dates stored.
  • Timezones become relevant as there is a time component and this can lead to unexpected issues. This is what mysociety/popit#207 refers to and using strings to represent the date would be a good solution.

But the following advantages:

  • there is no arbitrary constraint that date must be either accurate to the day, the month or the year. Allows for greater granularity.
  • sorting in both directions is simpler as there is a high date and a low date to use as expected. See the extra code that was needed in the mzalendo codebase for sorting purposes when using a reduced date style.
  • the date is an object that further fields can be added to. For example the date could easily be given a name, or a comment (I'll create another issue regarding the name and the intended use).
@jpmckinney

This comment has been minimized.

Member

jpmckinney commented Mar 1, 2013

With respect to the advantages:

  1. Does that arbitrary constraint come up in practice? How big a disadvantage is that constraint?
  2. Can you provide an example where the ISO reduced dates do not give the sort order you would expect, when sorting their original values as strings? All I can tell is that you need to do some magic for your sentinel value "future" and null values to be properly sorted.
  3. Linking to #20
@evdb

This comment has been minimized.

evdb commented Mar 1, 2013

Does that arbitrary constraint come up in practice? How big a disadvantage is that constraint?

It has not, but it is realistic to expect that it will. Without this it won't be possible to represent "Q2 2012" or "late 2012" more accurately than just "2012". It would also not be possible to represent a particular week of the year if that week spanned two months. And using the ISO 8601 format of 2012-W12 would not sort correctly either.

For the Kenyan elections we would have been able to give a very specific range that the election should have occurred in according to the legislation.

Can you provide an example where the ISO reduced dates do not give the sort order you would expect, when sorting their original values as strings?

Ascending sorts tend to be fine, the problem is that when you do descending sorts with partial dates the expected result is not just the reverse of the ascending. Take these dates as an example:

ascending: 2001, 2001-03, 2001-03-15, 2001-04
descending: 2001, 2001-04, 2001-03, 2001-03-15

This sorting order could be contentious, but it is what we found to be the expected order when displaying the dates to the user. Having each partial date contain a definite high and low date for the range made sorting easier. Notably the Extended Date/Time Format (EDTF) 1.0 spec dodges the question completely.

All I can tell is that you need to do some magic for your sentinel value "future" and null values to be properly sorted.

That is a little messy, but using 'future' to distinguish between not known and not happened yet was useful. It is not deal though - if future is the end of the range it is a good way to say something like '15 Mar 2001 onwards', if future is the start of the range then its meaning is not clear.

@jpmckinney

This comment has been minimized.

Member

jpmckinney commented Mar 1, 2013

  1. ISO 8601 already has support for time intervals, e.g. 1985-04-12T23:20:50/1985-06-25T10:30:00. This is quite obviously awful for querying against. Having use cases (along the lines of your "test suite" from an earlier email) to lend support to why we would want to do certain types of queries would be great.
  2. Can we come up with a list of use cases in which we must sort by date, and in which the date is likely to be a reduced date? (If all use cases are ones where we are always have full dates, then sorting reduced dates is a non-issue). For these use cases, it would also be relevant to include an idea of how many records would be sorted. If it turns out that the number of records is small, then that sorting can be done by the app and not the DB. I see the "correct" sort order as being contentious - preferably our use cases can leave it up to each app.

Re: future let's have that discussion in #17

Update: Just for clarity:

Default reverse sort order on strings:

2001-04, 2001-03-15, 2001-03, 2001

Desired reverse sort order on string:

2001, 2001-04, 2001-03, 2001-03-15

i.e. sort by precision in normal order, then reverse sort by string all the strings with the same precision

@jpmckinney

This comment has been minimized.

Member

jpmckinney commented Mar 8, 2013

Btw, here is a US gov draft specification which may be more suitable than ISO 8601: http://www.loc.gov/standards/datetime/pre-submission.html

@jpmckinney

This comment has been minimized.

Member

jpmckinney commented Nov 1, 2014

Closing, as this has not come up again as an issue. We can re-open as necessary.

@jpmckinney jpmckinney closed this Nov 1, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment