Dates can't reference some esoteric but valid dates #9048

Closed
nik9000 opened this issue Dec 23, 2014 · 7 comments

Comments

@nik9000
Member

nik9000 commented Dec 23, 2014

We're looking at providing some kind of structured search for wikidata.org. It contains dates like

And, try as I might, I can't use dates for them. I know I can still store them as longs containing seconds since epoch but that'll only work backwards. I can't find any dates hugely in the future on wikidata right now but wouldn't be surprised if I ended up with stuff like

I know these dates are really silly for handling log messages but they matter to me if no one else.

Is this something Elasticsearch should have built in, or should I go make another plugin?

@clintongormley

All I want for Christmas is BigDate? :)

As you say, this is an esoteric use case. Probably not a good idea to mix up our existing date type with these dates, otherwise somebody is sure to try to produce a date histogram starting at 1s after the Big Bang, with an interval of 30 minutes...

At first I thought of having a year field type, but that probably wouldn't cover all of your use cases. It really is some kind of BigDate, isn't it? Any ideas about how you would implement this?

@nik9000
Member Author

nik9000 commented Dec 24, 2014

> At first I thought of having a year field type, but that probably wouldn't cover all of your use cases. It really is some kind of BigDate, isn't it? Any ideas about how you would implement this?

Probably BigDate is a good thing to call it, yeah.

So some points:

  • Most date parsing in Java fails on those dates. You kind of have to do your own year handling. So whatever we build would need hand rolled year parsing at least. From there it could delegate to something else to get everything else.
  • Looks like signed longs for milliseconds can go +/- 292 million years. Signed longs for seconds would do +/- 292 billion years which really should comfortably hold everything we expect to talk about having ever happened in the past.
  • These values all have impressive error bars on them. Looks like the Big Bang's error values are +/- 37 million years which is pretty impressively precise in the grand scheme of things, but not so precise on the scale of seconds.
  • Many of these values have a known total ordering. Big Bang before hyperinflation. In fact we're reasonably sure they occurred so close together that second precision isn't good enough.
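The range figures in the list above can be checked with a little arithmetic. This is a minimal sketch, assuming a mean Gregorian year of 365.2425 days and an age of the universe of roughly 13.8 billion years; the names `range_in_years` and `AGE_OF_UNIVERSE_YEARS` are made up for illustration.

```python
# Rough range, in years, of a signed 64-bit counter for a given tick size.
MAX_LONG = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 86_400  # assumption: mean Gregorian year
AGE_OF_UNIVERSE_YEARS = 13.8e9        # assumption: approximate figure


def range_in_years(ticks_per_second: int) -> float:
    """Years representable on either side of zero at this resolution."""
    return MAX_LONG / ticks_per_second / SECONDS_PER_YEAR


ms_years = range_in_years(1000)  # millisecond ticks: about 292 million years
s_years = range_in_years(1)      # second ticks: about 292 billion years

# Millisecond longs cannot reach the Big Bang; second longs can.
print(ms_years < AGE_OF_UNIVERSE_YEARS < s_years)
```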

So we have this dichotomy: second precision would mostly be overkill from an error bars standpoint, but not good enough from a total ordering standpoint. For that reason I think something like just stuffing it in a BigDecimal holding seconds is probably a good choice. It looks like it's implemented as an arbitrary precision mantissa and a scale, which seems like a reasonably efficient way to handle my dichotomy. I haven't checked how that'd play with Lucene or doc values or anything.
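To illustrate why an arbitrary precision decimal helps with the ordering problem, here is a minimal sketch using Python's `decimal.Decimal` as a stand-in for Java's `BigDecimal`. The event values are hypothetical placeholders, not physical measurements: the only point is that two timestamps a tiny fraction of a second apart stay distinct as decimals but collapse to the same 64-bit double.

```python
from decimal import Decimal, getcontext

# Enough significant digits that the addition below is exact.
getcontext().prec = 60

SECONDS_PER_YEAR = Decimal("31556952")  # assumption: 365.2425-day year

# Hypothetical values, chosen only to show the ordering behaviour.
big_bang = -(Decimal("13.8e9") * SECONDS_PER_YEAR)
inflation_start = big_bang + Decimal("1e-36")

# As decimals the two events keep their order...
print(big_bang < inflation_start)

# ...but as doubles they are indistinguishable.
print(float(big_bang) == float(inflation_start))
```

Both comparisons hold here because the sum needs about 54 significant digits, which fits within the configured precision of 60.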

@jpountz
Contributor

jpountz commented Dec 24, 2014

> I haven't checked how that'd play with Lucene or doc values or anything

I think there are two issues here: sorting and range queries. Sorting would be quite easy if you manage to find an encoding scheme for your dates so that the lexicographical order of your encoded dates matches the numeric order of your dates. (You don't actually need a plugin for that, you could just do the encoding on client-side into a string field.)
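One possible client-side encoding of the kind described above, as a minimal sketch: it assumes the dates are whole seconds that fit in a signed 64-bit range, shifts them to be non-negative, and zero-pads to a fixed width so that string order matches numeric order. The function names are hypothetical, and this is not taken from Elasticsearch or Lucene.

```python
OFFSET = 2**63  # shifts the signed 64-bit range to start at zero
WIDTH = 20      # number of decimal digits in 2**64 - 1


def encode_sortable(seconds: int) -> str:
    """Encode signed seconds so lexicographic order equals numeric order."""
    if not -OFFSET <= seconds < OFFSET:
        raise ValueError("value outside signed 64-bit range")
    return f"{seconds + OFFSET:0{WIDTH}d}"


def decode_sortable(term: str) -> int:
    """Invert encode_sortable."""
    return int(term) - OFFSET


values = [1419292800, -435485937600000000, 0, -1, 1]
terms = sorted(encode_sortable(v) for v in values)
print([decode_sortable(t) for t in terms] == sorted(values))
```

The encoded strings could then be indexed in a string field and sorted on directly. Range queries over such terms would still be plain term ranges, with none of the prefix-term speedups that numeric fields get.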

For range queries, we had a similar issue for ipv6 addresses and @mikemccand worked on a nice patch that automatically adds prefix terms so that range queries are fast: https://issues.apache.org/jira/browse/LUCENE-5879 but it raises a couple of design issues that have prevented it from being committed so far.

@uschindler
Contributor

At PANGAEA, where we also have dates going back to geological eras :) the best you can do is use Microweich Excel magic and encode dates as a double: full days since epoch. If you are close to the epoch you have the best precision, but you can still go back billions of years. This works fine for sorting, and if the double value ×86,400,000 is in long range you can still use real date formatters. Otherwise just treat it as days/years or whatever you like by scaling.

I don't think you need support for this in ES, just convert your dates on the client.
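The days-as-double idea can be sketched on the client roughly as follows. This is an illustration under assumptions: the epoch is taken to be the Unix epoch, a year is taken as 365.2425 days, and all function names are made up. `math.ulp` needs Python 3.9 or later.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumption: Unix epoch
MS_PER_DAY = 86_400_000
MAX_LONG = 2**63 - 1


def to_days(dt: datetime) -> float:
    """Days since the epoch for an ordinary calendar date."""
    return (dt - EPOCH) / timedelta(days=1)


def years_to_days(years: float, days_per_year: float = 365.2425) -> float:
    """Days since the epoch for a date known only as a year offset."""
    return years * days_per_year


def fits_in_long_millis(days: float) -> bool:
    """True if days * 86,400,000 still fits a signed 64-bit millisecond value."""
    return abs(days * MS_PER_DAY) <= MAX_LONG


recent = to_days(datetime(2014, 12, 23, tzinfo=timezone.utc))
cretaceous = years_to_days(-1e8)   # 100 million years before the epoch
big_bang = years_to_days(-13.8e9)  # approximate figure

# Doubles sort numerically, so ordering works across the whole range.
print(sorted([recent, big_bang, cretaceous]) == [big_bang, cretaceous, recent])

# Precision is best near the epoch and coarser far away from it.
print(math.ulp(recent) < math.ulp(abs(big_bang)))
```

Values for which `fits_in_long_millis` returns true can be scaled back to milliseconds and handed to a regular date formatter. The rest have to be displayed as days or years.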

@jpountz
Contributor

jpountz commented Jan 9, 2015

++ @uschindler !

@jpountz
Contributor

jpountz commented Jan 9, 2015

@nik9000 I'm curious whether this idea would address your needs?

@clintongormley

Assuming that this issue has been resolved. Feel free to reopen.
