Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nanosecond field mapper #37755

Merged
merged 35 commits into from Feb 4, 2019

Conversation

Projects
None yet
5 participants
@spinscale
Copy link
Member

commented Jan 23, 2019

This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1677 to 2262.

If you are searching across indices where a date field is in nanoseconds
and another date field in milliseconds, then an internal mechanism will
convert the nanoseconds to milliseconds losing precision and then merge
the search results.

Relates #27330
Closes #32601

@elasticmachine

This comment has been minimized.

Copy link

commented Jan 23, 2019

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2019

@elasticmachine run elasticsearch-ci/1

@jimczi
Copy link
Member

left a comment

Thanks @spinscale , I did a first pass and added some comments.

This means that it will accept dates with optional timestamps, which conform
to the formats supported by
<<strict-date-time,`strict_date_optional_time_nanos`>> or
nanoseconds-since-the-epoch.

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 23, 2019

Member

Is it nanoseconds-since-the-epoch or epoch_millis ? I don't understand which one is used by default.

<1> The `date` field uses the default `format`.
<2> This document uses a plain date.
<3> This document includes a time.
<4> This document uses milliseconds-since-the-epoch.

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 23, 2019

Member

Why, shouldn't this use the nanoseconds-since-the-epoch ?

<2> This document uses a plain date.
<3> This document includes a time.
<4> This document uses milliseconds-since-the-epoch.
<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch.

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 23, 2019

Member

It should be in nanoseconds since epoch if there is a single index (the field is declared as a nanosecond date) ?

import static org.elasticsearch.common.time.DateUtils.toLong;

/** A {@link FieldMapper} for dates. */
public class NanosecondDateFieldMapper extends FieldMapper {

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 23, 2019

Member

I wonder if the nanosecond type could be included in DateFieldMapper directly. The code is almost the same, only the parsing changes so I don't think we need a separate class ?

This comment has been minimized.

Copy link
@spinscale

spinscale Jan 23, 2019

Author Member

valid, looking if that works out!

This comment has been minimized.

Copy link
@spinscale

spinscale Jan 23, 2019

Author Member

i have not worked on the other items but just on this one. Maybe you can take a look if my change makes sense. I created an enum, that has different methods for converting a string to a long for milliseconds and nanoseconds. Does that make sense to you?

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 23, 2019

Member

++, it looks good

import static org.hamcrest.Matchers.arrayWithSize;
import static org.hamcrest.Matchers.is;

public class NanosecondIT extends ESIntegTestCase {

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 23, 2019

Member

Let's add rest tests rather than ESIntegTestCase. These tests shouldn't bee too difficult to migrate to yaml ?

@jpountz
Copy link
Contributor

left a comment

Woohoo, great to see that all the work that has been done wrt java-time support now allows to support nanosecond timestamps with so little additional code. Some comments/questions:

  • I'm not a fan of the nanosecond name because it doesn't tell whether this is a date/time or a duration. I'd rather like something like timestamp, timestamp_nanos or even date_nanos.
  • I left a note about some computations that are wrong with negative timestamps (ie. before 1970). While we could fix it, I wonder that a better option might be to reject nanosecond timestamps before 1970 at index time? I can't think of a use-case that requires nanosecond precision before 1970.
  • I agree with Jim it's a bit confusing that this new field supports nanoseconds but treats numbers as millis at index time.
  • Does this field work with date_histogram aggregations already?
same mapping parameters as the `date` formatter can be used.

NOTE: You can query across indices that have a date formatter and a
nanosecond formatter. Internally the nanosecond based dates will be

This comment has been minimized.

Copy link
@jpountz

jpountz Jan 24, 2019

Contributor

maybe s/formatter/field/? "date formatter" reads to me as "nanosecond field with a format that doesn't include nanos" but I think you mean date fields?


long seconds = nanoSecondsSinceEpoch / 1_000_000_000;
long nanos = nanoSecondsSinceEpoch % 1_000_000_000;
return Instant.ofEpochSecond(seconds, nanos);

This comment has been minimized.

Copy link
@jpountz

jpountz Jan 24, 2019

Contributor

I think this is incorrect for negative timestamps? you might want to use Math.floorDiv/floorMod instead.

return 0;
}

return nanoSecondsSinceEpoch / 1_000_000;

This comment has been minimized.

Copy link
@jpountz

jpountz Jan 24, 2019

Contributor

This is incorrect for negative times too.

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 24, 2019

I'd favor date_nanos here, easier to remember than anything else...

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 24, 2019

regarding treating the field differently with its default formatters: I do think that it makes sense to have it the same as the date field. Otherwise just switching from date to this one will suddenly index a completely different date when indexing just a number - I think that is more confusing?

Also, there is no epoch_nanos date formatter currently - I'd prefer people using human readable dates when indexing.

I am currently working on the fact that both can use the same default date formatter, so that the only difference would be in indexing, but not in configuration.

Does that make sense?

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Jan 24, 2019

Otherwise just switching from date to this one will suddenly index a completely different date when indexing just a number - I think that is more confusing?

It might be indeed. We might want to add a big warning to the docs that this field will interpret numbers as millis.

Does that make sense?

Yes, thanks.

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 24, 2019

If everyone is good with rejecting pre 1970 dates, I'm happy to add that :-)

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 24, 2019

regarding aggregations: those are still fully in milliseconds (I already updated the docs with this limitation, but havent pushed it yet, working on Jim's review comments at the moments).

spinscale added some commits Jan 23, 2019

Add nanosecond field mapper
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1677 to 2262.

If you are searching across indices where a date field is in nanoseconds
and another date field in milliseconds, then an internal mechanism will
convert the nanoseconds to milliseconds losing precision and then merge
the search results.

Relates #27330
incorporate review comments
- fix docs
- switch integration test to YAML test
- fix missing numeric types/fielddata types
- remove dead code in SearchPhaseController
- rename mapper from nanosecond to date_nanos
- only allow positive nanosecond dates (after 1970)
- add more tests, ensure that doc value fields are working
- add more tests, ensure that aggregations are working

This is just a super dirty prototype I hashed together on friday
afternoon, I will come up with a cleaner implementation next.

@spinscale spinscale force-pushed the spinscale:1901-nanosecond-fieldmapper branch from 1599452 to ec6ef2a Jan 25, 2019

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 28, 2019

@jimczi I made some more changes that we talked about.

One more open thing is to throw an exception when search_after is used in combination with nanosecond/millisecond indices. Can you point me to the code I should take a look at to do this?

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Jan 31, 2019

@jimczi I reverted the merging code, the custom nanosecond sort field, and kept the fieldmapper, the custom fielddata for script doc values and field doc values plus tests

@jimczi
Copy link
Member

left a comment

It looks great @spinscale . I left some comments and questions but it's getting close ;).

Show resolved Hide resolved ...g/elasticsearch/index/fielddata/plain/SortedNumericDVIndexFieldData.java Outdated
Show resolved Hide resolved ...g/elasticsearch/index/fielddata/plain/SortedNumericDVIndexFieldData.java Outdated
@@ -56,7 +56,27 @@ public long ramBytesUsed() {
public final ScriptDocValues<?> getLegacyFieldValues() {

This comment has been minimized.

Copy link
@jimczi

jimczi Jan 31, 2019

Member

This function is unused since #30831. I'll open a separate pr to remove it.

This comment has been minimized.

Copy link
@spinscale

spinscale Jan 31, 2019

Author Member

I left the code in here for now, but can also remove it, whatever you prefer

This comment has been minimized.

Copy link
@jimczi

jimczi Feb 1, 2019

Member

I opened #38087

Show resolved Hide resolved ...g/elasticsearch/index/fielddata/plain/SortedNumericDVIndexFieldData.java Outdated
Show resolved Hide resolved docs/reference/mapping/types/date_nanos.asciidoc Outdated
@jimczi

jimczi approved these changes Feb 1, 2019

Copy link
Member

left a comment

I left a minor comment but it looks ready to me.
Thanks for iterating @spinscale

@@ -181,21 +247,24 @@ public TypeParser() {
}
}

public static final class DateFieldType extends MappedFieldType {

This comment has been minimized.

Copy link
@jimczi

jimczi Feb 1, 2019

Member

This class can remain final ?

This comment has been minimized.

Copy link
@spinscale

spinscale Feb 1, 2019

Author Member

yes, done

@spinscale

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

@elasticmachine run tests elasticsearch-ci/1

1 similar comment
@spinscale

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

@elasticmachine run tests elasticsearch-ci/1

@spinscale spinscale merged commit 87f3579 into elastic:master Feb 4, 2019

7 checks passed

elasticsearch-ci/1 Build finished.
Details
elasticsearch-ci/2 Build finished.
Details
elasticsearch-ci/default-distro Build finished.
Details
elasticsearch-ci/docbldesx Build finished.
Details
elasticsearch-ci/docs-check Build finished.
Details
elasticsearch-ci/oss-distro-docs Build finished.
Details
elasticsearch-ci/packaging-sample Build finished.
Details

@colings86 colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 12, 2019

Add nanosecond field mapper (elastic#37755)
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1970 to 2262.

Note that aggregations will still be in milliseconds.
However docvalue fields will have full nanosecond resolution

Relates elastic#27330
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.