New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement synthetic source support for range fields #107081
Implement synthetic source support for range fields #107081
Conversation
This PR adds basic synthetic source support for range fields. There are following notable properties of synthetic source produced: * Ranges are always normalized to be inclusive on both ends (this is how they are stored). * Original order of ranges is not preserved. * Date ranges are always expressed in epoch millis, format is not preserved. * IP ranges are always expressed as a range of IPs while it could have been originally provided as a CIDR. This PR only implements retrieval of data for source reconstruction from doc values.
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Hi @lkts, I've created a changelog YAML for you. |
server/src/main/java/org/elasticsearch/index/mapper/RangeType.java
Outdated
Show resolved
Hide resolved
@elasticmachine update branch |
This is ready for review. |
"lte": 299 | ||
} | ||
} | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add this to the list of limitations for synthetic source
? docs/reference/mapping/fields/synthetic-source.asciidoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to ranges and i followed existing docs that have specific examples on type page (for example ip
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to moving it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that in the docs about synthetic source, where we provide limitations we should mention also this one, not necessarily including an example but just explaining it so that users have it together with all other limitations.
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/range/20_synthetic_source.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/network/InetAddresses.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/RangeType.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/IpRangeFieldMapperTests.java
Outdated
Show resolved
Hide resolved
Thanks for your PR, this looks great...very comprehensive test coverage. I left a few comments. |
@elasticmachine update branch |
configuration. Synthetic `_source` cannot be used together with | ||
<<copy-to,`copy_to`>> or with <<doc-values,`doc_values`>> disabled. | ||
|
||
Synthetic source always sorts values and removes duplicates for all `range` fields except `ip_range` . Ranges are sorted by their lower bound and then by upper bound. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move these documented limitations to a paragraph in synthetic-source.asciidoc
(at the end of the file)? I think all the limitations are mentioned there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe summarize the limitations in synthetic-source.asciidoc
and keep the extensive examples here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have such examples that are specific to a field type for other fields like ip
, that's why i added it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have a general statement that order is not preserved in synthetic-source
.
server/src/main/java/org/elasticsearch/index/mapper/BinaryDocValuesSyntheticFieldLoader.java
Outdated
Show resolved
Hide resolved
ip_range: { "gte": "::", "lte": "10.10.10.10" } | ||
- match: | ||
hits.hits.7._source: | ||
ip_range: { "gte": "10.10.10.10", "lte": "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mix of IPv6 and IPv4 is pretty unfortunate but i don't see a good way to address that. We could look at other end of range and try to format it in the same way but it's pretty brittle in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually our docs say that we support mixed ranges so maybe this is okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think when it come to this the semantic is something like:
- apply gte to all IPv4 fields as if you only had the gte defined
- apply the lte to all IPv6 fields as if you only had the lte defined
It is like saying for IPv4 fields the range is [10.10.10.10, null]
while for IPv6 fields the range is [null, ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff]
. Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that would be ideal, let me see if i can do that.
@elasticmachine update branch |
@elasticmachine update branch |
@elasticmachine update branch |
I left a couple of comments...everything else looks good. |
@elasticmachine update branch |
This PR adds basic synthetic source support for range fields. There are
following notable properties of synthetic source produced:
they are stored).
have been originally provided as a CIDR.
This PR only implements retrieval of data for source reconstruction from
doc values similar to fields already supporting synthetic source.
Contributes to #106460.