-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: uniquely named inner_hits sections for each nested field condition #45039
Conversation
and combine the multiple values it generates into a single list. This also introduces a limitation (its origin it's with Elasticsearch though) on the sorting capabilities when the sorting is based on the nested fields filtered: only one of the conditions applied to nested documents will be used in the nested sorting.
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left a few comments.
; | ||
|
||
selectNestedAndRootDocument_WithMultipleConditions_AndNestedSorting | ||
SELECT CONCAT(CONCAT(first_name, ' '), last_name) AS name, dep.dep_name, dep.dep_id, dep.from_date, dep.to_date FROM test_emp WHERE dep.from_date > '1990-01-01' AND dep.dep_name='Production' AND dep.to_date < '2000-01-01' ORDER BY dep.dep_id, dep.from_date, name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would sort by dep_name, dep_id so that there that the dep_id acts like a second level (tie breaker) sorting.
Maybe also include more entries with the same dep_name to ensure this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You won't get a response that's much different than what is already in this one: Otmar and Kazuhide will switch places. Sorting happens differently for nested documents than for fields without relationships (normal fields).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thx!
} | ||
// Then sort the resulting List based on the offset of the same inner hit. Each inner_hit match will have an offset value, | ||
// relative to its location in the _source | ||
SortedSet<SearchHit> tree = new TreeSet<SearchHit>(new NestedHitOffsetComparator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since the duplicates have been removed with the lhm
HashMap above we can have a simple ArrayList to sort them instead of a TreeMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think you are right. An ArrayList with Collections.sort
should be enough. I'll change.
try { | ||
// the name of an inner_hits section is "[path]_[number]", so keep all inner_hits matching this rule | ||
Long.valueOf(entry.getKey().substring(path.length() + 1)); | ||
} catch(NumberFormatException nfe) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the case that this happens?
And if there is, can't it result in StringIndexOutOfBoundsException
because of the path.length() + 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is possible for a scenario where there are two nested fields: one has the path pathA
and the other has the path pathA_
. Looking for elements matching pathA_[number]
means startsWith(pathA)
is true
and what's left after pathA_
should be a number. In the case of pathA__34
(second path inner_hits name) - the first condition (startsWith) is true, but the second condition is not and the pathA__34
is filtered out. Can you think of another way in which I can check the pattern path_[number]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could simply use:
int endOfPath = s.lastIndexOf('_');
return entry.getKey().substring(0, endOfPath).equals(path)
and remove the parsing of the trailing number
Simplifying the inner_hits extraction
SearchHit[] h = s.getHits(); | ||
for(int i = 0; i < h.length; i++) { | ||
lhm.put(h[i].getNestedIdentity().getOffset(), h[i]); | ||
for(Entry<String, SearchHits> entry : hit.getInnerHits().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: whitespace for for (
int endOfPath = entry.getKey().lastIndexOf('_'); | ||
if (endOfPath >= 0 && entry.getKey().substring(0, endOfPath).equals(path)) { | ||
SearchHit[] h = entry.getValue().getHits(); | ||
for(int i = 0; i < h.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - it's good fix until the nested doc support gets reviewed.
This PR fixes #33080 and #44544 by having each nested field condition (which translates to a
nested
query) have itsinner_hits
section to be uniquely named. The results of the query will be combined into a single list of nested documents.This PR also highlights a limitation (its origin it's with Elasticsearch though) on the sorting capabilities when the sorting is based on the nested fields filtered: only one of the conditions applied to nested documents will be used in the nested sorting.