Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: uniquely named inner_hits sections for each nested field condition #45039

Merged
merged 5 commits into from
Aug 8, 2019

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Jul 31, 2019

This PR fixes #33080 and #44544 by having each nested field condition (which translates to a nested query) have its inner_hits section to be uniquely named. The results of the query will be combined into a single list of nested documents.

This PR also highlights a limitation (its origin it's with Elasticsearch though) on the sorting capabilities when the sorting is based on the nested fields filtered: only one of the conditions applied to nested documents will be used in the nested sorting.

and combine the multiple values it generates into a single list.
This also introduces a limitation (its origin it's with Elasticsearch
though) on the sorting capabilities when the sorting is based on the
nested fields filtered: only one of the conditions applied to nested
documents will be used in the nested sorting.
@astefan astefan requested review from costin and matriv July 31, 2019 12:20
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left a few comments.

;

selectNestedAndRootDocument_WithMultipleConditions_AndNestedSorting
SELECT CONCAT(CONCAT(first_name, ' '), last_name) AS name, dep.dep_name, dep.dep_id, dep.from_date, dep.to_date FROM test_emp WHERE dep.from_date > '1990-01-01' AND dep.dep_name='Production' AND dep.to_date < '2000-01-01' ORDER BY dep.dep_id, dep.from_date, name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would sort by dep_name, dep_id so that there that the dep_id acts like a second level (tie breaker) sorting.
Maybe also include more entries with the same dep_name to ensure this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't get a response that's much different than what is already in this one: Otmar and Kazuhide will switch places. Sorting happens differently for nested documents than for fields without relationships (normal fields).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thx!

}
// Then sort the resulting List based on the offset of the same inner hit. Each inner_hit match will have an offset value,
// relative to its location in the _source
SortedSet<SearchHit> tree = new TreeSet<SearchHit>(new NestedHitOffsetComparator());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since the duplicates have been removed with the lhm HashMap above we can have a simple ArrayList to sort them instead of a TreeMap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think you are right. An ArrayList with Collections.sort should be enough. I'll change.

try {
// the name of an inner_hits section is "[path]_[number]", so keep all inner_hits matching this rule
Long.valueOf(entry.getKey().substring(path.length() + 1));
} catch(NumberFormatException nfe) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the case that this happens?
And if there is, can't it result in StringIndexOutOfBoundsException because of the path.length() + 1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible for a scenario where there are two nested fields: one has the path pathA and the other has the path pathA_. Looking for elements matching pathA_[number] means startsWith(pathA) is true and what's left after pathA_ should be a number. In the case of pathA__34 (second path inner_hits name) - the first condition (startsWith) is true, but the second condition is not and the pathA__34 is filtered out. Can you think of another way in which I can check the pattern path_[number]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could simply use:

int endOfPath = s.lastIndexOf('_');
return entry.getKey().substring(0, endOfPath).equals(path)

and remove the parsing of the trailing number

SearchHit[] h = s.getHits();
for(int i = 0; i < h.length; i++) {
lhm.put(h[i].getNestedIdentity().getOffset(), h[i]);
for(Entry<String, SearchHits> entry : hit.getInnerHits().entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: whitespace for for (

int endOfPath = entry.getKey().lastIndexOf('_');
if (endOfPath >= 0 && entry.getKey().substring(0, endOfPath).equals(path)) {
SearchHit[] h = entry.getValue().getHits();
for(int i = 0; i < h.length; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - it's good fix until the nested doc support gets reviewed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: generated nested documents queries return incorrect inner_hits
5 participants