-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix querying with a path into nested collections with wildcards #7404
Conversation
6a7af1b
to
ce0d767
Compare
Comparing a collection with a list could fail if there was wildcards in the path and therefore multiple collections to compare with right hand list. Linklist is implicitly having wildcard in the path, so if linklists is in the path there will be a similar problem. This part is not solved.
ce0d767
to
3a5e559
Compare
test/test_parser.cpp
Outdated
q = origin->query("link[0].list = {4, 5, 6}"); | ||
CHECK_EQUAL(q.count(), 1); | ||
q = origin->query("link.list = {4, 5, 6}"); | ||
// CHECK_EQUAL(q.count(), 1); // Fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tricky thing is to define what the correct behaviour should be. The current behaviour is to try to do an element by element comparison on {4, 5, 6, 1, 2, 3} vs {4, 5, 6}. This is clearly broken and I think we should change it to use the same design that you have introduced in this PR. We could debate on if this is a breaking change or a fix, but I'd vote to call it a fix.
CHECK_EQUAL(q.count(), 1); | ||
q = table->query("{3, 2, 1} = value[*][*]"); | ||
CHECK_EQUAL(q.count(), 1); | ||
q = table->query("value[*][*] = dict[*][*]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure that we also have meaningful comparisons when there is a expression comparison type involved as well. When comparing many-to-many lists, my intuition is that it is correct to apply ANY/ALL/NONE to the final list, not the amalgamation of all lists combined. I think this is how the code works as you have it written. But to be sure can you try adding a few combinations? eg:
ANY value[*][*] = ANY dict[*][*] // eg: {{1, 2, 3}, {4, 5, 6}} should match on {{1}, {10}} but not on {{4}, {1}}
ALL value[*][*] <= ANY dict[*][*] // eg: {{1, 2, 3}, {4, 5, 6}} should match on {{1, 10}, {10}} but not on {{2, 10}, {10}}
ALL value[*][*] = ALL dict[*][*]
NONE value[*][*] = ANY dict[*][*]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few more tests.
Pull Request Test Coverage Report for Build jorgen.edelbo_134Details
💛 - Coveralls |
@ironage would you prefer that I fix the problem involving linklist before this is merged? |
@ironage I have now fixed the case where linklists are involved. What do you think about the cases, I have commented out? |
test/test_parser.cpp
Outdated
verify_query(test_context, t2, "ALL list.integers == 1", 2); // row 0 matches {1}. row 1 matches (any of) {} {1} | ||
verify_query(test_context, t2, "NONE list.integers == 1", 1); // row 1 matches (any of) {}, {0}, {2}, {3} ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old behaviour is to combine all the lists along the chain into one big list, so the explanations for these results are wrong. I think the new behaviour is correct and much easier to reason about. I am in favour of changing this, I just think we should be very clear about it in the changelog in case someone was relying on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the right thing to do may be to increase the major version number again. Most SDKs haven't even released v14 yet so this breaking change would be lumped in to all the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is really weird that "ALL {} == 1" matches.
I am not sure we should bump the major. I would call this change a fix. The prior behavior was never really specified, was it? Also the fact that we only had to change 2 checks that had the comment that the result might be surpricing, suggest that this was just "works as implemented".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am quite in doubt how to describe what is breaking here. I think that in the case where you had ANY specified (implicitly or explicitly) and you compare with a single value or ANY {x,y,z}, then it gives the same results as before. And I think all other cases were broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. We can label it a fix, and say we never supported those other many-to-many lists comparisons. I agree that the old behaviour for those cases was strange/broken.
src/realm/query_expression.hpp
Outdated
return m_destination_index < m_ctrl.matches.size(); | ||
} | ||
|
||
void evaluate(size_t index, ValueBase& destination) override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit weird to have two modes of evaluation like this where on one hand if the path is empty, the results are driven by the index
parameter and the query node is stateless. Then in the other case, the index
is ignored and the query node is advancing internal state at every call to evaluate. The other downside is that all query expressions have to pay the cost of two extra virtual function calls per evaluation with no benefit (more()
and reset_path()
).
What do you think about removing more()
and reset_path()
and then changing the design to something like this:
struct QueryIndex {
size_t index; // advanced directly for most queries
size_t sub_index; // usually 0: used by many-to-many lists
size_t sub_size; // usually 1: but can be set by the query node upon a call to evaluate
};
...
void evaluate(QueryIndex& index, ValueBase& destination) override {
if (index.index == m_last_evaluated_index) {
// fetch new lists
}
m_last_evaluated_index = index.index
auto& matches = m_ctrl.matches[index.sub_index];
// set destination as before
}
...
// main driver loop also changes to iterate on index.index and while (index.sub_index < index.sub_size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this suggestion. I have a tendency to think that virtual functions is the answer to all problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated design looks much cleaner to me, thank you!
// In case of wildcard query strings, we will get a value for every collection matching the path | ||
// We need to match those separately against the other value - which might also come in multiple | ||
// instances. | ||
Subexpr::Index right_index(start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks optimal to me, but since it is such a hot path, could you check how much performance is lost on normal queries that don't have any more()
? I think the cost is two more conditionals per evaluation, so hopefully not too bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a query comparing 2 integer properties, you will have a 2% degradation of performance. On the other hand, I think this change will speed up queries over linklists as we will only fetch values until we have a match. And we are also avoiding some copying to intermediate buffers.
this->get_lists(index, list_refs, 1); | ||
this->get_lists(index, list_refs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd that we never requested more than one list anywhere. There should be room for the multi-value row-by-row comparison in cases like this. Eg. for queries such as dictionary.@size() == my_int_property
we can grab 8 lists at a time, and set destination.init(not_from_list, 8)
and set all the values to the sizes of the lists. It is a niche optimization that only works for only_unary_links
that end in a collection count. Not blocking because we didn't have that optimized before, just observing that I think this is the case that the extra parameter was intended for.
Comparing a collection with a list could fail if there was wildcards in the path and therefore multiple collections to compare with right hand list.
Linklist is implicitly having wildcard in the path, so if linklists is in the path there will be a similar problem. This part is not solved.
What, How & Why?
Fixes #7393
☑️ ToDos
bindgen/spec.yml
, if public C++ API changed