Fix querying with a path into nested collections with wildcards #7404

jedelbo · 2024-03-01T12:15:51Z

Comparing a collection with a list could fail if there were wildcards in the path and therefore multiple collections to compare with the right-hand list.

A linklist implicitly has a wildcard in the path, so if linklists are in the path there will be a similar problem. This part is not solved.

What, How & Why?

Fixes #7393

☑️ ToDos

📝 Changelog update
🚦 Tests (or not relevant)
C-API, if public C++ API changed
bindgen/spec.yml, if public C++ API changed

ironage · 2024-03-01T23:45:29Z

test/test_parser.cpp

+    q = origin->query("link[0].list = {4, 5, 6}");
+    CHECK_EQUAL(q.count(), 1);
+    q = origin->query("link.list = {4, 5, 6}");
+    // CHECK_EQUAL(q.count(), 1); // Fails


The tricky thing is to define what the correct behaviour should be. The current behaviour is to try an element-by-element comparison of {4, 5, 6, 1, 2, 3} with {4, 5, 6}. This is clearly broken, and I think we should change it to use the same design that you have introduced in this PR. We could debate whether this is a breaking change or a fix, but I'd vote to call it a fix.
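To make the difference concrete, here is a minimal standalone sketch, not the Realm implementation. It assumes that `link.list` reaches two lists, {4, 5, 6} and {1, 2, 3}; the function names `matches_flattened` and `matches_per_list` are hypothetical, and the "match if any single list is equal" rule is an assumption about the intended design rather than something the PR states.

```cpp
#include <vector>

using IntList = std::vector<int>;

// Simplified model of the old behaviour described above: every list reached
// through the path is merged into one list before the comparison.
bool matches_flattened(const std::vector<IntList>& lists, const IntList& rhs)
{
    IntList merged;
    for (const auto& list : lists)
        merged.insert(merged.end(), list.begin(), list.end());
    return merged == rhs;
}

// Assumed model of the per-collection design: each list is compared on its
// own, and the object matches if any single list equals the right-hand list.
bool matches_per_list(const std::vector<IntList>& lists, const IntList& rhs)
{
    for (const auto& list : lists) {
        if (list == rhs)
            return true;
    }
    return false;
}
```

With the lists {4, 5, 6} and {1, 2, 3} and the right-hand side {4, 5, 6}, the flattened variant compares {4, 5, 6, 1, 2, 3} with {4, 5, 6} and fails, while the per-list variant finds the first list and matches.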

ironage · 2024-03-01T23:55:01Z

test/test_parser.cpp

+    CHECK_EQUAL(q.count(), 1);
+    q = table->query("{3, 2, 1} = value[*][*]");
+    CHECK_EQUAL(q.count(), 1);
+    q = table->query("value[*][*] = dict[*][*]");


Please make sure that we also have meaningful comparisons when there is an expression comparison type involved as well. When comparing many-to-many lists, my intuition is that it is correct to apply ANY/ALL/NONE to the final list, not the amalgamation of all lists combined. I think this is how the code works as you have it written. But to be sure can you try adding a few combinations? e.g.:

    ANY value[*][*] = ANY dict[*][*]  // eg: {{1, 2, 3}, {4, 5, 6}} should match on {{1}, {10}} but not on {{4}, {1}}
    ALL value[*][*] <= ANY dict[*][*] // eg: {{1, 2, 3}, {4, 5, 6}} should match on {{1, 10}, {10}} but not on {{2, 10}, {10}}
    ALL value[*][*] = ALL dict[*][*]
    NONE value[*][*] = ANY dict[*][*]

I added a few more tests.

coveralls-official · 2024-03-04T11:14:48Z

Pull Request Test Coverage Report for Build jorgen.edelbo_134

Details

323 of 332 (97.29%) changed or added relevant lines in 6 files are covered.
84 unchanged lines in 17 files lost coverage.
Overall coverage increased (+0.005%) to 90.906%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/realm/query_expression.hpp	136	140	97.14%
src/realm/query_expression.cpp	51	56	91.07%

Files with Coverage Reduction	New Missed Lines	%
src/realm/query_expression.cpp	1	87.52%
test/fuzz_tester.hpp	1	59.58%
src/realm/array_string.cpp	2	85.25%
src/realm/cluster.cpp	2	77.52%
src/realm/sync/noinst/server/server_history.cpp	2	67.94%
test/test_sync.cpp	2	91.31%
src/realm/query_expression.hpp	3	93.76%
src/realm/sync/network/network.cpp	3	89.73%
src/realm/sync/transform.cpp	3	63.0%
src/realm/table.cpp	3	90.81%

Totals
Change from base Build 2106:	0.005%
Covered Lines:	238492
Relevant Lines:	262349

💛 - Coveralls

jedelbo · 2024-03-04T12:05:21Z

@ironage would you prefer that I fix the problem involving linklist before this is merged?

jedelbo · 2024-03-05T15:42:14Z

@ironage I have now fixed the case where linklists are involved. What do you think about the cases I have commented out?

ironage · 2024-03-05T23:43:07Z

test/test_parser.cpp

    verify_query(test_context, t2, "ALL list.integers == 1", 2);  // row 0 matches {1}. row 1 matches (any of) {} {1}
    verify_query(test_context, t2, "NONE list.integers == 1", 1); // row 1 matches (any of) {}, {0}, {2}, {3} ...


The old behaviour is to combine all the lists along the chain into one big list, so the explanations for these results are wrong. I think the new behaviour is correct and much easier to reason about. I am in favour of changing this, I just think we should be very clear about it in the changelog in case someone was relying on it.

In fact, the right thing to do may be to increase the major version number again. Most SDKs haven't even released v14 yet, so this breaking change would be lumped in with all the rest.

It is really weird that "ALL {} == 1" matches.

I am not sure we should bump the major. I would call this change a fix. The prior behavior was never really specified, was it? Also, the fact that we only had to change 2 checks, both of which carried a comment that the result might be surprising, suggests that this was just "works as implemented".

I am unsure how to describe what is breaking here. I think that in the case where you had ANY specified (implicitly or explicitly) and you compare with a single value or ANY {x,y,z}, it gives the same results as before. And I think all other cases were broken.

That's fair. We can label it a fix, and say we never supported those other many-to-many lists comparisons. I agree that the old behaviour for those cases was strange/broken.
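The remark that "ALL {} == 1" matches follows from how universal quantifiers behave on empty ranges. The sketch below illustrates this with the standard algorithms; `Quantifier` and `compare_equal` are hypothetical names used only for illustration and are not part of the Realm query engine.

```cpp
#include <algorithm>
#include <vector>

enum class Quantifier { Any, All, None };

// Hypothetical helper: applies ANY/ALL/NONE to the test "element == value"
// over a single list.
bool compare_equal(Quantifier q, const std::vector<int>& list, int value)
{
    auto is_equal = [value](int x) { return x == value; };
    switch (q) {
        case Quantifier::Any:
            return std::any_of(list.begin(), list.end(), is_equal);
        case Quantifier::All:
            // Vacuously true for an empty list: no element violates the test.
            return std::all_of(list.begin(), list.end(), is_equal);
        case Quantifier::None:
            // Also true for an empty list: no element satisfies the test.
            return std::none_of(list.begin(), list.end(), is_equal);
    }
    return false;
}
```

Under this model an empty list satisfies both ALL and NONE but never ANY, which is the same convention `std::all_of`, `std::none_of` and `std::any_of` use.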

ironage · 2024-03-06T01:12:40Z

src/realm/query_expression.hpp

+        return m_destination_index < m_ctrl.matches.size();
+    }
+
+    void evaluate(size_t index, ValueBase& destination) override


It feels a bit weird to have two modes of evaluation like this where on one hand if the path is empty, the results are driven by the index parameter and the query node is stateless. Then in the other case, the index is ignored and the query node is advancing internal state at every call to evaluate. The other downside is that all query expressions have to pay the cost of two extra virtual function calls per evaluation with no benefit (more() and reset_path()).
What do you think about removing more() and reset_path() and then changing the design to something like this:

    struct QueryIndex {
        size_t index;     // advanced directly for most queries
        size_t sub_index; // usually 0: used by many-to-many lists
        size_t sub_size;  // usually 1: but can be set by the query node upon a call to evaluate
    };
    ...
    void evaluate(QueryIndex& index, ValueBase& destination) override
    {
        if (index.index != m_last_evaluated_index) {
            // fetch new lists
        }
        m_last_evaluated_index = index.index;
        auto& matches = m_ctrl.matches[index.sub_index];
        // set destination as before
    }
    ...
    // main driver loop also changes to iterate on index.index and while (index.sub_index < index.sub_size)

Thank you for this suggestion. I have a tendency to think that virtual functions are the answer to all problems.

The updated design looks much cleaner to me, thank you!
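A self-contained sketch of how such a driver loop could look is given below. It is only an illustration of the idea discussed in this thread, not the merged code: `ListNode`, `count_matches` and the in-memory `rows` data are hypothetical stand-ins, and the sketch assumes the node refetches its lists whenever the row index changes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using IntList = std::vector<int>;
using ListsForRow = std::vector<IntList>;

struct QueryIndex {
    std::size_t index = 0;     // row index, advanced by the driver loop
    std::size_t sub_index = 0; // position among the lists found for this row
    std::size_t sub_size = 1;  // set by the node on each call to evaluate()
};

// Hypothetical node: rows[i] holds the lists a wildcard path yields for row i.
class ListNode {
public:
    explicit ListNode(std::vector<ListsForRow> rows) : m_rows(std::move(rows)) {}

    std::size_t row_count() const { return m_rows.size(); }

    // Returns the list selected by index.sub_index, or nullptr if the row has none.
    const IntList* evaluate(QueryIndex& index)
    {
        if (index.index != m_last_evaluated_index) {
            m_matches = m_rows[index.index]; // stands in for "fetch new lists"
            m_last_evaluated_index = index.index;
        }
        index.sub_size = m_matches.size();
        if (index.sub_index >= m_matches.size())
            return nullptr;
        return &m_matches[index.sub_index];
    }

private:
    std::vector<ListsForRow> m_rows;
    ListsForRow m_matches;
    std::size_t m_last_evaluated_index = static_cast<std::size_t>(-1);
};

// Driver: iterate over rows, and within a row over every list the node reports.
std::size_t count_matches(ListNode& node, const IntList& rhs)
{
    std::size_t count = 0;
    for (std::size_t row = 0; row < node.row_count(); ++row) {
        QueryIndex idx;
        idx.index = row;
        do {
            const IntList* list = node.evaluate(idx);
            if (list && *list == rhs) {
                ++count;
                break; // one matching list is enough for this row
            }
            ++idx.sub_index;
        } while (idx.sub_index < idx.sub_size);
    }
    return count;
}
```

In this sketch the node stays free of extra virtual hooks: the driver owns the iteration state, and a node without wildcards can simply leave `sub_size` at 1.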

ironage · 2024-03-06T18:48:27Z

src/realm/query_expression.hpp

+            // In case of wildcard query strings, we will get a value for every collection matching the path
+            // We need to match those separately against the other value - which might also come in multiple
+            // instances.
+            Subexpr::Index right_index(start);


This looks optimal to me, but since it is such a hot path, could you check how much performance is lost on normal queries that don't have any more()? I think the cost is two more conditionals per evaluation, so hopefully not too bad.

If you have a query comparing 2 integer properties, you will have a 2% degradation of performance. On the other hand, I think this change will speed up queries over linklists as we will only fetch values until we have a match. And we are also avoiding some copying to intermediate buffers.

ironage · 2024-03-06T18:56:08Z

src/realm/query_expression.cpp

-        this->get_lists(index, list_refs, 1);
+        this->get_lists(index, list_refs);


Odd that we never requested more than one list anywhere. There should be room for the multi-value row-by-row comparison in cases like this. Eg. for queries such as dictionary.@size() == my_int_property we can grab 8 lists at a time, and set destination.init(not_from_list, 8) and set all the values to the sizes of the lists. It is a niche optimization that only works for only_unary_links that end in a collection count. Not blocking because we didn't have that optimized before, just observing that I think this is the case that the extra parameter was intended for.

cla-bot bot added the cla: yes label Mar 1, 2024

github-actions bot assigned jedelbo Mar 1, 2024

jedelbo force-pushed the je/query-wildcard branch from 6a7af1b to ce0d767 Compare March 1, 2024 12:20

jedelbo requested a review from ironage March 1, 2024 12:20

jedelbo force-pushed the je/query-wildcard branch from ce0d767 to 3a5e559 Compare March 1, 2024 12:22

ironage reviewed Mar 1, 2024

jedelbo added 2 commits March 4, 2024 11:34

Fix compilation and update test

fb66b64

Merge branch 'master' into je/query-wildcard

483b5dd

Do not merge values from different objects into a common list in queries

e061380

jedelbo requested a review from ironage March 5, 2024 15:40

ironage reviewed Mar 5, 2024

ironage reviewed Mar 6, 2024

Update after review

1057eb6

jedelbo requested a review from ironage March 6, 2024 13:25

ironage reviewed Mar 6, 2024

ironage approved these changes Mar 6, 2024

jedelbo added 2 commits March 7, 2024 11:01

Update test

6220240

Merge branch 'master' into je/query-wildcard

5833c2e

jedelbo merged commit edf7064 into master Mar 7, 2024
35 of 37 checks passed

jedelbo deleted the je/query-wildcard branch March 7, 2024 13:56

github-actions bot locked as resolved and limited conversation to collaborators Apr 6, 2024
