add min_fold_count_limit optimization #423

u9g · 2023-08-02T15:22:00Z

Stop expanding a @fold early if the fold contains no @output fields and the fold's count is not output. Take care not to invalidate other filters by skipping the rest of the iterator, for example a filter for an upper bound. (@filter(< 10))

u9g

some thoughts

trustfall_core/src/interpreter/execution.rs

u9g · 2023-08-02T15:32:05Z

trustfall_core/src/interpreter/execution.rs

+    // We do not apply the min_fold_count_limit optimization if we have an upper bound,
+    // in the form of max_fold_count_limit, because if we have a required upperbound,
+    // then we can't stop at the lower bound, if we are required to observe whether we hit the
+    // upperbound, it's no longer safe to stop at the lower bound.


is it too much to repeat the same exact thing another time at the end here after the comma?

Yeah this is a bit hard to read. Can you try rephrasing it a bit, perhaps splitting into two or more sentences with clearer structure?

reworded to now be:

If we must collect the fold up to our upperbound of `max_fold_count_limit`, then we won't use our lowerbound of `min_fold_count_limit`, as by definition the upperbound will be larger than the lowerbound.

I'm getting tripped up on the "by definition" bit. What definition? Don't we get the upper and lower bounds from separate filters, and it's technically possible that we have a nonsensical query with self-contradicting filters that end up violating this "definition"?

Shouldn’t that be an impossible filter, and since we solve for the solution space of filters, it would be impossible?

Being impossible means it produces no results, not that it can't be executed by a user. The code still has to handle it.

We can assert on it anyways, if that would be better

I don't think we can assert on it — that's what I'm saying. A query with self-contradicting filters is valid. It's okay to execute it, and Trustfall must not crash on it. We may one day choose to trigger a lint on such queries, but even then, the query is allowed to be executed, and must still produce the correct — empty — result.

Come to think of it, it's probably best to make some query test cases like that. How about these:

{
    Number(min: 30, max: 30) {
        ... on Composite {
            value @output

            primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
        }
    }
}

args:
{
    "min": 2,
    "max": 3,
}

This one should return { value: 30 } because the fold count is 3, which is >= 2 and <= 3.

The new fold count optimization shouldn't kick in because we still need to check the other filter.

{
    Number(min: 30, max: 30) {
        ... on Composite {
            value @output

            primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
        }
    }
}

args:
{
    "min": 3,
    "max": 2,
}

This one should return no results because the filter conditions are >= 3 && <= 2.

The new fold count optimization probably shouldn't kick in here either, for the same reason.

Added a test for this
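To make the concern in this thread concrete, here is a minimal sketch of why stopping at the lower bound is unsafe once an upper bound is also present. It is not the code from this PR: `count_passes_filters` and `observed_fold_count` are hypothetical helpers, and the slice stands in for the `primeFactor` fold.

```rust
/// Hypothetical stand-in for the two count filters in the test queries:
/// `@filter(op: ">=", value: ["$min"])` and `@filter(op: "<=", value: ["$max"])`.
fn count_passes_filters(count: usize, min: usize, max: usize) -> bool {
    count >= min && count <= max
}

/// Counts the fold's elements, optionally stopping after `limit` of them.
/// `limit` plays the role of `min_fold_count_limit` (an assumption of this sketch).
fn observed_fold_count(fold: &[u32], limit: Option<usize>) -> usize {
    match limit {
        // Early stop: we never learn whether more elements existed.
        Some(n) => fold.iter().take(n).count(),
        // Full expansion: the observed count is the true count.
        None => fold.len(),
    }
}
```

With the prime factors of 30, `[2, 3, 5]`, the full count of 3 passes `min: 2, max: 3` and fails `min: 3, max: 2`, matching the two test cases above. If the fold were truncated to 2 elements, a query with `min: 2, max: 2` would wrongly pass even though the true count of 3 exceeds the upper bound.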

trustfall_core/src/interpreter/execution.rs

obi1kenobi

Just some nitpicks on wording. Writing clear and concise comments is an art worth practicing.

The code change looks broadly correct and the test case changes look great.

obi1kenobi · 2023-08-02T15:38:43Z


trustfall_core/src/interpreter/execution.rs

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

obi1kenobi

Just some nitpicks to make the Rust more idiomatic, otherwise this is good to go.


u9g · 2023-08-05T04:36:29Z

trustfall_core/src/interpreter/execution.rs

+                && tagged_fold_count.kind == FoldSpecificFieldKind::Count
+        })
+    });
+    let safe_to_skip_part_of_fold =


is safe_to_skip_part_of_fold too non-specific as to which part we can skip?

The name isn't great. But see the comment above on an idea how we can skip it entirely.

Meta: try naming things for what they represent and not for what should be done with them. Right now, you tend to give a lot of imperative names like that, but those tend to age poorly because they might end up being used differently than originally intended. Names that describe what the value represents, not how it's going to be used, will age much better. Try thinking of some names like that and maybe put a few ideas in reply if you want feedback?

Right. For a better name, I think fold_values_not_observed or fold_values_ignored would suffice, since that's what this boolean represents: the fold values are not observed, neither the final count of values nor the individual values by @fold'ing on them.

obi1kenobi · 2023-08-05T14:42:46Z

trustfall_core/src/interpreter/execution.rs

+            // This optimization is only valid if we know that every
+            // filter applied to the folded element count is a comparison
+            // that we can determine the number of elements needed to satisfy
+            // the filter.


this portion of this comment doesn't really make sense to me:

a comparison that we can determine the number of elements needed to satisfy the filter.

To simplify, you can cut "this optimization is only valid if" and replace it with "every filter on the fold count must ..."

But also, I'm not sure this comment is at the right level of abstraction. This comment makes a general statement about "this optimization" where the optimization is never defined — the function's docs say "further optimizations." It doesn't say anything about this match branch, so it would be hard for the reader to connect all the clues to figure out what's going on.

Hm, you're right. Maybe something like "If we find a filter that we can't partially evaluate, fail early." would be a better comment.

Much better! Small nit: "partial evaluation" is also the term for an unrelated idea, which might confuse the reader. Perhaps "filter that would require evaluating the fold past its minimum count" or something?

I don't love my suggestion, perhaps you can come up with something even better.

I see what you're saying. "If we find a filter that requires us to evaluate the entire fold, fail early." is better I think, since it doesn't require knowing what we are partially evaluating. Although this gives the mistaken impression that every fold count filter that gets here will need to be fully evaluated, which isn't necessarily the case. Another try would be "If we don't know how many elements of the fold are required to satisfy this filter, fail early."

The latter is good, I like that. "Requires us to evaluate the entire fold" is technically incorrect as you point out.

Okay great, updated to that
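As an illustration of the match branch this comment describes, here is a rough sketch of deriving a lower bound from the filters on the fold count. It assumes a simplified, hypothetical `CountFilter` enum and function name; the interpreter's real filter representation and the PR's actual helper are not shown in this thread.

```rust
/// Hypothetical, simplified model of a filter on the fold count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CountFilter {
    GreaterThan(usize),
    GreaterThanOrEqual(usize),
    LessThan(usize),
    LessThanOrEqual(usize),
}

/// Returns how many fold elements are enough to decide every filter,
/// or `None` if the fold has to be expanded without a limit.
fn min_fold_count_limit(filters: &[CountFilter]) -> Option<usize> {
    let mut limit: Option<usize> = None;
    for filter in filters {
        let needed = match *filter {
            // `count > n` is known to hold once n + 1 elements have been seen.
            CountFilter::GreaterThan(n) => n.checked_add(1)?,
            // `count >= n` is known to hold once n elements have been seen.
            CountFilter::GreaterThanOrEqual(n) => n,
            // If we don't know how many elements of the fold are required
            // to satisfy this filter, fail early. This sketch treats upper
            // bounds that way, mirroring the choice discussed in this PR.
            CountFilter::LessThan(_) | CountFilter::LessThanOrEqual(_) => return None,
        };
        // Several lower bounds: the largest one decides all of them.
        limit = Some(limit.map_or(needed, |current| current.max(needed)));
    }
    limit
}
```

Under these assumptions, a lone `> 10` filter yields `Some(11)`, which agrees with the "11 elements" example in the code comment under review, and any upper-bound filter turns the result into `None`.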

trustfall_core/src/interpreter/execution.rs

obi1kenobi · 2023-08-05T14:48:59Z

trustfall_core/src/interpreter/execution.rs

+            // Queries that do not observe the fold count nor any fold contents may be able to
+            // be optimized by only partially expanding the fold, just enough to check any filters
+            // that may be applied to the fold count.
+            //
+            // For example, if `@filter(op: ">", value: ["$ten"])` is our only filter on the count
+            // of the fold, we can stop computing the rest of the fold after seeing we have 11 elements.
+            Some(min_fold_count_limit) if safe_to_skip_part_of_fold => {
+                iterator.take(*min_fold_count_limit).collect()
+            }


Do you think it's worth tweaking the implementation here a bit to simplify the logic?

Here's what I noticed: safe_to_skip_part_of_fold = false with min_fold_count_limit = Some(...) is the same as safe_to_skip_part_of_fold = true with min_fold_count_limit = None. Also, safe_to_skip_part_of_fold has a pretty awkward name, which is usually a sign that the concept it represents is muddy as well.

In that light, it seems weird that we're splitting the logic so that the caller of this function prepares the two values, but isn't allowed to combine them. Perhaps we can eliminate safe_to_skip_part_of_fold completely by making the caller turn the min_fold_count_limit to None if the fold expansion couldn't be short-circuited?

In that case, the bulk of this comment would be best moved outside into the caller as well, since that's where most of the logic is on whether this optimization can be applied. This function just ends up doing what it's told.

Right, I removed the `if safe_to_skip_part_of_fold` guard from collect_fold() and put it in the caller, combining the boolean checks. This way collect_fold() only has to act on the option; it doesn't have to deal with extra conditions when the option already encodes one.
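The shape of that refactor could look roughly like the sketch below. The signatures are assumptions made for illustration, and `effective_limit` and `fold_is_observed` are hypothetical names that do not come from the PR.

```rust
/// Sketch of the simplified function: it only acts on the `Option` it is given.
fn collect_fold<T>(
    iterator: impl Iterator<Item = T>,
    min_fold_count_limit: Option<usize>,
) -> Vec<T> {
    match min_fold_count_limit {
        // Stop expanding the fold once enough elements have been seen.
        Some(limit) => iterator.take(limit).collect(),
        // No usable limit: expand the whole fold.
        None => iterator.collect(),
    }
}

/// Hypothetical caller-side step: keep the limit only when nothing observes
/// the fold's contents or its count; otherwise turn it into `None`.
fn effective_limit(limit: Option<usize>, fold_is_observed: bool) -> Option<usize> {
    limit.filter(|_| !fold_is_observed)
}
```

For example, `collect_fold(1..=100, effective_limit(Some(11), false))` would collect 11 elements, while passing `true` for `fold_is_observed` would collect all 100. Folding the boolean into the `Option` keeps the decision in one place, so the two inputs can no longer disagree.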

obi1kenobi · 2023-08-05T14:51:59Z


add min_fold_count_limit optimization

2922874

u9g commented Aug 2, 2023

obi1kenobi reviewed Aug 2, 2023

u9g and others added 5 commits August 3, 2023 21:38

Merge branch 'main' into min_fold_count_limit

6233217

use usize_from_field_value in get_min_fold_count_limit

4d5927a

Update trustfall_core/src/interpreter/execution.rs

8d8b871

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

Update trustfall_core/src/interpreter/execution.rs

a5574af

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

Improve comment about the optimization

f960c05

u9g mentioned this pull request Aug 4, 2023

Ability to @tag properties inside a @fold, then use the tags outside that @fold #341

Open

u9g added 4 commits August 4, 2023 09:47

Merge branch 'main' into min_fold_count_limit

d197c10

disable optimization when we have a tag on the count of folded elements

acd55be

rewrite comment again

22af47d

improve comments

735684a

obi1kenobi reviewed Aug 4, 2023

u9g and others added 10 commits August 4, 2023 23:15

Merge branch 'main' into min_fold_count_limit

c8a6e23

use values() over iter()

9b7fe6a

improve variable names

a68060f

Update trustfall_core/src/interpreter/execution.rs

78a49bb

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>

Merge branch 'min_fold_count_limit' of https://github.com/u9g/trustfall…

62d776e

… into min_fold_count_limit

use .any

9e6545b

combine booleans to make safe_to_skip_part_of_fold arg

625392d

Merge branch 'main' into min_fold_count_limit

2318846

remove comments that point out something obvious and rely on implem

3e2caa1

add comment and return None if we try getting the lowerbound of a fil…

17ff414

…ter but can't

u9g commented Aug 5, 2023

u9g mentioned this pull request Aug 5, 2023

Add a test for ignoring necessary filters #431

Merged

obi1kenobi reviewed Aug 5, 2023

u9g added 2 commits August 5, 2023 11:37

get rid of safe_to_skip_part_of_fold and move around comments

8814461

improve comment for why we return None

7e6354a

obi1kenobi approved these changes Aug 5, 2023

obi1kenobi added labels Aug 5, 2023: A-adapter (Area: plugging data sources into the interpreter), C-enhancement (Category: raise the bar on expectations), R-relnotes (Release: document this in the release notes of the next release)

obi1kenobi merged commit 3fd0c1c into obi1kenobi:main Aug 5, 2023
18 checks passed

u9g deleted the min_fold_count_limit branch August 5, 2023 19:28
