Cache repeated predicate evaluations during triggers #713

seadowg · 2023-04-05T13:36:31Z

Work towards #689

These changes improve the performance of calling FormDef#setValue for cases when triggerables have repeated predicates. For instance, the example form in the issue would have to evaluate 500k XPath expressions when a question is answered to compute 10 calculates that share the same predicate (but just access different children of the resulting nodes). With the changes in this PR, this drops to 50k evaluations as we cache the initial predicate evaluation and then reuse it. Like in the example form, this will be beneficial to any form that has calculates that look up items on secondary instances.
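
To picture the saving described above, here is a minimal sketch of memoizing a filtered nodeset by reference and predicate text. Every name in it (`PredicateCacheSketch`, `filter`, the string key) is hypothetical and chosen for illustration; it is not JavaRosa's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; these names are illustrative and are not JavaRosa's.
class PredicateCacheSketch {
    // Assumed key shape: nodeset reference plus the predicate's text.
    private final Map<String, List<Map<String, String>>> cache = new HashMap<>();
    int predicateEvaluations = 0; // counts per-node predicate checks

    List<Map<String, String>> filter(String ref, String predicateText,
                                     List<Map<String, String>> nodes,
                                     Predicate<Map<String, String>> predicate) {
        String key = ref + "[" + predicateText + "]";
        // Only the first request for a given key walks the nodes.
        return cache.computeIfAbsent(key, k -> {
            List<Map<String, String>> matches = new ArrayList<>();
            for (Map<String, String> node : nodes) {
                predicateEvaluations++;
                if (predicate.test(node)) {
                    matches.add(node);
                }
            }
            return matches;
        });
    }
}

class PredicateCacheDemo {
    public static void main(String[] args) {
        List<Map<String, String>> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add(Map.of("id", String.valueOf(i), "name", "item " + i));
        }
        PredicateCacheSketch cache = new PredicateCacheSketch();
        // Ten "calculates" share one predicate and would each read a different child.
        for (int calc = 0; calc < 10; calc++) {
            cache.filter("item", "id = 42", items, n -> n.get("id").equals("42"));
        }
        System.out.println("evaluations: " + cache.predicateEvaluations);
    }
}
```

In this toy run the counter ends at 1000 rather than 10000, the same tenfold reduction as the 500k to 50k figure above.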

~~The following predicate expressions are not cached (due to the complexity involved in supporting them):~~

* Non eq expressions
* Eq expressions where either side is a function call

It's likely that instead of this set of limitations, we really just want to not cache expressions that contain functions whose return value is not idempotent with respect to the nodes the predicate is filtering or the node that the expression is being evaluated from. This would be functions like now() and random(). We'll need some example forms and test scenarios to tease that out further.

EDIT: We've decided to allow any kind of expression other than those containing non-string functions. We can probably add support for those down the line, but effort will need to be put in to test them out.
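
As a rough illustration of that rule, the sketch below walks a tiny expression tree and treats a function call as cacheable only if its name is on an allow-list of string functions and all of its arguments are themselves cacheable. The class names and the contents of the allow-list are assumptions made for this example, not the set JavaRosa actually uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical expression tree; not JavaRosa's XPathExpression hierarchy.
interface SketchExpr {
    boolean isIdempotent();
}

class SketchLiteral implements SketchExpr {
    final String value;
    SketchLiteral(String value) { this.value = value; }
    public boolean isIdempotent() { return true; }
}

class SketchPath implements SketchExpr {
    final String path;
    SketchPath(String path) { this.path = path; }
    public boolean isIdempotent() { return true; }
}

class SketchFunc implements SketchExpr {
    // Assumed allow-list of string functions, for illustration only.
    static final Set<String> STRING_FUNCTIONS = Set.of("concat", "substr", "string-length");
    final String name;
    final List<SketchExpr> args;

    SketchFunc(String name, SketchExpr... args) {
        this.name = name;
        this.args = Arrays.asList(args);
    }

    public boolean isIdempotent() {
        // The function itself must be allowed and so must every argument.
        return STRING_FUNCTIONS.contains(name)
            && args.stream().allMatch(SketchExpr::isIdempotent);
    }
}

class SketchEq implements SketchExpr {
    final SketchExpr left;
    final SketchExpr right;
    SketchEq(SketchExpr left, SketchExpr right) { this.left = left; this.right = right; }
    public boolean isIdempotent() { return left.isIdempotent() && right.isIdempotent(); }
}

class IdempotenceDemo {
    public static void main(String[] args) {
        SketchExpr cacheable = new SketchEq(new SketchPath("name"),
            new SketchFunc("concat", new SketchLiteral("a"), new SketchLiteral("b")));
        SketchExpr notCacheable = new SketchFunc("concat", new SketchFunc("now"));
        System.out.println(cacheable.isIdempotent() + " " + notCacheable.isIdempotent());
    }
}
```

The second expression is rejected even though `concat` is a string function, because one of its arguments is not on the list.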

I've also added the ability to disable caching from FormEntryController. This will let clients (like Collect) make this an opt-in/out feature while we get more confident with it.
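
One way such a switch could work is by swapping the caching strategy for a pass-through one. The sketch below borrows the method name `disablePredicateCaching` from this PR, but the body and all other types are assumptions for illustration and do not reflect how FormEntryController implements it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical strategy interface; not part of JavaRosa.
interface FilterStrategySketch {
    List<String> filter(String key, Supplier<List<String>> evaluate);
}

class CachingFilterSketch implements FilterStrategySketch {
    private final Map<String, List<String>> cache = new HashMap<>();

    public List<String> filter(String key, Supplier<List<String>> evaluate) {
        // Evaluate once per key, then reuse the stored result.
        return cache.computeIfAbsent(key, k -> evaluate.get());
    }
}

class UncachedFilterSketch implements FilterStrategySketch {
    public List<String> filter(String key, Supplier<List<String>> evaluate) {
        // Always re-evaluate.
        return evaluate.get();
    }
}

class ControllerSketch {
    private FilterStrategySketch strategy = new CachingFilterSketch();

    // Same name as the method added in this PR; this body is an assumption.
    void disablePredicateCaching() {
        strategy = new UncachedFilterSketch();
    }

    List<String> filter(String key, Supplier<List<String>> evaluate) {
        return strategy.filter(key, evaluate);
    }
}

class ControllerSketchDemo {
    public static void main(String[] args) {
        int[] evaluations = {0};
        Supplier<List<String>> evaluate = () -> {
            evaluations[0]++;
            return List.of("A2");
        };
        ControllerSketch controller = new ControllerSketch();
        controller.filter("item[count = 2]", evaluate);
        controller.filter("item[count = 2]", evaluate); // served from the cache
        controller.disablePredicateCaching();
        controller.filter("item[count = 2]", evaluate); // evaluated again
        System.out.println("evaluations: " + evaluations[0]);
    }
}
```

A client that wants the old behavior calls the method once after creating the controller, which makes before and after comparisons straightforward.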

TODO:

* backfill test with 2 predicates
* is there any useful predicate in a repeat that can be written without a function?
  - some things with mod for example to alternate values between instances. But we don't think these will cause problems because each instance has its own evaluation context.
* Allow string functions
* Try on slow devices
  - Moto G5 from 23s to 3s on target form 🚀

Only do this for eq predicates and avoid expressions with functions. Allowing this for any kind of expression causes tests to fail. The caching is also limited to triggerable evaluations so that it doesn't kick in in other cases like nodeset calculations for selects. Caching during these cases also causes problems (probably because this cache isn't being cleared in those cases).


seadowg · 2023-04-07T12:36:35Z

src/main/java/org/javarosa/form/api/FormEntryController.java

@@ -345,4 +345,8 @@ private static FormIndex getRepeatGroupIndex(FormIndex index, FormDef formDef) {
            }
        }
    }
+
+    public void disablePredicateCaching() {


This will let Collect (or any other client) present this as an opt-in feature.

lognaturel · 2023-04-07T16:44:55Z

src/main/java/org/javarosa/xpath/expr/XPathExpression.java

+    /**
+     * Returns true if this expression is not idempotent with respect to the current state of the form.
+     */
+    public abstract boolean isNotIdempotent();


I personally find methods that are inherently negated difficult to reason about. Any reason not to make this isIdempotent? It would avoid the awkward double negation at https://github.com/getodk/javarosa/pull/713/files#diff-ff9982ce5c3c5b5119956e3bd9a20d8934090f9e6be0affad5c1598d66f72072R43

Totally agree with this. One thing I realized looking at this again is that isNotIdempotent() is often "faster" to compute for non-idempotent functions because we don't need to traverse the whole expression - we just need to find something that isn't idempotent. I'll make the change because I don't think that's a big enough deal to preserve what definitely feels like a worse API, but I wanted to make sure I'd brought that up.

Thanks, I hadn’t considered that. Agreed it’s not likely to be significant in this context but really good to at least notice.
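
To make the traversal point concrete: a check written with an early return stops at the first non-idempotent sub-expression, and only a fully idempotent expression forces a complete walk. A minimal sketch, assuming a hypothetical flattened list of function names rather than JavaRosa's expression tree:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; an expression is reduced to the function names it contains.
class TraversalSketch {
    int visited = 0; // how many sub-expressions were inspected

    boolean isIdempotent(List<String> functionNames) {
        for (String name : functionNames) {
            visited++;
            // now() and random() are the non-idempotent examples named in this PR.
            if (name.equals("now") || name.equals("random")) {
                return false; // nothing after this point needs to be inspected
            }
        }
        return true; // reached only after inspecting everything
    }
}

class TraversalDemo {
    public static void main(String[] args) {
        TraversalSketch mixed = new TraversalSketch();
        boolean mixedResult = mixed.isIdempotent(
            Arrays.asList("concat", "now", "substr", "string-length"));
        System.out.println(mixedResult + " after " + mixed.visited + " visits");

        TraversalSketch pure = new TraversalSketch();
        boolean pureResult = pure.isIdempotent(Arrays.asList("concat", "substr"));
        System.out.println(pureResult + " after " + pure.visited + " visits");
    }
}
```

In the first call the walk ends at `now`, the second of four entries; in the second call both entries have to be inspected before the answer is known.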

lognaturel · 2023-04-07T16:49:17Z

I'm satisfied that I understand this and it's low risk/high reward for the single-predicate case. I don't have a good understanding of the multiple predicate case and I'd like to spend a little more time tracing it. But that doesn't need to happen immediately. How about getting this into a Collect beta ASAP (after you've considered my suggested change to isIdempotent) as opt-out for now so we can easily compare before and after?

build.gradle

lognaturel · 2023-04-11T16:35:10Z

Failing:

    @Test
    public void calculatesSupportMultiplePredicates() throws Exception {
        Scenario scenario = Scenario.init("Some form", html(
            head(
                title("Some form"),
                model(
                    mainInstance(t("data id=\"some-form\"",
                        t("calc"),
                        t("calc2"),
                        t("input")
                    )),
                    instance("instance",
                        t("item",
                            t("name", "Bob Smith"),
                            t("yob", "1966"),
                            t("child",
                                t("name", "Sally Smith"),
                                t("yob", "1988")
                            ),
                            t("child",
                                t("name", "Kwame Smith"),
                                t("yob", "1990"))
                        ),
                        t("item",
                            t("name", "Hu Xao"),
                            t("yob", "1972"),
                            t("child",
                                t("name", "Foo Bar"),
                                t("yob", "1988")
                            ),
                            t("child",
                                t("name", "Foo2 Bar"),
                                t("yob", "2008")
                            )
                        ),
                        t("item",
                            t("name", "Baz Quux"),
                            t("yob", "1968"),
                            t("child",
                                t("name", "Baz2 Quux"),
                                t("yob", "1988")
                            ),
                            t("child",
                                t("name", "Baz3 Quux"),
                                t("yob", "1988")
                            )
                        )
                    ),
                    bind("/data/calc").type("string")
                        .calculate("count(instance('instance')/root/item[yob < 1970]/child[yob = 1988])"),
                    bind("/data/calc2").type("string")
                        .calculate("count(instance('instance')/root/item[yob < 1970]/child[yob = 1990])"),
                    bind("/data/input").type("string")
                )
            ),
            body(input("/data/input"))
        ));

        assertThat(scenario.answerOf("/data/calc").getValue(), equalTo(3));
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(1));
    }

src/test/java/org/javarosa/form/api/MultiplePredicateTest.java

src/main/java/org/javarosa/core/model/condition/EvaluationContext.java

lognaturel · 2023-04-12T17:18:23Z

There's an issue if the same predicate is used at different predicate indexes on the same ref. I was able to resolve this by adding the predicate index to the cache key. Failing test:

    @Test
    public void calculatesSupportMultiplePredicatesInOnePartOfPath() throws Exception {
        Scenario scenario = Scenario.init("Some form", html(
            head(
                title("Some form"),
                model(
                    mainInstance(t("data id=\"some-form\"",
                        t("calc"),
                        t("calc2"),
                        t("input")
                    )),
                    instance("instance",
                        t("item",
                            t("value", "A"),
                            t("count", "2"),
                            t("id", "A2")
                        ),
                        t("item",
                            t("value", "A"),
                            t("count", "3"),
                            t("id", "A3")
                        ),
                        t("item",
                            t("value", "B"),
                            t("count", "2"),
                            t("id", "B2")
                        )
                    ),
                    bind("/data/calc").type("string")
                        .calculate("instance('instance')/root/item[value = 'A'][count = /data/input]/id"),
                    bind("/data/calc2").type("string")
                        .calculate("count(instance('instance')/root/item[count = /data/input])"),
                    bind("/data/input").type("string")
                )
            ),
            body(input("/data/input"))
        ));

        scenario.answer("/data/input", "3");
        assertThat(scenario.answerOf("/data/calc").getValue(), equalTo("A3"));
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(1));

        scenario.answer("/data/input", "2");
        assertThat(scenario.answerOf("/data/calc").getValue(), equalTo("A2"));
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(2));

        scenario.answer("/data/input", "7");
        assertThat(scenario.answerOf("/data/calc"), nullValue());
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(0));
    }
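
The collision behind this failing test can be pictured with a toy cache whose key either omits or includes the predicate's position. The sketch below is an assumption about the shape of the problem, using hypothetical names and string keys; it reuses the three items from the test with `/data/input` set to 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; the key format is invented for illustration.
class IndexedKeySketch {
    // Each item is {value, count, id}, matching the secondary instance in the test.
    static final List<String[]> ITEMS = List.of(
        new String[]{"A", "2", "A2"},
        new String[]{"A", "3", "A3"},
        new String[]{"B", "2", "B2"});

    private final Map<String, List<String[]>> cache = new HashMap<>();
    private final boolean indexInKey;

    IndexedKeySketch(boolean indexInKey) {
        this.indexInKey = indexInKey;
    }

    List<String[]> filter(List<String[]> nodes, String ref, int predicateIndex,
                          String predicateText, Predicate<String[]> predicate) {
        String key = indexInKey
            ? ref + "#" + predicateIndex + "[" + predicateText + "]"
            : ref + "[" + predicateText + "]";
        return cache.computeIfAbsent(key, k -> {
            List<String[]> matches = new ArrayList<>();
            for (String[] node : nodes) {
                if (predicate.test(node)) {
                    matches.add(node);
                }
            }
            return matches;
        });
    }

    // Evaluates calc, then returns the value of calc2, within one shared cache.
    int countAfterChainedLookup(String input) {
        Predicate<String[]> isA = item -> item[0].equals("A");
        Predicate<String[]> countMatches = item -> item[1].equals(input);
        // calc: item[value = 'A'][count = /data/input]
        List<String[]> onlyA = filter(ITEMS, "item", 0, "value = 'A'", isA);
        filter(onlyA, "item", 1, "count = /data/input", countMatches);
        // calc2: count(item[count = /data/input])
        return filter(ITEMS, "item", 0, "count = /data/input", countMatches).size();
    }
}

class IndexedKeyDemo {
    public static void main(String[] args) {
        System.out.println(new IndexedKeySketch(false).countAfterChainedLookup("2"));
        System.out.println(new IndexedKeySketch(true).countAfterChainedLookup("2"));
    }
}
```

Without the index, calc2 picks up the result that calc stored for its second predicate (only A2) and reports 1; with the index in the key it is computed over all items and reports 2, as the test expects.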

lognaturel

I've identified a bug but merging anyway so it can get exercised more broadly in Collect.

lognaturel · 2023-04-12T18:14:53Z

Currently this caching is in place for expressions in the primary instance. Unlike secondary instances, the primary instance is mutable. Because this caching is only in place within a single evaluation cascade (all triggerables triggered by one trigger), I can't currently come up with a form that would cause a problem. If any trigger is included in multiple predicates, that trigger will go before all expressions with the predicate in the cascade.
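
The scoping argument above can be sketched as a cache that is created when a value is set and discarded once the dependent triggerables have run. All names here are hypothetical, and the "primary instance" is reduced to a map; this is only meant to show why a later answer cannot see a stale result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not JavaRosa's FormDef or EvaluationContext.
class CascadeScopeSketch {
    private final Map<String, String> primaryInstance = new HashMap<>();
    int instanceReads = 0; // reads that were not served from the cascade cache

    // Returns the value each dependent triggerable observed during this cascade.
    List<String> setValue(String ref, String value, int dependentTriggerables) {
        primaryInstance.put(ref, value); // the trigger is written before anything is evaluated
        Map<String, String> cascadeCache = new HashMap<>(); // lives for this cascade only
        List<String> observed = new ArrayList<>();
        for (int i = 0; i < dependentTriggerables; i++) {
            observed.add(cascadeCache.computeIfAbsent(ref, r -> {
                instanceReads++;
                return primaryInstance.get(r);
            }));
        }
        return observed;
    }
}

class CascadeScopeDemo {
    public static void main(String[] args) {
        CascadeScopeSketch form = new CascadeScopeSketch();
        System.out.println(form.setValue("/data/input", "3", 3));
        System.out.println(form.setValue("/data/input", "2", 3));
        System.out.println("reads: " + form.instanceReads);
    }
}
```

Each cascade reads the mutable instance once and shares that result among its triggerables; the second call starts with an empty cache, so it observes the new answer.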

seadowg added 13 commits April 3, 2023 14:31

Add ability to measure XPathEqExpr#eval calls

93bbf2d

Simplify EvaluationContext creation

c1d4eb1

Introduce interface for predicate caching

6bbcd94

Attach PredicateCache to EvaluationContext

25b6028

Fix multiple secondary instance case and add test for possible repeat…

9f0f960

…s regression

Make predicate caching optional

b94c562

Rename test

f02b27c

Add test to enforce caching

8b9d931

Spike out detecting functions in expressions instead of limiting to Eq

219ac9d

Allow string functions to be cached

6c0119e

Make sure function args are taken into account for idempotence

b7177a9

Revise predicate caching tests

80f8ff8

seadowg mentioned this pull request Apr 7, 2023

Introduce predicate caching as an opt-in setting getodk/collect#5546

Merged

3 tasks

Backfill test for multiple predicates

bd4f176

seadowg marked this pull request as ready for review April 7, 2023 12:35

seadowg requested a review from lognaturel April 7, 2023 12:35

seadowg commented Apr 7, 2023

lognaturel reviewed Apr 7, 2023

Replace isNotIdempotent with isIdempotent

504eddc

seadowg requested a review from lognaturel April 10, 2023 07:54

lognaturel reviewed Apr 11, 2023

build.gradle (outdated, resolved)

lognaturel reviewed Apr 11, 2023

src/test/java/org/javarosa/form/api/MultiplePredicateTest.java (outdated, resolved)

Bump version

434772c

seadowg force-pushed the repeated-expr branch from 5e15479 to 434772c Compare April 12, 2023 09:28

seadowg added 3 commits April 12, 2023 11:29

Fix multiple predicate case

e030db1

Fix different child names case

2944358

Remove need for extra param

6692608

seadowg requested a review from lognaturel April 12, 2023 10:09

lognaturel reviewed Apr 12, 2023

src/main/java/org/javarosa/core/model/condition/EvaluationContext.java (resolved)

lognaturel approved these changes Apr 12, 2023

lognaturel merged commit d7aa1d1 into getodk:master Apr 12, 2023

lognaturel mentioned this pull request Apr 12, 2023

Cascade-level filtered expression caching results in bad evaluations with multiple predicates #714

Closed

seadowg deleted the repeated-expr branch April 13, 2023 06:53
