Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache repeated predicate evaluations during triggers #713

Merged
merged 19 commits into from
Apr 12, 2023

Conversation

seadowg
Copy link
Member

@seadowg seadowg commented Apr 5, 2023

Work towards #689

These changes improve the performance of calling FormDef#setValue for cases when triggerables have repeated predicates. For instance, the example form in the issue would have to evaluate 500k XPath expressions when a question is answered to compute 10 calculates that share the same predicate (but just access different children of the resulting nodes). With the changes in this PR, this drops 50k evaluations as we cache the initial predicate evaluation and then reuse it. Like in the example form, this will be beneficial to any form that has calculates that look up items on secondary instances.

The following predicate expressions are not cached (due to the complexity involved in supporting them):

* Non eq expressions
* Eq expressions where either side is a function call

It's likely that instead of this set of limitations, we really just want to not cache expressions that contain functions that's return value is not idempotent with respect to the nodes the predicate is filtering or the node that the expression is being evaluated from. This would be functions like now() and random(). We'll need some example forms and test scenarios to tease that out further.

EDIT: We've decided to allow any kind of expression other than those containing non-string functions. We can probably add support for those down the line, but effort we'll need to be put in to test them out.

I've also added the ability to disable caching from FormEntryController. This will let clients (like Collect) make this an opt-in/out feature while we get more confident with it.

TODO:

  • backfill test with 2 predicates
  • is there any useful predicate in a repeat that can be written without a function?
    • some things with mod for example to alternate values between instances. But we don't think these will cause problems because each instance has its own evaluation context.
  • Allow string functions
  • Try on slow devices
    • Moto G5 from 23s to 3s on target form 🚀

@seadowg seadowg marked this pull request as ready for review April 7, 2023 12:35
@seadowg seadowg requested a review from lognaturel April 7, 2023 12:35
@@ -345,4 +345,8 @@ private static FormIndex getRepeatGroupIndex(FormIndex index, FormDef formDef) {
}
}
}

public void disablePredicateCaching() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will let Collect (or any other client) present this as an opt-in feature.

/**
* Returns true if this expression is not idempotent with respect to the current state of the form.
*/
public abstract boolean isNotIdempotent();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find methods that are inherently negated difficult to reason about. Any reason not to make this isIdempotent? It would avoid the awkward double negation at https://github.com/getodk/javarosa/pull/713/files#diff-ff9982ce5c3c5b5119956e3bd9a20d8934090f9e6be0affad5c1598d66f72072R43

Copy link
Member Author

@seadowg seadowg Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree with this. One thing I realized looking at this again is that isNotIdempotent() is often "faster" to computer for non-idempotent functions because we don't need to traverse the whole expression - we just need to find something that isn't idempotent. I'll make the change because I don't think that's a big enough deal to preserve what definitely feels like a worse API, but I wanted to make sure I'd brought that up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I hadn’t considered that. Agreed it’s not likely to be significant in this context but really good to at least notice.

@lognaturel
Copy link
Member

I'm satisfied that I understand this and it's low risk/high reward for the single-predicate case. I don't have a good understanding of the multiple predicate case and I'd like to spend a little more time tracing it. But that doesn't need to happen immediately. How about getting this into a Collect beta ASAP (after you've considered my suggested change to isIdempotent) as opt-out for now so we can easily compare before and after?

@seadowg seadowg requested a review from lognaturel April 10, 2023 07:54
build.gradle Outdated Show resolved Hide resolved
@lognaturel
Copy link
Member

lognaturel commented Apr 11, 2023

Failing:

    @Test
    public void calculatesSupportMultiplePredicates() throws Exception {
        Scenario scenario = Scenario.init("Some form", html(
            head(
                title("Some form"),
                model(
                    mainInstance(t("data id=\"some-form\"",
                        t("calc"),
                        t("calc2"),
                        t("input")
                    )),
                    instance("instance",
                        t("item",
                            t("name", "Bob Smith"),
                            t("yob", "1966"),
                            t("child",
                                t("name", "Sally Smith"),
                                t("yob", "1988")
                            ),
                            t("child",
                                t("name", "Kwame Smith"),
                                t("yob", "1990"))
                        ),
                        t("item",
                            t("name", "Hu Xao"),
                            t("yob", "1972"),
                            t("child",
                                t("name", "Foo Bar"),
                                t("yob", "1988")
                            ),
                            t("child",
                                t("name", "Foo2 Bar"),
                                t("yob", "2008")
                            )
                        ),
                        t("item",
                            t("name", "Baz Quux"),
                            t("yob", "1968"),
                            t("child",
                                t("name", "Baz2 Quux"),
                                t("yob", "1988")
                            ),
                            t("child",
                                t("name", "Baz3 Quux"),
                                t("yob", "1988")
                            )
                        )
                    ),
                    bind("/data/calc").type("string")
                        .calculate("count(instance('instance')/root/item[yob < 1970]/child[yob = 1988])"),
                    bind("/data/calc2").type("string")
                            .calculate("count(instance('instance')/root/item[yob < 1970]/child[yob = 1990])"),
                    bind("/data/input").type("string")
                )
            ),
            body(input("/data/input"))
        ));

        assertThat(scenario.answerOf("/data/calc").getValue(), equalTo(3));
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(1));
    }

@seadowg seadowg requested a review from lognaturel April 12, 2023 10:09
@lognaturel
Copy link
Member

lognaturel commented Apr 12, 2023

There's an issue if the same predicate is used at different predicate indexes on the same ref. I was able to resolve this by adding the predicate index to the cache key. Failing test:

    @Test
    public void calculatesSupportMultiplePredicatesInOnePartOfPath() throws Exception {
        Scenario scenario = Scenario.init("Some form", html(
            head(
                title("Some form"),
                model(
                    mainInstance(t("data id=\"some-form\"",
                        t("calc"),
                        t("calc2"),
                        t("input")
                    )),
                    instance("instance",
                        t("item",
                            t("value", "A"),
                            t("count", "2"),
                            t("id", "A2")
                        ),
                        t("item",
                            t("value", "A"),
                            t("count", "3"),
                            t("id", "A3")
                        ),
                        t("item",
                            t("value", "B"),
                            t("count", "2"),
                            t("id", "B2")
                        )
                    ),
                    bind("/data/calc").type("string")
                        .calculate("instance('instance')/root/item[value = 'A'][count = /data/input]/id"),
                    bind("/data/calc2").type("string")
                        .calculate("count(instance('instance')/root/item[count = /data/input])"),
                    bind("/data/input").type("string")
                )
            ),
            body(input("/data/input"))
        ));

        scenario.answer("/data/input", "3");
        assertThat(scenario.answerOf("/data/calc").getValue(), equalTo("A3"));
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(1));

        scenario.answer("/data/input", "2");
        assertThat(scenario.answerOf("/data/calc").getValue(), equalTo("A2"));
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(2));

        scenario.answer("/data/input", "7");
        assertThat(scenario.answerOf("/data/calc"), nullValue());
        assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(0));
    }

Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've identified a bug but merging anyway so it can get exercised more broadly in Collect.

@lognaturel
Copy link
Member

Currently this caching is in place for expressions in the primary instance. Unlike secondary instances, the primary instance is mutable. Because this caching is only in place within a single evaluation cascade (all triggerables triggered by one trigger), I can't currently come up with a form that would cause a problem. If any trigger is included in multiple predicates, that trigger will go before all expressions with the predicate in the cascade.

@seadowg seadowg deleted the repeated-expr branch April 13, 2023 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants