-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache repeated predicate evaluations during triggers #713
Conversation
Only do this for eq predicates and avoid expressions with functions. Allowing this for any kind of expression causes tests to fail. The caching is also limited to triggerable evalutions so that it doesn't kick in other cases like nodeset calculations for selects. Caching during these cases also causes problems (probably because this cache isn't being cleared in those cases).
@@ -345,4 +345,8 @@ private static FormIndex getRepeatGroupIndex(FormIndex index, FormDef formDef) { | |||
} | |||
} | |||
} | |||
|
|||
public void disablePredicateCaching() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will let Collect (or any other client) present this as an opt-in feature.
/** | ||
* Returns true if this expression is not idempotent with respect to the current state of the form. | ||
*/ | ||
public abstract boolean isNotIdempotent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally find methods that are inherently negated difficult to reason about. Any reason not to make this isIdempotent
? It would avoid the awkward double negation at https://github.com/getodk/javarosa/pull/713/files#diff-ff9982ce5c3c5b5119956e3bd9a20d8934090f9e6be0affad5c1598d66f72072R43
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree with this. One thing I realized looking at this again is that isNotIdempotent()
is often "faster" to computer for non-idempotent functions because we don't need to traverse the whole expression - we just need to find something that isn't idempotent. I'll make the change because I don't think that's a big enough deal to preserve what definitely feels like a worse API, but I wanted to make sure I'd brought that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I hadn’t considered that. Agreed it’s not likely to be significant in this context but really good to at least notice.
I'm satisfied that I understand this and it's low risk/high reward for the single-predicate case. I don't have a good understanding of the multiple predicate case and I'd like to spend a little more time tracing it. But that doesn't need to happen immediately. How about getting this into a Collect beta ASAP (after you've considered my suggested change to |
Failing: @Test
public void calculatesSupportMultiplePredicates() throws Exception {
Scenario scenario = Scenario.init("Some form", html(
head(
title("Some form"),
model(
mainInstance(t("data id=\"some-form\"",
t("calc"),
t("calc2"),
t("input")
)),
instance("instance",
t("item",
t("name", "Bob Smith"),
t("yob", "1966"),
t("child",
t("name", "Sally Smith"),
t("yob", "1988")
),
t("child",
t("name", "Kwame Smith"),
t("yob", "1990"))
),
t("item",
t("name", "Hu Xao"),
t("yob", "1972"),
t("child",
t("name", "Foo Bar"),
t("yob", "1988")
),
t("child",
t("name", "Foo2 Bar"),
t("yob", "2008")
)
),
t("item",
t("name", "Baz Quux"),
t("yob", "1968"),
t("child",
t("name", "Baz2 Quux"),
t("yob", "1988")
),
t("child",
t("name", "Baz3 Quux"),
t("yob", "1988")
)
)
),
bind("/data/calc").type("string")
.calculate("count(instance('instance')/root/item[yob < 1970]/child[yob = 1988])"),
bind("/data/calc2").type("string")
.calculate("count(instance('instance')/root/item[yob < 1970]/child[yob = 1990])"),
bind("/data/input").type("string")
)
),
body(input("/data/input"))
));
assertThat(scenario.answerOf("/data/calc").getValue(), equalTo(3));
assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(1));
} |
There's an issue if the same predicate is used at different predicate indexes on the same ref. I was able to resolve this by adding the predicate index to the cache key. Failing test: @Test
public void calculatesSupportMultiplePredicatesInOnePartOfPath() throws Exception {
Scenario scenario = Scenario.init("Some form", html(
head(
title("Some form"),
model(
mainInstance(t("data id=\"some-form\"",
t("calc"),
t("calc2"),
t("input")
)),
instance("instance",
t("item",
t("value", "A"),
t("count", "2"),
t("id", "A2")
),
t("item",
t("value", "A"),
t("count", "3"),
t("id", "A3")
),
t("item",
t("value", "B"),
t("count", "2"),
t("id", "B2")
)
),
bind("/data/calc").type("string")
.calculate("instance('instance')/root/item[value = 'A'][count = /data/input]/id"),
bind("/data/calc2").type("string")
.calculate("count(instance('instance')/root/item[count = /data/input])"),
bind("/data/input").type("string")
)
),
body(input("/data/input"))
));
scenario.answer("/data/input", "3");
assertThat(scenario.answerOf("/data/calc").getValue(), equalTo("A3"));
assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(1));
scenario.answer("/data/input", "2");
assertThat(scenario.answerOf("/data/calc").getValue(), equalTo("A2"));
assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(2));
scenario.answer("/data/input", "7");
assertThat(scenario.answerOf("/data/calc"), nullValue());
assertThat(scenario.answerOf("/data/calc2").getValue(), equalTo(0));
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've identified a bug but merging anyway so it can get exercised more broadly in Collect.
Currently this caching is in place for expressions in the primary instance. Unlike secondary instances, the primary instance is mutable. Because this caching is only in place within a single evaluation cascade (all triggerables triggered by one trigger), I can't currently come up with a form that would cause a problem. If any trigger is included in multiple predicates, that trigger will go before all expressions with the predicate in the cascade. |
Work towards #689
These changes improve the performance of calling
FormDef#setValue
for cases when triggerables have repeated predicates. For instance, the example form in the issue would have to evaluate 500k XPath expressions when a question is answered to compute 10calculate
s that share the same predicate (but just access different children of the resulting nodes). With the changes in this PR, this drops 50k evaluations as we cache the initial predicate evaluation and then reuse it. Like in the example form, this will be beneficial to any form that has calculates that look up items on secondary instances.The following predicate expressions are not cached (due to the complexity involved in supporting them):* Non eq expressions* Eq expressions where either side is a function callIt's likely that instead of this set of limitations, we really just want to not cache expressions that contain functions that's return value is not idempotent with respect to the nodes the predicate is filtering or the node that the expression is being evaluated from. This would be functions likenow()
andrandom()
. We'll need some example forms and test scenarios to tease that out further.EDIT: We've decided to allow any kind of expression other than those containing non-string functions. We can probably add support for those down the line, but effort we'll need to be put in to test them out.
I've also added the ability to disable caching from
FormEntryController
. This will let clients (like Collect) make this an opt-in/out feature while we get more confident with it.TODO:
mod
for example to alternate values between instances. But we don't think these will cause problems because each instance has its own evaluation context.