Optimize load times and memory footprints of form entry with datasets #5623

seadowg · 2023-06-02T08:52:47Z

~~Blocked by #5694~~

Optimize load times

When a form is loaded in Collect calculates are run as part of "initialization" (when FormDef#initialize is called). This means that the complexity of calculate expressions affects the for load time. Recent changes to JavaRosa have introduced lazy indexing and caching to some calculate expressions on secondary instances which lets them run faster after this initialization step, but all of these solutions are "in memory" so the speed-up doesn't persist between form entry sessions.

We want to make changes to Collect that result in faster subsequent "loads" (probably really initialization) for forms that contain the following restricted set of calculate expressions on external secondary instances that compare nodes in the instance to something else of the following form:

 <relative (to the nodes being filtered) path expression> <operator> <"absolute" expression>

The "relative" part must be a relative path expression, and the "absolute" part can be an absolute path expression, a string literal or a numeric literal.

To measure this, we should use timings on a known slow device (like @lognaturel's trusty test phone). We're aiming for speed-ups (for subsequent loads) of around the order of an O(n) to O(1) complexity here.

Decrease memory footprint

Currently, external secondary instances are loaded into memory with the rest of the form. This is potentially problematic for larger datasets (we often see them exceed hundreds of thousands of records). Along with speeding up load times, we want to reduce the memory footprint of form entry for forms like this (on subsequent loads). This again only needs to apply to forms with our above restricted set of calculate expressions.

To measure this, we should use Android Studio's profiler with any emulator and monitor the total memory footprint of a loaded form with a larger dataset (100,000+ records). We should measure the same form with and without the dataset so that we have a rough idea of the size of the dataset in memory. We are aiming to see the total memory footprint drop so that the dataset's footprint is virtually gone (in subsequent loads).

The text was updated successfully, but these errors were encountered:

seadowg · 2023-06-02T09:35:56Z

@lognaturel as described in the issue here, I've realized that our recent optimization work didn't touch external secondary instances when used as itemsets. My thinking is that most forms of this kind would include a select on the instance that's later used to calculate things in the form (the latter of which we're optimizing and planning to improve on here). I still think this work is worth doing, but it feels like we need to apply the "predicate filter chain" (or something similar) to itemsets in JavaRosa (and then expand on it in Collect) to really get the speed-ups and memory drops we want?

seadowg · 2023-06-02T13:05:32Z

as described in the issue here, I've realized that our recent optimization work didn't touch external secondary instances when used as itemsets.

First step to clearing "needs discussion" here should be proving/disproving this statement.

seadowg · 2023-06-16T13:31:52Z

Digging into the above question, it seems that we do have some level of caching applied to choice filters on selects. It is implemented in a completely different way though, the new caching only affects evaluations that go through TriggerableDag where as choice lists are computed on demand, and the evaluation is mostly owned by ItemsetBinding (it doesn't touch TriggerableDag). To illustrate this, here's one passing and one failing test:

Passing:

Scenario scenario = Scenario.init("Some form", html(
    head(
        title("Some form"),
        model(
            mainInstance(t("data id=\"some-form\"",
                t("choice"),
                t("select")
            )),
            instance("instance",
                item("a", "A"),
                item("b", "B")
            ),
            bind("/data/choice").type("string"),
            bind("/data/select").type("string")
        )
    ),
    body(
        input("/data/choice"),
        select1Dynamic("/data/select", "instance('instance')/root/item[value = /data/choice]")
    )
));

int evaluations = Measure.withMeasure(asList("PredicateEvaluation", "IndexEvaluation"), () -> {
    scenario.answer("/data/choice", "a");

    scenario.choicesOf("/data/select");
    scenario.choicesOf("/data/select");
});

// Check that we do less than (size of secondary instance) * (number of choice lookups)
assertThat(evaluations, lessThan(4));

Failing:

Scenario scenario = Scenario.init("Some form", html(
    head(
        title("Some form"),
        model(
            mainInstance(t("data id=\"some-form\"",
                t("choice"),
                t("select")
            )),
            instance("instance",
                item("a", "A"),
                item("b", "B")
            ),
            bind("/data/choice").type("string"),
            bind("/data/select").type("string")
        )
    ),
    body(
        input("/data/choice"),
        select1Dynamic("/data/select", "instance('instance')/root/item[value = /data/choice]")
    )
));

int evaluations = Measure.withMeasure(asList("PredicateEvaluation", "IndexEvaluation"), () -> {
    scenario.answer("/data/choice", "a");
    scenario.choicesOf("/data/select");

    scenario.answer("/data/choice", "b");
    scenario.choicesOf("/data/select");
});

// Check that we do less than (size of secondary instance) * (number of choice lookups)
assertThat(evaluations, lessThan(4));

The first test passes because we cache choice list evaluations so that repeated look-ups (with the exact same literal expression) don't need to do anything. This doesn't use our predicate filter concept, so there's currently no way for a client (like Collect) to hook into the choice list evaluation like there is for all other expressions. This isn't the end of the world, but it would make our goal of reducing the memory footprint moot as we'd always load the full instance into memory to calculate choices whether we optimize how other expressions work or not.

I think the next step is to spike on replacing the current caching implementation for choice lists with the same PredicateFilter chain that the TriggerableDag uses with an aim of getting the failing test also passing. That would mean we could update FormEntryController#addPredicateFilter to modify the filter chain used on choice list evaluations as well (enabling us to add a PredicateFilter that reduces the memory footprint).

seadowg · 2023-06-16T15:41:02Z

It does look like we can use our new PredicateFilter approach to get the second test passing (specifically using IndexPredicateFilter) without having to remove the existing caching. I just spiked this out and manually added the filter to the EvaluationContext that ItemsetBinding uses, but I'm pretty sure we'd be able to rework it so that TriggerableDag and ItemsetBinding share a chain of PredicateFilter instances that can be customized externally with FormEntryController#addPredicateFilter.

Now that's worked out, I've run into another issue: if we want to reduce the memory footprint as detailed above (so that sessions after the first only use the memory needed to load the same form without the external instance), we'll run into trouble due to the choice list caching in ItemsetBinding as it will keep the last evaluated choice list (for each question as far as I understand) in memory. For a select1 backed by an instance (without filtering), this would mean the whole instance would be retained in memory by ItemsetBinding (in cachedFilteredChoiceList) regardless of what we do while evaluating the choice list. I think our solution to this issue can get around this by reworking the existing ItemsetBinding caching implementation so that it's a PredicateFilter that's further down the chain than a theoretical one Collect adds to avoid loading anything into memory (discussed briefly in "Notes").

Regardless of the complexity involved in reworking the existing ItemsetBinding caching, it feels to me like the two goals ("Optimize load times" and "Decrease memory footprint") are feasible. Let me know if you agree/disagree @lognaturel.

seadowg · 2023-07-26T14:21:43Z

think the next step is to spike on replacing the current caching implementation for choice lists with the same PredicateFilter chain that the TriggerableDag uses with an aim of getting the failing test also passing. That would mean we could update FormEntryController#addPredicateFilter to modify the filter chain used on choice list evaluations as well (enabling us to add a PredicateFilter that reduces the memory footprint).

I've pulled this out to a new issue that's not blocked by the above discussion.

seadowg changed the title ~~Loading a form with an external secondary instance should be faster after the first time~~ Optimize load times and memory footprints of form entry with datasets Jun 2, 2023

seadowg added the needs discussion label Jun 2, 2023

seadowg added this to the v2023.3 milestone Jun 2, 2023

seadowg added the blocked label Jul 26, 2023

seadowg removed the needs discussion label Jul 27, 2023

seadowg modified the milestones: v2023.3, v2023.4 Aug 9, 2023

seadowg modified the milestones: v2023.4, v2024.1 Nov 2, 2023

seadowg removed the blocked label Jan 18, 2024

seadowg modified the milestones: v2024.2, Later Jan 18, 2024

seadowg mentioned this issue Feb 16, 2024

Local entity create #5923

Closed

6 tasks

seadowg mentioned this issue Mar 12, 2024

Entity follow up work #5991

Open

13 tasks

seadowg modified the milestones: Later, v2024.3 Mar 21, 2024

seadowg mentioned this issue Apr 3, 2024

Make sure that entity lists are consistent between follow up forms #6029

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize load times and memory footprints of form entry with datasets #5623

Optimize load times and memory footprints of form entry with datasets #5623

seadowg commented Jun 2, 2023 •

edited

seadowg commented Jun 2, 2023 •

edited

seadowg commented Jun 2, 2023

seadowg commented Jun 16, 2023

seadowg commented Jun 16, 2023

seadowg commented Jul 26, 2023 •

edited

Optimize load times and memory footprints of form entry with datasets #5623

Optimize load times and memory footprints of form entry with datasets #5623

Comments

seadowg commented Jun 2, 2023 • edited