Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize load times and memory footprints of form entry with datasets #5623

Open
Tracked by #5991
seadowg opened this issue Jun 2, 2023 · 5 comments
Open
Tracked by #5991
Milestone

Comments

@seadowg
Copy link
Member

seadowg commented Jun 2, 2023

Blocked by #5694

Optimize load times

When a form is loaded in Collect calculates are run as part of "initialization" (when FormDef#initialize is called). This means that the complexity of calculate expressions affects the for load time. Recent changes to JavaRosa have introduced lazy indexing and caching to some calculate expressions on secondary instances which lets them run faster after this initialization step, but all of these solutions are "in memory" so the speed-up doesn't persist between form entry sessions.

We want to make changes to Collect that result in faster subsequent "loads" (probably really initialization) for forms that contain the following restricted set of calculate expressions on external secondary instances that compare nodes in the instance to something else of the following form:

 <relative (to the nodes being filtered) path expression> <operator> <"absolute" expression>

The "relative" part must be a relative path expression, and the "absolute" part can be an absolute path expression, a string literal or a numeric literal.

To measure this, we should use timings on a known slow device (like @lognaturel's trusty test phone). We're aiming for speed-ups (for subsequent loads) of around the order of an O(n) to O(1) complexity here.

Decrease memory footprint

Currently, external secondary instances are loaded into memory with the rest of the form. This is potentially problematic for larger datasets (we often see them exceed hundreds of thousands of records). Along with speeding up load times, we want to reduce the memory footprint of form entry for forms like this (on subsequent loads). This again only needs to apply to forms with our above restricted set of calculate expressions.

To measure this, we should use Android Studio's profiler with any emulator and monitor the total memory footprint of a loaded form with a larger dataset (100,000+ records). We should measure the same form with and without the dataset so that we have a rough idea of the size of the dataset in memory. We are aiming to see the total memory footprint drop so that the dataset's footprint is virtually gone (in subsequent loads).

@seadowg seadowg changed the title Loading a form with an external secondary instance should be faster after the first time Optimize load times and memory footprints of form entry with datasets Jun 2, 2023
@seadowg
Copy link
Member Author

seadowg commented Jun 2, 2023

@lognaturel as described in the issue here, I've realized that our recent optimization work didn't touch external secondary instances when used as itemsets. My thinking is that most forms of this kind would include a select on the instance that's later used to calculate things in the form (the latter of which we're optimizing and planning to improve on here). I still think this work is worth doing, but it feels like we need to apply the "predicate filter chain" (or something similar) to itemsets in JavaRosa (and then expand on it in Collect) to really get the speed-ups and memory drops we want?

@seadowg seadowg added this to the v2023.3 milestone Jun 2, 2023
@seadowg
Copy link
Member Author

seadowg commented Jun 2, 2023

as described in the issue here, I've realized that our recent optimization work didn't touch external secondary instances when used as itemsets.

First step to clearing "needs discussion" here should be proving/disproving this statement.

@seadowg
Copy link
Member Author

seadowg commented Jun 16, 2023

Digging into the above question, it seems that we do have some level of caching applied to choice filters on selects. It is implemented in a completely different way though, the new caching only affects evaluations that go through TriggerableDag where as choice lists are computed on demand, and the evaluation is mostly owned by ItemsetBinding (it doesn't touch TriggerableDag). To illustrate this, here's one passing and one failing test:

Passing:

Scenario scenario = Scenario.init("Some form", html(
    head(
        title("Some form"),
        model(
            mainInstance(t("data id=\"some-form\"",
                t("choice"),
                t("select")
            )),
            instance("instance",
                item("a", "A"),
                item("b", "B")
            ),
            bind("/data/choice").type("string"),
            bind("/data/select").type("string")
        )
    ),
    body(
        input("/data/choice"),
        select1Dynamic("/data/select", "instance('instance')/root/item[value = /data/choice]")
    )
));

int evaluations = Measure.withMeasure(asList("PredicateEvaluation", "IndexEvaluation"), () -> {
    scenario.answer("/data/choice", "a");

    scenario.choicesOf("/data/select");
    scenario.choicesOf("/data/select");
});

// Check that we do less than (size of secondary instance) * (number of choice lookups)
assertThat(evaluations, lessThan(4));

Failing:

Scenario scenario = Scenario.init("Some form", html(
    head(
        title("Some form"),
        model(
            mainInstance(t("data id=\"some-form\"",
                t("choice"),
                t("select")
            )),
            instance("instance",
                item("a", "A"),
                item("b", "B")
            ),
            bind("/data/choice").type("string"),
            bind("/data/select").type("string")
        )
    ),
    body(
        input("/data/choice"),
        select1Dynamic("/data/select", "instance('instance')/root/item[value = /data/choice]")
    )
));

int evaluations = Measure.withMeasure(asList("PredicateEvaluation", "IndexEvaluation"), () -> {
    scenario.answer("/data/choice", "a");
    scenario.choicesOf("/data/select");

    scenario.answer("/data/choice", "b");
    scenario.choicesOf("/data/select");
});

// Check that we do less than (size of secondary instance) * (number of choice lookups)
assertThat(evaluations, lessThan(4));

The first test passes because we cache choice list evaluations so that repeated look-ups (with the exact same literal expression) don't need to do anything. This doesn't use our predicate filter concept, so there's currently no way for a client (like Collect) to hook into the choice list evaluation like there is for all other expressions. This isn't the end of the world, but it would make our goal of reducing the memory footprint moot as we'd always load the full instance into memory to calculate choices whether we optimize how other expressions work or not.

I think the next step is to spike on replacing the current caching implementation for choice lists with the same PredicateFilter chain that the TriggerableDag uses with an aim of getting the failing test also passing. That would mean we could update FormEntryController#addPredicateFilter to modify the filter chain used on choice list evaluations as well (enabling us to add a PredicateFilter that reduces the memory footprint).

@seadowg
Copy link
Member Author

seadowg commented Jun 16, 2023

It does look like we can use our new PredicateFilter approach to get the second test passing (specifically using IndexPredicateFilter) without having to remove the existing caching. I just spiked this out and manually added the filter to the EvaluationContext that ItemsetBinding uses, but I'm pretty sure we'd be able to rework it so that TriggerableDag and ItemsetBinding share a chain of PredicateFilter instances that can be customized externally with FormEntryController#addPredicateFilter.

Now that's worked out, I've run into another issue: if we want to reduce the memory footprint as detailed above (so that sessions after the first only use the memory needed to load the same form without the external instance), we'll run into trouble due to the choice list caching in ItemsetBinding as it will keep the last evaluated choice list (for each question as far as I understand) in memory. For a select1 backed by an instance (without filtering), this would mean the whole instance would be retained in memory by ItemsetBinding (in cachedFilteredChoiceList) regardless of what we do while evaluating the choice list. I think our solution to this issue can get around this by reworking the existing ItemsetBinding caching implementation so that it's a PredicateFilter that's further down the chain than a theoretical one Collect adds to avoid loading anything into memory (discussed briefly in "Notes").

Regardless of the complexity involved in reworking the existing ItemsetBinding caching, it feels to me like the two goals ("Optimize load times" and "Decrease memory footprint") are feasible. Let me know if you agree/disagree @lognaturel.

@seadowg
Copy link
Member Author

seadowg commented Jul 26, 2023

think the next step is to spike on replacing the current caching implementation for choice lists with the same PredicateFilter chain that the TriggerableDag uses with an aim of getting the failing test also passing. That would mean we could update FormEntryController#addPredicateFilter to modify the filter chain used on choice list evaluations as well (enabling us to add a PredicateFilter that reduces the memory footprint).

I've pulled this out to a new issue that's not blocked by the above discussion.

@seadowg seadowg modified the milestones: v2023.3, v2023.4 Aug 9, 2023
@seadowg seadowg modified the milestones: v2023.4, v2024.1 Nov 2, 2023
@seadowg seadowg removed the blocked label Jan 18, 2024
@seadowg seadowg modified the milestones: v2024.2, Later Jan 18, 2024
@seadowg seadowg mentioned this issue Feb 16, 2024
6 tasks
@seadowg seadowg mentioned this issue Mar 12, 2024
13 tasks
@seadowg seadowg modified the milestones: Later, v2024.3 Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: not ready
Development

No branches or pull requests

1 participant