Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize form download with entities #6372

Merged
merged 8 commits into from
Aug 29, 2024

Conversation

seadowg
Copy link
Member

@seadowg seadowg commented Aug 23, 2024

Closes #6344

This drops the benchmark target for first time entity form download to 25 seconds. As part of this work, I also fixed a bug related to special characters in property names.

Why is this the best possible solution? Were any other approaches considered?

I just needed to make a few tweaks to get down to 25 seconds. I chose this number due to the results of the new SearchBenchmarkTest which measure the performance of forms using search(). We'd expect the initial form load for search() forms and the initial download for entities forms to be roughly the same as they both insert the same number of rows into a SQLite DB.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This again changes the entity DB structure, so the app should be uninstalled before testing. The main changes here are all related to entity form download so that's the core area to focus on when testing.

Before submitting this PR, please make sure you have:

  • added or modified tests for any new or changed behavior
  • run ./gradlew connectedAndroidTest (or ./gradlew testLab) and confirmed all checks still pass
  • added a comment above any new strings describing it for translators
  • added any new strings with date formatting to DateFormatsTest
  • verified that any code or assets from external sources are properly credited in comments and/or in the about file.
  • verified that any new UI elements use theme colors. UI Components Style guidelines

@seadowg seadowg marked this pull request as ready for review August 26, 2024 14:32
@seadowg seadowg mentioned this pull request Aug 26, 2024
6 tasks
@lognaturel
Copy link
Member

lognaturel commented Aug 27, 2024

A bit of a random thought -- could we put the form defs in the okhttp cache instead of having to set a real server URL and actually download a file?

I think I've mentioned this before but I suspect that the search/pulldata import is artificially slow because of the UI feedback it gives every 100 rows.

I tried to run the search() benchmark but always get an exception from it.clickOnForm("100k Entities Filter search()") saying that form can't be found. So weird, it's there!

@seadowg
Copy link
Member Author

seadowg commented Aug 27, 2024

I think I've mentioned this before but I suspect that the search/pulldata import is artificially slow because of the UI feedback it gives every 100 rows.

Interesting! The progress reporting happens on another thread, so I'd be surprised if there's any major impact there. I just experimented with removing progress reporting, and it looks like it gains me about a second in SearchBenchmarkTest (from 17 to 16), but that's definitely within the "margin of error" for the fidelity we're looking at here (we're rounding down to the nearest second in these measurements).

I tried to run the search() benchmark but always get an exception from it.clickOnForm("100k Entities Filter search()") saying that form can't be found. So weird, it's there!

I'm starting to think your device is cursed! Let's have a look at it together later as I'd really like to get numbers from it.

@lognaturel
Copy link
Member

The progress reporting happens on another thread

Oh good, yes, I probably should have just looked at the implementation!

I manually timed the search() form on A13 because we couldn't figure out why the test isn't working:
First open: 45s
Subsequent open: 2s
Filter: 3s

This branch's EntitiesBenchmarkTest on A13:
Downloading form with http cache: 74s
Downloading form second time with http cache: 175s
Loading form first time: 3s
Loading form second time: 2s

master's EntitiesBenchmarkTest on A13:
Downloading form with http cache: 73s
Downloading form second time with http cache: 179s
Loading form first time: 3s
Loading form second time: 2s
Filtering select: 5s

I previously had several passes on master so not sure what has changed. It's slow because of excessive garbage collection so maybe Collect is being allocated less memory because of other things going on with the device.

Some possible next steps:

  • Use the 100k narrow in benchmarks instead of this wide one. It's more realistic and likely means we can have reasonable targets for the A13/phones with 3Gb of RAM.
  • Say we target devices with 4Gb of RAM for 100k Entities+

@lognaturel
Copy link
Member

lognaturel commented Aug 27, 2024

I think that with a device that is as RAM constrained as the A13, the overhead of the testing infrastructure is too much. One approach we could take is to do informal manual timing for it. If I run Collect and do the first time processing manually on this branch, it can take as little as 23 seconds. I got several runs at ~23s but now I'm back to being consistently at ~90s. This is so frustrating! It also takes that long with the narrow file.

I did some brief profiling and there's so much String instantiation. Maybe there's some way to reduce that somehow? Could there be cases where new Strings are built but references could be shared? I see in particular that creating all the branchId uuids and converting them to Strings takes up a lot of memory. It feels too bad because most Entities probably won't even ever need that uuid. Probably not worth getting distracted from the more important problems of duplicate downloads.

Let's focus on some of the 4Gb devices we have for now. Instead of having me do the benchmark check with the A13 at release time, would it be reasonable for @getodk/testers do it as part of regression testing with one of their 4Gb devices for now?

@seadowg
Copy link
Member Author

seadowg commented Aug 28, 2024

Let's focus on some of the 4Gb devices we have for now. Instead of having me do the benchmark check with the A13 at release time, would it be reasonable for @getodk/testers do it as part of regression testing with one of their 4Gb devices for now?

Yeah, it's feeling like the A13 is a bit of a nightmare to run tests on. We should give it a go manually with a self-signed released build at some point, as it's likely that debuggable builds and the extra overhead of the test runner aren't helping. I'll coordinate a target phone with @getodk/testers.

Use the 100k narrow in benchmarks instead of this wide one. It's more realistic and likely means we can have reasonable targets for the A13/phones with 3Gb of RAM.

I'm not sure that I agree. Looking at the entity lists, "entities 100k" only has 7 properties which does not feel unrealistic to me whereas the "narrow" only has two. It's obviously hard to make a call about what's going to be "normal" at the moment, but basing our performance on entities with just two properties feels risky to me. I'm thinking in line with your suggestion above, we just focus on wider 4GB performance for now so we can say something like "we suggest using a device with 4GB or more of RAM for entity lists of 100k with a few properties (up to 7)".

@lognaturel
Copy link
Member

"entities 100k" only has 7 properties which does not feel unrealistic to me whereas the "narrow" only has two.

The thing that's not very realistic is that the 7 properties are uuids. This means they're each 36 characters and have no overlap between values. I agree that 7+ properties is absolutely realistic but my experience so far is that many of the values are short. I believe that there isn't much performance difference between 10 properties with 72 characters between them vs. 2 properties with 72 characters between them, does that sound right?

The other unusual thing about uuid values is that there's absolutely no overlap between them. In typical usage, you'd see things like columns that represent a value from a select -- a city, grade, etc. The duplicate strings across the entity list would be interned and use up way less memory.

@seadowg
Copy link
Member Author

seadowg commented Aug 28, 2024

The thing that's not very realistic is that the 7 properties are uuids. This means they're each 36 characters and have no overlap between values. I agree that 7+ properties is absolutely realistic but my experience so far is that many of the values are short. I believe that there isn't much performance difference between 10 properties with 72 characters between them vs. 2 properties with 72 characters between them, does that sound right?

Ooof yeah that makes sense. Let's maybe look at improving the tests with a different form in a follow up PR then. I like the idea of 5-7 entities, but with a more normal scale as you suggest.

@lognaturel
Copy link
Member

Pixel 3a on this branch:

Downloading form with http cache: 38s
Downloading form second time with http cache: 78s
Loading form first time: 1s
Loading form second time: 1s
Filtering select: 1s

search() benchmark:

Loading form first time: 20s
Loading form second time: 1s
Filtering select: 0s

@seadowg
Copy link
Member Author

seadowg commented Aug 29, 2024

Pixel 3a on this branch:

Would like to bring that first download time down (although I think it's still with an acceptable range), but I think I'll wait to see where we're at once we merge this and start playing with #6376 as that makes optimizations to both initial and subsequent downloads. Just waiting for review on this one @grzesiek2010.


/**
* Prevents doing repeated [Cursor.getColumnIndex] lookups and also works around the lack of
* support for column names including a "." there (due to the mysterious bug 903852).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a full link to that bug? It will be easier to find,

Copy link
Member Author

@seadowg seadowg Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no public reference sadly! The best I can do is the @see for the comment in the Android source.

@grzesiek2010 grzesiek2010 merged commit f38ba7e into getodk:master Aug 29, 2024
6 checks passed
@seadowg seadowg deleted the optimize-download branch August 29, 2024 19:45
@dbemke
Copy link

dbemke commented Sep 2, 2024

@seadowg The crash from #6344 occurs - the difference is that now after scanning the QR code the form is in the list; the crash occurs after refreshing the list of blank forms (both cases mentioned in the issue). Should I reopen the issue?

@seadowg
Copy link
Member Author

seadowg commented Sep 2, 2024

@dbemke Yup! I'm guessing that it got fixed for creating the list, but not updating it.

@dbemke
Copy link

dbemke commented Sep 4, 2024

Tested with success!

Verified on a device with Android 10

Verified cases:

@WKobus
Copy link

WKobus commented Sep 4, 2024

Tested with Success

Verified on device with Android 14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQlite syntax error- crash after refreshing list of blank forms containg entity form with property with "."
5 participants