Ambient cache initialization speed improvement #1959

Open
wants to merge 27 commits into
base: main

geolives-contact
Contributor

In order to solve #1815

Here is the matching design proposal : #1877

Christophe Brasseur and others added 18 commits October 31, 2023 16:38
…Resource / putInternal and getTile / getResource which support that

Fix for NOT NULL constraint violation for "compressed" field when inserting tile or resource
Changed SQL query for ambient cache initialization
…iles table when resource / tile is needed in an offline region download
…s tables (added to the upgrade script) + queries modified to manage LRU based on new tables
Fixed existing tests to comply with new database model (work in progress) + temporarily disabled some database merging tests which crashed + added new MigrateFromV6Schema test method
added v6.db in test/fixtures/offline_database
@geolives-contact
Contributor Author

The OfflineDatabase.CorruptDatabaseOnQuery still fails.

We would need to create a corrupted database in version 7 of the database model that opens without any error but whose first query triggers a corruption error.

(With the test database in version 6, simply instantiating the object triggers the corruption error when it tries to create the new tables for the model upgrade, which makes the test fail.)

@louwers
Collaborator

louwers commented Dec 18, 2023

How did you make the databases?

@louwers louwers added the enhancement New feature or request label Dec 18, 2023
@geolives-contact
Contributor Author

geolives-contact commented Dec 19, 2023

@louwers

For databases used for merging / sideloading tests (satellite_test.db, sideload_ambient.db, sideload_sat.db, and sideload_sat_multiple.db), I simply upgraded them with a SQLite database tool by running the upgrade script to model version 7.

For the v6.db database (which is used to test the upgrade from version 6 to version 7), I made a copy of v5.db and upgraded it with a SQLite database tool by running the upgrade script to model version 6.

I still haven't managed to create the upgraded corrupt-delayed.db database (a corrupted database in version 7 of the database model that opens without any error but whose first query triggers a corruption error) to make the OfflineDatabase.CorruptDatabaseOnQuery test succeed :-(


github-actions bot commented Dec 19, 2023

Bloaty Results (iOS) 🐋

Compared to main

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.1% +20.0Ki  +0.1% +16.0Ki    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results-ios/pr-1959-compared-to-main.txt

@geolives-contact
Contributor Author

CorruptDatabaseOnQuery test has been temporarily disabled as discussed with @louwers
Could you please proceed with merge? :-)
Thanks a lot.


github-actions bot commented Feb 16, 2024

Bloaty Results 🐋

Compared to main

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +51.8Ki  +0.1% +21.6Ki    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results/pr-1959-compared-to-main.txt

Compared to d387090 (legacy)

    FILE SIZE        VM SIZE    
 --------------  -------------- 
   +19% +22.2Mi  +401% +24.0Mi    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results/pr-1959-compared-to-legacy.txt

"SELECT length(data) FROM ambient_tiles WHERE url_template = ?1 AND pixel_ratio = ?2 AND x = ?3 AND y = ?4 "
"AND z = ?5");
if (selectAmbientTilesResult) {
// std::cout << "-------- HASTILE - FOUND IN AMBIENT_TILES\n";
Collaborator

There are commented-out print statements used for debugging throughout this file.

Please remove them or change them to (debug) logging.

@louwers
Collaborator

louwers commented Feb 29, 2024

I ran the offline_database.benchmark.cpp benchmarks:

$ python3 vendor/benchmark/tools/compare.py --no-color benchmarks ./mbgl-benchmark-runner-main ./mbgl-benchmark-runner --benchmark_filter='OfflineDatabase/.*'

Benchmark                                                     Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------
OfflineDatabase/InvalidateRegion                           -0.0228         -0.0226        144618        141326        144231        140973
OfflineDatabase/DeleteRegion                               +0.0586         +0.0588      22971082      24316111      22911370      24259681
OfflineDatabase/InsertTileRegion                           -0.0590         -0.0587        236233        222301        235708        221865
OfflineDatabase/InvalidateAmbientCache                     -0.5156         -0.5156         81160         39313         81146         39308
OfflineDatabase/ClearAmbientCache                          +0.0617         +0.0620      23023497      24444877      22966607      24389660
OfflineDatabase/InsertTileCache                            -0.0130         -0.0131        218526        215695        218123        215269
OfflineDatabase/InsertBigTileCache                         -0.0167         -0.0166       2157706       2121650       2157331       2121517
OfflineDatabase/GetTile                                    +0.0092         +0.0092        138014        139284        138001        139266
OfflineDatabase/AddTilesToFullDatabase                     -0.0043         -0.0043        166905        166193        166889        166178
OfflineDatabase/AddTilesToDisabledDatabase                 +0.0309         +0.0309           932           961           932           961
OfflineDatabase/GetTileFromDisabledDatabase                +0.0375         +0.0376           744           772           744           772
OfflineDatabase/ResizeDatabase                             +0.0141         +0.0144            11            11            11            11
OVERALL_GEOMEAN                                            -0.0516         -0.0515             0             0             0             0

No significant difference interestingly.
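
For reading the table above: the Time and CPU columns are relative changes between the two runs. The sketch below assumes they are computed as (new - old) / |old|, which matches the printed numbers; `relativeChange` is a hypothetical helper written for this note, not part of the benchmark tooling. Under that reading, -0.5156 for InvalidateAmbientCache corresponds to roughly a 52% reduction, while most other rows stay within a few percent.

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings, assumed to be (new - old) / |old|.
// A negative value means the new build took less time than the old one.
double relativeChange(double oldTime, double newTime) {
    assert(oldTime != 0.0); // a zero baseline has no meaningful relative change
    return (newTime - oldTime) / std::fabs(oldTime);
}
```

For example, `relativeChange(81160, 39313)` reproduces the InvalidateAmbientCache row when rounded to four decimals.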

@geolives-contact
Contributor Author

Hello @louwers

I have just removed the cout logs in comments.

Regarding the performance test, the improvements are significant when a lot of offline data has been downloaded as offline regions (for example 1.5 GB). The startup of the app is much faster in that case :-)

const Response& response,
const std::string& data,
bool compressed) {
bool OfflineDatabase::putResource(
Collaborator

There is too much code duplication in this function and the putTile function below.

Instead of adding a bool I would for example define an enum class that can be converted to a string to get the related database name using another utility function.
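
A minimal sketch of what this suggestion could look like, assuming the enum selects between the `tiles` and `ambient_tiles` tables seen in the queries of this PR. The names `TileStore`, `tableName` and `selectTileSizeQuery` are hypothetical and do not exist in the code base.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum replacing a bool parameter: it names the store explicitly.
enum class TileStore { Offline, Ambient };

// Utility that maps the enum to the table it refers to.
const char* tableName(TileStore store) {
    switch (store) {
        case TileStore::Offline: return "tiles";
        case TileStore::Ambient: return "ambient_tiles";
    }
    assert(false && "unknown TileStore"); // not reachable for valid enum values
    return "";
}

// Builds the size query once for either table, so the SQL text is not
// duplicated per table in the calling code.
std::string selectTileSizeQuery(TileStore store) {
    return std::string("SELECT length(data) FROM ") + tableName(store) +
           " WHERE url_template = ?1 AND pixel_ratio = ?2 AND x = ?3 AND y = ?4 AND z = ?5";
}
```

Because the table name only ever comes from the fixed `switch`, concatenating it into the SQL text does not open the query to untrusted input.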

Contributor Author

We agree with you about the code duplication issue, but we encountered difficulties in resolving it :-(

Initially, we attempted to minimize code duplication in methods like getTile(), getResource(), putTile(), putResource(), etc., by using lambdas and c_str() to concatenate the table name into the query. However, this approach led to memory issues (EXC_BAD_ACCESS) and erratic behavior, particularly errors during the insertion of tiles/resources, for example:
[DEBUG] {abort at 23 in [INSERT INTO ambient_tiles (url_template, pixel_ratio, x, y, z, modified, must_revalidate, etag, expires, accessed, data, compressed) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)]: NOT NULL constraint failed: ambient_tiles.compressed}Database: %s

Consequently, we had to revert our changes, even though it resulted in increased code duplication.

You can find an old version of the code, with our attempts for the getTile() and getResource() methods, here: Geolives@cb8cf11

As our expertise in C++ is not as strong as it is in other higher-level languages, we welcome any assistance from the community to help resolve this issue of code duplication without causing adverse effects :-)

Thanks in advance.
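
One plausible cause of the EXC_BAD_ACCESS described above, offered as an assumption that this thread does not confirm: the pointer returned by c_str() on a temporary std::string becomes dangling as soon as the temporary is destroyed, so any layer that keeps the `const char*` (for instance to look up a prepared statement later) ends up reading freed memory. The sketch below builds each query once and keeps the string alive for the lifetime of the program. `selectTileSizeSql` is a hypothetical helper, it is not thread-safe as written, and the table name must come from a fixed set of known names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: returns a pointer to the query text for `table`.
// The text is stored in a function-local static map, so the pointer stays
// valid after the call returns and is identical on every call for that table.
const char* selectTileSizeSql(const std::string& table) {
    assert(!table.empty());
    // std::map is node-based: inserting new entries never moves existing values.
    static std::map<std::string, std::string> cache;
    auto it = cache.find(table);
    if (it == cache.end()) {
        it = cache.emplace(table,
                           "SELECT length(data) FROM " + table +
                           " WHERE url_template = ?1 AND pixel_ratio = ?2"
                           " AND x = ?3 AND y = ?4 AND z = ?5").first;
    }
    return it->second.c_str();
}
```

Returning the same pointer for the same table also matters if the caller identifies statements by pointer value rather than by content, which is again an assumption about the surrounding code.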

@louwers
Collaborator

louwers commented Mar 4, 2024

I downloaded 4GB of offline data.

This is a build before this PR. It loads pretty quickly...?

https://youtube.com/shorts/CZqLHsIXg4I?si=abdOzSF47ZvxOeJ5

This is a build that includes this PR:

https://youtube.com/shorts/ocZza9EIdzk?si=9-9kGuG6XKy4xnXd

I guess that means that initAmbientCacheSize is not called in this instance.

@@ -35,6 +35,35 @@ static constexpr const char* offlineDatabaseSchema =
" must_revalidate INTEGER NOT NULL DEFAULT 0,\n"
" UNIQUE (url_template, pixel_ratio, z, x, y)\n"
");\n"
Collaborator

This file is generated and should not be modified manually.

Collaborator

Could you try updating offline_schema.js and running that?

Contributor Author

OK, I didn't know that the file was autogenerated, sorry, I will try to do that :-)

// First, try to find the tile in the 'tiles' table
std::optional<int64_t> selectTilesResult = extractTileDataSize(
tile,
"SELECT length(data) FROM tiles WHERE url_template = ?1 AND pixel_ratio = ?2 AND x = ?3 AND y = ?4 AND z = ?5");
Collaborator

This will always fail for users not using any offline functionality at all. I'm not comfortable proceeding without first ensuring the performance impact is minimal, otherwise we need to keep track of if the tiles table contains any tiles at all and skip this check.

Contributor Author

We agree with you on this point. A boolean variable that is initialized on startup (by counting tiles / resources, or even by checking whether the database contains an offline_region) and set when we add the first resource / tile is a good idea in our opinion. We will try to implement that.
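
The idea discussed here can be sketched as follows. This is an illustration only: two in-memory maps stand in for the `tiles` and `ambient_tiles` tables, and the class and member names (`TileSizeLookup`, `hasOfflineTiles`, `offlineLookups`) are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Stand-in for the database layer, used only to show the control flow.
class TileSizeLookup {
public:
    void putOfflineTile(const std::string& key, int64_t size) {
        offlineTiles[key] = size;
        hasOfflineTiles = true; // set when the first offline tile is stored
    }

    void putAmbientTile(const std::string& key, int64_t size) {
        ambientTiles[key] = size;
    }

    std::optional<int64_t> tileSize(const std::string& key) {
        if (hasOfflineTiles) { // skip the offline lookup when there is no offline data
            ++offlineLookups;
            auto it = offlineTiles.find(key);
            if (it != offlineTiles.end()) return it->second;
        }
        auto it = ambientTiles.find(key);
        if (it != ambientTiles.end()) return it->second;
        return std::nullopt;
    }

    int offlineLookups = 0; // counts how often the offline store was queried

private:
    std::map<std::string, int64_t> offlineTiles;
    std::map<std::string, int64_t> ambientTiles;
    // In a real database this would be initialized at startup, e.g. from a count.
    bool hasOfflineTiles = false;
};
```

A complete version would also need to reset the flag when the last offline region is deleted; that case is left out of this sketch.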

Collaborator

@louwers louwers left a comment

Again, apologies for taking so long to review this; I wasn't at all familiar with this subsystem of the library.

It looks like the problematic initAmbientCacheSize method was only introduced so that cache size could be determined without taking into account offline data. mapbox/mapbox-gl-native#15622

It seems these changes can be reverted after the split?

By the way, I was not able to reproduce the performance problems. Are you calling setMaximumAmbientCacheSize on startup? It seems I was somehow able to store 3GB of data with OfflineActivity.kt without calling setMaximumAmbientCacheSize and without an evict being triggered (these are the only places that call initAmbientCacheSize).

@geolives-contact
Contributor Author

@louwers

I downloaded 4GB of offline data.

This is a build before this PR. It loads pretty quickly...?

https://youtube.com/shorts/CZqLHsIXg4I?si=abdOzSF47ZvxOeJ5

This is a build that includes this PR:

https://youtube.com/shorts/ocZza9EIdzk?si=9-9kGuG6XKy4xnXd

I guess that means that initAmbientCacheSize is not called in this instance.

In our application, the problem was much more reproducible on iOS than on Android (the reason for this is unclear).

In the iOS test application integrated into the maplibre-native project, the problem could be reproduced as explained here: #1815, and as you can see in this video: https://github.com/maplibre/maplibre-native/assets/13694294/a7c2bf01-54ff-43ed-86b1-675efb79e00a.

On Android, the performance issue was significantly less pronounced.

@louwers
Collaborator

louwers commented Mar 5, 2024

@Geolives Alright, I will try iOS as well.

What do you think about removing initAmbientCacheSize() and some of the other changes introduced alongside it? It does not serve a purpose anymore after the database is split does it?

@geolives-contact
Contributor Author

What do you think about removing initAmbientCacheSize() and some of the other changes introduced alongside it? It does not serve a purpose anymore after the database is split does it?

It requires further investigation because we still need to compute the ambient cache size to avoid exceeding the maximum chosen size. However, you are right that we need to understand how it was done before integrating mapbox/mapbox-gl-native#15622
