Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Harden OfflineDatabase #12224

Merged
merged 6 commits into from
Aug 15, 2018
Merged

Harden OfflineDatabase #12224

merged 6 commits into from
Aug 15, 2018

Conversation

kkaefer
Copy link
Contributor

@kkaefer kkaefer commented Jun 26, 2018

This changes OfflineDatabase to be more resilient. The design is to catch SQLite exceptions on the body of all public functions, and return with a failure code if the query failed.

Also adds tests for corrupt databases, as well as support for operating without read or write access to the file system. Also handles disk full error (in those cases, the database remains readable, but any updates, such as timestamp updates or resource insertions) will fail.

@friedbunny friedbunny added performance Speed, stability, CPU usage, memory usage, or power usage Core The cross-platform C++ core, aka mbgl labels Jun 27, 2018
Copy link
Contributor

@friedbunny friedbunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it has an inadvertent bump of the platform/ios/vendor/mapbox-events-ios submodule.

Please also add an iOS changelog entry. 🙇

Copy link
Contributor

@jfirebaugh jfirebaugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to think about what the behavior should be when an offline download encounters a database error. For instance, if the disk is full, it looks like the current behavior may be indistinguishable from a successful download.

} catch (...) {
if (auto download = getDownload(regionID)) {
callback({}, download->getStatus());
} else {
callback(std::current_exception(), {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::current_exception() will always return an empty result here, since it's no longer inside a catch block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still pending; it should synthesize an exception. Also, listRegions, createRegion, and updateMetadata above and deleteRegion below need a similar change.

@@ -13,7 +13,12 @@ namespace mbgl {
OfflineDatabase::OfflineDatabase(std::string path_, uint64_t maximumCacheSize_)
: path(std::move(path_)),
maximumCacheSize(maximumCacheSize_) {
ensureSchema();
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: whitespace.

void migrateToVersion5();
void migrateToVersion6();
void migrateToVersion3();
void migrateToVersion6();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace.

OfflineRegionDefinition definition { "", LatLngBounds::hull({1, 2}, {3, 4}), 5, 6, 1.0 };
OfflineRegionMetadata metadata;
return db.createRegion(definition, metadata);
return db.createRegion(definition, metadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace.

@lilykaiser
Copy link

Fixes #10031
(I believe)

@kkaefer
Copy link
Contributor Author

kkaefer commented Aug 1, 2018

We'll need to think about what the behavior should be when an offline download encounters a database error. For instance, if the disk is full, it looks like the current behavior may be indistinguishable from a successful download.

The current behavior is to crash if the disk is full (or any IO error occurs). When a network error occurs, the offline download never completes. This PR changes the crash when the disk is full to an offline download that never completes, which, while not ideal, is arguably better than the previous behavior.

I think erroneous downloads is something worth fixing, but I'd like to do that in a followup PR to avoid delaying the important crash fixes in this PR even longer. Overall, I think we need a concept for what an offline download should do in the error case: Retry a few times? Record failures and add them to the status? How are sparse tilesets handled (they return 404s on purpose, which means such a download will never complete atm)? What happens when you go offline mid-download, or try to do a download over a spotty connection?

@jfirebaugh
Copy link
Contributor

Considering the call sequence OfflineDownload::ensureResourceOfflineDatabase::putRegionResourcesputRegionResourceInternalputInternalputTile...

If putTile fails from a database error, the exception will pop the stack up to putRegionResources. The transaction will roll back, so none of the database changes will take effect, although changes to status.completedResourceCount, completedResourceSize, completedTileCount, and completedTileSize from prior elements of the batch will still be applied, so there may be an inconsistency between the database and status.

However, there will be at least one resource that does not trigger an increment of completedResourceCount, so OfflineRegionStatus::complete() will return false (unless the region is affected by #11930). So you're right, although the status may overcount completed resources, it will not be marked as complete. That does seem better than crashing. Can you add a test for this to offline_download.test.cpp? And it looks straightforward to correct the status fields by accumulating in temporary variables that are added to the status fields only after the transaction is committed.

When a network error occurs, the offline download never completes.

If it's a permanent error, that's true. But transient errors will get retried, since OfflineDownload::ensureResource keeps the request alive until it succeeds. I do think that it would be fair to handle database IO errors differently though -- e.g. immediately notify via an observer method, deactivate the download, and require developer intervention to reactivate. (Similarly to how exceeding the tile count limit is handled.)

How are sparse tilesets handled (they return 404s on purpose, which means such a download will never complete atm)?

404s get turned into "no content" non-errors by the file source. We might need to take this into account if we change that behavior though.

@kkaefer kkaefer force-pushed the filesource branch 2 times, most recently from 02564b6 to 03a4dc2 Compare August 6, 2018 13:37
@kkaefer
Copy link
Contributor Author

kkaefer commented Aug 6, 2018

I fixed the test issues when using the VFS by porting the creation/deletion routines for sqlite3_file objects from https://www.sqlite.org/src/doc/trunk/src/test_vfstrace.c.

@tobrun
Copy link
Member

tobrun commented Aug 13, 2018

@kkaefer took this PR for a testrun and it's looking great!

Testrun
  • Open activity with a mapview
  • Use adb shell followed by su rm -rf /data/data/com.mapbox.mapboxsdk.testapp/files
  • Scroll around map, close and reopen activity with a map
  • Application is not crashing

I'm seeing is that the files/mbgl-offline.db is only recreated with a fresh application start.

@kkaefer kkaefer force-pushed the filesource branch 4 times, most recently from 5fa7f67 to 0c90386 Compare August 14, 2018 02:04
@jfirebaugh
Copy link
Contributor

I'm seeing is that the files/mbgl-offline.db is only recreated with a fresh application start.

I think the expected behavior is that a fresh application start isn't required, so this may need more investigation.

@kkaefer
Copy link
Contributor Author

kkaefer commented Aug 14, 2018

Implemented database recreating when it is deleted in
410566f. Unfortunately, the behavior on macOS and Linux is different and we're not seeing the DB moved error code there.

@kkaefer kkaefer force-pushed the filesource branch 2 times, most recently from 0933103 to e783076 Compare August 14, 2018 21:32
@kkaefer
Copy link
Contributor Author

kkaefer commented Aug 14, 2018

This is now ready for review 🎉

Copy link
Contributor

@jfirebaugh jfirebaugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Having an expected polyfill will definitely be useful.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Core The cross-platform C++ core, aka mbgl high priority offline performance Speed, stability, CPU usage, memory usage, or power usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants