
IndexedDB versioned page store #37

rhashimoto started this conversation in Ideas
Jan 28, 2022 · 1 comment

Here's a different idea for an IndexedDB VFS - allowing multiple versions of each page to minimize data movement.

The current VFS stores database pages in IndexedDB with a [filename, pageIndex] key, plus a metadata object for file size, page size, etc. That's a pretty straightforward way of emulating a block device. It caches all the writes for a SQLite transaction in memory plus spillover to another IndexedDB store, and doesn't flush them to the main store until the SQLite transaction finishes. Deferring that flush is good for two reasons: first, it minimizes when an exclusive lock is needed, and second, the pre-transaction contents of the database can be used instead of storing journal data (a savings of both space and time).

The cost of the deferred flush is that when a transaction overflows the in-memory write cache, modified pages are written to the spill store and later read back and written to the main store, a put-get-put sequence of IndexedDB operations. Without the deferred flush, the VFS might need to read the current page contents, write them to the journal, and write the new contents to the database, a get-put-put sequence, though the get-put for the journal will sometimes be skipped.

The new idea is to store versioned database pages, i.e. with a [filename, pageIndex, version] key and an index on [filename, version], where newer versions are lower numbers. The current version number is stored in metadata. Fetching a page is now done by using an IDBKeyRange with a lower bound of the current version, which retrieves the first page at that version or higher (older). Here's how a transaction works with versioned pages:
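To make the lookup concrete, here's a minimal sketch that simulates IndexedDB's key ordering with a sorted in-memory array. In the real VFS this would be a single store.get() with an IDBKeyRange; the record shapes and names here are illustrative, not from the actual code:

```javascript
// Records sorted ascending by [filename, pageIndex, version], mimicking
// IndexedDB's composite key ordering. Newer versions have lower numbers,
// so for a given page the newest version sorts first.
const records = [
  { filename: 'db', pageIndex: 0, version: -1, data: 'page0-v-1' },
  { filename: 'db', pageIndex: 0, version: 0, data: 'page0-v0' },
  { filename: 'db', pageIndex: 1, version: 0, data: 'page1-v0' },
];

// Return the newest version of a page visible at currentVersion, i.e. the
// first record at or after key [filename, pageIndex, currentVersion].
// With IndexedDB this could be something like:
//   store.get(IDBKeyRange.bound([name, index, version], [name, index, Infinity]))
function getPage(filename, pageIndex, currentVersion) {
  return records.find(r =>
    r.filename === filename &&
    r.pageIndex === pageIndex &&
    r.version >= currentVersion) ?? null;
}

console.log(getPage('db', 0, -1).data); // newest version of page 0
console.log(getPage('db', 0, 0).data);  // what a reader at version 0 sees
console.log(getPage('db', 1, -1).data); // page 1 only has one version
```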

  1. When a transaction starts, the VFS decrements the current version number (e.g. from 0 to -1).
  2. The VFS removes any existing pages at the new version number from IndexedDB (using the index). It should be rare that these pages exist. It would happen when some previous transaction crashed part-way through.
  3. The VFS writes any changed pages with the new version number to IndexedDB.
  4. At the end of the transaction, the VFS writes the new version number to metadata in IndexedDB. At this point, the changed pages become visible to other database connections.
  5. Finally, any pages that have multiple versions can be purged of older versions.
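The steps above can be sketched with an in-memory model. The class and method names here are hypothetical, not from the actual branch; a real VFS would issue the same steps as IndexedDB requests:

```javascript
// Hypothetical in-memory model of the versioned transaction protocol.
class VersionedStore {
  constructor() {
    this.pages = new Map();     // key "pageIndex:version" -> page data
    this.meta = { version: 0 }; // committed version, as stored in metadata
    this.txVersion = null;      // in-progress (uncommitted) version
  }

  begin() {
    // Step 1: decrement the version for this transaction.
    this.txVersion = this.meta.version - 1;
    // Step 2: clear leftovers from a previously crashed transaction.
    for (const k of [...this.pages.keys()]) {
      if (k.endsWith(`:${this.txVersion}`)) this.pages.delete(k);
    }
  }

  writePage(pageIndex, data) {
    // Step 3: write changed pages at the new version.
    this.pages.set(`${pageIndex}:${this.txVersion}`, data);
  }

  commit() {
    // Step 4: publish the new version; changes become visible.
    this.meta.version = this.txVersion;
    this.txVersion = null;
  }

  readPage(pageIndex) {
    // Newest visible version = smallest version >= the committed version.
    let best = null;
    for (const [k, data] of this.pages) {
      const [p, v] = k.split(':').map(Number);
      if (p === pageIndex && v >= this.meta.version &&
          (best === null || v < best.v)) best = { v, data };
    }
    return best?.data ?? null;
  }

  purge() {
    // Step 5: drop all but the newest version of each page.
    const newest = new Map();
    for (const k of this.pages.keys()) {
      const [p, v] = k.split(':').map(Number);
      if (!newest.has(p) || v < newest.get(p)) newest.set(p, v);
    }
    for (const k of [...this.pages.keys()]) {
      const [p, v] = k.split(':').map(Number);
      if (v !== newest.get(p)) this.pages.delete(k);
    }
  }
}

const store = new VersionedStore();
store.begin(); store.writePage(0, 'first'); store.commit();  // commits at -1
store.begin(); store.writePage(0, 'second'); store.commit(); // commits at -2
console.log(store.readPage(0)); // the version -2 contents
store.purge();                  // drops the version -1 copy of page 0
```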

The per-page operation is now put-delete (put the new version, delete the old version), which only moves data once, and it has the same ability to avoid storing journal data. In addition, the purge doesn't need to happen immediately at the end of the transaction, as the old versions don't affect correctness. So purging could be deferred until the database is otherwise idle.

If deferring purging is to be an option, storing a purge version (i.e. the last version we purged at) in the metadata will make it more efficient to determine which pages might need to be purged of old versions. The disadvantage of deferring purging, other than the storage used, is that it would make prefetching pages for read (e.g. using getAll() or a cursor) less efficient since there is no way to fetch or traverse only the most recent version of multiple pages at once. If delete is not cheap then the best purging and prefetching strategy will likely be application-specific.
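A sketch of how a stored purge version could narrow that work, using plain objects in place of IndexedDB records (the helper name is made up for illustration, and the IDBKeyRange comment is an assumption about how the index scan might look):

```javascript
// Only pages written at a version newer (lower) than the last purge can
// have obsolete older copies, so only those need to be examined.
// In IndexedDB this scan could use the [filename, version] index, e.g.
//   IDBKeyRange.bound([name, currentVersion], [name, purgeVersion], false, true)
function pagesToPurge(records, purgeVersion, currentVersion) {
  const candidates = new Set();
  for (const r of records) {
    // Committed since the last purge ran.
    if (r.version >= currentVersion && r.version < purgeVersion) {
      candidates.add(r.pageIndex);
    }
  }
  return candidates;
}

// Last purge ran at version -1; two transactions (-2, -3) committed since.
const records = [
  { pageIndex: 0, version: 0 },  // old copy of page 0, now purgeable
  { pageIndex: 0, version: -2 }, // newer copy written after the last purge
  { pageIndex: 1, version: -3 },
  { pageIndex: 2, version: -1 }, // already handled by the last purge
];
console.log([...pagesToPurge(records, -1, -3)]); // pages to examine
```

After purging, the purge version in metadata would be advanced to the current version (-3 here).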

Replies

rhashimoto
Jan 31, 2022
Maintainer Author

I implemented this in a private branch. I like it a lot. I think the code is more elegant, especially because it doesn't use a write cache like the current VFS. I like everything about it except...it's slower. ☹️

On the benchmarks page, it's slower (on Chrome) on most of the operations. It is still slower if I disable both purging obsolete pages (step 5, which can be deferred) and clearing pages with the new version number (step 2, which is unsafe to skip, so this was just for analysis). My hope was that it would be faster at least when deferring purging.

Why is it slower? I'm not sure yet. Right now I'm speculating that IDBObjectStore.get() is just slower with an IDBKeyRange than with a plain key (which I can measure if I get around to it), but it is also possible I've done something dumb in my code, such as await-ing somewhere that doesn't need to be sequential.

I did learn some things along the way. The most important thing is that doing "smart" things with SQLite transactions in a VFS requires recognizing a transaction from the pattern of VFS calls. The problem is that those patterns can change with different PRAGMA settings. For example:

  • xLock/xUnlock - Using the reserved/exclusive locking mode to infer transaction boundaries works with the default settings. It doesn't work with PRAGMA locking_mode=exclusive. As an aside, note that clearing failed transaction data (step 2) can be done on exclusive lock instead of per transaction.
  • Journal xOpen/xClose - Watching the associated journal file open and close works with the default settings, except that xTruncate is called outside the journal lifetime. This approach needs a bit of tweaking to work with PRAGMA journal_mode=*, where "*" is "truncate" or "persist", and it doesn't work at all with "memory" or "off" because those don't use an external journal file.
0 replies