Disk Array Restructuring #3146

Closed

benjaminwinger opened this issue Mar 26, 2024 · 2 comments

Comments

benjaminwinger (Collaborator)

To support bypassing the WAL when writing new pages to disk arrays (an important performance optimization for #2938), it will be necessary to track the end of the file within the file itself. Currently, new pages are added using the file handle's page counter, so allowing writes to bypass the WAL would mean that any pages written as part of an aborted transaction are never re-used. If the page counter is instead stored in the file, then as long as updated pages (which will always include the header) are written to the WAL, we can truncate or overwrite untracked pages, since the page counter isn't modified when pages are appended directly to the end of the file.
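A minimal sketch of the idea (the names and file interface here are assumptions for illustration, not Kùzu's actual API): the collection-level page counter is only advanced through the header, which is written via the WAL, so pages appended directly to the data file stay invisible until the header commits.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

using page_idx_t = uint32_t;
static constexpr size_t PAGE_SIZE = 4096;

// Hypothetical in-memory stand-in for the data file, just to make the sketch self-contained.
struct DataFile {
    std::vector<uint8_t> bytes;
    void writePage(const uint8_t* data, page_idx_t pageIdx) {
        bytes.resize(std::max(bytes.size(), static_cast<size_t>(pageIdx + 1) * PAGE_SIZE));
        std::memcpy(bytes.data() + pageIdx * PAGE_SIZE, data, PAGE_SIZE);
    }
};

// Collection-level header; updates to it go through the WAL like any other page update.
struct CollectionHeader {
    page_idx_t numPages; // end of the tracked region of the file
};

// Append a page directly to the data file, bypassing the WAL. The page lands past the
// tracked end of the file; only the header update (which does go through the WAL) makes
// it visible. If the transaction aborts, the persisted header still records the old
// numPages, so the appended page can later be truncated or overwritten.
page_idx_t appendPageBypassingWAL(DataFile& file, CollectionHeader& header,
                                  const uint8_t* pageData) {
    auto newPageIdx = header.numPages;
    file.writePage(pageData, newPageIdx);
    header.numPages++; // committed together with the header page via the WAL
    return newPageIdx;
}
```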

I think the best way to handle this is similar to what I did in the disk overflow file (#2857): have a parent collection which stores a header tracking the number of pages.

This also provides an opportunity for some optimizations, as the parent collection can handle writing the headers instead of each disk array needing to write its own header independently. Currently, disk array headers are 48 bytes and are each stored in their own page. This is somewhat inefficient and leads to some bloat (e.g. 255 unnecessary pages in the hash index file: see #2625).
If we cut down what's stored in the disk array headers to the bare minimum, we should be able to fit 256 headers into a single page:

For each array, we need to store:

  • numElements (8 bytes)
  • firstPIPPageIdx (currently stored as an 8-byte int, but it's really a page_idx_t, which is 4 bytes)

All other properties are either derived or, in the case of the element size, known at compile time (it might be a good idea to include the element size anyway as a sanity check; it could be stored in 2 bytes, since elements cannot exceed the page size, or in one byte if we store alignedElementSizeLog2).

Global data:

  • Number of arrays (uint32_t is sufficient given that the number of arrays is constrained by the size of firstPIPPageIdx)
  • Page counter (4 bytes)
  • Page index of the next header page (4 bytes; header pages are always read, so there's no disadvantage to storing them as a linked list).

With 12 bytes of global data and 12 bytes per array, this is sufficient to store headers for (4096 - 12)/12 = 340 arrays per page, and it leaves 3 extra bytes per array header that could be used for additional data while still fitting the 256 hash index headers into a single page (see the sketch below).
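A rough sketch of what such a packed header page could look like (these names and the exact layout are my own illustration, not necessarily what was implemented):

```cpp
#include <cstdint>

using page_idx_t = uint32_t;
static constexpr uint64_t PAGE_SIZE = 4096;

// Per-array header: only what cannot be derived.
struct PackedDiskArrayHeader {
    uint64_t numElements;
    page_idx_t firstPIPPageIdx;
} __attribute__((packed)); // 12 bytes (packed attribute is GCC/Clang-specific)

// Global data shared by all arrays whose headers live in this page.
struct DiskArrayHeaderPage {
    uint32_t numArrays;
    page_idx_t pageCounter;       // end of the tracked region of the file
    page_idx_t nextHeaderPageIdx; // header pages form a linked list
    // (4096 - 12) / 12 = 340 per-array header slots fit after the global data.
    PackedDiskArrayHeader headers[(PAGE_SIZE - 12) / sizeof(PackedDiskArrayHeader)];
} __attribute__((packed));

static_assert(sizeof(PackedDiskArrayHeader) == 12, "per-array header should be 12 bytes");
static_assert(sizeof(DiskArrayHeaderPage) <= PAGE_SIZE, "header page must fit in one page");
```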

Currently, disk arrays are initialized using a header page index. That would simply change to a header index, since it no longer refers to a specific page, but the mechanism remains fundamentally the same.
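As a sketch of that initialization change (class and member names are illustrative only), the disk array would be handed the parent collection plus a slot index within its shared header pages:

```cpp
#include <cstdint>

class DiskArrayCollection; // owns the shared header pages and the page counter

class DiskArray {
public:
    // Before: construction took the page index of a dedicated header page.
    // After: the header index identifies a slot within the collection's header pages.
    DiskArray(DiskArrayCollection& collection, uint32_t headerIdx)
        : collection{collection}, headerIdx{headerIdx} {}

private:
    DiskArrayCollection& collection;
    uint32_t headerIdx;
};
```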

benjaminwinger (Collaborator, Author) commented Mar 26, 2024

On reflection, it might make sense to add WAL records noting that pages have been added to a file other than the WAL, like the page update/insert record but without actually storing or writing the page. That would still have some overhead, but if the record stores a range of pages it could be done with a single record.
The file could then be truncated if the transaction is rolled back.
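For illustration, such a record might look like the following (the record type and the truncation helper are hypothetical, not an existing Kùzu WAL record):

```cpp
#include <cstdint>
#include <functional>

using file_idx_t = uint32_t;
using page_idx_t = uint32_t;

// Hypothetical WAL record: notes that a contiguous range of pages was appended to a
// file other than the WAL, without storing or writing the page contents themselves.
struct PagesAppendedRecord {
    file_idx_t fileIdx;      // which data file grew
    page_idx_t startPageIdx; // first appended page
    page_idx_t numPages;     // length of the appended range
};

// On rollback (or recovery of an uncommitted transaction), the appended range can be
// dropped by truncating the file back to its previous end. `truncateToPages` is an
// assumed helper that shrinks the given file to the given number of pages.
void undoAppendedPages(const PagesAppendedRecord& record,
                       const std::function<void(file_idx_t, page_idx_t)>& truncateToPages) {
    truncateToPages(record.fileIdx, record.startPageIdx);
}
```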

Edit: Given that copy hasn't previously supported rollback or clean recovery, it might be prudent to restrict bypassing the WAL in the disk array to hash index bulk insertions for now (basically what I already have implemented, with a flag to enable bypassing). Work could then be done later to support rolling back and recovering copy/bulk insertions.

benjaminwinger (Collaborator, Author) commented

Fixed by #3557 (with the exception of truncating the file on rollback, but that is a missing feature more broadly).
