Disk array packed headers #3557

Merged: 2 commits merged into master from disk-array-packed-headers on Jun 3, 2024
Conversation

benjaminwinger (Collaborator)

Packs the disk array headers into as few pages as possible (each header page points to the next one, so new disk arrays can still be added to the disk array collection) and packs the hash index headers into two pages.
I've also fixed some padding issues related to #3484, but there still seems to be a small amount of uninitialized data in the main hash index file (either due to something in the slots, or maybe related to the disk array write iterator).
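
To make the layout concrete, here is a minimal sketch of a packed header page as described above; the type names, field sizes, and page size are illustrative assumptions, not the actual Kùzu definitions:

```cpp
// Hedged sketch of a packed header page: as many headers as fit per page,
// plus a pointer to the next header page. All names and sizes are assumptions.
#include <array>
#include <cstdint>

static constexpr uint64_t PAGE_SIZE = 4096; // assumed page size
using page_idx_t = uint32_t;
static constexpr page_idx_t INVALID_PAGE_IDX = UINT32_MAX;

struct DiskArrayHeader {
    uint64_t numElements;     // illustrative payload
    page_idx_t firstDataPage; // illustrative payload
    uint32_t padding{0};      // explicit padding: no uninitialized bytes on disk
};

static constexpr uint64_t HEADERS_PER_PAGE =
    (PAGE_SIZE - sizeof(page_idx_t)) / sizeof(DiskArrayHeader);

struct HeaderPage {
    std::array<DiskArrayHeader, HEADERS_PER_PAGE> headers;
    // Chain link: lets the collection grow by appending new header pages.
    page_idx_t nextHeaderPage{INVALID_PAGE_IDX};
};
static_assert(sizeof(HeaderPage) <= PAGE_SIZE);
```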

A new class, DiskArrayCollection, gets passed around with the MetadataDAHInfo structure, since the indices no longer refer to individual pages that can easily be read independently. Instead, the DiskArrayCollection reads all the headers when initialized, and the MetadataDAHInfo provides the index of the header for a particular chunk along with an easy way of constructing the disk array.
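
As a rough illustration of that flow, the sketch below reuses the types from the snippet above; the FileHandle API, readPage signature, and index arithmetic are assumptions about the shape of the code, not the real implementation:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical file API standing in for the real storage layer.
struct FileHandle {
    void readPage(page_idx_t pageIdx, void* buffer);
};

class DiskArrayCollection {
public:
    // Read the whole chain of header pages once, up front.
    DiskArrayCollection(FileHandle& file, page_idx_t firstHeaderPage) {
        for (auto pageIdx = firstHeaderPage; pageIdx != INVALID_PAGE_IDX;) {
            HeaderPage page;
            file.readPage(pageIdx, &page);
            headerPages.push_back(page);
            pageIdx = page.nextHeaderPage;
        }
    }

    // MetadataDAHInfo supplies headerIdx; it indexes into the cached headers
    // rather than naming a physical page.
    const DiskArrayHeader& getHeader(uint64_t headerIdx) const {
        return headerPages[headerIdx / HEADERS_PER_PAGE]
            .headers[headerIdx % HEADERS_PER_PAGE];
    }

private:
    std::vector<HeaderPage> headerPages;
};
```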

@benjaminwinger force-pushed the disk-array-packed-headers branch 2 times, most recently from c7c0c6e to 6bb464b on May 29, 2024 at 21:54
benjaminwinger (Collaborator, Author)

I'm not really sure why this requires a larger buffer pool size for the multi copy test to succeed; I suspect it might be caused by a subtle bug.

@@ -3,6 +3,7 @@
 #include <string>

 #include "catalog/catalog_entry/node_table_catalog_entry.h"
+#include "common/types/internal_id_t.h"
Contributor

Double check if this include is necessary.

Collaborator (Author)

I'm not actually sure why that was added; I don't recall modifying that file, but it is where table_id_t is declared.

src/include/storage/store/table_data.h (outdated, resolved)
src/include/storage/store/struct_column.h (outdated, resolved)
src/include/storage/store/rel_table.h (outdated, resolved)
src/include/storage/store/null_column.h (outdated, resolved)
@@ -79,7 +79,7 @@ struct BufferPoolConstants {
     static constexpr uint64_t DEFAULT_VM_REGION_MAX_SIZE = (uint64_t)1 << 43; // (8TB)
 #endif

-    static constexpr uint64_t DEFAULT_BUFFER_POOL_SIZE_FOR_TESTING = 1ull << 26; // (64MB)
+    static constexpr uint64_t DEFAULT_BUFFER_POOL_SIZE_FOR_TESTING = 1ull << 27; // (128MB)
Contributor

I wonder which test fails due to the buffer manager size.

Collaborator (Author)

It's the multi copy test (I've run into this before; that time it was caused by a bug), which I think writes a gigabyte or so of hash index data. But it doesn't need all that memory at one time, so it should be fine with a small buffer pool.

src/include/storage/index/hash_index_utils.h (outdated, resolved)
src/include/storage/storage_structure/disk_array.h (outdated, resolved)
void prepareCommit();

void checkpointInMemory() {
    for (size_t i = 0; i < headersForWriteTrx.size(); i++) {
Contributor

Should we avoid size_t? As I understand it, it's not compatible with 32-bit platforms.

Collaborator (Author)

It's not incompatible; it's just that we can't expect it to be 64-bit on a 32-bit platform, so it shouldn't be used for things like iterating over the elements in the disk array, where we expect to be able to go beyond UINT32_MAX. But std::vector::size returns a size_t anyway (or at least it usually does; according to cppreference, it returns size_type, which is defined by the vector implementation).
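
A small self-contained illustration of that point (the DiskArray and process names here are hypothetical stand-ins, not the real classes):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real types, just for illustration.
struct DiskArray {
    uint64_t getNumElements() const;
    uint64_t get(uint64_t idx) const;
};
void process(uint64_t value);

void iterateAll(const DiskArray& diskArray, const std::vector<uint64_t>& headers) {
    // size_t may be 32 bits on a 32-bit platform, so use an explicit
    // 64-bit index where element counts can exceed UINT32_MAX.
    for (uint64_t i = 0; i < diskArray.getNumElements(); i++) {
        process(diskArray.get(i));
    }
    // For an in-memory vector, size_t matches std::vector::size(), whose
    // size_type is size_t in practice.
    for (size_t i = 0; i < headers.size(); i++) {
        process(headers[i]);
    }
}
```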

@benjaminwinger merged commit 032bbd8 into master on Jun 3, 2024
18 checks passed
@benjaminwinger deleted the disk-array-packed-headers branch on Jun 3, 2024 at 19:41