Disk array packed headers #3557

Merged: 2 commits merged into master from disk-array-packed-headers on Jun 3, 2024
Conversation

benjaminwinger (Collaborator)

Packs the disk array headers into as few pages as possible (each header page points to the next one, so new disk arrays can still be added to the disk array collection) and packs the hash index headers into two pages.
I've also fixed some padding issues related to #3484, but there still seems to be a small amount of uninitialized data in the main hash index file (either due to something in the slots, or maybe related to the disk array write iterator).
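
To make the layout concrete, here is a minimal sketch of a packed header page as described above; the type names, field sizes, and page size are illustrative assumptions, not the actual Kùzu definitions:

```cpp
// Hedged sketch of a packed header page: as many headers as fit per page,
// plus a pointer to the next header page. All names and sizes are assumptions.
#include <array>
#include <cstdint>

static constexpr uint64_t PAGE_SIZE = 4096; // assumed page size
using page_idx_t = uint32_t;
static constexpr page_idx_t INVALID_PAGE_IDX = UINT32_MAX;

struct DiskArrayHeader {
    uint64_t numElements;     // illustrative payload
    page_idx_t firstDataPage; // illustrative payload
    uint32_t padding{0};      // explicit padding: no uninitialized bytes on disk
};

static constexpr uint64_t HEADERS_PER_PAGE =
    (PAGE_SIZE - sizeof(page_idx_t)) / sizeof(DiskArrayHeader);

struct HeaderPage {
    std::array<DiskArrayHeader, HEADERS_PER_PAGE> headers;
    // Chain link: lets the collection grow by appending new header pages.
    page_idx_t nextHeaderPage{INVALID_PAGE_IDX};
};
static_assert(sizeof(HeaderPage) <= PAGE_SIZE);
```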

A new class, DiskArrayCollection, gets passed around with the MetadataDAHInfo structure, since the indices no longer refer to individual pages that can easily be read independently. Instead, the DiskArrayCollection reads all the headers when initialized, and the MetadataDAHInfo provides the index of the header for a particular chunk along with an easy way of constructing the disk array.
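
As a rough illustration of that flow, the sketch below reuses the types from the snippet above; the FileHandle API, readPage signature, and index arithmetic are assumptions about the shape of the code, not the real implementation:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical file API standing in for the real storage layer.
struct FileHandle {
    void readPage(page_idx_t pageIdx, void* buffer);
};

class DiskArrayCollection {
public:
    // Read the whole chain of header pages once, up front.
    DiskArrayCollection(FileHandle& file, page_idx_t firstHeaderPage) {
        for (auto pageIdx = firstHeaderPage; pageIdx != INVALID_PAGE_IDX;) {
            HeaderPage page;
            file.readPage(pageIdx, &page);
            headerPages.push_back(page);
            pageIdx = page.nextHeaderPage;
        }
    }

    // MetadataDAHInfo supplies headerIdx; it indexes into the cached headers
    // rather than naming a physical page.
    const DiskArrayHeader& getHeader(uint64_t headerIdx) const {
        return headerPages[headerIdx / HEADERS_PER_PAGE]
            .headers[headerIdx % HEADERS_PER_PAGE];
    }

private:
    std::vector<HeaderPage> headerPages;
};
```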

@benjaminwinger force-pushed the disk-array-packed-headers branch 2 times, most recently from c7c0c6e to 6bb464b on May 29, 2024 at 21:54
benjaminwinger (Collaborator, Author)

I'm not really sure why this requires a larger buffer pool size for the multi copy test to succeed; I suspect it might be caused by a subtle bug.

@@ -3,6 +3,7 @@
 #include <string>

 #include "catalog/catalog_entry/node_table_catalog_entry.h"
+#include "common/types/internal_id_t.h"
Contributor

Double check if this include is necessary.

Collaborator (Author)

I'm not actually sure why that was added; I don't recall modifying that file, but it is where table_id_t is declared.

src/include/storage/store/table_data.h (outdated, resolved)
src/include/storage/store/struct_column.h (outdated, resolved)
src/include/storage/store/rel_table.h (outdated, resolved)
src/include/storage/store/null_column.h (outdated, resolved)
@@ -79,7 +79,7 @@ struct BufferPoolConstants {
     static constexpr uint64_t DEFAULT_VM_REGION_MAX_SIZE = (uint64_t)1 << 43; // (8TB)
 #endif

-    static constexpr uint64_t DEFAULT_BUFFER_POOL_SIZE_FOR_TESTING = 1ull << 26; // (64MB)
+    static constexpr uint64_t DEFAULT_BUFFER_POOL_SIZE_FOR_TESTING = 1ull << 27; // (128MB)
Contributor

I wonder which test fails due to the buffer manager size.

Collaborator (Author)

It's the multi copy test (I've run into this before; that time it was caused by a bug), which I think writes a gigabyte or so of hash index data. But it doesn't need all that memory at one time, so it should be fine with a small buffer pool.

src/include/storage/index/hash_index_utils.h (outdated, resolved)
src/include/storage/storage_structure/disk_array.h (outdated, resolved)
void prepareCommit();

void checkpointInMemory() {
    for (size_t i = 0; i < headersForWriteTrx.size(); i++) {
Contributor

Should we avoid size_t? As I understand it, it's not compatible with 32-bit platforms.

Collaborator (Author)

It's not incompatible; it's just that we can't expect it to be 64-bit on a 32-bit platform, so it shouldn't be used for things like iterating over the elements in the disk array, where we expect to be able to go beyond UINT32_MAX. But std::vector::size returns a size_t anyway (or at least it usually does; according to cppreference, it returns size_type, which is defined by the vector implementation).
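
A small self-contained illustration of that point (the DiskArray and process names here are hypothetical stand-ins, not the real classes):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real types, just for illustration.
struct DiskArray {
    uint64_t getNumElements() const;
    uint64_t get(uint64_t idx) const;
};
void process(uint64_t value);

void iterateAll(const DiskArray& diskArray, const std::vector<uint64_t>& headers) {
    // size_t may be 32 bits on a 32-bit platform, so use an explicit
    // 64-bit index where element counts can exceed UINT32_MAX.
    for (uint64_t i = 0; i < diskArray.getNumElements(); i++) {
        process(diskArray.get(i));
    }
    // For an in-memory vector, size_t matches std::vector::size(), whose
    // size_type is size_t in practice.
    for (size_t i = 0; i < headers.size(); i++) {
        process(headers[i]);
    }
}
```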

@benjaminwinger merged commit 032bbd8 into master on Jun 3, 2024
18 checks passed
@benjaminwinger deleted the disk-array-packed-headers branch on Jun 3, 2024 at 19:41