Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup hash index initialization #3577

Merged
merged 2 commits into from
Jun 7, 2024
Merged

Conversation

benjaminwinger
Copy link
Collaborator

Following the rewriting of how hash index and disk array headers are created in #3557, I found that removing createEmptyHashIndexFiles was working with a few tweaks, so this makes the hash index able to initialize itself if the file provided is nonexistent/empty.

I've also removed a couple of unused functions (and functions which were only used in a couple of places and were better off written inline) and added manual zeroing of the padding in SlotEntry<T> (which has padding if the type is smaller than 8 bytes).

Copy link

github-actions bot commented Jun 3, 2024

Benchmark Result

Master commit hash: 032bbd827659b71d0491ad0ccd2fe61c6bd8ca95
Branch commit hash: bdbae38020746b4bf53edf95f941efd33245b6dc

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 633.79 650.63 -16.85 (-2.59%)
aggregation q28 12525.87 13467.04 -941.17 (-6.99%)
filter q14 116.25 132.66 -16.41 (-12.37%)
filter q15 112.66 130.85 -18.19 (-13.90%)
filter q16 287.70 311.15 -23.46 (-7.54%)
filter q17 432.64 458.97 -26.32 (-5.74%)
filter q18 1897.53 1933.43 -35.89 (-1.86%)
fixed_size_expr_evaluator q07 558.57 573.96 -15.39 (-2.68%)
fixed_size_expr_evaluator q08 787.25 799.67 -12.42 (-1.55%)
fixed_size_expr_evaluator q09 785.93 799.81 -13.87 (-1.73%)
fixed_size_expr_evaluator q10 236.29 248.96 -12.67 (-5.09%)
fixed_size_expr_evaluator q11 228.62 241.68 -13.05 (-5.40%)
fixed_size_expr_evaluator q12 230.96 241.80 -10.84 (-4.48%)
fixed_size_expr_evaluator q13 1467.66 1479.54 -11.88 (-0.80%)
fixed_size_seq_scan q23 112.71 122.32 -9.61 (-7.86%)
join q29 725.21 670.75 54.46 (8.12%)
join q30 1539.42 1448.53 90.89 (6.27%)
join q31 46.78 43.89 2.89 (6.59%)
ldbc_snb_ic q35 3291.85 3309.92 -18.07 (-0.55%)
ldbc_snb_ic q36 131.58 130.96 0.62 (0.48%)
ldbc_snb_is q32 11.62 12.49 -0.87 (-6.97%)
ldbc_snb_is q33 96.31 96.18 0.13 (0.14%)
ldbc_snb_is q34 98.23 102.81 -4.57 (-4.45%)
order_by q25 117.96 131.76 -13.80 (-10.47%)
order_by q26 426.77 437.09 -10.32 (-2.36%)
order_by q27 1377.17 1419.36 -42.20 (-2.97%)
scan_after_filter q01 161.42 171.35 -9.94 (-5.80%)
scan_after_filter q02 138.81 155.78 -16.97 (-10.89%)
shortest_path_ldbc100 q39 59.10 56.88 2.22 (3.91%)
var_size_expr_evaluator q03 2018.50 2051.60 -33.11 (-1.61%)
var_size_expr_evaluator q04 2206.98 2249.22 -42.24 (-1.88%)
var_size_expr_evaluator q05 2581.67 2551.60 30.06 (1.18%)
var_size_expr_evaluator q06 1343.67 1392.35 -48.69 (-3.50%)
var_size_seq_scan q19 1424.76 1469.83 -45.07 (-3.07%)
var_size_seq_scan q20 3084.92 3091.03 -6.11 (-0.20%)
var_size_seq_scan q21 2385.92 2391.72 -5.80 (-0.24%)
var_size_seq_scan q22 122.57 131.57 -9.01 (-6.85%)

@benjaminwinger benjaminwinger force-pushed the hash-index-cleanup2 branch 2 times, most recently from 085d492 to f79b680 Compare June 3, 2024 23:08
src/include/storage/index/hash_index_slot.h Outdated Show resolved Hide resolved
src/storage/index/hash_index.cpp Show resolved Hide resolved
@@ -199,8 +188,8 @@ void OverflowFile::readFromDisk(transaction::TransactionType trxType, common::pa

void OverflowFile::writePageToDisk(common::page_idx_t pageIdx, uint8_t* data) const {
if (pageIdx < numPagesOnDisk) {
// TODO: updatePage does an unnecessary read + copy. We just want to overwrite
DBFileUtils::updatePage(*getBMFileHandle(), dbFileID, pageIdx, false, *bufferManager, *wal,
DBFileUtils::updatePage(*getBMFileHandle(), dbFileID, pageIdx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should rename the param bool isInsertingNewPage inside DBFileUtils::updatePage. The param is not intuitive. bool readPageBeforeUpdate should be better?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'm hesitant to flip the meaning of the boolean though, as that's something easy to overlook. Maybe we should instead use the BufferManager::PageReadPolicy to make it more obvious what the parameter means?

Copy link

github-actions bot commented Jun 4, 2024

Benchmark Result

Master commit hash: 35eee929fc96706a163dc03b4f7706978d78ccb8
Branch commit hash: edc549de4d09a9a76a66b9b23019a3a14ca9f11a

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 633.07 652.25 -19.18 (-2.94%)
aggregation q28 13775.97 13072.70 703.26 (5.38%)
filter q14 116.79 132.51 -15.72 (-11.86%)
filter q15 117.57 137.05 -19.48 (-14.21%)
filter q16 289.47 310.13 -20.66 (-6.66%)
filter q17 432.31 450.91 -18.61 (-4.13%)
filter q18 1888.63 1953.11 -64.49 (-3.30%)
fixed_size_expr_evaluator q07 561.09 571.74 -10.66 (-1.86%)
fixed_size_expr_evaluator q08 785.44 799.51 -14.07 (-1.76%)
fixed_size_expr_evaluator q09 783.36 796.27 -12.91 (-1.62%)
fixed_size_expr_evaluator q10 235.77 247.65 -11.88 (-4.80%)
fixed_size_expr_evaluator q11 228.43 242.89 -14.46 (-5.95%)
fixed_size_expr_evaluator q12 229.54 241.68 -12.15 (-5.03%)
fixed_size_expr_evaluator q13 1474.56 1482.80 -8.24 (-0.56%)
fixed_size_seq_scan q23 111.11 123.69 -12.58 (-10.17%)
join q29 704.80 704.94 -0.14 (-0.02%)
join q30 1448.61 1395.87 52.74 (3.78%)
join q31 45.84 38.77 7.07 (18.24%)
ldbc_snb_ic q35 3211.46 3223.53 -12.07 (-0.37%)
ldbc_snb_ic q36 127.65 131.44 -3.79 (-2.88%)
ldbc_snb_is q32 11.73 11.62 0.11 (0.96%)
ldbc_snb_is q33 97.51 98.06 -0.55 (-0.56%)
ldbc_snb_is q34 98.65 100.23 -1.58 (-1.58%)
order_by q25 124.06 132.79 -8.73 (-6.57%)
order_by q26 426.06 446.48 -20.41 (-4.57%)
order_by q27 1409.39 1423.13 -13.74 (-0.97%)
scan_after_filter q01 168.13 174.51 -6.38 (-3.66%)
scan_after_filter q02 141.38 155.28 -13.90 (-8.95%)
shortest_path_ldbc100 q39 54.04 55.90 -1.86 (-3.33%)
var_size_expr_evaluator q03 2018.54 2049.14 -30.60 (-1.49%)
var_size_expr_evaluator q04 2235.93 2273.39 -37.46 (-1.65%)
var_size_expr_evaluator q05 2575.56 2545.55 30.02 (1.18%)
var_size_expr_evaluator q06 1349.35 1394.92 -45.57 (-3.27%)
var_size_seq_scan q19 1450.97 1468.85 -17.88 (-1.22%)
var_size_seq_scan q20 3095.97 3045.50 50.47 (1.66%)
var_size_seq_scan q21 2368.53 2375.60 -7.07 (-0.30%)
var_size_seq_scan q22 124.95 131.82 -6.88 (-5.22%)

@benjaminwinger benjaminwinger merged commit 6e039d8 into master Jun 7, 2024
19 checks passed
@benjaminwinger benjaminwinger deleted the hash-index-cleanup2 branch June 7, 2024 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants