Improve BenchmarkParquetReader #6275

yiweiHeOSS · 2023-08-25T20:54:40Z

This PR fixes #6247.

The benchmark currently covers 4 types. This PR adds three more types to the BenchmarkParquetReader test:

ShortDecimalType
LongDecimalType
VARCHAR

These types should be tested with different filter rates and null rates, just like the existing tests in BenchmarkParquetReader.

I also noticed that we have never run the filter test for the type HUGEINT (int128_t), which is the underlying type of LongDecimalType. This PR therefore also adds code to generate filters for HUGEINT (int128_t) and modifies the data generation code so that HUGEINT (int128_t) values are generated correctly.
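As an illustration of what a range filter with a target filter rate over int128_t values could look like, here is a minimal sketch. It is not the Velox FilterGenerator code: `pickRangeBounds` and `countPassing` are hypothetical helpers, and it assumes nulls have already been removed from the sample and that `__int128` is available (GCC/Clang).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using int128_t = __int128; // GCC/Clang extension.

// Hypothetical helper (not a Velox API): returns inclusive [lower, upper]
// bounds that keep roughly `filterRate` of the sampled non-null values.
std::pair<int128_t, int128_t> pickRangeBounds(
    std::vector<int128_t> values,
    double filterRate) {
  assert(!values.empty());
  assert(filterRate > 0.0 && filterRate <= 1.0);
  std::sort(values.begin(), values.end());
  // Number of sorted values the range should cover, at least one.
  std::size_t keep = static_cast<std::size_t>(values.size() * filterRate);
  if (keep == 0) {
    keep = 1;
  }
  if (keep > values.size()) {
    keep = values.size();
  }
  return {values.front(), values[keep - 1]};
}

// Counts the values that fall inside the inclusive range.
std::size_t countPassing(
    const std::vector<int128_t>& values,
    int128_t lower,
    int128_t upper) {
  return static_cast<std::size_t>(std::count_if(
      values.begin(), values.end(), [&](int128_t v) {
        return v >= lower && v <= upper;
      }));
}
```

With distinct values the fraction that passes matches the requested rate up to rounding; with duplicates it can be higher, because every copy of the upper bound passes.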

While developing the code, we found another problem/enhancement that needs to be addressed, so I created issue #6248 for it. That could be the next step.

netlify · 2023-08-25T20:54:45Z

✅ Deploy Preview for meta-velox ready!

Name	Link
🔨 Latest commit	`0b772e0`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/652f37cb95e68d00082f03c0
😎 Deploy Preview	https://deploy-preview-6275--meta-velox.netlify.app


yiweiHeOSS · 2023-08-28T21:47:20Z

My change to the code below causes failures in other test cases. I changed

return HugeInt::build(Random::rand32(gen), Random::rand32(gen));

to

return HugeInt::build(Random::rand64(gen), Random::rand64(gen));

[ FAILED ] E2EFilterTest.longDecimalDictionary
[ FAILED ] E2EFilterTest.longDecimalDirect
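To show why the width of the random halves matters, the sketch below contrasts 32-bit and 64-bit halves. It uses `std::mt19937_64` in place of the `Random::rand32`/`Random::rand64` calls above, and `buildHugeInt`, `narrowRandom` and `wideRandom` are illustrative names rather than Velox functions. With 32-bit halves, the upper 32 bits of each 64-bit word stay zero, so the result is never negative and never uses the full 128-bit range.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

using int128_t = __int128; // GCC/Clang extension.

// Combines two 64-bit words into a signed 128-bit value (illustrative).
constexpr int128_t buildHugeInt(uint64_t hi, uint64_t lo) {
  return static_cast<int128_t>(
      (static_cast<unsigned __int128>(hi) << 64) | lo);
}

constexpr uint64_t upperHalf(int128_t v) {
  return static_cast<uint64_t>(static_cast<unsigned __int128>(v) >> 64);
}

constexpr uint64_t lowerHalf(int128_t v) {
  return static_cast<uint64_t>(static_cast<unsigned __int128>(v));
}

// Stand-in for the rand32-based version: each half has at most 32 bits set.
int128_t narrowRandom(std::mt19937_64& gen) {
  uint64_t hi = static_cast<uint32_t>(gen());
  uint64_t lo = static_cast<uint32_t>(gen());
  return buildHugeInt(hi, lo);
}

// Stand-in for the rand64-based version: all 128 bits can be set,
// so roughly half of the generated values are negative.
int128_t wideRandom(std::mt19937_64& gen) {
  uint64_t hi = gen();
  uint64_t lo = gen();
  return buildHugeInt(hi, lo);
}
```

Any code path that only worked because the generated values were small and non-negative is exercised for the first time once the wide version is used.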

So I modified the code to add a temporary variable so that I could see the value in the debugger:

class HugeInt {
 public:
  static constexpr FOLLY_ALWAYS_INLINE int128_t
  build(uint64_t hi, uint64_t lo) {
    // GCC does not allow left shift negative value.
    int128_t temp = (static_cast<__uint128_t>(hi) << 64) | lo;
    return temp;
  }

lo: 10936052917074306677
hi: 15573954114018521923
temp: -52993621163942803648006051651297827211
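The debugger values can be reproduced outside Velox with a small standalone sketch. It assumes `__int128` support (GCC/Clang); `buildHugeInt`, `upperHalf`, `lowerHalf` and `toDecimalString` are illustrative helpers, not the Velox `HugeInt` API. Because the top bit of `hi` is set, the combined value is negative, and splitting it again returns the original halves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

using int128_t = __int128; // GCC/Clang extension.

constexpr int128_t buildHugeInt(uint64_t hi, uint64_t lo) {
  return static_cast<int128_t>(
      (static_cast<unsigned __int128>(hi) << 64) | lo);
}

constexpr uint64_t upperHalf(int128_t v) {
  return static_cast<uint64_t>(static_cast<unsigned __int128>(v) >> 64);
}

constexpr uint64_t lowerHalf(int128_t v) {
  return static_cast<uint64_t>(static_cast<unsigned __int128>(v));
}

// The hi and lo values observed in the debugger above.
constexpr uint64_t kHi = 15573954114018521923ULL;
constexpr uint64_t kLo = 10936052917074306677ULL;
constexpr int128_t kTemp = buildHugeInt(kHi, kLo);

static_assert(kTemp < 0, "top bit of hi is set, so the value is negative");
static_assert(upperHalf(kTemp) == kHi, "upper half round-trips");
static_assert(lowerHalf(kTemp) == kLo, "lower half round-trips");

// Decimal rendering of a 128-bit value, for comparison with the debugger.
std::string toDecimalString(int128_t v) {
  unsigned __int128 mag = v < 0 ? ~static_cast<unsigned __int128>(v) + 1
                                : static_cast<unsigned __int128>(v);
  std::string digits;
  do {
    digits.push_back(static_cast<char>('0' + static_cast<int>(mag % 10)));
    mag /= 10;
  } while (mag != 0);
  if (v < 0) {
    digits.push_back('-');
  }
  std::reverse(digits.begin(), digits.end());
  return digits;
}
```

Calling `toDecimalString(kTemp)` gives a string that can be compared against the `temp` value shown by the debugger.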

The error message later in the test:

[ RUN      ] E2EFilterTest.longDecimalDictionary
/root/velox/velox/dwio/common/tests/E2EFilterTestBase.cpp:122: Failure
Value of: resultBatch->equalValueAt(batches[batchIndex].get(), i, rowIndex)
  Actual: false
Expected: true
Content mismatch at resultBatch 0 at index 0: expected: {-5299362116394280364800605165.1297827211, {-10962914699381264738035368224.7093712783}} actual: {-502284656663603218480.5461940619, {-524318847941508198011.4504906639}}
Google Test trace:
/root/velox/velox/dwio/common/tests/E2EFilterTestBase.cpp:322: No row group skip
/root/velox/velox/dwio/common/tests/E2EFilterTestBase.cpp:235: Failure
Value of: result->equalValueAt(expectedColumn, i, expectedRow)
  Actual: false
Expected: true
Content mismatch at 0 column 0: expected: -5299362116394280364800605165.1297827211 actual: -502284656663603218480.5461940619
Google Test trace:
/root/velox/velox/dwio/common/tests/E2EFilterTestBase.cpp:322: No row group skip
/root/velox/velox/dwio/common/tests/E2EFilterTestBase.cpp:235: Failure
Value of: result->equalValueAt(expectedColumn, i, expectedRow)
  Actual: false
Expected: true

So the expected value looks good, but the actual value is corrupted. There may be a problem in the write/read path of the E2EFilterTest code.

yiweiHeOSS · 2023-08-29T00:30:11Z

expected: -52993621163942803648006051651297827211:
-100111110111100011001100111111011001011011000101001100101111000110100000111011010101011101000001001000111010010101100110001011
actual: -5022846566636032184805461940619:
-111111011001011011000101001100101111000110100000111011010101011101000001001000111010010101100110001011
The leading bits 100111110111100011001100 are cut off in the actual value.
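The comparison above can be automated with a small sketch like the following. Both helpers are hypothetical and not part of the Velox test utilities: `toBinaryString` prints the magnitude of an int128_t in binary with a leading minus sign for negative values, and `droppedLeadingBits` takes two unsigned bit strings and reports how many leading bits of the expected string are missing from the actual one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

using int128_t = __int128; // GCC/Clang extension.

// Binary rendering of the magnitude, prefixed with '-' for negative values.
std::string toBinaryString(int128_t v) {
  unsigned __int128 mag = v < 0 ? ~static_cast<unsigned __int128>(v) + 1
                                : static_cast<unsigned __int128>(v);
  std::string bits;
  do {
    bits.push_back(static_cast<char>('0' + static_cast<int>(mag & 1)));
    mag >>= 1;
  } while (mag != 0);
  if (v < 0) {
    bits.push_back('-');
  }
  std::reverse(bits.begin(), bits.end());
  return bits;
}

// Returns how many leading bits of `expected` are missing from `actual`
// when `actual` is a suffix of `expected`, or -1 if it is not a suffix.
// Both arguments are bit strings without a sign.
long droppedLeadingBits(
    const std::string& expected,
    const std::string& actual) {
  if (actual.size() > expected.size()) {
    return -1;
  }
  std::size_t dropped = expected.size() - actual.size();
  if (expected.compare(dropped, std::string::npos, actual) != 0) {
    return -1;
  }
  return static_cast<long>(dropped);
}
```

One caveat: if the surviving bits started with zeros, the printed actual string would be shorter than the truncated suffix, and this simple check would return -1.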

yiweiHeOSS · 2023-08-29T23:04:12Z

Created issue #6317 for the problem above.

yingsu00

There shouldn't be any commits that only fix formatting issues from previous commits. Please merge each such commit into the commit it fixes.

Add a separate commit to rewrite ColumnStats<StringView>::makeRangeFilter and add ColumnStats<StringView>::makeRandomFilter.

Separate the decimal benchmark and varchar benchmark into two commits.

Can you please add the benchmark results as a comment block at the end of the benchmark? Be sure to use a release build, and include your hardware spec.

velox/dwio/common/tests/utils/BatchMaker.cpp

velox/dwio/common/tests/utils/FilterGenerator.cpp

velox/dwio/common/tests/utils/FilterGenerator.h

velox/dwio/common/tests/utils/FilterGenerator.cpp

velox/dwio/parquet/tests/reader/ParquetReaderBenchmark.cpp

yingsu00

Please merge the last commit "Change the callers of original makeRangeFilter()." into the first one.
Many of the commit message lines are too long. Please take a look at https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53 and update them accordingly.

velox/dwio/parquet/tests/reader/ParquetReaderBenchmark.cpp

velox/dwio/common/tests/utils/FilterGenerator.h

yiweiHeOSS · 2023-10-13T16:57:58Z

Simplified/shortened the commit messages and made the change according to the comments.

yingsu00

Can you merge "Merge branch 'main' into addTypes" into the appropriate one of the 4 previous commits? It's not good to have a commit that only merges a branch.

czentgr · 2023-11-02T00:49:08Z

@yiweiHeOSS please rebase to the latest. It looks like this is over 2 weeks old already. Is there any reason the 4 commits were not squashed? It is best to squash and provide a comprehensive description for the final commit (as found in this PR description). Please let me know if you need help with that.
@Yuhta Would you please be able to review this PR? Thank you!

yiweiHeOSS · 2023-11-02T01:15:15Z

@czentgr I intentionally made the 4 separate commits because @yingsu00 requested them.

facebook-github-bot · 2023-11-02T16:32:41Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

yiweiHeOSS · 2023-11-02T18:26:09Z

Hi @Yuhta, just a reminder: PR #6342 also needs to be merged. It is a dependency of this PR, and I'm not sure whether it has been merged already.

facebook-github-bot · 2023-11-02T20:56:43Z

@Yuhta merged this pull request in 060232f.

conbench-facebook · 2023-11-02T21:18:41Z

Conbench analyzed the 1 benchmark run on commit 060232f8.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

facebook-github-bot added the CLA Signed label Aug 25, 2023

yiweiHeOSS mentioned this pull request Aug 25, 2023

Add tests for type (Long/short)Decimal and VARCHAR into BenchmarkParquetReader #6247

Closed

yingsu00 assigned yiweiHeOSS and unassigned yiweiHeOSS Aug 25, 2023

yingsu00 reviewed Sep 5, 2023

yiweiHeOSS force-pushed the addTypes branch from 5e70c55 to ed3435d September 7, 2023 21:45

yiweiHeOSS requested a review from yingsu00 September 8, 2023 00:18

yiweiHeOSS force-pushed the addTypes branch from ed3435d to 30f1e29 September 8, 2023 23:33

yingsu00 reviewed Sep 10, 2023

velox/dwio/common/tests/utils/FilterGenerator.cpp

velox/dwio/common/tests/utils/FilterGenerator.cpp

velox/dwio/parquet/tests/reader/ParquetReaderBenchmark.cpp

yiweiHeOSS force-pushed the addTypes branch from 30f1e29 to 4be054e September 13, 2023 22:52

yiweiHeOSS requested a review from yingsu00 September 13, 2023 23:11

yingsu00 changed the title ~~Add tests for type (Long/short)Decimal and VARCHAR into BenchmarkParquetReader. Add the filter test code for HUGEINT (int128_t).~~ Improve BenchmarkParquetReader Oct 13, 2023

yingsu00 reviewed Oct 13, 2023

yiweiHeOSS force-pushed the addTypes branch from 4be054e to 299c146 October 13, 2023 16:59

yiweiHeOSS requested a review from yingsu00 October 16, 2023 20:34

yingsu00 reviewed Oct 17, 2023

yiweiHeOSS force-pushed the addTypes branch from 9680426 to af87a6a October 18, 2023 01:28

yiweiHeOSS added 4 commits October 17, 2023 18:37

ColumnStats<StringView>: rewrite makeRangeFilter, add makeRandomFilter

08148c6

Generate proper bits of random value for LongDecimalType test.

566cdc3

Add BenchmarkParquetReader tests for type (Long/short)Decimal.

e5e537c

Add tests for type VARCHAR and append the results.

0b772e0

yiweiHeOSS force-pushed the addTypes branch from df50635 to 0b772e0 October 18, 2023 01:41

yiweiHeOSS requested a review from yingsu00 October 18, 2023 01:57

yingsu00 approved these changes Oct 19, 2023

yingsu00 requested a review from Yuhta October 19, 2023 23:09

facebook-github-bot closed this in 060232f Nov 2, 2023

facebook-github-bot added the Merged label Nov 2, 2023

Yuhta mentioned this pull request Nov 20, 2023

Generate proper bits of random value for LongDecimalType tests. #6342

Closed

yzhang1991 mentioned this pull request Dec 12, 2023

BatchMaker should produce real random values for HUGEINT #6317

Open
