Improve BenchmarkParquetReader #6275
My change to the code below causes a failure in another test case:
[ FAILED ] E2EFilterTest.longDecimalDictionary
So I modified the code to add a temporary variable to see the value in the debugger:
lo: 10936052917074306677
The error message later in the test:
expected: -52993621163942803648006051651297827211
So the expected value looks good, but the actual value is corrupted. Maybe there is a problem in the write/read code of E2EFilterTest.
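When comparing a 128-bit expected value against a `lo` word seen in the debugger, it can help to split the value into its two 64-bit halves and to print it in decimal, since standard streams cannot print `__int128` directly. The following is a minimal sketch, assuming a GCC or Clang compiler with the `__int128` extension. The helper names `upperHalf`, `lowerHalf`, `combine`, and `toDecimalString` are hypothetical and are not taken from the Velox code base.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Assumption: __int128 is available (GCC/Clang extension).
using int128_t = __int128;

// Hypothetical helper: signed upper 64 bits of a 128-bit value.
inline int64_t upperHalf(int128_t v) {
  return static_cast<int64_t>(v >> 64);
}

// Hypothetical helper: unsigned lower 64 bits of a 128-bit value.
inline uint64_t lowerHalf(int128_t v) {
  return static_cast<uint64_t>(v);
}

// Hypothetical helper: rebuild the 128-bit value from its two halves.
inline int128_t combine(int64_t hi, uint64_t lo) {
  unsigned __int128 bits =
      (static_cast<unsigned __int128>(static_cast<uint64_t>(hi)) << 64) | lo;
  return static_cast<int128_t>(bits);
}

// Hypothetical helper: decimal rendering of a 128-bit value.
inline std::string toDecimalString(int128_t v) {
  if (v == 0) {
    return "0";
  }
  const bool negative = v < 0;
  // Work on the unsigned magnitude so the minimum value does not overflow.
  unsigned __int128 magnitude = negative
      ? -static_cast<unsigned __int128>(v)
      : static_cast<unsigned __int128>(v);
  std::string digits;
  while (magnitude != 0) {
    digits.push_back(static_cast<char>('0' + static_cast<int>(magnitude % 10)));
    magnitude /= 10;
  }
  if (negative) {
    digits.push_back('-');
  }
  std::reverse(digits.begin(), digits.end());
  return digits;
}
```

With helpers like these, the expected value from the test can be split with `upperHalf` and `lowerHalf` and each half compared with what the debugger shows, which narrows down whether the corruption affects one word or both.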
Created issue #6317 for the problem above.
There shouldn't be any commits that just fix the format issues from previous commits. Please merge it with the previous one.
Add a separate commit to rewrite ColumnStats<StringView>::makeRangeFilter and add ColumnStats<StringView>::makeRandomFilter.
Separate the decimal benchmark and varchar benchmark into two commits.
Can you please add the benchmark result as a comment block at the end of the benchmark? Be sure to use release build, and include your hardware spec.
30f1e29 to 4be054e
Please merge the last commit "Change the callers of original makeRangeFilter()." into the first one.
Many of the commit message lines are too long. Please take a look at https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53 and update them accordingly.
Simplified/shortened the commit messages and made the change according to the comments. |
Can you merge "Merge branch 'main' into addTypes" into the proper commit of the 4 previous commits? It's not good to have a commit that just merges a branch.
@yiweiHeOSS please rebase to the latest. Looks like this is over 2 weeks old already. Any reason the 4 commits were not squashed? Best to squash and provide a comprehensive description for the final commit (as found in this PR description). Please let me know if you need help with that. |
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
This PR is to fix #6247.
So far the benchmark covers 4 types. This PR adds more types to the BenchmarkParquetReader test:
We should test them with different filter rates and null rates just like the previous tests in BenchmarkParquetReader.
Also, I noticed we have never run the filter test for the type HUGEINT (int128_t), which is the actual type behind LongDecimalType. So this PR also implements the code to generate filters for HUGEINT (int128_t) and modifies the code so that HUGEINT (int128_t) data is generated correctly.
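To illustrate what generating a range filter for HUGEINT with a given filter rate can look like, here is a minimal standalone sketch. It is not the PR's implementation and does not use the Velox filter classes: `HugeintRangeSketch` and `makeRangeFromSample` are hypothetical names, and the percentile-based bounds are an assumption about how a target selectivity might be reached. It also assumes a compiler with the `__int128` extension and a non-empty sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: __int128 is available (GCC/Clang extension).
using int128_t = __int128;

// Hypothetical inclusive range filter over 128-bit integers.
struct HugeintRangeSketch {
  int128_t lower;
  int128_t upper;

  bool test(int128_t v) const {
    return v >= lower && v <= upper;
  }
};

// Hypothetical helper: picks bounds from a sample so that roughly
// selectPct percent of the values pass, starting at the startPct
// percentile. Precondition: values is not empty.
inline HugeintRangeSketch makeRangeFromSample(
    std::vector<int128_t> values,
    double startPct,
    double selectPct) {
  std::sort(values.begin(), values.end());
  const std::size_t last = values.size() - 1;
  auto indexAt = [last](double pct) {
    pct = std::clamp(pct, 0.0, 100.0);
    return static_cast<std::size_t>(pct / 100.0 * last);
  };
  return {values[indexAt(startPct)], values[indexAt(startPct + selectPct)]};
}
```

Because the bounds come from the sorted sample itself, the resulting filter tracks the requested rate on that sample regardless of how the 128-bit values are distributed. Null rates would be handled separately, since this sketch only looks at non-null values.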
While developing the code, we found another problem/enhancement that needs to be addressed, so I created issue #6248 for it. This could be the next step.