Refactor CSV reader benchmarks with nvbench #11678

Merged: 5 commits into rapidsai:branch-22.10 from refactor-csv-bench on Sep 22, 2022

Conversation

PointKernel
Member

@PointKernel PointKernel commented Sep 9, 2022

Description

Closes #10941

This PR refactors the CSV reader benchmarks to use nvbench. It also reduces the number of test cases by isolating data type, IO type, column selection, and row selection into separate benchmarks.
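
To illustrate why isolating the axes shrinks the suite, the sketch below counts cases using the axis values that appear in the example output. The comparison against a full cross product is hypothetical: the PR does not say how the previous benchmarks combined these axes, so the crossed figure is only an illustrative upper bound. The grouping names are taken from the output tables, not from the benchmark source.

```python
from math import prod

# Axis values copied from the example output in this PR.
axes = {
    "data_type": ["INTEGRAL", "FLOAT", "DECIMAL", "TIMESTAMP", "DURATION", "STRING"],
    "io": ["FILEPATH", "HOST_BUFFER"],
    "column_selection": ["ALL", "ALTERNATE", "FIRST_HALF", "SECOND_HALF"],
    "row_selection": ["BYTE_RANGE", "NROWS", "SKIPFOOTER"],
    "num_chunks": [1, 8],
}

# One benchmark per concern, mirroring the four output tables.
isolated_benchmarks = [
    ["data_type"],
    ["io"],
    ["column_selection"],
    ["row_selection", "num_chunks"],
]


def case_count(axis_names):
    """Number of cases when the named axes are fully crossed."""
    return prod(len(axes[name]) for name in axis_names)


# Isolated benchmarks add their case counts together.
isolated_total = sum(case_count(names) for names in isolated_benchmarks)

# Hypothetical: every axis crossed with every other axis.
crossed_total = case_count(axes)

print(f"isolated: {isolated_total} cases, fully crossed: {crossed_total} cases")
```

With these axis values the isolated layout yields 18 cases, which matches the number of rows across the four tables, while a full cross product would yield 288.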

Example output of the new benchmarks:

Benchmark results

## csv_read_data_type

[0] Quadro RTX 8000

| data_type | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|---------|----------|-------|----------|-------|------------------|-------------------|-------------------|
| INTEGRAL | 5x | 1.140 s | 0.09% | 1.140 s | 0.09% | 235553841 | 1.202 GiB | 668.564 MiB |
| FLOAT | 5x | 1.262 s | 0.04% | 1.262 s | 0.04% | 212718321 | 1.041 GiB | 713.885 MiB |
| DECIMAL | 5x | 272.787 ms | 0.03% | 272.784 ms | 0.03% | 984060406 | 396.279 MiB | 167.951 MiB |
| TIMESTAMP | 7x | 1.681 s | 0.47% | 1.681 s | 0.47% | 159723724 | 2.281 GiB | 814.268 MiB |
| DURATION | 7x | 2.121 s | 0.50% | 2.121 s | 0.50% | 126587514 | 2.588 GiB | 971.320 MiB |
| STRING | 19x | 496.713 ms | 0.50% | 496.710 ms | 0.50% | 540426462 | 859.526 MiB | 277.082 MiB |

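## csv_read_io

[0] Quadro RTX 8000

| io | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|----|---------|----------|-------|----------|-------|------------------|-------------------|-------------------|
| FILEPATH | 9x | 1.185 s | 0.49% | 1.185 s | 0.49% | 226466264 | 1.445 GiB | 618.876 MiB |
| HOST_BUFFER | 5x | 1.170 s | 0.14% | 1.170 s | 0.14% | 229459856 | 1.445 GiB | 618.876 MiB |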

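## csv_read_column_selection

[0] Quadro RTX 8000

| column_selection | row_selection | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|------------------|---------------|---------|----------|-------|----------|-------|------------------|-------------------|-------------------|
| ALL | ALL | 5x | 1.246 s | 0.18% | 1.246 s | 0.18% | 215514992 | 1.582 GiB | 653.520 MiB |
| ALTERNATE | ALL | 5x | 1.128 s | 0.08% | 1.128 s | 0.08% | 119009844 | 1.116 GiB | 648.908 MiB |
| FIRST_HALF | ALL | 5x | 1.143 s | 0.07% | 1.143 s | 0.07% | 117443933 | 1.121 GiB | 653.520 MiB |
| SECOND_HALF | ALL | 5x | 1.152 s | 0.16% | 1.152 s | 0.16% | 116478469 | 1.121 GiB | 653.520 MiB |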

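## csv_read_row_selection

[0] Quadro RTX 8000

| column_selection | row_selection | num_chunks | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|------------------|---------------|------------|---------|----------|-------|----------|-------|------------------|-------------------|-------------------|
| ALL | BYTE_RANGE | 1 | 5x | 1.244 s | 0.16% | 1.244 s | 0.16% | 215763257 | 1.582 GiB | 653.520 MiB |
| ALL | BYTE_RANGE | 8 | 5x | 1.170 s | 0.04% | 1.170 s | 0.04% | 229339594 | 202.596 MiB | 653.520 MiB |
| ALL | NROWS | 1 | 5x | 1.244 s | 0.12% | 1.244 s | 0.12% | 215808401 | 1.582 GiB | 653.520 MiB |
| ALL | NROWS | 8 | 4x | 4.560 s | inf% | 4.560 s | inf% | 58870122 | 320.771 MiB | 653.520 MiB |
| ALL | SKIPFOOTER | 1 | 5x | 1.245 s | 0.10% | 1.245 s | 0.10% | 215660012 | 1.582 GiB | 653.520 MiB |
| ALL | SKIPFOOTER | 8 | 3x | 7.443 s | inf% | 7.443 s | inf% | 36065528 | 1.269 GiB | 653.520 MiB |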
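
As a rough sanity check on how to read `bytes_per_second`, the sketch below multiplies the reported throughput by the reported time for a few rows copied from the results above. The helper name `implied_mib` is made up for this example. The interpretation is an inference from the numbers, not something the PR states: the products cluster around 256 MiB (128 MiB when only half the columns are read), which suggests throughput is computed from a fixed in-memory data size rather than from `encoded_file_size`.

```python
MIB = 1024 ** 2

# (benchmark, case, reported time in seconds, reported bytes_per_second)
rows = [
    ("csv_read_data_type", "INTEGRAL", 1.140, 235553841),
    ("csv_read_data_type", "DECIMAL", 0.272784, 984060406),
    ("csv_read_column_selection", "ALL", 1.246, 215514992),
    ("csv_read_column_selection", "ALTERNATE", 1.128, 119009844),
    ("csv_read_row_selection", "NROWS, 8 chunks", 4.560, 58870122),
]


def implied_mib(seconds, bytes_per_second):
    """Data size in MiB implied by throughput multiplied by elapsed time."""
    return seconds * bytes_per_second / MIB


for benchmark, case, seconds, bps in rows:
    print(f"{benchmark:28s} {case:16s} {implied_mib(seconds, bps):8.2f} MiB")
```

Read this way, the slow `NROWS` and `SKIPFOOTER` runs with 8 chunks process the same amount of data as the single-chunk runs. Their lower throughput comes entirely from the longer wall time.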

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel PointKernel added labels on Sep 9, 2022:
  • 3 - Ready for Review: Ready for review by team
  • libcudf: Affects libcudf (C++/CUDA) code.
  • CMake: CMake build issue
  • cuIO: cuIO issue
  • tech debt
  • improvement: Improvement / enhancement to an existing function
  • non-breaking: Non-breaking change
@PointKernel PointKernel self-assigned this Sep 9, 2022
@PointKernel PointKernel requested a review from a team as a code owner September 9, 2022 20:19
@PointKernel PointKernel added this to PR-WIP in v22.10 Release via automation Sep 9, 2022
@codecov

codecov bot commented Sep 9, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@f485667). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10   #11678   +/-   ##
===============================================
  Coverage                ?   85.89%           
===============================================
  Files                   ?      151           
  Lines                   ?    23534           
  Branches                ?        0           
===============================================
  Hits                    ?    20214           
  Misses                  ?     3320           
  Partials                ?        0           


v22.10 Release automation moved this from PR-WIP to PR-Reviewer approved Sep 21, 2022
Contributor

@vuule vuule left a comment

🔥

@PointKernel
Member Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d4f46fc into rapidsai:branch-22.10 Sep 22, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Sep 22, 2022
@PointKernel PointKernel deleted the refactor-csv-bench branch November 16, 2022 20:32
Labels
  • 3 - Ready for Review: Ready for review by team
  • CMake: CMake build issue
  • cuIO: cuIO issue
  • improvement: Improvement / enhancement to an existing function
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Separate cuIO IO benchmarks from column type benchmarks
3 participants