[BUG] Writing ORC files with KvikIO is 5x slower #12780

Closed
GregoryKimball opened this issue Feb 15, 2023 · 1 comment · Fixed by #12841
Labels: bug, cuIO, libcudf, Performance

@GregoryKimball (Contributor) commented Feb 15, 2023

Describe the bug
When running the orc_write_io_compression libcudf benchmarks, setting LIBCUDF_CUFILE_POLICY=KVIKIO makes the benchmarks run roughly 5x slower on average.

With OFF or GDS the runtimes are about 500-750 ms, but with KVIKIO they are about 2.5-5.5 s.

Runtime (ms) of orc_write_io_compression [Device=0 io=FILEPATH compression=NONE] by LIBCUDF_CUFILE_POLICY:

| LIBCUDF_CUFILE_POLICY | cardinality=0, run_length=1 | cardinality=1000, run_length=1 | cardinality=0, run_length=32 | cardinality=1000, run_length=32 |
|---|---|---|---|---|
| OFF | 708 | 749 | 517 | 519 |
| GDS | 712 | 731 | 534 | 516 |
| KVIKIO | 5529 | 2713 | 2468 | 2519 |
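The "5x" figure is a rounded summary. The hypothetical helpers below (for illustration only, not part of libcudf) compute the KVIKIO/OFF ratio for each case from the measured runtimes, in the order the cases are listed, and the mean of those ratios:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Runtimes in ms copied from the benchmark results, one entry per
// (cardinality, run_length) case in the order listed.
constexpr std::array<double, 4> kOffMs{708, 749, 517, 519};
constexpr std::array<double, 4> kKvikioMs{5529, 2713, 2468, 2519};

// Slowdown of KVIKIO relative to OFF for each case.
std::array<double, 4> kvikio_slowdown() {
  std::array<double, 4> ratio{};
  for (std::size_t i = 0; i < ratio.size(); ++i) {
    assert(kOffMs[i] > 0);  // avoid dividing by a zero baseline
    ratio[i] = kKvikioMs[i] / kOffMs[i];
  }
  return ratio;
}

// Arithmetic mean of the per-case slowdowns.
double mean_slowdown() {
  const auto ratio = kvikio_slowdown();
  double sum = 0.0;
  for (double x : ratio) sum += x;
  return sum / static_cast<double>(ratio.size());
}
```

The per-case ratios come out to about 7.8x, 3.6x, 4.8x, and 4.9x, with a mean of about 5.3x, so the slowdown is worst for the cardinality=0, run_length=1 case.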

Steps/Code to reproduce bug
Build from source (./build.sh libcudf benchmarks) and run:

```shell
export WORKSPACE=/raid
export CUDF_BENCHMARK_DROP_CACHE=1
for policy in KVIKIO GDS OFF; do
  LIBCUDF_CUFILE_POLICY="$policy" ./ORC_WRITER_NVBENCH --benchmark 1 --devices 0 --profile --axis io=FILEPATH --axis compression=NONE
done
```

Expected behavior
I expect KVIKIO to perform similarly to, or better than, the libcudf default data sink.

Environment overview
I collected these numbers with commit 3c39be5a9 and Docker image 0022659d9d65 from rapidsai-dev-nightly.

Additional context
The extra time is spent in the write step. The encoding kernels are not impacted.

[Screenshots comparing the KVIKIO and OFF runs were attached here; images not preserved.]

I also tried the tuning parameters KVIKIO_TASK_SIZE, KVIKIO_NTHREADS, and KVIKIO_COMPAT_MODE, but could not fully recover performance. Increasing the number of threads had a clear positive impact, while changing the task size alone made no noticeable difference.

Runtime (ms) of orc_write_io_compression [Device=0 io=FILEPATH compression=NONE] with KvikIO tuning variables (blank cells mean the variable was not set):

| LIBCUDF_CUFILE_POLICY | KVIKIO_NTHREADS | KVIKIO_TASK_SIZE | KVIKIO_COMPAT_MODE | cardinality=0, run_length=1 | cardinality=1000, run_length=1 | cardinality=0, run_length=32 | cardinality=1000, run_length=32 |
|---|---|---|---|---|---|---|---|
| OFF | | | | 708 | 749 | 517 | 519 |
| GDS | | | | 712 | 731 | 534 | 516 |
| KVIKIO | | | | 5529 | 2713 | 2468 | 2519 |
| KVIKIO | 1 | | | 5453 | 2741 | 2455 | 2543 |
| KVIKIO | 16 | | | 3946 | 1043 | 727 | 712 |
| KVIKIO | 64 | | | 3913 | 958 | 647 | 627 |
| KVIKIO | | 4194304 | | 5566 | 2727 | 2476 | 2532 |
| KVIKIO | | 16777216 | | 5482 | 2723 | 2475 | 2541 |
| KVIKIO | | 67108864 | | 5496 | 2728 | 2474 | 2534 |
| KVIKIO | | | ON | 2650 | 2736 | 2448 | 2521 |
| KVIKIO | 16 | | ON | 1052 | 1022 | 709 | 733 |
| KVIKIO | 16 | 67108864 | ON | 965 | 1005 | 728 | 707 |
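As we understand KvikIO's design, a single write is split into tasks of at most KVIKIO_TASK_SIZE bytes, which a pool of KVIKIO_NTHREADS threads then executes. The sketch below models that partitioning and the resulting number of rounds under the simplifying assumption that every task costs the same; `Task`, `split_into_tasks`, and `rounds_needed` are hypothetical helpers, not KvikIO's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous piece of a larger write: bytes [offset, offset + size).
struct Task {
  std::size_t offset;
  std::size_t size;
};

// Hypothetical: split a write of `total` bytes into tasks of at most
// `task_size` bytes (the role KVIKIO_TASK_SIZE plays).
std::vector<Task> split_into_tasks(std::size_t total, std::size_t task_size) {
  assert(task_size > 0);
  std::vector<Task> tasks;
  for (std::size_t offset = 0; offset < total; offset += task_size) {
    tasks.push_back(Task{offset, std::min(task_size, total - offset)});
  }
  return tasks;
}

// Hypothetical: with `nthreads` workers (the role KVIKIO_NTHREADS plays),
// equal-cost tasks finish in ceil(n_tasks / nthreads) rounds.
std::size_t rounds_needed(std::size_t n_tasks, std::size_t nthreads) {
  assert(nthreads > 0);
  return (n_tasks + nthreads - 1) / nthreads;
}
```

For example, a 64 MiB write with 4 MiB tasks becomes 16 tasks: 16 sequential rounds with one thread, but a single round with 16 threads. Under this model, changing the task size while keeping one thread only regroups the same bytes into sequential work, which matches the flat KVIKIO_TASK_SIZE rows above.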

@GregoryKimball GregoryKimball added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue Performance Performance related issue labels Feb 15, 2023
@GregoryKimball (Contributor, Author)

@madsbk, would you please investigate the poor write throughput of the ORC writer when using KVIKIO with file data sinks? Host-buffer and device-buffer sinks are not affected.

@GregoryKimball GregoryKimball changed the title [BUG] ORC writer with KvikIO is 5x slower than without KvikIO [BUG] Writing ORC files with KvikIO is 5x slower Feb 15, 2023
rapids-bot bot pushed a commit that referenced this issue Mar 13, 2023
For small reads and writes, the overhead of using cuFile and/or KvikIO becomes significant. This PR applies the size threshold already used by the `GDS` backend to the `KVIKIO` backend as well.

Closes #12780

### Future work
Let's optimize KvikIO for small reads and writes so we don't need this threshold. 
Tracking here: rapidsai/kvikio#178


Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: #12841
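The fix gates the `KVIKIO` path on write size, as was already done for `GDS`. A minimal sketch of such a size-based dispatch follows. `WritePath`, `choose_write_path`, and the 128 KiB threshold in the usage note are hypothetical illustrations, not libcudf's actual names or default value:

```cpp
#include <cstddef>

// Hypothetical: which route a device-buffer write to a file sink takes.
enum class WritePath {
  HostBounce,  // copy device data to host, then do a regular file write
  KvikIO,      // hand the device buffer to KvikIO
};

// Small writes go through the host because the fixed per-call overhead of
// cuFile/KvikIO dominates at small sizes; large writes use KvikIO.
constexpr WritePath choose_write_path(std::size_t size_bytes,
                                      std::size_t threshold_bytes,
                                      bool kvikio_enabled) {
  if (!kvikio_enabled || size_bytes < threshold_bytes) {
    return WritePath::HostBounce;
  }
  return WritePath::KvikIO;
}
```

With an illustrative 128 KiB threshold, a 4 KiB write would take the host path and a 1 MiB write would go to KvikIO; with KvikIO disabled, every write takes the host path.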