[BUG] String column copy performs a full gather #6803
Labels
bug
Something isn't working
libcudf
Affects libcudf (C++/CUDA) code.
Performance
Performance related issue
Spark
Functionality that helps Spark RAPIDS
strings
strings issues (C++ and Python)
Describe the bug
While analyzing an Nsight trace of a Spark query, I noticed 10ms being spent on a filter. Digging deeper, the filter time was spent in a `cudf::table` copy constructor, with most of the time being spent copying one string column within the table.

Spark queries often filter input data to remove nulls before further processing, and often nothing is filtered. `cudf::apply_boolean_mask`, used to implement the Spark filter, has short-circuit logic to avoid a full gather and instead copy-constructs the output table if it realizes nothing will be filtered. However, copy-constructing a string column performs a full gather, which can be much more expensive than simply copying the input column buffers.

In this specific case, the string column consists of 178200 rows and each row is around 200 characters. The string copy constructor took almost 10 milliseconds on the GPU despite the column containing only approximately 30MB of device memory.
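To illustrate why a gather is heavier than a buffer copy, here is a minimal host-side sketch, not libcudf code. It assumes a strings column can be modelled as an offsets buffer plus a concatenated chars buffer; `HostStringsColumn`, `make_column`, `gather`, and `copy_buffers` are hypothetical names invented for this example. The gather has to compute every row's size, build new offsets, learn the total byte count, and only then copy string by string, whereas the buffer copy is two bulk copies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side model of a strings column (not a libcudf type):
// row i occupies chars[offsets[i] .. offsets[i + 1]).
struct HostStringsColumn {
  std::vector<std::int32_t> offsets;  // num_rows + 1 entries, first entry is 0
  std::vector<char> chars;            // all string bytes, concatenated
};

// Build the model column from ordinary strings.
HostStringsColumn make_column(const std::vector<std::string>& rows) {
  HostStringsColumn col;
  col.offsets.push_back(0);
  for (const auto& s : rows) {
    col.chars.insert(col.chars.end(), s.begin(), s.end());
    col.offsets.push_back(static_cast<std::int32_t>(col.chars.size()));
  }
  return col;
}

// Gather-style copy: works for any row map, but needs two passes.
HostStringsColumn gather(const HostStringsColumn& in,
                         const std::vector<std::int32_t>& map) {
  HostStringsColumn out;
  out.offsets.reserve(map.size() + 1);
  out.offsets.push_back(0);
  // Pass 1: look up each row's length and accumulate the new offsets.
  for (std::int32_t row : map) {
    const std::int32_t len = in.offsets[row + 1] - in.offsets[row];
    out.offsets.push_back(out.offsets.back() + len);
  }
  // The total byte count must be known before the chars buffer can be sized.
  out.chars.resize(static_cast<std::size_t>(out.offsets.back()));
  // Pass 2: copy the bytes of each selected row into place.
  for (std::size_t i = 0; i < map.size(); ++i) {
    std::copy(in.chars.begin() + in.offsets[map[i]],
              in.chars.begin() + in.offsets[map[i] + 1],
              out.chars.begin() + out.offsets[i]);
  }
  return out;
}

// Buffer copy: two bulk copies, no per-row work.
HostStringsColumn copy_buffers(const HostStringsColumn& in) {
  return HostStringsColumn{in.offsets, in.chars};
}
```

With an identity map (nothing filtered), both functions produce the same column, so the per-row work in `gather` is pure overhead in that case. On a GPU, reading the total byte count back before allocating is the kind of step that would plausibly require a synchronization, though this model does not demonstrate that.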
Steps/Code to reproduce bug
Perform a copy of a string column containing many relatively long strings (e.g., 200+ characters per row). For reference, I've attached a gzipped Parquet file containing the string column from the specific filter case mentioned above.
filtertest.parquet.gz
Expected behavior
A copy of a strings column view that starts at base offset 0 should be performed at device memory speed by copying the underlying buffers, rather than by performing the more complicated gather computation (which also requires synchronizing and a DtoH transfer).
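The requested fast path could look roughly like the following host-side sketch. It is only a model under stated assumptions, not the libcudf implementation: `HostStrings`, `HostStringsView`, and `copy_view` are hypothetical names, and the parent column's offsets are assumed to start at 0. When the view begins at row 0, the parent's offsets are already valid for the copy and can be taken verbatim; otherwise they need to be rebased.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a strings column (not a libcudf type).
struct HostStrings {
  std::vector<std::int32_t> offsets;  // num_rows + 1 entries, first entry is 0
  std::vector<char> chars;            // concatenated string bytes
};

// Hypothetical view over rows [offset, offset + size) of a parent column.
struct HostStringsView {
  const HostStrings* parent;
  std::size_t offset;
  std::size_t size;
};

HostStrings copy_view(const HostStringsView& v) {
  const auto& po = v.parent->offsets;
  const auto& pc = v.parent->chars;
  const std::int32_t first = po[v.offset];           // first byte of the view
  const std::int32_t last  = po[v.offset + v.size];  // one past its last byte

  HostStrings out;
  if (v.offset == 0) {
    // Fast path: the view starts at row 0, so the parent's offsets can be
    // copied as-is with a single bulk copy.
    out.offsets.assign(po.begin(),
                       po.begin() + static_cast<std::ptrdiff_t>(v.size + 1));
  } else {
    // Fallback: offsets must be rebased so that the copy starts at 0.
    out.offsets.reserve(v.size + 1);
    for (std::size_t i = 0; i <= v.size; ++i) {
      out.offsets.push_back(po[v.offset + i] - first);
    }
  }
  // In both cases the character bytes are one contiguous range.
  out.chars.assign(pc.begin() + first, pc.begin() + last);
  return out;
}
```

In this model the offset-0 case touches no per-row logic at all, which is the property the expected behavior asks for; how a non-zero offset should be handled in libcudf is left open here.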