
[FEA] Add 64-bit size type option at build-time for libcudf #13159

Open
10 tasks
GregoryKimball opened this issue Apr 17, 2023 · 2 comments
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@GregoryKimball
Contributor

GregoryKimball commented Apr 17, 2023

Many libcudf users have expressed interest in using a 64-bit size type (see #3958 for reference). The cudf::size_type uses an int32_t data type, which limits libcudf columns to INT_MAX (~2.1 billion) elements. For string columns this imposes a ~2 GB limit on character data, for int32 columns it imposes a ~8 GB limit, and for list columns it caps the leaf element count below 2.1 billion. Downstream libraries must partition their data to stay within these limits.

We expect that using a 64-bit size type will incur significant penalties to memory footprint and data throughput. Memory footprint will double for all offset vectors, and runtime of most functions will increase due to the larger data sizes. Kernel performance may degrade even further due to increased register count and unoptimized shared memory usage.

As GPU memory capacities grow, the limit from a 32-bit cudf::size_type will force data partitions to become ever smaller fractions of device memory. Excessive partitioning also carries performance penalties, so libcudf should enable its community to start experimenting with a 64-bit size type. Scoping for 64-bit size types in the cuDF-python layer will be tracked in a separate issue (#TBD).

  • Consult with thrust/cub experts about outstanding issues with 64-bit indexing. Some libcudf functions may depend on upstream changes in CCCL; please see cccl/47, thrust/1271, and cub/212. copy_if, reduce, parallel_for, merge, and sort may have unresolved issues.
  • Consult with thrust/cub experts about making the 32-bit kernels optional. Currently the 64-bit kernels are disabled in libcudf builds. Disabling the 32-bit kernels would avoid large increases in compile time and binary size when we enable 64-bit thrust/cub kernels.
  • Verify compatibility of 64-bit size type with cuco data structures (needs additional scoping)
  • Audit custom kernels in libcudf for the impact of a 64-bit size type. Introduce conditional logic to adjust shared memory allocations and threads per block as needed based on the size type. Identify implementation details that take a 32-bit size type for granted.
  • Audit cuIO size types and their interaction with cudf::size_type
  • Resolve compilation errors from using a 64-bit size type
  • Resolve test failures from using a 64-bit size type
  • Review performance impact of a 64-bit size type using libcudf microbenchmark results
  • Add a build-time option for advanced users to use a 64-bit size type instead of a 32-bit size type.
  • Add a CI step to build and test the 64-bit size type option.

From this stage we will have a better sense of the impact and value of using a 64-bit size type with libcudf.

@GregoryKimball GregoryKimball added feature request New feature or request 0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. labels Apr 17, 2023
@GregoryKimball GregoryKimball changed the title [FEA] Add a build-time option for libcudf to use a 64-bit size type [FEA] Add 64-bit size type option at build-time option for libcudf Jul 22, 2023
@GregoryKimball GregoryKimball changed the title [FEA] Add 64-bit size type option at build-time option for libcudf [FEA] Add 64-bit size type option at build-time for libcudf Aug 2, 2023
@GregoryKimball
Contributor Author

GregoryKimball commented Sep 1, 2023

On branch-23.10 at commit ad9fa501192, I ran build.sh libcudf with a 64-bit size type and collected the unique lines that threw compilation errors.

Dictionary errors

cudf/cpp/src/groupby/sort/group_single_pass_reduction_util.cuh(64): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/dictionary/detail/iterator.cuh(88): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/detail/aggregation/aggregation.cuh(346): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/src/groupby/sort/group_correlation.cu(87): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/src/groupby/hash/multi_pass_kernels.cuh(107): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/dictionary/detail/iterator.cuh(41): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists

AtomicAdd errors

cudf/cpp/include/cudf/detail/copy_if_else.cuh(97): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/include/cudf/detail/null_mask.cuh(108): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/groupby/sort/group_std.cu(153): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/bitmask/null_mask.cu(303): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/include/cudf/detail/valid_if.cuh(68): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(274): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(375): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(207): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(204): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(264): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(266): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(281): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(209): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(283): error: no instance of overloaded function "atomicAdd" matches the argument list

Thrust

cudf/cpp/src/groupby/groupby.cu(269): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/copying/contiguous_split.cu(631): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/groupby/groupby.cu(304): error: no instance of overloaded function "std::all_of" matches the argument list
cudf/cpp/src/hash/md5_hash.cu(343): error: no instance of overloaded function "thrust::for_each" matches the argument list
cudf/cpp/include/cudf/lists/detail/scatter.cuh(245): error: no instance of overloaded function "thrust::sequence" matches the argument list
cudf/cpp/src/groupby/groupby.cu(313): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/filling/repeat.cu(124): error: no instance of overloaded function "thrust::upper_bound" matches the argument list

Device span errors

cudf/cpp/include/cudf/table/experimental/row_operators.cuh(848): error: no instance of constructor "std::optional<_Tp>::optional [with _Tp=cudf::device_span<const int, 18446744073709551615UL>]" matches the argument list
cudf/cpp/include/cudf/table/experimental/row_operators.cuh(848): error: no instance of constructor "std::optional<_Tp>::optional [with _Tp=cudf::device_span<const int32_t, 18446744073709551615UL>]" matches the argument list

int typing errors

cudf/cpp/src/binaryop/compiled/binary_ops.cuh(272): error: no instance of function template "cudf::util::div_rounding_up_safe" matches the argument list
cudf/cpp/include/cudf/detail/utilities/cuda.cuh(169): error: no instance of overloaded function "std::clamp" matches the argument list

Assorted errors

cudf/cpp/src/hash/spark_murmurhash3_x86_32.cu(230): error: no instance of overloaded function "std::max" matches the argument list
cudf/cpp/src/copying/purge_nonempty_nulls.cu(93): error: no instance of function template "cudf::detail::gather" matches the argument list
cudf/cpp/include/cudf/detail/copy_if.cuh(166): error: more than one instance of overloaded function "min" matches the argument list:

@revans2
Contributor

revans2 commented Sep 1, 2023

The Java code currently hard-codes a signed 32-bit size type in many places. We can switch it to 64 bits everywhere, along with a dynamic check depending on how the code is compiled. But just so you are aware, Spark has a top-level limitation of a signed 32-bit int for the number of rows in a table. We can work around this in some places, but moving the Spark plugin over to a 64-bit index is not going to be super simple.

Projects
Status: In Progress
Status: To be revisited