[BUG] ContiguousSplitUntypedTest fails when run with the arena allocator #11249

Closed
jlowe opened this issue Jul 12, 2022 · 5 comments · Fixed by #11706
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. tests Unit testing for project

Comments

@jlowe
Member

jlowe commented Jul 12, 2022

Describe the bug
ContiguousSplitUntypedTest fails when run with the arena allocator but passes when run with the pool or cuda allocators:

[ RUN      ] ContiguousSplitUntypedTest.CalculationOverflow
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: RMM failure at:/home/jlowe/src/spark-rapids-jni/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/arena_memory_resource.hpp:159: Maximum pool size exceeded" 
thrown in the test body.
[  FAILED  ] ContiguousSplitUntypedTest.CalculationOverflow (8 ms)

Steps/Code to reproduce bug
Run cpp/build/gtests/COPYING_TEST --rmm_mode=arena

Expected behavior
Tests should pass with any supported RMM memory resource.

@jlowe jlowe added bug Something isn't working Needs Triage Need team to review and classify tests Unit testing for project libcudf Affects libcudf (C++/CUDA) code. labels Jul 12, 2022
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Jul 12, 2022
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@davidwendt
Contributor

This appears to be an out-of-memory error rather than a problem with the algorithm. The COPYING_TEST runs fine on my 48GB GPU with the arena allocator. The maximum memory required for COPYING_TEST appears to be about 25GB with the arena allocator.
What are the stats of the GPU where this error occurs? Can you run this on a larger GPU?

@jlowe
Member Author

jlowe commented Sep 14, 2022

I ran this on a 16GB V100, so I guess if tests are expected to require more than that, this is "working as designed." However, it seems a bit excessive to need that much memory for a test.

@davidwendt
Contributor

I agree. Actually, I don't think we should have this specific gtest. I'm inclined to disable it or remove it altogether.

Bug Squashing automation moved this from Needs prioritizing to Closed Sep 22, 2022
rapids-bot bot pushed a commit that referenced this issue Sep 22, 2022
Disables a `ContiguousSplitUntypedTest` that simply creates a very large (over 3GB) column to test that the output buffer size does not overflow. The gtest ends up requiring 25GB of device memory when used with the arena allocator, as mentioned in #11249. Very large columns like this should not be part of the unit tests for libcudf.
This PR disables the test so that it remains available for testing under specific conditions outside of CI.

Closes #11249

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #11706
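
For readers wondering what kind of overflow a test like this guards against, here is a minimal sketch. It is not libcudf's code: `total_bytes_wide` and `total_bytes_narrow` are hypothetical helpers, and the assumption is only that per-partition byte sizes are summed into a single total. It shows why a total above 4GiB needs a 64-bit accumulator.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from libcudf): sums byte sizes into a 64-bit
// accumulator, so totals above 4 GiB are represented exactly.
std::uint64_t total_bytes_wide(std::vector<std::uint64_t> const& sizes)
{
  std::uint64_t total = 0;
  for (auto s : sizes) { total += s; }
  return total;
}

// Hypothetical helper (not from libcudf): the same sum with a 32-bit unsigned
// accumulator. Unsigned arithmetic wraps modulo 2^32, so the result is
// silently wrong once the true total exceeds 4,294,967,295 bytes.
std::uint32_t total_bytes_narrow(std::vector<std::uint64_t> const& sizes)
{
  std::uint32_t total = 0;
  for (auto s : sizes) { total += static_cast<std::uint32_t>(s); }
  return total;
}
```

With two partitions of 3,000,000,000 bytes each, the wide sum is 6,000,000,000 while the narrow sum wraps to 1,705,032,704. Checking this arithmetic does not itself require allocating the data, which is consistent with the view above that a multi-gigabyte column does not belong in CI unit tests.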
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024