Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Experiment try to unify implementation of non-nested types and nested types in cudf::contains #11016

Open
ttnghia opened this issue Jun 1, 2022 · 3 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code

Comments

@ttnghia
Copy link
Contributor

ttnghia commented Jun 1, 2022

Currently, cudf::contains implements two separate code paths for non-nested types and nested types. That requires more effort to maintain all of them, comparing to just have one code path that can handle both.

Recently, there was an attempt to use just one code path to handle all data type (ref: #10770 (comment)) in groupby key hashing, which reported no performance regression on basic data types. As such, we should also test and try to unify the two code paths of cudf::contains.

Having just one code path that handles everything, maintenance will be much easier.

@ttnghia ttnghia added feature request New feature or request Needs Triage Need team to review and classify labels Jun 1, 2022
@github-actions github-actions bot added this to Needs prioritizing in Feature Planning Jun 1, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented Jun 1, 2022

There was mentioned that the current cudf's customized data structure unordered_multiset is still much faster than cuco::static_map. Since the code path for handling nested types relies on cuco::static_map, we can only address this issue once we can safely switch to using cuco::static_map without much performance regression.

@GregoryKimball GregoryKimball added proposal Change current process or code code quality libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Jun 29, 2022
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@vyasr
Copy link
Contributor

vyasr commented Oct 21, 2022

The other part of what needs to happen here (in addition to resolving performance regressions caused by differences between cuco::static_map and concurrent_unordered_map) is to introduce the same sort of has_nested templating to the equality comparators as I did for the lexicographic comparator. Then, rather than maintaining a completely separate code path in contains where one uses the old comparator and one uses the new one, we could use the same comparator in both cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code
Projects
No open projects
Feature Planning
Needs prioritizing
Development

No branches or pull requests

3 participants