Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Split experimental/row_operators.cuh #11012

Open
ttnghia opened this issue May 31, 2022 · 16 comments
Open

[FEA] Split experimental/row_operators.cuh #11012

ttnghia opened this issue May 31, 2022 · 16 comments
Labels
libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code

Comments

@ttnghia
Copy link
Contributor

ttnghia commented May 31, 2022

Currently, experimental/row_operators.cuh file is huge: over 1200LOC and is growing. Including the entire file into a source file that only uses a small part of it is burdensome. I also feel dizzy while navigating it trying to find a struct/function of interest.

We should split it up (and re-organize files) into separate files like:

  • experimental/row_operators/common.cuh
  • experimental/row_operators/lexicographic.cuh
  • experimental/row_operators/equality.cuh
  • experimental/row_operators/hashing.cuh
  • ...
@ttnghia ttnghia added feature request New feature or request Needs Triage Need team to review and classify labels May 31, 2022
@github-actions github-actions bot added this to Needs prioritizing in Feature Planning May 31, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented May 31, 2022

Tag directly related devs: @devavret @jrhemstad @bdice.

@devavret
Copy link
Contributor

The only reason it was one monolithic header is because it was a replacement of the single row_operators.cuh header. Splitting it is fine. We might need to keep equality and hashing in the same header because they use each other's pre-processed table

@ttnghia
Copy link
Contributor Author

ttnghia commented May 31, 2022

Then the hashing header can just include the equality header. Some source files just need equality comparison.

Wait: Using each other table? Then I'm not sure...

@bdice
Copy link
Contributor

bdice commented May 31, 2022

This idea seems fine to me. We'll want to make sure that the designs stay parallel (wherever possible) despite being in separate files.

@ttnghia
Copy link
Contributor Author

ttnghia commented May 31, 2022

Thanks all. Then I'll go ahead and do this split. Will have a PR up shortly.

@jrhemstad
Copy link
Contributor

I'm fine with this, but let's wait on splitting them up until we are ready to move them out of "experimental". Then we can split them as we move their namespaces.

@GregoryKimball GregoryKimball added code quality libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code and removed Needs Triage Need team to review and classify feature request New feature or request labels Jun 29, 2022
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@vyasr
Copy link
Contributor

vyasr commented Oct 21, 2022

I believe that addressing this is blocked on #11844, or at least should be conditioned on addressing that issue. Once the newer comparators are used everywhere, we can remove the old comparators and move the new ones into the places that we want them.

@divyegala
Copy link
Member

@vyasr @bdice @ttnghia @GregoryKimball

I am about to go ahead and start on this issue, please let me know your thoughts. I prefer breaking these up as:

  • lexicographical/row_operators.cuh
  • equality/row_operators.cuh
  • hash/row_operators.cuh

The reason I prefer to split them up this way is that we will be mirroring the sub-directory structure with the namespaces.

Any common code can be present in row_operators_common.cuh/hpp.

Furthermore, in my PR I will add some redundancy if any code in experimental relies on the current version of row_operators.cuh by copying over those sections. By doing this, when we are ready to drop experimental, we just delete the old row_operators.cuh, move up contents of experimental to the parent directory, and perform simple find-and-replace to remove experimental:: from the call sites.

@vyasr
Copy link
Contributor

vyasr commented Feb 10, 2023

I have a slight preference for subdirectories over underscores as well, and I am fine with the temporary duplication.

@bdice
Copy link
Contributor

bdice commented Feb 10, 2023

This proposal sounds fine to me. Roughly how far are we from migrating entirely to "experimental" comparators? Are there any hard cases left, or are all the remaining instances relatively straightforward?

@divyegala
Copy link
Member

@bdice there are a couple of hard cases left. The most notorious one being joins. The actual update is easy, of course, but it's going to be difficult testing each and every single join for nested types in terms of it being a mechanical, time consuming process.

That being said, I'm hoping we can drop experimental by the very end of 23.04 release, or very start of 23.06 release.

@ttnghia
Copy link
Contributor Author

ttnghia commented Feb 10, 2023

So you're proposing:

lexicographic/row_operators.cuh
equality/row_operators.cuh
...

instead of

row_operators/common.cuh
row_operators/lexicographic.cuh
...

Then what other files will be in these directories (lexicographic/, equality/, etc)? And where the row_operators_common.cuh/hpp files will be placed?

@divyegala
Copy link
Member

divyegala commented Feb 10, 2023

@ttnghia no other files for now, but giving us the option to have comparator-specific files to put in these sub-directories in the future.

row_operators_common.cuh/hpp file will be in the parent directory, which in this case is: cudf/include/table/

I suggest this because the namespacing is super confusing right now. row::lexicographic::... but we have no directories called row or lexicographic

@ttnghia
Copy link
Contributor Author

ttnghia commented Feb 10, 2023

The namespace does not always align with directories. You can see many files like cudf/detail/something.hpp vs cudf/something/detail/something_detail.hpp which all have the same namespace hierarchy.

@divyegala
Copy link
Member

That is a specific structure of the detail API that libcudf follows - I don't see where else cudf doesn't follow namespace <-> directory.

Alternatively, I would also be okay with row/lexicographic.cuh to follow the namespace row::lexicographic.

While I have been doing these updates to the experimental API, I always forget what the actual namespace of the comparators are because they don't make any sense when it comes to the directory/file naming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code
Projects
No open projects
Feature Planning
Needs prioritizing
Development

No branches or pull requests

7 participants