Distributed RDataFrame: add support for NodeProxy.GetColumnNames #15476

Closed

@gwmyers gwmyers commented May 10, 2024

This pull request:

Updates distributed RDataFrame so that the HeadNode's `_localrdf` keeps track of transformation operations, and adds the ability to call `GetColumnNames` on a `NodeProxy` object.

Changes or fixes:

  • Add GetColumnNames method to NodeProxy which calls the HeadNode GetColumnNames method
  • Propagate transform operations to the HeadNode _localrdf, mainly to keep track of user-defined columns
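
To illustrate the idea behind the two changes above, here is a minimal, self-contained sketch of the pattern. It does not use ROOT and is not the actual implementation: `LocalFrame`, the `book` method, and the constructor signatures are hypothetical stand-ins, and only the names `HeadNode`, `NodeProxy`, `_localrdf`, and `GetColumnNames` come from this PR. It also assumes a single linear chain of transformations, which is a simplification.

```python
class LocalFrame:
    """Hypothetical stand-in for the local RDataFrame kept by the head node."""

    def __init__(self, columns):
        self._columns = list(columns)

    def Define(self, name, expression):
        # A transformation returns a new frame that also knows the new column.
        return LocalFrame(self._columns + [name])

    def GetColumnNames(self):
        return list(self._columns)


class HeadNode:
    """Toy head node: owns the local frame and mirrors booked transformations."""

    def __init__(self, dataset_columns):
        self._localrdf = LocalFrame(dataset_columns)
        self.booked = []  # operations recorded for later (distributed) execution

    def book(self, operation, *args):
        # Record the operation for the distributed graph...
        self.booked.append((operation, args))
        # ...and propagate it to the local frame so that user-defined
        # columns are tracked as soon as they are booked.
        self._localrdf = getattr(self._localrdf, operation)(*args)

    def GetColumnNames(self):
        return self._localrdf.GetColumnNames()


class NodeProxy:
    """Toy proxy: forwards transformations and column queries to the head node."""

    def __init__(self, headnode):
        self._headnode = headnode

    def Define(self, name, expression):
        self._headnode.book("Define", name, expression)
        return NodeProxy(self._headnode)

    def GetColumnNames(self):
        # Delegates to the head node, as described in the first bullet.
        return self._headnode.GetColumnNames()


proxy = NodeProxy(HeadNode(["x", "y"]))
print(proxy.Define("z", "x + y").GetColumnNames())  # prints ['x', 'y', 'z']
```

In this sketch the column defined through the proxy becomes visible immediately, because the head node's local frame is updated at booking time rather than when the distributed computation runs.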

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

This PR fixes #15442

@vepadulano
Member

@gwmyers sorry for the late reply, and thank you so much for this contribution 😄! I agree with the implementation. I took the liberty of modifying your changes locally and adding a test. I would like to push them to your branch to update this PR if you are okay with it; let me know 👍.

@gwmyers
Author

gwmyers commented May 28, 2024

Hi @vepadulano, excellent! Thanks! Please feel free to push your updates to my branch.

Greg Myers and others added 2 commits May 28, 2024 17:26
Propagate the booking of transformations to the internal RDataFrame object
stored in the headnode and expose the `GetColumnNames` method through the proxy
interface.

Test Results

    12 files      12 suites   2d 19h 33m 12s ⏱️
 2 642 tests  2 638 ✅ 0 💤  4 ❌
29 977 runs  29 937 ✅ 0 💤 40 ❌

For more details on these failures, see this check.

Results for commit 826446f.

@vepadulano
Member

A few more changes were needed, including changes to a couple of tests in roottest, so I opened another PR to follow up on this one. Let's see what the CI has to say over there 👍

@vepadulano
Member

The linked PR was merged, closing this PR now. @gwmyers thanks again for kickstarting this fix :D!

@vepadulano vepadulano closed this Jun 3, 2024
Development

Successfully merging this pull request may close these issues.

Distributed RDataFrame does not see all defined column names