Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DF] Fix snapshot of array branch with out-of-order size branch #10934

Merged
merged 7 commits into from
Jul 24, 2022

Conversation

eguiraud
Copy link
Member

@eguiraud eguiraud commented Jul 8, 2022

This PR fixes #10920 and #6932 (see those for more information).

@eguiraud eguiraud self-assigned this Jul 8, 2022
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

Sometimes, when users specify an explicit list of columns to be
written out by a Snapshot, they might forget to list the names of
branches the store the sizes of array branches (e.g. `sz` for a
branch `x` with leaflist `x[sz]/F`).

We now explicitly check for this case and provide a (hopefully)
helpful error message.
Even when users do not specify them explicitly, we now add size
branches required by array branches to the list of branches that
will be written out by a Snapshot.
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@eguiraud
Copy link
Member Author

@vepadulano @Axel-Naumann with the last commit the situation is now the following:

explicit column list passed, a necessary size branch was omitted:

  • compiled Snapshot throws an exception with a (hopefully) clear error message
  • jitted Snapshot silently adds the required size branches

explicit column list passed, a necessary size branch is listed after the branch that needs it:

  • compiled and jitted Snapshot both work

no explicit column list passed:

  • this is only possible with jitted Snapshots and this now works despite the fact that Snapshot reorders the column names in alphabetical order (which might put a size branch after the branch that needs it). In a subsequent PR I'll try to go back to Snapshot using the same ordering for the output columns as the ordering of the input columns -- it's not super straightforward.

@eguiraud eguiraud added this to the 6.26/06 milestone Jul 20, 2022
@phsft-bot
Copy link
Collaborator

Build failed on ROOT-debian10-i386/soversion.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on mac11/cxx14.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on mac1015/cxx17.
Running on macitois22.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Copy link
Member

@Axel-Naumann Axel-Naumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

const auto counterStr =
leaf->GetLeafCount() ? std::string(leaf->GetLeafCount()->GetName()) : std::to_string(leaf->GetLenStatic());
auto *sizeLeaf = leaf->GetLeafCount();
const auto sizeLeafName = sizeLeaf ? std::string(sizeLeaf->GetName()) : std::to_string(leaf->GetLenStatic());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, for a leaf [4][4], leaf->GetLenStatic() returns 16. I guess that's good?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I...don't know, we don't have test coverage for 2-dimensional arrays in RDF

continue; // not a variable-sized array or the size branch is already there, nothing to do

// otherwise we must insert the size in colsWithoutAliases _and_ colsWithAliases
colsWithoutAliases.insert(colsWithoutAliases.begin() + i, countLeaf->GetName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means you insert the size column for branch X right before branch X, and then you check that branch X again in the next for loop iteration. Is that intentional?

Copy link
Member Author

@eguiraud eguiraud Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you , I guess we can also increment the counter to avoid the redundant work, will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tree/dataframe/src/RDFInterfaceUtils.cxx Show resolved Hide resolved

// make input tree
{
TFile f(inFile, "recreate");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use in-memory trees or TMemFile for these kind of tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes in principle it makes sense, in practice I don't know if it works e.g. with TTreeProcessorMT (which has to re-open the file in each thread). I noted it down to try and convert all our RDF tests to use TMemFiles whenever possible, but I'd address this separately.

Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-performance-centos8-multicore/cxx17.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-1.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@eguiraud eguiraud merged commit 56c28ff into root-project:master Jul 24, 2022
@eguiraud eguiraud deleted the fix-snapshot-size-columns branch July 24, 2022 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DF] RDataFrame confused by array variables in 6.26/04
4 participants