Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RDataFrame] RDataFrame.Redefine does not work with Snapshot #10233

Closed
AlkaidCheng opened this issue Mar 25, 2022 · 5 comments
Closed

[RDataFrame] RDataFrame.Redefine does not work with Snapshot #10233

AlkaidCheng opened this issue Mar 25, 2022 · 5 comments
Assignees
Milestone

Comments

@AlkaidCheng
Copy link

After calling the RDataFrame.Redefine method, saving the Snapshot containing the redefined column will raise an error.

Reproducer:

import ROOT
import numpy as np
rdf = ROOT.RDF.MakeNumpyDataFrame({"bar": np.arange(0, 10, 1)})
rdf = rdf.Redefine("bar", "bar + 1")
rdf.Snapshot("output", "output.root")

This gives:

TypeError: Template method resolution failed:
  none of the 3 overloaded methods succeeded. Full details:
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, initializer_list<string> columnList, const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    TypeError: takes at least 3 arguments (2 given)
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, const vector<string>& columnList, const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    TypeError: takes at least 3 arguments (2 given)
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, basic_string_view<char,char_traits<char> > columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    logic_error: Error: column "bar" was passed to Snapshot twice. This is not supported: only one of the columns would be readable with RDataFrame.
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(basic_string_view<char,char_traits<char> > treename, basic_string_view<char,char_traits<char> > filename, basic_string_view<char,char_traits<char> > columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    logic_error: Error: column "bar" was passed to Snapshot twice. This is not supported: only one of the columns would be readable with RDataFrame.

However, this works:

import ROOT
import numpy as np
rdf = ROOT.RDF.MakeNumpyDataFrame({"bar": np.arange(0, 10, 1)})
rdf = rdf.Redefine("bar", "bar + 1")
rdf.Snapshot("output", "output.root", ["bar"])

ROOT Version: 6.26/00 (conda install), 6.27/01 (swan bleeding edge)

@eguiraud
Copy link
Member

Hi @AlkaidCheng ,
thank you for the report, a fix is coming. In the meanwhile, the workaround is to specify the column names explicitly to Snapshot as you found out.

@eguiraud
Copy link
Member

There are two issues:

  1. Snapshot does not de-duplicate columns from data-sources and columns from Redefines when deciding which columns to write out
  2. the logic in Snapshot that creates output branches assumes that if there is an input branch for column "x", then that must be what we want to write out (which was the case until Redefine was introduced)

1 is easy to fix, I'm now looking into fixing 2. We definitely want these fixes in 6.26.02.

eguiraud added a commit to eguiraud/root that referenced this issue Apr 4, 2022
eguiraud added a commit to eguiraud/root that referenced this issue Apr 4, 2022
@eguiraud
Copy link
Member

eguiraud commented Apr 4, 2022

This issue should be fixed by #10319

@eguiraud
Copy link
Member

eguiraud commented Apr 6, 2022

Thank you again for reporting @AlkaidCheng . This is fixed in the master branch (which you can test out via the nightly builds) and the fix will be available in the upcoming v6.26.02 patch release. Let us know if you encounter any further issues.

@eguiraud eguiraud closed this as completed Apr 6, 2022
Neel-Shah-29 pushed a commit to Neel-Shah-29/root-1 that referenced this issue Apr 6, 2022
@AlkaidCheng
Copy link
Author

Thanks a lot for the follow-up!

lmoneta added a commit to lmoneta/root that referenced this issue Aug 15, 2022
lmoneta added a commit that referenced this issue Aug 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants