Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDataFrame misidentifies vector<XYZTVector> type of a friend tree with identical branch name to another friend tree #6944

Closed
dudarboh opened this issue Dec 12, 2020 · 6 comments · Fixed by #6965

Comments

@dudarboh
Copy link

The issue is initially reported and discussed on the ROOT forum here

Describe the bug

RDataFrame has two tree friends.
Both tree friends have a branch with identical name.
Type of a branch in the 1st friend tree: XYZVector
Type of a branch in the 2nd friend tree: vector<XYZTVector>

1st friend tree is attached to RDataFrame before 2nd friend tree.
Access of 2nd friend tree branch results in error due to unexpected type of the branch assumed by RDataFrame as shown in the stand alone example bellow.

To Reproduce

Run this stand alone example:

import ROOT

vectors = '''
#include "Math/Vector3D.h"
#include "Math/Vector4D.h"
#include "TFile.h"
#include "TTree.h"
#include <vector>
using namespace ROOT::Math;
using namespace ROOT::VecOps;

RVec <double> getArrZ(const RVec<XYZTVector>& vec){
    auto getItemZ = [](const XYZTVector& item) { return item.Z(); };
    return Map(vec, getItemZ);
}

'''

ROOT.gInterpreter.Declare(vectors)

if __name__ == "__main__":
  ROOT.RDataFrame(1).Define("vec", "XYZVector(10, 10, 10)").Snapshot("Particle", "f1.root")
  ROOT.RDataFrame(1).Define("vec", "XYZVector(20, 20, 20)").Snapshot("Cluster", "f2.root")
  ROOT.RDataFrame(1).Define("vec", "std::vector<XYZTVector>{XYZTVector(30, 30, 30, 30)}").Snapshot("Vertex", "f3.root")

  ch1 = ROOT.TChain("Particle")
  ch1.Add("f1.root")
  ch2 = ROOT.TChain("Cluster")
  ch2.Add("f2.root")
  ch3 = ROOT.TChain("Vertex")
  ch3.Add("f3.root")

  ch1.AddFriend(ch2, "cluster")
  ch1.AddFriend(ch3, "vertex")

  df = ROOT.RDataFrame(ch1)
  print(df.Define("particle_z", "vec.Z()").Histo1D("particle_z").GetMean())
  print(df.Define("cluster_z", "cluster.vec.Z()").Histo1D("cluster_z").GetMean())
  print(df.Define("vertex_z", "getArrZ(vertex.vec)").Histo1D("vertex_z").GetMean())

which results in the error on 3rd print. Full output:

10.0
20.0
input_line_99:2:142: error: no matching function for call to 'getArrZ'
  ...= [](ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag>& var0){return getArrZ(var0)
                                                                                                                                     ^~~~~~~
input_line_42:10:15: note: candidate function not viable: no known conversion from 'ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,
      ROOT::Math::DefaultCoordinateSystemTag>' (aka 'DisplacementVector3D<Cartesian3D<double>, ROOT::Math::DefaultCoordinateSystemTag>') to
      'const RVec<ROOT::Math::XYZTVector>' (aka 'const RVec<LorentzVector<PxPyPzE4D<double> > >') for 1st argument
RVec <double> getArrZ(const RVec<XYZTVector>& vec){
              ^
input_line_103:2:142: error: no matching function for call to 'getArrZ'
  ...= [](ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag>& var0){return getArrZ(var0)
                                                                                                                                     ^~~~~~~
input_line_42:10:15: note: candidate function not viable: no known conversion from 'ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,
      ROOT::Math::DefaultCoordinateSystemTag>' (aka 'DisplacementVector3D<Cartesian3D<double>, ROOT::Math::DefaultCoordinateSystemTag>') to
      'const RVec<ROOT::Math::XYZTVector>' (aka 'const RVec<LorentzVector<PxPyPzE4D<double> > >') for 1st argument
RVec <double> getArrZ(const RVec<XYZTVector>& vec){
              ^
input_line_104:2:142: error: no matching function for call to 'getArrZ'
  ...= [](ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag>& var0){return getArrZ(var0)
                                                                                                                                     ^~~~~~~
input_line_42:10:15: note: candidate function not viable: no known conversion from 'ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,
      ROOT::Math::DefaultCoordinateSystemTag>' (aka 'DisplacementVector3D<Cartesian3D<double>, ROOT::Math::DefaultCoordinateSystemTag>') to
      'const RVec<ROOT::Math::XYZTVector>' (aka 'const RVec<LorentzVector<PxPyPzE4D<double> > >') for 1st argument
RVec <double> getArrZ(const RVec<XYZTVector>& vec){
              ^
Traceback (most recent call last):
  File "test.py", line 39, in <module>
    print(df.Define("vertex_z", "getArrZ(vertex.vec)").Histo1D("vertex_z").GetMean())
cppyy.gbl.std.runtime_error: Template method resolution failed:
  ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, basic_string_view<char,char_traits<char> > expression) =>
    runtime_error: 
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
 All RDF objects that have not run an event loop yet should be considered in an invalid state.

  ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, basic_string_view<char,char_traits<char> > expression) =>
    runtime_error: 
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
 All RDF objects that have not run an event loop yet should be considered in an invalid state.

  ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, basic_string_view<char,char_traits<char> > expression) =>
    runtime_error: 
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
 All RDF objects that have not run an event loop yet should be considered in an invalid state.

Expected behavior

Expected behavior that identical branch names don't interfere as they accessed with different tree aliases cluster.vec and vertex.vec and should be distinguished with the following output:

10.0
20.0
30.0

Setup

  1. ROOT version 6.22/00 from /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt/bin/root
  2. OS: RedHat CentOS 7.9.2009
  3. Python 3.7.6 from /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt/bin/python

Additional context

  1. With Cluster tree attached as a friend df.GetColumnType("vertex.vec") yields :
ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag>
  1. Without Cluster tree attached as a friend df.GetColumnType("vertex.vec") yields :
ROOT::VecOps::RVec<ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> >>
@dudarboh dudarboh added the bug label Dec 12, 2020
@github-actions github-actions bot added this to Needs triage in Triage Dec 12, 2020
@eguiraud eguiraud removed this from Needs triage in Triage Dec 13, 2020
@eguiraud eguiraud self-assigned this Dec 13, 2020
@eguiraud
Copy link
Member

The issue is also present in master. Simple C++ repro:

#include "Math/Vector3D.h"
#include "Math/Vector4D.h"
#include "TChain.h"
#include "ROOT/RVec.hxx"
#include "ROOT/RDataFrame.hxx"
#include <vector>
#include <iostream>
using namespace ROOT::Math;
using namespace ROOT::VecOps;


int main()
{
   ROOT::RDataFrame(1)
      .Define("vec", [] { return XYZVector(10, 10, 10); })
      .Snapshot<XYZVector>("Particle", "f1.root", {"vec"});
   ROOT::RDataFrame(1)
      .Define("vec", [] { return std::vector<XYZTVector>{XYZTVector(30, 30, 30, 30)}; })
      .Snapshot("Vertex", "f3.root", {"vec"});

   std::cout << ROOT::RDataFrame("Vertex", "f3.root").GetColumnType("vec") << std::endl;

   TChain ch1("Particle");
   ch1.Add("f1.root");
   TChain ch3("Vertex");
   ch3.Add("f3.root");
   ch1.AddFriend(&ch3, "vertex");

   std::cout << ROOT::RDataFrame(ch1).GetColumnType("vertex.vec") << std::endl;

   return 0;
}

This prints the wrong type (the type of the other vec branch) in the second case.

Calling vertex.vec something else, e.g. vertex.vec2, removes the ambiguity.

Work in progress.

@eguiraud
Copy link
Member

Fixed by #6965

@eguiraud
Copy link
Member

eguiraud commented Jan 4, 2021

Should be fixed in master, must backport

@eguiraud
Copy link
Member

eguiraud commented Jan 5, 2021

@dudarboh could you please confirm that the problem is fixed in the nightlies?

@dudarboh
Copy link
Author

dudarboh commented Jan 5, 2021

Hi @eguiraud,
Yes everything works as expected with /cvmfs/sft.cern.ch/lcg/views/dev3/latest/x86_64-centos8-gcc10-opt/

Thanks for your work!

@dudarboh dudarboh closed this as completed Jan 5, 2021
@eguiraud
Copy link
Member

eguiraud commented Jan 5, 2021

That's great to hear, I'll leave this open for a short while more as a reminder to add a backport for the 6.22 branch if that's ok!

@eguiraud eguiraud reopened this Jan 5, 2021
@eguiraud eguiraud added this to Issues in Fixed in 6.24/00 via automation Jan 5, 2021
@eguiraud eguiraud closed this as completed Jan 5, 2021
@eguiraud eguiraud added this to Fixed in Fixed in 6.22/08 Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
2 participants