Skip to content
This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Include Allpix² data types/ custom data types #485

Closed
flome opened this issue May 8, 2020 · 14 comments · Fixed by #486
Closed

Include Allpix² data types/ custom data types #485

flome opened this issue May 8, 2020 · 14 comments · Fixed by #486

Comments

@flome
Copy link

flome commented May 8, 2020

Dear all,
we are currently starting to work with Allpix², a silicon detector simulation framework based on Geant4 which is really great. It outputs natively data into root files (which is nice, so we can use uproot) but uses custom data types (which is not nice, because uproot cannot de-serialize them correctly...)

If I want to read an entry from a branch of choice, this is the result:

import uproot
f = uproot.open("sim_output.root")

print(f['MCParticle/detector1'])
>>>  <TBranchElement b'detector1' at 0x7f8b748c4090>

print( f['MCParticle/detector1'].interpretation )
>>> asgenobj(STLVector(Pointer(allpix_3a3a_MCParticle)))

print( f['MCParticle/detector1']._streamer._fTypeName )
>>> b'vector<allpix::MCParticle*>'

print(f['MCParticle/detector1'].array()[0][0])
>>> <Undefined (failed to read 'allpix::MCParticle' version 6) at 0x7f8b73a15950>

resulting in the array containing only entries of class "undefined". Thing is, I really prefer using uproot than PyROOT or whatsoever. Is there anything I can do about it?

Thanks a lot for your awesome project!
Best regards!

@jpivarski jpivarski transferred this issue from scikit-hep/uproot3-methods May 8, 2020
@jpivarski
Copy link
Member

Although uproot-methods would put custom methods on the classes, if they're being read in as "undefined," it means that some C++ feature was not recognized, an Uproot problem. For example, maybe these objects contain std::map of non-strings/non-numbers, which is an example of something that hasn't been handled. Maybe it's the fact that it's a vector of pointers to MCParticles...?

Once Uproot can read these as plain Python objects with the member data as attributes starting with an underscore, then it would be time to add uproot-methods to provide a more Pythonic view.

To solve the first step, we need to figure out which C++ feature is causing the failure to read. You can look at the code that was generated to try to read it in

print(f._context.classes["allpix_3a3a_MCParticle"])

but to go further, I may need to look at the file. I don't know yet how much of a time investment this would be (though I'd love to have help!).

@flome
Copy link
Author

flome commented May 8, 2020

Hi Jim, thanks for clarification and moving the issue!
Running your suggestion outputs

print(f._context.classes["allpix_3a3a_MCParticle"])
>>> <class 'uproot.rootio.allpix_3a3a_MCParticle'>

I'd be happy to share the file with you, what is your preferred way I pass it to you? I appreciate your help a lot and I am happy to help with any kind of implementation if needed.
The class definition of MCParticle is pretty straightforward and consists mainly of

MCParticle(ROOT::Math::XYZPoint local_start_point,
                   ROOT::Math::XYZPoint global_start_point,
                   ROOT::Math::XYZPoint local_end_point,
                   ROOT::Math::XYZPoint global_end_point,
                   int particle_id,
                   double time);

source: https://gitlab.cern.ch/allpix-squared/allpix-squared/-/blob/master/src/objects/MCParticle.hpp

@jpivarski
Copy link
Member

I'm getting scattered with too many things going on—sorry. I meant

print(f._context.classes["allpix_3a3a_MCParticle"]._pycode)

@flome
Copy link
Author

flome commented May 8, 2020

Thanks again so much!

print(f._context.classes["allpix_3a3a_MCParticle"]._pycode)

output:

class allpix_3a3a_MCParticle(allpix_3a3a_Object):
    _methods = None
    _bases = [allpix_3a3a_Object]
    @classmethod
    def _recarray(cls):
        out = []
        out.append((' cnt', 'u4'))
        out.append((' vers', 'u2'))
        for base in cls._bases:
            out.extend(base._recarray())
        out.extend(ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_._recarray())
        out.extend(ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_._recarray())
        out.extend(ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_._recarray())
        out.extend(ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_._recarray())
        out.append(('particle_id_', numpy.dtype('>i4')))
        out.append(('time_', numpy.dtype('>f8')))
        out.extend(TRef._recarray())
        out.extend(TRef._recarray())
        return out
    _fields = ['local_5f_start_5f_point_5f_', 'global_5f_start_5f_point_5f_', 'local_5f_end_5f_point_5f_', 'global_5f_end_5f_point_5f_', 'particle_5f_id_5f_', 'time_5f_', 'parent_5f_', 'track_5f_']
    _classname = b'allpix::MCParticle'
    classname = 'allpix::MCParticle'
    _versions = versions
    _classversion = 6
    _hasreadobjany = False
    @classmethod
    def _readinto(cls, self, source, cursor, context, parent, asclass=None):
        start, cnt, classversion = _startcheck(source, cursor)
        startendcheck = True
        if cls._classversion != classversion:
            cursor.index = start
            if classversion in cls._versions:
                return cls._versions[classversion]._readinto(self, source, cursor, context, parent)
            elif cnt is None:
                startendcheck = False
            else:
                return Undefined.read(source, cursor, context, parent, cls.__name__)
        allpix_3a3a_Object._readinto(self, source, cursor, context, parent)
        self._local_5f_start_5f_point_5f_ = ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_.read(source, cursor, context, self)
        self._global_5f_start_5f_point_5f_ = ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_.read(source, cursor, context, self)
        self._local_5f_end_5f_point_5f_ = ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_.read(source, cursor, context, self)
        self._global_5f_end_5f_point_5f_ = ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_.read(source, cursor, context, self)
        self._particle_5f_id_5f_, self._time_5f_ = cursor.fields(source, cls._format1)
        self._parent_5f_ = TRef.read(source, cursor, context, self)
        self._track_5f_ = TRef.read(source, cursor, context, self)
        if startendcheck:
            if self.__class__.__name__ == cls.__name__:
                self.__class__ = cls._versions[classversion]
            try:
                _endcheck(start, cursor, cnt)
            except ValueError:
                cursor.index = start
                return Undefined.read(source, cursor, context, parent, cls.__name__)
        return self
    _format1 = struct.Struct('>id')
    _int32 = struct.Struct('>I')

@jpivarski
Copy link
Member

It looks like it's probably deeper than this first object: the failure might be in allpix_3a3a_Object._readinto or ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_.read (the name is long because of some templates).

I can scan through a file to try to figure out where the problem point is.

@flome
Copy link
Author

flome commented May 8, 2020

I am happy to pass the file (had to zip it to share it here, I can also provide a download link otherwise):
diodes_output.root.zip

Is there something I can do to help?
Thanks a lot!

@jpivarski
Copy link
Member

First the good news: the PR I just opened fixed the basic issue with reading allpix::MCParticle classes: I forgot to convert the name to allpix_3a3a_MCParticle for the specific case of class references inside TTrees. (It's related to the fact that there's a pointer: that's the corner case that I forgot, but the more fundamental issue is translating the C++ namespace into a name that a Python class can have.)

With this patch, you can read these objects from the file and they are not Undefined. You can find a lot of the fields inside, keeping in mind that they start with an underscore and the underscores in the original C++ names have been translated into

>>> hex(ord("_"))[2:]
'5f'

That's where uproot-methods would come in: you'd want high-level properties to access those low-level names.

The bad news is that fields like _global_5f_start_5f_point_5f_ are still undefined, and this is because the instance is version 0 but the only ROOT::Math::PositionVector3D<ROOT::Math::Cartesian3D<double>, ROOT::Math::DefaultCoordinateSystemTag> class defined in the ROOT file's TStreamerInfo is version 1.

How important are these fields?

@jpivarski
Copy link
Member

Not only are the version numbers wrong for these classes, but ROOT::Math::PositionVector3D<ROOT::Math::Cartesian3D<double>,ROOT::Math::DefaultCoordinateSystemTag> and its fCoordinates value, ROOT::Math::Cartesian3D<double>, both have a 4 byte thing that is not described in the TStreamerInfo.

For the outer class, this 4-byte thing is 229, 102, 217, 110, and for the inner class (field fCoordinates in the outer class), this 4-byte thing is 133, 176, 44, 232.

  • As signed integers, these would be -446244498 and -2052051736.
  • As unsigned integers, they'd be 3848722798 and 2242915560.
  • As single-precision floating point, they'd be -6.8134698e+22 and -1.6567456e-35.

So none of those interpretations make much sense. However, once you ignore the class version number and skip past these mysterious bytes, the x, y, z values are like -0.013535985514355168, -0.0012274825605789325, -0.14999999999999947, which looks really good.

I'm going to be revamping Uproot soon to replace Awkward 0.x with Awkward 1.x. Perhaps at the same time, I could expose more hooks to add custom logic for investigating these things, so that Uproot becomes more "hackable." I'll probably never solve all of these class types in general, but at least I can provide tools to let users do it. Right now, I have to modify the Uproot source code to investigate these things—adding print statements and such—but it would be much better if there were tools and instructions to do it by overriding some classes or something.

@flome
Copy link
Author

flome commented May 9, 2020

Hi there, got caught up for a while, thanks so much! I tested your issue branch and it works nicely! It helps us to proceed really, as one of the first studies planned is a time of flight simulation, where actually particle_id and time are the most important fields which are now accessible so this is really great!

The fields

ROOT::Math::XYZPoint local_start_point,
ROOT::Math::XYZPoint global_start_point,
ROOT::Math::XYZPoint local_end_point,
ROOT::Math::XYZPoint global_end_point,

explain where a given particle enters and exits a given detector with respect to its local or the general global coordinate system. For an assessment of hit distributions this would indeed be very valuable! Is there something I can provide you with to help or can you maybe instruct me where and what I need to adjust in the code to implement e. g. the fix dropping the version number and reading the values as you showed?
Could those actually be characters by any chance denoting a unit? The values make sense on a mm scale I think and match my expectations, as the first detector should indeed have its front layer at -0.15mm. I cannot figure out how this would work out though, ROOT is not really my strong point in any way so my knowledge about their data types is shockingly limited.

@jpivarski
Copy link
Member

How would you feel about hacking it in? The 4 bytes that I have to skip to find the floating point values don't seem to correspond to anything in the TStreamerInfo. It's probably a good hint that the class version number is wrong: this could be explained by the wrong TStreamerInfo being saved to the file (and if it's always read in ROOT with the correct TStreamerInfo in memory, because you loaded a library, then perhaps it could ignore what's in the file in favor of the C++ class in memory).

I think I could write a fake ROOT::Math::PositionVector3D<ROOT::Math::Cartesian3D<double>, ROOT::Math::DefaultCoordinateSystemTag> class that reads the right bytes by ignoring what the TStreamerInfo says it should be (for the other class version). That's what I was musing about above, where I said I should build these hooks into Uproot so that users can do these things.

@jpivarski
Copy link
Member

This will do it:

import struct

import uproot

f = uproot.open("issue486.root")
t = f["MCParticle"]

class PositionVector3D_double_DefaultCoordinateSystemTag(uproot.rootio.ROOTStreamedObject):
    @classmethod
    def _readinto(cls, self, source, cursor, context, parent, asclass=None):
        cursor.skip(20)
        self._x, self._y, self._z = cursor.fields(source, cls._format1)
        return self

    _format1 = struct.Struct(">ddd")

f._context.classes["ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_"] = PositionVector3D_double_DefaultCoordinateSystemTag

particle = t["detector1"].array()[0][0]

position = particle._global_5f_start_5f_point_5f_

print(position._x, position._y, position._z)

The key thing is that we have inserted a class into the TFile's lookup with name ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_. The cursor.skip(20) skips over two objects' 6-byte headers and those two mysterious 4-byte objects, a total of 20. The x, y, z are at the end of the structure and sets the cursor in the right position for the next object.

In general, classes like the one above are supposed to be generated from the TStreamerInfo. This is a hack because I gave up and wrote it by manually looking at the file. If there's another file that's slightly different, the above won't help.

What I'm thinking of adding is a "hackable" interface where most things are detected correctly, but exceptions like the above can be investigated and corrected by users. (My process of investigating the above involved adding print statements in the core uproot.rootio code, which is to say that it's not accessible to users. I'm thinking of changing that.)

jpivarski added a commit that referenced this issue May 9, 2020
Fix #485; another place where C++ class names need to be _safenames: inside TTrees.
@flome
Copy link
Author

flome commented May 9, 2020

Wow, it works like a charm! Thanks so much for your support, you've helped me out a great deal here! Is this class, wrapped e. g. into an MCParticle class a candidate for uproot-methods? In other words, would the AllpixSquared data types be possible to include there or is this not desired? Thank you!

The class definition contains the following:

/**
         * @brief ROOT class definition
         */
        ClassDefOverride(MCParticle, 6);
        /**
         * @brief Default constructor for ROOT I/O
         */
        MCParticle() = default;

Could this have something to do with the unexpected wrong class behaviour?

@flome
Copy link
Author

flome commented May 9, 2020

I have asked the forum of allpix-squared about this thing as well, maybe they can look into the Streamer issue, thanks a lot, again!

@jpivarski
Copy link
Member

If you want to add user-friendly methods, you can do so at

https://github.com/scikit-hep/uproot-methods/tree/master/uproot_methods/classes

Now that the files can be read, this issue can move back to uproot-methods. Just open a pull request and add a submodule in the same pattern as the ones there. Uproot automatically picks up these classes and use them as superclasses for the classes that it generates from streamers (though the hacked class won't do that automatically because we defined it ourselves—that would bypass the mechanism). It needs to be the "safename" with all the underscores to express the namespace and template specialization.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants