Include Allpix² data types/ custom data types #485
Comments
Although uproot-methods would put custom methods on the classes, if they're being read in as "undefined," it means that some C++ feature was not recognized, which is an Uproot problem. For example, maybe these objects contain Once Uproot can read these as plain Python objects with the member data as attributes starting with an underscore, then it would be time to add uproot-methods to provide a more Pythonic view. To solve the first step, we need to figure out which C++ feature is causing the failure to read. You can look at the code that was generated to try to read it in: print(f._context.classes["allpix_3a3a_MCParticle"]), but to go further, I may need to look at the file. I don't know yet how much of a time investment this would be (though I'd love to have help!).
Hi Jim, thanks for clarification and moving the issue!
I'd be happy to share the file with you. What is your preferred way for me to pass it to you? I appreciate your help a lot and I am happy to help with any kind of implementation if needed.
source: https://gitlab.cern.ch/allpix-squared/allpix-squared/-/blob/master/src/objects/MCParticle.hpp
I'm getting scattered with too many things going on, sorry. I meant print(f._context.classes["allpix_3a3a_MCParticle"]._pycode)
Thanks again so much!
output:
It looks like it's probably deeper than this first object: the failure might be in I can scan through a file to try to figure out where the problem point is.
I am happy to pass the file (I had to zip it to share it here; I can also provide a download link otherwise): Is there something I can do to help?
First the good news: the PR I just opened fixed the basic issue with reading With this patch, you can read these objects from the file and they are not Undefined. You can find a lot of the fields inside, keeping in mind that they start with an underscore and the underscores in the original C++ names have been translated into _5f_:
>>> hex(ord("_"))[2:]
'5f'
That's where uproot-methods would come in: you'd want high-level properties to access those low-level names. The bad news is that fields like How important are these fields?
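The name translation mentioned above can be sketched in a few lines. This is a minimal reconstruction inferred only from the names that appear in this thread (allpix_3a3a_MCParticle, _global_5f_start_5f_point_5f_); the function name safename and the regular expression are assumptions for illustration, not a copy of Uproot's internal code.

```python
import re

# Assumption: every run of non-alphanumeric characters is replaced by its
# hex codes wrapped in underscores, which matches the names in this thread.
_UNSAFE = re.compile(r"[^a-zA-Z0-9]+")


def safename(cpp_name):
    """Return a Python-safe identifier for a C++ class or member name."""
    def encode(match):
        hex_codes = "".join(format(ord(ch), "02x") for ch in match.group(0))
        return "_" + hex_codes + "_"
    return _UNSAFE.sub(encode, cpp_name)


print(safename("allpix::MCParticle"))
print(safename("global_start_point_"))
```

The attribute on the Python object carries one more leading underscore, so the C++ member global_start_point_ shows up as _global_5f_start_5f_point_5f_.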
Not only are the version numbers wrong for these classes, but For the outer class, this 4-byte thing is
So none of those interpretations make much sense. However, once you ignore the class version number and skip past these mysterious bytes, the I'm going to be revamping Uproot soon to replace Awkward 0.x with Awkward 1.x. Perhaps at the same time, I could expose more hooks to add custom logic for investigating these things, so that Uproot becomes more "hackable." I'll probably never solve all of these class types in general, but at least I can provide tools to let users do it. Right now, I have to modify the Uproot source code to investigate these things (adding print statements and such), but it would be much better if there were tools and instructions to do it by overriding some classes or something.
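One way to hunt for "mysterious bytes" without editing Uproot is to try every candidate offset in a raw byte string and see where big-endian doubles start to look reasonable. The sketch below is only an illustration of that idea, not the procedure used in this thread (which was print statements inside Uproot); the function name plausible_offsets, the magnitude window, and the synthetic data are all assumptions.

```python
import math
import struct


def plausible_offsets(data, count=3, max_offset=32, low=1e-6, high=1e6):
    """Return offsets at which `count` big-endian doubles all look sane.

    "Sane" is a heuristic assumption: finite, with magnitude in (low, high).
    """
    fmt = struct.Struct(">" + "d" * count)
    hits = []
    last = min(max_offset, len(data) - fmt.size)
    for offset in range(last + 1):
        values = fmt.unpack_from(data, offset)
        if all(math.isfinite(v) and low < abs(v) < high for v in values):
            hits.append(offset)
    return hits


# Synthetic record: 20 unknown bytes (zeros here) followed by x, y, z.
record = bytes(20) + struct.pack(">ddd", 1.5, -2.25, 3.0)
print(plausible_offsets(record))
```

Coordinates that are exactly zero would be rejected by this window, so the thresholds would need loosening for real data.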
Hi there, I got caught up for a while. Thanks so much! I tested your issue branch and it works nicely! It really helps us proceed: one of the first studies planned is a time-of-flight simulation, where particle_id and time are the most important fields, and these are now accessible, so this is really great! The fields
explain where a given particle enters and exits a given detector with respect to its local or the general global coordinate system. For an assessment of hit distributions this would indeed be very valuable! Is there something I can provide you with to help, or can you maybe tell me where and what I need to adjust in the code to implement, e.g., the fix dropping the version number and reading the values as you showed?
How would you feel about hacking it in? The 4 bytes that I have to skip to find the floating point values don't seem to correspond to anything in the TStreamerInfo. It's probably a good hint that the class version number is wrong: this could be explained by the wrong TStreamerInfo being saved to the file (and if it's always read in ROOT with the correct TStreamerInfo in memory, because you loaded a library, then perhaps it could ignore what's in the file in favor of the C++ class in memory). I think I could write a fake
This will do it:

    import struct
    import uproot

    f = uproot.open("issue486.root")
    t = f["MCParticle"]

    # Hand-written replacement for the class Uproot would normally generate
    # from the TStreamerInfo.
    class PositionVector3D_double_DefaultCoordinateSystemTag(uproot.rootio.ROOTStreamedObject):
        @classmethod
        def _readinto(cls, self, source, cursor, context, parent, asclass=None):
            # Skip the 20 bytes that precede the three coordinates in this file.
            cursor.skip(20)
            self._x, self._y, self._z = cursor.fields(source, cls._format1)
            return self

        # Three big-endian doubles: x, y, z.
        _format1 = struct.Struct(">ddd")

    # Register the class in the file's lookup under its "safename".
    f._context.classes["ROOT_3a3a_Math_3a3a_PositionVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_double_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e_"] = PositionVector3D_double_DefaultCoordinateSystemTag

    particle = t["detector1"].array()[0][0]
    position = particle._global_5f_start_5f_point_5f_
    print(position._x, position._y, position._z)

The key thing is that we have inserted a class into the TFile's lookup with name In general, classes like the one above are supposed to be generated from the TStreamerInfo. This is a hack because I gave up and wrote it by manually looking at the file. If there's another file that's slightly different, the above won't help. What I'm thinking of adding is a "hackable" interface where most things are detected correctly, but exceptions like the above can be investigated and corrected by users. (My process of investigating the above involved adding print statements in the core
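To see what the skip-then-unpack step in the hack does at the byte level, here is a standalone sketch that needs neither Uproot nor the ROOT file. It assumes the same layout as the hack (20 bytes to skip, then three big-endian doubles); the names HEADER_BYTES, XYZ and read_position, and the fake record, are made up for illustration.

```python
import struct

HEADER_BYTES = 20               # bytes skipped in the hack above (file-specific)
XYZ = struct.Struct(">ddd")     # three big-endian doubles: x, y, z


def read_position(buffer, offset=0):
    """Skip the leading bytes, then unpack x, y, z from the buffer."""
    start = offset + HEADER_BYTES
    return XYZ.unpack_from(buffer, start)


# Fake serialized object: 20 placeholder bytes followed by the coordinates.
fake_record = bytes(HEADER_BYTES) + XYZ.pack(1.5, -2.25, 3.0)
x, y, z = read_position(fake_record)
print(x, y, z)
```

As with the hack itself, the hard-coded 20 only holds for files with this exact layout; a file written with a different class version could need a different skip.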
Fix #485; another place where C++ class names need to be _safenames: inside TTrees.
Wow, it works like a charm! Thanks so much for your support, you've helped me out a great deal here! Is this class, wrapped e.g. into an MCParticle class, a candidate for uproot-methods? In other words, would it be possible to include the AllpixSquared data types there, or is this not desired? Thank you! The class definition contains the following:
Could this have something to do with the unexpected wrong class behaviour?
I have also asked about this on the allpix-squared forum; maybe they can look into the Streamer issue. Thanks a lot, again!
If you want to add user-friendly methods, you can do so at https://github.com/scikit-hep/uproot-methods/tree/master/uproot_methods/classes Now that the files can be read, this issue can move back to uproot-methods. Just open a pull request and add a submodule in the same pattern as the ones there. Uproot automatically picks up these classes and uses them as superclasses for the classes that it generates from streamers (though the hacked class won't do that automatically because we defined it ourselves, which bypasses the mechanism). It needs to be the "safename" with all the underscores to express the namespace and template specialization.
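A rough idea of what such user-friendly methods could look like: a mixin whose properties hide the underscore-encoded attribute names. This is only a sketch under assumptions. The class names MCParticleMethods and FakeGeneratedParticle are hypothetical, the stand-in object replaces a class that Uproot would generate, and the exact module and class naming that uproot-methods expects should be taken from the existing submodules in that repository.

```python
class MCParticleMethods:
    """Hypothetical mixin exposing friendly names for low-level fields."""

    @property
    def global_start_point(self):
        # Low-level attribute name as it appears earlier in this thread.
        return self._global_5f_start_5f_point_5f_


class FakeGeneratedParticle(MCParticleMethods):
    """Stand-in for a streamer-generated class; for demonstration only."""

    def __init__(self, start):
        self._global_5f_start_5f_point_5f_ = start


particle = FakeGeneratedParticle((0.1, 0.2, 0.3))
print(particle.global_start_point)
```

Using read-only properties keeps the low-level attributes as the single source of truth, so nothing has to be copied when the object is read.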
Dear all,
We are currently starting to work with Allpix², a silicon detector simulation framework based on Geant4, which is really great. It natively writes its output to ROOT files (which is nice, so we can use uproot) but uses custom data types (which is not nice, because uproot cannot deserialize them correctly...)
If I want to read an entry from a branch of choice, this is the result:
resulting in the array containing only entries of class "undefined". The thing is, I would much rather use uproot than PyROOT or anything else. Is there anything I can do about it?
Thanks a lot for your awesome project!
Best regards!