Reading subbranches of jagged custom struct #23

Closed
8me opened this issue Jul 3, 2021 · 14 comments

Comments

@8me
Collaborator

8me commented Jul 3, 2021

I'm trying to read a TBranchElement that contains subbranches, from a data file in the same format as this example file. More specifically, I want to read the track information at E/Evt/trks. The track information contains the particle properties (for simulation) or the reconstructions (for measurement) per event, which leads to a nested array structure. Inspecting it with uproot (with the example file loaded as fobj) gives:

In [14]: fobj["E/Evt/trks"].show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
trks                 | vector<Trk>              | AsGroup(<TBranchElement 'trks'
trks.fUniqueID       | uint32_t[]               | AsJagged(AsDtype('>u4'))
trks.fBits           | uint32_t[]               | AsJagged(AsDtype('>u4'))
trks.usr_data        | std::vector<AAny>*       | AsObjects(AsArray(True, Fal...
trks.usr_names       | std::vector<std::stri... | AsObjects(AsArray(True, Fal...
trks.id              | int32_t[]                | AsJagged(AsDtype('>i4'))
trks.pos.x           | double[]                 | AsJagged(AsDtype('>f8'))
trks.pos.y           | double[]                 | AsJagged(AsDtype('>f8'))
trks.pos.z           | double[]                 | AsJagged(AsDtype('>f8'))
trks.dir.x           | double[]                 | AsJagged(AsDtype('>f8'))
trks.dir.y           | double[]                 | AsJagged(AsDtype('>f8'))
trks.dir.z           | double[]                 | AsJagged(AsDtype('>f8'))
trks.t               | double[]                 | AsJagged(AsDtype('>f8'))
trks.E               | double[]                 | AsJagged(AsDtype('>f8'))
trks.len             | double[]                 | AsJagged(AsDtype('>f8'))
trks.lik             | double[]                 | AsJagged(AsDtype('>f8'))
trks.type            | int32_t[]                | AsJagged(AsDtype('>i4'))
trks.rec_type        | int32_t[]                | AsJagged(AsDtype('>i4'))
trks.rec_stages      | std::vector<int32_t>*    | AsObjects(AsArray(True, Fal...
trks.status          | int32_t[]                | AsJagged(AsDtype('>i4'))
trks.mother_id       | int32_t[]                | AsJagged(AsDtype('>i4'))
trks.fitinf          | std::vector<double>*     | AsObjects(AsArray(True, Fal...
trks.hit_ids         | std::vector<int32_t>*    | AsObjects(AsArray(True, Fal...
trks.error_matrix    | std::vector<double>*     | AsObjects(AsArray(True, Fal...
trks.comment         | std::string*             | AsObjects(AsArray(True, Fal...

I'm able to read the individual fields with UnROOT.jl:

julia> fobj = UnROOT.ROOTFile(fpath_example)

ROOTFile(...) with 10 entries and 55 streamers.

julia> data, offsets = UnROOT.array(fobj, "E/Evt/trks/trks.t"; raw=true)
(UInt8[0x41, 0x90, 0xc3, 0x78, 0x59, 0xdb, 0x26, 0xbe, 0x41, 0x90  …  0x04, 0x2c, 0x41, 0x8a, 0x34, 0x8c, 0x55, 0xe6, 0xb6, 0x6d], Int32[70, 518, 958, 1406, 1854, 2302, 2750, 3198, 3646, 4078])

julia> reinterpret(Float64, reverse(data[1:8]))
1-element reinterpret(Float64, ::Vector{UInt8}):
 7.031144646401498e7

With the offsets I can preserve the nested array structure. Where I get stuck is reading this in a more efficient way and converting it to a Trk struct while preserving the nested array structure.
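A minimal sketch of one way to decode such a buffer in a single pass, instead of slicing and reversing eight bytes at a time. It assumes the payload is a contiguous run of big-endian Float64 values and that the offsets are zero-based byte positions into it, with one trailing end marker. The offsets printed above start at 70 and have no end marker, so they would need adjusting first. The helper name unflatten_f64 and the synthetic bytes are made up for illustration and are not part of UnROOT.

```julia
# Hypothetical helper (not part of UnROOT): decode big-endian Float64 values
# and regroup them per event.
# ASSUMPTION: `offsets` are zero-based byte positions with a trailing end marker.
function unflatten_f64(data::Vector{UInt8}, offsets::AbstractVector{<:Integer})
    # ntoh converts from big-endian (network) byte order to host byte order
    flat = ntoh.(reinterpret(Float64, data))
    # Convert byte offsets into element boundaries
    bounds = offsets .÷ sizeof(Float64)
    return [flat[bounds[i]+1:bounds[i+1]] for i in 1:length(bounds)-1]
end

# Synthetic buffer holding 1.5, 2.5 and 3.5 in big-endian byte order
data = UInt8[
    0x3f, 0xf8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x40, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x40, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
]

# Two events: the first with two values, the second with one
events = unflatten_f64(data, [0, 16, 24])
```

Here events is a Vector{Vector{Float64}} with one inner vector per event, so the jaggedness survives the decoding step.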

@tamasgal
Member

tamasgal commented Jul 3, 2021

I think this should be doable with the splitup() function. Can you have a look?

https://github.com/tamasgal/UnROOT.jl/blob/master/src/root.jl#L176

@8me
Collaborator Author

8me commented Jul 3, 2021

When I try this on "E/Evt/trks" I get

julia> data, offsets = UnROOT.array(fobj, "E/Evt/trks"; raw=true)
(UInt8[0x00, 0x00, 0x00, 0x38, 0x00, 0x00, 0x00, 0x37, 0x00, 0x00  …  0x00, 0x38, 0x00, 0x00, 0x00, 0x36, 0x00, 0x00, 0x00, 0x38], Int32[68, 72, 76, 80, 84, 88, 92, 96, 100, 104])

julia> data
40-element Vector{UInt8}:
 0x00
 0x00
 0x00
 0x38
 0x00
 0x00
 0x00
 0x37
 0x00
 0x00
 0x00
 0x38
 0x00
 0x00
 0x00
 0x38
 0x00
    ⋮
 0x00
 0x00
 0x00
 0x38
 0x00
 0x00
 0x00
 0x38
 0x00
 0x00
 0x00
 0x36
 0x00
 0x00
 0x00
 0x38

and I don't know how to apply splitup to this, since the read does not return the full data contained in the subbranches.
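A quick check that uses only the bytes and offsets printed earlier in this thread. Interpreting the first 16 bytes of this buffer as big-endian Int32 values gives small integers, and they coincide with the number of Float64 values per event implied by the trks.t offsets. This suggests, but does not prove, that the top-level branch carries a per-event element count rather than the track data itself.

```julia
# First 16 bytes of the raw E/Evt/trks buffer, copied from the output above
head = UInt8[
    0x00, 0x00, 0x00, 0x38,
    0x00, 0x00, 0x00, 0x37,
    0x00, 0x00, 0x00, 0x38,
    0x00, 0x00, 0x00, 0x38,
]

# Interpret them as big-endian Int32 values
counts = ntoh.(reinterpret(Int32, head))

# First five trks.t offsets from the earlier output; consecutive differences
# are the byte lengths of the first four events
t_offsets = Int32[70, 518, 958, 1406, 1854]
n_doubles = diff(t_offsets) .÷ sizeof(Float64)

println(counts)     # decoded values
println(n_doubles)  # Float64 values per event according to the offsets
```

Both lines print 56, 55, 56, 56 for the first four events.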

@Moelf
Member

Moelf commented Jul 3, 2021

Oops, sorry, I see it's a jagged branch of your custom type. In that case, pattern-match against this:
https://github.com/tamasgal/UnROOT.jl/blob/94b8e6ebd42a1f53d67d14258ce88c4b5f30a798/src/custom.jl#L53-L78

Moelf changed the title from "Reading subbranches" to "Reading subbranches of jagged custom struct" on Jul 3, 2021
@8me
Collaborator Author

8me commented Jul 3, 2021

I tried this already and I got

julia> data, offsets = UnROOT.array(fobj, "E/Evt/trks")
ERROR: MethodError: no method matching primitivetype(::Missing)
Closest candidates are:
  primitivetype(::UnROOT.TLeafI) at .../UnROOT.jl/src/bootstrap.jl:162
  primitivetype(::UnROOT.TLeafL) at .../UnROOT.jl/src/bootstrap.jl:197
  primitivetype(::UnROOT.TLeafF) at .../UnROOT.jl/src/bootstrap.jl:265
  ...
Stacktrace:
 [1] array(f::ROOTFile, path::String; raw::Bool)
   @ UnROOT .../UnROOT.jl/src/root.jl:158
 [2] array(f::ROOTFile, path::String)
   @ UnROOT .../UnROOT.jl/src/root.jl:139
 [3] top-level scope
   @ REPL[16]:1

(As a remark: I didn't mention this attempt because, from a discussion with @tamasgal some time ago, I had in mind that this feature is not really ready to use 🙈)

@Moelf
Member

Moelf commented Jul 3, 2021

This won't work because Julia doesn't know what Trk is:

vector<Trk>              | AsGroup(<TBranchElement 'trks'

This is why you need to manually create a struct in Julia and use `splitup`.

@8me
Collaborator Author

8me commented Jul 3, 2021

At first I followed the approach used for the online file format (example file) and also defined a struct Trk to do it the same way, but this is what I meant by not getting the full information.
For an online file it looks like this:

julia> data, offsets = UnROOT.array(fobj_online, "KM3NET_EVENT/KM3NET_EVENT/triggeredHits"; raw=true)
(UInt8[0x40, 0x00, 0x01, 0xb6, 0x00, 0x09, 0x00, 0x00, 0x00, 0x12  …  0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04], Int32[88, 530, 1812])

julia> data
1950-element Vector{UInt8}:
 0x40
 0x00
 0x01
 0xb6
 0x00
 0x09
 0x00
 0x00
 0x00
 0x12
 0x30
 0x11
 0x79
 0x74
 0x0a
 0x5e
 0xf6
    ⋮
 0x03
 0x0b
 0x40
 0x00
 0x00
 0x0a
 0x00
 0x01
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x04

@Moelf
Member

Moelf commented Jul 3, 2021

The full information is just raw bytes. For example, if your branch is supposed to be a Vector{Int32}, then every four bytes of the raw bytes can be reinterpreted as an Int32; if the branch is jagged, like Vector{Vector{Int32}}, then we also need the offsets.

As of now, the above case and all the std::vector<...> types should be handled automatically. But for your custom struct, there's no way for UnROOT or uproot to know what to do just by looking at the name of the class.

@8me
Collaborator Author

8me commented Jul 3, 2021

I assumed something like this for the custom struct (without very detailed knowledge of the inner structure of ROOT files), which is why I went for each field/subbranch individually and not for splitup in my original message. So if I understood you correctly (sorry if not 🙈), I'm back at the point where I know how to get the nested array for each field, but not how to convert them into one nested array of Trk structs.

@Moelf
Member

Moelf commented Jul 3, 2021

Say your struct is:

struct Point
  x::UInt8
  y::UInt8
end

And our real branch is [[Point(1,1), Point(2,2)], [Point(3,3)]]. Then the raw bytes returned with raw=true are approximately something like this:

[0x01, 0x01, 0x02, 0x02, 0x03, 0x03]

with the caveat that in reality you need to swap the byte order (ROOT stores values big-endian), but that can be done as the last step.

The offsets then tell you things like "the first two Points are in event 1", etc.
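The Point example can be sketched end to end as follows. This is only an illustration of the idea, not the splitup implementation in UnROOT. The helper unpack_points is hypothetical, and it takes per-event element counts instead of byte offsets to keep the bookkeeping short. Because both fields are single bytes, no byte swapping is needed here.

```julia
struct Point
    x::UInt8
    y::UInt8
end

# Hypothetical helper (not part of UnROOT): build Points from consecutive byte
# pairs, then regroup them using the number of Points in each event.
function unpack_points(raw::Vector{UInt8}, counts::Vector{Int})
    points = [Point(raw[i], raw[i+1]) for i in 1:2:length(raw)]
    nested = Vector{Vector{Point}}()
    start = 1
    for n in counts
        push!(nested, points[start:start+n-1])
        start += n
    end
    return nested
end

raw = UInt8[0x01, 0x01, 0x02, 0x02, 0x03, 0x03]

# Two Points in the first event, one in the second
nested = unpack_points(raw, [2, 1])
```

With multi-byte fields, each field would additionally have to be converted from big-endian order (for example with ntoh) before constructing the struct.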

@tamasgal
Member

tamasgal commented Jul 3, 2021

Sorry for my sparse participation. The easiest way is probably to read the split branches. Also in uproot it's quite complicated to read the class instances...

@tamasgal
Member

tamasgal commented Jul 3, 2021

Back at my desk... To add a bit more information: if you read the E/Evt/trks data directly, you will likely only get some pointers to other baskets, which is a different story. In this case we can exploit the fact that the original ROOT class was written with a high split level, so that each "struct field" (a vector) is accessible in its own branch.
Jim also always recommends splitting as aggressively as possible (e.g. split level 99), since those branches end up being easily interpretable using basic (primitive) leaf types.
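Once the split branches have been read as per-field jagged arrays, they can be zipped back into a nested array of structs. The sketch below assumes each field is already decoded into a Vector of Vectors with identical jaggedness. MiniTrk is a hypothetical, reduced stand-in for the real Trk class (only id, t and E are kept), and the values are made up.

```julia
# Hypothetical reduced track type; the real Trk has many more fields
struct MiniTrk
    id::Int32
    t::Float64
    E::Float64
end

# Made-up per-field jagged arrays, one inner vector per event
ids = [Int32[1, 2], Int32[3]]
ts  = [[10.0, 20.0], [30.0]]
Es  = [[0.5, 1.5], [2.5]]

# Zip the fields event by event, then track by track
trks = [
    [MiniTrk(id, t, E) for (id, t, E) in zip(ev_ids, ev_ts, ev_Es)]
    for (ev_ids, ev_ts, ev_Es) in zip(ids, ts, Es)
]
```

Note that zip stops at the shortest input, so a mismatch in jaggedness between fields would be truncated silently; checking the inner lengths beforehand is advisable.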

Problems may arise, however, when the data is multiply jagged, which is a bit "awkward", but I have some working examples. And of course uproot has plenty of examples to study, some of them triggered by our data ;)

Much other KM3NeT data, especially data created by the DAQ system, has a very low split level, where you really need to provide the data types and structure. However, to my knowledge the jaggedness should be the same, which means that at least the raw data should be splittable correctly.

@tamasgal
Member

tamasgal commented Jul 6, 2021

Just to sum up, this is where we are currently:

julia> using UnROOT

julia> f = UnROOT.ROOTFile("test/samples/km3net_offline.root")
ROOTFile("test/samples/km3net_offline.root") with 2 entries and 25 streamers.

julia> array(f, "E/Evt/hits/hits.tot")
ERROR: Cannot understand fClassName: Hit.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] autointerp_T(branch::UnROOT.TBranchElement_10, leaf::UnROOT.TLeafElement)
   @ UnROOT ~/Dev/UnROOT.jl/src/root.jl:190
 [3] array(f::ROOTFile, path::String; raw::Bool)
   @ UnROOT ~/Dev/UnROOT.jl/src/root.jl:170
 [4] array(f::ROOTFile, path::String)
   @ UnROOT ~/Dev/UnROOT.jl/src/root.jl:140
 [5] top-level scope
   @ REPL[18]:1

and we now need to get this leaf parsing right. @8me is working on it :)

@tamasgal
Member

@all-contributors please add @8me for code, tests and data

@allcontributors
Contributor

@tamasgal

I've put up a pull request to add @8me! 🎉


3 participants