Fixes both bugs in issue #438. #454

Merged (2 commits) on Sep 28, 2021

Conversation

jpivarski
Member

But the second bug offers no clues as to how it can be solved.

Here's the issue: according to TObject's streamers (and a lot of other things), fBits is 4 bytes.

>>> import uproot
>>> t = uproot.open("uproot-issue-438b.root:Delphes")
>>> t.file.streamer_named("TObject").show()
TObject (v1)
    fUniqueID: unsigned int (TStreamerBasicType)
    fBits: unsigned int (TStreamerBasicType)

This is also true of almost all of the fBits branches:

>>> for x in t.keys(filter_name="*.fBits"):
...     print(f"{len(t[x].debug_array(0)) / 4:10} {x}")
... 
       1.0 Event/Event.fBits
       1.0 Weight/Weight.fBits
    7924.5 Particle/Particle.fBits
     273.0 Track/Track.fBits
     819.0 Tower/Tower.fBits
     273.0 EFlowTrack/EFlowTrack.fBits
     477.0 EFlowPhoton/EFlowPhoton.fBits
     267.0 EFlowNeutralHadron/EFlowNeutralHadron.fBits
       8.0 GenJet/GenJet.fBits
       1.0 GenMissingET/GenMissingET.fBits
       6.0 Jet/Jet.fBits
       0.0 Electron/Electron.fBits
       0.0 Photon/Photon.fBits
       0.0 Muon/Muon.fBits
       1.0 MissingET/MissingET.fBits
       1.0 ScalarHT/ScalarHT.fBits

The exception is Particle/Particle.fBits, whose first event has a number of bytes that does not divide evenly by 4. Upon closer inspection, it's clearly a 6-byte value.

>>> t["Particle/Particle.fBits"].debug_array(0).view([("x", ">i4"), ("y", ">i2")])
array([(50331664, 0), (50331664, 0), (50331664, 0), ..., (50331664, 0),
       (50331664, 0), (50331664, 0)], dtype=[('x', '>i4'), ('y', '>i2')])
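
The same 4 + 2 byte split can be reproduced without the file. The following is only a sketch on synthetic bytes: `raw` stands in for one event of `Particle/Particle.fBits`, and `split_records` is a hypothetical helper, not part of uproot.

```python
import struct

# Synthetic stand-in for one event: three records of [3, 0, 0, 16, 0, 0].
raw = bytes([3, 0, 0, 16, 0, 0]) * 3


def split_records(buf, fmt=">ih"):
    """Split buf into fixed-size big-endian records described by fmt."""
    size = struct.calcsize(fmt)  # 6 for ">ih": 4-byte int + 2-byte int
    if len(buf) % size != 0:
        raise ValueError(f"{len(buf)} bytes is not a multiple of {size}")
    return list(struct.iter_unpack(fmt, buf))


print(split_records(raw))  # [(50331664, 0), (50331664, 0), (50331664, 0)]

# The same buffer cannot be interpreted as plain 4-byte integers.
try:
    split_records(raw, ">i")
except ValueError as err:
    print("4-byte interpretation fails:", err)
```

Here 18 bytes divide evenly into 6-byte records but not into 4-byte ones, which is the same symptom as the 7924.5 in the table above.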

However, there's nothing to distinguish this branch with 6 bytes:

>>> t["Particle/Particle.fBits"].all_members
{
    '@fUniqueID': 0,
    '@fBits': 55574528,
    'fName': 'Particle.fBits',
    'fTitle': 'fBits[Particle_]',
    'fFillColor': 0,
    'fFillStyle': 1001,
    'fCompress': 101,
    'fBasketSize': 64000,
    'fEntryOffsetLen': 10,
    'fWriteBasket': 33,
    'fEntryNumber': 100,
    'fIOFeatures': <ROOT::TIOFeatures at 0x7fd20af3d400>,
    'fOffset': 0,
    'fMaxBaskets': 34,
    'fSplitLevel': 0,
    'fEntries': 100,
    'fFirstEntry': 0,
    'fTotBytes': 1710622,
    'fZipBytes': 11966,
    'fBranches': <TObjArray of 0 items at 0x7fd20af3d460>,
    'fLeaves': <TObjArray of 1 items at 0x7fd20af3d5b0>,
    'fBasketBytes': array([381, 307, 382, 348, 276, 408, 359, 400, 376, 358, 339, 320, 324,
       362, 474, 310, 434, 366, 427, 282, 333, 452, 378, 378, 392, 382,
       370, 349, 392, 385, 365, 316, 241,   0], dtype=int32),
    'fBasketEntry': array([  0,   3,   5,   7,  11,  12,  15,  18,  21,  24,  27,  29,  31,
        33,  36,  41,  43,  46,  49,  53,  54,  56,  61,  64,  68,  73,
        77,  81,  83,  87,  92,  95,  97, 100]),
    'fBasketSeek': array([     375,    17300,   658211,   676369,  1374256,  1410371,
        2055073,  2668842,  2705428,  3223503,  3741825,  3748989,
        3777650,  4534932,  5093995,  5133565,  5736716,  6572626,
        7166521,  7186122,  7660358,  8287949,  8333770,  8858670,
        9413125,  9435318, 10023382, 10544552, 10584617, 11540595,
       12060644, 12084058, 12654014,        0]),
    'fClassName': <TString 'TObject' at 0x7fd20af9bb30>,
    'fParentName': <TString 'GenParticle' at 0x7fd20af9bba0>,
    'fClonesName': <TString '' at 0x7fd20af9bc10>,
    'fCheckSum': 2417737773,
    'fClassVersion': 1,
    'fID': 1,
    'fType': 31,
    'fStreamerType': 15,
    'fMaximum': 0,
    'fBranchCount': None,
    'fBranchCount2': None
}

from branches with 4 bytes:

>>> t["Jet/Jet.fBits"].all_members
{
    '@fUniqueID': 0,
    '@fBits': 55574528,
    'fName': 'Jet.fBits',
    'fTitle': 'fBits[Jet_]',
    'fFillColor': 0,
    'fFillStyle': 1001,
    'fCompress': 101,
    'fBasketSize': 64000,
    'fEntryOffsetLen': 400,
    'fWriteBasket': 1,
    'fEntryNumber': 100,
    'fIOFeatures': <ROOT::TIOFeatures at 0x7fd209e72a30>,
    'fOffset': 0,
    'fMaxBaskets': 10,
    'fSplitLevel': 0,
    'fEntries': 100,
    'fFirstEntry': 0,
    'fTotBytes': 2131,
    'fZipBytes': 352,
    'fBranches': <TObjArray of 0 items at 0x7fd209e72a90>,
    'fLeaves': <TObjArray of 1 items at 0x7fd209e72be0>,
    'fBasketBytes': array([352,   0,   0,   0,   0,   0,   0,   0,   0,   0], dtype=int32),
    'fBasketEntry': array([  0, 100,   0,   0,   0,   0,   0,   0,   0,   0]),
    'fBasketSeek': array([14605752,        0,        0,        0,        0,        0,
              0,        0,        0,        0]),
    'fClassName': <TString 'TObject' at 0x7fd209e6e4a0>,
    'fParentName': <TString 'Jet' at 0x7fd209e6e510>,
    'fClonesName': <TString '' at 0x7fd209e6e580>,
    'fCheckSum': 2417737773,
    'fClassVersion': 1,
    'fID': 1,
    'fType': 31,
    'fStreamerType': 15,
    'fMaximum': 0,
    'fBranchCount': None,
    'fBranchCount2': None
}

So they can't be given different interpretations.

In the end, I think that the 6-byte vs 4-byte indicator might be in the fBits themselves. See the first 4 bytes of the first event:

>>> for x in t.keys(filter_name="*.fBits"):
...     print(f"{str(t[x].debug_array(0)[:4]):15} {x}")
... 
[3 0 0 0]       Event/Event.fBits
[3 0 0 0]       Weight/Weight.fBits
[ 3  0  0 16]   Particle/Particle.fBits
[ 3  0  0 16]   Track/Track.fBits
[ 3  0  0 16]   Tower/Tower.fBits
[ 3  0  0 16]   EFlowTrack/EFlowTrack.fBits
[ 3  0  0 16]   EFlowPhoton/EFlowPhoton.fBits
[ 3  0  0 16]   EFlowNeutralHadron/EFlowNeutralHadron.fBits
[3 0 0 0]       GenJet/GenJet.fBits
[3 0 0 0]       GenMissingET/GenMissingET.fBits
[3 0 0 0]       Jet/Jet.fBits
[]              Electron/Electron.fBits
[]              Photon/Photon.fBits
[]              Muon/Muon.fBits
[3 0 0 0]       MissingET/MissingET.fBits
[3 0 0 0]       ScalarHT/ScalarHT.fBits

The ones that have a 16 in them also have event sizes that are divisible by 6:

>>> for x in t.keys(filter_name="*.fBits"):
...     print(f"{len(t[x].debug_array(0)) / 6:10} {x}")
... 
0.6666666666666666 Event/Event.fBits
0.6666666666666666 Weight/Weight.fBits
    5283.0 Particle/Particle.fBits
     182.0 Track/Track.fBits
     546.0 Tower/Tower.fBits
     182.0 EFlowTrack/EFlowTrack.fBits
     318.0 EFlowPhoton/EFlowPhoton.fBits
     178.0 EFlowNeutralHadron/EFlowNeutralHadron.fBits
5.333333333333333 GenJet/GenJet.fBits
0.6666666666666666 GenMissingET/GenMissingET.fBits
       4.0 Jet/Jet.fBits
       0.0 Electron/Electron.fBits
       0.0 Photon/Photon.fBits
       0.0 Muon/Muon.fBits
0.6666666666666666 MissingET/MissingET.fBits
0.6666666666666666 ScalarHT/ScalarHT.fBits

That 16 could be kIsReferenced...

>>> import numpy as np
>>> np.array([uproot.const.kIsReferenced], ">i4").view("u1")
array([ 0,  0,  0, 16], dtype=uint8)
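
To make the bit test concrete, here is a minimal sketch that checks the flag on the first 4 bytes of a record. `K_IS_REFERENCED` is defined locally as `1 << 4` on the assumption that it matches `uproot.const.kIsReferenced` (the output above shows the value 16), and `is_referenced` is a hypothetical helper.

```python
# Assumed to equal uproot.const.kIsReferenced (16, per the output above).
K_IS_REFERENCED = 1 << 4


def is_referenced(first4):
    """Return True if the big-endian 4-byte fBits word has the flag set."""
    fbits = int.from_bytes(first4, "big")
    return bool(fbits & K_IS_REFERENCED)


print(is_referenced(bytes([3, 0, 0, 16])))  # True  (e.g. Particle, Track, Tower)
print(is_referenced(bytes([3, 0, 0, 0])))   # False (e.g. Event, Jet, MissingET)
```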

If this is the issue, then the fBits would have to become an object type (uproot.AsObjects) that includes the extra 2 bytes if it sees a 16. That would be slow to deserialize (before AwkwardForth) and annoying because nobody cares about fBits.
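
For illustration, the per-object logic would look roughly like the sketch below. It is not what uproot does and not an `uproot.AsObjects` implementation; it assumes, as hypothesized above, that exactly 2 extra bytes follow any fBits word with the 16 bit set. `read_fbits` and `K_IS_REFERENCED` are hypothetical names.

```python
K_IS_REFERENCED = 1 << 4  # assumed flag value (16)


def read_fbits(buf):
    """Read variable-length fBits records: 4 bytes, plus 2 if the flag is set."""
    values = []
    pos = 0
    while pos < len(buf):
        if pos + 4 > len(buf):
            raise ValueError("truncated fBits word")
        fbits = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
        if fbits & K_IS_REFERENCED:
            if pos + 2 > len(buf):
                raise ValueError("truncated 2-byte extension")
            pos += 2  # skip the extra 2 bytes (assumption: always exactly 2)
        values.append(fbits)
    return values


# Synthetic mix of a 6-byte, a 4-byte, and another 6-byte record.
mixed = bytes([3, 0, 0, 16, 0, 0, 3, 0, 0, 0, 3, 0, 0, 16, 0, 0])
print(read_fbits(mixed))  # [50331664, 50331648, 50331664]
```

Because each record's length depends on its own contents, the loop cannot be replaced by a single fixed-width array view, which is why this route would be slow in pure Python.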

I had another idea: why not just read them in as 1-byte values? Then at least it doesn't break.

>>> import uproot
>>> t = uproot.open("uproot-issue-438b.root:Delphes")
>>> for x in t.keys(filter_name="*.fBits"):
...     print(t[x].array())
... 
[[3, 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0, ... 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0]]
[[3, 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0, ... 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0]]
[[3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0, 3, ... 3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]]
[[3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0, 3, ... 3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]]
[[3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0, 3, ... 3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]]
[[3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0, 3, ... 3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]]
[[3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0, 3, ... 3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]]
[[3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0, 3, ... 3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]]
[[3, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 3, 0, ... [3, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0]]
[[3, 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0, ... 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0]]
[[3, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 3, 0, ... [3, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0]]
[[], [], [], [], [], [], [], [], [], [], ... [], [], [], [], [], [], [], [], [], []]
[[], [], [], [], [], [], [], [], [], [], ... [], [], [], [], [], [], [], [], [], []]
[[], [], [], [], [], [], [], [], [], [], ... [], [], [], [], [], [], [], [], [], []]
[[3, 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0, ... 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0]]
[[3, 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0, ... 0, 0, 0], [3, 0, 0, 0], [3, 0, 0, 0]]

That's the second commit.
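
The reason the 1-byte reading cannot break is that every buffer length is a multiple of 1. A minimal sketch on synthetic bytes (the helper `as_uint8` is hypothetical and only mimics the shape of the output above):

```python
def as_uint8(buf):
    """Interpret a byte buffer as a flat list of unsigned 1-byte values."""
    return list(buf)


four_byte_event = bytes([3, 0, 0, 0]) * 2        # like Jet.fBits
six_byte_event = bytes([3, 0, 0, 16, 0, 0]) * 2  # like Particle.fBits

print(as_uint8(four_byte_event))  # [3, 0, 0, 0, 3, 0, 0, 0]
print(as_uint8(six_byte_event))   # [3, 0, 0, 16, 0, 0, 3, 0, 0, 16, 0, 0]
```

The cost is that the values are no longer grouped per object, so anyone who does need fBits has to regroup the bytes themselves.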

jpivarski changed the title from "Fixes first bug in issue #438." to "Fixes both bugs in issue #438." on Sep 28, 2021
jpivarski enabled auto-merge (squash) on September 28, 2021 21:48
jpivarski linked an issue on Sep 28, 2021 that may be closed by this pull request
jpivarski merged commit 1f484ff into main on Sep 28, 2021
jpivarski deleted the jpivarski/bugs-with-delphes branch on September 28, 2021 22:04
This was referenced Mar 4, 2022
Successfully merging this pull request may close these issues.

bugs with uproot and Delphes interactions