Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More issues with ElementLink branches in DAOD_PHYSLITE #951

Closed
nikoladze opened this issue Sep 1, 2023 · 1 comment · Fixed by #1114
Closed

More issues with ElementLink branches in DAOD_PHYSLITE #951

nikoladze opened this issue Sep 1, 2023 · 1 comment · Fixed by #1114
Assignees
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments

@nikoladze
Copy link
Contributor

I found some more issues with ElementLink branches in newer DAOD_PHYSLITE files.

An example file can be found in DAOD_PHYSLITE_24.0.2.art.pool.root.zip
(produced with produce_mc21_physlite_testfile_24.0.2_2023-09-01.sh)

>>> import uproot
>>> import awkward as ak
>>> uproot.__version__
'5.0.11'
>>> ak.__version__
'2.3.3'

Trying to read all ElementLink branches:

import uproot

def try_read_elementlinks(tree, use_forth=True):
    for key, branch in tree.iteritems(filter_typename="*ElementLink*"):
        try:
            if use_forth:
                branch.interpretation._forth = True
            else:
                branch.interpretation._forth = False
            branch.array()
        except Exception as e:
            print(f"Problem with {key}:\n{repr(e)}\n")

with uproot.open("DAOD_PHYSLITE_24.0.2.art.pool.root:CollectionTree") as tree:
    try_read_elementlinks(tree)

gives us

Problem with METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink:
ValueError('basket 0 in tree/branch /CollectionTree;1:METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink has the wrong number of bytes (4340) for interpretation AsStridedObjects(Model_ElementLink_3c_DataVector_3c_xAOD_3a3a_Jet_5f_v1_3e3e__v1)\nin file DAOD_PHYSLITE_24.0.2.art.pool.root')

Problem with EventInfoAuxDyn.hardScatterVertexLink:
<UnknownInterpretation 'none of the rules matched'>

Problem with TruthMuonsAuxDyn.parentLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Problem with TruthTausAuxDyn.parentLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Problem with TruthTausAuxDyn.childLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Problem with GSFConversionVerticesAuxDyn.trackParticleLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Playing a bit with this i believe there are 3 different issues - let's call them The Good, The Bad and The Ugly

The Good

METAssoc_AnalysisMETAux.jetLink seems to fail because it is stored memberwise. Memberwise seems on by default for ATLAS and (normal) splitting for this branch has been deactivated due to some issues it caused in existing code. So, since this is a single-jagged vector we now seem to have this memberwise:

>>> tree = uproot.open("DAOD_PHYSLITE_24.0.2.art.pool.root:CollectionTree")
>>> tree["METAssoc_AnalysisMETAux.jetLink"].interpretation.typename
'std::vector<ElementLink<DataVector<xAOD::Jet_v1>>>'
>>> tree["METAssoc_AnalysisMETAux.jetLink"].debug(0)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0  92  64   9   0   0 167   6 112  11   0   0   0  10  26 253  25  25
  @ --- ---   \   @ --- --- --- --- ---   p --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 26 253  25  25  26 253  25  25  26 253  25  25   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   1   0   0   0   2   0   0   0   3   0   0   0   4   0   0   0   5
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   6   0   0   0   7   0   0   0   8 255 255 255 255
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

It seems this has 16 header bytes and then come all numbers for the m_persKey, followed by all numbers for the m_persIndex. If we wanted to put in a workaround for this we could abuse the fact that both these members are int32 and read it as a single list

>>> tree["METAssoc_AnalysisMETAux.jetLink"].debug(0, skip_bytes=16, dtype=">i4")
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      452794649       452794649       452794649       452794649       452794649
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      452794649       452794649       452794649       452794649               0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   1   0   0   0   2   0   0   0   3   0   0   0   4
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
              0               1               2               3               4
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   5   0   0   0   6   0   0   0   7   0   0   0   8 255 255 255 255
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
              5               6               7               8              -1
>>> interpretation = uproot.interpretation.jagged.AsJagged(uproot.interpretation.numerical.AsDtype(">i4"), header_bytes=16)
>>> array = tree["METAssoc_AnalysisMETAux.jetLink"].array(interpretation)

... and then do some tricks to split each list in half and rezip it into the two record fields.

But it may be that it's actually a bug that this branch is not split (and splitting would cause it to be not memberwise). The discussion in athena!61982 seemed to have concluded that we can actually split these branches in newer ROOT versions, so i'll have to follow this up with the experts.

The Bad

For TruthMuonsAuxDyn.parentLinks, TruthTausAuxDyn.parentLinks, TruthTausAuxDyn.childLinks and GSFConversionVerticesAuxDyn.trackParticleLinks there seems to be a problem only in AwkwardForth mode

With

with uproot.open("DAOD_PHYSLITE_24.0.2.art.pool.root:CollectionTree") as tree:
    try_read_elementlinks(tree, use_forth=False)

They don't show up anymore as having a problem.

The Exception that is thrown in forth-mode always complains about object dtypes in awkward array and it seems in AwkwardForth mode (at least some) STLVector instances are produced.

The reason why i put all the previous examples in with uproot.open(...) blocks is that trying the second time (no matter if forth was used the first time or not) it actually works for these branches:

with uproot.open("DAOD_PHYSLITE_24.0.2.art.pool.root:CollectionTree") as tree:
    print("First Try")
    print("=========")
    try_read_elementlinks(tree)
    print("Second Try")
    print("==========")
    try_read_elementlinks(tree)
First Try
=========
Problem with METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink:
ValueError('basket 0 in tree/branch /CollectionTree;1:METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink has the wrong number of bytes (4340) for interpretation AsStridedObjects(Model_ElementLink_3c_DataVector_3c_xAOD_3a3a_Jet_5f_v1_3e3e__v1)\nin file DAOD_PHYSLITE_24.0.2.art.pool.root')

Problem with EventInfoAuxDyn.hardScatterVertexLink:
<UnknownInterpretation 'none of the rules matched'>

Problem with TruthMuonsAuxDyn.parentLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Problem with TruthTausAuxDyn.parentLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Problem with TruthTausAuxDyn.childLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Problem with GSFConversionVerticesAuxDyn.trackParticleLinks:
TypeError('Awkward Array does not support arrays with object dtypes.')

Second Try
==========
Problem with METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink:
ValueError('basket 0 in tree/branch /CollectionTree;1:METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink has the wrong number of bytes (4340) for interpretation AsStridedObjects(Model_ElementLink_3c_DataVector_3c_xAOD_3a3a_Jet_5f_v1_3e3e__v1)\nin file DAOD_PHYSLITE_24.0.2.art.pool.root')

Problem with EventInfoAuxDyn.hardScatterVertexLink:
<UnknownInterpretation 'none of the rules matched'>

The Ugly

EventInfoAuxDyn.hardScatterVertexLink is weird. It seems to have a sub branch with the same name

>>> tree = uproot.open("DAOD_PHYSLITE_24.0.2.art.pool.root:CollectionTree")
>>> tree["EventInfoAuxDyn.hardScatterVertexLink"].keys()
['EventInfoAuxDyn.hardScatterVertexLink']

For which uproot doesn't know how to interpret it

>>> tree["EventInfoAuxDyn.hardScatterVertexLink/EventInfoAuxDyn.hardScatterVertexLink"].interpretation
<UnknownInterpretation 'none of the rules matched'>

TTree.show shows this is an ElementLink, this time not in a vector (makes sense, probably supposed to point to the hard scatter vertex of the event). We can see the key and index by skipping 10 bytes:

>>> tree["EventInfoAuxDyn.hardScatterVertexLink/EventInfoAuxDyn.hardScatterVertexLink"].debug(0, skip_bytes=10, dtype=">i4")
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 55 209  69 151   0   0   0   0
  7 ---   E --- --- --- --- ---
      936461719               0

Sorry for the long text - just wanted to write down what i found so far (some things i found while writing this, so writing it down was already helpful 🙂 )

I guess for all these issues we have workarounds, but the AwkardForth ones are a bit worrisome (probably i should have called these "the ugly") since they seem to pop up a bit randomly and from our last debugging session i learned it can be quite difficult to find out what the issue is.

@jpivarski
Copy link
Member

Part of "The Bad" ("Awkward Array does not support arrays with object dtypes") seems to be the same as Issue #1101.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged
Projects
Status: Done!
Development

Successfully merging a pull request may close this issue.

3 participants