In [1]:
import uproot
import awkward as ak
import numpy as np

In [2]:
uproot.__version__

'5.0.5'

In [3]:
ak.__version__

'2.1.1'

In [56]:
# https://cernbox.cern.ch/s/zvXLut4qivWhxBL
# produced after https://gitlab.cern.ch/atlas/athena/-/merge_requests/61982/diffs#c761c05ce753e3803eb4405015c53445f67e713e in rel 23.0
# with commands from https://gitlab.cern.ch/atlas/athena/-/blob/23.0/PhysicsAnalysis/DerivationFramework/DerivationFrameworkART/DerivationFrameworkPHYS/test/test_mc21PHYSLITE.sh
filename = "DAOD_PHYSLITE.art.afterMR61982.pool.root"

In [5]:
meta = uproot.open(f"{filename}:MetaData")

In [6]:
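def read_vector_string(data, start=0):
    """Parse a serialized vector<string> from a uint8 array; return (strings, end position)."""
    # skip the first 6 bytes of the 10-byte vector header
    start = start + 6
    # the last 4 bytes of the 10-byte vector header tell us the size of the vector
    vector_size = int(np.frombuffer(data[start : start + 4].tobytes(), dtype=">i4")[0])
    pos = start + 4
    strings = []
    for _ in range(vector_size):
        # the first byte of one vector element tells us the length of the string
        # (cast to int so the position arithmetic is not done in uint8)
        string_len = int(data[pos])
        pos += 1
        # ROOT marks strings of 255 bytes or more with a length byte of 255,
        # followed by the real length as a big-endian 4-byte integer
        if string_len == 255:
            string_len = int(np.frombuffer(data[pos : pos + 4].tobytes(), dtype=">i4")[0])
            pos += 4
        # the rest is then just the string :)
        strings.append(data[pos : pos + string_len].tobytes().decode())
        pos += string_len
    return strings, pos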
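
To check the byte layout that `read_vector_string` assumes, here is a minimal round trip on synthetic bytes. The encoder `encode_vector_string` and the parser `decode_vector_string` are hypothetical helpers written for this sketch; the 6 leading header bytes are treated as opaque and set to zero, and only strings shorter than 255 bytes are covered.

```python
import struct

import numpy as np


def encode_vector_string(strings):
    """Hypothetical encoder: 6 opaque header bytes, big-endian int32 size,
    then one length byte plus payload per string (ASCII, < 255 bytes)."""
    body = b"".join(bytes([len(s)]) + s.encode() for s in strings)
    return b"\x00" * 6 + struct.pack(">i", len(strings)) + body


def decode_vector_string(data, start=0):
    """Hypothetical parser mirroring the layout described above."""
    size = int(np.frombuffer(data[start + 6 : start + 10].tobytes(), dtype=">i4")[0])
    pos = start + 10
    strings = []
    for _ in range(size):
        string_len = int(data[pos])  # plain int, so pos stays a Python int
        pos += 1
        strings.append(data[pos : pos + string_len].tobytes().decode())
        pos += string_len
    return strings, pos


# two vectors back to back, like consecutive members in one basket
raw = encode_vector_string(["EventInfoAux.", "AnalysisJetsAux."]) + encode_vector_string(["x"])
buf = np.frombuffer(raw, dtype=np.uint8)

first, pos = decode_vector_string(buf)
second, pos = decode_vector_string(buf, pos)
print(first, second, pos == len(buf))
```

Returning the end position alongside the strings is what lets the members be parsed one after another from the same buffer.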

In [7]:
meta["EventFormatStreamDAOD_PHYSLITE"].num_baskets

1

In [8]:
data = meta["EventFormatStreamDAOD_PHYSLITE"].basket(0).data
pos = 0
branchNames, pos = read_vector_string(data, pos)  # m_branchNames: vector<string>
classNames, pos = read_vector_string(data, pos)  # m_classNames: vector<string>
parentNames, pos = read_vector_string(data, pos)  # m_parentNames: vector<string>
branchHashes = np.frombuffer(data[pos + 10 :].tobytes(), dtype=">u4")  # m_branchHashes: vector<unsigned int>

In [9]:
hash_to_branchname = dict(zip(branchHashes, branchNames))
hash_to_branchname

{644236368: 'xTrigDecisionAux.',
 227374555: 'METAssoc_AnalysisMETAux.',
 1004693582: 'EventInfoAux.',
 804885368: 'Kt4EMPFlowEventShapeAux.',
 164670216: 'AnalysisElectronsAux.',
 982622728: 'AnalysisJetsAux.',
 382957617: 'AnalysisLargeRJetsAux.',
 200696525: 'AnalysisMuonsAux.',
 625597357: 'AnalysisPhotonsAux.',
 788942073: 'AnalysisTauJetsAux.',
 1061584385: 'BTagging_AntiKt4EMPFlowAux.',
 226189860: 'BornLeptonsAux.',
 348183643: 'CombinedMuonTrackParticlesAux.',
 36553883: 'ExtrapolatedMuonTrackParticlesAux.',
 323446075: 'GSFConversionVerticesAux.',
 308113400: 'GSFTrackParticlesAux.',
 355456802: 'HLTNav_RepackedFeatures_METAux.',
 919843450: 'HLTNav_RepackedFeatures_ParticleAux.',
 809018786: 'HLTNav_Summary_DAODSlimmedAux.',
 303174265: 'HardScatterParticlesAux.',
 979674879: 'HardScatterVerticesAux.',
 360752931: 'InDetTrackParticlesAux.',
 447627443: 'MET_Core_AnalysisMETAux.',
 364553125: 'MET_TruthAux.',
 59619305: 'MuonSpectrometerTrackParticlesAux.',
 951301671: 'Prima

In [10]:
tree = uproot.open(f"{filename}:CollectionTree")

In [11]:
tree.show(filter_name="MET*", name_width=80, typename_width=50)

name                                                                             | typename                                           | interpretation                
---------------------------------------------------------------------------------+----------------------------------------------------+-------------------------------
METAssoc_AnalysisMETAux.                                                         | xAOD::MissingETAuxAssociationMap_v2                | AsGroup(<TBranchElement 'ME...
METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.xAOD::AuxContainerBase          | unknown                                            | <UnknownInterpretation 'non...
METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink                         | std::vector<ElementLink<DataVector<xAOD::Jet_v1>>> | AsJagged(AsStridedObjects(M...
METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.objectLinks                     | std::vector<std::vector<ElementLink<DataVector<... | AsObjects(AsVector(True, As..

In [12]:
tree["METAssoc_AnalysisMETAux.jetLink"].debug(0, dtype=">i4")

--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0  92  64   9   0   0 167   6 112  11   0   0   0  10  26 253  25  25
  @ --- ---   \   @ --- --- --- --- ---   p --- --- --- --- --- --- --- --- ---
     1073741916      1074331648     -1492750325              10       452794649
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25  26 253  25  25
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      452794649       452794649       452794649       452794649       452794649
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 26 253  25  25  26 253  25  25  26 253  25  25   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      452794649       452794649       452794649               0               0
--+---+---+---+---+---+---+---+---+--

In [13]:
links = tree["METAssoc_AnalysisMETAux.jetLink"].array()  # buggy: the default interpretation does not read this branch correctly
links

In [14]:
class Interpretation(uproot.interpretation.Interpretation):
    cache_key = "jetlink"
    _header_bytes = 16

    def basket_array(
        self,
        data,
        byte_offsets,
        basket,
        branch,
        context,
        cursor_offset,
        library,
        interp_options,
    ):
        byte_starts = byte_offsets[:-1] + self._header_bytes
        byte_stops = byte_offsets[1:]

        # mask out the headers
        header_offsets = np.arange(self._header_bytes)
        header_idxs = (byte_offsets[:-1] + header_offsets[:, np.newaxis]).ravel()
        mask = np.full(len(data), True, dtype=np.bool_)
        mask[header_idxs] = False
        data = data[mask]

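        # after dropping the headers, each event holds byte_counts bytes of payload
        byte_counts = byte_stops - byte_starts

        # byte offsets of each event's payload in the header-free data
        offsets = np.empty(len(byte_counts) + 1, dtype=np.int32)
        offsets[0] = 0
        np.cumsum(byte_counts, out=offsets[1:])
        
        starts = offsets[:-1]
        stops = offsets[1:]
        counts = stops - starts

        # reinterpret the payload as big-endian int32 and convert to native byte order
        # (avoids ndarray.newbyteorder, which was removed in NumPy 2.0)
        content = ak.contents.NumpyArray(data.view(">i4").astype(np.int32))
        # per event, the first half of the payload holds the keys,
        # the second half the indices; divide by 4 to go from bytes to int32 items
        starts_key = starts // 4
        stops_key = (starts + counts // 2) // 4
        starts_index = stops_key
        stops_index = stops // 4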

        m_persKey = ak.Array(
            ak.contents.ListArray(
                ak.index.Index(starts_key), ak.index.Index(stops_key), content
            )
        )
        m_persIndex = ak.Array(
            ak.contents.ListArray(
                ak.index.Index(starts_index), ak.index.Index(stops_index), content
            )
        )
        return ak.zip({"m_persKey": m_persKey, "m_persIndex": m_persIndex})

Interpretation.final_array = uproot.interpretation.objects.AsObjects.final_array
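
A single-event, NumPy-only sketch of the layout this interpretation assumes: a 16-byte header, then all keys, then all indices, each as big-endian int32. The header content here is a placeholder (12 zero bytes plus the element count), the indices are synthetic, and `split_links` is a hypothetical helper, not part of uproot.

```python
import numpy as np

HEADER_BYTES = 16  # assumed per-event header size, matching _header_bytes above


def split_links(event_bytes):
    """Split one event's payload into (keys, indices); hypothetical helper."""
    payload = np.frombuffer(event_bytes, dtype=">i4", offset=HEADER_BYTES).astype(np.int32)
    half = len(payload) // 2
    return payload[:half], payload[half:]


keys = [452794649, 452794649, 452794649]  # key value taken from the debug dump
indices = [0, 1, 2]  # synthetic indices, for illustration only
header = bytes(12) + len(keys).to_bytes(4, "big")  # placeholder header
event = (
    header
    + np.array(keys, dtype=">i4").tobytes()
    + np.array(indices, dtype=">i4").tobytes()
)

pers_key, pers_index = split_links(event)
print(pers_key.tolist(), pers_index.tolist())
```

The class above does the same split for all events at once by building per-event start and stop positions into one shared content buffer.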

In [15]:
tree._file.array_cache.clear()
tree._file.object_cache.clear()
array = tree["METAssoc_AnalysisMETAux.jetLink"].array(interpretation=Interpretation())
array

In [16]:
array[0]

In [17]:
array[1]

In [18]:
tree["METAssoc_AnalysisMETAux.objectLinks"].array().m_persKey

In [19]:
hash_to_branchname[956497600]

'AnalysisElectrons'

In [20]:
hash_to_branchname[518718875]

'AnalysisTauJets'

In [21]:
hash_to_branchname[902907695]

'AnalysisPhotons'

In [22]:
tree["METAssoc_AnalysisMETAux.objectLinks"].array().m_persKey[0]

In [23]:
tree["METAssoc_AnalysisMETAux.objectLinks"].array().m_persIndex[0]

In [24]:
tree["METAssoc_AnalysisMETAux.objectLinks"].array().m_persIndex[1]

In [25]:
def read_metassoc(tree):
    array = {}
    for key in tree.keys(filter_name="METAssoc*"):
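        # sub-branches are listed as "parent/leaf"; keep only the leaf name
        try:
            key = key.split("/")[1]
        except IndexError:
            pass
        # skip the container itself, its base class, and anything without a field name
        if "AuxContainerBase" in key or key.endswith(".") or "." not in key:
            continue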
            continue
        field = key.split(".")[-1]
        if "jetLink" in key:
            interpretation = Interpretation()
        else:
            interpretation = None
        array[field] = tree[key].array(interpretation=interpretation)
    return ak.zip(array, depth_limit=1)
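
To make the key filtering in `read_metassoc` easier to follow, here is a sketch that applies the same rules to the branch names printed by `tree.show` above, without opening a file. `field_name` is a hypothetical helper introduced only for this illustration.

```python
def field_name(key):
    """Return the field name for a branch key, or None if the key is skipped."""
    parts = key.split("/")
    leaf = parts[1] if len(parts) > 1 else key  # same choice as the try/except above
    if "AuxContainerBase" in leaf or leaf.endswith(".") or "." not in leaf:
        return None
    return leaf.split(".")[-1]


keys = [
    "METAssoc_AnalysisMETAux.",
    "METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.xAOD::AuxContainerBase",
    "METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink",
    "METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.objectLinks",
]
fields = [field_name(k) for k in keys]
print(fields)
```

Only the last two keys yield a field, so the zipped record ends up with one entry per data-carrying sub-branch.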

In [26]:
assoc = read_metassoc(tree)
assoc

In [27]:
assoc.overlapIndices[15]

In [28]:
def pp_hash(stuff):
    for event in stuff.tolist():
        print([hash_to_branchname[h] for h in event])
        print()

In [29]:
pp_hash(assoc.objectLinks[15].m_persKey)

['AnalysisMuons']

[]

[]

[]

[]

[]

[]

[]

['AnalysisMuons', 'AnalysisPhotons', 'AnalysisTauJets']



In [30]:
assoc.isMisc[0]

In [31]:
assoc.isMisc[1]

In [32]:
hash_to_branchname[518718875]

'AnalysisTauJets'

In [33]:
hash_to_branchname[980095599]

'AnalysisMuons'

In [35]:
assoc.isMisc

In [36]:
assoc.objectLinks.m_persIndex[:1]

In [37]:
assoc.objectLinks.m_persIndex[assoc.isMisc]

In [38]:
assoc.isMisc == 1

In [40]:
array = ak.Array([[[1], [], [2, 3]], [[4, 5], [6], []]])
array

In [41]:
mask = ak.Array([[True, False, True], [False, False, True]])
mask

In [42]:
array[mask]

In [43]:
assoc.objectLinks[assoc.isMisc == 1].m_persKey[:, 0]

In [44]:
pp_hash(ak.firsts(assoc.objectLinks[assoc.isMisc == 1].m_persKey))

['AnalysisPhotons']

[]

['AnalysisMuons', 'AnalysisTauJets']

['AnalysisTauJets']

[]

[]

['AnalysisPhotons']

[]

[]

[]

['AnalysisPhotons']

[]

[]

['AnalysisMuons', 'AnalysisTauJets']

[]

['AnalysisMuons', 'AnalysisPhotons', 'AnalysisTauJets']

[]

[]

[]

[]

[]

[]

[]

['AnalysisMuons']

[]

['AnalysisPhotons']

[]

[]

['AnalysisPhotons']

[]

[]

[]

[]

[]

[]

[]

['AnalysisMuons', 'AnalysisTauJets']

[]

['AnalysisMuons']

['AnalysisPhotons']

[]

[]

[]

[]

['AnalysisPhotons']

['AnalysisPhotons']

['AnalysisMuons']

[]

['AnalysisMuons']

['AnalysisMuons', 'AnalysisPhotons', 'AnalysisTauJets']

['AnalysisMuons', 'AnalysisPhotons', 'AnalysisTauJets']

[]

[]

[]

[]

[]

['AnalysisPhotons']

[]

['AnalysisMuons', 'AnalysisTauJets']

[]

['AnalysisPhotons']

[]

['AnalysisMuons', 'AnalysisPhotons']

['AnalysisMuons']

['AnalysisMuons', 'AnalysisPhotons', 'AnalysisTauJets']

[]

[]

['AnalysisMuons', 'AnalysisTauJets']

[]

['AnalysisMuons']

['AnalysisMuons', 'Analysis

In [45]:
assoc.jettrkpx

In [46]:
assoc.jetLink.m_persKey[0]

In [47]:
assoc.calkey

In [48]:
assoc.trkpx[0]

In [49]:
assoc.trkkey[0]

# Concept

Going through the steps of TJ's high-level summary (https://gitlab.cern.ch/khoo/METRecoTutorial)

1. A set of selected objects is identified as inputs to the lepton/photon/tau MET terms.

OK, we assume we already have that, e.g. via a `baseline` flag attached to the objects.

2. Iterating over the selected objects in some defined order of priority, any objects overlapping a prior object are discarded.

We get the overlaps from the `objectLinks` and `overlapIndices` fields in the association map:

In [50]:
assoc.objectLinks[0].m_persKey

In [51]:
assoc.overlapIndices[0]
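
Step 2 can be sketched in plain Python. This is a simplified illustration, not the ATLAS implementation: it assumes that `overlaps[i]` lists the positions of the objects that overlap object `i` (the role `overlapIndices` appears to play) and that a priority order is given. `select_non_overlapping` is a hypothetical name.

```python
def select_non_overlapping(priority_order, overlaps):
    """Keep each object unless it overlaps an object that was already kept.

    priority_order: object positions, highest priority first (assumed given)
    overlaps: mapping from object position to positions it overlaps with
    """
    accepted = []
    accepted_set = set()
    for i in priority_order:
        if any(j in accepted_set for j in overlaps[i]):
            continue  # overlaps a prior, already accepted object
        accepted.append(i)
        accepted_set.add(i)
    return accepted


# toy event: 0 and 1 overlap, 1 and 2 overlap, 3 is isolated
overlaps = {0: [1], 1: [0, 2], 2: [1], 3: []}
selected = select_non_overlapping([0, 1, 2, 3], overlaps)
print(selected)
```

Object 2 survives in this toy case because the object it overlaps with (1) was itself discarded; whether the real algorithm treats that case the same way would need checking against the C++.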

3. The remaining objects are summed into their respective MET terms.

OK, so with the procedure above we could store a flag, e.g. `pass_met`, and then sum `px`, `py` and `sumet` over the flagged objects.
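
A minimal sketch of such a sum for one event, assuming per-object `pt` and `phi` arrays and a boolean `pass_met` mask (all hypothetical inputs). It uses the usual convention that a MET term is the negative vector sum of the selected momenta, with the scalar sum of `pt` kept alongside.

```python
import numpy as np


def met_term(pt, phi, pass_met):
    """Return (px, py, sumet) of the MET term built from the flagged objects."""
    mask = np.asarray(pass_met, dtype=bool)
    pt = np.asarray(pt, dtype=float)[mask]
    phi = np.asarray(phi, dtype=float)[mask]
    px = -np.sum(pt * np.cos(phi))  # MET points opposite to the visible momentum
    py = -np.sum(pt * np.sin(phi))
    sumet = np.sum(pt)
    return px, py, sumet


px, py, sumet = met_term(
    pt=[10.0, 20.0, 5.0],
    phi=[0.0, np.pi / 2, np.pi],
    pass_met=[True, True, False],
)
print(px, py, sumet)
```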

4. Energy/momentum associated with any of the selected objects is removed from the jets that contain them.

For the association we have `trkkey` and `calkey`

In [52]:
assoc.trkkey[0]

In [53]:
assoc.calkey[0]

These are links into `trkpx`, `trkpy`, ...

In [54]:
assoc.trkpx[0]

In [55]:
assoc.calpx[0]

Hmm, something seems odd: the keys here are 2 and 3, but there are only 2 entries. To do: check what the C++ does.
It is mentioned that the cal/trk keys are bitmasks; is that true?
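
If the keys are bitmasks, which is not confirmed here, keys of 2 and 3 would be consistent with only 2 entries: bit `i` being set would mean that entry `i` is associated. A minimal sketch of that decoding, with `bits_set` as a hypothetical helper:

```python
def bits_set(key):
    """Return the positions of the bits that are set in a non-negative integer."""
    return [i for i in range(key.bit_length()) if (key >> i) & 1]


for key in (2, 3):
    print(key, bin(key), bits_set(key))
```

Under this reading, key 2 would refer to entry 1 only and key 3 to entries 0 and 1, so neither would point past the 2 available entries.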

See the [tutorial](https://gitlab.cern.ch/khoo/METRecoTutorial/-/blob/master/Root/DemoMETRebuilding.cxx#L174) for how to remove the energy/momentum from the jet; this also helps for the next steps:

5. Jets are added into their own MET term if they pass a selection defined by the MET group and if they did not lose the bulk of their energy in step 4.
6. Any jets failing the overlap removal have their residual momentum added to the core soft term.
7. The momentum associated with objects in the misc association that were not selected is added to the soft term.

The last point should again be able to use `objectLinks` and `trkkey`.
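
Steps 4 to 6 can be sketched with NumPy under several loudly stated assumptions: the momentum associated with the selected objects is already summed per jet (`assoc_pxpy`), the "bulk of their energy" criterion is modelled by a made-up threshold `min_fraction` on the surviving transverse momentum, the residual momentum is what enters both terms, and every jet that does not enter the jet term goes to the soft term. None of these choices is taken from the ATLAS code; `split_jets` is a hypothetical name.

```python
import numpy as np


def split_jets(jet_pxpy, assoc_pxpy, passes_selection, min_fraction=0.5):
    """Return (jet_term, soft_term) as (px, py) arrays for one event."""
    jets = np.asarray(jet_pxpy, dtype=float)
    # step 4: remove the momentum associated with the selected objects
    residual = jets - np.asarray(assoc_pxpy, dtype=float)
    jet_pt = np.hypot(jets[:, 0], jets[:, 1])
    residual_pt = np.hypot(residual[:, 0], residual[:, 1])
    # step 5: jets passing the selection that kept enough of their momentum
    keep = np.asarray(passes_selection, dtype=bool) & (residual_pt >= min_fraction * jet_pt)
    jet_term = -residual[keep].sum(axis=0)
    # step 6: everything else contributes its residual to the soft term
    soft_term = -residual[~keep].sum(axis=0)
    return jet_term, soft_term


jet_term, soft_term = split_jets(
    jet_pxpy=[[100.0, 0.0], [50.0, 0.0], [30.0, 0.0]],
    assoc_pxpy=[[10.0, 0.0], [40.0, 0.0], [0.0, 0.0]],
    passes_selection=[True, True, False],
)
print(jet_term, soft_term)
```

In this toy event the second jet loses 40 of its 50 units and so falls below the threshold, while the third jet fails the selection outright; both end up in the soft term.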

Finally:

8. All MET terms (hard and soft) are summed up to form the total MET in the event.
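
The final sum is a plain vector addition of the per-term components. A minimal sketch, with made-up term names and values chosen only for illustration; `total_met` is a hypothetical helper:

```python
import math


def total_met(terms):
    """Sum (px, py) over all terms and return (px, py, magnitude)."""
    px = sum(t[0] for t in terms.values())
    py = sum(t[1] for t in terms.values())
    return px, py, math.hypot(px, py)


# hypothetical per-term (px, py) values for one event
terms = {
    "muons": (-10.0, -20.0),
    "jets": (-90.0, 0.0),
    "soft": (-40.0, 0.0),
}
px, py, met = total_met(terms)
print(px, py, met)
```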