# Test PHYSLITE Combination

This notebook tests ideas for combining the calibrated values from PHYSLITE with the uncalibrated values from LLP1.

* This should be faster than running the calibrations on LLP1.
* This is an in-memory test, which likely won't work for the full dataset.

The datasets we'll use:

In [1]:
did_LLP1 = "mc23_13p6TeV:mc23_13p6TeV.802746.Py8EG_Zprime2EJs_Ld20_rho40_pi10_Zp2600_l1.deriv.DAOD_LLP1.e8531_s4159_r15530_p6463"
did_PHYSLITE = "mc23_13p6TeV:mc23_13p6TeV.802746.Py8EG_Zprime2EJs_Ld20_rho40_pi10_Zp2600_l1.deriv.DAOD_PHYSLITE.e8531_s4159_r15530_p6491"

## Imports

In [2]:
import awkward as ak
import numpy as np
from func_adl_servicex_xaodr25 import FuncADLQueryPHYSLITE
from servicex import Sample, ServiceXSpec, dataset, deliver
from servicex_analysis_utils import to_awk

from calratio_training_data import RunConfig, fetch_training_data

## Fetching the data

First, the PHYSLITE data. We have to build this query by hand.

In [3]:
# Define the base query
base_query = FuncADLQueryPHYSLITE()

# Query to fetch jets and event info (run and event numbers)
query = base_query.Select(
    lambda e: {
        "jets": e.Jets().Where(lambda j: j.pt() >= 40 and abs(j.eta()) < 2.5),
        "event_info": e.EventInfo("EventInfo"),
    }
).Select(
    lambda e: {
        "jet_pt": e.jets.Select(lambda jet: jet.pt() / 1000.0),
        "jet_eta": e.jets.Select(lambda jet: jet.eta()),
        "jet_phi": e.jets.Select(lambda jet: jet.phi()),
        "run": e.event_info.runNumber(),
        "event": e.event_info.eventNumber(),
    }
)

# Fetch the data
data = to_awk(
    deliver(
        ServiceXSpec(
            Sample=[
                Sample(
                    Name="did_PHYSLITE",
                    Dataset=dataset.Rucio(did_PHYSLITE),
                    Query=query,
                )
            ]
        ),
    )
)["did_PHYSLITE"]

# Next, reformat it so it is per-jet, the way our training data is
data_PHYSLITE = ak.values_astype(
    ak.zip(
        {
            "pt": ak.flatten(data["jet_pt"]),
            "eta": ak.flatten(data["jet_eta"]),
            "phi": ak.flatten(data["jet_phi"]),
            "runNumber": ak.flatten(
                ak.broadcast_arrays(data["run"], data["jet_pt"])[0], axis=1
            ),
            "eventNumber": ak.flatten(
                ak.broadcast_arrays(data["event"], data["jet_pt"])[0], axis=1
            ),
        },
        with_name="Momentum3D",
    ),
    np.float32,
)

data_PHYSLITE.type.show()

411807 * Momentum3D[
    pt: float32,
    eta: float32,
    phi: float32,
    runNumber: float32,
    eventNumber: float32
]
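
One caveat about the cast above, shown as a small standalone check rather than a change to the notebook: `ak.values_astype(..., np.float32)` is applied to the whole record, so `runNumber` and `eventNumber` become `float32` as well (see the printed type). A `float32` has a 24-bit significand, so not every integer above 2**24 = 16,777,216 is representable, and distinct event numbers can collapse onto the same key. The sketch below uses only the standard library's `struct` module to round-trip values through 32-bit floats; the helper name `as_float32` and the sample event numbers are made up for illustration.

```python
import struct


def as_float32(value: int) -> float:
    """Round-trip an integer through an IEEE 754 single-precision float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical event numbers, chosen only to sit below and at 2**24.
small_a, small_b = 1_000_001, 1_000_002
large_a, large_b = 2**24, 2**24 + 1

# Below 2**24 neighbouring integers stay distinct after the cast.
small_distinct = as_float32(small_a) != as_float32(small_b)
# At 2**24 the two neighbouring integers round to the same float32.
large_distinct = as_float32(large_a) != as_float32(large_b)

print(small_distinct)  # True
print(large_distinct)  # False
```

If exact matching on run and event number is needed, one option would be to cast only the kinematic fields (`pt`, `eta`, `phi`) and leave the identifiers as integers.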


Next, the training data from LLP1.

* For this test we had to turn off the jet cleaning tool, because this version of LLP1 does not contain the jet cleaning data. This was done by modifying the source code by hand (and hopefully not checking it in!).
    * In `training_query.py`, comment out the call to `jet_clean_llp` in the `good_training_jet` function.

In [4]:
data_LLP1 = fetch_training_data(did_LLP1, RunConfig(run_locally=False, ignore_cache=False))
data_LLP1.type.show()

183797 * Momentum3D[
    runNumber: uint32,
    eventNumber: uint64,
    pt: float32,
    eta: float32,
    phi: float32,
    tracks: var * Momentum3D[
        eta: float32,
        phi: float32,
        pt: float32,
        vertex_nParticles: float32,
        d0: float32,
        z0: float32,
        chiSquared: float32,
        PixelShared: float32,
        SCTShared: float32,
        PixelHoles: float32,
        SCTHoles: float32,
        PixelHits: float32,
        SCTHits: float32
    ],
    clusters: var * Momentum3D[
        eta: float32,
        phi: float32,
        pt: float32,
        l1hcal: float32,
        l2hcal: float32,
        l3hcal: float32,
        l4hcal: float32,
        l1ecal: float32,
        l2ecal: float32,
        l3ecal: float32,
        l4ecal: float32,
        time: float32
    ],
    msegs: var * {
        etaPos: float32,
        phiPos: float32,
        etaDir: float32,
        phiDir: float32,
        t0: float32,
        chiSquared: float32
    }
]


## Combining

This combination is done in memory to test out the basic mechanism.
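
As a rough picture of what an in-memory combination could look like, here is a plain-Python sketch on toy records. It is not the method used in this notebook: the matching criterion (nearest jet in ΔR within the same run and event), the `max_dr=0.2` threshold, the dictionary field names, and the function names `delta_r` and `match_jets` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def match_jets(calibrated, uncalibrated, max_dr=0.2):
    """Pair each uncalibrated jet with the nearest calibrated jet in the same event."""
    # Index the calibrated jets by (run, event) so each lookup is a dict access.
    by_event = defaultdict(list)
    for jet in calibrated:
        by_event[(jet["run"], jet["event"])].append(jet)

    matches = []
    for jet in uncalibrated:
        candidates = by_event.get((jet["run"], jet["event"]), [])
        if not candidates:
            continue  # event missing from the calibrated sample
        best = min(
            candidates,
            key=lambda c: delta_r(jet["eta"], jet["phi"], c["eta"], c["phi"]),
        )
        if delta_r(jet["eta"], jet["phi"], best["eta"], best["phi"]) < max_dr:
            matches.append((jet, best))
    return matches


# Toy jets (made-up values). The first uncalibrated jet sits across the
# phi = +/-pi boundary from its calibrated partner.
calibrated = [
    {"run": 1, "event": 10, "pt": 52.0, "eta": 0.50, "phi": 3.10},
    {"run": 1, "event": 10, "pt": 45.0, "eta": -1.20, "phi": 0.30},
    {"run": 1, "event": 11, "pt": 60.0, "eta": 0.00, "phi": 1.00},
]
uncalibrated = [
    {"run": 1, "event": 10, "pt": 48.0, "eta": 0.52, "phi": -3.12},
    {"run": 1, "event": 12, "pt": 70.0, "eta": 0.00, "phi": 1.00},
]

matches = match_jets(calibrated, uncalibrated)
```

The dictionary lookup keeps the cost roughly linear in the number of jets, but it holds one whole sample in memory, which is the same limitation noted at the top of this notebook.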

In [5]:
def group_by_event(d):
    # Build a flat (run, evt) key record for every jet.
    key = ak.zip({"run": d.runNumber, "evt": d.eventNumber}, depth_limit=1)

    # Sort by run number, then split into one sublist per run.
    run_ordered = ak.argsort(key.run, stable=True)
    run_runs = ak.run_lengths(key[run_ordered].run)
    key_by_event = ak.unflatten(key[run_ordered], run_runs)

    # Within each run, sort by event number and split into one sublist per event.
    event_ordered = ak.argsort(key_by_event.evt, stable=True, axis=-1)
    event_runs = ak.run_lengths(key_by_event[event_ordered].evt)
    group_event = ak.unflatten(key_by_event[event_ordered], ak.flatten(event_runs), axis=-1)

    return group_event

group_event_PHYSLITE = group_by_event(data_PHYSLITE)
group_event_LLP1 = group_by_event(data_LLP1)

TypeError: Encountered a scalar (int), but scalar conversion/promotion is disabled
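
Independent of the `TypeError` above, it can help to have a slow reference for the nested layout that `group_by_event` appears to be aiming for: runs, then events within a run, then the entries belonging to each event. The sketch below builds that layout in plain Python on a handful of made-up `(run, event)` pairs. The name `group_by_event_reference` and the toy keys are hypothetical, and this is only meant for checking an awkward implementation on small inputs.

```python
from itertools import groupby


def group_by_event_reference(keys):
    """Group (run, event) pairs by run, then by event within each run."""
    ordered = sorted(keys)  # tuples sort by run first, then by event
    grouped = []
    for _, run_items in groupby(ordered, key=lambda k: k[0]):
        # The inner groupby must be consumed before the outer loop advances.
        events = [
            list(evt_items)
            for _, evt_items in groupby(run_items, key=lambda k: k[1])
        ]
        grouped.append(events)
    return grouped


# Toy keys (made-up values): two runs, with repeated events standing in
# for several jets from the same event.
toy_keys = [(1, 11), (1, 10), (2, 5), (1, 10), (2, 5), (2, 7)]
grouped = group_by_event_reference(toy_keys)

for run_group in grouped:
    print(run_group)
```

Comparing the per-event counts from this reference against the lengths of the innermost lists of the awkward result is a quick way to confirm that the sort and `run_lengths` steps line up.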

In [71]:
group_event_LLP1[:].run[0][0]

In [56]:
ak.max(group_event_LLP1[0].evt, keepdims=True, axis=1)