# Exploring The Data

This notebook looks at the data to see how to access enough columns to make the test relevant.

In [1]:
from func_adl_servicex_xaodr21 import atlas_release
# TODO: Update to use R22/23 or whatever.
from func_adl_servicex_xaodr21 import SXDSAtlasxAODR21

from hist.dask import Hist
import dask_awkward as dak

print(f'Using release {atlas_release}')

Using release 21.2.231


Set up the dataset we will use for testing.

In [2]:
ttbar_all_rucio_dataset_name = "mc23_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.deriv.DAOD_PHYSLITE.e8514_s4162_r14622_p6026"
ttbar_all = f"rucio://{ttbar_all_rucio_dataset_name}?files=1"
ds = SXDSAtlasxAODR21(ttbar_all, backend='atlasr22')
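The `?files=1` suffix on the dataset URI asks for only a single file from the dataset, which keeps test runs short. A minimal sketch of building such a URI, assuming a hypothetical helper `rucio_uri` (not part of ServiceX or func_adl) and a placeholder dataset name:

```python
from typing import Optional


def rucio_uri(dataset_name: str, n_files: Optional[int] = None) -> str:
    """Build a rucio:// dataset URI, optionally capped at n_files files.

    Hypothetical helper for illustration; not part of ServiceX or func_adl.
    """
    uri = f"rucio://{dataset_name}"
    if n_files is not None:
        if n_files < 1:
            raise ValueError("n_files must be at least 1")
        uri += f"?files={n_files}"
    return uri


# Placeholder dataset name, not a real sample
print(rucio_uri("mc23_13p6TeV.example.DAOD_PHYSLITE", n_files=1))
```

Leaving `n_files` as `None` in this sketch returns the plain URI with no file limit.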

## ServiceX Query

Do an event-level query, so the lists of jets, MET, etc. are all at the top level.

In [3]:
# TODO: The EventInfo argument should default correctly (that may just be a matter of using func_adl xaod r22)
# TODO: dataclass should be supported so as not to lose type-following!
query = (ds
         .Select(lambda e: {
             'evt': e.EventInfo("EventInfo"),
             'jet': e.Jets("AnalysisJets", calibrate=False)
             })
         .Select(lambda ei: {
             'event_number': ei.evt.eventNumber(),
             'run_number': ei.evt.runNumber(),
             'jet_pt': ei.jet.Select(lambda j: j.pt()/1000)
         })
)
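To make the shape of an event-level query concrete, here is a plain-Python sketch with no ServiceX involved. The event records and their values are invented for illustration. The two steps mirror the two `Select` calls in the query: first pick the objects, then extract per-event scalars and a per-event list of jet $p_T$ converted from MeV to GeV.

```python
# Made-up events: each has event info and a list of jet pT values in MeV.
raw_events = [
    {"event_number": 101, "run_number": 1, "jets_pt_mev": [52000.0, 31000.0]},
    {"event_number": 102, "run_number": 1, "jets_pt_mev": []},
    {"event_number": 103, "run_number": 1, "jets_pt_mev": [120000.0]},
]

# Step 1: select the objects of interest for each event.
selected = [{"evt": e, "jet": e["jets_pt_mev"]} for e in raw_events]

# Step 2: build one record per event; jet_pt stays a list nested in the event.
events = [
    {
        "event_number": s["evt"]["event_number"],
        "run_number": s["evt"]["run_number"],
        "jet_pt": [pt / 1000 for pt in s["jet"]],  # MeV -> GeV
    }
    for s in selected
]

for ev in events:
    print(ev["event_number"], ev["jet_pt"])
```

The result has one row per event with a variable-length jet list, which is why the jet $p_T$ column needs flattening before it can be histogrammed.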



We do not have tight integration with `dask_awkward` until some extra code is working, so let's grab all the data.

In [4]:
# Start by grabbing the data as an awkward array
# TODO: Files should remain in the S3 cache and be read directly from there
data = query.AsAwkwardArray().value()

## Plots

Next, let's make plots of everything.

In [5]:
# Event-number histogram: 20 regular bins from 0 to 1e8, integer storage
h = (
    Hist.new.Reg(20, 0, 100000000, name="x", label="x-axis")
    .Int64()
)
r1 = h.fill(data.event_number)

In [6]:
# Jet pT histogram (GeV): 20 regular bins from 0 to 200, integer storage
h = (
    Hist.new.Reg(20, 0, 200, name="x", label="Jet $p_T$")
    .Int64()
)
r2 = h.fill(dak.flatten(data.jet_pt))
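For reference, the flatten-then-fill step can be sketched in plain Python. This is not how `hist` or `dask_awkward` are implemented; it only illustrates what a regular axis with 20 bins on [0, 200) does with a jagged list of jet $p_T$ values. The input values are invented, the helper `fill_regular` is hypothetical, and out-of-range entries are simply dropped here rather than kept in overflow bins.

```python
from itertools import chain


def fill_regular(values, n_bins, low, high):
    """Count values into n_bins equal-width bins on [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v < high:
            # Clamp to guard against floating-point rounding at the upper edge
            idx = min(int((v - low) // width), n_bins - 1)
            counts[idx] += 1
    return counts


# Invented jagged input: one list of jet pT values (GeV) per event
jet_pt = [[52.0, 31.0], [], [120.0, 9.5, 250.0]]

# Flatten one level: the event structure is discarded, only the jets remain
flat = list(chain.from_iterable(jet_pt))

counts = fill_regular(flat, n_bins=20, low=0, high=200)
print(counts)
```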

In [7]:
r1.compute()

In [8]:
r2.compute()