<h1 style="font-size:4em;">
    ServiceX <span style="color:blue;">intro</span> for ATLAS
</h1>
<br>
<br>
<br>
<br>
<br>
<br>

<img src="img/logo_ut.png" width="220"  style="float:right" alt="UTAustin">

### KyungEon Choi (UT Austin)

### Python/Columnar PHYSLITE Analysis Meeting | Oct 24, 2023



<br>
<br>
<br>
<br>
<br>

<h2>ServiceX</h2>
<img src="img/logo_servicex.png" width="120" height="100"  style="float:right" alt="ServiceX">
<img src="img/logo_irishep.png" width="150" height="100"  style="float:right" alt="iris-hep">
<br>
<font size="3">
    
- ServiceX is a software R&D project of IRIS-HEP that investigates new computational models for the HL-LHC era
- ServiceX is a <b>data delivery service! and more...</b> 
    
<!--     A service to <b><span style="color:red;">quickly</span></b> access a "fraction" of large data on the grid  -->

<br>
<br>    
<br>
<br>
Rucio is a popular data delivery service that analyzers use to transfer or download PHYSLITE files to an Analysis Facility or a local machine. The final destination receives an exact copy.
<img src="img/rucio1.png" width="700"  style="float:center" alt="rucio">

More often, analyzers produce ROOT Ntuples (or RNTuples) from PHYS/PHYSLITE using analysis frameworks such as TopCPToolKit. The Ntuple holds derived and augmented information, but the final destination still receives an exact copy of it.
<img src="img/rucio3.png" width="700"  style="float:center" alt="rucio">
    
<br>
<br>

ServiceX is deployed on a Kubernetes cluster and sits between the data center and local storage. It delivers data, but transforms it on the way so that only the necessary information is delivered!
<img src="img/servicex1.png" width="700"  style="float:center" alt="servicex">
    
    
</font>
<br>
<br>
<br>
<br>

<h2> ServiceX under the hood </h2>
<font size="3">
<br>
A schematic of ServiceX includes key components (microservices).
<img src="img/servicex_detail_rev2.png" width="700"  style="float:center" alt="servicex2">
    
<b>Dataset Finder</b>
- Looks up the input file list for the transformers 
- Supports Rucio datasets and XRootD paths
    
<b>Transformer</b>
- The core of ServiceX
- Extracts and selects information from the input files, and can also augment it using the available information
- The code injected into the transformer is generated from the user query
- Results can be streamed into different file formats
- Transformer pods are spawned on demand and scaled with Horizontal Pod Autoscaling (HPA)
- An XCache layer in front of the transformers allows much faster access to popular datasets
- Co-locating ServiceX with the data center provides high network bandwidth

<b>Object store</b>
- Results from each transformer are written to the ServiceX object store
- Files are delivered asynchronously to local storage, 
- or URIs (S3 paths) are delivered so that the result files can be consumed later
    
<b>ServiceX App</b>
- Accepts requests via a REST interface; users communicate with it through the ServiceX client library
- Web dashboard for transformation status, access tokens, etc.
    
</font>
<br>
<br>
<br>
<br>

<h2> ServiceX for PHYSLITE (Today)</h2>
<br>
<font size="3">

- <b>xAOD transformer running <code>EventLoop</code></b>
    - R22 C++ code generated from FuncADL query
    - Capable of applying systematics
- <b>Uproot transformer</b>
    - Python code similar to Nicolai's coffea PHYSLITE schema*

</font>    
<p style="text-align:right"><span>&#42;</span>Python code is not for production</p>
<br>
<br>
<br>

<h2>Talk to ServiceX</h2>
<br>
<font size="3">

- <b>ServiceX client library</b>
    - Python package - <code>pip install servicex</code>
    - Provides many features: submitting ServiceX requests, downloading/streaming outputs, handling access tokens, a <b>local cache</b>, etc.
    - New version (v3) is about to be released 
    - <code>servicex==2.7.0</code> in the following demo

    <br>
    
- <b>ServiceX DataBinder</b>
    - Python package - <code>pip install servicex-databinder</code>
    - Wrapper package around <code>servicex</code>
    - Provides easy manipulation of ServiceX request<b>s</b> using a single configuration file
    - <code>servicex-databinder==0.5.0</code> in the following demo
    
</font>    
<!-- <br> -->


<b>Input dataset</b>

- <code>mc21_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.deriv.DAOD_PHYSLITE.e8453_s3873_r13829_p5855</code>
- 583 files
- 892 GB

<h3>xAOD transformer + ServiceX client library</h3>

- One more package is needed to handle PHYSLITE: <code>func_adl_servicex_xaodr22</code>
- Running on 20 files
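
As a rough guide to how much input the 20-file demo touches, the dataset totals above can be scaled down. This is only a back-of-the-envelope sketch: it assumes all files are about the same size, which the slides do not state, and the variable names are illustrative.

```python
# Dataset totals quoted above.
total_size_gb = 892
total_files = 583
files_requested = 20  # the demo restricts the request to 20 files

# Assumption: every file is roughly the same size.
avg_file_gb = total_size_gb / total_files
estimated_input_gb = avg_file_gb * files_requested

print(f"average file size: {avg_file_gb:.2f} GB")
print(f"estimated input for {files_requested} files: {estimated_input_gb:.1f} GB")
```

Under that assumption the demo reads roughly 30 GB of PHYSLITE input, while only the selected columns are delivered back.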

In [None]:
from servicex import ignore_cache
from func_adl_servicex_xaodr22 import calib_tools, SXDSAtlasxAODR22

In [None]:
ds = SXDSAtlasxAODR22(
    'mc21_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.deriv.DAOD_PHYSLITE.e8453_s3873_r13829_p5855?files=20', 
    backend="servicex-testing1"
)
ds = calib_tools.query_update(    
    ds, calib_config=calib_tools.default_config("PHYSLITE")
)

In [None]:
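# Per event, keep the run and event numbers and the jets with
# pt > 25 GeV (pt is stored in MeV) and |eta| < 2.5.
good_jets = ds.Select(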
    lambda e: {
        "run": e.EventInfo("EventInfo").runNumber(),
        "event": e.EventInfo("EventInfo").eventNumber(),
        "good_jets": e.Jets().Where(lambda j: (j.pt() / 1000 > 25.0) and (abs(j.eta()) < 2.5)),
    }
)

In [None]:
# Convert the pt of the selected jets to GeV and request the result as an Awkward Array.
jet_pt = good_jets.Select(lambda e: {
    "run": e.run,
    "event": e.event,
    "pt": e.good_jets.Select(lambda j: j.pt() / 1000.0),
}).AsAwkwardArray()

In [None]:
# Skip the local cache so that the request is actually sent to ServiceX.
with ignore_cache():
    jet_data = jet_pt.value()

In [None]:
print(jet_data.fields)
print(jet_data.pt)
print(len(jet_data))
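
To make the shape of the result easier to picture, the selection in the query above can be mimicked in plain Python on a few toy events. This is a sketch of the logic only, not of how the transformer runs it: the toy values and the helper `select_good_jet_pt` are made up for illustration.

```python
# Toy events: each jet is (pt in MeV, eta). The values are invented for illustration.
toy_events = [
    {"run": 1, "event": 10, "jets": [(30_000.0, 0.5), (20_000.0, 1.0), (40_000.0, 3.0)]},
    {"run": 1, "event": 11, "jets": []},
    {"run": 1, "event": 12, "jets": [(26_000.0, -2.4), (100_000.0, 0.0)]},
]


def select_good_jet_pt(event):
    """Return the pt in GeV of jets with pt > 25 GeV and |eta| < 2.5."""
    return [
        pt / 1000.0
        for pt, eta in event["jets"]
        if pt / 1000.0 > 25.0 and abs(eta) < 2.5
    ]


# One record per event, with a variable-length list of jet pt values.
jet_pt_toy = [
    {"run": e["run"], "event": e["event"], "pt": select_good_jet_pt(e)}
    for e in toy_events
]

for row in jet_pt_toy:
    print(row)
```

The cuts act on jets, not on events, so an event with no passing jets still appears in the output with an empty `pt` list.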

<img src="img/servicex_demo_rev1.png" width="800"  style="float:center" alt="servicex3">

<h3>Uproot transformer + ServiceX DataBinder </h3>

In [None]:
%%writefile config_physlite.yaml
General:
  ServiceXName: servicex-testing1
  Transformer: python
  OutputFormat: root
  IgnoreServiceXCache: True

Sample: 
  - Name: ttbar_PHYSLITE
    RucioDID: mc21_13p6TeV:mc21_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.deriv.DAOD_PHYSLITE.e8453_s3873_r13829_p5855
    Function: DEF_function_physlite
  - Name: ttH
    Transformer: uproot
    RucioDID: user.kchoi:user.kchoi.fcnc_tHq_ML.ttH.v11
    Tree: nominal
    Filter: met_met > 100e3
    Columns: el_pt, el_eta, el_phi, el_e, el_charge

Definition:
  DEF_function_physlite: |
    def run_query(input_filenames=None):
      import uproot
      import awkward as ak
      schema = {
        "Electrons": ["pt", "eta", "phi", "m"],
        "Muons": ["pt", "eta", "phi"],
        "Jets": ["pt", "eta", "phi", "m"],
        "BTagging_AntiKt4EMPFlow": ["DL1dv01_pb"]
      }
      with uproot.open(f"{input_filenames}:CollectionTree") as o:
        evts = {}
        for objname, fields in schema.items():
          base = objname
          if objname in ["Electrons", "Muons", "Jets"]:
            base = "Analysis" + objname
          arrays = o.arrays(fields, aliases={field: f"{base}AuxDyn.{field}" for field in fields})
          arrays = ak.zip(dict(zip(arrays.fields, ak.unzip(arrays))))
          evts[objname] = arrays
        events = ak.zip(evts, depth_limit=1)
        events = events[ak.all(events.Electrons.pt > 100e3, axis=1)]
      return {"CollectionTree": events}


In [None]:
from servicex_databinder import DataBinder

In [None]:
sx_db = DataBinder('config_physlite.yaml')
out = sx_db.deliver()
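
Two details of `DEF_function_physlite` in the configuration above are easy to miss: how the branch names are built, and what the final event filter does with events that have no electrons. The standard-library sketch below mirrors both on toy inputs. The helper `branch_aliases` and the toy pt values are hypothetical and only illustrate the logic.

```python
def branch_aliases(objname, fields):
    """Mirror the alias construction used in DEF_function_physlite."""
    base = objname
    if objname in ["Electrons", "Muons", "Jets"]:
        base = "Analysis" + objname
    return {field: f"{base}AuxDyn.{field}" for field in fields}


electron_aliases = branch_aliases("Electrons", ["pt", "eta"])
btag_aliases = branch_aliases("BTagging_AntiKt4EMPFlow", ["DL1dv01_pb"])
print(electron_aliases)
print(btag_aliases)

# Toy electron pt values in MeV, one list per event (invented for illustration).
toy_electron_pt = [[150e3, 120e3], [150e3, 50e3], []]

# Plain-Python analogue of ak.all(events.Electrons.pt > 100e3, axis=1):
# an event is kept only if every electron passes the cut.
keep = [all(pt > 100e3 for pt in event) for event in toy_electron_pt]
print(keep)
```

Because "all" over an empty list is true, events without any electrons also pass this filter. If at least one electron is required, an extra condition on the electron multiplicity would be needed.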

<img src="img/servicex_demo2.png" width="800"  style="float:center" alt="servicex4">

<h1> Outlook </h1>

<img src="img/servicex_future1.png" width="800"  style="float:center" alt="servicex5">

<img src="img/servicex_future2.png" width="800"  style="float:center" alt="servicex6">