# Generate Qastle Samples

In hacking this together, a small class was built to generate `qastle`. It isn't obvious that this class should remain in this workflow, but it got things going quickly and it is clearly useful. As a work item, it should be moved into the `func_adl` packages (see [issue](https://github.com/iris-hep/func_adl_servicex/issues/9)).

But since that isn't there yet, let's build some samples using this class here.

In [1]:
from sx_multi import FuncAdlQastle

## Getting the electron $p_T$'s from an xAOD

This is tuned to fetch all the electron $p_T$'s from an xAOD. It returns them as a jagged array, with each entry containing all the electron $p_T$'s for that event. The style of the query is matched to the `xAOD` data model: while the `func_adl` concepts will work anywhere, things like `e.pt()` or `e.Electrons("Electrons")` mean specific things in the ATLAS `xAOD` data model.

In [2]:
ds = FuncAdlQastle()
ele_pt = ds \
        .Select(lambda e: e.Electrons("Electrons")) \
        .Select(lambda ls: ls.Select(lambda e: e.pt()/1000.0)) \
        .AsROOTTTree('data.root', 'mytree', 'ElePt')

In [3]:
print(ele_pt.value())

(call ResultTTree (call Select (call Select (call EventDataset 'FuncAdlQastle') (lambda (list e) (call (attr e 'Electrons') 'Electrons'))) (lambda (list ls) (call (attr ls 'Select') (lambda (list e) (/ (call (attr e 'pt')) 1000.0))))) 'ElePt' 'mytree' 'data.root')
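
To make the printed text easier to read, here is a minimal sketch of how a Python lambda can be turned into text of this shape using the standard `ast` module. The helpers `to_text` and `lambda_text` are hypothetical names invented for this illustration; they are not part of `sx_multi`, `func_adl`, or `qastle`, and they only cover the handful of node types that appear in the queries in this notebook.

```python
import ast

# Hypothetical toy renderer (NOT the real qastle implementation): it only
# handles the expression types used by the queries in this notebook.
_OPS = {ast.Div: "/", ast.Gt: ">", ast.GtE: ">="}


def to_text(node: ast.AST) -> str:
    """Render a small subset of Python expressions as s-expression text."""
    if isinstance(node, ast.Lambda):
        args = " ".join(a.arg for a in node.args.args)
        return f"(lambda (list {args}) {to_text(node.body)})"
    if isinstance(node, ast.Call):
        parts = [to_text(node.func)] + [to_text(a) for a in node.args]
        return f"(call {' '.join(parts)})"
    if isinstance(node, ast.Attribute):
        return f"(attr {to_text(node.value)} {node.attr!r})"
    if isinstance(node, ast.BinOp):
        return f"({_OPS[type(node.op)]} {to_text(node.left)} {to_text(node.right)})"
    if isinstance(node, ast.Compare):
        # Only single comparisons such as `a > b` are handled here.
        op = _OPS[type(node.ops[0])]
        return f"({op} {to_text(node.left)} {to_text(node.comparators[0])})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)


def lambda_text(source: str) -> str:
    """Parse a lambda given as source text and render it."""
    return to_text(ast.parse(source, mode="eval").body)


print(lambda_text("lambda ls: ls.Select(lambda e: e.pt()/1000.0)"))
# (lambda (list ls) (call (attr ls 'Select') (lambda (list e) (/ (call (attr e 'pt')) 1000.0))))
```

The printed line matches the second `lambda` fragment in the output above, which shows that the text is essentially the Python expression tree written out in prefix form.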


Let's do the same, but only for electrons with a $p_T$ greater than 30 GeV (the electron $p_T$ is stored in MeV in ATLAS, hence the division by 1000).

In [4]:
ds = FuncAdlQastle()
ele_pt = ds \
        .Select(lambda e: e.Electrons("Electrons")) \
        .Select(lambda electrons: electrons.Where(lambda e: e.pt()/1000.0 > 30)) \
        .Select(lambda ls: ls.Select(lambda e: e.pt()/1000.0)) \
        .AsROOTTTree('data.root', 'mytree', 'ElePt')

In [5]:
print(ele_pt.value())

(call ResultTTree (call Select (call Select (call Select (call EventDataset 'FuncAdlQastle') (lambda (list e) (call (attr e 'Electrons') 'Electrons'))) (lambda (list electrons) (call (attr electrons 'Where') (lambda (list e) (> (/ (call (attr e 'pt')) 1000.0) 30))))) (lambda (list ls) (call (attr ls 'Select') (lambda (list e) (/ (call (attr e 'pt')) 1000.0))))) 'ElePt' 'mytree' 'data.root')


And only events where there are at least 3 of these electrons...

In [6]:
ds = FuncAdlQastle()
ele_pt = ds \
        .Select(lambda e: e.Electrons("Electrons")) \
        .Select(lambda electrons: electrons.Where(lambda e: e.pt()/1000.0 > 30)) \
        .Where(lambda electrons: electrons.count() >= 3) \
        .Select(lambda ls: ls.Select(lambda e: e.pt()/1000.0)) \
        .AsROOTTTree('data.root', 'mytree', 'ElePt')

In [7]:
print(ele_pt.value())

(call ResultTTree (call Select (call Where (call Select (call Select (call EventDataset 'FuncAdlQastle') (lambda (list e) (call (attr e 'Electrons') 'Electrons'))) (lambda (list electrons) (call (attr electrons 'Where') (lambda (list e) (> (/ (call (attr e 'pt')) 1000.0) 30))))) (lambda (list electrons) (>= (call (attr electrons 'count')) 3))) (lambda (list ls) (call (attr ls 'Select') (lambda (list e) (/ (call (attr e 'pt')) 1000.0))))) 'ElePt' 'mytree' 'data.root')
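
The difference between the inner `Where` (which drops electrons) and the outer `Where` (which drops whole events) can be sketched in plain Python. This is only an illustration of what the query is meant to compute, not how the backend runs it, and the $p_T$ values are invented toy data.

```python
# Invented toy data: three events, each a list of electron pT values in MeV.
events_mev = [
    [45_000.0, 38_000.0, 31_000.0, 12_000.0],
    [52_000.0, 8_000.0],
    [33_000.0, 32_000.0, 31_500.0],
]

# Object-level cut (inner Where): drop electrons at or below 30 GeV,
# but keep every event, even if it ends up with few electrons.
selected = [[pt for pt in ev if pt / 1000.0 > 30] for ev in events_mev]

# Event-level cut (outer Where): keep only events with at least
# 3 surviving electrons.
kept = [ev for ev in selected if len(ev) >= 3]

# Final Select: convert the surviving electrons to GeV.
ele_pt_gev = [[pt / 1000.0 for pt in ev] for ev in kept]

print(ele_pt_gev)
# [[45.0, 38.0, 31.0], [33.0, 32.0, 31.5]]
```

The second toy event has only one electron above threshold, so it is removed entirely, while the first event is kept with its 12 GeV electron dropped.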


## Getting multiple items

If we want to get multiple items in this same format, each as its own array (e.g. a separate array for eta, phi, pt), we return a tuple with one `Select` per quantity and pass a matching tuple of column names.

In [8]:
ds = FuncAdlQastle()
leptons_per_event_query = ds \
        .Select(lambda e: e.Electrons("Electrons")) \
        .Select(lambda ls: (ls.Select(lambda e: e.pt()/1000.0), ls.Select(lambda e: e.eta()), ls.Select(lambda e: e.phi()), ls.Select(lambda e: e.m()/1000.0))) \
        .AsROOTTTree('data.root', 'mytree', ('ElePt', 'EleEta', 'ElePhi', 'EleM'))

In [9]:
print(leptons_per_event_query.value())

(call ResultTTree (call Select (call Select (call EventDataset 'FuncAdlQastle') (lambda (list e) (call (attr e 'Electrons') 'Electrons'))) (lambda (list ls) (list (call (attr ls 'Select') (lambda (list e) (/ (call (attr e 'pt')) 1000.0))) (call (attr ls 'Select') (lambda (list e) (call (attr e 'eta')))) (call (attr ls 'Select') (lambda (list e) (call (attr e 'phi')))) (call (attr ls 'Select') (lambda (list e) (/ (call (attr e 'm')) 1000.0)))))) (list 'ElePt' 'EleEta' 'ElePhi' 'EleM') 'mytree' 'data.root')


Let's say you want a single entry per electron, instead of per event, for example to feed a normal NN training. The result of the query below can be easily loaded into pandas: it isn't jagged at all.

In [10]:
ds = FuncAdlQastle()
lepton_table = ds \
        .SelectMany(lambda evt: evt.Electrons("Electrons")) \
        .Select(lambda e: (e.pt()/1000.0, e.eta(), e.phi(), e.m()/1000.0)) \
        .AsROOTTTree('data.root', 'mytree', ('ElePt', 'EleEta', 'ElePhi', 'EleM'))

In [11]:
print(lepton_table.value())

(call ResultTTree (call Select (call SelectMany (call EventDataset 'FuncAdlQastle') (lambda (list evt) (call (attr evt 'Electrons') 'Electrons'))) (lambda (list e) (list (/ (call (attr e 'pt')) 1000.0) (call (attr e 'eta')) (call (attr e 'phi')) (/ (call (attr e 'm')) 1000.0)))) (list 'ElePt' 'EleEta' 'ElePhi' 'EleM') 'mytree' 'data.root')
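
As a rough picture of what `SelectMany` does here, the sketch below flattens a list of per-event electron lists into one row per electron using only the standard library. The event contents are invented toy values, and this only mirrors the intended result, not the backend's implementation.

```python
from itertools import chain

# Invented toy events: each electron is a (pt_MeV, eta, phi, m_MeV) tuple.
events = [
    [(45_000.0, 0.5, 1.0, 0.511), (31_000.0, -1.2, 2.0, 0.511)],
    [],  # an event with no electrons contributes no rows
    [(52_000.0, 2.1, -0.3, 0.511)],
]

# SelectMany: concatenate the per-event electron lists into one sequence.
electrons = chain.from_iterable(events)

# Select: one flat row per electron, with pT and mass converted to GeV.
rows = [(pt / 1000.0, eta, phi, m / 1000.0) for pt, eta, phi, m in electrons]

print(len(rows))
# 3
```

Because every row has the same four fields, a list like `rows` maps directly onto a table with columns `ElePt`, `EleEta`, `ElePhi`, and `EleM`; the event boundaries are gone.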


## Working with columns from a flat ROOT tuple file

The backend for flat ROOT files is a little different. It expects its outputs to be dictionaries. Eventually, the two backends will converge - this is just some work that is left to be done by the `func_adl` team.

First, let's get a single branch out:

In [12]:
ds = FuncAdlQastle()
jet_pt = ds.Select(lambda e: {'JetPT': e['AnalysisJetsAuxDyn.pt']})

In [13]:
print(jet_pt.value())

(call Select (call EventDataset 'FuncAdlQastle') (lambda (list e) (dict (list 'JetPT') (list (subscript e 'AnalysisJetsAuxDyn.pt')))))


Note some differences with the above:

- There is no `AsROOTTTree`. Eventually the `xAOD` transformer will move to that model.
- The column names are specified as a dictionary. This will also be supported in the xAOD transformer.
- Column names have to be specified in quotes: the names here refer to the ATLAS PHYSLITE sample, and there are some issues we have to resolve before we can get rid of the quotes.

Several columns with no cuts are also going to be pretty easy:

In [14]:
ds = FuncAdlQastle()
jet_pt = ds.Select(lambda e: {
    'JetPT':  e['AnalysisJetsAuxDyn.pt'],
    'JetEta': e['AnalysisJetsAuxDyn.eta']
})

In [15]:
print(jet_pt.value())

(call Select (call EventDataset 'FuncAdlQastle') (lambda (list e) (dict (list 'JetPT' 'JetEta') (list (subscript e 'AnalysisJetsAuxDyn.pt') (subscript e 'AnalysisJetsAuxDyn.eta')))))
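
To see how the dictionary keys become the output column names, the same expression can be applied to a plain Python dict that stands in for one event. This is only an analogy with invented values; the real backend reads the branches from the file and does not use Python dicts this way.

```python
# A plain dict standing in for one event of a flat ROOT file.
# The values are invented for illustration.
event = {
    'AnalysisJetsAuxDyn.pt': [61_000.0, 27_500.0],
    'AnalysisJetsAuxDyn.eta': [0.4, -2.2],
}


def select(e):
    # Same expression as the lambda body in the query above.
    return {
        'JetPT':  e['AnalysisJetsAuxDyn.pt'],
        'JetEta': e['AnalysisJetsAuxDyn.eta'],
    }


result = select(event)
print(result)
# {'JetPT': [61000.0, 27500.0], 'JetEta': [0.4, -2.2]}
```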
