# Using FuncX

This notebook will fetch data from ServiceX and process it via coffea on `funcx` processors.

In [1]:
from make_it_sync import make_sync
import matplotlib.pyplot as plt
from coffea import hist, processor

from sx_multi import FuncAdlQastle, sx_event_stream, process_coffea_funcx

# Fetching the data

We want to pull back only electrons with $p_T > 30$ GeV and $|\eta| < 2.5$. And since this is a super-simple algorithm, we will limit it to events that have exactly two such electrons.

In [2]:
ds = FuncAdlQastle()
leptons_per_event_query = ds \
        .Select(lambda e: e.Electrons("Electrons")) \
        .Select(lambda eles: eles.Where(lambda e: e.pt()/1000.0 > 30.0)) \
        .Select(lambda eles: eles.Where(lambda e: abs(e.eta()) < 2.5)) \
        .Where(lambda eles: len(eles) == 2) \
        .Select(lambda ls: (ls.Select(lambda e: e.pt()/1000.0), ls.Select(lambda e: e.eta()), ls.Select(lambda e: e.phi()), ls.Select(lambda e: e.m()/1000.0))) \
        .AsROOTTTree('data.root', 'mytree', ('ElePt', 'EleEta', 'ElePhi', 'EleM'))
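
To make the order of the cuts concrete, here is a minimal plain-Python sketch of the same selection on toy data. The event list and the `select_electrons` helper are hypothetical and only stand in for what ServiceX does remotely; the $p_T$ values are assumed to be stored in MeV, as the division by 1000 in the query suggests.

```python
# Toy stand-in for the query: each event is a list of electrons,
# each electron a dict with pt in MeV (hypothetical test data).
events = [
    [{"pt": 45_000.0, "eta": 0.3}, {"pt": 38_000.0, "eta": -1.2}],
    [{"pt": 52_000.0, "eta": 2.7}, {"pt": 31_000.0, "eta": 0.1}],
    [{"pt": 12_000.0, "eta": 0.5}, {"pt": 60_000.0, "eta": 1.0},
     {"pt": 33_000.0, "eta": -2.0}],
]


def select_electrons(electrons):
    """Keep electrons with pT > 30 GeV and |eta| < 2.5 (pt is in MeV)."""
    return [
        e for e in electrons
        if e["pt"] / 1000.0 > 30.0 and abs(e["eta"]) < 2.5
    ]


# Apply the per-electron cuts first, then require exactly two survivors.
selected = [select_electrons(evt) for evt in events]
dielectron_events = [evt for evt in selected if len(evt) == 2]

# Convert to GeV for the output column, mirroring the final Select.
ele_pt = [[e["pt"] / 1000.0 for e in evt] for evt in dielectron_events]
print(ele_pt)
```

Because the count is taken after the per-electron cuts, the third toy event is kept even though it starts with three electrons, while the second is dropped because only one of its electrons survives.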

And the dataset identifier we want to be scanning:

In [3]:
did = 'mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00'

# Define Coffea Process Function

This will process the data fetched above from ServiceX. It is given access to a single file, which it must open, and then it builds the invariant mass.

In [4]:
class ZMassProcessor(processor.ProcessorABC):
    def __init__(self):
        self._accumulator = processor.dict_accumulator({
            "sumw": processor.defaultdict_accumulator(float),
            "mass": hist.Hist(
                "Events",
                hist.Bin("mass", "$Z_{ee}$ [GeV]", 60, 60, 120),
            ),
        })

    @property
    def accumulator(self):
        return self._accumulator

    def process(self, events):
        output = self.accumulator.identity()
        import awkward1 as ak

        # Because we aren't using a schema, build the electron records by hand.
        electrons = ak.zip({
            "pt": events.ElePt,
            "eta": events.EleEta,
            "phi": events.ElePhi,
            "mass": events.EleM,
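            # The query above does not fetch the electron charge, so EleM is
            # reused here as a stand-in to fill the field.
            "charge": events.EleM,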
        }, with_name="PtEtaPhiMCandidate")

        # Form the invariant mass and fill the histogram.
        cut = (ak.num(electrons) == 2)
        diele = electrons[cut][:, 0] + electrons[cut][:, 1]

        output["sumw"]['bogus'] += len(events)
        output["mass"].fill(
            mass=diele.mass,
        )

        return output

    def postprocess(self, final_results):
        return final_results
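
For readers who want to see what adding the two electron candidates amounts to, here is a minimal sketch of the invariant-mass calculation done by hand with the standard library. The helper names `four_vector` and `invariant_mass` and the toy kinematics are assumptions for illustration, not part of coffea or of the processor above.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two four-vectors."""
    energy, px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Hypothetical back-to-back electrons in GeV, not taken from the dataset.
ele1 = four_vector(45.0, 0.0, 0.0, 0.000511)
ele2 = four_vector(45.0, 0.0, math.pi, 0.000511)
m_ee = invariant_mass(ele1, ele2)
print(f"m_ee = {m_ee:.3f} GeV")
```

Two 45 GeV electrons emitted back to back in the transverse plane give a pair mass of about 90 GeV, which falls inside the 60 to 120 GeV range of the histogram defined above.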

# Run through funcx

First, the data stream from ServiceX.

In [5]:
servicex_data = sx_event_stream(did, leptons_per_event_query)

Now we get the output stream from funcx, which delivers results as each piece of data finishes processing!

*Note*: The first time this is called in a session you'll be asked for a `funcx` auth token. Follow the web URL to get the token, and paste it back into the box.

In [6]:
accumulated_results = process_coffea_funcx(servicex_data, ZMassProcessor())
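
The result is consumed with `async for`, so it behaves like an asynchronous stream. As a rough illustration of that pattern, the sketch below uses a hypothetical toy generator, `toy_result_stream`, in place of the real funcx stream, and assumes each item is the accumulated result so far, as the variable name suggests.

```python
import asyncio


async def toy_result_stream(chunks):
    """Hypothetical stand-in for the funcx stream: yields running totals."""
    total = 0
    for n_events in chunks:
        await asyncio.sleep(0)  # pretend to wait for a remote worker
        total += n_events
        yield total


async def consume(stream):
    """Inspect every partial result and return the last (None if empty)."""
    latest = None
    async for partial in stream:
        print("events processed so far:", partial)
        latest = partial
    return latest


# In a notebook, `await consume(...)` would replace `asyncio.run(...)`.
final_total = asyncio.run(consume(toy_result_stream([100, 250, 50])))
```

Initialising `latest` before the loop means the consumer still returns cleanly when the stream yields nothing.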

Plot - attempt to update the plot as it comes in (but don't know how to do that!)

In [7]:
async def plot_stream(accumulator_stream):
  async for coffea_info in accumulator_stream:
    # Need to ask coffea folks how to animate this!
    hist.plot1d(coffea_info['mass'])
    plt.show()
  return coffea_info

await plot_stream(accumulated_results)

TypeError: module, class, method, function, traceback, frame, or code object was expected, got ZMassProcessor
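
The wording of this error matches what Python's standard `inspect` module raises when it is asked for the source of an object instance rather than a class or function. Whether that is what happens inside `process_coffea_funcx` is an assumption; the sketch below only reproduces the message with a hypothetical `Dummy` class.

```python
import inspect


class Dummy:
    """Hypothetical stand-in for a processor class."""


instance = Dummy()

try:
    # inspect accepts modules, classes, functions and similar objects,
    # but not an instance of a user-defined class.
    inspect.getsource(instance)
    message = None
except TypeError as err:
    message = str(err)

print(message)
```

If this is indeed the cause, the fix would belong on the side that looks up the source (using the class, for example `type(instance)`, rather than the instance).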

One of the cool things about the above is that the *data* never comes down to your machine over the wire, except for the histograms! The data streamed off ServiceX is sent directly from ServiceX to the funcx processor machines. In this demo, both happened to be located in the `river` cluster at Chicago. While everything is controlled from wherever this notebook is running, data only moves around inside the analysis facility!