# Dummy IO Layer

Let's implement a dummy IO layer to mock up what things might look like if ServiceX were the backend underneath.

* Have a calculation that looks at both `pt` and `eta` in the same expression (like a filter)
* Something that looks at `events` as the central object (e.g. dataset)
* Look at the jets in those events.
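Here is a plain-Python sketch of the kind of calculation the list above describes. It keeps events as the central object and applies a per-event jet filter that uses `pt` and `eta` in the same expression. The event layout, field names, and cut values are assumptions for illustration, not the real data format.

```python
# Hypothetical toy events: each event holds a list of jets with pt (GeV) and eta.
events = [
    {"jets": [{"pt": 45.0, "eta": 0.3}, {"pt": 12.0, "eta": 2.9}]},
    {"jets": [{"pt": 80.0, "eta": -1.1}, {"pt": 35.0, "eta": 3.1}]},
    {"jets": []},
]


def good_jets(event, min_pt=30.0, max_abs_eta=2.5):
    """Keep the jets that pass a cut using pt and eta in the same expression."""
    return [
        jet for jet in event["jets"]
        if jet["pt"] > min_pt and abs(jet["eta"]) < max_abs_eta
    ]


# Events stay the central object: one (possibly empty) list of jets per event.
selected = [good_jets(event) for event in events]
print([len(jets) for jets in selected])  # [1, 1, 0]
```

With awkward arrays, the same cut would typically be a single jagged mask, something like `jets[(jets.pt > 30) & (abs(jets.eta) < 2.5)]`. The dummy layer needs to support that kind of expression.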

## Debugging Help

The easiest way to understand this is to step through it with the debugger. The code below is just for that.

```python
# Here so we can step into a debugger and figure this out!
# In VS Code's debugger, make sure to disable "justMyCode" in the workspace settings
# under the Jupyter extension!
import dask_awkward as dak

x = dak.from_json("data0.json")
result3 = x[x.x > 2].x
result3
```

```text
dask.awkward<x, npartitions=1>
```

## Single Layer

* Just an object that holds an I/O layer
* Represents a single array of random floating point numbers

First, let's build it one step at a time so we can figure out how the pieces fit together.

Create the metadata for an array of numbers (like jet `pt`):

```python
import awkward as ak

sample_array = ak.from_iter([1, 2, 3, 4, 5])
# TODO: We have to go into core here - does this mean `typetracer_array` is not a good thing to access? If not, how should we do this?
metadata = dak.core.typetracer_array(sample_array)
metadata
```
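The `metadata` is what lets the graph be built before any data exists. The stdlib sketch below shows the idea only. The hypothetical `ArrayMeta` class is not how awkward's typetracer is implemented. It records the element type without the values, and each operation maps metadata to new metadata.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ArrayMeta:
    """Hypothetical stand-in for array metadata: the element type, but no data."""
    dtype: str
    length: Optional[int] = None  # None means "not known until computed"


def meta_compare(meta: ArrayMeta) -> ArrayMeta:
    # `x > 2` gives a boolean array of the same (possibly unknown) length.
    return replace(meta, dtype="bool")


def meta_filter(meta: ArrayMeta) -> ArrayMeta:
    # `x[mask]` keeps the element type, but the output length can't be known in advance.
    return replace(meta, length=None)


meta = ArrayMeta(dtype="int64", length=5)
mask_meta = meta_compare(meta)
result_meta = meta_filter(meta)
print(mask_meta)    # ArrayMeta(dtype='bool', length=5)
print(result_meta)  # ArrayMeta(dtype='int64', length=None)
```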

We need a generator function. In the end, this is probably where the ServiceX call will go. I'm a little worried, though, because by the time it gets called, it is called with a single "chunk".

```python
def generate_data(block):
    # Report which chunk we were asked for, then return 100 dummy numbers.
    print(f'In generate_data: {block}')
    return ak.from_iter(list(range(100)))
```

Create the input layer:

```python
# Ok to access layers here?
name = 'unique-name'
dsk = dak.layers.AwkwardInputLayer(
    name=name,
    columns=None,
    inputs=['chunk1'],
    io_func=generate_data,
    meta=metadata,
    behavior=None,
)
```
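Conceptually, an input layer is a recipe with one task per entry in `inputs`. When the graph runs, each task calls `io_func` on its entry. The sketch below is a minimal, hedged version in the classic dask task-dictionary style. Keys are `(name, partition_index)` tuples, and each value is a tuple whose first element is a callable. `input_layer_tasks` is a hypothetical helper, not the `AwkwardInputLayer` implementation.

```python
calls = []


def generate_data(block):
    # Record every call so we can check when the generator actually runs.
    calls.append(block)
    return list(range(100))


def input_layer_tasks(name, inputs, io_func):
    """Hypothetical helper: one task per input, keyed by (layer name, partition index)."""
    return {(name, i): (io_func, inp) for i, inp in enumerate(inputs)}


tasks = input_layer_tasks("unique-name", ["chunk1"], generate_data)
print(list(tasks))  # [('unique-name', 0)]
# Building the layer only stores (function, argument) pairs; nothing has run yet.
print(calls)  # []
```

In this sketch, the chunk label (`"chunk1"`) is the only thing a task carries. That would explain why the generator only ever sees a chunk, and why any ServiceX details would have to be encoded in that label or captured by the function itself.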

Now that we have the input layer, we can wrap it in a high-level graph and build the dask array object:

```python
import dask

# Really - this feels like accessing something internal
hlg = dask.highlevelgraph.HighLevelGraph.from_collections(name, dsk)
my_x = dak.core.new_array_object(hlg, name, meta=metadata, npartitions=1)
my_x
```

```text
dask.awkward<unique-name, npartitions=1>
```

From here on, the code looks just like the `from_json` example at the top:

```python
result4 = my_x[my_x > 2]
result4
```

```text
dask.awkward<getitem, npartitions=1>
```

```python
result4.compute()
```

```text
In generate_data: chunk1
```


And what does the graph look like?

```python
result4.__dask_graph__()
```

The graph has three layers:

| Layer | `layer_type` | `is_materialized` | Number of outputs | Depends on |
|---|---|---|---|---|
| `unique-name` | `AwkwardInputLayer` | `False` | 1 | (none) |
| `greater-879fa2ec66a67e0f05d738bfea176758` | `Blockwise` | `False` | 1 | `unique-name` |
| `getitem` (full name not shown) | `Blockwise` | `False` | 1 | `unique-name`, `greater-879fa2ec66a67e0f05d738bfea176758` |
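To see roughly what `compute()` does with these layers, here is a toy evaluator in the spirit of dask's task-dictionary format. It is a hedged stdlib sketch, not dask's scheduler. The helpers `greater`, `getitem`, and `get` are hypothetical, and the keys mirror the three layers in the graph output above. The `getitem` task depends on both the input and the `greater` task, and the cache ensures the input partition is generated only once.

```python
def generate_data(block):
    print(f"In generate_data: {block}")
    return list(range(100))


def greater(values, threshold):
    # Blockwise `x > threshold`: one boolean per element of the partition.
    return [v > threshold for v in values]


def getitem(values, mask):
    # Blockwise `x[mask]`: keep the elements where the mask is True.
    return [v for v, keep in zip(values, mask) if keep]


# Same shape as the graph above: input layer -> greater -> getitem, one partition each.
graph = {
    ("unique-name", 0): (generate_data, "chunk1"),
    ("greater", 0): (greater, ("unique-name", 0), 2),
    ("getitem", 0): (getitem, ("unique-name", 0), ("greater", 0)),
}


def get(graph, key, cache=None):
    """Evaluate `key`, computing each dependency once and reusing it afterwards."""
    cache = {} if cache is None else cache
    if key not in cache:
        func, *args = graph[key]
        # Tuple arguments that are graph keys are replaced by their computed values.
        resolved = [
            get(graph, a, cache) if isinstance(a, tuple) and a in graph else a
            for a in args
        ]
        cache[key] = func(*resolved)
    return cache[key]


result = get(graph, ("getitem", 0))  # prints "In generate_data: chunk1" once
print(result[:3])  # [3, 4, 5]
```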


Let's build a function that takes a (dummy) ServiceX query and returns a dask array. Then we can ask: how do we go from one file to many?

```python
from dask.base import tokenize


def generate_sx_daq(query: str) -> dak.Array:
    # Each file that comes back is simulated by a block of 100 numbers.
    # The generator is called with a string holding the block number.
    def generate_data(block):
        print(f'In generate_data: {block}')
        return ak.from_iter([i + 100 * int(block) for i in range(100)])

    # Next, create the input layer that will be used to generate the data.
    # Derive the layer name from the query so two different queries never share
    # a graph key (`metadata` comes from the earlier cell).
    name = f'sx-data-{tokenize(query)}'
    dsk = dak.layers.AwkwardInputLayer(
        name=name,
        columns=None,
        inputs=['0'],
        io_func=generate_data,
        meta=metadata,
        behavior=None,
    )

    # Create the high-level graph that holds all of this, and the actual array object.
    hlg = dask.highlevelgraph.HighLevelGraph.from_collections(name, dsk)
    ar = dak.core.new_array_object(hlg, name, meta=metadata, npartitions=1)

    return ar
```
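Going from one file to many amounts to having one partition per file. The hedged stdlib sketch below uses a hypothetical `sx_partition_tasks` helper and a fixed `n_files` that stands in for whatever ServiceX reports back. Each file becomes one `(name, i)` task, and the block label sets the offset as in `generate_data` above. In dask-awkward terms this would presumably mean passing more entries in `inputs` with a matching `npartitions`, but that has not been tried here.

```python
def generate_data(block):
    # Each simulated "file" is a block of 100 numbers offset by the block number.
    start = 100 * int(block)
    return list(range(start, start + 100))


def sx_partition_tasks(name, n_files):
    """Hypothetical helper: one input task per file, labelled "0", "1", ..."""
    return {(name, i): (generate_data, str(i)) for i in range(n_files)}


def filter_partition(values, threshold=2):
    # The same `x[x > 2]` selection, applied independently to each partition.
    return [v for v in values if v > threshold]


tasks = sx_partition_tasks("sx-query", n_files=3)

# Run every partition, then concatenate the per-file results in partition order.
per_file = []
for key in sorted(tasks):
    func, arg = tasks[key]
    per_file.append(filter_partition(func(arg)))
combined = [v for part in per_file for v in part]
print(len(tasks), len(combined))  # 3 297
```

Because the selection runs on each partition independently, adding files only adds tasks. The user-facing expression does not change.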

Now we do the actual query:

```python
my_x1 = generate_sx_daq('(valid qastle query)')
result5 = my_x1[my_x1 > 2]
result5.compute()
```

```text
In generate_data: 0
```

Eventually, the line above should do something like this:

* The user calls the `compute()` method.
* That triggers a call to the ServiceX backend (i.e. `generate_data`, which does something slow).
* The backend returns files, either all at once or a few at a time.
* The graph is set up so that every returned file gets processed.
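The last two steps, with files arriving from the backend a few at a time and every file getting processed, can be mimicked with `concurrent.futures`. This is a sketch under assumptions. `fetch_file` is a hypothetical stand-in for ServiceX delivering one file. Results are reassembled by file index, so the output order does not depend on arrival order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_file(index):
    """Hypothetical stand-in for ServiceX delivering one file of 100 numbers."""
    time.sleep(random.uniform(0.0, 0.01))  # files finish in an unpredictable order
    return list(range(100 * index, 100 * index + 100))


def process(values):
    # The user's selection, `x[x > 2]`, applied to one file as soon as it arrives.
    return [v for v in values if v > 2]


def run_query(n_files):
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(fetch_file, i): i for i in range(n_files)}
        # Handle each file the moment it is ready rather than waiting for all of them.
        for future in as_completed(futures):
            results[futures[future]] = process(future.result())
    # Reassemble in file order so the final result is deterministic.
    return [v for i in range(n_files) for v in results[i]]


combined = run_query(n_files=5)
print(len(combined))  # 497
```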