# Using Aggregate

Build and examine a graph that contains some sort of aggregate operation.

## Base Builder

Starting from the earlier code, build our own custom input layer (a dict-like Dask graph layer).

In [1]:
from typing import List
import awkward as ak
import dask_awkward as dak
import random
import dask

def make_input_layer(name, inputs: List[str]):
    # Each file that returns will be simulated by a block of 100 numbers.
    # Call with a string that is the block number.
    def generate_data(block):
        print(f'In generate_data: {block}')
        return ak.from_iter([random.uniform(0, 10) for _ in range(100)])

    # Build a small sample array whose type serves as the metadata
    # TODO: We have to go into core here - does this mean `typetracer_array` is not a good thing to access? If not, how should we do this?
    sample_array = ak.from_iter([1, 2, 3, 4, 5])
    metadata = dak.core.typetracer_array(sample_array)

    # Next, create the input layer that will be used to generate the data.
    dsk = dak.layers.AwkwardInputLayer(
            name=name,
            columns=None,
            inputs=inputs,
            io_func=generate_data,
            meta=metadata,
            behavior=None,
        )

    return dsk

def generate_sx_daq(query: str, inputs: List[str] = ['0', '1']) -> dak.Array:
    name = 'unique-name'
    input_layer = make_input_layer(name, inputs)

    # Create the high level graph that will hold all of this, and the actual array object
    hlg = dask.highlevelgraph.HighLevelGraph.from_collections(name, input_layer)
    ar = dak.core.new_array_object(hlg, name, meta=input_layer._meta, npartitions=len(inputs))

    return ar

## Doing a length or count operation

Let's build the count operation (here, a histogram of the values), check that it works, and then look at the layout of the high-level graph's layers.

In [2]:
import dask_histogram as dh
import mplhep as hep

x = generate_sx_daq("(query)")
h = dh.factory(x, axes=(dh.axis.Regular(20, 0, 10),))
r = h.compute()
_ = hep.histplot(r)

Let's look at the Dask compute DAG:

In [None]:
h.dask

Each step gets its own layer. Can we reach into the first layer and "alter" it?

In [None]:
dsk = h.dask
print(type(dsk))

In [None]:
help(dsk)

In [None]:
dsk.layers

In [None]:
dsk.dependencies
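
As a rough mental model of what `layers` and `dependencies` hold, here is a minimal sketch that uses plain dict layers instead of the awkward layers above. The layer names `input` and `total` and the stored values are made up for illustration; only `HighLevelGraph` and `dask.get` are real Dask API.

```python
from operator import add

import dask
from dask.highlevelgraph import HighLevelGraph

# Hypothetical two-layer graph: an "input" layer with two partitions
# and a "total" layer that adds the two partitions together.
layers = {
    "input": {("input", 0): 1, ("input", 1): 2},
    "total": {("total", 0): (add, ("input", 0), ("input", 1))},
}
# Each layer name maps to the set of layer names it depends on.
dependencies = {"input": set(), "total": {"input"}}

hlg = HighLevelGraph(layers, dependencies)

print(list(hlg.layers))   # layer names
print(hlg.dependencies)   # layer-level dependency map

# Flatten to a plain task dict and run it on the synchronous scheduler.
result = dask.get(dict(hlg), ("total", 0))
print(result)
```

Keys inside a layer are `(layer_name, partition_index)` tuples, which is the same shape as the keys printed for the input layer in the cells that follow.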

In [None]:
my_input = dsk.layers['unique-name']
my_input

In [None]:
help(my_input)

In [None]:
print(f'dims: {my_input.dims}')
print(f'items: {list(my_input.items())}')
print(f'keys: {list(my_input.keys())}')

This is probably obvious to most people, but it looks like it isn't the `HighLevelGraph` itself that we want to alter, but rather the `AwkwardInputLayer` inside it. From the output above, though, the awkward array object and the input layer each independently know the number of partitions, which is a little weird.

So, let's hit this with a hammer: create a new awkward input layer that has a different number of inputs and see what happens.

In [None]:
new_input = make_input_layer('unique-name', inputs=['0', '1', '2', '3'])
print(f'keys: {list(new_input.keys())}')

Moment of truth...

In [None]:
dsk.layers['unique-name'] = new_input
dsk

And we can see that the second layer still has only 2 outputs, but I suspect it should have 4. Let's compare:

In [None]:
x_expected = generate_sx_daq("(query)", inputs=['0', '1', '2', '3'])
h_expected = dh.factory(x_expected, axes=(dh.axis.Regular(20, 0, 10),))
h_expected.dask
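
To see in isolation why replacing only the input layer is not enough, here is a small sketch with plain dict layers. The helper `input_layer` and the layer names are hypothetical stand-ins, not the awkward layers used in this notebook. The downstream layer was built against two partitions, so it keeps referring to exactly those two keys after the swap.

```python
from operator import add

import dask
from dask.highlevelgraph import HighLevelGraph, MaterializedLayer


def input_layer(n):
    # Hypothetical stand-in for an input layer: partition i holds the value i + 1.
    return MaterializedLayer({("input", i): i + 1 for i in range(n)})


layers = {
    "input": input_layer(2),
    # The downstream layer was generated for two input partitions.
    "total": MaterializedLayer({("total", 0): (add, ("input", 0), ("input", 1))}),
}
hlg = HighLevelGraph(layers, {"input": set(), "total": {"input"}})

# Swap in a four-partition input layer, mirroring the in-place edit above.
hlg.layers["input"] = input_layer(4)

graph = dict(hlg)
input_keys = sorted(k for k in graph if k[0] == "input")
result = dask.get(graph, ("total", 0))

print(input_keys)  # four input partitions are now present
print(result)      # but only partitions 0 and 1 are added together
```

The extra partitions exist in the graph but nothing consumes them, so any layer that fans in from the input has to be regenerated or patched as well.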

OK, so it has 4 here. That means we still need to modify the second layer somehow, which is a little more invasive. Let's see what it looks like:

In [None]:
hist_on_block_key = [k for k in dsk.layers.keys() if k.startswith('hist-on-block')][0]
block_layer = dsk.layers[hist_on_block_key]
block_layer
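
The layer name `hist-on-block` suggests that a partial histogram is filled for each partition and the partials are combined afterwards. A pure-Python sketch of that pattern follows, assuming this is roughly what happens; `hist_on_block` and `aggregate` are illustrative names, not `dask_histogram` functions.

```python
import random


def hist_on_block(values, bins=20, lo=0.0, hi=10.0):
    # Fill a fixed-width histogram for one partition; values outside [lo, hi) are dropped.
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def aggregate(partials):
    # Combine per-partition histograms by summing bin by bin.
    return [sum(column) for column in zip(*partials)]


random.seed(0)
# Two simulated partitions of 100 numbers each, like the input layer above.
blocks = [[random.uniform(0, 10) for _ in range(100)] for _ in range(2)]

partials = [hist_on_block(block) for block in blocks]
total = aggregate(partials)
print(total)
```

Under this assumption the number of partial histograms equals the number of input partitions, which is why this layer's size has to change whenever the input layer's does.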

Adjust the number of inputs...

In [None]:
print(block_layer.numblocks)
block_layer.numblocks = (4,)
print(block_layer.numblocks)
block_layer

In [None]:
list(block_layer.items())

So the number of outputs is still wrong. It appears to come from the layer's private `_dims` attribute, so let's override that too:

In [None]:
block_layer._dims = {'.0': 4}
block_layer

Ok - the layer looks good - how does the rest of the graph look?

In [None]:
dsk

Let's see what happens when it runs. We've been modifying this dask graph in place, so we should just be able to execute it:

In [None]:
h.compute()