# Using dask and awkward together

Some basic tests/examples.

Let's first try with a single data file.

## Write out the data files

In [15]:
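import json

# Every file gets the same single record.
data = {
    'x': [1, 2, 3, 4, 5],
}

num_files = 100

# Write data0.json ... data99.json into the current directory.
for i in range(num_files):
    with open(f'data{i}.json', 'w') as f:
        json.dump(data, f)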
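
A quick sanity check on the writing step can be sketched with the standard library alone. This is a minimal sketch, not part of the original notebook: it writes the same record into a temporary directory (an assumption, so it does not touch the real `data*.json` files) and confirms that the glob pattern finds all of them and that each one round-trips.

```python
import json
import tempfile
from pathlib import Path

data = {"x": [1, 2, 3, 4, 5]}

# Work in a throwaway directory so the check leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(100):
        (Path(tmp) / f"data{i}.json").write_text(json.dumps(data))

    # Same glob pattern that is used later to pick up every file.
    paths = sorted(Path(tmp).glob("data*.json"))
    num_found = len(paths)
    # Each file should round-trip to the original record.
    all_match = all(json.loads(p.read_text()) == data for p in paths)

print(num_found, all_match)
```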


## Using awkward 2.0

Nothing new here - just plain awkward, used to load one of the files.

In [16]:
from pathlib import Path
import awkward as ak

file0 = Path('data0.json')
x = ak.from_json(file0)
x

In [17]:
x.x

In [18]:
x.x[x.x > 2]
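
For anyone new to mask indexing, the selection above can be spelled out in plain Python. This is only a sketch of the semantics (build a boolean mask, keep the entries where it is true), not of how awkward implements it; the names `values`, `mask`, and `selected` are made up for the illustration.

```python
values = [1, 2, 3, 4, 5]

# Step 1: the comparison produces one boolean per element.
mask = [v > 2 for v in values]

# Step 2: indexing with the mask keeps only the True positions.
selected = [v for v, keep in zip(values, mask) if keep]

print(mask)
print(selected)
```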

## With dask-awkward

Now the same selection with dask-awkward: build the lazy result, then run `compute`.

In [21]:
import dask_awkward as dak

x = dak.from_json("data*.json")
result = x[x.x > 2]
result

dask.awkward<getitem, npartitions=100>

Ok - note that it already knows the number of partitions here: 100, one per file.

In [22]:
result.compute()

Interesting - the result is a list of items... I suppose that is because the thing I selected isn't an array. What if I actually access `x`?

In [23]:
result2 = x[x.x > 2].x
print(result2)

dask.awkward<x, npartitions=100>


In [24]:
result2.compute()

Still not concatenating them - I guess there must be a reducer somewhere that does that... But it handled all 100 files with no problem.
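
If one flat list is the goal, the per-file pieces can be chained together after computing. The sketch below uses plain Python lists as stand-ins for the per-file results (an assumption: one `[3, 4, 5]` per file, which is what the filter should give for the data written earlier). On the awkward side, `ak.flatten` or `ak.concatenate` would be the natural things to try, though that is not tested here.

```python
from itertools import chain

# Stand-in for the computed result: one filtered list per file (assumed shape).
per_file = [[3, 4, 5] for _ in range(100)]

# Chain the pieces end to end into a single flat list.
flat = list(chain.from_iterable(per_file))

print(len(per_file), len(flat))
```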

In [25]:
len(result2)

100

Also interesting - there is no `shape`. :-) Not at all surprised after everything.

## Getting at the compute graph to see what we can do with it

In [28]:
g = result2.__dask_graph__()

In [29]:
type(g)

dask.highlevelgraph.HighLevelGraph

In [30]:
g.keys()

dict_keys([('getitem-dbb1001309169667e4001a0f9fe7fcf3', 0), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 1), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 2), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 3), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 4), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 5), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 6), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 7), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 8), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 9), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 10), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 11), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 12), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 13), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 14), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 15), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 16), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 17), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 18), ('getitem-dbb1001309169667e4001a0f9fe7fcf3', 19),

Ok - too hard to read with 100 partitions. Let's do it for one file.

In [37]:
import dask_awkward as dak

x = dak.from_json("data0.json")
result3 = x[x.x > 2].x
result3

dask.awkward<x, npartitions=1>

In [38]:
g = result3.__dask_graph__()
g.keys()

dict_keys([('getitem-684436b54b5781868e72ea82a6681bae', 0), ('from-json-d1854d5be36375807dfdd90f07a6e5a4', 0), ('x-4ffe2fb34d85b676913770e7536211e8', 0), ('greater-2201a1e6411991b73a37ffc2aba2d89a', 0), ('x-a75c4079032a8a2b5827b57088198e94', 0)])

In [39]:
for k, v in g.items():
    print(k, v)
    print()

('getitem-684436b54b5781868e72ea82a6681bae', 0) (subgraph_callable-0971dd35-7207-4bd5-9159-74db18846864, ('from-json-d1854d5be36375807dfdd90f07a6e5a4', 0), ('greater-2201a1e6411991b73a37ffc2aba2d89a', 0))

('from-json-d1854d5be36375807dfdd90f07a6e5a4', 0) (subgraph_callable-c868e464-32e3-4eb5-8c05-75c74932c6fe, 'c:/Users/gordo/Code/iris-hep/awkward-20-testing/notebooks/data0.json')

('x-4ffe2fb34d85b676913770e7536211e8', 0) (subgraph_callable-b22a8241-32ad-4dd1-8485-86238864273b, ('from-json-d1854d5be36375807dfdd90f07a6e5a4', 0), 'x')

('greater-2201a1e6411991b73a37ffc2aba2d89a', 0) (subgraph_callable-347dba23-292e-4f65-9125-6c8a9c509bdd, ('x-4ffe2fb34d85b676913770e7536211e8', 0), 2)

('x-a75c4079032a8a2b5827b57088198e94', 0) (subgraph_callable-b6bf1036-d182-41d4-a743-68648d595dfa, ('getitem-684436b54b5781868e72ea82a6681bae', 0), 'x')
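
The printout above has the shape of a dict of task tuples: each key maps to `(callable, arg, ...)`, and any argument that is itself a key refers to another task's output. The toy evaluator below walks a graph of that shape. It is a sketch under assumptions, not dask's scheduler: the helper functions are hypothetical stand-ins for the `subgraph_callable-...` objects, the keys are shortened, and the JSON text is inlined instead of read from a path.

```python
import json

# Hypothetical stand-ins for the subgraph callables in the printout.
def load(text):
    return json.loads(text)

def get_field(record, name):
    return record[name]

def greater(values, threshold):
    return [v > threshold for v in values]

def mask_record(record, mask):
    # Apply the boolean mask to every field of the record.
    return {k: [v for v, keep in zip(vals, mask) if keep]
            for k, vals in record.items()}

# Same five-node structure as the one-file graph, with shortened keys.
graph = {
    ("from-json", 0): (load, '{"x": [1, 2, 3, 4, 5]}'),
    ("x-in", 0): (get_field, ("from-json", 0), "x"),
    ("greater", 0): (greater, ("x-in", 0), 2),
    ("getitem", 0): (mask_record, ("from-json", 0), ("greater", 0)),
    ("x-out", 0): (get_field, ("getitem", 0), "x"),
}

def evaluate(graph, key, cache=None):
    """Recursively compute one key, resolving arguments that are keys."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    func, *args = graph[key]
    resolved = [
        evaluate(graph, a, cache) if isinstance(a, tuple) and a in graph else a
        for a in args
    ]
    cache[key] = func(*resolved)
    return cache[key]

final = evaluate(graph, ("x-out", 0))
print(final)
```

Reading the real graph the same way: `from-json` feeds both the first `x` node and `getitem`, `greater` compares that `x` against 2, and the last `x` node pulls the field out of the filtered result.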



## Getting a bad variable

Let's see how eager this whole thing is.

In [40]:
import dask_awkward as dak

x = dak.from_json("data0.json")
result4 = x[x.x > 2].y
result4

AttributeError: y not in fields.

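Ok - this is terrifically bad: the field check happens eagerly, when the expression is built, not at `compute`. To work with that we'd have to know every field up front - and how would we specify methods, etc.?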
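
To make the behaviour concrete, here is a toy lazy collection that rejects an unknown field at build time without loading anything. It is a sketch of the general idea only, assuming the collection carries its field names as metadata; `LazyRecords`, `LazyField`, and `loader` are hypothetical names, and this is not how dask-awkward is implemented.

```python
class LazyField:
    """Handle for one field; the data is only loaded on compute()."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name

    def compute(self):
        return self.parent.compute()[self.name]


class LazyRecords:
    """Toy lazy collection: knows its field names, defers loading."""

    def __init__(self, fields, loader):
        self._fields = tuple(fields)
        self._loader = loader

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._fields:
            # Checked against metadata only; no data is read here.
            raise AttributeError(f"{name} not in fields.")
        return LazyField(self, name)

    def compute(self):
        return self._loader()


load_calls = []

def loader():
    load_calls.append(1)  # record that a real load happened
    return {"x": [1, 2, 3, 4, 5]}

lazy = LazyRecords(["x"], loader)
field_x = lazy.x  # fine: builds a handle, loads nothing

try:
    lazy.y
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(len(load_calls))
```

The trade-off is the one raised above: early errors are only possible if the full set of fields (and, by extension, methods) is known before any data is touched.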