fix: support typetracer in to_packed for IndexedOptionArray #2912

Merged (3 commits) on Jan 2, 2024

Conversation

agoose77
Collaborator

@agoose77 agoose77 commented Dec 21, 2023

Fixes #2910 (@lgray)

  • Add test
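As background for the title: the traceback later in this thread shows that `IndexedOptionArray.to_packed` calls `self.project()` and then packs the result. Below is a minimal pure-Python sketch of that idea. It is not awkward's code, and `pack_indexed_option` and `to_list` are hypothetical helpers. The sketch assumes a negative index entry means "missing". It keeps only the items of `content` that the index references, in index order (the projection), and renumbers the valid slots 0, 1, 2, and so on. Because it works on concrete lists, it does not model the typetracer case this PR fixes, where lengths may be unknown.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pack_indexed_option(index: Sequence[int], content: Sequence[T]) -> Tuple[List[int], List[T]]:
    """Hypothetical model of packing an option-type indexed array.

    A negative entry in `index` means "missing"; any other entry points into
    `content`. The packed form holds only the referenced items, in index
    order (the "projection"), and the valid slots are renumbered 0, 1, 2, ...
    """
    packed_content: List[T] = []
    packed_index: List[int] = []
    for i in index:
        if i < 0:
            packed_index.append(-1)  # missing stays missing
        else:
            packed_index.append(len(packed_content))  # next position in the packed content
            packed_content.append(content[i])  # copy the referenced item
    return packed_index, packed_content


def to_list(index: Sequence[int], content: Sequence[T]) -> List[Optional[T]]:
    """Materialize the logical values, with None for missing entries."""
    return [None if i < 0 else content[i] for i in index]


index = [3, -1, 0, 3, -1]
content = ["a", "b", "c", "d", "e"]
packed_index, packed_content = pack_indexed_option(index, content)
print(packed_index)    # [0, -1, 1, 2, -1]
print(packed_content)  # ['d', 'a', 'd']
assert to_list(index, content) == to_list(packed_index, packed_content)
```

The final `assert` checks the property that matters: packing changes the layout, not the logical values. A repeated index entry is copied, so the packed content can hold the same item more than once.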


codecov bot commented Dec 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (0a8e9c8) 81.98% compared to head (93b69b3) 81.98%.

Additional details and impacted files
Files Coverage Δ
src/awkward/contents/indexedoptionarray.py 88.15% <100.00%> (ø)

@lgray
Contributor

lgray commented Dec 21, 2023

Checked with Connor's original reproducer: it is fixed with this patch, and the column touching looks OK too.

@lgray
Contributor

lgray commented Dec 21, 2023

Ack, I take that latter statement back. I now hit a placeholder array when I try to compute the following (the original repro wasn't doing the compute). Glad I checked:

from coffea.nanoevents import NanoEventsFactory, PFNanoAODSchema
from distributed import Client
#import dask
import dask_awkward as dak

import awkward as ak

import pyinstrument
import time

if __name__ == "__main__":
    client = Client()

    PFNanoAODSchema.warn_missing_crossrefs = False

    events = NanoEventsFactory.from_root(
        {"./nano_mc2017_1.root": {"object_path": "Events", "steps": [[0,40],[40, 80]]}},
        schemaclass=PFNanoAODSchema,
    ).events()

    fatjet = events.FatJet
    pf = ak.flatten(fatjet.constituents.pf, axis=2)

    unflat_pf = ak.unflatten(pf, counts=ak.flatten(ak.num(fatjet.constituents.pf, axis=2)), axis=1)

    print(dak.necessary_columns(unflat_pf.pt))

    print(unflat_pf.pt.compute())

results in:

(coffea-dev) lgray@Lindseys-MacBook-Pro coffea % python connor_repro.py   
/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask_awkward/lib/structure.py:895: UserWarning: Please ensure that dask.awkward<flatten, npartitions=2>
        is partitionwise-compatible with dask.awkward<flatten, npartitions=2>
        (e.g. counts comes from a dak.num(array, axis=1)),
        otherwise this unflatten operation will fail when computed!
  warnings.warn(
{'from-uproot-24ce6ace95a270ade088eba20e030c80': frozenset({'nPFCands', 'FatJet_nConstituents', 'nFatJetPFCands', 'PFCands_pt', 'FatJetPFCands_pFCandsIdx', 'nFatJet'})}
2023-12-21 08:29:18,789 - distributed.worker - WARNING - Compute Failed
Key:       ('pt-fb9d8e1608f0364952db5bf8768b4897', 1)
Function:  subgraph_callable-68716d38-5732-42a0-ab99-8fa39786
args:      ('PFCands', 'pFCandsIdxG', 'FatJetPFCands', 'FatJet', ('./nano_mc2017_1.root', 'Events', 40, 80, True))
kwargs:    {}
Exception: 'TypeError("PlaceholderArray supports only trivial slices, not ndarray\\n\\nThis error occurred while calling\\n\\n    ak.unflatten(\\n        repr-raised-TypeError\\n        <Array [30, 29, 40, 27, 71, ..., 50, 23, 4, 61, 38] type=\'68 * int64\'>\\n        axis = 1\\n        behavior = None\\n    )")'

Traceback (most recent call last):
  File "/Users/lgray/coffea-dev/awkward/src/awkward/_dispatch.py", line 62, in dispatch
    next(gen_or_result)
  File "/Users/lgray/coffea-dev/awkward/src/awkward/operations/ak_unflatten.py", line 90, in unflatten
    return _impl(array, counts, axis, highlevel, behavior, attrs)
  File "/Users/lgray/coffea-dev/awkward/src/awkward/operations/ak_unflatten.py", line 98, in _impl
    ctx.unwrap(array, allow_record=False, primitive_policy="error").to_packed(),
  File "/Users/lgray/coffea-dev/awkward/src/awkward/contents/listoffsetarray.py", line 2219, in to_packed
    next_content = next._content[: next._offsets[-1]].to_packed()
  File "/Users/lgray/coffea-dev/awkward/src/awkward/contents/indexedoptionarray.py", line 1744, in to_packed
    self.project().to_packed(),
  File "/Users/lgray/coffea-dev/awkward/src/awkward/contents/indexedoptionarray.py", line 583, in project
    return self._content._carry(nextcarry, False)
  File "/Users/lgray/coffea-dev/awkward/src/awkward/contents/recordarray.py", line 532, in _carry
    contents = [
  File "/Users/lgray/coffea-dev/awkward/src/awkward/contents/recordarray.py", line 533, in <listcomp>
    self.content(i)._carry(carry, allow_lazy)
  File "/Users/lgray/coffea-dev/awkward/src/awkward/contents/numpyarray.py", line 347, in _carry
    nextdata = self._data[carry.data]
  File "/Users/lgray/coffea-dev/awkward/src/awkward/_nplikes/placeholder.py", line 97, in __getitem__
    raise TypeError(
TypeError: PlaceholderArray supports only trivial slices, not ndarray

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "connor_repro.py", line 28, in <module>
    print(unflat_pf.pt.compute())
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/base.py", line 314, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/base.py", line 599, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/distributed/client.py", line 3226, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/distributed/client.py", line 2361, in gather
2023-12-21 08:29:18,794 - distributed.worker - WARNING - Compute Failed
Key:       ('pt-fb9d8e1608f0364952db5bf8768b4897', 0)
Function:  subgraph_callable-68716d38-5732-42a0-ab99-8fa39786
args:      ('PFCands', 'pFCandsIdxG', 'FatJetPFCands', 'FatJet', ('./nano_mc2017_1.root', 'Events', 0, 40, True))
kwargs:    {}
Exception: 'TypeError("PlaceholderArray supports only trivial slices, not ndarray\\n\\nThis error occurred while calling\\n\\n    ak.unflatten(\\n        repr-raised-TypeError\\n        <Array [51, 50, 63, 34, 22, ..., 46, 53, 41, 67, 38] type=\'65 * int64\'>\\n        axis = 1\\n        behavior = None\\n    )")'

    return self.sync(
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/distributed/utils.py", line 351, in sync
    return sync(
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/distributed/utils.py", line 418, in sync
    raise exc.with_traceback(tb)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/distributed/utils.py", line 391, in f
    result = yield future
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/tornado/gen.py", line 769, in run
    value = future.result()
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/distributed/client.py", line 2224, in _gather
    raise exception.with_traceback(traceback)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask_awkward/lib/core.py", line 1869, in __call__
    return self.fn(*args, **kwargs)
  File "/Users/lgray/coffea-dev/awkward/src/awkward/_dispatch.py", line 70, in dispatch
    return gen_or_result
  File "/Users/lgray/coffea-dev/awkward/src/awkward/_errors.py", line 85, in __exit__
    self.handle_exception(exception_type, exception_value)
  File "/Users/lgray/coffea-dev/awkward/src/awkward/_errors.py", line 95, in handle_exception
    raise self.decorate_exception(cls, exception)
TypeError: PlaceholderArray supports only trivial slices, not ndarray

This error occurred while calling

    ak.unflatten(
        repr-raised-TypeError
        <Array [30, 29, 40, 27, 71, ..., 50, 23, 4, 61, 38] type='68 * int64'>
        axis = 1
        behavior = None
    )
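The bottom frames of the traceback show where this breaks. `IndexedOptionArray.project()` carries the `RecordArray`, which carries each of its fields. `NumpyArray._carry` then indexes its buffer with an integer array (`self._data[carry.data]`), and the placeholder buffer rejects that. The `Placeholder` class below is a hypothetical toy stand-in, not awkward's `PlaceholderArray`, and it models only what the error message says. It knows its length but holds no values. A basic slice can therefore be answered with a shorter placeholder. A gather by integer positions, by contrast, would need the real data, so it raises `TypeError`.

```python
class Placeholder:
    """Hypothetical stand-in for a data-less buffer: it knows its length, not its values."""

    def __init__(self, length: int) -> None:
        self.length = length

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index):
        # A basic slice only changes the length, which is computable without data.
        if isinstance(index, slice):
            return Placeholder(len(range(*index.indices(self.length))))
        # Anything else (e.g. gathering by integer positions) needs the real values.
        raise TypeError(
            f"{type(self).__name__} supports only trivial slices, not {type(index).__name__}"
        )


buf = Placeholder(10)
print(len(buf[2:8]))   # 6
print(len(buf[::3]))   # 4
try:
    buf[[0, 3, 3]]     # a "carry": gathering by integer positions
except TypeError as err:
    print(err)         # Placeholder supports only trivial slices, not list
```

In this model, any code path that projects or carries a placeholder-backed field fails the same way, whichever field it is.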

@lgray
Contributor

lgray commented Dec 21, 2023

The necessary_columns report looks correct to me, so the problem is probably in awkward's rehydration machinery?

@lgray
Contributor

lgray commented Dec 27, 2023

Almost there: it presently overtouches. From the same reproducer:

(coffea-dev) lgray@Lindseys-MacBook-Pro coffea % python connor_repro.py
/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/dask_awkward/lib/structure.py:895: UserWarning: Please ensure that dask.awkward<flatten, npartitions=2>
        is partitionwise-compatible with dask.awkward<flatten, npartitions=2>
        (e.g. counts comes from a dak.num(array, axis=1)),
        otherwise this unflatten operation will fail when computed!
  warnings.warn(
{'from-uproot-f43b2959a838308afd95be4cbeebc4fb': frozenset({'PFCands_d0', 'PFCands_lostInnerHits', 'PFCands_pdgId', 'PFCands_trkChi2', 'nFatJet', 'PFCands_dz', 'PFCands_pt', 'PFCands_vtxChi2', 'FatJetPFCands_pFCandsIdx', 'PFCands_mass', 'PFCands_eta', 'nFatJetPFCands', 'nPFCands', 'PFCands_charge', 'PFCands_trkQuality', 'PFCands_d0Err', 'PFCands_dzErr', 'PFCands_puppiWeightNoLep', 'PFCands_puppiWeight', 'PFCands_phi', 'FatJet_nConstituents', 'PFCands_pvAssocQuality'})}
[[[19.9, 4.32, 5.29, 6.13, 5.61, ..., 1.33, 0.67, 0.629, 0.668, 0.459]], ...]

should be only:

frozenset({'nPFCands', 'FatJet_nConstituents', 'nFatJetPFCands', 'PFCands_pt', 'FatJetPFCands_pFCandsIdx', 'nFatJet'})
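To make "overtouches" concrete, the snippet below uses plain Python sets to diff the two column sets quoted above, copied verbatim. Nothing required is missing, and the 16 extra columns are all `PFCands_*` fields. One possible reading is that carrying the whole `PFCands` record touches all of its fields even though only `pt` is needed. (`RecordArray._carry` carries every field in the earlier traceback.) This is an inference, not something confirmed in this thread.

```python
# Column sets copied verbatim from the necessary_columns output and the expected set above.
reported = frozenset({
    'PFCands_d0', 'PFCands_lostInnerHits', 'PFCands_pdgId', 'PFCands_trkChi2',
    'nFatJet', 'PFCands_dz', 'PFCands_pt', 'PFCands_vtxChi2',
    'FatJetPFCands_pFCandsIdx', 'PFCands_mass', 'PFCands_eta', 'nFatJetPFCands',
    'nPFCands', 'PFCands_charge', 'PFCands_trkQuality', 'PFCands_d0Err',
    'PFCands_dzErr', 'PFCands_puppiWeightNoLep', 'PFCands_puppiWeight',
    'PFCands_phi', 'FatJet_nConstituents', 'PFCands_pvAssocQuality',
})
expected = frozenset({
    'nPFCands', 'FatJet_nConstituents', 'nFatJetPFCands', 'PFCands_pt',
    'FatJetPFCands_pFCandsIdx', 'nFatJet',
})

missing = expected - reported  # required columns the report left out
extra = reported - expected    # columns touched unnecessarily (overtouching)

print(sorted(missing))  # []
print(len(extra))       # 16
print(all(name.startswith("PFCands_") for name in extra))  # True
```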

@lgray
Contributor

lgray commented Dec 27, 2023

As discussed via DM, this one is fine to go in as is; the resulting overtouching will be dealt with separately.

@agoose77 agoose77 merged commit d466d70 into main Jan 2, 2024
38 checks passed
@agoose77 agoose77 deleted the agoose77/fix-latest-dask-awkward-bugs branch January 2, 2024 11:43
Successfully merging this pull request may close these issues.

ak.unflatten on a nested IndexedOptionArray typetracer hits a bug trying to calculate lengths
2 participants