Regression: wrong dtype error when comparing mask to True #2808

Closed
raymondEhlers opened this issue Nov 8, 2023 · 4 comments
Labels
bug The problem described is something that must be fixed

Comments

@raymondEhlers
Contributor

raymondEhlers commented Nov 8, 2023

Version of Awkward Array

2.4.9

Description and code to reproduce

I have some analysis code which eventually produces a mask of type n * var * bool. As a diagnostic, I want to compare the mask to True, flatten, and count_nonzero [1]. With awkward 2.3.1 and numpy 1.24.4, this worked. After updating my dependencies to awkward 2.4.9 and numpy 1.26.1, I get a TypeError regarding the dtype. See the reproducer below (and the relevant parquet file, renamed to txt to allow for attachment: out.parquet.txt).

Python 3.11.6 (main, Oct  2 2023, 13:45:54) [Clang 15.0.0 (clang-1500.0.40.1)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import awkward as ak

In [2]: arr = ak.from_parquet("out.parquet")

In [3]: arr.type.show()
38 * var * bool

In [4]: arr.layout
Out[4]:
<ListOffsetArray len='38'>
    <offsets><Index dtype='int64' len='39'>
        [  0   4  11  24  28  35  46  51  55  65  66  74  81  91  95 105 118
         129 140 146 157 181 187 194 201 201 216 223 234 249 258 263 273 282
         287 304 321 328 333]
    </Index></offsets>
    <content><NumpyArray dtype='bool' len='333'>
        [ True  True  True  True  True  True  True  True  True  True  True
          True  True  True  True  True  True  True  True  True  True  True
         ...
          True  True  True  True  True  True  True  True  True  True  True
          True  True  True]
    </NumpyArray></content>
</ListOffsetArray>

In [5]: arr == True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 arr == True

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_operators.py:50, in _binary_method.<locals>.func(self, other)
     48 if _disables_array_ufunc(other):
     49     return NotImplemented
---> 50 return ufunc(self, other)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/highlevel.py:1438, in Array.__array_ufunc__(self, ufunc, method, *inputs, **kwargs)
   1436 name = f"{type(ufunc).__module__}.{ufunc.__name__}.{method!s}"
   1437 with ak._errors.OperationErrorContext(name, inputs, kwargs):
-> 1438     return ak._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_connect/numpy.py:449, in array_ufunc(ufunc, method, inputs, kwargs)
    441         raise TypeError(
    442             "no {}.{} overloads for custom types: {}".format(
    443                 type(ufunc).__module__, ufunc.__name__, ", ".join(error_message)
    444             )
    445         )
    447     return None
--> 449 out = ak._broadcasting.broadcast_and_apply(
    450     inputs, action, allow_records=False, function_name=ufunc.__name__
    451 )
    453 if len(out) == 1:
    454     return wrap_layout(out[0], behavior)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:1026, in broadcast_and_apply(inputs, action, depth_context, lateral_context, allow_records, left_broadcast, right_broadcast, numpy_to_regular, regular_to_jagged, function_name, broadcast_parameters_rule)
   1024 backend = backend_of(*inputs, coerce_to_common=False)
   1025 isscalar = []
-> 1026 out = apply_step(
   1027     backend,
   1028     broadcast_pack(inputs, isscalar),
   1029     action,
   1030     0,
   1031     depth_context,
   1032     lateral_context,
   1033     {
   1034         "allow_records": allow_records,
   1035         "left_broadcast": left_broadcast,
   1036         "right_broadcast": right_broadcast,
   1037         "numpy_to_regular": numpy_to_regular,
   1038         "regular_to_jagged": regular_to_jagged,
   1039         "function_name": function_name,
   1040         "broadcast_parameters_rule": broadcast_parameters_rule,
   1041     },
   1042 )
   1043 assert isinstance(out, tuple)
   1044 return tuple(broadcast_unpack(x, isscalar) for x in out)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:1004, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options)
   1002     return result
   1003 elif result is None:
-> 1004     return continuation()
   1005 else:
   1006     raise AssertionError(result)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:973, in apply_step.<locals>.continuation()
    971 # Any non-string list-types?
    972 elif any(x.is_list and not is_string_like(x) for x in contents):
--> 973     return broadcast_any_list()
    975 # Any RecordArrays?
    976 elif any(x.is_record for x in contents):

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:629, in apply_step.<locals>.broadcast_any_list()
    626         nextinputs.append(x)
    627         nextparameters.append(NO_PARAMETERS)
--> 629 outcontent = apply_step(
    630     backend,
    631     nextinputs,
    632     action,
    633     depth + 1,
    634     copy.copy(depth_context),
    635     lateral_context,
    636     options,
    637 )
    638 assert isinstance(outcontent, tuple)
    639 parameters = parameters_factory(nextparameters, len(outcontent))

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:1004, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options)
   1002     return result
   1003 elif result is None:
-> 1004     return continuation()
   1005 else:
   1006     raise AssertionError(result)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:973, in apply_step.<locals>.continuation()
    971 # Any non-string list-types?
    972 elif any(x.is_list and not is_string_like(x) for x in contents):
--> 973     return broadcast_any_list()
    975 # Any RecordArrays?
    976 elif any(x.is_record for x in contents):

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:693, in apply_step.<locals>.broadcast_any_list()
    690         nextinputs.append(x)
    691         nextparameters.append(NO_PARAMETERS)
--> 693 outcontent = apply_step(
    694     backend,
    695     nextinputs,
    696     action,
    697     depth + 1,
    698     copy.copy(depth_context),
    699     lateral_context,
    700     options,
    701 )
    702 assert isinstance(outcontent, tuple)
    703 parameters = parameters_factory(nextparameters, len(outcontent))

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_broadcasting.py:986, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options)
    979     else:
    980         raise ValueError(
    981             "cannot broadcast: {}{}".format(
    982                 ", ".join(repr(type(x)) for x in inputs), in_function(options)
    983             )
    984         )
--> 986 result = action(
    987     inputs,
    988     depth=depth,
    989     depth_context=depth_context,
    990     lateral_context=lateral_context,
    991     continuation=continuation,
    992     backend=backend,
    993     options=options,
    994 )
    996 if isinstance(result, tuple) and all(isinstance(x, Content) for x in result):
    997     if any(content.backend is not backend for content in result):

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_connect/numpy.py:415, in array_ufunc.<locals>.action(inputs, **ignore)
    410 parameters = functools.reduce(
    411     parameters_intersect, (c._parameters for c in contents)
    412 )
    414 input_args = [x.data if isinstance(x, NumpyArray) else x for x in inputs]
--> 415 result = backend.nplike.apply_ufunc(ufunc, method, input_args, kwargs)
    417 if isinstance(result, tuple):
    418     return tuple(
    419         NumpyArray(x, backend=backend, parameters=parameters)
    420         for x in result
    421     )

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_nplikes/array_module.py:206, in ArrayModuleNumpyLike.apply_ufunc(self, ufunc, method, args, kwargs)
    203     raise NotImplementedError
    205 if hasattr(ufunc, "resolve_dtypes"):
--> 206     return self._apply_ufunc_nep_50(ufunc, method, args, kwargs)
    207 else:
    208     return self._apply_ufunc_legacy(ufunc, method, args, kwargs)

File ~/software/dev/mammoth/.venv-3.11/lib/python3.11/site-packages/awkward/_nplikes/array_module.py:222, in ArrayModuleNumpyLike._apply_ufunc_nep_50(self, ufunc, method, args, kwargs)
    220 # Resolve these for the given ufunc
    221 arg_dtypes = tuple(input_arg_dtypes + [None] * ufunc.nout)
--> 222 resolved_dtypes = ufunc.resolve_dtypes(arg_dtypes)
    223 # Interpret the arguments under these dtypes, converting scalars to length-1 arrays
    224 resolved_args = [
    225     cast("ArrayLikeT", self.asarray(arg, dtype=dtype))
    226     for arg, dtype in zip(args, resolved_dtypes)
    227 ]

TypeError: Provided dtype must be a valid NumPy dtype, int, float, complex, or None.

In [7]: ak.__version__
Out[7]: '2.4.9'

In [8]: import numpy as np

In [9]: np.__version__
Out[9]: '1.26.1'

In [10]:

I'm a little rusty with this codebase at the moment, but I'm fairly confident this is a regression within awkward. Thanks in advance!

Edit: just for completeness, I'm using the obvious workaround of moving the comparison to after the conversion to numpy: np.count_nonzero(np.asarray(ak.flatten(arr == True, axis=None))) -> np.count_nonzero(np.asarray(ak.flatten(arr, axis=None)) == True)

Footnotes

  1. I know I could just flatten and count_nonzero in this case, but sometimes it is conceptually cleaner to make the comparison explicit, and sometimes the check is against False. In any case, this comparison should work, and it worked previously.

raymondEhlers added the "bug (unverified)" label Nov 8, 2023
agoose77 added the "bug" label and removed the "bug (unverified)" label Nov 8, 2023
@agoose77
Collaborator

agoose77 commented Nov 8, 2023

Thanks for the bug report! This is indeed a bug in Awkward. Let me take a look.

@agoose77
Collaborator

agoose77 commented Nov 9, 2023

Closed by #2810

@agoose77 agoose77 closed this as completed Nov 9, 2023
@raymondEhlers
Contributor Author

Thank you for such a quick fix!

@agoose77
Collaborator

agoose77 commented Nov 9, 2023

Also, note that you can invoke NumPy reducers directly on an Awkward Array, e.g. count_nonzero.
