ENH: Deprecation for invalid fancy index sequences. #2825

Closed
wants to merge 3 commits into from

3 participants

@dwf

This raises a DeprecationWarning when a non-array fancy index sequence a) is not a sequence of booleans b) contains something not castable to npy_intp (i.e. a float). The way it gets there is a bit shaky, but it works. I'd like to solicit feedback on the best way to do this before merging.

What this does do, so far, is add tests, and also identify where the changes need to be made. A better implementation of _verify_index is definitely possible and probably preferable.

dwf added some commits Dec 15, 2012
@dwf dwf ENH: Deprecation for invalid fancy index sequences.
This raises a `DeprecationWarning` when a non-array fancy index
sequence a) is not a sequence of booleans b) contains something
not castable to npy_intp (i.e. a float). The way it gets there
is a bit shaky, but it works. I'd like to solicit feedback on
the best way to do this before merging.
dd82d52
@dwf dwf TST: test for new fancy index deprecations. 9b11d07
@dwf dwf ENH: Only call _verify_index if not an array.
The array case is already handled correctly via an exception.
4699a1b
@seberg
NumPy member

Just before I forget it again, maybe we could add at least a warning for boolean indexes with a size that is not compatible to the underlying array?

@dwf

@seberg Can you give an example of what you mean?

@seberg
NumPy member
>>> a = np.arange(10).reshape(5,2)
>>> a
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> index = np.array([[True, False, True, False, False, False, False, False]]).T
>>> index
array([[ True],
       [False],
       [ True],
       [False],
       [False],
       [False],
       [False],
       [False]], dtype=bool)
>>> a[index]
array([0, 4])

There is no broadcasting (maybe fine, it would need to align left and not right as normal broadcasting anyway), but due to just using nonzero basically, it fills up with False (here second dimensions) and silently strips any False (along the first dimension) at the end. There is the special case of 1-d boolean index of equal size (ravelling of a), but other then that maybe IMO it would be nice to:

  1. Warn about shape mismatches
  2. Maybe deprecate shape mismatches
  3. even larger maybe: Do something like broadcasting (but not exactly the same, so that is a downer), so that it repeats the boolean array along such an axis (if it has a shape of 1 along that axis) instead of filling it with False.
@dwf

Yikes. This deserves its own ticket, I think, especially since boolean indexing is handled in different code paths to this issue.

@seberg seberg commented on the diff Dec 17, 2012
numpy/core/src/multiarray/mapping.c
+ &ndim, dims, &arr, NULL);
+ if (ret) {
+ ret = -1;
+ goto end;
+ }
+
+ ret = 0;
+ if (dtype == NULL) {
+ /*
+ * This means PyArray_GetArrayParamsFromObject went and
+ * allocated us a (potentially large) array. Isn't there
+ * a way to just get a safe dtype?
+ */
+ dtype = PyArray_DESCR(arr);
+ }
+ if ((!PyTypeNum_ISINTEGER(dtype->type_num)) &&
@seberg
NumPy member
seberg added a note Dec 17, 2012

I did not try, but empty lists are not considered floating point here? Should maybe be tested explicitly. Also did you try what happens to scalar indices (ie. 0.0 when fancy indexing occurs, I somewhat think that case was not handled by the previous patch because of different execution paths?

@dwf
dwf added a note Dec 17, 2012

Not sure what happens to scalar indices and slices, for that matter, when fancy indices are present. Good point. Will investigate, and also check the empty case.

@dwf
dwf added a note Dec 17, 2012

Empty case raises the error, you're right. Needs an explicit test (though honestly, why are you indexing to get an empty array in the first place ಠ_ಠ yes yes, I know there are some legitimate uses).

Figured out a few more places where the scalar case should have a warning added.

@seberg
NumPy member
seberg added a note Mar 2, 2013

Just to note, I don't think you need to worry about the scalar cases, if we go that direction, they should be handled already further down by gh-2891. So my guess is this needs very little changes to be ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris
NumPy member

@dwf What is the status of this?

@charris
NumPy member

@dwf Needs a rebase.

@charris
NumPy member

@seberg How much of this is covered by your work?

@seberg
NumPy member

Only any existing scalar special handling would be handled by that. Fancy indexing is not touched by the integer stuff.

@seberg
NumPy member

Closing this, since it should eventually superseded by index rewrite.

@seberg seberg closed this Sep 27, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment