Deprecate non integer function arguments #2891

Closed
wants to merge 6 commits into
from

Conversation

Projects
None yet
5 participants
Member

seberg commented Jan 7, 2013

These changes make numpy generally use the index machinery for converting
of integers (almost) everywhere, and deprecates the use of floating point
numbers.

This has many changes, so here is an overview:

  1. Deprecate use of booleans, floats or high dimensional arrays
    integers, where the high dimensional array part is due to
  2. deprecating array.index() if array.ndim != 0
  3. No attempt is made at giving a function specific meaningful
    DeprecationWarning. IE. indexing or np.reshape all give the same
    warnings. For the case of raised DeprecationWarnings, no attempt
    is made to show it. Indexing will raise an Index error instead
    (so you get the error you would get when the deprecation process is
    done in these cases)
  4. Because it is changed so deep down (and hard to pull apart in indexing)
    espeacially there it is possible to get multiple warnings for one
    command. arr[0.0,:] (for a 2-d array) will even give two warnings.
  5. I tried around a bit what is fastest, the cases where PyIntAsInt*
    were not used may have been a bit faster. Using index is actually
    much faster for arrays and array scalars.
  6. This fixes some occurances where integer division was not yet used
    and errors were created in Py3k

The biggest discussion maybe should be about point 3. Should we attempt to control the number of warnings and try to raise the original DeprecationWarning instead of basically giving the error that would occur after the deprecation is finished? The other thing would be maybe point 2, but honestly this is such an esoteric thing that I cannot believe anyone relies on it in itself, and other then that it leads to deprecating the use array([4]) for an integer in indexing and with this inside numpy.

Member

seberg commented Jan 8, 2013

OK, I think this is actually now far enough that it can be looked at closer. I personally wonder a bit about __int__ and higher dimensional arrays, but other then points 2-4 above may need a bit of discussion. The macros switching between npy_long/npy_longlong need some double checking...

Massive amount of tests and documentation is of course still needed, if this looks OK for starters.

numpy/core/tests/test_deprecations.py
import warnings
from nose.plugins.skip import SkipTest
import numpy as np
from numpy.testing import dec, run_module_suite, assert_raises
+from numpy.testing.utils import WarningManager
@charris

charris Jan 9, 2013

Owner

Still need to import run_module_suite.

numpy/core/tests/test_deprecations.py
@@ -72,29 +129,20 @@ class TestFloatScalarIndexDeprecation(object):
I interpret this to mean 2 years after the 1.8 release.
"""
+ message = "using a non-integer number instead of an integer will result in an error in the future"
@charris

charris Jan 9, 2013

Owner

Line is too long.

numpy/core/tests/test_deprecations.py
@@ -145,20 +196,12 @@ class TestFloatSliceParameterDeprecation(object):
I interpret this to mean 2 years after the 1.8 release.
"""
+ message = "using a non-integer number instead of an integer will result in an error in the future"
@charris

charris Jan 9, 2013

Owner

Another long line. Not sure of the best way to specify these long strings, could maybe shorten this one by dropping "instead of an integer" and replace "result in" by "raise".

Owner

charris commented Jan 9, 2013

Poor old mapping.c, it is going to get harder to merge that previous PR. How much does this do in addition to DWF's PR merged a while back?

Owner

charris commented Jan 9, 2013

I think you could pull out the refactoring of test_deprecations.py as a separate PR.

Member

seberg commented Jan 9, 2013

@charis the changes here basically undo DWFs changes :(, because it already uses PyArray_PyIntAsIntp everywhere... which causes point 3/4 noted above, the warning for indexing is not specific for indexing any more, and when raising an error you actually don't get a DeprecationWarning but the replaced error which also means that test_deprecations.py will cause failures if not changed at the same time. But maybe I could change that, make a deprecation flag version, and then keep DWF's more specific solution for that.

There is one more point I am still a little unsure about. PyArg_Parse* does not use indexing machinery (numpy uses it in some places too) but only checks that it is not a float. But that allows things like np.array(3.), though np.float64(3) is a float so it is not a big deal (cython is also as forgiving as numpy was for these things). Personally I somewhat think PyArg_Parse* should maybe use indexing, but they may have a good reason for not changing it. Though probably it doesn't make sense to worry too much about this stuff, heh...

Contributor

dwf commented Jan 9, 2013

Point 3 & 4 above is basically that there are multiple errors getting
raised at once, so it's hard to test the DeprecationWarning? As I recall,
@charris, you seemed to think that was a pretty fine point and removed some
tests of that flavour. Is it the case that there's no way to trigger the
thing you want to test without triggering another error that overwrites it?

On Wed, Jan 9, 2013 at 5:28 AM, seberg notifications@github.com wrote:

@charis https://github.com/charis the changes here basically undo DWFs
changes :(, because it already uses PyArray_PyIntAsIntp everywhere...
which causes point 3/4 noted above, the warning for indexing is not
specific for indexing any more, and when raising an error you actually
don't get a DeprecationWarning but the replaced error which also means that
test_deprecations.py will cause failures if not changed at the same time.
But maybe I could change that, make a deprecation flag version, and then
keep DWF's more specific solution for that.

There is one more point I am still a little unsure about. PyArg_Parse_does
*not_ use indexing machinery (numpy uses it in some places too) but only
checks that it is not a float. But that allows things like np.array(3.),
though np.float64(3) is a float so it is not a big deal (cython is also
as forgiving as numpy was for these things). Personally I somewhat think
PyArg_Parse* should maybe use indexing, but they may have a good reason
for not changing it. Though probably it doesn't make sense to worry too
much about this stuff, heh...


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/2891#issuecomment-12038706.

Member

seberg commented Jan 9, 2013

@dwf you can use the WarnigManager (same as warnings.catch_warnings context manager), to test the warnings themself when they are not raised. I am not sure what you mean with the removed tests, but I did not follow that at the end.

Owner

rgommers commented Jan 9, 2013

Since we dropped 2.4/2.5 compatibility, just use catch_warnings.

+ /* Be a bit stricter and not allow bools */
+ if (PyBool_Check(o)) {
+ if (DEPRECATE("using a non-integer number instead of an integer"
+ " will result in an error in the future") < 0) {
@njsmith

njsmith Jan 9, 2013

Owner

"non-integer number" is a strange way of spelling "bool".

Owner

njsmith commented Jan 9, 2013

So am I understanding correctly:

  • PyArray_IntpConverter: this is the function for converting and normalizing axis= arguments (except for the ones that allow multiple axes to be specified together)
  • PyArray_PyIntAsInt: this is a broken function that no-one should use (but that is exported in the API)
  • PyArray_PyIntAsIntp: this is the function for converting Python integer indices into C indices (but not other sorts of integers, just indices), and uses the rule: an index is valid IF Python considers it an index (via __index__), UNLESS it is bool

?

(These are terribly misleading names if so. They seem to imply that the contract they implement is about ints, when in fact its about specific sorts of indices. I know we can't directly change the exported names, but can we at least run a search-replace over the source to use better names internally? And deprecate PyIntAsInt entirely? AND WHY DOES IntpConverter RETURN AN int?!?! Sorry, sorry, deep breath.)

Member

seberg commented Jan 9, 2013

Well, maybe this PR goes a little far. But this is not confined to indexing. As most functions after this would indirectly point at PyArray_PyIntAsIntp this also deprecates it the same for shapes, strides and axes (I had a quick search and cannot find any function using it for something else). Though some lone functions use the python PyArg_Parse* that converts with a "not float" rule (and thus allows float arrays as they are not python floats), but that could be changed. I personally think that the index machinery is good for these cases as well (and somewhat disagree with python). As for the boolean check, I can see that removed and maybe check it more specifically for indexing, since python does that also you can safely cast bools to int. Though in indexing boolean arrays are so special that treating bools like ints does not feel quite right.

Your point about IntpConverter using ints (the return is fine, its just a flag, but then the sequence function is the same) should not be a problem, since it is (in numpy) only used for axes/shape, etc. and numpy does not support that many axes, even if I actually doubt anything bad happens, but should double check. IntpConverter is the function for any number of axes, there is a second one for a single axis argument, yes.

I emptied PyArray_PyIntAsInt because I disliked the code duplication... outside of conversion_utils there are two places where it gets called, and both have to do with offsets into data types (it does seem like numpy does not support dtypes larger then int, though?), and I pointed the things like Axis conversion to it, since there it certainly is OK.

Owner

njsmith commented Jan 9, 2013

Sorry, I'm not saying it goes too far, just trying to make sure I understand what it does and the reasons. I agree that shapes, strides, axis, and integer indices should all use the same conversion rules (after all, they are all, at some level, indices!).

Regarding C types for reference:

  • intp and int are often different (all modern 64-bit machines have 32-bit int, but on such machines intp is 64-bits by definition). On OS X and Linux, long and intp are always the same (either 32 or 64 bit), but even this breaks down on Win64 (32 bit long, 64 bit intp). So we have always have to be careful about which ones we mean.
  • shape, stride, and array indices are by definition intp's (and have to be, to support >4GiB arrays on 64-bit architectures).
  • axis numbers don't really have any specific type, since they're not stored inside our C types, and they're limited to the range [0, 32] anyway, so it doesn't much matter what gets used.

Looking more closely at IntpConverter, I see I misread -- I guess what it does is take this PyArray_Dims struct (which is an ad hoc length-recorded vector-of-intp's?), and fill it in? It should still probably be called AxesConverter or something then?

A quick grep suggests that it's used for setting strides in at least one place, though. So using PyIntFromInt is definitely a bug, strides can be >4GiB. And maybe a better name would be IntpSetConverter or something. (Or I guess DimsConverter, except the PyArray_Dims name is also confusing in that it doesn't hold a set of dimensionalities or anything like that. I guess Dim is supposed to be a synonym for axis? Synonyms are bad style...)

I'm happy to see code duplication removed, but PyIntAsInt literally has no useful uses, right? Despite the name it's only designed for converting indices (broadly defined), not arbitrary Python integers, but our indices either must be intp's (shape, strides, index), or else it doesn't matter but we always use intp's by convention (axis, via PyArray_Dims). So it's just a land mine we've helpfully left for ourselves to trip over later.

Member

seberg commented Jan 10, 2013

I don't know the intent for the functions, but in the context of numpy (almost?) all integers are indexing like or values, and values are never converted like this as they could be any type (though personally I would not even mind if these stricter rule slipped into something else).
I agree PyIntAsInt is a bit of a mine, but it seems one in the API as well? As for the uses of PyIntAsInt in descriptor.c, these seem to be strides inside a datatype, and the itemsize is int and not intp? So while it looks dangerous it may actually be ok here, but yes the correct use of PyIntAsInt inside numpy is certainly extremely limited.

Member

seberg commented Jan 17, 2013

Hmmm, there is one usability issue here maybe... Everyone will have to write 10**2 instead of 1e2 when creating new arrays (empty, random ones, etc.). I don't think its a huge deal as such, but I would expect people being annoyed by it, especially if they don't realize that 10**2 is an alternative.

Owner

njsmith commented Jan 17, 2013

Well, and it might break old code in somewhat silly ways. One option would
be to allow floats which are exactly integers as indices. But I don't
know... That then creates the situation where if you do a calculation that
mathematically should produce an exact integer, then in floating point out
may or may not. And that means you can have code that works perfectly on
simple test cases, but errors out sporadically on real data, which is
pretty horrible.

We should put it out on the list, I guess, but I'm inclined to say that the
current patch's behaviour is correct, especially since for now it's just
creating deprecations and there's time to change our minds if it causes
real problems. The patch should definitely add a very explicit note to the
release notes pointing this effect put though.
On 17 Jan 2013 08:43, "seberg" notifications@github.com wrote:

Hmmm, there is one usability issue here maybe... Everyone will have to
write 102 instead of 1e2 when creating new arrays (empty, random ones,
etc.). I don't think its a huge deal as such, but I would expect people
being annoyed by it, especially if they don't realize that 10
2 is an
alternative.


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/2891#issuecomment-12376404.

seberg added some commits Apr 13, 2013

MAINT: adept divisions for truedivide
Following deprecations would cause problems otherwise.
API: Deprecating the use of non-integers for indices arguments
This changes the conversion utils to give deprecations for a all
non-integers (this currently includes python bools). The biggest
change is PyArray_PyIntAsIntp in which the deprecation is done.
Some other conversions are then also pointed to it.

Uses the Index machinery even for numpy types, which is faster
then the current code.
API: Deprecate __index__ for ndim > 0
For example NumPy indexing treats np.ones(()) very differently from
np.ones((1,)). It seems a bad idea to allow __index__ for arrays that
are not 0-d, as they cannot always be safely interpreted as integers.
MAINT: Remove non-integer deprecations which are no in PyIntAsIntp
Also removed old (commented) macro, the use of PyNumber_Index is
now actually implemented in PyIntAsIntp and not necessary here.
Member

seberg commented Apr 13, 2013

Created a new branch for rebasing, so this is now in gh-3243.

@seberg seberg closed this Apr 13, 2013

@seberg seberg deleted the seberg:deprecate-non-integer-arguments branch Apr 13, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment