-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Optimize np.empty
for scalar arguments
#20175
Conversation
codecoverage seems to think this needs some tests for the error cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good, I made a few comments about smaller refactoring and as matti said, the test coverage must be improved for the error paths.
} | ||
else { | ||
for (i = 0; i < PyArray_MIN(nd,maxvals); i++) { | ||
op = PySequence_GetItem(seq, i); | ||
op = PySequence_Fast_GET_ITEM(seq, i); | ||
if (op == NULL) { | ||
return -1; | ||
} | ||
|
||
vals[i] = PyArray_PyIntAsIntp(op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you add the PyArray_IntpFromScalar
helper, this path must also use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at 7c8fac3
@@ -1020,6 +1044,26 @@ PyArray_IntpFromIndexSequence(PyObject *seq, npy_intp *vals, npy_intp maxvals) | |||
return nd; | |||
} | |||
|
|||
NPY_NO_EXPORT npy_intp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this static npy_intp
and move it before where it is used. Then you can remove it from the header since it is only used in this file (in fact, you could rename it not follow the PyArray_CameCase
pattern, but that doesn't matter.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, make this function return the value instead of returning an error indicator. So that it is practically the same as PyArray_PyIntAsIntp
(just with this error replacement just in case).
You could even use it static NPY_INLINE npy_intp
if you want to tell the compiler to inline this, but I doubt it matters it will probably inline it in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at 7acb660
} | ||
seq->len = len; | ||
|
||
nd = PyArray_IntpFromScalar(obj, (npy_intp *)seq->ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why nd
, that sounds like dimensions. Just call it something else, like value
and move the definition down here (The code used to be C89 where definitions could only be at the start of the scope, but this is not needed anymore).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at daa8b30
if(vals[0] == -1) { | ||
err = PyErr_Occurred(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use error_converting(value)
instead. That is equivalent to value == -1 && PyErr_Occurred()
. And then just use PyErr_ExceptionMatches()
.
(I am assuming you removed vals
here :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at f12f3f7
return NPY_SUCCEED; | ||
} | ||
|
||
PyObject *seq_obj = PySequence_Fast(obj, "PySequence_Fast"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string gets printed as error message, and this doesn't seem good. Maybe look at what the message reads and decide if you can customize it well. Maybe you do not even need that replacement below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at aceb953
PyObject *seq_obj = PySequence_Fast(obj, "PySequence_Fast"); | ||
if (seq_obj == NULL) { | ||
PyErr_SetString(PyExc_ValueError, | ||
"An error occurred during a call to PySequence_Fast"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might not need to replace the error here. But if you end up doing it, you should check what it was before and make sure it is useful for users. Something that tells the user that an integer or sequence is required, but it was XY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at aceb953
4ff7e29
to
aceb953
Compare
I split the work needed to address the review in several commits in order to facilitate the process, I'm going to squash everything after my changes are approved. |
np.empty
for scalar argumentsnp.empty
for scalar arguments
@seberg Is the coverage fine now? If not, where should I look for untested lines of code? |
If you go to the "Files changed" tab, you will see annotations. The one function is only public API, so it is not tested (we could test it by adding something to |
Thank you @seberg, I'm actually having some difficulties understanding where and how new tests should be inserted, how I can run them on my machine (are they translated to Python tests somehow?). Is there a reference on the NumPy contributing guide which I still haven't found? Or maybe some resource online to understand how the testing system works. |
Probably not, when it comes to those things the code is all we have. Matti just added a new system, but lets stick with the old one here. What you would do is copy a function in There is a list of exported functions (which can be imported in Python) at the very end. You could duplicate one that is defined as
(paraphrasing, I have not looked at the code.) Finding a place in Python to add the actual test is probably a bit annoying. For lack of a better spot, I guess lets use (The brief abridged version, if that doesn't add up, please just give me another ping.) |
1926284
to
1d33b1d
Compare
Thank you @seberg, I added two minimal tests and it looks like the coverage is increased. Let me know if there's something missing. I noticed that there are still several untested lines, should I try to cover them? If yes, I think I need some assistance on the following aspects:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Seems code coverage is great now. I made some comments, I think the helper function would be better to work on a signle value. I do have to look a bit closer at the logic (whether PySequence_Check
is absolutely safe, or we should go back to a try/except style).
If you get stuck/fed up iterating, let me know and I can do some fixups for you.
@@ -125,27 +196,34 @@ PyArray_IntpConverter(PyObject *obj, PyArray_Dims *seq) | |||
if (len < 0) { | |||
PyErr_SetString(PyExc_TypeError, | |||
"expected sequence object with len >= 0 or a single integer"); | |||
Py_DECREF(seq_obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PySequence_Fast_GET_SIZE
will never fail. You can remove the whole < 0
and == -1
blocks. Any failure will have occurred in PySequence_Fast
already.
We may have to think briefly whether we have to fall back to the above branch if PySequence_Fast(...)
fails, so that it is a bit like a: try: list(obj); except: int(obj)
path in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why this line is not covered, is that is is impossible to cover it :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the redundant checks in 3a85561.
We may have to think briefly whether we have to fall back to the above branch if PySequence_Fast(...) fails
Which are the conditions under which PySequence_Fast
might fail in this case?
The reason why this line is not covered, is that is is impossible to cover it :).
Got it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PySequence_Fast
should fail exactly in the same cases as list(obj)
failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this check sufficient to ensure that PySequence_Fast
does not fail?
if (PyLong_CheckExact(obj) || !PySequence_Check(obj)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, for one, PySequence_Check
only checks very quickly if it should be a squence. But sequences could be badly implemented and only error when you get the elements.
The one problem that might occur is to try the sequence conversion first. Then we will still try the int(seq)
path for broken sequences. This is probably actually important! Could you add a test for np.array(3)
being accepted?
seq = NULL
if not Long_CheckExact and PySequence_Check:
seq = ...
if seq == NULL:
PyErr_Clear() # ignore the error.
if seq == NULL:
# integer path
else:
# sequence path
A bit annoying dance, but I think it ensures we don't accidentally change more behaviour than anticipated. The PyErr_Clear()
makes it a try/except
clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please @seberg have a look at 6da32a8. I implemented the structure you proposed for PyArray_IntpConverter
.
For some reason now the following test fails (everything else is fine):
def test_none(self):
# once the warning expires, this will raise TypeError
with pytest.warns(DeprecationWarning):
> assert self.conv(None) == ()
E assert None == ()
E + where None = <built-in function run_intp_converter>(None)
E + where <built-in function run_intp_converter> = <numpy.core.tests.test_conversion_utils.TestIntpConverter object at 0x7fda6307ecd0>.conv
self = <numpy.core.tests.test_conversion_utils.TestIntpConverter object at 0x7fda6307ecd0>
numpy/core/tests/test_conversion_utils.py:190: AssertionError
From what I understand, this case should be handled in:
numpy/numpy/core/src/multiarray/conversion_utils.c
Lines 120 to 128 in 6da32a8
if (obj == Py_None) { | |
/* Numpy 1.20, 2020-05-31 */ | |
if (DEPRECATE( | |
"Passing None into shape arguments as an alias for () is " | |
"deprecated.") < 0){ | |
return NPY_FAIL; | |
} | |
return NPY_SUCCEED; | |
} |
But it looks like something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyObject *seq_obj = PySequence_Fast(obj, | ||
"An error occurred during a call to `PySequence_Fast`, an iterable " | ||
"object is required. The given object is not a scalar integer nor a " | ||
"sequence."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message (and also below) is not user-friendly. Maybe check what it is, and what it was. If we end up clearing it (due to a try/except), you can leave it empty I think. Otherwise, it should be something like expected an integer or a sequence of integers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 12c9b4e
PyObject *seq_obj = PySequence_Fast(seq, | ||
"An error occurred during a call to `PySequence_Fast`, an iterable " | ||
"object is required. The given object is not a scalar integer nor a " | ||
"sequence."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this is fine, but the error message is not helpful. Maybe the default is even OK (you could compare what it was before and after for an invalid object?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 12c9b4e.
I was not able to recover the default error message since PySequence_Fast()
strongly requires the message, which means that passing NULL
results in a segmentation fault, and passing ""
results in numpy/core/tests/test_multiarray.py:526: TypeError
, which does not contain any error message.
I replaced my message with yours (slightly modified).
} | ||
seq->len = len; | ||
|
||
npy_intp rvalue = intp_from_scalar(obj, (npy_intp *)seq->ptr, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rewrite this so that it reads:
seq->ptr[0] = intp_from_scalar(obj);
if (error_converting(seq->ptr[0]) {
npy_free_cache_dim_obj(*seq);
seq->ptr = NULL; /* I would think this is unnecessary, but keep it if it was there? */
return NPY_FAIL;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. I am thinking that you should simplify intp_from_scalar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought for some minutes about the fact that intp_from_scalar
was too verbose (to call, and also in its implementation), but then I decided to stick with the "format" of the other functions.
Fixed in 8dc0f00
"object is required. The given object is not a scalar integer nor a " | ||
"sequence."); | ||
if (seq_obj == NULL) { | ||
return NPY_FAIL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the cleanup:
npy_free_cache_dim_obj(*seq);
(and possibly setting ptr
to NULL, but I don't think it should be necessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 1b16eb5
d549b0f
to
5cbde7f
Compare
Thank you @seberg! This is actually my first time using Python and C together (also, my experience with C is negligible), therefore your reviews are very useful. |
Hi @seberg, just wanted to notify you that I think I pushed all the changes you required yesterday (addition of a test for Let me know what you think and if there is something more I can do :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is coming along nicely. A few style nits that apply to some unmarked places also. The test is not quite what I had in mind. Otherwise, my only concern is that I think that public-API function also should get the "try/except" logic.
In any case, I love how much simpler PyArray_IntpFromIndexSequence
is now.
"Expected an integer or an iterable/sequence of integers."); | ||
if (seq_obj == NULL) { | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we invert this here as well? To make it "try/except style"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure since in PyArray_IntpFromSequence
we do not need to reserve memory "manually". I'm going to refactor this as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9528716, I also introduced two tests with an overflow error for more extensive checking.
numpy/core/tests/test_multiarray.py
Outdated
@@ -449,6 +449,10 @@ def test_array(self): | |||
def test_array_empty(self): | |||
assert_raises(TypeError, np.array) | |||
|
|||
def test_0d_array(self): | |||
assert np.array(3) == 3 | |||
assert np.array(10).shape == () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my "BadSequence" test is probably not necessary, then, I guess. This is not what I was going for ;):
np.ones(np.array(3)).shape == (3,)
is what I was going for. The reason is that list(np.array(3))
fails, I expect PySequence_Check(np.array(3))
is True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing np.array(3)
in test_scalar_intp_converter
and test_scalar_intp_from_sequence
would be perfect (and more than sufficient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tentative fix in 5d11ffc, what do you think?
My doubt is that testing np.array(3)
is not suitable to be inserted inside test_scalar_intp_converter
or test_scalar_intp_from_sequence
since only one path between the two possible paths (PyArray_IntpFromSequence
or PyArray_IntpConverter
) is followed, the other test is redundant (and wrongly located in some sense). Am I missing something?
numpy/core/tests/test_multiarray.py
Outdated
assert test_scalar_intp_from_sequence(10) == 10 | ||
assert test_scalar_intp_from_sequence(-1) == -1 | ||
|
||
assert_raises(TypeError, test_scalar_intp_from_sequence, 'foo') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these tests look exactly the same, so you could move the import to the stop, and then rewrite them as:
@pytest.mark.parametrize("converter", [test_scalar_intp_converter, test_scalar_intp_from_sequence])
But I guess if you end up changing/extending, it is just as well to keep it as two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm just waiting to see the final form of test_scalar_intp_converter
and test_scalar_intp_from_sequence
a30885c
to
6180fb6
Compare
Hi @seberg, thank you again for your review, I think I addressed all your suggestions (with several questions) EDIT:
EDIT2: |
a4e1dc6
to
9b25d6d
Compare
Depending on your system, a few things you could try to debug is:
If everything fails, we can run the (non windows) jobs using GDB in CI also. EDIT: The devdocs do have a bit of information about running in gdb: https://numpy.org/devdocs/dev/development_environment.html#debugging |
Thank you @seberg, I used I managed to discover that the problem disappears if I comment |
Sorry, my bad. I keep forgetting that |
8af97a4
to
9b25d6d
Compare
Sure! I cleared away my experiments from the branch |
471b460
to
98dd1cb
Compare
Also refactored the tests a bit and tweaked the error message. Someone else may want to have a look, but I think it should be good to go. The most common error message here is now: np.ones(object()) TypeError: expected a sequence of integers or a single integer, got '<object object at 0x7f90d733c770>' This could also make sense with exeption chaining, but I am not sure it is worth the trouble. The error type should never change, only the message itself. The code here effectively uses |
Thank you @seberg, unfortunately there is a segmentation fault in one of the GH checks (function |
You uncovered an unsupported feature in PyPy: it does not like format widths. Please skip the crashing test on that platform with a See PyPy issue 3593. |
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
… improve error message in PyArray_IntpFromSequence
…ch the PyArray_IntpConverter one. overflow tests
Largely just my personal style preferences. Other small changes: * `PyErr_Clear()` is not necessary before a `PyErr_Set...` * Rename testing functions, testing function starting with `test_` and imported top-level would get executed, just a possible cause of confusion. * Removed/shortened/rephrased a few comments, nothing particular and probably very subjective. * Use `PyLong_FromLongLong` instead of `FromLong`. The former is technically incorrect on windows (and may cause compiler warnings) This should be always correct...
This modifies the error messages to give a more clear error when an invalid object is passed in (neither a sequence of ints nor an integer). The message is now: expected a sequence of integers or a single integer, got '<repr>' Moves tests, expand a little, and use parametrization.
84803b9
to
2d9283f
Compare
Rebased away the pypy related fixups, since I think main is good now. Planning on squash merging this soon, maybe just later today. I think it is a definite improvement of both speed and code. Thanks @fAndreuzzi. |
My bad, the Pypy issue is only fixed in the next release of PyPy (3.7.8), so we probably might as well wait for that. |
Thank you very much @seberg for the kind support and detailed code reviews, I learned a lot from this PR.
Looking forward to see my code in the main branch! |
Hi @seberg, just out of curiosity, how long do you take it's going to take to see the fix to the bug in PyPy published? |
I dunno, before the next release for sure. In any case, you could also skip the tests as we did in gh-20580. |
Not sure if our PyPy dependency got bumped, so close/reopen just to see... |
OK, I marked all the tests that fail on PyPy now. We could of course just bump the CI PyPy as well, but I guess it also just doesn't hurt to have those skips for now (we already have the similar skips anyway). Going to finally merge once CI is done. Thanks again. |
Thank you for your help @seberg |
This PR includes the modifications required to fix #19010, plus several additions discussed in the issue, like the usage of
PySequence_Fast
and the corresponding family of functions.