Add support for np.asarray_chkfinite [WIP] [need help] #6279

rishabhvarshney14 · 2020-09-23T20:05:18Z

Pull request to add support for np.asarray_chkfinite from issue #4074
According to NumPy's documentation, asarray_chkfinite takes three arguments: the first is the array, and the other two, dtype and order, are optional.
When I use order with np.asarray in the impl function:

def impl(a, dtype=None, order='C'):
    a = np.asarray(a, dtype=dt, order=order)
    if not np.all(np.isfinite(a)):
        raise TypingError("array must not contain infs or NaNs")
    return a

it gives the following error

ERROR: test_asarray_chkfinite (numba.tests.test_np_functions.TestNPFunctions)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "numba\numba\tests\test_np_functions.py", line 3757, in test_asarray_chkfinite
    got = cfunc(*pair)
  File "numba\numba\core\dispatcher.py", line 414, in _compile_for_args
    error_rewrite(e, 'typing')
  File "numba\numba\core\dispatcher.py", line 357, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function asarray_chkfinite at 0x000001BE3BF74488>) found for signature:

 >>> asarray_chkfinite(array(int32, 1d, C), class(float32), unicode_type)

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'np_asarray_chkfinite': File: numba\np\arraymath.py: Line 4162.
    With argument(s): '(array(int32, 1d, C), class(float32), unicode_type)':
   Rejected as the implementation raised a specific error:
     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
   No implementation of function Function(<function asarray at 0x000001BE3BE01950>) found for signature:

    >>> asarray(array(int32, 1d, C), dtype=class(float32), order=unicode_type)

   There are 2 candidate implementations:
         - Of which 2 did not match due to:
         Overload in function 'np_asarray': File: numba\np\arraymath.py: Line 4018.
           With argument(s): '(array(int32, 1d, C), dtype=class(float32), order=unicode_type)':
          Rejected as the implementation raised a specific error:
            TypeError: np_asarray() got an unexpected keyword argument 'order'
     raised from numba\numba\core\typing\templates.py:710

   During: resolving callee type: Function(<function asarray at 0x000001BE3BE01950>)
   During: typing of call at numba\numba\np\arraymath.py (4174)


   File "numba\np\arraymath.py", line 4174:
       def impl(a, dtype=None, order='C'):
           a = np.asarray(a, dtype=dt, order=order)
           ^

  raised from numba\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<function asarray_chkfinite at 0x000001BE3BF74488>)
During: typing of call at numba\numba\tests\test_np_functions.py (324)


File "numba\tests\test_np_functions.py", line 324:
def np_asarray_chkfinite(a, dtype=None, order='C'):
    return np.asarray_chkfinite(a, dtype, order)
    ^

----------------------------------------------------------------------
Ran 1 test in 0.251s

FAILED (errors=1)

The implementation of np.asarray in arraymath.py does not accept an order argument.
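For reference, the NumPy behaviour being ported can be sketched as follows. This is plain NumPy run outside Numba, shown only to illustrate the target semantics; the array values are made up for the example.

import numpy as np

# A finite input is converted like np.asarray, honouring the optional dtype.
a = np.array([0, 1, 2], dtype=np.int32)
out = np.asarray_chkfinite(a, dtype=np.float32)
print(out.dtype)  # float32

# Any inf or NaN makes NumPy raise ValueError instead of returning the array.
try:
    np.asarray_chkfinite(np.array([1.0, np.inf]))
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

Note that NumPy signals the failure with ValueError, which is also what the tests later in this thread assert, whereas the impl snippet above raises TypingError.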

esc · 2020-09-24T10:09:32Z

@rishabhvarshney14 thanks for submitting this and thank you for your efforts to help to improve Numba! Before we begin to review the code, there are a few flake8 related issues that need to be addressed, as indicated by the failing coverage test on Azure pipelines:

numba/np/arraymath.py:4163:1: E302 expected 2 blank lines, found 1
numba/np/arraymath.py:4165:1: W293 blank line contains whitespace
numba/np/arraymath.py:4167:81: E501 line too long (84 > 80 characters)
numba/np/arraymath.py:4168:1: W293 blank line contains whitespace
numba/np/arraymath.py:4173:1: W293 blank line contains whitespace
numba/tests/test_np_functions.py:323:1: E302 expected 2 blank lines, found 1
numba/tests/test_np_functions.py:3748:1: W293 blank line contains whitespace
numba/tests/test_np_functions.py:3766:81: E501 line too long (98 > 80 characters)
numba/tests/test_np_functions.py:3778:1: E303 too many blank lines (3)

Additionally, it seems like you accidentally committed a file called branch.diff? Please rewrite your git history to remove this file, thank you! Once these two issues have been resolved, the PR will be ready for a review of the code and functionality submitted.

esc · 2020-09-24T11:01:33Z

@rishabhvarshney14 thanks for updating this. It is looking much better already, but there are still a number of flake8 issues as reported by Azure pipelines.

numba/np/arraymath.py:4169:17: E126 continuation line over-indented for hanging indent
numba/np/arraymath.py:4170:17: E123 closing bracket does not match indentation of opening bracket's line
numba/tests/test_np_functions.py:3769:13: E123 closing bracket does not match indentation of opening bracket's line
numba/tests/test_np_functions.py:3781:1: E303 too many blank lines (3)

esc

Thank you again for your submission! It looks quite good now. I left a few suggestions about algorithmic efficiency and Numpy behavioral coherence that will need to be addressed.

numba/np/arraymath.py

esc · 2020-09-29T13:28:12Z

So, I wrote the following benchmark script:

from numba import njit
import numpy as np

arr = np.random.rand(100, 100, 100)
brr = arr.copy()

mask = np.random.randint(0, 2, size=arr.shape).astype(bool)
brr[mask] = np.nan

# now we have three test arrays besides arr: brr, a_first and a_last

a_first = arr.copy()
a_first[0, 0, 0] = np.nan

a_last = arr.copy()
a_last[-1, -1, -1] = np.nan


@njit
def isfinite_all(a):
    if not np.isfinite(a).all():
        raise ValueError
    return np.asarray(a)


@njit
def nditer_isnan_isinf(a):
    a = np.asarray(a)
    for i in np.nditer(a):
        if np.isnan(i) or np.isinf(i):
            raise ValueError
    return a


@njit
def nditer_isfinite(a):
    a = np.asarray(a)
    for i in np.nditer(a):
        if not np.isfinite(i):
            raise ValueError
    return a


def ex_wrapper(func, arg):
    try:
        func(arg)
    except ValueError:
        pass


# compile them all
ex_wrapper(isfinite_all, arr)
ex_wrapper(nditer_isnan_isinf, arr)
ex_wrapper(nditer_isfinite, arr)

And then I benchmarked them like so:

In [4]: %timeit ex_wrapper(isfinite_all, arr)
1.61 ms ± 96.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit ex_wrapper(isfinite_all, brr)
736 µs ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit ex_wrapper(isfinite_all, a_first)
734 µs ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit ex_wrapper(isfinite_all, a_last)
1.52 ms ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And

In [9]: %timeit ex_wrapper(nditer_isnan_isinf, arr)
996 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit ex_wrapper(nditer_isnan_isinf, brr)
1.18 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [11]: %timeit ex_wrapper(nditer_isnan_isinf, a_first)
1.16 µs ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [12]: %timeit ex_wrapper(nditer_isnan_isinf, a_last)
978 µs ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

And

In [13]: %timeit ex_wrapper(nditer_isfinite, arr)
508 µs ± 8.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [14]: %timeit ex_wrapper(nditer_isfinite, brr)
1.17 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit ex_wrapper(nditer_isfinite, a_first)
1.15 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: %timeit ex_wrapper(nditer_isfinite, a_last)
560 µs ± 61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So, in conclusion:

nditer_isfinite seems to give the best performance when the array contains no NaNs.
isfinite_all performs comparatively badly on the random brr and a_first cases. (I think this is because, even though a NaN is encountered early on, the all() means every element is still examined.)
nditer_isfinite and nditer_isnan_isinf perform similarly on the brr and a_first cases. (Essentially, we can "bail out" as soon as the first NaN is encountered, so how the check itself is done makes no practical difference.)
nditer_isfinite gives the best performance on the a_last case. (I presume this is because a single np.isfinite() call per element is faster than np.isnan(i) or np.isinf(i).)

From what I can tell, these observations match your findings too, right @rishabhvarshney14 ?
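The bail-out effect described above can be illustrated with a small pure-Python sketch that counts how many elements each strategy inspects. It is not the Numba-compiled code from the benchmark, and the helper names scan_all and scan_early_exit are made up for illustration; it only mirrors the access pattern, under the assumption that np.isfinite(a).all() evaluates the whole array before reducing.

import math
import numpy as np


def scan_all(a):
    """Check every element, in the style of np.isfinite(a).all()."""
    mask = np.isfinite(a)  # materialises a full boolean array first
    return mask.size, bool(mask.all())


def scan_early_exit(a):
    """Walk the elements in order and stop at the first non-finite value."""
    visited = 0
    for x in a.flat:
        visited += 1
        if not math.isfinite(x):
            return visited, False
    return visited, True


arr = np.ones((10, 10, 10))
a_first = arr.copy()
a_first[0, 0, 0] = np.nan
a_last = arr.copy()
a_last[-1, -1, -1] = np.nan

print(scan_all(a_first))         # (1000, False)
print(scan_early_exit(a_first))  # (1, False)
print(scan_early_exit(a_last))   # (1000, False)

The element counts show why the early-exit loop is so much cheaper when a NaN appears at the start, and why it has no advantage in element count when the NaN is the very last element.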

rishabhvarshney14 · 2020-09-30T07:07:47Z

Yes. I have changed the isnan or isinf check to isfinite.

esc · 2020-09-30T08:23:23Z

@rishabhvarshney14 excellent, thank you very much. And thank you for being persistent and willing to measure what the best implementation is. I think this PR is now in great shape. The only item left on my list that needs to be attended to is the order keyword argument. I believe you should probably add some tests for changing the order (and you may have to hand the order argument into the call to asarray for the tests to pass).

Beyond that, since I was involved in suggesting some code for this, I would like to request that either @sklam, @stuartarchibald or @gmarkall take a look at this too and sign it off, thanks!

gmarkall

Since only the first two arguments are supported for np.asarray (https://numba.readthedocs.io/en/stable/reference/numpysupported.html#other-functions) I think it only makes sense to support the first two arguments of asarray_chkfinite - so instead of adding tests for changing the order, I think it'd instead be better to remove the order='C' kwarg from the implementation and tests.

Other than that, this is looking good!
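A two-argument version along these lines might look like the plain-NumPy reference below. This is only a sketch of the intended semantics, not the Numba overload itself: the name asarray_chkfinite_2arg is hypothetical, and it iterates with a.flat rather than np.nditer so that empty arrays are handled without extra flags.

import numpy as np


def asarray_chkfinite_2arg(a, dtype=None):
    """Hypothetical reference: accepts only `a` and `dtype`, no `order`."""
    a = np.asarray(a, dtype=dtype)
    # Element-wise check so the loop can stop at the first bad value.
    for x in a.flat:
        if not np.isfinite(x):
            raise ValueError("array must not contain infs or NaNs")
    return a


out = asarray_chkfinite_2arg([1, 2, 3], np.float32)
print(out.dtype)  # float32

Because order is simply not part of the signature, passing a third argument fails immediately with a TypeError instead of being silently ignored.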

gmarkall · 2020-09-30T10:32:40Z

docs/source/reference/numpysupported.rst

@@ -338,6 +338,7 @@ The following top-level functions are supported:
 * :func:`numpy.array` (only the 2 first arguments)
 * :func:`numpy.array_equal`
 * :func:`numpy.asarray` (only the 2 first arguments)
+* :func:`numpy.asarray_chkfinite` (only the first first arguments)


Suggested change

* :func:`numpy.asarray_chkfinite` (only the first first arguments)

* :func:`numpy.asarray_chkfinite` (only the 2 first arguments)

Yeah, either that or only the first two arguments, but I guess that would be inconsistent with the rest of the doc?

esc · 2020-10-01T15:50:06Z

@rishabhvarshney14 excellent, thanks very much for fixing this up!

esc · 2020-10-01T15:51:24Z

@gmarkall happy with this? If yes, please do change the label to "Ready To Merge" (RTM), thx!

gmarkall

Looks good to me!

gmarkall · 2020-10-01T15:54:24Z

@gmarkall happy with this? If yes, please do change the label to "Ready To Merge" (RTM), thx!

Yes - I saw this earlier but forgot to approve (must have got distracted). Thanks for the ping!

esc · 2020-10-01T15:57:41Z

@gmarkall excellent, thank you!

@rishabhvarshney14 thanks for working on this so persistently, this PR is now in the RTM stage and will be merged during the next merge window (probably later on today or tomorrow). 👌

rishabhvarshney14 · 2020-10-01T15:59:35Z

Thank you for helping me and making my first PR really amazing.

stuartarchibald · 2020-10-01T20:19:20Z

numba/np/arraymath.py

+    else:
+        dt = dtype.dtype


@esc Should this check that the dtype variable is actually a Numba dtype type instance? E.g. if a user passes in a string like 'float32' they are going to get an attribute error.

stuartarchibald · 2020-10-01T20:20:59Z

numba/tests/test_np_functions.py

+        #test for both inf and NaNs
+        with self.assertRaises(ValueError) as e:
+            cfunc(np.array([np.inf, np.nan]))
+        self.assertIn("array must not contain infs or NaNs", str(e.exception))
+
+        #test for NaNs
+        with self.assertRaises(ValueError) as e:
+            cfunc(np.array([np.nan, np.nan, np.nan]))
+        self.assertIn("array must not contain infs or NaNs", str(e.exception))


@esc I'm not sure what these tests gain over the previous two, given the algorithm present?

esc · 2020-10-02T09:31:46Z

@stuartarchibald thanks for reviewing this too, more eyes is always less bugs! I'll move it back to 'waiting on author' as we address the items you mentioned.

@rishabhvarshney14 can you take a look at the comments? Probably this just means adding a check on the dtype variable, adding a test for it, and removing the test with the three NaN values, as it doesn't add anything. Thanks!

rishabhvarshney14 · 2020-10-03T08:50:09Z

@esc @stuartarchibald I have changed dt = dtype.dtype to dt = as_dtype(dtype) which raises unicode_type cannot be represented as a Numpy dtype when 'float32' is passed. Is this what I have to do?

esc · 2020-10-05T08:14:13Z

@rishabhvarshney14 I believe what @stuartarchibald is referring to here are Numpy calls like the following:

In [23]: a
Out[23]: array([0, 1, 2])

In [24]: a.dtype
Out[24]: dtype('int64')

In [25]: np.asarray_chkfinite(a, dtype='float32')
Out[25]: array([0., 1., 2.], dtype=float32)

Support for this type of construct (supplying a dtype as a string) was recently added in: #6262 -- perhaps you could adopt something similar in this case? I.e. to make asarray_chkfinite work also when the dtype specified is a string?

stuartarchibald · 2020-10-05T10:25:47Z

@rishabhvarshney14 I believe what @stuartarchibald is referring to here are Numpy calls like the following:
In [23]: a
Out[23]: array([0, 1, 2])

In [24]: a.dtype
Out[24]: dtype('int64')

In [25]: np.asarray_chkfinite(a, dtype='float32')
Out[25]: array([0., 1., 2.], dtype=float32)
Support for this type of construct (supplying a dtype as a string) was recently added in: #6262 -- perhaps you could adopt something similar in this case? I.e. to make asarray_chkfinite work also when the dtype specified is a string?

In the interests of this PR making 0.52, I recommend just rejecting anything which isn't a dtype and adding a 3 line test for it. dtype-from-string support adds a load more complexity as it requires enforcement of string literals in the typing call. It is however up to you and @esc :)

esc · 2020-10-05T14:58:38Z

@rishabhvarshney14 I would concur with @stuartarchibald to simply reject anything that isn't an instance of a dtype.
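As a rough illustration of "reject anything that isn't a dtype", here is a runtime analogue in plain NumPy. It is an assumption-laden sketch: in Numba the check would happen at typing time and raise TypingError, and the helper name require_dtype is hypothetical.

import numpy as np


def require_dtype(dtype):
    """Hypothetical helper: accept None, np.dtype instances or NumPy scalar types."""
    if dtype is None:
        return None
    if isinstance(dtype, np.dtype):
        return dtype
    if isinstance(dtype, type) and issubclass(dtype, np.generic):
        return np.dtype(dtype)
    # Strings such as 'float32' are deliberately rejected in this sketch.
    raise TypeError("dtype must be a valid Numpy dtype")


print(require_dtype(np.float32))  # float32

With this shape, a string argument fails with one clear message instead of an attribute error from deeper in the code.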

rishabhvarshney14 · 2020-10-07T10:00:27Z

I am having some issues with this.
I tried using isinstance(dtype.dtype, (types.Float, ...)), but if a string or any argument other than an np.dtype is passed, it raises an attribute error because the argument has no dtype attribute.

I tried two methods, though neither may be a good way to do this.

Using try-except with as_dtype

try:
    dt = as_dtype(dtype)
except NotImplementedError:
    raise TypingError('dtype must be a valid Numpy dtype')

This will give the required error instead of the attribute error

Using hasattr()

if not hasattr(dtype, 'dtype'):
    raise TypingError('dtype must be a valid Numpy dtype')

esc · 2020-10-07T14:04:14Z

@rishabhvarshney14 I think the first option might be fine, actually. @stuartarchibald what do you say? Also, you may have some conflicts with master (as several PRs have been merged now) that must now be resolved. Your best bet is to merge the current numba/numba master and resolve the conflicts in the resulting merge commit.

esc · 2020-10-08T18:13:03Z

@rishabhvarshney14 thanks for the updates, it looks good now! One last nitpick: the git history now has a strange commit 8398763 with the message Merge pull request #1 from numba/master, which shouldn't be there. Could you perhaps fix up your git history on this PR? Normally, we don't allow forced pushes, but in this case, I think it will be needed. Do let me know if you need further advice on how to use git rebase and friends.

rishabhvarshney14 · 2020-10-09T09:34:22Z

@esc I am not sure how to use rebase. Also, I followed this article to sync my forked repo with this one, which is the reason for the commit.
Any advice on this would be great, thank you.

esc · 2020-10-09T15:21:38Z

@rishabhvarshney14 so, after discussing with some of the other Numba devs, we agreed that it will be easiest if I fix up this PR and re-submit a git-cleaned version. I'll try to preserve as much history as possible.

esc · 2020-10-09T15:29:20Z

New PR available here: #6341

rishabhvarshney14 added 2 commits September 23, 2020 15:04

added implementation of np asarray_chkfinite

01b5c6b

add support for asarray_chkfinite

cb78e0e

rishabhvarshney14 changed the title ~~Add support for np.asarray_chkfinite~~ Add support for np.asarray_chkfinite [WIP] [need help] Sep 24, 2020

esc added the 2 - In Progress label Sep 24, 2020

fixed flake8 error and remove branch.diff

bc5beb4

rishabhvarshney14 added 3 commits September 24, 2020 16:55

fixed flake8 errors

b9491c1

flake8 fixed

a50fa05

flak8 fix

0ef86c9

esc requested changes Sep 24, 2020

esc added 4 - Waiting on author Waiting for author to respond to review and removed 2 - In Progress labels Sep 24, 2020

improved implementation

1be9be5

changed to np.isfinite

a578490

gmarkall requested changes Sep 30, 2020

removed order

a05d6b1

esc added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Oct 1, 2020

esc approved these changes Oct 1, 2020

gmarkall approved these changes Oct 1, 2020

gmarkall added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Oct 1, 2020

gmarkall added this to the Numba 0.52 RC milestone Oct 1, 2020

stuartarchibald reviewed Oct 1, 2020

changed dtype.dtype to as_dtype

c24e3aa

stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 5 - Ready to merge Review and testing done, is ready to merge labels Oct 5, 2020

rishabhvarshney14 and others added 3 commits October 8, 2020 18:48

added test for dtype

c9960c4

Merge pull request #1 from numba/master

8398763

update

added test for dtype

4a448ae

esc mentioned this pull request Oct 9, 2020

Re roll 6279 #6341

Merged

esc added abandoned PR is abandoned (no reason required) and removed 4 - Waiting on author Waiting for author to respond to review labels Oct 9, 2020

esc closed this Oct 9, 2020
