
Add support for np.asarray_chkfinite [WIP] [need help] #6279

Closed
wants to merge 13 commits into from

rishabhvarshney14
Contributor

@rishabhvarshney14 rishabhvarshney14 commented Sep 23, 2020

Pull request to add support for np.asarray_chkfinite, from issue #4074.
Following NumPy's documentation, asarray_chkfinite takes three arguments: the first is the array, and the other two, dtype and order, are optional.
When I pass order to np.asarray in the impl function:

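def impl(a, dtype=None, order='C'):
    # dt is resolved from the dtype type in the enclosing overload
    a = np.asarray(a, dtype=dt, order=order)
    if not np.all(np.isfinite(a)):
        # NumPy raises ValueError (not a typing error) for non-finite input
        raise ValueError("array must not contain infs or NaNs")
    return a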

it gives the following error

ERROR: test_asarray_chkfinite (numba.tests.test_np_functions.TestNPFunctions)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "numba\numba\tests\test_np_functions.py", line 3757, in test_asarray_chkfinite
    got = cfunc(*pair)
  File "numba\numba\core\dispatcher.py", line 414, in _compile_for_args
    error_rewrite(e, 'typing')
  File "numba\numba\core\dispatcher.py", line 357, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function asarray_chkfinite at 0x000001BE3BF74488>) found for signature:

 >>> asarray_chkfinite(array(int32, 1d, C), class(float32), unicode_type)

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'np_asarray_chkfinite': File: numba\np\arraymath.py: Line 4162.
    With argument(s): '(array(int32, 1d, C), class(float32), unicode_type)':
   Rejected as the implementation raised a specific error:
     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
   No implementation of function Function(<function asarray at 0x000001BE3BE01950>) found for signature:

    >>> asarray(array(int32, 1d, C), dtype=class(float32), order=unicode_type)

   There are 2 candidate implementations:
         - Of which 2 did not match due to:
         Overload in function 'np_asarray': File: numba\np\arraymath.py: Line 4018.
           With argument(s): '(array(int32, 1d, C), dtype=class(float32), order=unicode_type)':
          Rejected as the implementation raised a specific error:
            TypeError: np_asarray() got an unexpected keyword argument 'order'
     raised from numba\numba\core\typing\templates.py:710

   During: resolving callee type: Function(<function asarray at 0x000001BE3BE01950>)
   During: typing of call at numba\numba\np\arraymath.py (4174)


   File "numba\np\arraymath.py", line 4174:
       def impl(a, dtype=None, order='C'):
           a = np.asarray(a, dtype=dt, order=order)
           ^

  raised from numba\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<function asarray_chkfinite at 0x000001BE3BF74488>)
During: typing of call atnumba\numba\tests\test_np_functions.py (324)


File "numba\tests\test_np_functions.py", line 324:
def np_asarray_chkfinite(a, dtype=None, order='C'):
    return np.asarray_chkfinite(a, dtype, order)
    ^

----------------------------------------------------------------------
Ran 1 test in 0.251s

FAILED (errors=1)

The implementation of np.asarray in arraymath.py does not accept the order argument.
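For reference, the behaviour a Numba overload has to reproduce can be checked directly against NumPy. This is a minimal sketch that only exercises NumPy's own np.asarray_chkfinite with its first two arguments (leaving order out, in line with what Numba's np.asarray accepts); it assumes NumPy is installed and does not use Numba at all.

```python
import numpy as np

# Finite input behaves like np.asarray, honouring the dtype argument.
out = np.asarray_chkfinite([0, 1, 2], np.float32)
print(out.dtype)  # float32

# Non-finite input raises ValueError at call time.
message = None
try:
    np.asarray_chkfinite(np.array([1.0, np.inf]))
except ValueError as exc:
    message = str(exc)
print(message)  # array must not contain infs or NaNs
```

The error is a runtime ValueError, so the jitted version is expected to raise the same exception type and message.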

@rishabhvarshney14 rishabhvarshney14 changed the title Add support for np.asarray_chkfinite Add support for np.asarray_chkfinite [WIP] [need help] Sep 24, 2020
@esc
Member

esc commented Sep 24, 2020

@rishabhvarshney14 thanks for submitting this and thank you for your efforts to help improve Numba! Before we begin to review the code, there are a few flake8-related issues that need to be addressed, as indicated by the failing coverage test on Azure pipelines:

numba/np/arraymath.py:4163:1: E302 expected 2 blank lines, found 1
numba/np/arraymath.py:4165:1: W293 blank line contains whitespace
numba/np/arraymath.py:4167:81: E501 line too long (84 > 80 characters)
numba/np/arraymath.py:4168:1: W293 blank line contains whitespace
numba/np/arraymath.py:4173:1: W293 blank line contains whitespace
numba/tests/test_np_functions.py:323:1: E302 expected 2 blank lines, found 1
numba/tests/test_np_functions.py:3748:1: W293 blank line contains whitespace
numba/tests/test_np_functions.py:3766:81: E501 line too long (98 > 80 characters)
numba/tests/test_np_functions.py:3778:1: E303 too many blank lines (3)

Additionally, it seems like you accidentally committed a file called branch.diff? Please rewrite your git history to remove this file, thank you! Once these two issues have been resolved, the PR will be ready for a review of the code and functionality submitted.

@esc
Member

esc commented Sep 24, 2020

@rishabhvarshney14 thanks for updating this. It is looking much better already, but there are still a number of flake8 issues as reported by Azure pipelines.

numba/np/arraymath.py:4169:17: E126 continuation line over-indented for hanging indent
numba/np/arraymath.py:4170:17: E123 closing bracket does not match indentation of opening bracket's line
numba/tests/test_np_functions.py:3769:13: E123 closing bracket does not match indentation of opening bracket's line
numba/tests/test_np_functions.py:3781:1: E303 too many blank lines (3)

Member

@esc esc left a comment


Thank you again for your submission! It looks quite good now. I left a few suggestions about algorithmic efficiency and Numpy behavioral coherence that will need to be addressed.

numba/np/arraymath.py (three review comments, outdated and resolved)
@esc esc added 4 - Waiting on author Waiting for author to respond to review and removed 2 - In Progress labels Sep 24, 2020
@esc
Member

esc commented Sep 29, 2020

So, I wrote the following benchmark script:

from numba import njit
import numpy as np

arr = np.random.rand(100, 100, 100)
brr = arr.copy()

mask = np.random.randint(0, 2, size=arr.shape).astype(bool)
brr[mask] = np.nan

# besides arr, we now have three arrays: brr, a_first and a_last

a_first = arr.copy()
a_first[0, 0, 0] = np.nan

a_last = arr.copy()
a_last[-1, -1, -1] = np.nan


@njit
def isfinite_all(a):
    if not np.isfinite(a).all():
        raise ValueError
    return np.asarray(a)


@njit
def nditer_isnan_isinf(a):
    a = np.asarray(a)
    for i in np.nditer(a):
        if np.isnan(i) or np.isinf(i):
            raise ValueError
    return a


@njit
def nditer_isfinite(a):
    a = np.asarray(a)
    for i in np.nditer(a):
        if not np.isfinite(i):
            raise ValueError
    return a


def ex_wrapper(func, arg):
    try:
        func(arg)
    except ValueError:
        pass


# compile them all
ex_wrapper(isfinite_all, arr)
ex_wrapper(nditer_isnan_isinf, arr)
ex_wrapper(nditer_isfinite, arr)

And then I benchmarked them like so:

In [4]: %timeit ex_wrapper(isfinite_all, arr)
1.61 ms ± 96.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit ex_wrapper(isfinite_all, brr)
736 µs ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit ex_wrapper(isfinite_all, a_first)
734 µs ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit ex_wrapper(isfinite_all, a_last)
1.52 ms ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And

In [9]: %timeit ex_wrapper(nditer_isnan_isinf, arr)
996 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit ex_wrapper(nditer_isnan_isinf, brr)
1.18 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [11]: %timeit ex_wrapper(nditer_isnan_isinf, a_first)
1.16 µs ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [12]: %timeit ex_wrapper(nditer_isnan_isinf, a_last)
978 µs ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

And

In [13]: %timeit ex_wrapper(nditer_isfinite, arr)
508 µs ± 8.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [14]: %timeit ex_wrapper(nditer_isfinite, brr)
1.17 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit ex_wrapper(nditer_isfinite, a_first)
1.15 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: %timeit ex_wrapper(nditer_isfinite, a_last)
560 µs ± 61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So, in conclusion:

  • nditer_isfinite seems to give the best performance for the case of no NaNs.
  • isfinite_all gives comparatively bad performance for the random brr and a_first cases. (What I think is happening is that even though a NaN is present early on, np.isfinite(a) is still computed for every element before .all() is evaluated.)
  • nditer_isfinite and nditer_isnan_isinf give similar performance for the brr and a_first cases. (Essentially we can "bail out" as soon as the first NaN is encountered; how that element is checked makes little relative difference.)
  • nditer_isfinite gives the best performance on the a_last use-case. (I presume this is because doing a np.isfinite() on a single element is faster than np.isnan(i) or np.isinf(i).)
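The "bail out" argument can be made concrete by counting how many elements an early-exit scan inspects. This is a minimal sketch in plain Python and NumPy (no Numba), assuming a flat loop over a.ravel() rather than the np.nditer loop from the benchmark; count_checked is a hypothetical helper for illustration only, and smaller arrays are used than in the benchmark.

```python
import numpy as np


def count_checked(a):
    """Return how many elements are inspected before the scan stops.

    Mirrors the early-exit structure of nditer_isfinite: the loop stops
    at the first element for which np.isfinite is False.
    """
    checked = 0
    for x in np.asarray(a).ravel():
        checked += 1
        if not np.isfinite(x):
            break
    return checked


arr = np.random.rand(10, 10, 10)
a_first = arr.copy()
a_first[0, 0, 0] = np.nan
a_last = arr.copy()
a_last[-1, -1, -1] = np.nan

print(count_checked(a_first))  # 1
print(count_checked(a_last))   # 1000
print(count_checked(arr))      # 1000
```

By contrast, np.isfinite(a) in isfinite_all allocates and fills the full boolean array before .all() runs, which is consistent with a_first being no cheaper than the other cases there.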

From what I can tell, these observations match your findings too, right @rishabhvarshney14 ?

@rishabhvarshney14
Contributor Author

Yes. I have changed the isnan/isinf check to isfinite.

@esc
Member

esc commented Sep 30, 2020

@rishabhvarshney14 excellent, thank you very much. And thank you for being persistent and willing to measure what the best implementation is. I think this PR is now in great shape. The only item left on my list that needs to be attended to is the order keyword argument. I believe you should probably add some tests for changing the order (and you may have to pass the order argument through to the call to asarray for the tests to pass).

Beyond that, since I was involved in suggesting some code for this, I would like to request that either @sklam, @stuartarchibald or @gmarkall take a look at this too and sign it off, thanks!

Member

@gmarkall gmarkall left a comment


Since only the first two arguments are supported for np.asarray (https://numba.readthedocs.io/en/stable/reference/numpysupported.html#other-functions) I think it only makes sense to support the first two arguments of asarray_chkfinite - so instead of adding tests for changing the order, I think it'd instead be better to remove the order='C' kwarg from the implementation and tests.
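Dropping order also simplifies the test helper. A minimal sketch of a two-argument reference function, assuming the helper keeps the name np_asarray_chkfinite seen in the traceback earlier in this thread; the jitted counterpart the tests compare against is not shown here.

```python
import numpy as np


def np_asarray_chkfinite(a, dtype=None):
    # Only the first two arguments, matching Numba's np.asarray support.
    return np.asarray_chkfinite(a, dtype)


# Same dtype handling as np.asarray for finite input.
res = np_asarray_chkfinite(np.arange(3, dtype=np.int32), np.float32)
print(res)  # [0. 1. 2.]
```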

Other than that, this is looking good!

@@ -338,6 +338,7 @@ The following top-level functions are supported:
* :func:`numpy.array` (only the 2 first arguments)
* :func:`numpy.array_equal`
* :func:`numpy.asarray` (only the 2 first arguments)
* :func:`numpy.asarray_chkfinite` (only the first first arguments)
Member


Suggested change
* :func:`numpy.asarray_chkfinite` (only the first first arguments)
* :func:`numpy.asarray_chkfinite` (only the 2 first arguments)

Member


Yeah, either that or "only the first two arguments", but I guess that would be inconsistent with the rest of the doc?

@esc esc added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Oct 1, 2020
@esc
Member

esc commented Oct 1, 2020

@rishabhvarshney14 excellent, thanks very much for fixing this up!

@esc
Member

esc commented Oct 1, 2020

@gmarkall happy with this? If yes, please do change the label to "Ready To Merge" (RTM), thx!

Member

@gmarkall gmarkall left a comment


Looks good to me!

@gmarkall gmarkall added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Oct 1, 2020
@gmarkall
Member

gmarkall commented Oct 1, 2020

> @gmarkall happy with this? If yes, please do change the label to "Ready To Merge" (RTM), thx!

Yes - I saw this earlier but forgot to approve (must have got distracted). Thanks for the ping!

@gmarkall gmarkall added this to the Numba 0.52 RC milestone Oct 1, 2020
@esc
Member

esc commented Oct 1, 2020

@gmarkall excellent, thank you!

@rishabhvarshney14 thanks for working on this so persistently, this PR is now in the RTM stage and will be merged during the next merge window (probably later on today or tomorrow). 👌

@rishabhvarshney14
Contributor Author

Thank you for helping me and making my first PR really amazing.

Comment on lines 4172 to 4173
else:
    dt = dtype.dtype
Contributor


@esc Should this check that the dtype variable is actually a Numba dtype type instance? E.g. if a user passes in a string like 'float32', they are going to get an AttributeError.

Comment on lines 3778 to 3786
# test for both inf and NaNs
with self.assertRaises(ValueError) as e:
    cfunc(np.array([np.inf, np.nan]))
self.assertIn("array must not contain infs or NaNs", str(e.exception))

# test for NaNs
with self.assertRaises(ValueError) as e:
    cfunc(np.array([np.nan, np.nan, np.nan]))
self.assertIn("array must not contain infs or NaNs", str(e.exception))
Contributor


@esc I'm not sure what these tests gain over the previous two, given the algorithm in use?
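Because the check fails on the first element for which np.isfinite is False, an array made entirely of NaNs exercises the same branch as one containing a single inf or NaN. A minimal sketch of a compact alternative, assuming one non-finite value per case is enough: it loops over +inf, -inf and NaN with unittest's subTest. Here cfunc is a hypothetical stand-in that calls NumPy directly; in the Numba suite it would be the jitted function.

```python
import unittest

import numpy as np


def cfunc(a):
    # Hypothetical stand-in for the jitted function under test.
    return np.asarray_chkfinite(a)


class TestChkfiniteSketch(unittest.TestCase):
    def test_rejects_non_finite(self):
        msg = "array must not contain infs or NaNs"
        # One non-finite value per case: the scan stops at the first
        # element where np.isfinite is False.
        for bad in (np.inf, -np.inf, np.nan):
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError) as e:
                    cfunc(np.array([1.0, bad]))
                self.assertIn(msg, str(e.exception))
```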

@esc
Member

esc commented Oct 2, 2020

@stuartarchibald thanks for reviewing this too, more eyes always means fewer bugs! I'll move it back to 'waiting on author' while we address the items you mentioned.

@rishabhvarshney14 can you take a look at the comments? Probably this means simply adding a check on the dtype variable plus a test for it, and removing the test with the three NaN values, as it doesn't add anything. Thanks!

@rishabhvarshney14
Contributor Author

@esc @stuartarchibald I have changed dt = dtype.dtype to dt = as_dtype(dtype), which raises "unicode_type cannot be represented as a Numpy dtype" when 'float32' is passed. Is this what I have to do?

@esc
Member

esc commented Oct 5, 2020

@rishabhvarshney14 I believe what @stuartarchibald is referring to here are Numpy calls like the following:

In [23]: a
Out[23]: array([0, 1, 2])

In [24]: a.dtype
Out[24]: dtype('int64')

In [25]: np.asarray_chkfinite(a, dtype='float32')
Out[25]: array([0., 1., 2.], dtype=float32)

Support for this type of construct (supplying a dtype as a string) was recently added in: #6262 -- perhaps you could adopt something similar in this case? I.e. to make asarray_chkfinite work also when the dtype specified is a string?

@stuartarchibald
Contributor

In the interests of this PR making 0.52, I recommend just rejecting anything which isn't a dtype and adding a 3-line test for it; dtype-from-string support adds a lot more complexity, as it requires enforcing string literals in the typing call. It is however up to you and @esc :)

@stuartarchibald stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 5 - Ready to merge Review and testing done, is ready to merge labels Oct 5, 2020
@esc
Member

esc commented Oct 5, 2020

@rishabhvarshney14 I would concur with @stuartarchibald to simply reject anything that isn't an instance of a dtype.

@rishabhvarshney14
Contributor Author

I am having some issues with this.
I tried using isinstance(dtype.dtype, (types.Float, ...)), but if a string or any argument other than a NumPy dtype is passed, it raises an AttributeError because the argument has no dtype attribute.

I tried two approaches, though neither may be the right way to do this.

  1. Using try-except with as_dtype
try:
    dt = as_dtype(dtype)
except NotImplementedError:
    raise TypingError('dtype must be a valid Numpy dtype')

This gives the required error instead of the AttributeError.

  2. Using hasattr()
if not hasattr(dtype, 'dtype'):
    raise TypingError('dtype must be a valid Numpy dtype')
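Both options amount to the same policy: accept a real dtype (or a NumPy scalar type such as np.float32) and turn anything else, strings included, into a clear error instead of an AttributeError. The sketch below illustrates that policy in plain Python on NumPy objects only; it is an analogy, not the overload itself. In the actual implementation the check runs on Numba types at typing time (np.float32 arrives as class(float32), as in the traceback earlier in this thread), and validate_dtype and the local TypingError class are hypothetical stand-ins.

```python
import numpy as np


class TypingError(Exception):
    """Local stand-in for numba.core.errors.TypingError (hypothetical)."""


def validate_dtype(dtype):
    # None means "keep the input's dtype", as in np.asarray.
    if dtype is None:
        return None
    # A dtype instance, e.g. np.dtype('float32').
    if isinstance(dtype, np.dtype):
        return dtype
    # A NumPy scalar type, e.g. np.float32.
    if isinstance(dtype, type) and issubclass(dtype, np.generic):
        return np.dtype(dtype)
    # Everything else, including strings such as 'float32', is rejected
    # up front rather than failing later with an AttributeError.
    raise TypingError("dtype must be a valid Numpy dtype")
```

For example, validate_dtype(np.float32) returns dtype('float32'), while validate_dtype('float32') raises TypingError, which matches the "reject anything that isn't a dtype" approach agreed on above.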

@esc
Member

esc commented Oct 7, 2020

@rishabhvarshney14 I think the first option might be fine, actually. @stuartarchibald what do you say? Also, you may have some conflicts with master (as several PRs have been merged now) that must be resolved. Your best bet is to merge the current numba/numba master and resolve the conflicts in the resulting merge commit.

@esc
Member

esc commented Oct 8, 2020

@rishabhvarshney14 thanks for the updates, it looks good now! One last nitpick: the git history now has a strange commit 8398763 with the message Merge pull request #1 from numba/master. It seems very strange and shouldn't be there. Could you perhaps fix up your git history on this PR? Normally, we don't allow forced pushes, but in this case, I think it will be needed. Do let me know if you need further advice on how to use git rebase and friends.

@rishabhvarshney14
Contributor Author

@esc I am not sure how to use rebase. Also, I followed this article to sync my forked repo with this one, which is the reason for that commit.
Any advice on this would be great, thank you.

@esc
Member

esc commented Oct 9, 2020

@rishabhvarshney14 so, after discussing with some of the other Numba devs, we agreed that it will be easiest if I fix up this PR and re-submit a git-cleaned version. I'll try to preserve as much history as possible.

@esc esc mentioned this pull request Oct 9, 2020
@esc
Member

esc commented Oct 9, 2020

New PR available here: #6341

@esc esc added abandoned PR is abandoned (no reason required) and removed 4 - Waiting on author Waiting for author to respond to review labels Oct 9, 2020
@esc esc closed this Oct 9, 2020