New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for np.cov #3345
Support for np.cov #3345
Conversation
I've added a bunch of tests for different types / shapes of array-like inputs and some pre-processing logic to get the tests to pass (I expect the current CI to be clean). I also made the tests So we should be getting nearer the mark - let me know what you think. If it's alright with you, I will remove the WIP from the PR title later today. Cheers! |
Sounds good, thanks! Yes, please feel free to remove with WIP when you are happy for review to take place, it sounds like most of the issues noted above are now covered? |
Yep, I think the issues noted are all covered one way or another and there are tests covering a bunch of other use cases. Quite possible there are gaps / potential improvements, but it's probably a valid basis to review - I won't commit any more changes for the time being. |
Great, thanks. Please merge in master to resolve conflicts and I'll review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patches. I've done an initial review, it's a good impl. I've added some minor comments to clean up the code in a couple of places and there's a couple of potential bugs relating to typing to address, otherwise good to go. Thanks again for your efforts.
numba/tests/test_np_functions.py
Outdated
_check(params) | ||
|
||
@unittest.skipUnless(np_version >= (1, 10), "cov needs Numpy 1.10+") | ||
def test_cov_egde_cases(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/egde/edge/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -263,6 +263,7 @@ The following top-level functions are supported: | |||
* :func:`numpy.convolve` (only the 2 first arguments) | |||
* :func:`numpy.copy` (only the first argument) | |||
* :func:`numpy.correlate` (only the 2 first arguments) | |||
* :func:`numpy.cov` (only the 5 first arguments, requires NumPy >= 1.10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should note requirement of SciPy 0.16+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numba/targets/arraymath.py
Outdated
m_arr = np.atleast_2d(_asarray(m)) | ||
|
||
# transpose if asked to and not a (1, n) vector | ||
if not rowvar and m_arr.shape[0] != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the check for (1, n) need to happen, 1D array.T == 1D array
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, now that the "2D but single row" data shape is not allowed, this check is redundant - have removed it.
numba/targets/arraymath.py
Outdated
"simply pass the row as a 1D array, i.e. m[0].") | ||
raise RuntimeError(msg) | ||
|
||
_handle_m_dim_nop = register_jitable(lambda x:x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after the :
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numba/targets/arraymath.py
Outdated
if y in (None, types.none): | ||
return np_cov_impl_single_variable | ||
|
||
if isinstance(m, (types.Integer, types.Float, types.Complex, types.Boolean)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checking inheritance from types.Number
in the above condition should cover the numeric types (but not Boolean
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numba/tests/test_np_functions.py
Outdated
y = np.arange(4) | ||
with self.assertRaises(ValueError) as raises: | ||
cfunc(m, y=y) | ||
self.assertIn('m and y must have the same number of variables', str(raises.exception)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should refer to the dimension size mismatch? Is variables
a bit generic/ambiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the wording a bit - I meant 'variables' in the statistics sense, but I agree the wording's poor. Hopefully it's now less poor :)
numba/targets/arraymath.py
Outdated
return np.array(variance) | ||
|
||
# identify up front if output is 0D | ||
if isinstance(m, types.Array) and m.ndim == 1 or isinstance(m, types.Tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit puzzled by this, a Tuple
type could have more than one dimension?
In [24]: print(type(numba.typeof(((0.1, 0.2), (0.11, 0.19), (0.09j, 0.21j)))))
<class 'numba.types.containers.Tuple'>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, Numba has a load of Tuple
related types.
In [30]: [x for x in dir(numba.types) if 'tuple' in x.lower()]
Out[30]:
['BaseAnonymousTuple',
'BaseNamedTuple',
'BaseTuple',
'NamedTuple',
'NamedTupleClass',
'NamedUniTuple',
'Tuple',
'UniTuple',
'UniTupleIter']
(ignore the iter and class entries). I think this probably needs to check if m
subclasses a BaseTuple
and then if the types of the items in the tuple inherit from Number/Boolean
, which would indicate a "1D" scenario?
if isinstance(m, types.BaseTuple):
if all(isinstance(x, (types.Number, types.Boolean)) for x in m.types):
if y in (None, types.none):
return np_cov_impl_single_variable
might help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was a bug on my part. The tests all miraculously passed due to the way the if gates were set up and the various test cases resolving to UniTuple vs Tuple type etc.
Should be sorted now, or at least nearer to being sorted. I added your example explicitly in tests.
def m_variations(): | ||
# array inputs | ||
yield np.array([[0, 2], [1, 1], [2, 0]]).T | ||
yield self.rnd.randn(100).reshape(5, 20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add:
yield np.asfortranarray(np.array([[0, 2], [1, 1], [2, 0]]).T)
yield self.rnd.randn(100).reshape(5, 20)[:,::2]
for a fortran order and a slice. Note the fortran order specified a few lines below is also C contig by virtue of it being a single "column" of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and good spot with the Fortan order blunder.
numba/tests/test_np_functions.py
Outdated
yield np.linspace(-3, 3, 33).reshape(33, 1, order='F') | ||
|
||
# non-array inputs | ||
yield ((0.1, 0.2), (0.11, 0.19), (0.09, 0.21)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add a non-homogeneous tuple in here? Like:
((0.1, 0.2), (0.11, 0.19), (0.09j, 0.21j))
? It should trip the issue described above WRT types.Tuple
.
m_dt = determine_dtype(m) | ||
y_dt = determine_dtype(y) | ||
dtype = np.result_type(m_dt, y_dt, np.float64) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think ddof
needs type checking to be an integer type, non-integral DOFs don't make sense do they? Seems like NumPy will accept floats so long as they are integral value, perhaps we can just accept types.Integer
for now and catch this at compile time? A unit test checking this would then be good, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added something (possibly a bit crude) which should allow int, bool and float if integral value - with some explicit tests. Let me know what you think.
I can't think of a case where anything other than 0 and 1 make sense, in the absence of weights (which I haven't implemented anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks.
Thanks for the detailed feedback. The last bunch of commits should address all points raised. I will keep an eye on CI - apologies in advance for any jet-lag related gremlins on my part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes. There's a few minor things I've spotted in the new changes, but otherwise good. Once these small things are resolved this can be merged. Thanks again.
numba/targets/arraymath.py
Outdated
# 'variables' as the constraint on rows or columns depends on | ||
# whether rowvar is True or False... | ||
msg = ("m and y have incompatible dimensions and thus " | ||
"cannot be concatenated") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably drop the part about concatenation, this is an implementation detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numba/targets/arraymath.py
Outdated
@register_jitable | ||
def _handle_ddof(ddof): | ||
if not np.isfinite(ddof): | ||
raise TypeError('Cannot convert non-finite ddof to integer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a ValueError
? The function is accepting a floating point type, it's just that it must be finite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I made this a ValueError initially and can't quite recall why I changed it - so it's back to a ValueError
numba/targets/arraymath.py
Outdated
if not np.isfinite(ddof): | ||
raise TypeError('Cannot convert non-finite ddof to integer') | ||
if ddof - int(ddof) != 0: | ||
raise TypeError('ddof must be integer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, ValueError
? Also, perhaps ddof must be an integral value
as floating point is being accepted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numba/targets/arraymath.py
Outdated
elif isinstance(ddof, types.Float): | ||
_DDOF_HANDLER = _handle_ddof | ||
else: | ||
raise TypeError('ddof must be integer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this can be a TypingError
as it'll be caught at compile time? Does ddof must be a numerical scalar type
make more sense too as it's a typing problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
numba/tests/test_np_functions.py
Outdated
msg = 'Cannot convert non-finite ddof to integer' | ||
_check(m, ddof, msg) | ||
|
||
for ddof in 'bacon', np.arange(4), 1.1, -3.142: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Perhaps s/bacon/junk/ as it's a more common nonsense input value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, removed it - although I am always keen to include breakfast nouns in tests :)
m_dt = determine_dtype(m) | ||
y_dt = determine_dtype(y) | ||
dtype = np.result_type(m_dt, y_dt, np.float64) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks.
return np_cov_impl_single_variable | ||
|
||
if isinstance(m, types.Sequence): | ||
if not isinstance(m.key[0], types.Sequence) and y in (None, types.none): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may cause issues at some point, but we can defer concern until sequence type detection is altered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there may be a tax to pay - but there are quite a few tests so it should at least be apparent when it's affected and needs a rethink
I think I've actioned the above - all seems sensible to me. Let me know any residuals and I will close them out ASAP. Cheers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this and your persistence in the correct handling of all the details. This can be merged once CI passes. (CI passing requires the LLVM 7 upgrade, I'll kick off a new build once that's done but do not foresee any issues). Thanks again for the contribution!
merging since we've rolled back to LLVM 6 |
Initial commit for CI.