Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for np.cov #3345

Merged
merged 69 commits into from Oct 31, 2018
Merged

Support for np.cov #3345

merged 69 commits into from Oct 31, 2018

Conversation

rjenc29
Copy link
Contributor

@rjenc29 rjenc29 commented Sep 27, 2018

Initial commit for CI.

@rjenc29
Copy link
Contributor Author

rjenc29 commented Oct 17, 2018

I've added a bunch of tests for different types / shapes of array-like inputs and some pre-processing logic to get the tests to pass (I expect the current CI to be clean).

I also made the tests @needs_blas and ripped out the naive matrix multiply stuff.

So we should be getting nearer the mark - let me know what you think.

If it's alright with you, I will remove the WIP from the PR title later today.

Cheers!

@stuartarchibald
Copy link
Contributor

Sounds good, thanks! Yes, please feel free to remove with WIP when you are happy for review to take place, it sounds like most of the issues noted above are now covered?

@rjenc29 rjenc29 changed the title [WIP] Support for np.cov Support for np.cov Oct 17, 2018
@rjenc29
Copy link
Contributor Author

rjenc29 commented Oct 17, 2018

Yep, I think the issues noted are all covered one way or another and there are tests covering a bunch of other use cases.

Quite possible there are gaps / potential improvements, but it's probably a valid basis to review - I won't commit any more changes for the time being.

@stuartarchibald
Copy link
Contributor

Great, thanks. Please merge in master to resolve conflicts and I'll review.

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patches. I've done an initial review, it's a good impl. I've added some minor comments to clean up the code in a couple of places and there's a couple of potential bugs relating to typing to address, otherwise good to go. Thanks again for your efforts.

_check(params)

@unittest.skipUnless(np_version >= (1, 10), "cov needs Numpy 1.10+")
def test_cov_egde_cases(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/egde/edge/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -263,6 +263,7 @@ The following top-level functions are supported:
* :func:`numpy.convolve` (only the 2 first arguments)
* :func:`numpy.copy` (only the first argument)
* :func:`numpy.correlate` (only the 2 first arguments)
* :func:`numpy.cov` (only the 5 first arguments, requires NumPy >= 1.10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should note requirement of SciPy 0.16+.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

m_arr = np.atleast_2d(_asarray(m))

# transpose if asked to and not a (1, n) vector
if not rowvar and m_arr.shape[0] != 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the check for (1, n) need to happen, 1D array.T == 1D array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now that the "2D but single row" data shape is not allowed, this check is redundant - have removed it.

"simply pass the row as a 1D array, i.e. m[0].")
raise RuntimeError(msg)

_handle_m_dim_nop = register_jitable(lambda x:x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after the :

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if y in (None, types.none):
return np_cov_impl_single_variable

if isinstance(m, (types.Integer, types.Float, types.Complex, types.Boolean)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking inheritance from types.Number in the above condition should cover the numeric types (but not Boolean).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

y = np.arange(4)
with self.assertRaises(ValueError) as raises:
cfunc(m, y=y)
self.assertIn('m and y must have the same number of variables', str(raises.exception))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should refer to the dimension size mismatch? Is variables a bit generic/ambiguous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording a bit - I meant 'variables' in the statistics sense, but I agree the wording's poor. Hopefully it's now less poor :)

return np.array(variance)

# identify up front if output is 0D
if isinstance(m, types.Array) and m.ndim == 1 or isinstance(m, types.Tuple):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit puzzled by this, a Tuple type could have more than one dimension?

In [24]: print(type(numba.typeof(((0.1, 0.2), (0.11, 0.19), (0.09j, 0.21j)))))
<class 'numba.types.containers.Tuple'>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Numba has a load of Tuple related types.

In [30]: [x for x in dir(numba.types) if 'tuple' in x.lower()]
Out[30]: 
['BaseAnonymousTuple',
 'BaseNamedTuple',
 'BaseTuple',
 'NamedTuple',
 'NamedTupleClass',
 'NamedUniTuple',
 'Tuple',
 'UniTuple',
 'UniTupleIter']

(ignore the iter and class entries). I think this probably needs to check if m subclasses a BaseTuple and then if the types of the items in the tuple inherit from Number/Boolean, which would indicate a "1D" scenario?

        if isinstance(m, types.BaseTuple):
            if all(isinstance(x, (types.Number, types.Boolean)) for x in m.types):
                if y in (None, types.none):
                    return np_cov_impl_single_variable

might help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was a bug on my part. The tests all miraculously passed due to the way the if gates were set up and the various test cases resolving to UniTuple vs Tuple type etc.

Should be sorted now, or at least nearer to being sorted. I added your example explicitly in tests.

def m_variations():
# array inputs
yield np.array([[0, 2], [1, 1], [2, 0]]).T
yield self.rnd.randn(100).reshape(5, 20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add:

            yield np.asfortranarray(np.array([[0, 2], [1, 1], [2, 0]]).T)
            yield self.rnd.randn(100).reshape(5, 20)[:,::2]

for a fortran order and a slice. Note the fortran order specified a few lines below is also C contig by virtue of it being a single "column" of data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and good spot with the Fortan order blunder.

yield np.linspace(-3, 3, 33).reshape(33, 1, order='F')

# non-array inputs
yield ((0.1, 0.2), (0.11, 0.19), (0.09, 0.21))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a non-homogeneous tuple in here? Like:

((0.1, 0.2), (0.11, 0.19), (0.09j, 0.21j))

? It should trip the issue described above WRT types.Tuple.

m_dt = determine_dtype(m)
y_dt = determine_dtype(y)
dtype = np.result_type(m_dt, y_dt, np.float64)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think ddof needs type checking to be an integer type, non-integral DOFs don't make sense do they? Seems like NumPy will accept floats so long as they are integral value, perhaps we can just accept types.Integer for now and catch this at compile time? A unit test checking this would then be good, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added something (possibly a bit crude) which should allow int, bool and float if integral value - with some explicit tests. Let me know what you think.

I can't think of a case where anything other than 0 and 1 make sense, in the absence of weights (which I haven't implemented anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

@rjenc29
Copy link
Contributor Author

rjenc29 commented Oct 28, 2018

Thanks for the detailed feedback. The last bunch of commits should address all points raised. I will keep an eye on CI - apologies in advance for any jet-lag related gremlins on my part.

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes. There's a few minor things I've spotted in the new changes, but otherwise good. Once these small things are resolved this can be merged. Thanks again.

# 'variables' as the constraint on rows or columns depends on
# whether rowvar is True or False...
msg = ("m and y have incompatible dimensions and thus "
"cannot be concatenated")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably drop the part about concatenation, this is an implementation detail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@register_jitable
def _handle_ddof(ddof):
if not np.isfinite(ddof):
raise TypeError('Cannot convert non-finite ddof to integer')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a ValueError? The function is accepting a floating point type, it's just that it must be finite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I made this a ValueError initially and can't quite recall why I changed it - so it's back to a ValueError

if not np.isfinite(ddof):
raise TypeError('Cannot convert non-finite ddof to integer')
if ddof - int(ddof) != 0:
raise TypeError('ddof must be integer')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, ValueError? Also, perhaps ddof must be an integral value as floating point is being accepted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

elif isinstance(ddof, types.Float):
_DDOF_HANDLER = _handle_ddof
else:
raise TypeError('ddof must be integer')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this can be a TypingError as it'll be caught at compile time? Does ddof must be a numerical scalar type make more sense too as it's a typing problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

msg = 'Cannot convert non-finite ddof to integer'
_check(m, ddof, msg)

for ddof in 'bacon', np.arange(4), 1.1, -3.142:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps s/bacon/junk/ as it's a more common nonsense input value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, removed it - although I am always keen to include breakfast nouns in tests :)

m_dt = determine_dtype(m)
y_dt = determine_dtype(y)
dtype = np.result_type(m_dt, y_dt, np.float64)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

return np_cov_impl_single_variable

if isinstance(m, types.Sequence):
if not isinstance(m.key[0], types.Sequence) and y in (None, types.none):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may cause issues at some point, but we can defer concern until sequence type detection is altered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there may be a tax to pay - but there are quite a few tests so it should at least be apparent when it's affected and needs a rethink

@rjenc29
Copy link
Contributor Author

rjenc29 commented Oct 29, 2018

I think I've actioned the above - all seems sensible to me. Let me know any residuals and I will close them out ASAP. Cheers!

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work on this and your persistence in the correct handling of all the details. This can be merged once CI passes. (CI passing requires the LLVM 7 upgrade, I'll kick off a new build once that's done but do not foresee any issues). Thanks again for the contribution!

@seibert
Copy link
Contributor

seibert commented Oct 31, 2018

merging since we've rolled back to LLVM 6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants