Size and density properties #69
Conversation
Seems fine, however it would be really nice to have docstrings for these, and if possible, tests. |
Added them |
Ignore the Travis CI failure. Numpy 1.14 changed the way it converts arrays to strings, and that's causing a doctests failure. I've addressed it over in #43. |
Yup, I already noticed that |
Well, this looks good. Let me know if you're finished working on it, and I'll review and merge (don't want to be too fast this time). |
def test_density():
    s = sparse.random((20, 30, 40), density=0.1)
    assert 0.09 < s.density < 0.11
hameerabbasi
Jan 10, 2018
Collaborator
It might be wise to use np.isclose here.
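The suggested check can be sketched as follows. This is a minimal, self-contained version using a plain NumPy array (the PR itself uses sparse.random and COO.density; the array name and tolerance here are illustrative assumptions):

```python
import numpy as np

# Sketch of the suggested test. A plain NumPy 0/1 array stands in for the
# sparse COO array so the example runs without the sparse package.
rng = np.random.default_rng(0)
x = (rng.random((20, 30, 40)) < 0.1).astype(float)  # roughly 10% nonzero
density = np.count_nonzero(x) / x.size

# np.isclose with an explicit absolute tolerance replaces the hand-written
# 0.09 < density < 0.11 bounds check.
assert np.isclose(density, 0.1, atol=0.01)
```

With 24,000 elements the sample density has a standard deviation of about 0.002, so an absolute tolerance of 0.01 keeps the test stable without being loose.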
>>> x[0, :] = 1 | ||
>>> s = COO.from_numpy(x) | ||
>>> s.density | ||
0.10000000000000001 |
hameerabbasi
Jan 10, 2018
Collaborator
This might fluctuate in different system floating-point implementations. Perhaps np.isclose(density, 0.1)?
Or perhaps change 10 to 8 above so that the result is evenly divisible by two. |
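The point behind changing 10 to 8 can be shown directly: 1/10 has no exact binary representation (hence the 0.10000000000000001 in the doctest), while 1/8 is a power of two and is exact. A small sketch, using plain NumPy arrays as in the docstring example:

```python
import numpy as np

# A (10, 10) array with one full row set: density is 10/100 = 0.1, which is
# not exactly representable in binary floating point, so its repr may print
# as 0.10000000000000001 on some platforms.
x = np.zeros((10, 10))
x[0, :] = 1
assert np.count_nonzero(x) / x.size == 0.1  # both sides round to the same double

# With an (8, 8) array the density is exactly 0.125 (a power of two), so the
# doctest output is stable everywhere.
y = np.zeros((8, 8))
y[0, :] = 1
assert np.count_nonzero(y) / y.size == 0.125
```

This is why the doctest is fragile as written: the printed repr of 0.1 depends on the NumPy version and platform, while 0.125 always prints the same.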
@mrocklin that's clever! Done. |
This looks ready. If you merge, I can do a final review and merge. |
Did you see my code comment? |
Hmmm, that's strange, it isn't showing any of your comments in that link, or on any commits. |
@@ -1277,7 +1325,7 @@ def astype(self, dtype, out=None):

     def maybe_densify(self, allowed_nnz=1e3, allowed_fraction=0.25):
         """ Convert to a dense numpy array if not too costly. Err othrewise """
-        if reduce(operator.mul, self.shape) <= allowed_nnz or self.nnz >= np.prod(self.shape) * allowed_fraction:
+        if self.size <= allowed_nnz or self.density >= allowed_fraction:
nils-werner
Jan 12, 2018
Author
Contributor
It is easy to see that self.nnz >= np.prod(self.shape) * allowed_fraction can be transformed to self.density >= allowed_fraction. But does self.density >= allowed_fraction actually make sense? Shouldn't it be the other way round?
hameerabbasi
Jan 12, 2018
Collaborator
It actually does make sense. If the density is low, then the cost of a dense array is higher compared to that of a sparse array, and we should avoid densifying.
If, however, density is high, the overhead of a sparse matrix far outweighs the benefits and we should densify.
hameerabbasi
Jan 12, 2018
Collaborator
To put it another way, a large array with a very low density should NOT be densified but it's okay to densify a high-density or a small array.
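The decision rule being discussed can be sketched as a standalone function (illustrative only, not the library's exact code; the helper name and the use of shape/nnz arguments instead of a COO object are assumptions made for the example):

```python
import numpy as np

def should_densify(shape, nnz, allowed_nnz=1e3, allowed_fraction=0.25):
    """Densify small arrays unconditionally, and large arrays only when
    they are dense enough; refuse large, very sparse arrays, since those
    would waste memory as dense ndarrays."""
    size = int(np.prod(shape))
    return size <= allowed_nnz or (nnz / size) >= allowed_fraction

assert should_densify((10, 10), nnz=1)            # small: always OK
assert should_densify((1000, 1000), nnz=500_000)  # large but dense: OK
assert not should_densify((1000, 1000), nnz=10)   # large and sparse: refuse
```

The three assertions mirror the reasoning above: low density on a large array means the dense representation costs far more memory than the sparse one, so densification is only allowed when the array is small or dense.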
nils-werner
Jan 12, 2018
Author
Contributor
True. Maybe I got confused that one allowed means max and the other one means min... What about renaming them to maybe_densify(self, max_nnz=1e3, min_density=0.25)?
hameerabbasi
Jan 12, 2018
Collaborator
I would make sure that no tests break either here or in Dask before doing that. We can change unit tests here... But it's best to send a PR to Dask if any tests break there.
hameerabbasi
Jan 15, 2018
Collaborator
Let's change it anyway and I'll send a PR to Dask if anything fails for them.
nils-werner
Jan 15, 2018
Author
Contributor
Actually, shouldn't max_nnz actually be named max_size? We are not comparing it to .nnz anyways...
hameerabbasi
Jan 15, 2018
Collaborator
Yes, max_size is a good idea as well.
Aha! After commenting on lines you have to also open the code review thingy. |
Is |
I would say A |
Best to switch that to a |
        an error.
        >>> x = np.zeros((5, 5), dtype=np.uint8)
        >>> x[2, 2] = 1
        >>> s = COO.from_numpy(x)
-        >>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
+        >>> s.maybe_densify(max_size=5, min_fraction=0.25)
        Traceback (most recent call last):
        ...
        NotImplementedError: Operation would require converting large sparse array to dense
@@ -2381,17 +2381,17 @@ def astype(self, dtype, out=None):
         assert out is None
         return self.elemwise(np.ndarray.astype, dtype)

-    def maybe_densify(self, allowed_nnz=1000, allowed_fraction=0.25):
+    def maybe_densify(self, max_size=1000, min_fraction=0.25):
hameerabbasi
Jan 15, 2018
Collaborator
Best keep this as min_density, it's clearer that it represents the density.
        an error.
        >>> x = np.zeros((5, 5), dtype=np.uint8)
        >>> x[2, 2] = 1
        >>> s = COO.from_numpy(x)
-        >>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
+        >>> s.maybe_densify(max_size=5, min_density=0.25)
        Traceback (most recent call last):
        ...
        NotImplementedError: Operation would require converting large sparse array to dense
hameerabbasi
Jan 15, 2018
Collaborator
Line 2425... NotImplementedError: Operation would require converting large sparse array to dense. Need to change this to ValueError.
hameerabbasi
Jan 15, 2018
Collaborator
It's causing doctests to fail. :)
nils-werner
Jan 15, 2018
Author
Contributor
Currently there are a million tests failing on master, it is hard to spot my own mistakes...
nils-werner
Jan 15, 2018
•
Author
Contributor
I checked, they only fail until Numpy 1.13.1. Latest Numpy works fine...
hameerabbasi
Jan 15, 2018
Collaborator
That is really weird, I tested with Numpy 1.13.3 on my local system and it still works fine.
To me |
So, we all agree that it should be |
Are you still making changes here? If not, this gets a +1 from me. |
No, I'm done. |
A rebase would be nice. |
* Size and density properties
* Docstrings and tests
* Test .density using np.isclose
* Fixed typos that made tests fail
* Replaced other occurrences of np.prod(self.shape) with self.size
* Avoid float precision errors in .density docstring
* maybe_densify raises ValueError instead of NotImplementedError
* Updated docstring to match ValueError exception
* Changed allowed_nnz and allowed_fraction to max_size and min_density
* Changed min_fraction to min_density
* Another docstring that didn't show ValueError
* Documentation pages
NumPy has a .size property that gives you the number of elements in an array. Additionally, for sparse arrays a .density property may be handy, which is the ratio of nonzero elements to total elements in the array (a value between 0 and 1). This PR implements both properties for COO arrays.
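A minimal sketch of the two properties described above, on a stand-in class rather than the real COO (the mixin and FakeCOO names are invented for illustration; the real properties live on COO in sparse/core.py):

```python
import numpy as np

class DensityMixin:
    @property
    def size(self):
        """Number of elements in the array (product of the shape)."""
        return int(np.prod(self.shape))

    @property
    def density(self):
        """Ratio of nonzero elements to all elements, between 0 and 1."""
        return self.nnz / self.size

class FakeCOO(DensityMixin):
    """Toy object carrying just the attributes the properties need."""
    def __init__(self, shape, nnz):
        self.shape = shape
        self.nnz = nnz

s = FakeCOO(shape=(10, 10), nnz=10)
assert s.size == 100
assert s.density == 0.1
```

Defining density in terms of size keeps the two properties consistent and lets code like maybe_densify read as a direct statement of intent (self.size <= max_size or self.density >= min_density).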