Size and density properties #69
Conversation
Seems fine, however it would be really nice to have docstrings for these and, if possible, tests.

Added them.

Ignore the Travis CI failure. NumPy 1.14 changed the way it converts arrays to strings, and that's causing a doctest failure. I've addressed it over in #43.

Yup, I already noticed that.

Well, this looks good. Let me know if you're finished working on it, and I'll review and merge (don't want to be too fast this time).
sparse/tests/test_core.py (outdated)

    def test_density():
        s = sparse.random((20, 30, 40), density=0.1)
        assert 0.09 < s.density < 0.11
It might be wise to use np.isclose here.
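For illustration, here is a NumPy-only sketch of the suggested check. The `density` helper below is a hypothetical stand-in for the `COO.density` property under discussion, not the library's code:

```python
import numpy as np

# Hypothetical stand-in for COO.density: nonzero count over total size.
def density(x):
    return np.count_nonzero(x) / x.size

x = np.zeros((20, 30, 40))
rng = np.random.default_rng(0)
# Set exactly 10% of the 24000 elements to a nonzero value.
idx = rng.choice(x.size, size=x.size // 10, replace=False)
x.flat[idx] = 1.0

# np.isclose absorbs floating-point rounding around the target density,
# unlike a hand-tuned 0.09 < d < 0.11 window.
assert np.isclose(density(x), 0.1)
```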
sparse/core.py (outdated)

    >>> x[0, :] = 1
    >>> s = COO.from_numpy(x)
    >>> s.density
    0.10000000000000001
This might fluctuate across systems' floating-point implementations. Perhaps np.isclose(density, 0.1)? Or perhaps change 10 to 8 above so that the result is evenly divisible by two.
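A quick sketch of why 8 gives a stable doctest (an illustration, not the PR's code; `density` is a hypothetical stand-in for `COO.density`): one full row of an (8, 8) array yields a density of 8/64 = 0.125, a power-of-two fraction that is exactly representable in binary floating point, while 1/10 is not.

```python
import numpy as np

def density(x):
    # Hypothetical stand-in for COO.density.
    return np.count_nonzero(x) / x.size

# One full row of an (8, 8) array: 8 nonzeros out of 64 elements.
x = np.zeros((8, 8))
x[0, :] = 1

# 0.125 = 1/8 is exact in binary, so the printed value never fluctuates.
assert density(x) == 0.125
```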
@mrocklin that's clever! Done.

This looks ready. If you merge, I can do a final review and merge.

Did you see my code comment?

Hmmm, that's strange, it isn't showing any of your comments in that link, or on any commits.
sparse/core.py (outdated)

    @@ -1277,7 +1325,7 @@ def astype(self, dtype, out=None):

         def maybe_densify(self, allowed_nnz=1e3, allowed_fraction=0.25):
             """ Convert to a dense numpy array if not too costly. Err othrewise """
    -        if reduce(operator.mul, self.shape) <= allowed_nnz or self.nnz >= np.prod(self.shape) * allowed_fraction:
    +        if self.size <= allowed_nnz or self.density >= allowed_fraction:
It is easy to see that

    self.nnz >= np.prod(self.shape) * allowed_fraction

can be transformed to

    self.density >= allowed_fraction

But does self.density >= allowed_fraction actually make sense? Shouldn't it be the other way round?
It actually does make sense. If the density is low, then the cost of a dense array is higher compared to that of a sparse array, and we should avoid densifying.
If, however, density is high, the overhead of a sparse matrix far outweighs the benefits and we should densify.
To put it another way, a large array with a very low density should NOT be densified but it's okay to densify a high-density or a small array.
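The rule being discussed can be sketched as a standalone predicate (an illustration using the original parameter names; `size` and `nnz` are passed in directly rather than read from a COO object):

```python
def should_densify(size, nnz, allowed_nnz=1000, allowed_fraction=0.25):
    # Densify small arrays outright, or arrays dense enough that sparse
    # bookkeeping outweighs the memory savings of staying sparse.
    density = nnz / size
    return size <= allowed_nnz or density >= allowed_fraction

assert should_densify(size=100, nnz=10)            # small array: always OK
assert should_densify(size=10_000, nnz=5_000)      # high density (0.5)
assert not should_densify(size=1_000_000, nnz=10)  # large and very sparse
```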
True. Maybe I got confused because one "allowed" means max and the other one means min... What about renaming them to

    maybe_densify(self, max_nnz=1e3, min_density=0.25)
I would make sure that no tests break either here or in Dask before doing that. We can change unit tests here... But it's best to send a PR to Dask if any tests break there.
Let's change it anyway and I'll send a PR to Dask if anything fails for them.
Actually, shouldn't max_nnz be named max_size? We are not comparing it to .nnz anyway...
Yes, max_size is a good idea as well.
Aha! After commenting on lines, you also have to open the code review thingy.
Is

I would say A

Best to switch that to a
sparse/core.py (outdated)

         an error.

         >>> x = np.zeros((5, 5), dtype=np.uint8)
         >>> x[2, 2] = 1
         >>> s = COO.from_numpy(x)
    -    >>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
    +    >>> s.maybe_densify(max_size=5, min_fraction=0.25)
         Traceback (most recent call last):
         ...
         NotImplementedError: Operation would require converting large sparse array to dense
You need to update this line in the docstring as well.
sparse/core.py (outdated)

    @@ -2381,17 +2381,17 @@ def astype(self, dtype, out=None):
             assert out is None
             return self.elemwise(np.ndarray.astype, dtype)

    -    def maybe_densify(self, allowed_nnz=1000, allowed_fraction=0.25):
    +    def maybe_densify(self, max_size=1000, min_fraction=0.25):
Best keep this as min_density; it's clearer that it represents the density.
sparse/core.py (outdated)

         an error.

         >>> x = np.zeros((5, 5), dtype=np.uint8)
         >>> x[2, 2] = 1
         >>> s = COO.from_numpy(x)
    -    >>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
    +    >>> s.maybe_densify(max_size=5, min_density=0.25)
         Traceback (most recent call last):
         ...
         NotImplementedError: Operation would require converting large sparse array to dense
Line 2425...

    NotImplementedError: Operation would require converting large sparse array to dense

Need to change this to ValueError.
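The failure path under discussion might look like this sketch (a hypothetical helper mirroring maybe_densify's guard, raising ValueError as agreed rather than NotImplementedError):

```python
def maybe_densify_sketch(size, nnz, max_size=1000, min_density=0.25):
    # Densify when the array is small or sufficiently dense; otherwise
    # refuse, since materializing it would be too costly.
    if size <= max_size or nnz / size >= min_density:
        return "densify"
    raise ValueError(
        "Operation would require converting large sparse array to dense")

assert maybe_densify_sketch(25, 1) == "densify"  # 5x5: size 25 <= 1000

# With max_size=5, the 5x5 doctest example from the diff above errors out.
try:
    maybe_densify_sketch(25, 1, max_size=5)
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```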
It's causing doctests to fail. :)
Currently there are a million tests failing on master; it is hard to spot my own mistakes...
That's strange, Travis shows everything is working on master.
I checked: they only fail on NumPy versions up to 1.13.1. The latest NumPy works fine...
That is really weird, I tested with NumPy 1.13.3 on my local system and it still works fine.

To me

So, we all agree that it should be

Are you still making changes here? If not, this gets a +1 from me.

No, I'm done.

A rebase would be nice.
* Size and density properties
* Docstrings and tests
* Test .density using np.isclose
* Fixed typos that made tests fail
* Replaced other occurences of np.prod(self.shape) with self.size
* Avoid float precision errors in .denstity docstring
* maybe_densify raises ValueError instead of NotImplementedError
* Updated docstring to match ValueError exception
* changed allowed_nnz and allowed_fraction to max_size and min_density
* changed min_fraction to min_density
* Another docstring that didn't show ValueError
* Documentation pages
NumPy has a .size property that gives you the number of elements in an array. Additionally, for sparse arrays a .density property may be handy, which is the ratio of nonzero elements to total elements in the array (a value between 0 and 1). This PR implements both properties for COO arrays.
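The two properties can be sketched on a minimal stand-in class (`COOSketch` is hypothetical; the real COO class in sparse/core.py carries much more state):

```python
import numpy as np
from functools import reduce
import operator

class COOSketch:
    """Minimal sketch of the two properties this PR adds."""

    def __init__(self, coords, data, shape):
        self.coords = coords  # index array of the nonzero entries
        self.data = data      # values of the nonzero entries
        self.shape = shape

    @property
    def nnz(self):
        # Number of stored (nonzero) elements.
        return self.data.shape[0]

    @property
    def size(self):
        # Total number of elements, like numpy's ndarray.size.
        return reduce(operator.mul, self.shape, 1)

    @property
    def density(self):
        # Ratio of nonzero elements to all elements, in [0, 1].
        return self.nnz / self.size

s = COOSketch(coords=np.array([[0, 1], [0, 2]]),
              data=np.array([1.0, 2.0]), shape=(5, 4))
assert s.size == 20
assert s.density == 0.1  # 2 nonzeros out of 20 elements
```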