Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Size and density properties #69

Merged
merged 12 commits into from
Jan 16, 2018
Merged

Conversation

nils-werner
Copy link
Contributor

@nils-werner nils-werner commented Jan 10, 2018

NumPy has a .size property that gives you the number of elements in an array.

Additionally for sparse arrays a .density property may be handy, which is the ratio of nonzero elements to total elements in the array (a value between 0 and 1).

This PR implements both properties for COO arrays

@hameerabbasi
Copy link
Collaborator

Seems fine, however it would be really nice to have docstrings for these, and if possible, tests.

@nils-werner
Copy link
Contributor Author

Added them

@hameerabbasi
Copy link
Collaborator

Ignore the Travis CI failure. Numpy 1.14 changed the way it converts arrays to strings, and that's causing a doctests failure. I've addressed it over in #43.

@nils-werner
Copy link
Contributor Author

Yup, I already noticed that

@hameerabbasi
Copy link
Collaborator

Well, this looks good. Let me know if you're finished working on it, and I'll review and merge (don't want to be too fast this time).


def test_density():
s = sparse.random((20, 30, 40), density=0.1)
assert 0.09 < s.density < 0.11
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be wise to use np.isclose here.

sparse/core.py Outdated
>>> x[0, :] = 1
>>> s = COO.from_numpy(x)
>>> s.density
0.10000000000000001
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might fluctuate in different system floating-point implementations. Perhaps np.isclose(density, 0.1)?

@mrocklin
Copy link
Contributor

mrocklin commented Jan 10, 2018 via email

@nils-werner
Copy link
Contributor Author

@mrocklin that's clever! Done.

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Jan 11, 2018

This looks ready. If you merge, I can do a final review and merge.

@nils-werner
Copy link
Contributor Author

Did you see my code comment?

@hameerabbasi
Copy link
Collaborator

Hmmm, that's strange, it isn't showing any of your comments in that link, or on any commits.

sparse/core.py Outdated
@@ -1277,7 +1325,7 @@ def astype(self, dtype, out=None):

def maybe_densify(self, allowed_nnz=1e3, allowed_fraction=0.25):
""" Convert to a dense numpy array if not too costly. Err othrewise """
if reduce(operator.mul, self.shape) <= allowed_nnz or self.nnz >= np.prod(self.shape) * allowed_fraction:
if self.size <= allowed_nnz or self.density >= allowed_fraction:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is easy to see that

self.nnz >= np.prod(self.shape) * allowed_fraction

can be transformed to

self.density >= allowed_fraction

But does self.density >= allowed_fraction actually make sense? Shouldn't it be the other way round?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually does make sense. If the density is low, then the cost of a dense array is higher compared to that of a sparse array, and we should avoid densifying.

If, however, density is high, the overhead of a sparse matrix far outweighs the benefits and we should densify.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it another way, a large array with a very low density should NOT be densified but it's okay to densify a high-density or a small array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Maybe I got confused that one allowed means max and the other one means min... What about renaming them to

maybe_densify(self, max_nnz=1e3, min_density=0.25)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make sure that no tests break either here or in Dask before doing that. We can change unit tests here... But it's best to send a PR to Dask if any tests break there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change it anyway and I'll send a PR to Dask if anything fails for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, shouldn't max_nnz be actually named max_size? We are not comparing it to .nnz anyways...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, max_size is a good idea as well.

@nils-werner
Copy link
Contributor Author

Aha! after commenting on lines you have to also open the code review thingy.

@hameerabbasi
Copy link
Collaborator

Is NotImplementedError the best way to show that densfication is forbidden? What would be the right exception here?

@nils-werner
Copy link
Contributor Author

nils-werner commented Jan 15, 2018

I would say ValueError is the right one, as it's the choice of function arguments that make the operation fail.

A NotImplementedError sounds more like "whatever you pass to this function, in this version it will never succeed".

@hameerabbasi
Copy link
Collaborator

Best to switch that to a ValueError then.

sparse/core.py Outdated
an error.

>>> x = np.zeros((5, 5), dtype=np.uint8)
>>> x[2, 2] = 1
>>> s = COO.from_numpy(x)
>>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
>>> s.maybe_densify(max_size=5, min_fraction=0.25)
Traceback (most recent call last):
...
NotImplementedError: Operation would require converting large sparse array to dense
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to update this line in the docstring. See here.

sparse/core.py Outdated
@@ -2381,17 +2381,17 @@ def astype(self, dtype, out=None):
assert out is None
return self.elemwise(np.ndarray.astype, dtype)

def maybe_densify(self, allowed_nnz=1000, allowed_fraction=0.25):
def maybe_densify(self, max_size=1000, min_fraction=0.25):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best keep this as min_density, it's clearer that it represents the density.

sparse/core.py Outdated
an error.

>>> x = np.zeros((5, 5), dtype=np.uint8)
>>> x[2, 2] = 1
>>> s = COO.from_numpy(x)
>>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
>>> s.maybe_densify(max_size=5, min_density=0.25)
Traceback (most recent call last):
...
NotImplementedError: Operation would require converting large sparse array to dense
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 2425...

NotImplementedError: Operation would require converting large sparse array to dense

Need to change this to ValueError.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's causing doctests to fail. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there are a million tests failing on master, it is hard to spot my own mistakes...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@nils-werner nils-werner Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked, they only fail until Numpy 1.13.1. Latest Numpy works fine...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is really weird, I tested with Numpy 1.13.3 on my local system and it still works fine.

@mrocklin
Copy link
Contributor

Is NotImplementedError the best way to show that densfication is forbidden? What would be the right exception here?

To me NotImplementedError means "We have not yet implemented this, but it is a reasonable request". If it is a case of a type not being supported like "we don't think that densifying scipy.sparse matrices is a good idea" then I would raise a TypeError. If it is a case of the values being inappropriate like "sometimes we do densify sparse arrays, but this particular array is very very sparse" then I would use a ValueError.

@hameerabbasi
Copy link
Collaborator

So, we all agree that it should be ValueError in maybe_densify then. :-)

@hameerabbasi
Copy link
Collaborator

Are you still making changes here? If not, this gets a +1 from me.

@hameerabbasi hameerabbasi added this to the 0.3 milestone Jan 15, 2018
@nils-werner
Copy link
Contributor Author

No, I'm done.

@hameerabbasi
Copy link
Collaborator

A rebase would be nice.

@hameerabbasi hameerabbasi added the enhancement Indicates new feature requests label Jan 16, 2018
@hameerabbasi hameerabbasi merged commit bbb0869 into pydata:master Jan 16, 2018
hameerabbasi pushed a commit to hameerabbasi/sparse that referenced this pull request Feb 27, 2018
* Size and density properties

* Docstrings and tests

* Test .density using np.isclose

* Fixed typos that made tests fail

* Replaced other occurences of np.prod(self.shape) with self.size

* Avoid float precision errors in .denstity docstring

* maybe_densify raises ValueError instead of NotImplementedError

* Updated docstring to match ValueError exception

* changed allowed_nnz and allowed_fraction to max_size and min_density

* changed min_fraction to min_density

* Another docstring that didn't show ValueError

* Documentation pages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants