New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inheritance hierarchy #92

Merged
merged 9 commits into from Jan 28, 2018

Conversation

Projects
None yet
2 participants
@hameerabbasi
Collaborator

hameerabbasi commented Jan 27, 2018

  • Move from six to future (I think it's cleaner, with a few imports we can write Python 2/3 compatible code instead of modifying everything)
  • Add a SparseArray class with the properties common to both COO and DOK.
  • Make COO and DOK inherit from SparseArray.
  • Update docs.
@shape.setter
def shape(self, value):
self._shape = value

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Perhaps we should avoid the property altogether and just have this be a normal attribute?

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

There's no way to inherit it if we do that. And we need shape directly for ndim, size, and indirectly fordensity.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Maybe we just avoid listing it as an abstract method then?

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

Then there's no point to the base class. We would have to remove all of the properties I mentioned.

The whole point of the class is to access the "common" stuff for all types.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

Or we can make it have an __init__ method with just the shape.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

This seems like it might work? The super class doesn't necessarily need to be functional on its own.

class SparseArray(object):
    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return reduce(operator.mul, self.shape, 1)
  
    @property
    def density(self):
        return self.nnz / self.size

class COO(SparseArray):
    def __init__(...):
        self.shape = ...

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

It doesn't make sense in an structural way to be using things undefined in the parent class. However, I made it into an attribute of the parent class. This keeps it as an attribute and still makes sense in the inheritance hierarchy.

@hameerabbasi hameerabbasi force-pushed the hameerabbasi:inheritance-hierarchy branch from 9b6176a to 43f850d Jan 27, 2018

self[c] = d
else:
raise ValueError('data must be a dict.')
@property
def dtype(self):
return self._dtype

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Thoughts on using an attribute rather than a property here? That seems simpler.

My guess is that we're doing this because the superclass has a property. In this case I would be inclined to just drop the property on the superclass.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

COO uses a property, whereas DOK uses an attribute. We could make COO use an attribute as well, however that seems bad since it might not match data.dtype. I chose the lowest common denominator and went for a property.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Is it important that they match?

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

I want to have a base class that can be used to access common things. If I turn it into an attribute on DOK then I have to take it out of the base class... Which I don't want to do. Anything that is common should be in the base class.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Anything that is common should be in the base class

I would agree with this to the extent that it's convenient. It looks like by trying to make these two subclasses adhere to the same interface we're forced to increase the level of technology used by one of the subclasses. I have a (probably irrational) aversion to using more advanced features than are necessary, mostly on the grounds of avoiding complex situations and increasing maintainability.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Although that would work for type inference, inheriting from SparseArray won't fail if we forget to implement dtype. Failures are good if we fail to add basic functionality.

I'm not particularly concerned about us forgetting to specify a dtype.

In addition, SparseArray.dtype would be inferred as NoneType in all IDEs instead of a np.dtype object.

What happens now? We don't specify the dtype in the property.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

We could, for example, add a docstring and it'd be inferred as a dtype.

This comment has been minimized.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 28, 2018

Collaborator

Great idea. However, I had another: How about a class docstring with an Attributes section?

Edit: I tried this out and it worked fine in PyCharm. :-)

This comment has been minimized.

@mrocklin

mrocklin Jan 28, 2018

Collaborator

Glad to hear it!

@@ -377,7 +343,7 @@ def todense(self):
"""
result = np.zeros(self.shape, dtype=self.dtype)
for c, d in six.iteritems(self.data):
for c, d in self.data.items():

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

I'm +1 on this change, even if it causes more memory issues in Python 2.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

With from builtins import dict it won't, unless the user purposefully passes an old-style dict. :-)

@@ -1,31 +1,21 @@
import six
from builtins import dict, int

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

What is dict here in Python 2? It's not a normal builtin dict object?

Ah, ok, I'm now reading about the future package. I have to admit that I am a bit hesitant to add it as a new dependency just because I haven't heard of it before. Happy to deal with this in the future though.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

future is a dependency for many packages. It's more common than six, I believe. I may be wrong.

It's pip installable.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

future is a dependency for many packages. It's more common than six, I believe. I may be wrong.

This would make me much more comfortable. I would be somewhat surprised to learn this though.

I hesitate to have a dict type anywhere in the code that is not the standard builtin dict type, both for performance and (more importantly) consistency reasons with the ecosystem.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

I just took a peek through the code for future.

    def items(self):
        """
        On Python 2.7+:
            D.items() -> a set-like object providing a view on D's items
        On Python 2.6:
            D.items() -> an iterator over D's items
        """
        if ver == (2, 7):
            return self.viewitems()
        elif ver == (2, 6):
            return self.iteritems()
        elif ver >= (3, 0):
            return self.items()

So performance isn't a concern.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

What if some other infrastrucural library (like cloudpickle, Cython, PyPy, Numba, ...) has an optimiziation that works well on objects where type(obj) is dict (this is almost certainly the case in all of those projects). Then presumably that optimization won't apply to the DOK class.

This library is aimed to be infrastructural for the ecosystem. I think that it is important that we be as non-fancy as possible. As another point, if you ever want to see this upstreamed into scipy then you'll need to convince them about this point as well and they are, in general, far more conservative than I am.

This comment has been minimized.

@mrocklin

mrocklin Jan 27, 2018

Collaborator

Yeah, it's a nice trick, don't get me wrong. I've just seen many many things go wrong once code like this interacts with 1000 other libraries (which again, I think is our intention). I would definitely use a library like this in application code, but for core ecosystem code I tend to be more conservative. I think that many other maintainers of core ecosystem code also hold this belief.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

Hmm. I'd be okay going back to six, but the memory constrains could get bad. What do you think?

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 27, 2018

Collaborator

I apologize if this causes friction.

Not at all. :-) We're just discussing.

This comment has been minimized.

@mrocklin

mrocklin Jan 28, 2018

Collaborator

I would wait for someone to arrive with a memory issue before optimizing for memory constraints in this way. I don't expect this to be a significant issue in common use. Also, I don't mind Python 2 being less efficient.

This comment has been minimized.

@hameerabbasi

hameerabbasi Jan 28, 2018

Collaborator

I removed the six/future dependency entirely and added our own basic Python 2/3 compat, however, the dict issue stands.

hameerabbasi added some commits Jan 28, 2018

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Jan 28, 2018

If there are no other objections, I think this gets a +1 from me.

Rename py23 to compatibility
This is a more common name for this file in standard projects.
@mrocklin

This comment has been minimized.

Collaborator

mrocklin commented Jan 28, 2018

I've renamed py23.py to compatibility.py and pushed to this branch (I hope that's ok). +1 from me to merge

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Jan 28, 2018

Thanks!

@hameerabbasi hameerabbasi merged commit b9fc91c into pydata:master Jan 28, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@hameerabbasi hameerabbasi deleted the hameerabbasi:inheritance-hierarchy branch Jan 28, 2018

hameerabbasi added a commit to hameerabbasi/sparse that referenced this pull request Feb 27, 2018

Inheritance hierarchy (pydata#92)
* Implement inheritance hierarchy.

* Implement inheritance hierarchy.

* Make shape an attribute and move it to the base class.

* Delete useless properties.

* Make dtype into a property of the parent class.

* Removed six dependency and added some basic memory
Py 2/3 optimisations.

* Documentation for SparseArray.

* Make dtype into an attribute for DOK.

* Rename py23 to compatibility

This is a more common name for this file in standard projects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment