Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inheritance hierarchy #92

Merged
merged 9 commits into from Jan 28, 2018

Conversation

hameerabbasi
Copy link
Collaborator

@hameerabbasi hameerabbasi commented Jan 27, 2018

  • Move from six to future (I think it's cleaner, with a few imports we can write Python 2/3 compatible code instead of modifying everything)
  • Add a SparseArray class with the properties common to both COO and DOK.
  • Make COO and DOK inherit from SparseArray.
  • Update docs.

sparse/coo.py Outdated

@shape.setter
def shape(self, value):
self._shape = value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should avoid the property altogether and just have this be a normal attribute?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to inherit it if we do that. And we need shape directly for ndim, size, and indirectly fordensity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we just avoid listing it as an abstract method then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then there's no point to the base class. We would have to remove all of the properties I mentioned.

The whole point of the class is to access the "common" stuff for all types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can make it have an __init__ method with just the shape.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it might work? The super class doesn't necessarily need to be functional on its own.

class SparseArray(object):
    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return reduce(operator.mul, self.shape, 1)
  
    @property
    def density(self):
        return self.nnz / self.size

class COO(SparseArray):
    def __init__(...):
        self.shape = ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense in an structural way to be using things undefined in the parent class. However, I made it into an attribute of the parent class. This keeps it as an attribute and still makes sense in the inheritance hierarchy.

sparse/dok.py Outdated
self[c] = d
else:
raise ValueError('data must be a dict.')

@property
def dtype(self):
return self._dtype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on using an attribute rather than a property here? That seems simpler.

My guess is that we're doing this because the superclass has a property. In this case I would be inclined to just drop the property on the superclass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COO uses a property, whereas DOK uses an attribute. We could make COO use an attribute as well, however that seems bad since it might not match data.dtype. I chose the lowest common denominator and went for a property.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that they match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to have a base class that can be used to access common things. If I turn it into an attribute on DOK then I have to take it out of the base class... Which I don't want to do. Anything that is common should be in the base class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that is common should be in the base class

I would agree with this to the extent that it's convenient. It looks like by trying to make these two subclasses adhere to the same interface we're forced to increase the level of technology used by one of the subclasses. I have a (probably irrational) aversion to using more advanced features than are necessary, mostly on the grounds of avoiding complex situations and increasing maintainability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although that would work for type inference, inheriting from SparseArray won't fail if we forget to implement dtype. Failures are good if we fail to add basic functionality.

I'm not particularly concerned about us forgetting to specify a dtype.

In addition, SparseArray.dtype would be inferred as NoneType in all IDEs instead of a np.dtype object.

What happens now? We don't specify the dtype in the property.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, for example, add a docstring and it'd be inferred as a dtype.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. However, I had another: How about a class docstring with an Attributes section?

Edit: I tried this out and it worked fine in PyCharm. :-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to hear it!

@@ -377,7 +343,7 @@ def todense(self):
"""
result = np.zeros(self.shape, dtype=self.dtype)

for c, d in six.iteritems(self.data):
for c, d in self.data.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on this change, even if it causes more memory issues in Python 2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With from builtins import dict it won't, unless the user purposefully passes an old-style dict. :-)

sparse/dok.py Outdated
@@ -1,31 +1,21 @@
import six
from builtins import dict, int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is dict here in Python 2? It's not a normal builtin dict object?

Ah, ok, I'm now reading about the future package. I have to admit that I am a bit hesitant to add it as a new dependency just because I haven't heard of it before. Happy to deal with this in the future though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future is a dependency for many packages. It's more common than six, I believe. I may be wrong.

It's pip installable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future is a dependency for many packages. It's more common than six, I believe. I may be wrong.

This would make me much more comfortable. I would be somewhat surprised to learn this though.

I hesitate to have a dict type anywhere in the code that is not the standard builtin dict type, both for performance and (more importantly) consistency reasons with the ecosystem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took a peek through the code for future.

    def items(self):
        """
        On Python 2.7+:
            D.items() -> a set-like object providing a view on D's items
        On Python 2.6:
            D.items() -> an iterator over D's items
        """
        if ver == (2, 7):
            return self.viewitems()
        elif ver == (2, 6):
            return self.iteritems()
        elif ver >= (3, 0):
            return self.items()

So performance isn't a concern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if some other infrastrucural library (like cloudpickle, Cython, PyPy, Numba, ...) has an optimiziation that works well on objects where type(obj) is dict (this is almost certainly the case in all of those projects). Then presumably that optimization won't apply to the DOK class.

This library is aimed to be infrastructural for the ecosystem. I think that it is important that we be as non-fancy as possible. As another point, if you ever want to see this upstreamed into scipy then you'll need to convince them about this point as well and they are, in general, far more conservative than I am.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a nice trick, don't get me wrong. I've just seen many many things go wrong once code like this interacts with 1000 other libraries (which again, I think is our intention). I would definitely use a library like this in application code, but for core ecosystem code I tend to be more conservative. I think that many other maintainers of core ecosystem code also hold this belief.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'd be okay going back to six, but the memory constrains could get bad. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize if this causes friction.

Not at all. :-) We're just discussing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would wait for someone to arrive with a memory issue before optimizing for memory constraints in this way. I don't expect this to be a significant issue in common use. Also, I don't mind Python 2 being less efficient.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the six/future dependency entirely and added our own basic Python 2/3 compat, however, the dict issue stands.

@hameerabbasi
Copy link
Collaborator Author

If there are no other objections, I think this gets a +1 from me.

This is a more common name for this file in standard projects.
@mrocklin
Copy link
Collaborator

I've renamed py23.py to compatibility.py and pushed to this branch (I hope that's ok). +1 from me to merge

@hameerabbasi
Copy link
Collaborator Author

Thanks!

@hameerabbasi hameerabbasi merged commit b9fc91c into pydata:master Jan 28, 2018
@hameerabbasi hameerabbasi deleted the inheritance-hierarchy branch January 28, 2018 14:24
hameerabbasi added a commit to hameerabbasi/sparse that referenced this pull request Feb 27, 2018
* Implement inheritance hierarchy.

* Implement inheritance hierarchy.

* Make shape an attribute and move it to the base class.

* Delete useless properties.

* Make dtype into a property of the parent class.

* Removed six dependency and added some basic memory
Py 2/3 optimisations.

* Documentation for SparseArray.

* Make dtype into an attribute for DOK.

* Rename py23 to compatibility

This is a more common name for this file in standard projects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants