Dataset types use __nonzero/bool__ method for truthiness #992
Conversation
""" | ||
return 1 | ||
def nonzero(cls, dataset): | ||
return True | ||
|
This is a definite improvement! Hardcoding nonzero is vastly better than hardcoding length. Even so, is there no way to determine the actual value of nonzero in a way that doesn't load the entire dataset?
I tried various things such as using:

    try:
        next(df.iterrows())
    except StopIteration:
        nonzero = False
    else:
        nonzero = True
But all of these approaches still take a considerable amount of time.
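The pattern above generalizes to any iterable: pulling a single item tests for emptiness without ever computing the full length. A minimal sketch (the `is_nonempty` helper is hypothetical, not part of this PR):

```python
_SENTINEL = object()

def is_nonempty(iterable):
    """Return True if the iterable yields at least one item.

    Only a single element is pulled, so the full sequence is never
    materialized; for a lazy dask dataframe, however, even producing
    that first row can trigger substantial computation.
    """
    return next(iter(iterable), _SENTINEL) is not _SENTINEL
```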
Maybe worth asking the dask maintainers if there is a quick method for testing nonzero, then.
I've asked, hopefully they'll have a good solution.
Matt Rocklin suggested using `.head` and checking the length of that, though it is no faster than my solution above.
Discussed it properly now, and it's fairly clear that there won't be a cheap general solution here. A dask dataframe can be the result of a bunch of chained operations which all have to be evaluated before the length or even a nonzero length can be determined.
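The cost can be illustrated with a toy stand-in for a lazy, chained computation (pure Python, not dask itself; the `LazyFrame` class is purely illustrative): each chained operation defers work, and asking for the length forces the entire chain to run.

```python
class LazyFrame:
    """Toy stand-in for a lazy dask-style dataframe (illustration only)."""

    def __init__(self, source, ops=()):
        self._source = source  # factory producing the underlying data
        self._ops = ops        # chain of deferred operations

    def map(self, fn):
        # Chaining only records the operation; nothing is computed yet.
        return LazyFrame(self._source, self._ops + (fn,))

    def compute(self):
        # Evaluation pulls the data and replays the whole chain.
        data = list(self._source())
        for fn in self._ops:
            data = [fn(x) for x in data]
        return data

    def __len__(self):
        # The length is only knowable after evaluating the entire chain,
        # so len() is as expensive as compute().
        return len(self.compute())
```

Here `len(frame)` costs as much as `frame.compute()`, which is exactly why truthiness on the dask interface should not fall back to `__len__`.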
    while i < len(self):
        yield tuple(self.data[i, ...])
        i += 1
Not sure of the implications of deleting this.
This was old and broken because it assumed the data was always an array. There is an existing issue somewhere about adding an iterator method to all the Dataset interfaces.
Looks great! Happy to see it merged.
Very happy with this improvement and tests have passed. Merging.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
As discussed in #988, this PR implements the `__nonzero__` (Py2) and `__bool__` (Py3) methods on Dataset, which are used for truthiness checks instead of `__len__`. This ensures that the dask interface can implement the `__len__` method correctly, without forcing code throughout HoloViews to compute the length of a large out-of-core dask dataframe all the time.
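The mechanism can be sketched in isolation: when a class defines `__bool__` (or `__nonzero__` on Py2), truthiness checks like `if dataset:` use it directly and never fall back to `__len__`. A minimal illustration (the `LazyDataset` class is hypothetical, not HoloViews code):

```python
class LazyDataset:
    """Sketch: truthiness short-circuits without touching __len__."""

    def __init__(self):
        self.len_calls = 0  # count how often the length is computed

    def __len__(self):
        # Imagine this triggers an expensive out-of-core computation.
        self.len_calls += 1
        return 1_000_000

    def __bool__(self):          # Python 3
        return True

    __nonzero__ = __bool__       # Python 2 spelling of the same hook
```

With this in place, `if ds:` succeeds immediately, and the expensive `__len__` only runs when the length is explicitly requested.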