Dataset types use __nonzero/bool__ method for truthiness #992

Merged
merged 1 commit into from Nov 29, 2016
+11 −13
Dataset types use __nonzero/bool__ method for truthiness

Philipp Rudiger authored and Philipp Rudiger committed Nov 29, 2016
commit 90d9050c5f924745a5b58c1fbb1608b6881a6599
@@ -426,6 +426,10 @@ def __len__(self):
         """
         return self.interface.length(self)
 
+    def __nonzero__(self):
+        return self.interface.nonzero(self)
+
+    __bool__ = __nonzero__
 
     @property
     def shape(self):
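
As a rough illustration of the pattern in this hunk, the sketch below shows a dataset delegating truthiness to its interface, with `__bool__` aliased to `__nonzero__` so the same hook works on Python 2 and Python 3. The class names `ListInterface`, `LazyInterface` and `Dataset` here are hypothetical stand-ins, not the project's actual classes.

```python
class ListInterface(object):
    """Hypothetical interface for list-backed data (illustration only)."""

    @classmethod
    def length(cls, dataset):
        return len(dataset.data)

    @classmethod
    def nonzero(cls, dataset):
        # Cheap case: truthiness follows directly from the length.
        return bool(cls.length(dataset))


class LazyInterface(ListInterface):
    """Hypothetical interface for data whose length is costly to compute."""

    @classmethod
    def nonzero(cls, dataset):
        # Assume the data is non-empty instead of evaluating it.
        return True


class Dataset(object):
    """Minimal stand-in that forwards len() and truthiness to its interface."""

    def __init__(self, data, interface):
        self.data = data
        self.interface = interface

    def __len__(self):
        return self.interface.length(self)

    def __nonzero__(self):  # truthiness hook on Python 2
        return self.interface.nonzero(self)

    __bool__ = __nonzero__  # truthiness hook on Python 3
```

With this layout, an interface that cannot compute a length cheaply only needs to override `nonzero`, and `len()` keeps its real meaning.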
@@ -229,12 +229,8 @@ def dframe(cls, columns, dimensions):
         return columns.data.compute()
 
     @classmethod
-    def length(cls, dataset):
-        """
-        Length of dask dataframe is unknown, always return 1
-        for performance, use shape to compute dataframe shape.
-        """
-        return 1
+    def nonzero(cls, dataset):
+        return True

jbednar Nov 29, 2016

Contributor

This is a definite improvement! Hardcoding nonzero is vastly better than hardcoding length. Even so, is there no way to determine the actual value of nonzero in a way that doesn't load the entire dataset?

philippjfr Nov 29, 2016

Author Contributor

I tried various things such as using:

try:
    next(df.iterrows())
except StopIteration:
    nonzero = False
else:
    nonzero = True

But all of these approaches still take a considerable amount of time.

jbednar Nov 29, 2016

Contributor

Maybe worth asking the dask maintainers if there is a quick method for testing nonzero, then.

philippjfr Nov 29, 2016

Author Contributor

I've asked, hopefully they'll have a good solution.

philippjfr Nov 29, 2016

Author Contributor

Matt Rocklin suggested using .head and checking the length of the result, but that is no faster than my solution above.

philippjfr Nov 29, 2016

Author Contributor

Discussed it properly now, and it's fairly clear that there won't be a cheap general solution here. A dask dataframe can be the result of a bunch of chained operations which all have to be evaluated before the length or even a nonzero length can be determined.
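
To make the cost concrete, here is a minimal sketch that uses plain Python generators as a stand-in for a lazy dataframe (it does not use dask, and `is_nonempty`, `source` and `evaluated` are hypothetical names). Pulling a single row still forces every upstream step to run until one row survives, which in the worst case means scanning the whole source.

```python
import itertools


def is_nonempty(lazy_rows):
    """Pull at most one row; return (nonempty, iterator over all rows)."""
    try:
        first = next(lazy_rows)
    except StopIteration:
        return False, iter(())
    # Put the consumed row back in front so no data is lost.
    return True, itertools.chain([first], lazy_rows)


evaluated = []  # records every source row that had to be produced


def source(n):
    """Stand-in for an expensive upstream computation."""
    for i in range(n):
        evaluated.append(i)
        yield i


# A chained filter that only the very last source row passes.
pipeline = (row for row in source(1000) if row >= 999)
nonempty, rows = is_nonempty(pipeline)
```

Here the emptiness check only asks for one row, yet all 1000 source rows are produced before the filter lets one through.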



@@ -207,6 +207,10 @@ def shape(cls, dataset):
     def length(cls, dataset):
         return len(dataset.data)
 
+    @classmethod
+    def nonzero(cls, dataset):
+        return bool(cls.length(dataset))
+
     @classmethod
     def redim(cls, dataset, dimensions):
         return dataset.data
@@ -629,7 +629,7 @@ def _valid_dimensions(self, dimensions):
     @property
     def ddims(self):
         "The list of deep dimensions"
-        if self._deep_indexable and len(self):
+        if self._deep_indexable and self:
             return self.values()[0].dimensions()
         else:
             return []
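
The switch from `len(self)` to `self` matters because Python consults the truthiness hook before falling back to `__len__`. The short sketch below, using a hypothetical `Container` class, counts `__len__` calls to show that a truth test never triggers the length computation once the hook is defined.

```python
class Container(object):
    """Hypothetical container that counts how often its length is computed."""

    def __init__(self):
        self.len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return 1

    def __nonzero__(self):  # Python 2
        return True

    __bool__ = __nonzero__  # Python 3


c = Container()
truthy = bool(c)            # uses the truthiness hook, not __len__
calls_after_bool = c.len_calls
length = len(c)             # explicitly computes the length
calls_after_len = c.len_calls
```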
@@ -309,12 +309,6 @@ class Points(Chart):
 
     _min_dims = 2 # Minimum number of columns
 
-    def __iter__(self):
-        i = 0
-        while i < len(self):
-            yield tuple(self.data[i, ...])
-            i += 1
-
 
jbednar Nov 29, 2016

Contributor

Not sure of the implications of deleting this.

philippjfr Nov 29, 2016

Author Contributor

This was old and broken because it assumed the data was always an array. There is an existing issue about adding an iterator method to all the Dataset interfaces somewhere.


class VectorField(Points):