Generic reducer operation #69

Closed
jpivarski opened this issue Jan 14, 2020 · 2 comments · Fixed by #115
Labels
feature New feature or request

Comments

@jpivarski
Member

jpivarski commented Jan 14, 2020

This is an operation, so it is similar in scope to Content::flatten. (Time management: it's one of the hardest; maybe you don't want to start with this.)

The generic reducer operation reduces list depth by 1, much like the Content::flatten operation. It takes an array at some depth (axis parameter, just like flatten), a function of two arguments (maybe templated by array type?), and an identity (of that same type).
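As a rough illustration of the idea (not the library's implementation), the sketch below reduces the innermost level of a jagged array stored as `offsets` plus flat `content`, which is the layout a ListOffsetArray describes. The function name `reduce_lists` and its parameter names are hypothetical.

```python
def reduce_lists(offsets, content, binary_function, identity):
    """Fold each sublist content[offsets[i]:offsets[i + 1]] into one value.

    Hypothetical helper: the output has one entry per sublist, so the
    list depth of the result is one less than that of the input.
    """
    out = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        accumulated = identity  # an empty sublist reduces to the identity
        for item in content[start:stop]:
            accumulated = binary_function(accumulated, item)
        out.append(accumulated)
    return out


# [[1, 2, 3], [], [4, 5]] stored as offsets + content
offsets = [0, 3, 3, 5]
content = [1, 2, 3, 4, 5]
sums = reduce_lists(offsets, content, lambda x, y: x + y, 0)
products = reduce_lists(offsets, content, lambda x, y: x * y, 1)
```

Starting every sublist from the identity is what makes empty lists well defined, which is why each reducer needs an identity in addition to its binary function.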

Once the generic reducer is in place, all concrete reducers can be implemented. They are:

| reducer | type | binary function | identity |
|---|---|---|---|
| any | boolean | logical or | false |
| all | boolean | logical and | true |
| count | any | +1 for each argument | 0 |
| count_nonzero | numerical | +1 for each non-zero argument | 0 |
| sum | numerical | + | 0 |
| prod | numerical | * | 1 |
| min | numerical | x < y ? x : y | inf |
| max | numerical | x > y ? x : y | -inf |
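One way to read the table is as a lookup from reducer name to a (binary function, identity) pair. The sketch below does this for a flat Python sequence using `functools.reduce`; the names `REDUCERS` and `apply_reducer` are hypothetical, and the functions are written in fold form, `f(accumulated, next_item)`, which is an assumption about how "+1 for each argument" would be expressed.

```python
import math
from functools import reduce

# (binary function, identity) for each row of the table.
REDUCERS = {
    "any": (lambda acc, x: acc or bool(x), False),
    "all": (lambda acc, x: acc and bool(x), True),
    "count": (lambda acc, x: acc + 1, 0),
    "count_nonzero": (lambda acc, x: acc + (1 if x != 0 else 0), 0),
    "sum": (lambda acc, x: acc + x, 0),
    "prod": (lambda acc, x: acc * x, 1),
    "min": (lambda acc, x: x if x < acc else acc, math.inf),
    "max": (lambda acc, x: x if x > acc else acc, -math.inf),
}


def apply_reducer(name, values):
    """Reduce a flat sequence with the named reducer, starting from its identity."""
    function, identity = REDUCERS[name]
    return reduce(function, values, identity)
```

With this shape, `apply_reducer("prod", [])` gives the identity `1` and `apply_reducer("min", [])` gives `inf`, so empty lists need no special case.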

For integer types, which have no infinity, min should use the type's maximum value as its identity and max should use the type's minimum value.
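A minimal sketch of how those identities could be chosen, using only the standard library. The helper names and the `kind`/`bits`/`signed` parameters are hypothetical stand-ins for a real dtype.

```python
import math


def integer_bounds(bits, signed=True):
    """Return (minimum, maximum) of a two's-complement integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def min_identity(kind, bits=64, signed=True):
    """Identity for min: +inf for floats, the type's maximum for integers."""
    if kind == "float":
        return math.inf
    return integer_bounds(bits, signed)[1]


def max_identity(kind, bits=64, signed=True):
    """Identity for max: -inf for floats, the type's minimum for integers."""
    if kind == "float":
        return -math.inf
    return integer_bounds(bits, signed)[0]
```

In NumPy, the same bounds are available as `numpy.iinfo(dtype).min` and `numpy.iinfo(dtype).max`.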

As a later enhancement, there should also be a way to skip None in arrays of OptionType.

@jpivarski
Member Author

jpivarski commented Jan 14, 2020

In #51 and here, I said that negative axis should count upward from the first RecordArray; that is, axis=-1 would mean the outermost ListOffsetArray64 in

[image: example-hierarchy]

instead of the innermost ListOffsetArray64 (pointed to by contents["y"]). Although that seems like a good choice for flatten, it would be a bad choice for reducers. It's quite common to want to reduce the innermost level of a tree, even if that level is inside a RecordArray. In fact, awkward 0.x always applies reducers to the innermost level; it was written before I realized that other levels were conceptually possible.

So negative axis parameters should count up from the leaves of the tree, passing right through any RecordArrays. The leaves of the tree might be at different levels, so a negative axis defined this way would do things that aren't possible with a positive axis (unlike NumPy).
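To make "counting up from the leaves" concrete, here is a small sketch on a toy type tree rather than on real array classes. The tuple encoding (`"list"`, `"record"`, `"leaf"`), the function name, and the convention that list depth 1 means the outermost list are all assumptions made for illustration.

```python
def resolve_negative_axis(node, axis, depth=0, path=()):
    """Map each leaf (by record-field path) to the list depth a negative axis selects.

    Depth 1 is the outermost list. Records are passed through without
    changing the depth, so only list nodes count.
    """
    kind = node[0]
    if kind == "leaf":
        target = depth + axis + 1  # axis=-1 selects the innermost list above this leaf
        if target < 1:
            raise ValueError(f"axis {axis} is out of range for leaf {path}")
        return {path: target}
    if kind == "list":
        return resolve_negative_axis(node[1], axis, depth + 1, path)
    if kind == "record":
        selected = {}
        for name, child in node[1].items():
            selected.update(resolve_negative_axis(child, axis, depth, path + (name,)))
        return selected
    raise TypeError(f"unknown node kind: {kind!r}")


# list of records with fields x (a number) and y (a list of numbers)
tree = ("list", ("record", {"x": ("leaf",), "y": ("list", ("leaf",))}))
selected = resolve_negative_axis(tree, -1)
```

Here `axis=-1` selects depth 1 in the `x` branch but depth 2 in the `y` branch, a selection that no single positive axis can express.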

This has consequences for flatten: the axis should not be interpreted in different ways in different functions, so flatten should count up from the leaves of the tree, too!

@jpivarski
Member Author

Worth mentioning: the existence of reducers will also enable several other essential functions (though they are not themselves reducers):

def moment(self, n, weight=None):
    "Compute the n-th moment of an array with optional weight."
    with self.numpy.errstate(invalid="ignore"):
        if weight is None:
            return self.numpy.true_divide((self**n).sum(), self.count())
        else:
            return self.numpy.true_divide(((self * weight)**n).sum(), (self * 0 + weight).sum())

def mean(self, weight=None):
    "Compute the mean (average) of an array with optional weight."
    with self.numpy.errstate(invalid="ignore"):
        if weight is None:
            return self.numpy.true_divide(self.sum(), self.count())
        else:
            return self.numpy.true_divide((self * weight).sum(), (self * 0 + weight).sum())

def var(self, weight=None, ddof=0):
    "Compute the variance of an array with optional weight and possibly reduce it with a number of degrees of freedom."
    with self.numpy.errstate(invalid="ignore"):
        if weight is None:
            denom = self.count()
            one = self.numpy.true_divide(self.sum(), denom)
            two = self.numpy.true_divide((self**2).sum(), denom)
        else:
            denom = (self * 0 + weight).sum()
            one = self.numpy.true_divide((self * weight).sum(), denom)
            two = self.numpy.true_divide(((self * weight)**2).sum(), denom)
        if ddof != 0:
            return (two - one**2) * denom / (denom - ddof)
        else:
            return two - one**2

def std(self, weight=None, ddof=0):
    "Compute the standard deviation of an array with optional weight and possibly reduce it with a number of degrees of freedom."
    with self.numpy.errstate(invalid="ignore"):
        return self.numpy.sqrt(self.var(weight=weight, ddof=ddof))
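For the unweighted case, the same arithmetic can be checked on a plain Python list, with the built-in `sum` and `len` standing in for the `sum` and `count` reducers. This is only a sketch; the standalone functions `mean`, `var`, and `std` below are hypothetical and not the methods above.

```python
import math


def mean(values):
    """Mean as sum / count."""
    return sum(values) / len(values)


def var(values, ddof=0):
    """Variance as E[x**2] - E[x]**2, rescaled when ddof is non-zero."""
    count = len(values)
    one = sum(values) / count
    two = sum(x ** 2 for x in values) / count
    if ddof != 0:
        return (two - one ** 2) * count / (count - ddof)
    return two - one ** 2


def std(values, ddof=0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(var(values, ddof=ddof))


data = [1.0, 2.0, 3.0, 4.0]
population_variance = var(data)       # agrees with statistics.pvariance(data)
sample_variance = var(data, ddof=1)   # agrees with statistics.variance(data)
```

Every quantity here is built from sums and counts alone, which is why these functions only become possible once the reducers exist.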
