Generic reducer operation #69

jpivarski · 2020-01-14T15:51:06Z

This is an operation, so it is similar in scope to Content::flatten. (Time management: it's one of the hardest; maybe you don't want to start with this.)

The generic reducer operation reduces list depth by 1, much like the Content::flatten operation. It takes an array at some depth (axis parameter, just like flatten), a function of two arguments (maybe templated by array type?), and an identity (of that same type).
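
As a rough illustration of the idea (not the C++ implementation this issue asks for), here is a minimal Python sketch that reduces only the innermost list level. It assumes the lists are stored as a flat `content` plus an `offsets` array, in the style of a ListOffsetArray; the name `reduce_innermost` and its signature are hypothetical, and the `axis` parameter is left out.

```python
from functools import reduce


def reduce_innermost(offsets, content, binary_function, identity):
    """Reduce each sublist content[offsets[i]:offsets[i + 1]] to a single value.

    Hypothetical helper: the result has one entry per sublist, so the list
    depth drops by 1. Empty sublists yield the identity.
    """
    out = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        out.append(reduce(binary_function, content[start:stop], identity))
    return out


# Three sublists: [1, 2, 3], [], [4, 5]
offsets = [0, 3, 3, 5]
content = [1, 2, 3, 4, 5]
sums = reduce_innermost(offsets, content, lambda x, y: x + y, 0)
```

Starting every reduction from the identity is what makes the empty sublist well defined: it reduces to the identity itself.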

Once the generic reducer is in place, all concrete reducers can be implemented. They are:

| reducer | type | binary function | identity |
| --- | --- | --- | --- |
| any | boolean | logical or | `false` |
| all | boolean | logical and | `true` |
| count | any | `+1` for each argument | `0` |
| count_nonzero | numerical | `+1` for each non-zero argument | `0` |
| sum | numerical | `+` | `0` |
| prod | numerical | `*` | `1` |
| min | numerical | `x < y ? x : y` | `inf` |
| max | numerical | `x > y ? x : y` | `-inf` |

For integer types, min and max should use the integer type's maximum and minimum value as an identity.
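
The table can be read as a set of (binary function, identity) pairs handed to one generic routine. The sketch below assumes plain Python lists of lists rather than the real array classes; `reduce_lists`, `REDUCERS`, and `integer_min_max_identities` are hypothetical names used only for illustration. `numpy.iinfo` supplies the integer limits mentioned above.

```python
from functools import reduce
import operator

import numpy as np


def reduce_lists(lists, binary_function, identity):
    """Apply one reducer to every sublist, starting from the identity."""
    return [reduce(binary_function, sublist, identity) for sublist in lists]


# One (binary function, identity) pair per row of the table.
REDUCERS = {
    "any": (lambda acc, x: acc or bool(x), False),
    "all": (lambda acc, x: acc and bool(x), True),
    "count": (lambda acc, x: acc + 1, 0),
    "count_nonzero": (lambda acc, x: acc + (1 if x != 0 else 0), 0),
    "sum": (operator.add, 0),
    "prod": (operator.mul, 1),
    "min": (lambda x, y: x if x < y else y, float("inf")),
    "max": (lambda x, y: x if x > y else y, float("-inf")),
}


def integer_min_max_identities(dtype):
    """Return (identity for min, identity for max) for an integer dtype."""
    info = np.iinfo(dtype)
    return info.max, info.min


data = [[3, 0, 5], [], [2]]
results = {
    name: reduce_lists(data, function, identity)
    for name, (function, identity) in REDUCERS.items()
}

# Integer-typed min: the empty sublist reduces to the largest int64 value.
min_identity, max_identity = integer_min_max_identities(np.int64)
int_min = reduce_lists(data, REDUCERS["min"][0], min_identity)
```

With floating-point identities an empty sublist reduces to `inf` or `-inf`; with the integer identities it reduces to the type's extreme value instead, so the output can keep an integer type.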

As a later enhancement, there should also be a way to skip None in arrays of OptionType.
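
One possible shape for that enhancement, sketched with Python's `None` standing in for a missing value (the helper name `reduce_skipping_none` is hypothetical, and a real OptionType would carry a mask or index instead):

```python
from functools import reduce


def reduce_skipping_none(values, binary_function, identity):
    """Reduce values, ignoring entries that are None (missing)."""
    present = (value for value in values if value is not None)
    return reduce(binary_function, present, identity)


total = reduce_skipping_none([1, None, 2, None, 3], lambda x, y: x + y, 0)
all_missing = reduce_skipping_none([None, None], lambda x, y: x + y, 0)
```

A list containing only missing values behaves like an empty list and reduces to the identity.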


jpivarski · 2020-01-14T16:02:13Z

In #51 and here, I said that negative axis should count upward from the first RecordArray; that is, axis=-1 would mean the outermost ListOffsetArray64 in

instead of the innermost ListOffsetArray64 (pointed to by contents["y"]). Although that seems like a good choice for flatten, it would be a bad choice for reducers. It's quite common to want to reduce the innermost level of a tree, even when that level is inside a RecordArray. In fact, awkward 0.x always applies reducers to the innermost level; it was written before I realized that other levels were conceptually possible.

So negative axis parameters should count up from the leaves of the tree, passing right through any RecordArrays. The leaves of the tree might be at different levels, so a negative axis defined this way would do things that aren't possible with a positive axis (unlike NumPy).

This has consequences for flatten: the axis should not be interpreted different ways in different functions, so flatten should count up from the leaves of the tree, too!
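
To make the "count up from the leaves" rule concrete, here is a small sketch on a toy layout. The tuple encoding (`"list"`, `"record"`, `"leaf"`) and both function names are hypothetical stand-ins, not the library's classes. It assumes the top-level array is axis 0 and that each list node adds one level of depth while records add none.

```python
def leaf_depths(layout, depth=0, path=""):
    """Map each leaf's field path to the number of list nodes above it.

    Records are passed through without adding depth.
    """
    kind = layout[0]
    if kind == "leaf":
        return {path or "<root>": depth}
    if kind == "list":
        return leaf_depths(layout[1], depth + 1, path)
    if kind == "record":
        out = {}
        for name, child in layout[1].items():
            child_path = f"{path}.{name}" if path else name
            out.update(leaf_depths(child, depth, child_path))
        return out
    raise ValueError(f"unknown node kind: {kind!r}")


def resolve_negative_axis(layout, axis):
    """Translate a negative axis into a positive axis for each leaf."""
    if axis >= 0:
        raise ValueError("this sketch only handles negative axis values")
    resolved = {}
    for path, depth in leaf_depths(layout).items():
        positive = depth + axis + 1  # axis=-1 is the innermost list above the leaf
        if positive < 0:
            raise ValueError(f"axis {axis} is out of range for field {path!r}")
        resolved[path] = positive
    return resolved


# A list of records whose field "x" is a number and whose field "y" is a list.
layout = ("list", ("record", {"x": ("leaf",), "y": ("list", ("leaf",))}))
innermost = resolve_negative_axis(layout, -1)
```

Here `innermost` maps `"x"` to axis 1 and `"y"` to axis 2: a single negative axis selects different positive depths in different branches, which no single positive axis can express.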

jpivarski · 2020-01-14T16:06:39Z

Worth mentioning: the existence of reducers will also enable several other essential functions (though they are not themselves reducers):

def moment(self, n, weight=None):
    "Compute the n-th moment of an array with optional weight."
    with self.numpy.errstate(invalid="ignore"):
        if weight is None:
            return self.numpy.true_divide((self**n).sum(), self.count())
        else:
            return self.numpy.true_divide(((self * weight)**n).sum(), (self * 0 + weight).sum())

def mean(self, weight=None):
    "Compute the mean (average) of an array with optional weight."
    with self.numpy.errstate(invalid="ignore"):
        if weight is None:
            return self.numpy.true_divide(self.sum(), self.count())
        else:
            return self.numpy.true_divide((self * weight).sum(), (self * 0 + weight).sum())

def var(self, weight=None, ddof=0):
    "Compute the variance of an array with optional weight and possibly reduce it with a number of degrees of freedom."
    with self.numpy.errstate(invalid="ignore"):
        if weight is None:
            denom = self.count()
            one = self.numpy.true_divide(self.sum(), denom)
            two = self.numpy.true_divide((self**2).sum(), denom)
        else:
            denom = (self * 0 + weight).sum()
            one = self.numpy.true_divide((self * weight).sum(), denom)
            two = self.numpy.true_divide(((self * weight)**2).sum(), denom)
        if ddof != 0:
            return (two - one**2) * denom / (denom - ddof)
        else:
            return two - one**2

def std(self, weight=None, ddof=0):
    "Compute the standard deviation of an array with optional weight and possibly reduce it with a number of degrees of freedom."
    with self.numpy.errstate(invalid="ignore"):
        return self.numpy.sqrt(self.var(weight=weight, ddof=ddof))
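
For a standalone picture of how these functions sit on top of the `sum` and `count` reducers, the sketch below computes an unweighted per-list mean with NumPy. It assumes plain Python lists of lists; `sum_and_count` and `mean_per_list` are hypothetical helpers, not methods of the array class above.

```python
import numpy as np


def sum_and_count(lists):
    """Stand-ins for the sum and count reducers: one value per sublist."""
    sums = np.array([float(sum(sublist)) for sublist in lists])
    counts = np.array([len(sublist) for sublist in lists])
    return sums, counts


def mean_per_list(lists):
    """Per-sublist mean as sum / count; empty sublists give nan."""
    sums, counts = sum_and_count(lists)
    # 0.0 / 0 is an "invalid" floating-point operation; silence the warning
    # and let the result be nan, as the methods above do.
    with np.errstate(invalid="ignore"):
        return np.true_divide(sums, counts)


means = mean_per_list([[1, 2, 3], [], [4, 5]])
```

The empty sublist has sum `0.0` and count `0`, so its mean is `nan`; that case is presumably why the methods above wrap their divisions in `errstate(invalid="ignore")`.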

jpivarski assigned ianna Jan 14, 2020

jpivarski added the feature New feature or request label Jan 14, 2020

jpivarski added this to the Minimum viable product for analysis milestone Jan 14, 2020

jpivarski mentioned this issue Jan 14, 2020

*::flatten and *::count for axis != 0 #51

Closed

This was referenced Jan 14, 2020

argmin and argmax #70

Closed

isna (formerly: dropna) #71

Closed

jpivarski mentioned this issue Feb 10, 2020

Add reducer operations (with an 'axis' parameter). #115

Merged

jpivarski linked a pull request Feb 10, 2020 that will close this issue

Add reducer operations (with an 'axis' parameter). #115

Merged

jpivarski closed this as completed in #115 Feb 16, 2020
