Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reduce macros #143

Open
slott56 opened this issue Aug 1, 2020 · 5 comments
Open

Add reduce macros #143

slott56 opened this issue Aug 1, 2020 · 5 comments

Comments

@slott56
Copy link
Contributor

slott56 commented Aug 1, 2020

[1, 2, 3].reduce(s, x, s+x, 0) for example, would compute a sum. An alternative is L.reduce("_+_", 0) which would use implied variable bindings but severly limit the features available. The 4th parameter, the initial value, defaults to zero to make sum and count slightly simpler.

Specialized reductions would be available to avoid wordy constructs using the foundational reduce macro.

  • L.sum() -- implicitly L.sum(x, x)
  • L.min() -- L.min(x, x)
  • L.max() -- L.max(x, x)
  • L.count() -- L.count(x, x) == size(L.filter(x, x))

This would permit introduction of mean(), stdev(), variance() permitting CEL the be applied to statistically-based decisions. size(L.filter(sample, sample > mean(benchmark)+3.*stdev(benchmark))) > 1. We can then supply an appropriate benchmark value in a binding.

Update:

Additionally, an L.first() would also be helpful.

This is not based on the above reduce(). It's a kind of existence test and can use short-circuit processing to stop processing when the first value has been found.

For example, resource["Tags"].first(x, x["Key"] == "Name" ? x["Value"] : null, "Default"). This lets us examine JSON documents with a list of {"Key": x, "Value": y} values for the first instance of x == "Name" and extract the y value.

This can be slightly more pleasant than resource["Tags"].filter(x, x["Key"] == "Name")[0]["Value"] because the first() macro can return a default value instead of suffering from an index error in the event of a missing {"Key": "Name", ...} entry in the list being examined.

@TristonianJones
Copy link
Collaborator

TristonianJones commented Aug 26, 2020

This is great use case with solid examples.

@JimLarson and I have been wanting to introduce a more robust set of macros into CEL for reducers. I'll add a mini-design doc here and start the official language change process with the CEL governance team.

// Reduction always operates on the reduced (or initial) value and the iteration value from
// the input range.
// 
// Before iterating over the range value, which must be of list type, the <reduce_var> will
// be initialized to the <init_expr>. For each iteration of the range, the current value in the
// range will be assigned to <iter_var>. The <op_expr> may refer to both the <reduce_var>
// and <iter_var> and the result of the evaluation will be assigned to the <reduce_var>.
//
// If the <range_expr> is empty the <reduce_var> value will return the <init_expr> value.
<range_expr>.reduce(<reduce_var>, <iter_var>, <init_expr>, <op_expr>)

The macros of min, max, sum, and count would be written as expansions on reduce as follows:

<range_expr>.min() -> <range_expr>.reduce(r, i, int_max, r < i ? r : i)
<range_expr>.max() -> <range_expr>.reduce(r, i, int_min, r > i ? r : i)
<range_expr>.sum() -> <range_expr>.reduce(r, i, 0, r + i)
<range_expr>.count() -> <range_expr>.size()
<range_expr>.count(<i>, filter) -> <range_expr>.reduce(r, <i>, 0, filter ? r + 1 : r)

@TristonianJones
Copy link
Collaborator

@slott56 My only question is whether you can satisfy the following equation today without the reducers and just using the existing filter macro and size method with some extensions for mean and stddev (not sure over what range of values those functions would apply based on the example though):

<list>.filter(sample, sample > mean(benchmark)+3.*stdev(benchmark))).size() > 1

@slott56
Copy link
Contributor Author

slott56 commented Aug 26, 2020

Yes. As stated above, L.count(x, condition) is effectively size(L.filter(x, condition)). And yes, this is a redundant approach. I think a symmetry between count and sum are helpful. It makes generalization to sum of squares somewhat easier to understand.

I'd prefer <range_expr>. count() -> <range_expr>.reduce(r, i, 0, r + 1) but I don't think there's a justifiable reason for this other than simple symmetry.

@slott56
Copy link
Contributor Author

slott56 commented Sep 17, 2020

The use of int_max and int_min could be a problem when working with float, duration, or timestamp values. Perhaps min() and max() could use <range_expr>.first() to grab the first value in the sequence, irrespective of type. If there is no first value (because the sequence is empty) this will raise an erroneous result.

@nbryant42
Copy link

At the office I've written an almost-compliant implementation of CEL in Elixir (it does deviate from spec in a few minor areas that make it a better fit for Elixir, e.g. all integers in Elixir/Erlang are bignums, so it makes a lot of sense to keep that property)

Our users do need to do a few things like .sum() that could be implemented as a reduce with more generality. My CEL implementation supports user-defined pluggable functions and macros by simply binding anonymous functions into a context map, so I'm strongly tempted to define a reduce. The only thing stopping me is what if upstream CEL implements it later and it ends up using a different parameter order.

I could get away with just defining things like sum() but my vote goes for adding this to the common language definition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@TristonianJones @slott56 @nbryant42 and others