Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: Added SUM and AVERAGE functions #3498

Merged
merged 3 commits into from
Aug 20, 2018
Merged

Conversation

endo64
Copy link
Contributor

@endo64 endo64 commented Aug 11, 2018

Added SUM and AVERAGE functions.
See also Design notes of SUM function

@greggirwin
Copy link
Contributor

As often happens with me, as soon as something "gets real", my mind starts to think of more details. These are probably good as is, though I'm thinking of how spreadsheets work (don't think we want to do that in a basic func, as they coerce convertable strings), and the exact doc string wording. I'll sleep on it.

@greggirwin
Copy link
Contributor

Forcing float in average is probably a good idea. In the future, I think the plan is for / to do that, with // being a new integer divide op. Easy to change at that time though.

@greggirwin
Copy link
Contributor

Percent! is the odd man out with float! forced.

>> average [2x3 4x9]
== 3x6
>> average [10% 12%]
== 0.11
>> average [1.2.3 4.5.6]
== 2.3.4

@greggirwin
Copy link
Contributor

The easy cases aren't an issue, but doc'ing things beyond "results are undefined if you mix numeric types in the block." gets tricky, and maybe leads to other design questions.

>> average [2x3 4x9 1x1 100]
== 26x28

e.g. do we add refinements for control, or use HOF filter-style funcs in combination?

@9214
Copy link
Collaborator

9214 commented Aug 16, 2018

Note that sum [1 2 3] is the task of inserting + between elements and doing the result, i.e. the inverse of common extract usage. That is do inlay [1 2 3] '+. This could be generalized to any other related patterns, also introducing refinements for head and tail insertion, e.g.:

>> append inlay/head [1 2 3] 'add 0 ; problem is that you need a 'symmetry' in the form of 0
== [add 1 add 2 add 3 0]
>> do inlay collect [repeat i 5 [keep i]] '* ; factorial of 5
>> 120

You can insert an arbitrary block of data that way:

>> inlay/head [1 2 3] [append something 1 +]
== [append something 1 + 1 append something 1 + 2 ...]

Or cycle through it and insert each element periodically:

>> inlay/thru [1 2 3] [+ -]
== [1 + 2 - 3]

average can be generalized in a similar way, with a bit of thinking. It takes one series twice, calculates its properties and combines them together in the third operation:

>> x: [1 2 3]
== 6
>> length? x
== 3
>> sum x
== 6
>> divide sum x length? x ; mean
== 2
>> combine [1 2 3] [sum length?] ; apply each function separately and collect the results together
== [6 3]
>> do inlay combine [1 2 3] [sum length?] '/
== 2
>> apply combine [1 2 3] [sum length?] :divide
== 2
>> block: inlay/head/thru append/dup/only [] [1 2 3] 2 [[divide sum] length?]
== [divide sum [1 2 3] length? [1 2 3]]
>> do block
== 2

In short, I propose to think in terms of HOF and clever combinators here, instead of introducing ad hoc solutions.

Another question is why we should restrict ourselves to block! when vector!, binary! and perhaps even string! with paren! are all equally applicable (whole series! typeset?).

@greggirwin
Copy link
Contributor

The fold/accumulator approach is an option, but I'd like to keep these simple for now. As long as we get the spec right, we can change the body later. The main thing for me is the behavior.

@endo64
Copy link
Contributor Author

endo64 commented Aug 18, 2018

@9214 I agree that sum and average should support vector!, paren! and hash!, but not whole series! typeset (at least in this simple version)

@greggirwin I don't think we should reduce the given block, if that is necessary user can easily do that. Additionally by removing reduce we can easily support paren!s which otherwise reduced and needs special check.

I propose the following version,

sum: func [
    "Returns the sum of all values in a block"
    values [block! vector! paren! hash!]
    /local result value
][
    result: make any [values/1 0] 0
    foreach value values [result: result + value]
    result
]

average: func [
    "Returns the average of all values in a block"
    block [block! vector! paren! hash!]
][
    if empty? block [return none]
    divide sum block to float! length? block
]

which also supports percent! values:

>> sum [30% 20%]
== 50%
>> average [30% 20%]
== 25%
>> sum [1 30% 20%]
== 1.5
>> sum [30% 20% 1]
== 150%

What do you think? If that is good, I'll update my PR.

@endo64
Copy link
Contributor Author

endo64 commented Aug 18, 2018

For mixing numeric types in block; average [2x3 4x9 1x1 100] ;== 26x28 I don't think it is a problem, because its all depend on the + op on the values, 1x1 + 100 already defined as 101x101 (and divide pair! by integer! as well)

@greggirwin
Copy link
Contributor

Looks good to me @endo64. Update the PR, let's get it in there, and solicit feedback.

@endo64
Copy link
Contributor Author

endo64 commented Aug 20, 2018

Updated the PR

@greggirwin greggirwin merged commit 318673b into red:master Aug 20, 2018
@greggirwin
Copy link
Contributor

@endo64, feel free to announce the availability of these funcs on Gitter.

@toomasv
Copy link
Contributor

toomasv commented Aug 21, 2018

sum: func [
   values [block! hash! paren! vector!] 
   /result
][
   result: 0 
   forall values [result: result + values/1]
]

is slightly more efficient

And

sum: func [
   values [block! hash! paren! vector!] 
   /result
][
   result: 0 
   forall values [attempt [result: result + values/1]] 
   result
]

would accept some series with mixed types summing summable things only, but is about twice slower.

@9214
Copy link
Collaborator

9214 commented Aug 21, 2018

👍 to @toomasv and

average: func [
    "Returns the average of all values in a block" ;@@ block..?
    block [block! vector! paren! hash!] ;@@ but it's not ONLY a block
][
    unless empty? block [
        divide sum block .0 + length? block
    ]
]

@greggirwin
Copy link
Contributor

Let's focus on behavior before optimizations.

@9214, I'm not keen on that trick for casting to a float, as it doesn't show the intent.

I also think block is a reasonable name for any-block(ish) args. What else would you suggest? We can always go generic with input, but block gives us a clue.

@9214
Copy link
Collaborator

9214 commented Aug 21, 2018

@greggirwin okay, unless, still has its place.

@greggirwin
Copy link
Contributor

@9214 you will find I'm not a fan of unless. ;^) In this case, I prefer Endo's version. Unless has sort of a "hidden negation" for me, which makes it harder to scan quickly, and the explicit none return works well here IMO.

@greggirwin
Copy link
Contributor

greggirwin commented Aug 21, 2018

Unless fits better in Perl, where it can be an adaptive clause, but that style has issues for me too. It's a better natural language fit, but makes it easier to miss logic branches.

@9214
Copy link
Collaborator

9214 commented Aug 21, 2018

I guess as long as suggestion is functionally equivalent to PR it will be rejected. I have nothing else to say then, except for:

>> any-list!
== make typeset! [block! paren! hash!]

@greggirwin
Copy link
Contributor

Clear improvements are welcome, but these funcs are so simple that it's hard to make more than lateral moves.

@toomasv
Copy link
Contributor

toomasv commented Aug 21, 2018

I agree, that probably the usual case would be summing/averaging some kind of number-blocks, and so the sum tailored for this use is preferable. For blocks like [John 58 Tom 45 Valdis 51 Igor 67] or [translate 100x20 translate -56x48 translate 75x-39] sum can be easily re(de)fined.

@9214
Copy link
Collaborator

9214 commented Aug 21, 2018

@toomasv sum extract/index [...] 2 2

@endo64
Copy link
Contributor Author

endo64 commented Aug 22, 2018

@toomasv, @9214 Thanks for the inputs, as @greggirwin said current sum and average functions are very simple and small tricks/optimizations that doesn't change the behavior is not yet necessary IMO.

My first version was also using forall but it was a bit slower than foreach so I changed it.
I also added /safe to skip non-summable items, there was also controls for overflow and convert result to float in that case (see https://gist.github.com/endo64/e0ea49bc2fb09d548bad1b077720eda5 ) but we decided to keep the initial version very simple.

@endo64 endo64 deleted the SumAndAverage branch August 22, 2018 21:28
@endo64
Copy link
Contributor Author

endo64 commented Aug 25, 2018

One additional note about @toomasv's version (#3498 (comment)), that sum function returns none for empty block, but current version returns 0.

I prefer sum returns zero instead of none but we should consider this. MySQL (and some other database servers) sum returns null for empty table but PHP array_sum return zero.

@greggirwin
Copy link
Contributor

Yes, this is the bigger picture question of how spreadsheets and DB aggregates behave, and taking those into account. e.g. if you're using these funcs to build tools like that, we have to manage expectations if we do something different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants