Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve min and max logic #3197

Merged
merged 5 commits into from
Feb 22, 2022
Merged

Improve min and max logic #3197

merged 5 commits into from
Feb 22, 2022

Conversation

abrwn
Copy link
Contributor

@abrwn abrwn commented Nov 8, 2021

Currently the min, max, minBy and maxBy use a simple comparison to check which is the greater/smaller value, and returns the value accordingly.

However, in cases where a comparison is not possible (eg comparing undefined or an alphabetic character with a number), it simply returns the second value, since the expression evaluates to false.

This means that the following executed functions return different results:

min(1, 'a')
min('a', 1)

This PR checks whether at least one comparison result is true before returning a value, and throws an error if not.

Additionally, although it could be argued that there is no 'min'/'max' when the function is supplied with two of the same value, since one is not greater/less than the other, practically I believe one would expect a returned value rather than an error in this scenario. Therefore the latter of the two (equal) values is returned.

Note: I notice the trim test failing locally, although this is happening on master too so shouldn't be caused by this PR.

source/maxBy.js Outdated
@@ -26,6 +26,7 @@ import _curry3 from './internal/_curry3.js';
* R.reduce(R.maxBy(square), 0, []); //=> 0
*/
var maxBy = _curry3(function maxBy(f, a, b) {
return f(b) > f(a) ? b : a;
return max(f(a), f(b)) === f(b) ? b : a;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be worth using a variable to store the result of f(b)?

Also I wonder if you should use equals again here. If I follow this correctly then:

max([1,2])([2,1])
//=> [2,1]

But:

maxBy(identity)([1,2])([2,1])
//=> [1,2]

Because [2,1]===[2,1] is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment.

You're right - I was generally working under the assumption that a primitive would be returned by the max function, but it's true that arrays are an exception, plus it's also occurred to me that max could return any type if the values passed to it are equal.

equals makes sense here, though based on @CrossEye 's reservations about using it in lightweight functions like these, it's probably worth me thinking of a different approach.

source/max.js Outdated
var max = _curry2(function max(a, b) {
if (equals(a, b)) { return b; }
if (a > b || b > a) { return b > a ? b : a; }
throw new TypeError('cannot compare ' + toString(a) + ' with ' + toString(b));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure about this.

I would expect people to use maxBy or minBy for values that can't be compared directly e.g.

maxBy(path(['a','b','c']))({a:{b:{c:42}}})({a:{b:{c:41}}})

I understand your motivations (I think) but unexpected result is what you get when you try to compare apples and pears:

max([10,2])([10,10])
//=> [10,2]

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My use case for adding this error handling was based on when we can't predict with certainty what the values will be. For example, we may expect values, but receive undefined - this will always result in the first argument being returned by min/max.

min(undefined, 1) // => undefined
min(1, undefined) // => 1

This may not be immediately obvious when using the functions, and cause some strange outcomes.

That said, perhaps a lighter approach may be better, where we assume the correct type will be passed and only check that both args are defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, perhaps a lighter approach may be better, where we assume the correct type will be passed and only check that both args are defined.

So if both a and b can't be compared and one is nil returned the other?

Sounds reasonable to me but maybe I'm overlooking something @ramda/core ?

@CrossEye
Copy link
Member

CrossEye commented Nov 8, 2021

Thank you very much for your well-written and well-considered PR.

While I need a little more time to consider this, I do have some serious objections.

First, Ramda is extremely wary of throwing exceptions. Functions that throw exceptions do not compose well, and improving functional composition is one of the main themes of the library. Ramda has mostly subscribed to the Garbage In/Garbage Out philosophy. If you supply what the function is designed to take, then we'll do what's expected. If you don't, anything might happen.

Second, I'm not happy with introducing equals here. These are so far a very efficient function, and they operate on the JS notion of comparison. But with equals, we're adding a fairly heavy-weight function and we are sometimes precluding the JS comparison. Moreover, doing this, is a surprising breaking change, and not only for those supplying incompatible types. For instance:

const max2 = (a, b) => {
  if (equals (a, b)) { return a; }
  if (a > b || b > a) { return b > a ? b : a; }
  throw new TypeError('cannot compare ' + toString(a) + ' with ' + toString(b));
}

const foo = (id, val) => ({  // or a class or constructor function
  id, 
  val, 
  equals: (that) => that.id == id, 
  valueOf: () => val
})

const first = foo ('a', 5)
const second = foo ('a', 7)

console .log (max (first, second))   //=> {id: "a", val: 7}
console .log (max2 (first, second))  //=> {id: "a", val: 5}

(This has to do with the fact that Ramda's equals delegates to an equals method on the objects if they have one.)

So I'm not really happy with this implementation. But I agree that there is probably something to fix if we can't guarantee that, for instance, max (a, b) === max (b, a). So I think something should be done. I haven't yet tried to think about what that is, though.

Copy link
Member

@customcommander customcommander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I keep getting GH reviews wrong; nothing much to add apart from my initial comments. I share @CrossEye concerns.

@abrwn
Copy link
Contributor Author

abrwn commented Nov 9, 2021

@CrossEye thanks for the reply - both points make a lot of sense.

I think based on the Garbage In/Garbage Out philosophy, we could probably simplify the scope of the PR, and just check whether both arguments are defined before making the comparison. This was my original use case, where the function being passed undefined was causing some unpredictable results.

I see it less likely that people will try to compare 'a' to 1, vs 1 to undefined, for example.

That would allow us to remove the equals function, and only throw an error where we really need to - or possibly even treat undefined in the same way as JS treats null, which is to treat it like 0 in terms of the comparison operator.
(Not sure about this one, but could be done if we really don't want to throw an error).

@CrossEye
Copy link
Member

CrossEye commented Nov 9, 2021

@abrwn: What if we use the four-part logic resulting from a < b and a > b? If these differ, return the appropriate value. If they're both true or both false, then we make a consistent, but arbitrary choice, perhaps the one with the alphabetically earlier "typeof" result, and if that fails, the one with the alphabetically earlier result from R.toString, and only if that fails, choosing the the first one for min and the last one for max. (This last might involve a minor backward incompatibility, but I think it would help if we wanted to use these in any stable sorting algorithm.)

I thought we might need two isNaN checks in there as well, but for NaN, it simply doesn't matter which value we return.

This would let us avoid ever throwing, and would defer the heavyweight toString call until all other options have been exhausted.

@hitmands
Copy link
Member

hitmands commented Nov 10, 2021

Hi, thanks for contributing, I love seeing this much engagement :)

I don't have strong opinions here,
my 2 cents are that I would not want this (or any other function in this) library to oversee too much how it's used...

There are many functions here that if called the wrong way may lead to unexpected behaviours.

We should simply not call min (etc.) with something that isn't either a number or something that cannot be lexicographically compared (e.g. dates and strings).

e.g.:

R.min(10, 20); // 10

R.min('a', 'b') // 'a'

R.min(new Date(2021, 10, 10), new Date(1986, 6, 16)) // May 16, 1986

R.min(undefined, null); // invalid function call
R.min({}, []); // invalid function call

for instance, R.map(R.toUpper, null) just leads to a js error... and that is fine imho

@CrossEye
Copy link
Member

@hitmands:

I would not want this (or any other function in this) library to oversee too much how it's used...

Yes, that's generally been Ramda's philosophy. I don't really want to change that. But we do avoid throwing errors as much as possible.

And even when we say all bets are off when a function is misused, when we can give guarantees about certain features, it would be nice to do so.

Thus, although we've never really stated them, there are laws that I would expect, say, min to uphold, even with bad data:

for all a, b, c  
    min (a, b) = a  or  min (a, b) = b          (closed)
    min (a, b) = min (b, a)                     (commutative)
    min (a, min (b, c)) = min (min (a, b), c)   (transitive)

This PR demonstrates that we're violating commutativity (transitivity as well.) I think we could reasonably fix these functions to avoid that without too much effort. It cannot be perfect -- it's JS after all -- but it can become at least more consistent.

@abrwn
Copy link
Contributor Author

abrwn commented Nov 12, 2021

@CrossEye @hitmands @customcommander thanks all for input.

So would @CrossEye 's suggested approach, where we choose a max or min based on a set criteria be acceptable, as a balance between making these functions more predictable, and avoiding introducing heavyweight functions and/or throwing errors?

#3197 (comment)

Happy to apply this if so.

@CrossEye by the way, AFAIK a < b and b < a will never both be true, but they can both be false if the expression can't be evaluated. Which may simplify the logic slightly.

@CrossEye
Copy link
Member

@abrwn:

By the way, AFAIK a < b and b < a will never both be true, but they can both be false if the expression can't be evaluated. Which may simplify the logic slightly.

D'oh! 😉

Yes, although I might still implement it with if ((a < b) == (b < a)). But any implementation that captured the three cases would work.

I think this would be a good compromise between what we really want and what JS will allow us.

-- Scott

@CrossEye
Copy link
Member

@abrwn:

Are you interested in updating this PR in the manner discussed? If not, do you mind if we close this and open another PR for that?

@abrwn
Copy link
Contributor Author

abrwn commented Jan 22, 2022

@CrossEye I would be interested- sorry for the silence. I'll make some changes this week and update the PR.

@CrossEye
Copy link
Member

No problem at all. We've never been quick around here. But we are trying to speed up.

@abrwn
Copy link
Contributor Author

abrwn commented Jan 25, 2022

@CrossEye I've pushed a new commit based on the above discussion, using the string-based comparisons mentioned.
Ready for re-review.

@CrossEye CrossEye self-assigned this Jan 27, 2022
@customcommander
Copy link
Member

@abrwn I'm going give it a re-review this weekend with the intention to merge next week at the latest if all is good.

Copy link
Member

@CrossEye CrossEye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit. Otherwise this looks great.

source/max.js Outdated
if (maxByType !== undefined) { return maxByType === typeof a ? a : b; }

const maxByStringValue = safeMax(toString(a), toString(b));
if (maxByStringValue !== undefined) { return maxByStringValue === toString(a) ? a : b; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like recalculating toString(a) here. Perhaps we can save stringA and stringB

@abrwn
Copy link
Contributor Author

abrwn commented Feb 7, 2022

One minor nit. Otherwise this looks great.

@CrossEye
Thanks - updated.
Also realised I was using const instead of var so updated that.

@abrwn abrwn requested a review from CrossEye February 10, 2022 12:49
Copy link
Member

@CrossEye CrossEye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your work, and for your patience. This looks great!

@CrossEye CrossEye merged commit 37c568a into ramda:master Feb 22, 2022
@adispring adispring mentioned this pull request Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants