Implement `iter::Sum` and `iter::Product` for `Option` #58975

Open. Wants to merge 3 commits into base: master

@jtdowney
Contributor

commented Mar 6, 2019

This is similar to the existing implementation for Result. It will take each item into the accumulator unless a None is returned.
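A minimal sketch of the behavior described above, assuming a toolchain where the impls proposed in this PR are available. The helper names `sum_all` and `product_all` are hypothetical and exist only for illustration:

```rust
/// Sums the items, yielding `None` as soon as any item is `None`.
/// (Hypothetical helper, not part of the PR.)
fn sum_all(items: &[Option<i32>]) -> Option<i32> {
    items.iter().copied().sum()
}

/// Multiplies the items, yielding `None` as soon as any item is `None`.
/// (Hypothetical helper, not part of the PR.)
fn product_all(items: &[Option<i32>]) -> Option<i32> {
    items.iter().copied().product()
}
```

With these helpers, `sum_all(&[Some(1), Some(2), Some(3)])` is `Some(6)`, while `sum_all(&[Some(1), None, Some(3)])` is `None`.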

I based a lot of this on #38580. From that discussion it didn't seem like this addition would be too controversial or difficult. One thing I still don't understand is picking the values for the stable attribute. This is my first non-documentation PR for rust so I am open to any feedback on improvements.

Implement `iter::Sum` and `iter::Product` for `Option`
This is similar to the existing implementation for `Result`. It will
take each item into the accumulator unless a `None` is returned.
@rust-highfive

Collaborator

commented Mar 6, 2019

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@Centril

Contributor

commented Mar 6, 2019

@rust-highfive assigned SimonSapin and unassigned kennytm Mar 6, 2019

@scottmcm

Member

commented Mar 7, 2019

Previous PR about Sum for Option: #50884

@jtdowney

Contributor Author

commented Mar 8, 2019

@scottmcm thanks, I missed this PR in my search. If folks still think it shouldn't be in core then that is fine with me. I was mostly confused about the lack of parity with Result which is somewhat unusual from my experience with rust.

@scottmcm

Member

commented Mar 8, 2019

@jtdowney Yours is consistent in behaviour with Result, right? I think the other wasn't, so it's possible the outcome will be different here.

@jtdowney

Contributor Author

commented Mar 8, 2019

Correct, this PR is consistent with how iter::Product and iter::Sum work with Result today. I hadn't finished reading the other PR before I commented previously, but the two seem to be going after different things. My use case is solved today by converting the iterator of Option<T> to Result<T, ()> with something like iter.map(|v| v.ok_or(())).sum(), but that seemed unnecessary, which is why I sent the PR.
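For reference, the workaround mentioned above might look roughly like this. It relies only on the existing `Sum` impl for `Result`; the function name `sum_via_result` is hypothetical:

```rust
/// Sums an iterator of `Option<u32>` by detouring through `Result<u32, ()>`.
/// (Hypothetical helper illustrating the workaround.)
fn sum_via_result(items: Vec<Option<u32>>) -> Option<u32> {
    items
        .into_iter()
        // Turn each `None` into `Err(())` so the `Result` impl can short-circuit.
        .map(|v| v.ok_or(()))
        .sum::<Result<u32, ()>>()
        // Convert back to an `Option`, discarding the unit error.
        .ok()
}
```

The extra `ok_or(())` and `.ok()` calls are the noise that the proposed impl would remove.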

@Dylan-DPC

Member

commented Mar 18, 2019

Ping from triage: can anyone from @rust-lang/libs review this?

@alexcrichton

Member

commented Mar 19, 2019

This looks reasonable to me to merge! In that case I'll...

@rfcbot fcp merge

@rfcbot

commented Mar 19, 2019

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@dtolnay

Member

commented Mar 19, 2019

@rfcbot concern not sold on these impls

The use cases that come to mind would all be more clearly written with a for-loop. Could you point out some places in real code where these impls would be beneficial?

I can see that part of the motivation is to be consistent with Result's Sum and Product impls. But I find that the behavior of summing an iterator of Result<T, E> into a Result<T, E> is more intuitive to me than the behavior of summing Option<T> into an Option<T>. Looking at the example code from the PR:

```rust
v.iter().map(|&x: &i32|
    if x < 0 { None }
    else { Some(x) }
).sum()
```

it would be easy to read this as summing only those items that are Some and ignoring Nones.
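To make the two possible readings concrete, here is a small sketch contrasting them. It assumes the short-circuiting semantics proposed in this PR; the function names are hypothetical:

```rust
/// Maps negative numbers to `None`, mirroring the closure in the PR example.
fn to_non_negative(x: i32) -> Option<i32> {
    if x < 0 { None } else { Some(x) }
}

/// Reading 1 (what the PR implements): any `None` makes the whole sum `None`.
fn sum_short_circuit(v: &[i32]) -> Option<i32> {
    v.iter().map(|&x| to_non_negative(x)).sum()
}

/// Reading 2 (the possible misreading): skip `None`s and sum the rest.
fn sum_ignoring_none(v: &[i32]) -> i32 {
    v.iter().filter_map(|&x| to_non_negative(x)).sum()
}
```

For the input `[1, -2, 3]`, the first function returns `None` while the second returns `4`.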

@jtdowney

Contributor Author

commented Mar 19, 2019

@dtolnay the specific case I hit when I discovered this was missing was taking a string, calling chars(), map(|c| c.to_digit(10)), and then summing that after some more filtering and mapping. That code you quoted in the PR was modified from the existing PR for Result. Happy to replace it with a better example.

@Centril added relnotes and removed needs-fcp labels Mar 19, 2019

@Centril added this to the 1.35 milestone Mar 19, 2019

Update stable attribute to be since 1.35.0
Co-Authored-By: jtdowney <jdowney@gmail.com>
@jtdowney

Contributor Author

commented Mar 20, 2019

If it helps, here is the code I was trying to write:

```rust
fn is_luhn_valid(value: &str) -> bool {
    value
        .chars()
        .rev()
        .map(|c| c.to_digit(10))
        .enumerate()
        .map(|(idx, n)| {
            if idx % 2 != 0 {
                n.map(|v| {
                    let v = v * 2;
                    if v > 9 {
                        v - 9
                    } else {
                        v
                    }
                })
            } else {
                n
            }
        })
        .sum::<Option<u32>>()
        .map(|total| total % 10 == 0)
        .unwrap_or(false)
}

fn main() {
    dbg!(is_luhn_valid("4111111111111111"));
    dbg!(is_luhn_valid("5454545454545454"));
}
```

This is an implementation of the Luhn check algorithm.

@dtolnay

Member

commented Mar 20, 2019

I think a more inspired example in the documentation would be helpful -- something reasonably representative of the way that you might find this impl used in real code where it is the best solution to the problem being solved.

(Not that doc comments inside trait impls end up all that visible. Current rustdoc would bury it pretty thoroughly. But it would serve as long-term documentation of the use cases that we wanted to support by providing the impl.)


> The use cases that come to mind would all be more clearly written with a for-loop.

I think unfortunately this applies to your Luhn implementation too. It is neat that you were able to find a way to express this all in a single method chain, and only one semicolon, but I think it comes out quite challenging to read. A reader looking to understand the implementation or code reviewer looking to confirm that it is correct would need to:

  • Infer based on code lower down that to_digit returns an Option.
  • Trace through the Option gymnastics to figure out that |(idx, n)| { ... } returns an Option.
  • Not get lost in 6 levels of indentation.
  • Probably look up what summing Options into an Option means.

Here is how I might write this instead. My implementation is the same number of lines (24) but some of them are now blank. I find this one far easier to follow -- curious what you think. Even the signature of to_digit is more obvious because we are using ? inside a function that returns Option. It is half as indented and the type-inference-in-your-head is far simpler.

Based on this use case I would still side against accepting the new impls.

```rust
fn is_luhn_valid(input: &str) -> bool {
    match luhn_sum(input) {
        Some(sum) => sum % 10 == 0,
        None => false,
    }
}

fn luhn_sum(input: &str) -> Option<u32> {
    let mut sum = 0;

    for (idx, ch) in input.chars().rev().enumerate() {
        let digit = ch.to_digit(10)?;

        if idx % 2 == 0 {
            sum += digit;
        } else if digit * 2 <= 9 {
            sum += digit * 2;
        } else {
            sum += digit * 2 - 9;
        }
    }

    Some(sum)
}
```

Another possibility which is similarly readable:

```rust
fn is_luhn_valid(input: &str) -> bool {
    let mut sum = 0;

    for (idx, ch) in input.chars().rev().enumerate() {
        let digit = match ch.to_digit(10) {
            Some(digit) => digit,
            None => return false,
        };

        if idx % 2 == 0 {
            sum += digit;
        } else if digit * 2 <= 9 {
            sum += digit * 2;
        } else {
            sum += digit * 2 - 9;
        }
    }

    sum % 10 == 0
}
```

@jtdowney

Contributor Author

commented Mar 20, 2019

I am happy to provide a better doc use case but we will have to agree to disagree about the for loop. You seem to be discounting the number of cases we can't think up in this PR and excusing a clear lack of symmetry in the API with regards to Option vs Result.

@Centril

Contributor

commented Mar 21, 2019

> I think unfortunately this applies to your Luhn implementation too. Nice job finding a way to express this all in a single method chain, and only one semicolon, but I think it comes out quite challenging to read. A reader looking to understand the implementation or code reviewer looking to confirm that it is correct would need to:

This seems like the usual matter of preference between an imperative and a functional style of writing the code. For example, I don't think writing `sum +=` three times is great when you can extract the value first.

> Trace through the Option gymnastics to figure out that |(idx, n)| { ... } returns an Option.

> Even the signature of to_digit is more obvious because we are using ? inside a function that returns Option.

The functional style algorithm has n.map(|v| { ... }) which makes that clear.

> Not get lost in 6 levels of indentation.

If this is a concern then extract the lambdas to the top level and you will have ~2 levels of indent.

What is nice about @jtdowney's version is that it encourages separation of concerns and extraction into smaller steps.
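As a rough sketch of what extracting the lambdas to the top level could look like, assuming the `Sum` impl for `Option` from this PR is available. The helper names `double_digit` and `luhn_digit` are hypothetical and not taken from the thread:

```rust
/// Doubles a digit and folds the result back into the range 0..=9.
/// (Hypothetical helper.)
fn double_digit(d: u32) -> u32 {
    let doubled = d * 2;
    if doubled > 9 { doubled - 9 } else { doubled }
}

/// Converts the character at position `idx` (counted from the right) into its
/// Luhn contribution, or `None` if the character is not a decimal digit.
/// (Hypothetical helper.)
fn luhn_digit((idx, c): (usize, char)) -> Option<u32> {
    let d = c.to_digit(10)?;
    Some(if idx % 2 != 0 { double_digit(d) } else { d })
}

fn is_luhn_valid(value: &str) -> bool {
    value
        .chars()
        .rev()
        .enumerate()
        .map(luhn_digit)
        // Short-circuits to `None` on the first non-digit character.
        .sum::<Option<u32>>()
        .map_or(false, |total| total % 10 == 0)
}
```

The method chain itself stays at two levels of indentation, and each helper can be read and tested on its own.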

@dtolnay

Member

commented Mar 21, 2019

Thanks for the feedback -- I will give it some more thought under the understanding that @Centril finds the iterator chain easier to read.

For now, registering the comment about example code in the documentation, which I would still like to see:

@rfcbot concern more fleshed out example use case in documentation

@Centril

Contributor

commented Mar 21, 2019

Aside: This is another good case where stable type ascription on expressions and patterns (cc rust-lang/rfcs#2522) would have been useful so that you could have written .sum(): Option<u32> and .map(|(idx, n: Option<_>)| { if desired.

@dtolnay

Member

commented Mar 21, 2019

Thanks for the improved example!

@rfcbot resolve more fleshed out example use case in documentation

@dtolnay

Member

commented Mar 21, 2019

I suspect that I will come around to this PR, but before that I would like to better understand the perspective that prefers to use this new Sum impl for something like is_luhn_valid. I am trying to pin down how code using these impls is expected to be understood by readers, so that we may avoid adding impls (even where justified by consistency) that would mostly be used in ways that make Rust code hard to understand.

In the case of is_luhn_valid, do any of the following characterize the perspective of @jtdowney or @Centril or anyone else? Some of these I recognize are uncharitable but included to cover as much of the spectrum as I can identify.

  1. Nobody will need to read this code.

  2. People will read the code but won't need to understand it.

  3. People will need to understand the code, but not in such detail that they would need to identify consciously that the operation of summing an iterator of Option<integer> into an Option<integer> is involved. It is sufficient to see that to_digit, sum, and % 10 are involved to see that the code is correct.

  4. In order to understand how the code works and why it is correct, a reader would need to consciously identify that summing an iterator of Option<integer> into an Option<integer> is involved.

    • A) Most readers would not be able to identify this, but still get a better understanding of the code than they would from an imperative style due to the functional separation of concerns and extraction into smaller steps.

    • B) Readers will generally tell that we are summing an iterator of Option<integer> into an Option<integer> but would not need to be conscious of the behavior of that Sum impl in order to tell that the code is correct.

    • C) Readers will identify what Sum impl is involved and intuitively understand its behavior, as they likely would with the Sum impl on Result.

    • D) Readers will identify what Sum impl is involved and will know its behavior from memory from having seen it used and figured it out before.

    • E) Readers will identify what Sum impl is involved and most would need to look up what that Sum impl does in order to tell that the code is correct.

    • F) Readers will identify what Sum impl is involved and deduce its behavior from first principles i.e. there is only one reasonable way that the impl could be implemented.

    • G) Readers will identify what Sum impl is involved and deduce its behavior by analogy with the Sum impl on Result, which they would more likely be familiar with.

    • H) Readers will identify what Sum impl is involved and deduce its behavior by analogy with how ? operator on Option works.

    • I) Readers will identify what Sum impl is involved and deduce its behavior by working backwards from the assumption that is_luhn_valid is correctly implemented.

@jtdowney

Contributor Author

commented Mar 21, 2019

My perspective is the person reading the code would understand it if they knew that rust already had the idiom where an Option or Result can be moved to the outer context when accumulating an iterator, because this is what happens already for collect() and partially for sum(). This is not a new idiom in functional programming and is not a new idiom in rust either.
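The idiom referred to here can be sketched as follows. `collect()` into an `Option<Vec<_>>` already short-circuits on the first `None`, and the proposed `Sum` impl would behave the same way. The function names `digits` and `digit_sum` are hypothetical, and `digit_sum` assumes the impl from this PR:

```rust
/// Parses every character as a decimal digit.
/// Returns `None` if any character is not a digit (existing `collect()` behavior).
fn digits(s: &str) -> Option<Vec<u32>> {
    s.chars().map(|c| c.to_digit(10)).collect()
}

/// Sums the decimal digits of `s`.
/// Returns `None` if any character is not a digit (behavior proposed in this PR).
fn digit_sum(s: &str) -> Option<u32> {
    s.chars().map(|c| c.to_digit(10)).sum()
}
```

In both cases the `Option` moves from the items to the outer result: `digits("1a3")` and `digit_sum("1a3")` are both `None`.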

@Centril

Contributor

commented Mar 21, 2019

My perspective:

  • 1-2: Doesn't characterize my perspective.

  • 3: Sometimes -- looking at the structure at a glance is sometimes useful. For example, if you as a reviewer are relatively familiar with the algorithm I think skimming is fine.

  • 4:

    • A) Not my perspective but the functional style is helpful for code maintainability overall as a matter of encouraging refactoring. I.e. it's a consideration where a bigger context is accounted for rather than just the individual function.

    • B) I think not needing to think about the semantics of .sum works well if you have a rough understanding of the algorithm.

    • C) I think that n.map(|v| {, .map(|total| total % 10 == 0), ::<Option<u32>>(), and .unwrap_or(false) are all good indicators that Option<_> is being worked on. Once the resulting type is understood, what summing means should follow from it. To me it's a persuasive argument that if Result<_, _> can be understood then so should Option<_>. This also suggests that it is "expected".

    • D) This is probably true for some people. For some it's likely a mental short-cut.

    • E) Hmm... Hard to say... On one hand I think a transformation of form Iterator<Item = T> -> Option<T> which is called sum clearly indicates its semantics. On the other, some people may not do such type based reasoning. I think if you haven't seen this function before and you want to double-check because the code-path is unusually important then looking the impl up may be necessary. However, I think most people won't need to look it up at least more than once.

    • F) This applies to a reasonable portion of people. I count myself among them.

    • G) I think that if you are familiar with the impl for Result then inferring semantics from that is likely. Not sure how many folks are familiar.

    • H) This seems a bit far-fetched as there's no clear immediate connection.

    • I) I think this applies to a degree if you are aware of the algorithm.

    Many of these seem to apply and I think collectively they conspire to cover different subsets of users (but the subsets do not form a partition) such that together, they cover a good portion of all readers.

@dtolnay

Member

commented Mar 21, 2019

Thanks @jtdowney and @Centril -- I think the case has been made that this is a matter of preference. I am on board with accepting the impl and leaving it up to preference when to rely on it.

@rfcbot resolve not sold on these impls

@moshg

commented Mar 29, 2019

Isn't .map(|x| x.ok_or(())) sufficient?
Is it OK to add a function that is counterintuitive for quite a few people?

@Centril

Contributor

commented Mar 29, 2019

> Isn't .map(|x| x.ok_or(())) sufficient?

That doesn't seem more intuitive, discoverable, or readable.

@moshg

commented Mar 29, 2019

> That doesn't seem more intuitive, discoverable, or readable.

At least, this doesn't seem less discoverable. And it doesn't have to be more intuitive, discoverable, or readable for those who are happy with Sum<Option<U>> for Option<T>, because it avoids confusing those who feel it is canonical to implement Sum<Option<U>> for T. It is sufficient for it to be about as intuitive and readable as this PR's impl.
And I discovered NoneError. How about map(|x| x.ok_or(NoneError))?

@scottmcm

Member

commented Mar 29, 2019

I think we've seen from collecting into a Result that these forwarding impls for Result (and Option) aren't very discoverable. This PR, and the other one I linked earlier, also demonstrate that it's non-obvious exactly what the semantics should be, even if its existence is known.

So I wonder if try_sum (and try_product and try_collect) or similar might be more discoverable and more easily understood.
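No `try_sum` method exists in the standard library at this point, so the following is only a sketch of what such a function could mean, written as a free function on top of the existing `Iterator::try_fold`. The name and signature are assumptions made for illustration:

```rust
/// Hypothetical `try_sum` for iterators of `Option<u32>`:
/// adds up the values and stops at the first `None`.
fn try_sum<I>(iter: I) -> Option<u32>
where
    I: IntoIterator<Item = Option<u32>>,
{
    iter.into_iter()
        // `try_fold` stops as soon as the closure returns `None`.
        .try_fold(0u32, |acc, item| item.map(|v| acc + v))
}
```

An explicit `try_` prefix would signal the short-circuiting behavior at the call site, which is the discoverability point being made here.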

@Dylan-DPC

Member

commented Apr 8, 2019

Ping from triage: @SimonSapin, any updates?

@Centril modified the milestones: 1.35, 1.36 Apr 12, 2019

@Dylan-DPC

Member

commented Apr 22, 2019

Ping from triage: can anyone from @rust-lang/libs review this?

@alexcrichton

Member

commented Apr 23, 2019

@SimonSapin, @sfackler, or @withoutboats would y'all be up for weighing in on the FCP above?

@rfcbot

commented Apr 24, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@cristicbz

Contributor

commented Apr 25, 2019

FWIW, when seeing this in TWiR my immediate assumption was that, rather than shortcircuiting after the first None, this would sum all the things ignoring None-s. A bit like nansum/nanmean in e.g. numpy (granted those work with nan-s).

I expect Result to short-circuit, since often giving up on the first error makes sense, but I'd expect Option to filter. Anything else suggests the use of Option where Result would have been more appropriate.

In the is_luhn_valid case, I'd argue Result-like semantics are closer to the truth---a non-digit character in the sequence suggests malformed input; so something like char.to_digit().ok_or(Error::NonDigitChar) would be what I'd expect.

Maybe this ship has sailed since there already is a short-circuiting FromIterator<Option<A>> for Option<V>, which I find surprising. At least that implementation doesn't preclude the, to me, more natural FromIterator<Option<A>> for V where V: FromIterator<A>.
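The Result-based variant suggested above for the is_luhn_valid case might look roughly like this. It uses only the existing `Sum` impl for `Result`; the `Error` enum with its `NonDigitChar` variant is taken from the comment, but its exact shape here is an assumption:

```rust
/// Error type assumed for illustration; only the variant name comes from the thread.
#[derive(Debug, PartialEq)]
enum Error {
    NonDigitChar,
}

/// Computes the Luhn sum, reporting a non-digit character as an error.
fn luhn_sum(input: &str) -> Result<u32, Error> {
    input
        .chars()
        .rev()
        .enumerate()
        .map(|(idx, ch)| -> Result<u32, Error> {
            let digit = ch.to_digit(10).ok_or(Error::NonDigitChar)?;
            let doubled = digit * 2;
            Ok(if idx % 2 == 0 {
                digit
            } else if doubled > 9 {
                doubled - 9
            } else {
                doubled
            })
        })
        // Existing impl: stops at the first `Err` and returns it.
        .sum()
}

fn is_luhn_valid(input: &str) -> Result<bool, Error> {
    Ok(luhn_sum(input)? % 10 == 0)
}
```

Here malformed input is distinguished from a failed checksum, which a plain `bool` or `Option` cannot express.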

@moshg

commented Apr 25, 2019

It might be appropriate to write an RFC for an API guideline on the equivalence of Option<T> and Result<T, ZST>, or on the distinction between them and which one to use in which situation, before stabilizing this.
An RFC about when we may implement traits might also be good,
e.g. whenever we have natural semantics, or only when the semantics of the functions using the trait can be understood immediately with or without specifying the type parameters.
