New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for RFC 2169: Euclidean Modulo #49048

Open
alexcrichton opened this Issue Mar 15, 2018 · 31 comments

Comments

Projects
None yet
@alexcrichton
Copy link
Member

alexcrichton commented Mar 15, 2018

Tracking issue for rust-lang/rfcs#2169, euclidean modulo methods on numbers

Steps:

  • Implementation (#49389)
  • Stabilization PR
@varkor

This comment has been minimized.

Copy link
Member

varkor commented Mar 16, 2018

@fanzier had expressed interest in implementing this RFC as a good starter issue (most of the details are included in the RFC), so I'd be happy to mentor this issue if they want to tackle it!

@KodrAus

This comment has been minimized.

Copy link

KodrAus commented Apr 13, 2018

A question came up in the initial implementation PR about shifting the euclidean modulo methods for f32/f64 into libcore rather than libstd.

bors pushed a commit that referenced this issue Apr 13, 2018

bors added a commit that referenced this issue Apr 13, 2018

Auto merge of #49389 - fanzier:euclidean-division, r=KodrAus
Implement RFC #2169 (Euclidean modulo).

Tracking issue: #49048
@sdroege

This comment has been minimized.

Copy link
Contributor

sdroege commented Apr 18, 2018

euc as the function name seems a bit short. I didn't see that one before elsewhere, is it used in any other language? A longer, more descriptive name would seem useful.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Apr 18, 2018

@sdroege See conversations in the RFC, such as rust-lang/rfcs#2169 (comment)

@lfairy

This comment has been minimized.

Copy link
Contributor

lfairy commented Apr 18, 2018

@huonw made this argument for a longer name, which I find convincing:

Using just euc as a suffix seems a bit like Rust's abbreviation gone too far. For a common word like Vec, it makes sense in a huffman-encoding sense (length should be inversely proportional to frequency) and "vector" is common enough that it's relatively self-explanatory. But, performing Euclidean division is much less common, and "Euclidean" isn't that common as a word to begin with. At least for mod, it could be mod_positive, but something like div_euclidean/mod_euclidean does pair nicer.

@varkor

This comment has been minimized.

Copy link
Member

varkor commented Apr 18, 2018

@sdroege, @lfairy: I think the plan was to take another look at the method names during the FCP for these methods (which won't be for a while at least) — because there was some disagreement about what to prioritise for the naming here. With some actual usage statistics, it might be easier to form arguments one way or the other.

@xen0n

This comment has been minimized.

Copy link
Contributor

xen0n commented Apr 26, 2018

Just my 2c: the suffix euc is woefully confusing for Asian users where "EUC" usually refers instead to a family of legacy charsets like EUC-JP or EUC-CN. Considering the expected frequency of usage a longer name might be more appropriate -- we'll have to see.

@pietroalbini

This comment has been minimized.

Copy link
Member

pietroalbini commented Apr 26, 2018

There is an issue about the future compatibility lint of this: #50232

@varkor

This comment has been minimized.

Copy link
Member

varkor commented Apr 26, 2018

@pietroalbini: I think that was an issue about the lint in general, rather than specifically about mod_euc; I've made a comment on the issue to clarify. Thanks!

@Boscop

This comment has been minimized.

Copy link

Boscop commented Apr 30, 2018

I really like the short name (I've been using this function a lot in my code, also had named it mod_euc). Even knowing that "EUC" also refers to charsets, in this case it's disambiguated by the mod_ part of the function name :)
And anyone who stumbles across this function in 3rdparty code will see that it is called with numbers as args, so they can easily infer that it is "probably some kind of modulo" even if they don't know about euclidean modulo, but then they will look it up and find out!
So IMO there's not really a high chance for confusion with the "EUC" charsets..

@clarcharr

This comment has been minimized.

Copy link
Contributor

clarcharr commented May 4, 2018

I didn't follow the original RFC, but I think that the methods should be div_euc and rem_euc for consistency with the Rem trait.

@clarcharr

This comment has been minimized.

Copy link
Contributor

clarcharr commented May 4, 2018

Another thing to note: while it has been mentioned before that having a div_rem method wouldn't make too much sense considering how LLVM automatically optimises nearby x / y and x % y calls to one operation, but I think it would be worthwhile to have a div_rem_euc method which is more efficient.

Although I don't know how far down the rabbit hole we'd want to go with that, considering how adding every combination of checked, wraping, and overflowing for div_rem_euc and div_rem would be around 8 new methods.

@demurgos

This comment has been minimized.

Copy link

demurgos commented May 4, 2018

Regarding terminology, I believe that mod (for modulo) was specifically chosen to refer the mathematical modulo and avoid confusion with the rem (remainder) operation available currently. The choice to avoid rem_euc was deliberate.

@clarcharr

This comment has been minimized.

Copy link
Contributor

clarcharr commented May 4, 2018

@demurgos Haskell uses the distinction quot/rem for the Euclidean versions and div/mod for the non-euclidean versions. I personally don't like this distinction and find it hard to remember, which is why I accept Rust's decision to simply append _euc to indicate Euclidean versions.

But if we're going to change rem to mod and not div to quot, that seems too odd to me. And besides, the ship has sailed and Rust is stuck with rem instead of mod. So, I'd say that rem_euc is a better name.

@clarcharr

This comment has been minimized.

Copy link
Contributor

clarcharr commented May 4, 2018

Also, I messed around with the code and div_rem_euc would be:

let (q, r) = (self / rhs, self % rhs);
if r < 0 {
    if rhs > 0 {
        (q - 1, r - rhs)
    } else {
        (q + 1, r + rhs)
    }
} else {
    (q, r)
}

versus the existing:

let q = self / rhs;
let q = if self % rhs < 0 {
    if rhs > 0 { q - 1 } else { q + 1 }
} else {
    q
};
let r = self % rhs;
let r = if r < 0 {
    if rhs < 0 {
        r - rhs
    } else {
        r + rhs
    }
} else {
    r
};
(q, r)

And I haven't run any benchmarks, but it seems pretty clear that even if LLVM can combine the existing division and remainder into one operation, it probably won't be smart enough to combine the branches. But, if we can write the latter in a way that optimises to the former, I think it'd be reasonable to say that a divrem method isn't necessary.

@tspiteri

This comment has been minimized.

Copy link
Contributor

tspiteri commented May 4, 2018

I don't think it should be hard for LLVM to find the common subexpressions. In fact, I get the same assembly for div_mod_euc and separate_div_mod_euc below; the only difference I found was in the labels.

pub fn div_mod_euc(lhs: i32, rhs: i32) -> (i32, i32) {
    let (q, r) = (lhs / rhs, lhs % rhs);
    if r < 0 {
        if rhs > 0 {
            (q - 1, r - rhs)
        } else {
            (q + 1, r + rhs)
        }
    } else {
        (q, r)
    }
}
pub fn separate_div_mod_euc(lhs: i32, rhs: i32) -> (i32, i32) {
    (div_euc(lhs, rhs), mod_euc(lhs, rhs))
}
pub fn div_euc(lhs: i32, rhs: i32) -> i32 {
    let (q, r) = (lhs / rhs, lhs % rhs);
    if r < 0 {
        if rhs > 0 {
            q - 1
        } else {
            q + 1
        }
    } else {
        q
    }
}
pub fn mod_euc(lhs: i32, rhs: i32) -> i32 {
    let r = lhs % rhs;
    if r < 0 {
        if rhs > 0 {
            r - rhs
        } else {
            r + rhs
        }
    } else {
        r
    }
}
@varkor

This comment has been minimized.

Copy link
Member

varkor commented May 5, 2018

Haskell uses the distinction quot/rem for the Euclidean versions and div/mod for the non-euclidean versions. I personally don't like this distinction and find it hard to remember

Even though it can be annoying to have to remember the distinction, it's more helpful to follow convention and at least be unambiguous than mix terminology confusingly. Rust originally used Quot and Rem too, but Quot was later renamed to Div because it was more familiar. Unfortunately, this leaves us in an awkward state where one of the operators is named inconsistently. It's too late to change that now, though, so I think the best we can do is try to be correct going forward — hopefully the suffix makes it clear something different is going on.

Another thing to note: while it has been mentioned before that having a div_rem method wouldn't make too much sense considering how LLVM automatically optimises nearby x / y and x % y calls to one operation, but I think it would be worthwhile to have a div_rem_euc method which is more efficient.

I was under the impression LLVM doesn't optimise this as much as it could, but I haven't checked in a while, so that may be outdated. I think it could be worth having a dedicated method for this, and it's worth proposing (it wasn't part of the original RFC just to avoid too much bloat).

Although I don't know how far down the rabbit hole we'd want to go with that, considering how adding every combination of checked, wraping, and overflowing for div_rem_euc and div_rem would be around 8 new methods.

Yeah, this method propagation is unfortunate, and it's unclear how to completely avoid it. Maybe just by adding the simplest methods first and then adding the others if/when people request them?

@anirudhb

This comment has been minimized.

Copy link
Contributor

anirudhb commented Jul 19, 2018

I would like to work on stabilizing this. One question: would it be stable since 1.27.1 or the next version?

@pietroalbini

This comment has been minimized.

Copy link
Member

pietroalbini commented Jul 19, 2018

You should use the current nightly version when stabilizing (at the moment 1.29.0).

@anirudhb

This comment has been minimized.

Copy link
Contributor

anirudhb commented Jul 19, 2018

OK, thanks! I'll open a PR soon.

@anirudhb

This comment has been minimized.

Copy link
Contributor

anirudhb commented Jul 20, 2018

I've closed my PR because there is still discussion about naming the functions.

@daboross

This comment has been minimized.

Copy link
Contributor

daboross commented Nov 9, 2018

@clarcharr do you think you could elaborate on your concerns in #52547 (comment)?

It seems to me that the consensus in the general programming community is for -5 remainder x being negative and -5 modulo y being positive. As other's have said in this thread, the distinction was purposeful.


There's was already quite a bit of discussion in rust-lang/rfcs#2169 that came up with the current names, mod_euc and div_euc. If the arguments there don't provide enough rationale for the current names, we could discuss more here. But if they're convincing enough to justify the current names, might I move to stabilize this again?

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 9, 2018

@demurgos This is not a "mod operation", mathematically, this is a remainder operation. "mod operation" doesn't make sense - modular arithmetic is about equivalences, not operators.

a ~ b mod m means that ∃k ∈ ℤ s.t. k (a - b) = m. There is no mathematical x mod m operation. There's a reason that the division algorithm calls it r - ∀a, b ∈ ℤ, ∃! q, r ∈ ℤ s.t. (0 ≤ r < b ∨ b < r ≤ 0) ∧ aq + r = b. It's the remainder.

Notably, also - mod has no qualms about how negative b is, and it's utterly fine to say, for example, that 128 ~ -128 (mod 256), which is how 2s complement arithmetic works.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 9, 2018

Also, the names are too short - div_euclidean and rem_euclidean are much better names.

@varkor

This comment has been minimized.

Copy link
Member

varkor commented Nov 9, 2018

This is not a "mod operation", mathematically, this is a remainder operation. "mod operation" doesn't make sense - modular arithmetic is about equivalences, not operators.

There is such a thing as a mod(ulo) operation. It's analogous to modular arithmetic, but it's a distinct concept.

Also, the names are too short

This is exactly the point under discussion. Personally, I find the full _euclidean suffix to be too verbose, considering it's the most common choice for negative operands in many contexts (the RFC thread contains many examples).

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 9, 2018

@varkor note the first paragraph - "In computing, the modulo operation". Note also the ambiguity around sign - calling it "modulo" doesn't make it positive. Remainder is what the rest of the ecosystem calls it, we should continue calling it remainder.

@daboross Afaict, only languages like Haskell make this distinction - C still calls it the modulo operator, but a % b is between 0 and b, not between 0 and |b|.

@varkor

This comment has been minimized.

Copy link
Member

varkor commented Nov 9, 2018

I'm not sure why the mathematics versus programming distinction should be made, but even then, the definition for programming languages seems like the most appropriate anyway.

As you say, the precise definition is language-specific anyway, but:

Remainder is what the rest of the ecosystem calls it, we should continue calling it remainder.

Rust chose the name "remainder" specifically because it produced negative results for negative dividends. This is either described in the commit history or the mailing list, but I don't have time to dig it up right now. (If I recall correctly, it used to be called "modulo" until it was changed for this reason.)

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 9, 2018

@varkor that doesn't change the fact that even in CS, "mod" does not imply positive. It's only used that way, afaict, in Haskell - other languages use mod to just mean remainder.

You'll note, for example, that C++'s std::fmod does something completely different - fmod(-8, 3) = -2, having the same sign as the dividend, not the divisor, and not positive.

@clarcharr

This comment has been minimized.

Copy link
Contributor

clarcharr commented Nov 9, 2018

I personally think that _euclid is a good suffix; it's not as verbose as _euclidean while still feeling natural and unambiguous. Additionally, even as a mathematically inclined person, I don't immediately read "euc" as "Euclidean," whereas I feel that even non-mathematically-inclined people would read "euclid" as "Euclidean."

As far as the distinction between "modulo" and "remainder," I feel like the most important distinction is that Rust currently uses "rem" to refer to the remainder of truncated division, rather than "mod." Because div_euc does not use any weird terminology compared to the regular div, it feels inconsistent to use mod instead of rem for mod_euc.

So to clarify, I think we should name the methods to div_euclid and rem_euclid.

That way, we have truncated division + remainder and Euclidean divison + remainder, rather than truncated division + remainder and Euclidean division + modulo.

@clarcharr

This comment has been minimized.

Copy link
Contributor

clarcharr commented Nov 9, 2018

Side note-- I really think that Rust should have somewhere in the documentation for Div that division is truncated, rather than floored. I don't know all of the places where this would be relevant, but I feel like having that officially clarified is important, considering how C used to have floored division and then changed.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Dec 17, 2018

Can I fix the names of these functions? Will that be accepted?

ubsan added a commit to ubsan/rust that referenced this issue Dec 18, 2018

rename div_euc -> div_euclid, and mod_euc -> rem_euclid
logic is written up in rust-lang#49048

Also, update the documentation slightly

Centril added a commit to Centril/rust that referenced this issue Dec 22, 2018

Rollup merge of rust-lang#56936 - ubsan:euclidean_div_rem, r=dtolnay
rename div_euc -> div_euclid, and mod_euc -> rem_euclid

logic is written up in rust-lang#49048

Also, update the documentation slightly.

cc @alexcrichton @clarcharr @varkor

kennytm added a commit to kennytm/rust that referenced this issue Dec 22, 2018

Rollup merge of rust-lang#56936 - ubsan:euclidean_div_rem, r=dtolnay
rename div_euc -> div_euclid, and mod_euc -> rem_euclid

logic is written up in rust-lang#49048

Also, update the documentation slightly.

cc @alexcrichton @clarcharr @varkor

kennytm added a commit to kennytm/rust that referenced this issue Dec 22, 2018

Rollup merge of rust-lang#56936 - ubsan:euclidean_div_rem, r=dtolnay
rename div_euc -> div_euclid, and mod_euc -> rem_euclid

logic is written up in rust-lang#49048

Also, update the documentation slightly.

cc @alexcrichton @clarcharr @varkor

kennytm added a commit to kennytm/rust that referenced this issue Dec 22, 2018

Rollup merge of rust-lang#56936 - ubsan:euclidean_div_rem, r=dtolnay
rename div_euc -> div_euclid, and mod_euc -> rem_euclid

logic is written up in rust-lang#49048

Also, update the documentation slightly.

cc @alexcrichton @clarcharr @varkor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment