New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

widening_mul #2417

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
@strake
Copy link
Contributor

strake commented Apr 24, 2018

View

strake added some commits Apr 24, 2018

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 24, 2018

I think we could come up with a better name for the method, but otherwise this is something I'd very much like to see added. Given that on some architectures (x86 to name one), multiplication always produces result twice the width of the operands, it seems rather inefficient to not provide a way to use the higher half.

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Apr 24, 2018

@le-jzr We can also choose an efficient-as-possible implementation on architectures where it doesn't and save users the bother.

Some name ideas:

  • carrying_mul
  • double_mul
  • double_wide_mul
@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Apr 24, 2018

For reference, here is the generic definition from my u2size crate:

#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let n = 0usize.count_zeros() >> 1;
    let halves = |x| (x >> n, x & (!0 >> n));
    let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
    let (z2, z0) = (x1 * y1, x0 * y0);
    let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
    let (w, c) = usize::overflowing_add(z0, z1 << n);
    (z2 + z1 >> n + c as u32, w)
}

The emitted machine code for this is quite heinous.

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Apr 24, 2018

Having never heard of this operation before, but immediately understanding why it would be useful, I think the name needs to have some form of the word "wide" in it.

carrying_mul sounds as if regular multiplication doesn't "carry the 1", which would just be inaccurate. double_mul sounds like multiplying twice, or multiplying by two, which are totally different operations. double_wide_mul I'd be fine with, but is probably overkill unless there's a quadruple_wide_mul we need to contrast it with. Finally, std already has wrapping_mul and overflowing_mul methods on these types, so for naming consistency I would call it widening_mul.

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 24, 2018

After a bit of though, I see the logic behind widening_mul. It could work.

@tspiteri

This comment has been minimized.

Copy link

tspiteri commented Apr 24, 2018

@strake I know it's not really the point, but couldn't you do something like this?

#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let usize_bits = 0usize.count_zeros();
    if usize_bits <= 32 {
        let z = (x as u64) * (y as u64);
        ((z >> usize_bits) as usize, z as usize)
    } else if usize_bits <= 64 {
        let z = (x as u128) * (y as u128);
        ((z >> usize_bits) as usize, z as usize)
    } else {
        // fallback
        let n = usize_bits >> 1;
        let halves = |x| (x >> n, x & (!0 >> n));
        let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
        let (z2, z0) = (x1 * y1, x0 * y0);
        let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
        let (w, c) = usize::overflowing_add(z0, z1 << n);
        (z2 + z1 >> n + c as u32, w)
    }
}
@devonhollowood

This comment has been minimized.

Copy link

devonhollowood commented Apr 25, 2018

Maybe instead of returning a tuple, it would make sense to return a struct? So e.g. u32::carrying_mul() would return a carried_product<u32>. This struct would have high and low fields. This makes it hard to confuse the fields, which would otherwise be an error-prone and hard to lint against. (Feel free to bikeshed on all the names here.)

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Apr 25, 2018

I have only seen this called "widening multiply", but mostly in low-level (compiler/ISA) contexts.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Apr 25, 2018

I do wonder why this is a Mul impl on u2size rather than a mul of two usize values that returns a u2size (or equivalent such as a 2-tuple of usize, which I'd prefer over a new type here).

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Apr 25, 2018

The broader rust project already has this in compiler-builtins (https://github.com/rust-lang-nursery/compiler-builtins/blob/master/src/int/mul.rs#L7), so I'd personally be happy to have it in core instead.

A quick playground experiment shows the ASM is simpler for returning (low, high) with u64. (For smaller types it didn't matter.) Conveniently that's consistent with overflowing_mul which also returns the low part of the result as the .0 part of the tuple.

But I wonder if this really wants to be ternary instead, like

fn mul_with_carry(self, multiplier: Self, carry: Self) -> (Self, Self)
@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Apr 25, 2018

@tspiteri

This comment has been minimized.

Copy link

tspiteri commented Apr 25, 2018

Regarding output tuple order, I think it could be made more obvious using the method name. For example fn mul_low_high(self, multiplier: Self) -> (Self, Self) makes it more obvious that ret.0 is the low word and ret.1 is the high word, and let (high, low) = a.mul_low_high(b); sets off alarm bells.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Apr 25, 2018

To me this operation falls in the same category as leading/trailing zeros, rotates, bit reverse, saturation, etc. in that it

  • is relatively niche, but important for some crucial routines and sometimes provided by hardware
  • is not provided directly by C and friends, and thus generally implemented with intrinsics or in terms of other operators (e.g., rotate ~= bit shifts and bitwise or) that then have to be pattern matched against to recognize the operation
  • (on some targets, at least) requires special casing by instruction selection to achieve good codegen, so the aforementioned pattern matching is important

For these operations I'm in favor of providing them in the Rust standard library, both to offer a canonical way to write out these important primitives and to make it easier for the compiler to generate good code reliably (i.e., no need to pattern match a particular way to implement the operation).

@joshtriplett joshtriplett added the T-libs label Apr 25, 2018

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Apr 26, 2018

@joshtriplett the Mul impl is merely an example of use; the actual method is fn(usize, usize) -> (usize, usize).

@nagisa yes, it does for me too, but i as a crate author worry whether it will remain so on other targets — will it generate such code on a 32-bit machine? a 128-bit machine? is Rust willing to preclude ever supporting 128-bit targets (e.g. RV128I)? I would prefer to have a method which means efficient double-wide multiplication rather than rely on the compiler to figure out what i mean and myself to not mess it up every time.

@rkruppe thanks for verbalizing these points — yes, the notion is to have a canonical form so neither the compiler nor a future reader needs to guess what is going on.

widening_mul LGTM, will modify RFC when we reach a consensus.

@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented Apr 26, 2018

Personally I think that the return type should be the next size up of an integer type. I wouldn't mind creating u256 and i256 types that only allow bit operations and casting, but we can also just not define this for 128-bit integers.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Apr 26, 2018

@clarcharr I'd love to have u256 (and maybe even u512) available, but not with just a limited set of operations. But that aside, I think we can universally define these operations as a function from T, T to (T, T), even for the largest type we have. Having versions that turn into the next larger type, for every type other than the largest, seems useful as well, but I'd hope that the LLVM optimizer makes it unnecessary to have two different versions.

@JonSeverinsson

This comment has been minimized.

Copy link

JonSeverinsson commented Apr 26, 2018

As the order of the high and low half of the result tuple is not obvious suggests to me that a tuple is probably not the best return type. The obvious solution would be a struct with two fields named "high" and "low", but I'm wondering if simply returning a wider integer type is not simpler (though that means i128/u128 will unfortunately lack the method).

I.e. add the following functions (and the equivalent for unsigned integers) to the standard library.

impl i8 {
    pub fn widening_mul(self, rhs: Self) -> i16 { ... }
}
impl i16 {
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
}
impl i32 {
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
}
impl i64 {
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}
impl isize {
    #[cfg(target_pointer_width = "16")]
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
    #[cfg(target_pointer_width = "32")]
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
    #[cfg(target_pointer_width = "64")]
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}
@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Apr 26, 2018

@JonSeverinsson Not sure there is much of a point in doing that because you can simply cast to the wider type and perform the multiplication.

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 26, 2018

Returning twice as wide type instead of a tuple has the issue that it wouldn't work on usize, which is the canonical use case for this kind of operation (at least until someone adds ireg/ureg).

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Apr 26, 2018

I'd be fine with returning either a tuple or a struct. As @le-jzr noted, returning twice-as-wide type is impossible for usize (unless we define u2size in core, which i consider far out of scope of this RFC).

@kennytm

This comment has been minimized.

Copy link
Member

kennytm commented Apr 26, 2018

Does it make sense to define widening_mul on signed integers?

fn widening_mul(self, other: isize) -> (isize, usize);
@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented Apr 30, 2018

@kennytm absolutely imho.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented May 4, 2018

Double-wide multiplication is a prerequisite of arbitrary-precision multiplication

I took a stab at the simplest version of that using the proposed method, and I think it ends up being something like this:

fn scalar_mul_assign(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (low, high) = i.wide_mul(scalar);
        let (low, overflowed) = low.overflowing_add(carry);
        *i = low;
        carry = high + if overflowed { 1 } else { 0 };
    }
    if carry > 0 {
        big.push(carry);
    }
}

That's not great, and I doubt LLVM will do a great job with it.

So I think it'd be good to have the carry built-in, so it can be something like

pub fn scalar_mul_assign2(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (x, c) = i.mul_with_carry(scalar, carry);
        *i = x;
        carry = c;
    }
    if carry > 0 {
        big.push(carry);
    }
}

Since one can always pass 0 as the carry, which LLVM will fold away easily.

@Pratyush

This comment has been minimized.

Copy link

Pratyush commented May 6, 2018

+1, would make writing certain cryptographic code very simple

@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented May 10, 2018

One interesting use case for this might be a more efficient version of the multiply_and_divide function that num-integer uses internally:

/// Calculate r * a / b, avoiding overflows and fractions.
///
/// Assumes that b divides r * a evenly.
fn multiply_and_divide<T: Integer + Clone>(r: T, a: T, b: T) -> T {
    // See http://blog.plover.com/math/choose-2.html for the idea.
    let g = gcd(r.clone(), b.clone());
    r/g.clone() * (a / (b/g))
}

I mean, num-integer specifically almost certainly won't use this as it's not backwards compatible, but it is a nice use case.

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Jul 3, 2018

ping

have we reached consensus on the name widening_mul?

@sm-Fifteen

This comment has been minimized.

Copy link

sm-Fifteen commented Nov 30, 2018

I'd like to point out that the apfloat component of the official librustc has had a widening_mul function for well over a year, so there's some precedent for this in official rust projects, just not in the language itself.

@TheIronBorn

This comment has been minimized.

Copy link

TheIronBorn commented Nov 30, 2018

The rand library uses wmul

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Dec 1, 2018

@sm-Fifteen It is no coincidence I wrote that function as a non-allocating bigint multiplication helper when I ported APFloat from C++ and #2417 (comment) - so don't count my "vote" for widening_mul twice, heh.

@strake strake changed the title carrying_mul widening_mul Dec 5, 2018

@strake strake force-pushed the strake:carrying_mul branch from c420699 to 329e58b Dec 5, 2018

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Dec 5, 2018

Consensus for name seems to be widening_mul; modified RFC.

@fstirlitz

This comment has been minimized.

Copy link

fstirlitz commented Jan 19, 2019

What's the rationale for having u2size at all? Given that usize is supposed to be used for pointers, offsets and memory sizes, if multiplication involving it overflows, it means the result represents an amount of memory you cannot possibly allocate or refer to. You shouldn't even care what the value is: all that really matters is that it's too large to handle.

@clarfon

This comment has been minimized.

Copy link
Contributor

clarfon commented Jan 19, 2019

@fstirlitz One use case would be intermediate results that may overflow, e.g. rationals. While the end result of x*y/z might fit in a usize, the product x*y may not.

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Jan 21, 2019

@fstirlitz I had 2 motivating use cases for u2size:

  • Intermediate values in arbitrary-precision multiplication
  • The which-trees-present value in smoothsort, which is word_size/log₂((1+√5)/2) bits
@lu-zero

This comment has been minimized.

Copy link

lu-zero commented Jan 29, 2019

Would be good to have the variants that return the wider type directly, since it has direct uses for a number of common dsp operations.

@strake

This comment has been minimized.

Copy link
Contributor Author

strake commented Jan 29, 2019

@lu-zero u2size is not in core/std; i merely cite it as an example use of this operation. For fixed-width types, you can merely cast to the wider type before multiplying, and the emitted code will use the machine instruction for widening multiplication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment