Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unsigned operations to Int32, Int64, Nativeint #1458

Merged
merged 3 commits into from Nov 11, 2018

Conversation

@nojb
Copy link
Contributor

commented Oct 30, 2017

I missed having unsigned operations in the standard integer modules lately, so following discussion of #1201, I propose here the addition of compare_unsigned, div_unsigned, and rem_unsigned to each of Int32, Int64, and Nativeint.

I did not focus on performance for a first version, so the implementation is in OCaml in terms of the existing signed operations, following Hacker's Delight, 2nd ed, section 9.3.

If this PR is considered for inclusion we should decide if we want to go one (or two) steps further and implement the operations with C primitives (easy) or even with special processor instructions (unlikely to be worth the trouble).

One more operation that may be useful is of_int32_unsigned (extension by zero).

@bluddy

This comment has been minimized.

Copy link

commented Oct 30, 2017

Right shift is also different for unsigned -- it doesn't check the sign bit but rather always places a 0 at the leftmost bit.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 30, 2017

Right shift is also different for unsigned

shift_right_logical already exists.

@nojb nojb force-pushed the nojb:unsigned_intops branch from 7ee375f to e5f34de Oct 30, 2017

stdlib/int32.ml Show resolved Hide resolved
@murmour

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2017

I have carefully reviewed this PR and believe that it's a correct reference implementation.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2017

One more operation that may be useful is of_int32_unsigned (extension by zero).

These kind of things would also be useful:

Int32.unsigned_to_int : int32 -> int option
Int64.unsigned_to_int : int64 -> int option
Nativeint.unsigned_to_int : nativeint -> int option

E.g. here's one I have in a project for the first function:

let int31_of_uint32 =
  let max_int = Int32.of_int max_int in
  fun u ->
    if Int32.compare 0l u < 0 && Int32.compare u max_int <= 0
    then Some (Int32.to_int u)
    else None

let int63_of_uint32 =
  let move = Int64.to_int 0x1_0000_0000L in
  fun u ->
    let i = Int32.to_int u in
    Some (if i < 0 then i + move else i)

let int_of_uint32 = match Sys.word_size with
| 32 -> int31_of_uint32
| 64 -> int63_of_uint32
| _ -> assert false
@murmour

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2017

@dbuenzli

E.g. here's one I have in a project for the first function:

let int31_of_uint32 =
  let max_int = Int32.of_int max_int in
  fun u ->
    if Int32.compare 0l u < 0 && Int32.compare u max_int <= 0
    then Some (Int32.to_int u)
    else None

It returns None for zero. The first condition should probably be Int32.compare u 0l >= 0.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2017

It returns None for zero. The first condition should probably be Int32.compare u 0l >= 0.

Thank for debugging ! One more reason that indicates it's best to have this things in a proper library.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2017

Btw. to motivate the functions: binary data formats often have fields with unsigned indices/offsets that index into binary blobs. Since the latter are read/mmaped as bigarrays or strings you need ints to index those, hence the conversion from sized unsigned ints to int.

@damiendoligez damiendoligez added the stdlib label Nov 2, 2017

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Nov 2, 2017

@@ -125,6 +125,12 @@ external to_int : int32 -> int = "%int32_to_int"
during the conversion. On 64-bit platforms, the conversion
is exact. *)

val to_int_unsigned : int32 -> int

This comment has been minimized.

Copy link
@dbuenzli

dbuenzli Nov 2, 2017

Contributor

I would prefer the Int32.unsigned_to_int naming. While the result in int cannot be negative you are not treating the int as an unsigned int.

This comment has been minimized.

Copy link
@dbuenzli

dbuenzli Nov 2, 2017

Contributor

Ah sorry in fact you are but I don't think you should. As a said these operations are mainly useful for indexing arrays, this why they should return options if they exceed the positive range of ints.

This comment has been minimized.

Copy link
@nojb

nojb Nov 2, 2017

Author Contributor

I don't think I am, did I make a mistake? In any case renamed to_int_unsigned -> unsigned_to_int as suggested.

@nojb nojb force-pushed the nojb:unsigned_intops branch from 593b8b4 to 6f7941b Nov 2, 2017

@nojb

This comment has been minimized.

Copy link
Contributor Author

commented Nov 28, 2017

Ping! Any opinions on the eventual integration of this kind of PR?

Incidentally, it was mentioned to me that this code would be useful for some upcoming work on Flambda (of course this by itself is hardly an argument for addition into the stdlib, but well...)

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Nov 28, 2017

Could/should the PR also include Int32.unsigned_to_int64 and Nativeint.unsigned_to_int64 to cover 32 bit platforms?

stdlib/int64.mli Outdated Show resolved Hide resolved
@hannesm

This comment has been minimized.

Copy link
Member

commented Jun 24, 2018

As a regular user of unsigned fixed-width integers (mostly in protocol code), I'd welcome this PR to be merged. I reviewed and tested the implementation. This PR lacks tests for rem_unsigned (while div_unsigned, compare_unsigned, and unsigned_to_int are tested).

@damiendoligez damiendoligez added this to the 4.08 milestone Jun 25, 2018

@xclerc

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2018

I would say that the benefit of native support is significant.

On amd64, a trivial program that generates couples of
integers and performs unsigned divisions is 7 to 10% faster
with native support, while spending most of its time generating
the values.

A draft patch implementing the support of the various
back-ends is available here. The tests added by @nojb in
this PR pass for all architectures (using qemu). Inria's CI
precheck seems to fail for some Windows flavors; I need
to investigate these issues.

@nojb

This comment has been minimized.

Copy link
Contributor Author

commented Sep 2, 2018

We need to converge on this PR. All opinions voiced so far seem to be for inclusion of these functions. So, if you are against, please speak up!

On the level of implementation, we need to decide if we want to include native backend support as per @xclerc's patch or stick to an OCaml implementation. Personally I feel we could merge the pure OCaml implementation first and include the more performant native backend support in a later PR.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Sep 2, 2018

That's kind of a naming nit but since we have XXX.unsigned_to_int wouldn't it make more sense to settle on XXX.unsigned_OP for the other operations rather than XXX.OP_unsigned ?

@nojb

This comment has been minimized.

Copy link
Contributor Author

commented Sep 2, 2018

Seems fine to me.

stdlib/int32.mli Outdated Show resolved Hide resolved
stdlib/int32.mli Outdated Show resolved Hide resolved

@nojb nojb force-pushed the nojb:unsigned_intops branch 4 times, most recently from 2fda16f to afe524a Sep 2, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2018

Is anyone opposed in principle to this PR? In particular, it would be good to hear to the opinion of those in favor of #1201. Of course adding the "unsigned" functions now does not preclude in theory adding dedicated types in the future, but it would make the argument more difficult.

Considering that #1201 got a strong negative feedback from @xavierleroy and is thus unlikely to merged (soon), I think that the present PR is a good middle-ground. Personally I think that dedicated types would be a good use of the type system to avoid mistakes, but the use cases for unsigned ints are narrow enough that loosing this safety is probably ok, considering the relative weights of the two approches. So basically I would be fine with either #1201 or #1458, but we should do at least one of them. With this PR, it becomes more straightforward to create external unsigned int libraries (and they will benefit from compiler support if/when added later).

Another question: should we also add similar functions to Int, for the sake of uniformity (even if they will probably be less useful)?

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2018

Hmm, it seems strange to me that we would leave the existing functions lying around if we have proper types. That aside though, the point about the module name is a good one; maybe it would be better to have separate compilation units if we had proper types. I'm not really sure about that, but maybe we shouldn't commit either way yet, and just have the existing names.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2018

Hmm, it seems strange to me that we would leave the existing functions lying around if we have proper types.

As said, we could mark them as deprecated. But I don't see a smooth migration plan if we want to reuse the names but change their type from version N to N+1. Look at the difficulty to move from exception-raising functions to option-returning ones!

@bluddy

This comment has been minimized.

Copy link

commented Oct 29, 2018

I think it's a little strange that we all agree that having unsigned types makes sense except @xavierleroy. The benefits seem obvious, and it's really odd that we're hoping things will change later instead of making a strong case for why they're useful now. The whole point of the type system is to distinguish values that obey different rules, and unsigned values seems like a classic case for type distinction to me. The only argument I've heard for not having unsigned types is that it's less 'elegant', but in the real world, we have to use and interface with less elegant things. I think a strong caution against using these types in pure OCaml code that doesn't interface with the outside world is enough of a discouragement.

@pmetzger

This comment has been minimized.

Copy link
Member

commented Oct 29, 2018

The only argument I've heard for not having unsigned types is that it's less 'elegant', but in the real world, we have to use and interface with less elegant things.

I have an argument for the "elegance" of unsigned ints that involves not calling them unsigned ints, but rather calling them "machine words". The operations for machine words are intended to be bit manipulation, while we think of integers as, well, integers, where the intended operations are supposed to be things like addition or multiplication.

If you're trying to count things or store a cardinal or what have you, you use an int. If you're trying to do cryptographic operations for which you want rotations or popcount or what have you, or you're trying to do bit manipulation on a GPIO port on an embedded system, and you're not really doing "arithmetic", you use a machine word.

Indeed, I'd argue for a system where doing bit manipulations on machine words doesn't signal errors on things like overflow, but where overflowing the integer representation for an integer does signal an error — the former is probably not a bug in your program (or if it is, you asked for machine words!), while the latter probably is a bug...

@XVilka

This comment has been minimized.

Copy link
Contributor

commented Oct 30, 2018

@murmour

This comment has been minimized.

Copy link
Contributor

commented Oct 30, 2018

How about naming the new functions compare_as_unsigned, div_as_unsigned, and rem_as_unsigned, to emphasize that they merely interpret their arguments as unsigned, without performing any type-level conversions?

If we ever get dedicated unsigned types, these "as"-functions will still stand on their own, providing a short-path to perform unsigned operations without explicit conversions.

@pmetzger

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

@XVilka

There is more conventional name for this then - bit arrays or bitvectors.

Almost. When I think of bit vectors, I think of actual arrays of bits where the operations you're interested in are indexing, checking if the thing you are indexing is set or unset, and changing the bit at an index. This is very useful for densely packing booleans.

When I think of "machine word" operations, I'm really thinking of the combination of what you get from working in $\mathbb{Z}_{2^64}$ plus some non-traditional operations like rotate. This isn't really integers in the normal sense, but it isn't densely packed booleans either.

@dbuenzli
Copy link
Contributor

left a comment

Regarding #1201 vs this PR, I rather have this than nothing. Besides we at least know this one has @xavierleroy's blessing (unless he changed his mind).

Regarding submodule or not I personally don't see a strong need to namespace them in an Unsigned submodule and would rather be in favor of including the names as currently implemented in the PR.

Changes Outdated Show resolved Hide resolved
utils/targetint.mli Outdated Show resolved Hide resolved
@alainfrisch
Copy link
Contributor

left a comment

LGTM

testsuite/tests/basic/boxedints.ml Show resolved Hide resolved

@nojb nojb force-pushed the nojb:unsigned_intops branch 2 times, most recently from 53b79ea to c41e634 Nov 8, 2018

@nojb

This comment has been minimized.

Copy link
Contributor Author

commented Nov 9, 2018

Will merge this by end of day unless someone objects.

@nojb nojb force-pushed the nojb:unsigned_intops branch from c41e634 to feead05 Nov 10, 2018

@nojb nojb merged commit 4539cc9 into ocaml:trunk Nov 11, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nojb nojb deleted the nojb:unsigned_intops branch Nov 11, 2018

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 11, 2018

INRIA's CI reports a few build failures after this PR was merged (example log), and they look like failures caused by out-of-date .depend files:


../boot/ocamlrun ../ocamlopt -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats -O3   \
           -p -c -o stdlib__unit.p.cmx unit.ml
File "/home/barsac/ci/builds/workspace/parallel-build/flambda/true/label/ocaml-linux-64/stdlib/_none_", line 1:
Error (warning 58): no cmx file was found in path for module Stdlib__sys, and its interface was not compiled with -opaque

I will do a make depend run, commit, and see whether it fixes it.

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 11, 2018

Doing a make alldepend did fix the issue, nothing to worry about here.

@nojb

This comment has been minimized.

Copy link
Contributor Author

commented Nov 11, 2018

Thanks!

@yallop

This comment has been minimized.

Copy link
Member

commented Dec 4, 2018

@alainfrisch:

With this PR, it becomes more straightforward to create external unsigned int libraries

I don't see how that's the case, unless you ignore polymorphic comparison operations. A library like integers where unsigned integer types have proper comparison behaviour can't make use of this PR.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2018

unless you ignore polymorphic comparison operations

Indeed, I did ignore them. This needs to be documented of course, but I don't think this is necessarily a showstopper (personally I try to avoid polymorphic comparison -- not counting equality -- on anything except int).

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2018

Note that the Num library also exposes numeric types for which the polymorphic comparison does not correspond to the natural ordering (namely Num, Ratio, Big_int). This was part of the standard distribution for a long time without people overly complaining about that specific mis-feature.

@yallop

This comment has been minimized.

Copy link
Member

commented Dec 4, 2018

I'm not sure about that. People did complain and @xavierleroy recommended using Zarith in place of Num, in part so that polymorphic comparison would behave properly:

Mantis 5375:

[edmcman] Since compare does not work with big ints, we must manually write our own comparison functions, which is very unsatisfying (and tedious).

[xleroy] I can only recommend that you replace big_int by Zarith in your code [...] it also supports polymorphic comparisons perfectly.

Mantis 7341:

[goswin] The types defined by the num library can not be compared with Pervasives because the nat modules does not provide a compare function in the custom ops.

[xleroy] As mentioned by doligez, the Zarith library provides big integers with correct polymorphic comparisons [...] So, just use Zarith and don't expect Num to be fixed.

The behaviour of polymorphic comparison for Big_int is rightly considered a bug, albeit one that's too inconvenient to fix. Users will (rightly) consider broken polymorphic comparison a bug in unsigned integer implementations, too.

So, unfortunately, this PR only makes it easier to write buggy implementations of unsigned modules, and doesn't help with writing correct implementations.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2018

The behaviour of polymorphic comparison for Big_int is rightly considered a bug,

I agree it's a problem, but it's not a bug. If OCaml was a good language to implement an arbitrary precision library, that library would likely have the same problem. And Zarith's Q module obviously has it!

As long as the language itself does not allow to use a custom comparison function for OCaml datatype themselves, one should rather manage users' expectation about the behavior of the polymorphic comparison function.

@yallop

This comment has been minimized.

Copy link
Member

commented Dec 4, 2018

I agree it's a problem, but it's not a bug.

Well, okay. At the moment the integers library doesn't have this problem, and this PR doesn't make the implementation of integers any easier. However, the PR does make it more straightforward to create libraries with problems, which doesn't seem like a strong recommendation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.