RFC: rename `int` and `uint` to `intptr`/`uintptr` #9940

Closed
thestinger opened this Issue Oct 19, 2013 · 71 comments

Contributor

thestinger commented Oct 19, 2013

An arbitrarily sized integer type would be provided in std under the name Int. I think encouraging use of an arbitrarily sized integer when bounds are unknown is a much better solution than adding failure-throwing overflow checks to fixed-size integers.
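For contrast, a minimal sketch in today's Rust, which took the fixed-size route with checked operations (the arbitrary-precision Int proposed here never landed in std; crates such as num-bigint fill that role):

```rust
fn main() {
    // A fixed-size integer eventually overflows an unbounded computation;
    // checked arithmetic surfaces that as None rather than silently
    // wrapping. An arbitrary-precision Int would simply keep growing.
    let big: u64 = u64::MAX - 1;
    assert_eq!(big.checked_add(1), Some(u64::MAX));
    assert_eq!(big.checked_add(2), None); // the case a bigint would absorb
    println!("overflow reported, not wrapped");
}
```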

Member

cmr commented Oct 19, 2013

I think intptr and uintptr are awful names, but the best alternatives I can come up with are word and sword, which are worse.

Fixed integers of pseudo-arbitrary width are rarely useful.

Contributor

Thiez commented Oct 19, 2013

Seems to me int and uint are not pointers, so a 'ptr' suffix doesn't make a whole lot of sense. What would the type of ~[T].len() and [T, ..n].len() be? Surely not uintptr. Perhaps introduce size_t?

I rather like the int and uint types. Why can't they coexist with Int? If they're going to get renamed to something ugly, perhaps it would be best to stick with intptr_t and uintptr_t; existing Rust users are going to have to get used to the new stuff anyway, and it'll be easier to remember for those coming from C/C++.

I think the machine-word-sized int and uint are really nice to use as they are at this time. Int could be BigInt or Integer for people who really want arbitrarily sized integers, but I'm thinking the vast majority of the time you don't want/need that functionality anyway.

Member

huonw commented Oct 19, 2013

intptr_t and uintptr_t

Why introduce a completely new & (so far) unused naming convention to the language?

Contributor

thestinger commented Oct 19, 2013

@Thiez: They aren't machine-word-sized, they're pointer-sized. On the x32 ABI they will be 32-bit, despite the architecture having 16 64-bit integer registers. If you want to use fixed-size integers correctly, you need an upper bound on the size. Fixed-size types named int/uint encourage writing buggy code because the names imply a sane default rather than just a way to deal with sizes smaller than the address space.
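The distinction is directly checkable; a small sketch using the names these types eventually received in shipped Rust (isize/usize):

```rust
use std::mem::size_of;

fn main() {
    // The types under discussion are pointer-sized by definition: they
    // match the address space, not the machine's register width.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
    println!("pointer-sized int: {} bits", 8 * size_of::<usize>());
}
```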

Contributor

Thiez commented Oct 19, 2013

@thestinger fair point. Perhaps that should change as well? Since we're not really supposed to be messing around with pointers outside of unsafe blocks, perhaps a pointer-size type is deserving of an ugly name. That opens up the option of having int and uint be machine word sized...

Contributor

thestinger commented Oct 19, 2013

@cmr: I agree they're awful names. We should discourage using fixed-size types only in cases where the bounds are unknown. I think you only want these types in low-level code or for in-memory container sizes.

@Thiez: I don't really think word-sized is a useful property. If the upper bound is 32-bit, 32-bit integers will likely be fastest for the use case due to wasting less cache space.

Contributor

Thiez commented Oct 19, 2013

I realize my suggestion is silly anyway as one would still need a pointer-size variable for array and vector lengths, which is a nice case for int/uint (but not when they're word-sized). Ignore it :)


1fish2 commented Oct 19, 2013

I completely agree with @thestinger.

A machine-word-sized integer means bugs and security holes, e.g. because you ran the tests on one platform and then deployed on others.

If one of the platforms has a 16-bit int like PalmOS, that's too short to use without thinking carefully about it, so the prudent coding style forbids un-sized int and uint. (Actually the PalmOS 68000 ABI is emulated on a 32-bit ARM, so it's not clear what's a machine word.)

Hence the strategy of using a pointer-size integer type only in low-level code that requires it, with an ugly name.


UtherII commented Oct 19, 2013

I agree that using int and uint should be discouraged and renaming them to a less straightforward name is better.
I don't know how type inference works but I think it should avoid using them by default too.

Contributor

michaelwoerister commented Oct 19, 2013

I think that's a good idea. You can't really rely on very much when using int/uint.

I'm not so fond of the names intptr/uintptr. Given that the use cases for these types would be rare, I think they could also be defined in the standard library with more verbose names like PointerSizedInt / PointerSizedUInt. Not much ambiguity there. One could also define other integer types in the same module, in the vein of C's uint_fast8_t and uint_least8_t in stdint.h, to tackle the "machine word" problem.
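A sketch of what such library-defined names might look like. The module and type names here are hypothetical, taken from the comment above, and are aliased to the pointer-sized types Rust ended up with:

```rust
// Hypothetical module and type names; not a real std API.
mod mem_int {
    pub type PointerSizedInt = isize;
    pub type PointerSizedUInt = usize;
}

fn main() {
    // Container sizes are the natural consumer of these verbose aliases.
    let v = vec![1u8, 2, 3];
    let len: mem_int::PointerSizedUInt = v.len();
    assert_eq!(len, 3);
}
```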

Contributor

glaebhoerl commented Oct 19, 2013

IMHO, the interesting questions are: what type should be used to index into arrays, and what should it be named? Indexing into arrays is pretty common. A pointer-sized type is needed to be able to represent any index. It should presumably be unsigned. I'm not sure if there's much reason to also have a signed version. Expanding to a BigInt on overflow doesn't make much sense here. But wrapping around on over/underflow also doesn't make very much sense, I think. If you want to catch over/underflow and fail!() or similar, you lose a lot (or all) of the performance advantage you might have had over the expand-to-BigInt version. So there are decent arguments in favor of expanding, wrapping, as well as trapping.

I think the strongest argument might be for expanding: negative or larger-than-the-address-space values don't make sense for array indexes, but the array bounds check will already catch that. Meanwhile it's versatile and generally useful for most other situations as well, not just array indexing. The downside is a performance cost relative to a type that wraps on over/underflow. (In the event of a fixed pointer-sized type, the relevant association when naming it should be that it holds any array index, not that it's pointer-sized.)

Whatever this type ends up being and named, it's the one that should be in the prelude.

If someone explicitly needs pointer-sized machine integers for unsafe hackery, those could indeed be named intptr and uintptr and buried in a submodule somewhere.
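The wrap-versus-trap trade-off for index arithmetic can be sketched with the checked/wrapping methods today's Rust std provides:

```rust
fn main() {
    let i: usize = 0;
    // Wrapping silently turns an underflow into a huge index...
    let wrapped = i.wrapping_sub(1);
    assert_eq!(wrapped, usize::MAX);
    // ...while checked arithmetic reports it explicitly.
    assert_eq!(i.checked_sub(1), None);
    // Either way, the bounds check at the point of use still catches it:
    let v = [1, 2, 3];
    assert!(v.get(wrapped).is_none());
}
```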

Contributor

bstrie commented Oct 19, 2013

Dumb question here, but what's the use of having a signed pointer-sized int at all? Could we get away with having only uintptr (or whatever it ends up being called)?

As for the general idea of this bug, I'm warming to it after seeing how well the removal of float has worked. Having to actually think about the size of my types has been quite illuminating.

Contributor

thestinger commented Oct 19, 2013

@bstrie: a signed one is needed for offsets/differences (POSIX has ssize_t, though mostly because they like returning -1 as an error code; ISO C has ptrdiff_t for differences).
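The need for a signed counterpart is easy to illustrate; a sketch using today's type names:

```rust
fn main() {
    // Indices are unsigned, but the difference of two indices can be
    // negative, so it needs a signed pointer-sized type (C's ptrdiff_t).
    let (a, b): (usize, usize) = (3, 7);
    let diff = a as isize - b as isize;
    assert_eq!(diff, -4);

    // std exposes the same idea for raw pointers via offset_from,
    // which returns an isize for exactly this reason.
    let v = [10u8, 20, 30];
    let d = unsafe { v.as_ptr().add(2).offset_from(v.as_ptr()) };
    assert_eq!(d, 2);
}
```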

1fish2 commented Oct 19, 2013

@thestinger good point. Subtracting array indexes should yield a signed value.

So to reverse the question, what's the need for an unsigned array index type? Is it feasible to allocate a byte array that takes more than half the address space?

Contributor

thestinger commented Oct 19, 2013

AFAIK the rationale for unsigned types here is to avoid the need for a dynamic check for a negative integer in every function. A bounds check only has to compare against the length, and a reserve/with_capacity function only has to check for overflow, not underflow. It just bubbles up the responsibility for handling underflow as far as possible into the caller (if it needs to check at all - it may not ever subtract from an index).
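That rationale can be sketched as two hypothetical accessors (the function names are illustrative): the unsigned version needs one comparison, the signed version two:

```rust
// With an unsigned index, a bounds check is a single comparison.
fn get_unsigned(v: &[i32], i: usize) -> Option<i32> {
    if i < v.len() { Some(v[i]) } else { None }
}

// A signed index would also need an explicit negativity check.
fn get_signed(v: &[i32], i: isize) -> Option<i32> {
    if i >= 0 && (i as usize) < v.len() {
        Some(v[i as usize])
    } else {
        None
    }
}

fn main() {
    let v = [1, 2, 3];
    assert_eq!(get_unsigned(&v, 2), Some(3));
    assert_eq!(get_unsigned(&v, 3), None);
    assert_eq!(get_signed(&v, -1), None);
}
```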

Contributor

nikomatsakis commented Oct 21, 2013

cc me

I have been contemplating whether int/uint carry their weight or not. Array indexing is a good example of where they can be useful, particularly around overloading: we could make the built-in indexing operator accept arbitrary types (and in fact I think maybe they do?) but that's not so easy with an overloaded one.

Member

pnkfelix commented Oct 21, 2013

@glehel the issues you raise about how to handle overflow/underflow on array indices are important, but there is already a ticket that I think is a more appropriate spot for that discussion: #9469.

Contributor

glaebhoerl commented Oct 21, 2013

@pnkfelix I think the two are very closely related. (basically: if we want to use the existing int/uint as the preferred type for indexing arrays, then they should not be renamed to intptr/uintptr, but if we want to prefer a different type for that (e.g. one which checks for over/underflow), then they should be renamed.)

Member

brendanzab commented Dec 5, 2013

To those commenting that intptr and uintptr are horrible names, that's entirely the point. They should be ergonomically discouraged.

Having int and uint so succinct and pretty makes it easy for beginners to think they should use them as default. In fact I already did int for everything in glfw-rs - I should probably change them to i32s.

+1 for this change from me.

Contributor

brson commented Dec 5, 2013

If there's consensus that it's bad practice to use int by default (and I don't know that there is) then I agree we should change the names, and probably make default integer types that have the correct size.

Member

brendanzab commented Dec 5, 2013

@brson We already make folks choose between f32 and f64. It seems a little asymmetrical from a design point of view having uint and int as the default that folks should reach for without also having float.

Contributor

nikomatsakis commented Dec 6, 2013

I find this thread confusing.

  • Having fixed size types does not make overflow irrelevant. It's well-defined, but it's equally well-defined on uint if you know your target architecture. The danger occurs because people write innocent-looking code that can easily overflow if supplied with large inputs. Consider (a + b) / 2, which is incorrect if a or b is large, even though the end result ought to be between a and b. (A safer way to write that expression is something like a + (b - a) / 2.)
  • I disagree that uint is "almost always" a bad choice or something like that. Most of the times that I write an integer, it is an index into an array or in some way tied to the size of a data structure. In those times, having it be the same size as a pointer is a very logical choice. I know there are good arguments for LP64, and in practice I've probably never worked with an array of more than 4bil elements, but somehow hard-coding 32-bit doesn't seem particularly forward looking. In any case, the RFC wasn't that we adopt LP64 rather than ILP64 (which was considered for a time and eventually rejected).
  • I am not quite sure what the proposal was regarding adding Int to the standard library, but I don't see how it's an improvement on the current situation, other than a slight simplification to the compiler:
    • If Int is an alias for i32 or i64 as appropriate, it creates a portability nightmare. We tried it for a while. Code written by someone using a 64-bit compiler almost never compiled on 32-bit machines and vice versa. With bors this would be a bit better but most people don't have bors.
    • If Int is a newtype, that's better, but it'll still be quite annoying to use.
  • The asymmetry between floats and ints doesn't bother me. The argument for removing float was that people don't write "generic" floating point code, but rather they target a specific precision for specific performance needs. Since I don't write a lot of code that uses floating point, I don't know personally, but I buy it as plausible. I don't think this argument holds for integers, which are most commonly used as loop counters, array indices, etc. where hard-coding to a specific size doesn't feel especially portable to me. I'd rather add back float than remove int and uint.
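The (a + b) / 2 hazard from the first bullet, demonstrated at u8 width so the overflow is easy to trigger (the same bug hits uint near its maximum):

```rust
fn main() {
    let (a, b): (u8, u8) = (200, 250);
    // (a + b) / 2 overflows: 450 doesn't fit in u8. With wrapping
    // semantics it silently yields garbage:
    let naive = a.wrapping_add(b) / 2;
    assert_eq!(naive, 97); // 450 wraps to 194; 194 / 2 = 97
    // The rewrite keeps every intermediate value in range:
    let safe = a + (b - a) / 2;
    assert_eq!(safe, 225);
}
```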
Contributor

glaebhoerl commented Dec 6, 2013

The other possibility was to use a type that doesn't wrap on over/underflow, but either traps or extends into a bigint. That is likely to be slow, but I don't know whether it's been tested.


1fish2 commented Dec 6, 2013

(What are bors?)

An integer type with platform-specific overflow makes programs non-portable, that is, they produce different results on different platforms. What's it good for besides maybe C interop? (Not for performance. E.g. Palm OS runs on a 32-bit ARM emulating a 16-bit 68000, so int is 16 bits. It's too short for an everyday loop index and probably slower than 32 bits.)

Intertwined issues: whether to have non-portable integer types, what to name them, and whether array indexing (and some type inferences?) uses fixed-size integers, with or without overflow traps or big-ints.

Member

huonw commented Dec 6, 2013

(@bors is the Rust integration bot; (almost) all PRs have the full test suite run on a variety of platforms, with a limited version run on others, and only merge if everything passes. The r+'s that you may see on PRs are directives to @bors to attempt a merge & test run.)

ghost commented Dec 6, 2013

There's also the x32 ABI where pointers are smaller than ints.

I'd remove variable-width int altogether (except ffi of course). Those who expect their code to run on 32-bit should already be thinking about overflows and use int64/bigint where appropriate, and those who know they'll only ever run on 64-bit should have no problem either way.

Are there credible use cases of pointer-sized Rust ints outside ffi?


Member

cmr commented Dec 6, 2013

Pointer-sized ints are required for representing pointers and indices into vectors (otherwise the vector will be artificially limited in size).

Member

cmr commented Dec 6, 2013

@nikomatsakis The idea is that int and uint aren't very useful numeric types, and that a fast bigint would be more appropriate most of the time one is actually dealing with numbers. And when one does not want a real numeric type, they probably want one of the fixed-size types anyway.

Contributor

nikomatsakis commented Dec 6, 2013

@cmr I do not agree with the "probably want one of the fixed-size types" part of that sentence. That is not clear to me -- I think it is very common to have integers that are ultimately indices into some sort of array or tied to the size of a data structure, and for that use case it is natural to want an integer that represents "the address space of the machine".

Of course I think having a nice, performant bigint library would be great, particularly for implementing "business logic" or other use cases where a "true integer" is required. But I am not sure how common that really is.

@nmsmith

nmsmith commented Jan 12, 2014

The moment someone tries to use intptr as a pointer and gets a compiler error they'll realise it's not a pointer and nothing will come of it. I don't see how it's a problem.

@CloudiDust

CloudiDust commented Jan 12, 2014

@ecl3ctic Yes, the compiler can and will help here, but I think the principle of least surprise applies. intptr is not an int pointer, so it should not be named as such. If the distinction can be made clear without the help of the compiler, so much the better.

On the other hand, intps is not a name/type that is normally encountered in a programming language's core (AFAIK), so it is harder to mistake it for something else.

@nikomatsakis

Contributor

nikomatsakis commented Jan 12, 2014

On Sat, Jan 11, 2014 at 11:37:02AM -0800, Daniel Micay wrote:

@ecl3ctic: If you're using int as a "default", then you're not
using it correctly. It will be 16-bit on an architecture with a
16-bit address space, or 32-bit/64-bit. If you're doing your testing
on a 64-bit architecture, you're going to miss plenty of bugs.

As I wrote earlier, I don't find this argument especially persuasive,
for the reasons I have previously expressed (and can reiterate if
desired). That said, while I find the current design quite logical, an
unbounded integer type would be a nice thing to have. The bottom
line, though, is that I think this bug cannot progress without
more data. All of us (including myself) are making rather unsupported
claims. I'd like to see any or all of the following:

  1. A survey of uses of the int/uint types, showing how many of them
    are appropriate / inappropriate / borderline.

  2. An actual implementation of an efficient bigint that we can use to
    make default performance comparisons; to my knowledge, the only
    bigint types that have been developed are basically wrappers around
    GNU's bigint, which I don't think is suitably licensed nor
    particularly performant. Correct me if I am wrong.

    Regardless, if it's going to be the default integer type, it's
    going to get used a lot, and I'd like to know how fast it's going
    to be. The prior work that I'm aware of is all in the area of
    dynamic languages (JavaScript, Python, Smalltalk, etc.) and all of
    them expend a significant amount of energy optimizing away the
    bigint overhead and customizing for the expected smallint case
    (s/bigint/double, in the case of JS). As a purely compiled
    language, we don't have that luxury.

  3. Measurements concerning the overhead of using the current scheme
    augmented with bounds checking for all types. It seems to me that
    bounds checking offers a reasonable compromise, in that people
    may still use int inappropriately, but at least their integers
    won't silently wrap around but instead fail in a noisy way.

@thestinger

Contributor

thestinger commented Jan 12, 2014

@nikomatsakis: As far as I know, GMP is leagues ahead of any other big-integer implementation in performance. There's no doubt that it's the best open-source implementation. It has many different algorithms implemented for each operation because, at very large integer sizes, the more sophisticated algorithms have progressively better asymptotic performance. It also has highly optimized hand-written assembly for different revisions of many platforms, because specialized instructions make it many times faster than the same code in C. Intel adds relevant instructions with almost every iteration of their CPU architecture too... Haswell has MULX, Broadwell brings ADOX and ADCX, and there are many relevant SSE/AVX instructions.

It's licensed under the LGPL, which gives you 3 choices:

  1. use dynamic linking
  2. use static linking, and make your source available (under any license allowing the user to recompile it)
  3. use static linking, and provide linkable object files (works for proprietary code)

There are various clones of the library with inferior performance and a less exhaustive API but more permissive licenses. I think Rust should default to using one of these libraries and allow GMP as a drop-in alternative.

Measurements concerning the overhead of using the current scheme augmented with bounds checking for all types.

This is well-explored territory with -ftrapv in C. I expect it would push Rust significantly behind Java in out-of-the-box performance. It's easy enough to directly use the LLVM intrinsics I added for this and branch to abort to compare. I can make benchmarks and present Rust-specific numbers if that's desired. I can also include benchmarks of a number type overflowing to a big integer.

@thestinger

Contributor

thestinger commented Jan 12, 2014

@CloudiDust: The names intptr and uintptr come from C/C++.

http://en.cppreference.com/w/cpp/types/integer

The _t suffix is used there because types with that suffix are reserved names, and can be added in revisions of the standard.

@cmr

Member

cmr commented Jan 12, 2014

@huonw pointed out https://github.com/wbhart/bsdnt on IRC, which seems like a solid choice for us.

My thought for auto-overflow is to make the type have an alignment of at least 2, and use the least significant bit to indicate whether the value is a pointer to the big integer or the actual value of the number. It's going to incur a branch on every single operation, though.
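The tagging scheme sketched in that comment can be illustrated in modern Rust. This is a hypothetical sketch, not the proposed implementation: the names `SmallInt`, `from_small`, `inline_value`, and `checked_add` are invented here, and a real version would store a heap pointer to a bigint in the untagged case rather than giving up.

```rust
/// Hypothetical tagged representation: the least significant bit is 1 for
/// an inline value, and would be 0 for a pointer to a heap-allocated big
/// integer (valid because that allocation has an alignment of at least 2).
/// The big-integer fallback is omitted in this sketch.
struct SmallInt(usize);

impl SmallInt {
    /// Store a value inline if it fits in the bits left after the tag.
    fn from_small(v: usize) -> Option<SmallInt> {
        if v <= usize::MAX >> 1 {
            Some(SmallInt((v << 1) | 1))
        } else {
            None // a real implementation would switch to the pointer form
        }
    }

    /// Inspect the tag: Some(value) for the inline case, None otherwise.
    fn inline_value(&self) -> Option<usize> {
        if self.0 & 1 == 1 { Some(self.0 >> 1) } else { None }
    }
}

/// Addition pays two branches: one on the representation tag, and one on
/// overflow of the narrowed (31-/63-bit) inline range.
fn checked_add(a: &SmallInt, b: &SmallInt) -> Option<SmallInt> {
    let (x, y) = (a.inline_value()?, b.inline_value()?); // branch 1: tag
    x.checked_add(y).and_then(SmallInt::from_small)      // branch 2: overflow
}

fn main() {
    let a = SmallInt::from_small(20).unwrap();
    let b = SmallInt::from_small(22).unwrap();
    assert_eq!(checked_add(&a, &b).unwrap().inline_value(), Some(42));
    // A full-width value no longer fits in the inline form.
    assert!(SmallInt::from_small(usize::MAX).is_none());
    println!("ok");
}
```

The per-operation cost is exactly the branching this thread goes on to debate: every arithmetic operation first tests the tag, then tests for overflow of the narrowed range.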

@thestinger

Contributor

thestinger commented Jan 12, 2014

@cmr: It will incur two branches, since you need to check if you have a big integer and then check for overflow. Checking the overflow flag serializes the CPU pipeline quite a bit too.

@thestinger

Contributor

thestinger commented Jan 12, 2014

If you're limited to 31 bits then it seems that you'll need to use a comparison instruction rather than using the carry/overflow flag. This could be really bad for multiplication.

@thestinger

Contributor

thestinger commented Jan 12, 2014

Simple example:

extern mod extra;
use std::unstable::intrinsics::{abort, u32_mul_with_overflow};
use extra::test::BenchHarness;

#[inline(never)]
fn control(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        *x *= 5;
    }
}

#[inline(never)]
fn check(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        unsafe {
            let (y, o) = u32_mul_with_overflow(*x, 5);
            if o {
                abort()
            }
            *x = y;
        }
    }
}

#[inline(never)]
fn check_libstd(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        *x = x.checked_mul(&5).unwrap();
    }
}

#[bench]
fn bench_control(b: &mut BenchHarness) {
    b.iter(|| {
        let mut xs = [0, ..1000];
        control(xs)
    });
}

#[bench]
fn bench_check(b: &mut BenchHarness) {
    b.iter(|| {
        let mut xs = [0, ..1000];
        check(xs)
    });
}

#[bench]
fn bench_check_libstd(b: &mut BenchHarness) {
    b.iter(|| {
        let mut xs = [0, ..1000];
        check_libstd(xs)
    });
}

--opt-level=2

test bench_check        ... bench:      1085 ns/iter (+/- 28)
test bench_check_libstd ... bench:      1082 ns/iter (+/- 38)
test bench_control      ... bench:       349 ns/iter (+/- 12)

--opt-level=3

test bench_check        ... bench:      1080 ns/iter (+/- 14)
test bench_check_libstd ... bench:      1177 ns/iter (+/- 16)
test bench_control      ... bench:       350 ns/iter (+/- 11)

Ouch. It becomes a larger slowdown multiplier when you add more operations to the loop too. Since it's increasing the code size a lot, it will bloat the instruction cache too.
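For readers on current Rust, where `extern mod extra` and the unstable `u32_mul_with_overflow` intrinsic are long gone, the same checked-vs-unchecked contrast can be sketched with the stable `checked_mul`/`wrapping_mul` methods. This is a rough modern equivalent for illustration, not the code that produced the numbers above:

```rust
// Unchecked version: multiplication wraps silently on overflow,
// which is the behavior the RFC objects to for a "default" type.
fn control(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        *x = x.wrapping_mul(5);
    }
}

// Checked version: checked_mul returns None on overflow, so the
// program fails noisily instead of wrapping around.
fn check(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        *x = x.checked_mul(5).expect("multiplication overflowed");
    }
}

fn main() {
    let mut a = [1u32; 4];
    let mut b = [1u32; 4];
    control(&mut a);
    check(&mut b);
    // On non-overflowing inputs both versions agree.
    assert_eq!(a, b);
    assert_eq!(a, [5; 4]);
    println!("ok");
}
```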

@CloudiDust

CloudiDust commented Jan 13, 2014

@thestinger Thanks for the link. I am aware that the names come from the C/C++ standards, but still find them confusing (to rust newcomers from outside the C/C++ world).

Now that I come to think of it, this is a convention that can be learnt quickly, and C# actually uses IntPtr in a similar manner, so in this regard, it is quite okay.

But there may be another problem: the names intptr/uintptr (or intps/uintps for that matter) don't seem to reflect their main use case in safe-mode rust. They should be more akin to ssize_t/size_t than intptr_t/uintptr_t, and there are good reasons why C/C++ differentiates the two pairs. So I think we should differentiate them too.

This is to say, we may have dedicated names for container-indexing integer types, while the fact that they are pointer sized is an implementation detail on certain architectures, just like in C/C++.

Here are three pairs of possible candidates: intsize/uintsize, isize/usize and size/ssize. They all have:

Common pros:

  1. clearer intention than intptr/uintptr.
    (For a C/C++ programmer, that is. He/she will accept quickly that a "size" type has something to do with "indexes".)
  2. won't break where size_t and uintptr_t are different.
    (Does rust target such architectures? If not, is it possible that they will be targeted later?)

Common cons:

  1. not suited for other possible use cases of pointer-sized integers.
    (Eh, what are those in safe-mode rust? And in unsafe mode, can we get away with std::libc types?)

Pros and cons specific to each candidate:

intsize/uintsize, like intptr/uintptr, are long and out of place, so as to discourage their usage. But I think such "ugliness" is unnecessary, because:

isize/usize are more in alignment with the rust naming convention, and shorter. They are still somewhat ugly and foreign at first glance, while being specific enough that people won't toss them everywhere.

On the other hand, ssize/size align with C/C++ ssize_t/size_t. Some may consider this an advantage, but for the most part, safe-mode rust doesn't directly use C/C++ type names anyway. And size does occupy the common word "size".

So I lean towards isize/usize.

Regarding an arbitrarily sized Int: in a systems language, people may expect Int to be fixed-size, so I believe BigInt is a better name, signifying the higher cost of such a type.

I am not sure about my stance on the "default integer type" issue, but people must make informed choices consciously. Some "rusty guidelines to integer type selection" in the docs would be great.

@nikomatsakis

Contributor

nikomatsakis commented Jan 13, 2014

On Sun, Jan 12, 2014 at 12:55:36PM -0800, Daniel Micay wrote:

This is well-explored territory with -ftrapv in C. I expect it
would push Rust significantly behind Java in out-of-the-box
performance. It's easy enough to directly use the LLVM intrinsics I
added for this and branch to abort to compare. I can make
benchmarks and present Rust-specific numbers if that's desired. I
can also include benchmarks of a number type overflowing to a big
integer.

I do not understand how checking for overflow and failing can possibly
be slower than checking for overflow and promoting to a big
integer. Perhaps you can elaborate. And yes, benchmarks are precisely
what I was asking for.

@thestinger

Contributor

thestinger commented Jan 13, 2014

I do not understand how checking for overflow and failing can possibly be slower than checking for overflow and promoting to a big integer. Perhaps you can elaborate. And yes, benchmarks are precisely what I was asking for.

I'm not saying performing a branch on the contained value and then a check for overflow is faster than the check for overflow alone. I'm just suggesting that it's worth making benchmarks to measure the cost of both.

@bstrie

Contributor

bstrie commented Feb 14, 2014

There are a lot of interrelated concerns here:

  1. Rust needs to have a type that's as large as a pointer. What should it be named?
  2. What should the "default" integer type be, both for integer literal inference, and for accepting and receiving from stdlib APIs (in order to avoid having to sprinkle casts everywhere)?
  3. Should we prefer to encourage signed ints, or unsigned ints?
  4. To what degree should bigints be a first-class type?
  5. On what types do we want to accept the cost of bounds checking?

I personally think that int being the default is a mistake, and I've seen a lot of people recoil when they realize that our "default" integer type both has a maximum size after which it wraps and has a size that varies by architecture. This is the worst of both worlds.

The easiest way to make immediate progress (not the best, mind you) might be the following:

  1. int -> intptr and uint -> uintptr (let's bikeshed later; we can trivially rename before 1.0, and all we need for now is a name that doesn't encourage use).
  2. Convert integer literal inference to make i32 the default integer type (yeah, it's super gross, but I really doubt that the vast majority of code is going to want to pay for a bigint, and at least this is honest about its limitations).

This lets us punt on the topics of bigints, bounds checking, and signed vs unsigned for a later date.

@thestinger

Contributor

thestinger commented Feb 14, 2014

What should the "default" integer type be, both for integer literal inference, and for accepting and receiving from stdlib APIs (in order to avoid having to sprinkle casts everywhere)?

I don't think there should be a default fallback. It means you can't trust the compiler to infer the type or give an error, and you have to watch out for bugs from this.

@cartazio

Contributor

cartazio commented Feb 14, 2014

I agree with @thestinger and @bstrie. Having defaulting for literals when there are no type constraints is a mixed bag: on one hand it's great sometimes (but mostly when using a repl); other times it's really unclear/confusing what it can mean.

What about a model where literals are treated as being "polymorphic" if there are no constraints? (This may not make sense in Rust, granted.) In Haskell / GHC, literals have a generic type until used:

Prelude> :t 1
1 :: Num a => a
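For the record, Rust did not end up adopting Haskell-style polymorphic defaulting; in the language as it later stabilized, an unconstrained integer literal falls back to i32, while any surrounding constraint pins the type during inference. A small sketch in modern Rust, using std::any::type_name purely to make the inferred types visible:

```rust
// Helper to report the inferred type of a value (for illustration only).
fn type_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    let a = 1;        // no constraint anywhere: falls back to i32
    let b: u64 = 1;   // constrained by the annotation
    let c = 1u8;      // constrained by the literal suffix
    assert_eq!(type_of(&a), "i32");
    assert_eq!(type_of(&b), "u64");
    assert_eq!(type_of(&c), "u8");
    println!("ok");
}
```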
@thestinger

Contributor

thestinger commented Feb 14, 2014

Using a fixed-size integer requires carefully considering whether the application enforces bounds on it. Otherwise, you need a big integer instead. A default fallback type removes this thought process in favour of lazy, incorrect code everywhere. Haskell makes the fallback configurable, but the default is a big integer type.

@cartazio

Contributor

cartazio commented Feb 14, 2014

A wider problem in actual Haskell code is users choosing Int and then assuming it is always 32 or 64 bits :). But yes, defaulting to Integer would be wrong for Rust.

@kballard

Contributor

kballard commented Feb 14, 2014

i32 does not strike me as a good default. The benefit of using a pointer-sized integer as the default is that the same type can be used for indexing (as indexes will want to be pointer-sized). C can get away with using a 4-byte int everywhere because it will silently cast as appropriate, but we can't.

@cmr

Member

cmr commented Feb 14, 2014

i suffix on literals isn't that bad, either... feels a bit strange to go back on that though.


@nmsmith

nmsmith commented Feb 14, 2014

I agree that the compiler should not automatically choose an arbitrary, potentially dangerous integer type if it can't infer the type from the context.

@l0kod

Contributor

l0kod commented Aug 15, 2014
