
Hard-wired index operator for vectors implicitly converts integral types #10453

Closed
brendanzab opened this issue Nov 13, 2013 · 23 comments · Fixed by #13257
@brendanzab
Member

This is legal at the moment:

```rust
assert_eq!([1, 2, 3][1i8], 2);
assert_eq!([1, 2, 3][1u8], 2);
```

Shouldn't we only allow uints here?

@thestinger
Contributor

This would be really easy to fix, and I agree. However, it was intentionally written this way based on comments in the code and the fact that it took a few dozen lines of code to add support for it.

@brendanzab
Member Author

Oh, OK. I'm just thinking that if we get the index trait up to scratch, indexing vectors should be implemented with that rather than hard-coded in the compiler. If that is the case then all those implicit conversions will have to be rewritten.

@thestinger
Contributor

Well, it would still need to be implemented in the compiler due to the lack of CTFE. I get what you're saying though :).

@thestinger
Contributor

(a big +1 for removing any magic from built-in types that we don't plan to expose to non-built-in types)

@brendanzab
Member Author

it would still need to be implemented in the compiler due to the lack of CTFE

Oh blast, forgot about that. I guess that shoots down my wish for the numeric operators and casts to be lib-ified too. :(

@nikomatsakis
Contributor

It was intentional but I think we should get rid of it. It dates from earlier days before we had integer type inference.

@nikomatsakis
Contributor

I agree it interacts poorly with overloaded index.

@alexcrichton
Member

Nominating.

@pnkfelix
Member

Assigning 1.0, P-backcompat-lang.

@pnkfelix
Member

(this topic was discussed over the work week, though we may not have realized that there was already this issue opened for it.)

@pnkfelix pnkfelix added this to the 1.0 milestone Mar 13, 2014
bors added a commit that referenced this issue Apr 2, 2014
The details are outlined in the first commit. 

Closes #10453
@gnzlbg
Contributor

gnzlbg commented Apr 6, 2014

So this means I won't be able to use my custom index/integral type that doesn't implicitly convert to unsigned to index std containers?

This seems like a step back. An index is a concept/typeclass; forcing a particular index type (like int) seems wrong, and unsigned is even worse. It might seem obvious that since sizes are always >= 0, one should use unsigned types for them. However, this is actually not the case. There is a Q&A at GoingNative 2013 where both Bjarne Stroustrup and Herb Sutter call this choice (unsigned sizes/indices over signed ones, e.g. for std::size_t and the .size() member functions) a big mistake in the design of the C++ STL.

The problem is that integer arithmetic on unsigned types is just "not natural" and thus more dangerous, so the recommended practice is to use unsigned when you need bit fiddling and signed integers otherwise. Imposing unsigned integer types in the std container interfaces forces your users to either:

  • use dangerous unsigned integers (possibly everywhere),
  • perform dangerous conversions from safe signed to unsafe unsigned.
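A minimal illustration of the wraparound hazard described above, using modern Rust method names (`wrapping_sub`/`checked_sub` postdate this discussion):

```rust
fn main() {
    let a: u32 = 0;
    // In modern Rust, `a - 1` panics in debug builds; wrapping_sub shows
    // the wraparound value that unsigned arithmetic silently produces.
    assert_eq!(a.wrapping_sub(1), u32::MAX);
    // checked_sub makes the underflow explicit by returning None.
    assert_eq!(a.checked_sub(1), None);
}
```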

I fail to see what the problem is with requiring indices to be an integral type. Sometimes you want to prohibit implicit conversions between ints to strongly type an interface. Since both integer types model integral, you could (before) still use them naturally to interface with std containers. Now you can't, which gets in the way of writing your own Integral types that work with the language and library as if they were native ones.

@nikomatsakis
Contributor

On Sun, Apr 06, 2014 at 09:56:46AM -0700, gnzlbg wrote:

So this means I won't be able to use my custom index type to index
std containers?

No, this doesn't mean that at all. This bug concerned only what types
you were permitted to use when indexing the built-in array types (it
used to be you could use any integral type; now you must use uint).
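For readers coming to this later, a small sketch of what the change means in modern Rust (where `uint` was renamed `usize`):

```rust
fn main() {
    let xs = [1, 2, 3];
    let i: i32 = 1;
    // Built-in indexing takes only usize (`uint` at the time of this issue);
    // a signed index needs an explicit cast.
    assert_eq!(xs[i as usize], 2);
    // `xs[i]` or `xs[1i8]` would be a type error after this change.
}
```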

@gnzlbg
Contributor

gnzlbg commented Apr 9, 2014

This bug concerned only what types you were permitted to use when indexing the built-in array types

But the same arguments I mention hold for built-in array types:

  • it will be hard/impossible to make user-defined integer types "feel" like built-in integers, since in particular situations they won't be usable,
  • unsigned is still more dangerous than signed (forcing either the use of a dangerous type on users, or potentially dangerous signed-to-unsigned conversions).

@huonw
Member

huonw commented Apr 9, 2014

it will be hard/impossible to make user-defined integer types "feel" like built-in integers, since in particular situations they won't be usable,

Indexing a vector is inherently linked to the address space of the machine (since the size of a vector is limited by available address space), which is exactly what uint is for.

User-defined indexable types (which don't necessarily have the same strong link to memory) will be able to overload some traits with whatever index types they like (including generic ones): #6515
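A sketch, in modern stable Rust, of the overloading huonw describes; `Grid` and `Cell` are hypothetical types invented for illustration:

```rust
use std::ops::Index;

// A user-defined container that chooses its own index type.
struct Grid {
    data: Vec<i32>,
    width: usize,
}

// A strongly-typed 2D index; nothing ties it to uint/usize indexing.
struct Cell {
    x: usize,
    y: usize,
}

impl Index<Cell> for Grid {
    type Output = i32;
    fn index(&self, c: Cell) -> &i32 {
        &self.data[c.y * self.width + c.x]
    }
}

fn main() {
    let g = Grid { data: vec![1, 2, 3, 4], width: 2 };
    assert_eq!(g[Cell { x: 1, y: 1 }], 4);
}
```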

@gnzlbg
Contributor

gnzlbg commented Apr 9, 2014

Indexing a vector is inherently linked to the address space of the machine (since the size of a vector is limited by available address space), which is exactly what uint is for.

Can I use an int to index a vector? That is:

  • Will an implicit conversion from int to uint happen?
  • Or will I get a compilation error and:
    • have to perform an unsafe cast from int to uint, or
    • use uints throughout my code?

@thestinger
Contributor

If we used signed integers to index slices, the bounds check would be significantly more expensive...
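A rough sketch of the point, not the compiler's actual bounds-check code:

```rust
// With an unsigned index, one comparison suffices:
fn in_bounds_unsigned(i: usize, len: usize) -> bool {
    i < len
}

// A naive signed check needs two comparisons (though in practice a cast to
// unsigned can fold them into one, since negative values become huge):
fn in_bounds_signed(i: isize, len: usize) -> bool {
    i >= 0 && (i as usize) < len
}

fn main() {
    assert!(in_bounds_unsigned(2, 3));
    assert!(!in_bounds_unsigned(3, 3));
    assert!(!in_bounds_signed(-1, 3));
    assert!(in_bounds_signed(2, 3));
}
```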

@gnzlbg
Contributor

gnzlbg commented Apr 9, 2014

If we used signed integers to index slices, the bounds check would be significantly more expensive...

The only thing that is easier to check is that unsigned integers can't be smaller than zero.

OTOH, checking that they do not overflow the array is hard, since they can be very large.

If people use unsigned for positive quantities and do math with them (think subtractions), the values will wrap around (this is a very common bug) and the array access will overflow too.

If people who know better use signed integers, they will need to cast them back to unsigned to access the array. Again, if the signed value is negative, the cast will succeed, and the array access will overflow too.

If you know both the index and the array bounds at compile time, the check is equally cheap for signed and unsigned types. If you do not, you should only do the check at run time in debug mode or something (since it is expensive), and even then the cost difference between signed and unsigned is very small. So the performance argument is weak.

The strong argument is that unsigned integer arithmetic is dangerous, and you want people to prefer signed integers over unsigned ones. Making the interface of a fundamental datatype use unsigned will lead either to people preferring unsigned because they have to interface with your data type (the worst possible outcome), or, less bad but still bad, to unsafe casts from signed to unsigned.

@ghost

ghost commented Jul 11, 2014

If you do not, you should only do the check at run-time in debug mode or something (since it will be very expensive)

Well, that would be a major leap backwards. I don't get how you can say "unsigned ints are dangerous" and on the other hand propose that indexing not be validated at runtime ;). Unchecked indexing is one of the major curses of previous native languages and has cost the industry literally trillions of dollars.

The strong argument is that unsigned integer arithmetic is dangerous

I don't really get what you mean by "unsafe uint"... How are uints unsafe? What is the difference between uints overflowing and ints overflowing? Without bounds checks, neither is "safe". Overflowing integers is the problem. They should raise a task fault, unless specified otherwise by the programmer. And indexing should check bounds, unless specified otherwise by the programmer (perhaps both only in "unsafe"-marked methods). For the latter I would go as far as saying "history has proven that programmers should not be able to disable bounds checking".

In the future, most of these checks might be eliminated by runtime instrumentation or static analysis. It's not a good idea to base a language made for the future (it will take years for Rust to go mainstream, and once it does it may live as long as C++) on compiler flaws we have to deal with today. Rather make it a bit slower and safe, and leave the optimization to the next generations of compilers & VMs.

@gnzlbg
Contributor

gnzlbg commented Jul 11, 2014

I don't really get what you mean with "unsafe uint"... How are uints unsafe?

The difference is in the arithmetic. With signed ints subtraction does what most people expect. With unsigned ints this is not the case. This is further complicated if distances between values can be negative, and thus users mix signed ints for distances to offset unsigned indices. IMO just using signed ints for everything is better than mixing both.
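A small illustration in modern Rust of the asymmetry described here (`wrapping_sub` is used to sidestep the debug-build overflow panic):

```rust
fn main() {
    let (low, high) = (3i64, 7i64);
    // Signed subtraction keeps the sign, matching most people's intuition:
    assert_eq!(low - high, -4);
    // The same subtraction on unsigned operands wraps around to a huge value:
    assert_eq!(3u64.wrapping_sub(7), u64::MAX - 3);
}
```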

Anyhow, what I am proposing is to leave the index type as a concept (or typeclass), so that its type is deduced by the compiler. This will allow the user to choose whether to use ints, uints, or some special int with special rules (think conversions, big ints, ...).

@ghost

ghost commented Jul 11, 2014

You mean if I do this on unsigned: `abs(low - high)`, then I have a problem ;). But generally signed ints are just unsigned ints with the value range shifted a bit. They both have the same problem of not being "natural" because of overflows. I don't think this discussion will get you far. What SHOULD be done instead is to trap overflows by default, so that they can't yield valid executions (unless requested explicitly, as for hash algorithms).

Even better would be to allow us to specify the machine datatype along with a number range, like `a: u8{1..10}, b: u8{1..30}`... Now if I write `let c = abs(a - b)` the compiler would know for sure that this code is seriously broken ;). It would be much easier to actually warn about dangerous constructs that are likely to overflow under certain input conditions. While a task fault on overflow is a good last resort, it is still not really pleasing, because it might lead to random failures in a production system that could have been caught with a richer type system at compile time.

I agree with you on the deduction though. It's quite a bit annoying in Rust atm, not only for this one, but also in general. There are a lot of unnecessary casts that can't possibly go wrong but still trigger an error (u8 -> u64, &mut u64 -> u64, etc.). Well, that's not deduction anymore, but from a user perspective it is pretty close.

@gnzlbg
Contributor

gnzlbg commented Jul 11, 2014

You mean if I do this on unsigned: "abs(low - high)" then I have a problem ;).

No, I mean that if you work with unsigned ints you will:

  • have people using the absolute difference and omitting the sign (like you suggest),
  • people using custom difference functions that don't return an unsigned integer but compute a - b as (b > a) ? int(-(b - a)) : int(b - a) to preserve the sign while narrowing the range,
  • people writing generic code who might not know what type their integers have (or might never think that someone will pass their code an unsigned int),
  • 1% of users who know that some library functions return unsigned integers while others return signed integers, and who take extra care when mixing results from both,
  • 99% of users who just don't know/care. Over-/underflowing with signed integers requires huge numbers; underflowing with unsigned integers is as easy as 0 - 1.
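For instance, in modern Rust the `0 - 1` underflow can be surfaced explicitly rather than wrapping silently:

```rust
fn main() {
    // overflowing_sub reports both the wrapped value and whether a wrap
    // occurred, making the 0 - 1 bug detectable at the call site.
    let (value, wrapped) = 0u8.overflowing_sub(1);
    assert_eq!(value, 255);
    assert!(wrapped);
}
```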

Why put your users through this trouble? The only win I see is that for 64-bit ints, going from signed to unsigned doubles the maximum size (from 2^63 to 2^64). I don't think this is a big win at all, and it introduces a lot of trouble. People needing this extended range will probably be using Big Integers anyway, and thus would also benefit from a parametric index type that allows Big Integers.

Unsigned ints are useful for bit twiddling, but people doing bit twiddling know what they are doing. I really believe that unsigned integers are not useful for array lengths or indices, since the trouble of having to deal with them and their arithmetic (offsetting indices) is just not worth it for almost no win in size.

In C++ using unsigned for array/vector/string/container lengths was a mistake that every C++ user has to deal with. A lot of committee members agree that the small win is not worth the burden on the users.

@ghost

ghost commented Jul 11, 2014

Well, I think we are talking about different things here ;). I totally agree with you that the trouble is not worth it. But what I am really suggesting is the trap on overflow (at least), and maybe even a type system that helps with the 99% of use cases where I really don't need much range (I have seldom encountered situations in which I really needed the full signed int range), and that is able to detect at compile time whether there can be overflows given the range restrictions imposed on inputs. Anyway, I am just complaining a bit about the signed/unsigned discussion, while in my POV the real issue is that overflows are allowed at all. I generally think that code which can fail randomly at runtime smells xD. Using signed vs. unsigned just makes those time bombs less likely.

@gnzlbg
Contributor

gnzlbg commented Jul 11, 2014

I think that is a different discussion and a quality of implementation issue. In C++ overflow/underflow of signed integers is undefined behavior which means anything could happen. This allows the compiler implementors to optimize as if this couldn't happen in "release" mode, and e.g. terminate the program or throw an exception in debug mode if overflow happens.
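As it happens, Rust later adopted essentially this split: overflow panics in debug builds, and wrapping must be requested explicitly. A minimal illustration using today's standard library:

```rust
use std::num::Wrapping;

fn main() {
    // In a debug build, `u8::MAX + 1` would panic; Wrapping<T> opts in
    // to modular arithmetic explicitly.
    let x = Wrapping(u8::MAX) + Wrapping(1);
    assert_eq!(x.0, 0);
}
```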

flip1995 pushed a commit to flip1995/rust that referenced this issue Mar 24, 2023
Add utility macros to help with writing tests.

Adds two utility macros to help with testing:
* `external` expands to its argument tokens, but makes them appear to come from an external macro. This makes tests for `in_external_macro` much more readable.
* `inline_macros` is an attribute macro which allows the use of a pseudo `inline!` macro which expands to its argument tokens, but makes them appear to come from a crate-local macro expansion. This removes the need to write `macro_rules` boilerplate when testing how lints interact with macros.

---

`external`'s usage is simple. `external!(struct Foo { x: u32});` will make the struct appear as though it came from an external macro. Individual tokens can be escaped if needed. `external!($x + 0 / 10)` will make everything except `x` appear as though it came from an external macro. Can also use `$literal` and `$(tokens...)` as well.

---

`inline_macros` is more complicated due to compiler constraints. Given:
```rust
#[inline_macros]
fn foo() {
    inline!(5 + 5 / 10);
}
```
`inline!(5 + 5 / 10)` will be replaced with a call to a generated macro which expands to the contained tokens.

Tokens can be escaped by prefixing them with `$`:
```rust
#[inline_macros]
fn foo() {
    let x = 5;
    inline!($x + 5 / $10);
}
```
This will pass `x` as an `ident` argument and `10` as a `literal` argument.

Token sequences can also be passed with `$(...)`:
```rust
#[inline_macros]
fn foo() {
    let mut x = 5;
    inline!(if $(x >= 5) {
        $x = 5;
    });
}
```
This will pass `x >= 5` as `tt` arguments, and `x` as an `ident` argument.

---

Not 100% sure `inline_macros` is actually worth having. It does make the tests a little easier to read once you're used to it, and it becomes more useful once there are multiple macro tests; the verbosity of declaring single-use macros starts to hurt at that point.

changelog: None