Signed integer indexing and slicing #2249

clarfonthey · 2017-12-14T21:53:23Z

I haven't found an RFC or Rust issue for this, so, I figured I'd mention it here. The only ones I've seen suggested panicking on negative values, which would not be desirable.

Is there a reason why there aren't impls for signed indexing? For example, being able to make slices like s[..-2] might be useful. This would still technically be opt-in, as indexing on usize would not have the extra branch, but indexing on isize would.

And just to clarify, negative indexes like -1 and -2 get added to the length of the slice to become real indices, like len - 1 and len - 2. If the value is less than the absolute value of the length, then it would be marked as OOB like any other index.

This seems pretty ergonomic as languages like Python and Bash have been using negative indices for a long time. Technically, this would be a bigger project than simply adding Index and other impls, as the const evaluator would also need to be updated.

If there is desire for this, I'd be happy to write an RFC for it.

The text was updated successfully, but these errors were encountered:

clarfonthey · 2017-12-14T21:53:55Z

Also should clarify this applies to indexing as well as slicing, i.e. s[-1] would be the last value in the slice.

Centril · 2017-12-14T22:02:31Z

Sounds like this could be really neat when dealing with things like "index to the player on the left" where you subtract when going to the left or the dining philosophers problem-like cases. Please write that RFC =)

clarfonthey · 2017-12-14T22:08:37Z

@Centril honestly the main reason why I asked is I'm legitimately confused why this hasn't been brought up yet. I feel like there's a reason why this wasn't done that I wasn't able to find, so, I figured I'd ask what people think before writing something up.

ExpHP · 2017-12-15T22:34:07Z

One thing that's bothersome about Python's solution is that -0 is indistinguishable from 0, when ideally you would want it to represent the index that's one past the end. For me this has caused a number of delightful bugs:

def last_n_items(xs, n):
    return xs[-n:]

last_n_items(range(10), 3) # [7, 8, 9]
last_n_items(range(10), 2) # [8, 9]
last_n_items(range(10), 1) # [9]
last_n_items(range(10), 0) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

For this reason I could never see this as being incorporated into rust as indexing by actual negative integers. Maybe indexing by some new enum type with a FromTheEnd variant,, or some new type around usize that simulates one's complement.

est31 · 2017-12-16T08:43:21Z

Negative indices in C mean something completely different and are super scary (but are defined behaviour!!). Imitating the python behaviour will be surprising to anyone with a C/C++ background and would be a bad idea IMO.

Alternative suggestions I would prefer really much:

put it into a wrapping_index or wrapping_slice function that also deals with out of bounds and similar
what @ExpHP suggested, having a struct FromEnd(usize) tuple struct (one doesn't need an enum as the index trait can be impl'd for multiple types)

Centril · 2017-12-16T11:15:45Z

I think we should here first and foremost focus on semantics and not whether or not we should have a wrapper type around usize - that discussion is also nice to have, but first let's figure out if we want "indexing starting from the end" or not.

That said, seq[FromEnd(0)] seems like a nice and clear compromise - perhaps expose it as seq[end(0)] or seq[from_end(0)] so that it reads better.

est31 · 2017-12-16T11:42:05Z

but first let's figure out if we want "indexing starting from the end" or not.

A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate? I believe it deserves to be in the standard library, especially given that python has support for this too.

I am only against the actual [-1] proposal of exposing this on the surface. About FromEnd vs from_end: structs in Rust are usually given in PascalCase. If from_end is a function that returns FromEnd, its fine by me :).

Centril · 2017-12-16T12:09:07Z

@est31 In an effort to cheat Wadler's law I will agree with you on the concrete syntax seq[-1]. Let's forego that.

To be clear, what I was proposing was (playground):

// Inside core::ops:

pub trait Index<Idx>
where
    Idx: ?Sized,
{
    type Output: ?Sized;
    fn index(&self, index: Idx) -> &Self::Output;
}

#[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)] // <- discuss
pub struct FromEnd<T> {
    pub index: T,
}

pub fn end<T>(index: T) -> FromEnd<T> { // <- bikeshed name
    FromEnd { index }
}

impl<T> From<T> for FromEnd<T> {
    fn from(index: T) -> Self { end(index) }
}

impl<T> Index<FromEnd<usize>> for Vec<T> {
    type Output = T;

    fn index(&self, index: FromEnd<usize>) -> &Self::Output {
        // Panicing is taken care of for us... so no buffer underflow.
        &self[self.len() - 1 - index.index]
    }
}

fn main() {
    let vec = (0..10).collect::<Vec<_>>();
    println!("{:?}", vec.index(end(0)));
    //                    ^- obviously vec[end(0)] but this isn't inside libcore.
}

Centril · 2017-12-16T12:11:33Z

Hmm.. on second thought, I think vec[from_end(0)] reads better than vec[end(0)]. The former is more verbose, but unambiguous.

Centril · 2017-12-16T12:22:19Z

@est31

A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate?

To add to your argument for the proposition, if you replace the trait definition above with:

use std::ops::Index;

you run into coherence troubles:

error[E0210]: type parameter `T` must be used as the type parameter for some local type (e.g. `MyStruct<T>`); only traits defined in the current crate can be implemented for a type parameter

Therefore it has to be in libcore, or no dice.

clarfonthey · 2017-12-16T17:41:17Z

I think that having a from_end struct is a nice compromise. For the wrapping_index example, it makes sense that wrapping_index(i) is equivalent to [i % len] which could be possible but imho encourages bad behaviour.

I may go ahead with the from_end proposal.

Centril · 2017-12-16T17:44:15Z

@clarcharr If you want to co-author I'm interested =) You can find me at #rust-lang @ irc.mozilla.org for quicker discussions.

clarfonthey · 2017-12-16T17:53:53Z

@Centril I will get in touch! :)

I need to rethink this for a bit though. Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, 2..from_end(1) simply won't work. And any method that involves somehow injecting the length of the container will be less useful than just manually subtracting length. In this case, negative indices make sense.

I also question the usefulness of negative indices from a C background. Doing such an index is inherently unsafe, and Rust allowing it leads to suspicion. I feel like that problem could easily be avoided by having a clippy lint that points out negative indexing, potentially enabled alongside a bunch of other "catch-ups for C Devs" lints.

clarfonthey · 2017-12-16T17:56:41Z

@petrochenkov and @koutheir, mind adding any extra thoughts considering how you both downvoted this?

est31 · 2017-12-16T17:59:08Z

Therefore it has to be in libcore, or no dice.

@Centril good point!

For the wrapping_index example, it makes sense that wrapping_index(i) is equivalent to [i % len] which could be possible but imho encourages bad behaviour.

I have personally could have used wrapping_index (which one can maybe even extend to negative indices, but then one should use eucledian modulo) in some places where I am managing a vec and need to do operations on the vector that are wrapping it. But I guess it isn't as useful as from_end so I believe in the few places its needed, manually doing modulo is more obviously showing what happens.

Doing such an index is inherently unsafe

As I've said in my comment, negative indexing is defined behaviour in C. I've linked to a discussion on stack overflow where the C standard was quoted.

char *c = malloc(40);
c += 20;
c[-1] = 10;
printf("%d\n", c[-1]);

All non-ub "safe" C here, except that you should maybe verify that malloc doesnt return NULL.

est31 · 2017-12-16T18:02:22Z

Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, 2..from_end(1) simply won't work.

Right now, Range is a struct like

pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}

It could be extended to be a struct like

pub struct Range<IdxStart, IdxEnd=IdxStart> {
    pub start: IdxStart,
    pub end: IdxEnd,
}

This would be 100% backwards compatible, wouldnt it?

est31 · 2017-12-16T18:18:47Z

Also, if you went with the signed indices proposal, you'd have to change how the compiler is doing inference when it sees an indexing operation in order to make stuff work.

Right now, impl'ing index for isize will totally break inference:
https://play.rust-lang.org/?gist=c26f1c71e24d878fa811836c22a1503d&version=stable

Although inference breakages are not guaranteed by the language, I think in this case the breakage would be too much to not be fixed by a compiler change :).

I'm not even 100% sure whether inference happens before or after operators are desugared, if it happens after, the required changes would be even more wide-ranging.

ExpHP · 2017-12-16T19:08:55Z

wrapping_index sounds to me like it has promise, and what's more, it can be experimented with outside of the stdlib.

My thoughts on wrapping_slice (or anything similar like wrapping_index with slice arguments) are:

I do not think that i.. should be interpreted as equivalent to i..self.len(), because then you can run into my earlier problem. It should either be (a) not supported, or (b) interpreted as i..infinity.
following similar thinking, the only possible error cases should be self.len() == 0 && !range.is_empty() or range.end < range.start
There is the question of return type; I guess it needs to return either an iterator over contiguous slices, or an iterator over borrowed elements. I'm leaning towards the latter, because it seems an iterator over slices would once make all those ugly edge cases become front and center once again.

Edit: Actually, I can also think of an alternative model, but I'm not sure if I like it:

Only support single indices in -len..len and range indices in -len...len
Omitting either bound is equivalent to putting 0 at that bound. This avoids my issues from python...
...but it also means ranges like 3.. are an error. Kinda weird, and it's even wierder for .. (which is an empty range now?!)...

Edit 2: Argh! I just noticed that neither of these models let you do something like [..-1], which is one of my favorite things to do in python. Basically, in either of these models, if you want to take a slice that is O(len) in length, you'll have to write vec.len() somewhere, which is the very thing I want to be able to avoid.

ExpHP · 2017-12-16T19:38:48Z

I guess there are really multiple distinct things that python's negative indices are useful for, none of which it is really ideal for, and which may each be better served by distinct apis in rust.

My current thoughts are at least two features:

Something for wrapped indexing and slicing with signed indices. The slice is considered to extend off into infinity. (this is nicer than python because you don't have to worry about staying confined to the region -len..len)
Something for splitting off the last k elements of a slice. (This is nicer than python because it does not exhibit my footgun from earlier). Something like this:

impl<T> [T] {
    pub fn rsplit_at(&self, k: usize) -> (&[T], &[T]) {
        self.split_at(self.len() - k)
    }
}

(note: A disadvantage of rsplit_at compared to wrapping the index type is that it is less amenable to code reuse; you can easily write a function that is generic over range types)

le-jzr · 2017-12-17T11:56:20Z

WRT original proposal: IMO this would reduce safety by making accidental index underflows harder to detect. Typing -i instead of x.len() - i is not enough of a convenience to offset that.

scottmcm · 2017-12-17T21:41:47Z

I'm glad to see this leading away from indexing by mixed ranges. The thing I'm saddest about there are situations like "is v[2..-2] empty?" or "wait, what does v[-2..2] return?"

A bunch of the scenarios I've seen in the thread remind me of iterator adapters, just directly on slices. For example, v[-1] is v.rev()[0], v[..-1] is v.rev()[1..].unrev(), v.wrapping_index(i+1) is v.cycle()[i+1]. Would just need a few things like rev: &[T] -> &RevSlice<T>...

leonardo-m · 2017-12-18T20:16:49Z

The very nice solution of D language:

my_arr[$ - 2]

That is equivalent to Python:

my_arr[-2]

In D you can even overload the $ (defining opDollar) if you want to define multi-dimensional tensor-like data structures (you overload each $ for each dimension of the tensor):

https://dlang.org/spec/operatoroverloading.html#dollar

clarfonthey · 2017-12-18T21:23:19Z

@scottmcm that's actually a nice idea! All of these adapters should be constant-time for slice iterators, and perhaps there may be a way to add convenience functions if they're desired.

ZNackasha · 2018-05-23T02:38:11Z

Random person from the internet here
I just wanted to expand on @scottmcm idea of thinking about ranges as iterators.

Basic

range	iterator
v[-1]	v.rev().nth(0)
v[-2]	v.rev().nth(1)

Ranges

range	iterator	range equivalent
v[1...]	v.skip(1)	v[1...]
v[ ...-2]	v.take(v.count() - 2)	v[...n-2)]
v[ -2...]	v.skip(v.count() - 2)	v[n-2...)]
v[1...-2]	v.skip(1).take(v.count() - (1 + 1))	v[1...n-2]

Reverse ranges

range	iterator
v[2...0]	v.rev().skip(v.count() - 3) .take(2)
v[-1...0]	v.rev().take(v.count() - 1)
v[-2...2]	v.rev().skip(1).take(v.count() - (1 + 2))
v[-1...-3]	v.rev().take(2)
v[-2...-3]	v.rev().skip(1).take(1)

Negative zero for inclusive reverse ranges

range	iterator
v[-1...-0]	v.rev()
v[2...-0]	v.rev().skip(v.count() - 3)

I understand that some of these may seem weird but I just wanted to show what could be the equivalent using iterators.

clarfonthey · 2018-05-24T19:53:18Z

I personally really like the idea from D of having a value like $ represent "end of array," then allowing operations like + and - on it.

For example, instead of x[from_end(5)] you'd do x[PastEnd - 5].

roblabla · 2018-10-08T19:11:52Z

C# 8 implemented it with ^. x[^0] is the last element, x[^5] is the fifth element from the end. x[^5..^0] to get a slice. I guess that's similar from D. I find this syntax super elegant.

scottmcm · 2018-10-09T01:36:28Z

I made a crate to try out the "slice adapter" idea above (#2249 (comment))

let r = [1, 2, 4, 9, 16, 25].rev();
assert_eq!(r[0], 25);
assert_eq!(r[1..3].rev(), &[9, 16]);
assert_eq!(r.split_first().unwrap().0, &25);

let mut it = r.iter().cloned().skip(2);
assert_eq!(it.next(), Some(9));
assert_eq!(it.next(), Some(4));
assert_eq!(it.next(), Some(2));

https://docs.rs/rev_slice/*/rev_slice/

mathieucaroff · 2019-01-02T14:34:59Z

Um, I'm from Python where we have iterators (about any container) and sequences (indexable, lengthed containers). I see iterators have a .count(), method, which is a bit confusing, but I'm too lazy to RTFM. So would someone kind link the documentation or explain what is what.

Otherwise, I like the solution used in D. It also exists in GNU Octave, where they reused the keyword end, though I think $ from D would be more appropriated for Rust.

mark-i-m · 2019-01-02T15:40:18Z

In general, in Rust you can’t efficiently know the length of an iterator. You just have consume all the elements and count them. This is what the count() method does.

ExpHP · 2019-01-03T16:41:33Z

Right, it's very different from python's count which in Rust you can simply use a RangeFrom for (using the built-in syntax 0..).

Iterator documentation

scottmcm · 2019-02-24T07:25:10Z

The C# example -- which works by having a newtype that means "from the end" reminded me that we have std::cmp::Reverse, so it would be possible to allow my_slice[Reverse(4)] and similar.

(I can't decide if this is a good idea or an abuse of the type, though.)

leonardo-m · 2019-02-24T08:49:50Z

To me using something like the D syntax (with # instead of $) seems way better than my_slice[Reverse(4)] and similar.

samirm · 2022-12-21T22:32:41Z

did this ever go anywhere?

SOF3 · 2022-12-22T06:34:32Z

You can already implement this yourself by defining a type Reverse and implementing ops::Index<Reverse>/ops::Index<ops::Range<Reverse>>. This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.

danielfaust · 2023-01-11T20:55:59Z

How about using curly braces like [1..1} (=Python [1:-1]), {5..] (=Python [-5:]), {5..1} (=Python [-5:-1]) denote negative numbers? This would avoid the negative index issue from C and the error prone -0 index from Python ([..0} would equal [..]).

Or normal braces [1..1) instead of curly ones.

I really do use negative indices a lot in Python and always hate it when I have to use .length - x in JavaScript.

This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.

Let's assume you have a list of files in the format:

file_1.jpg
file_2.jpg
...
file_10.jpg
file_11.jpg

Then you can quickly extract the number via [5:-4]

This could even be applied to single elements like {5} = [-5] and {0} would be the last element in the array.

Edit: Wait a minute, I just realized that using 0 would be an out of bounds error, also in the slice examples from the beginning. Using anything with 0} would be an error.

afetisov · 2023-01-12T12:17:17Z

@danielfaust This would break literally every bracket matching and cold folding algorithm out there, and looks just as confusing to humans.

willtennien · 2023-05-24T03:42:09Z

Even if this doesn't support everything we want, can we get something out?

The discussion seems generally happy with vec[from_end(1)] and vec[2..from_end(1)].

Quelfth · 2024-05-11T09:51:47Z

I think that one thing that's ever so very slightly confusing about from_end() is that it seems possibly unclear whether from_end(0) or from_end(1) is the last element. Thus, I propose we implement a unit struct Len and then implement Sub<T> for Len such that Len-i gives FromEnd(i) where FromEnd can be used to index from the end. So now really_long_sequence_name[really_long_sequence_name.len()-1] can be written as just really_long_sequence_name[Len-1] which I think is better than any other proposed solution in that it's just really obvious what it means. This is especially true when compared to solutions like [$-1] and [#-1] where you just have to know the arbitrary meaning of this special character. Of course it is already possible to implement this using a crate, but not if you want to allow mixed bound ranges such as [1..Len-2] which feel valuable. The current range notation won't allow the indices on the two ends to be different types, and I suspect that this has to do with type inference, so if we can't generalize Range and RangeInclusive to allow two different types, then perhaps we could just add several new types to cover all cases of a range where one bound is T and the other is FromEnd<T>.

NumesSanguis · 2024-07-11T14:04:32Z

The discussion seems generally happy with vec[from_end(1)] and vec[2..from_end(1)].

I would suggest against a Rust-only keyword for a quite standard operation like from_end().

As a daily Python programmer, experienced in other languages and a beginner Rust learning going through Rustlings, I was surprised that negative indices are not a thing.

While from_end() sounds descriptive, as someone learning multiple programming languages, I rather not have to remember additional specific keywords only used in Rust. Therefore, in terms of mental load, I would say a negative number is more intuitive if technically possible.

If I have programmed for the past days in a different language and would try to do a negative offset in a slice in Rust, I would start with -x out of muscle memory. When that throws an error, I would think like, hmm was it end()? from_end? last()? from_last()?, tail()?, rear()? etc, as they all make semantic sense. This would be even harder to remember for non-native English speakers.

tail is used by e.g. Haskell (tail :: [a] -> [a]) or R (tail(vec, 1))
end is used by e.g. PHP end($arr)
last is used by e.g. Swift (myArray.last)

Also, end is used in e.g. Ruby like:

def my_method
  puts "Hello, World!"
end

If I ask ChatGPT if from_end is used in any other language:

The function from_end() is not a standard feature in any of the mainstream programming languages. In most commonly used languages, operations that involve accessing elements from the end of a sequence typically use indexing or other specific methods rather than a from_end() function.

Personally, even if from_end would be implemented, I would be slightly more inclined to use .len()-1, because at least .len() is common in other programming languages.
Even as a non-C# programmer, x[^0] seems elegant to me and is human language agnostic.

While it is just the opinion of a single person, I hope it helps the discussion.

clarfonthey changed the title ~~Signed integer indexing~~ Signed integer indexing and slicing Dec 14, 2017

Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Dec 14, 2017

tinoism mentioned this issue Jul 3, 2019

Game of life live_neighbor_count wrong neighbour index rustwasm/book#177

Closed

Anders429 mentioned this issue Dec 25, 2020

Negative indexing Anders429/substring#6

Closed

adlerjohn mentioned this issue Feb 4, 2021

Initial primitive types FuelLabs/sway#4

Closed

mikex-oss mentioned this issue Jun 18, 2024

Allow negative indexing of arrays for negative offset google/xls#1491

Open

Signed integer indexing and slicing #2249

Signed integer indexing and slicing #2249

Comments

clarfonthey commented Dec 14, 2017

clarfonthey commented Dec 14, 2017

Centril commented Dec 14, 2017 • edited Loading

clarfonthey commented Dec 14, 2017

ExpHP commented Dec 15, 2017 • edited Loading

est31 commented Dec 16, 2017

Centril commented Dec 16, 2017 • edited Loading

est31 commented Dec 16, 2017

Centril commented Dec 16, 2017 • edited Loading

Centril commented Dec 16, 2017

Centril commented Dec 16, 2017

clarfonthey commented Dec 16, 2017

Centril commented Dec 16, 2017

clarfonthey commented Dec 16, 2017

clarfonthey commented Dec 16, 2017

est31 commented Dec 16, 2017 • edited Loading

est31 commented Dec 16, 2017

est31 commented Dec 16, 2017 • edited Loading

ExpHP commented Dec 16, 2017 • edited Loading

ExpHP commented Dec 16, 2017 • edited Loading

le-jzr commented Dec 17, 2017 • edited Loading

scottmcm commented Dec 17, 2017

leonardo-m commented Dec 18, 2017 • edited Loading

clarfonthey commented Dec 18, 2017

ZNackasha commented May 23, 2018 • edited Loading

Basic

Ranges

Reverse ranges

Negative zero for inclusive reverse ranges

clarfonthey commented May 24, 2018

roblabla commented Oct 8, 2018

scottmcm commented Oct 9, 2018

mathieucaroff commented Jan 2, 2019

mark-i-m commented Jan 2, 2019

ExpHP commented Jan 3, 2019

scottmcm commented Feb 24, 2019

leonardo-m commented Feb 24, 2019

samirm commented Dec 21, 2022

SOF3 commented Dec 22, 2022 • edited Loading

danielfaust commented Jan 11, 2023 • edited Loading

afetisov commented Jan 12, 2023

willtennien commented May 24, 2023

Quelfth commented May 11, 2024

NumesSanguis commented Jul 11, 2024 • edited Loading

Centril commented Dec 14, 2017 •

edited

Loading

ExpHP commented Dec 15, 2017 •

edited

Loading

Centril commented Dec 16, 2017 •

edited

Loading

Centril commented Dec 16, 2017 •

edited

Loading

est31 commented Dec 16, 2017 •

edited

Loading

est31 commented Dec 16, 2017 •

edited

Loading

ExpHP commented Dec 16, 2017 •

edited

Loading

ExpHP commented Dec 16, 2017 •

edited

Loading

le-jzr commented Dec 17, 2017 •

edited

Loading

leonardo-m commented Dec 18, 2017 •

edited

Loading

ZNackasha commented May 23, 2018 •

edited

Loading

SOF3 commented Dec 22, 2022 •

edited

Loading

danielfaust commented Jan 11, 2023 •

edited

Loading

NumesSanguis commented Jul 11, 2024 •

edited

Loading