Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signed integer indexing and slicing #2249

Open
clarfonthey opened this issue Dec 14, 2017 · 39 comments
Open

Signed integer indexing and slicing #2249

clarfonthey opened this issue Dec 14, 2017 · 39 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@clarfonthey
Copy link
Contributor

I haven't found an RFC or Rust issue for this, so, I figured I'd mention it here. The only ones I've seen suggested panicking on negative values, which would not be desirable.

Is there a reason why there aren't impls for signed indexing? For example, being able to make slices like s[..-2] might be useful. This would still technically be opt-in, as indexing on usize would not have the extra branch, but indexing on isize would.

And just to clarify, negative indexes like -1 and -2 get added to the length of the slice to become real indices, like len - 1 and len - 2. If the value is less than the absolute value of the length, then it would be marked as OOB like any other index.

This seems pretty ergonomic as languages like Python and Bash have been using negative indices for a long time. Technically, this would be a bigger project than simply adding Index and other impls, as the const evaluator would also need to be updated.

If there is desire for this, I'd be happy to write an RFC for it.

@clarfonthey clarfonthey changed the title Signed integer indexing Signed integer indexing and slicing Dec 14, 2017
@clarfonthey
Copy link
Contributor Author

Also should clarify this applies to indexing as well as slicing, i.e. s[-1] would be the last value in the slice.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Dec 14, 2017
@Centril
Copy link
Contributor

Centril commented Dec 14, 2017

Sounds like this could be really neat when dealing with things like "index to the player on the left" where you subtract when going to the left or the dining philosophers problem-like cases. Please write that RFC =)

@clarfonthey
Copy link
Contributor Author

@Centril honestly the main reason why I asked is I'm legitimately confused why this hasn't been brought up yet. I feel like there's a reason why this wasn't done that I wasn't able to find, so, I figured I'd ask what people think before writing something up.

@ExpHP
Copy link

ExpHP commented Dec 15, 2017

One thing that's bothersome about Python's solution is that -0 is indistinguishable from 0, when ideally you would want it to represent the index that's one past the end. For me this has caused a number of delightful bugs:

def last_n_items(xs, n):
    return xs[-n:]

last_n_items(range(10), 3) # [7, 8, 9]
last_n_items(range(10), 2) # [8, 9]
last_n_items(range(10), 1) # [9]
last_n_items(range(10), 0) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

For this reason I could never see this as being incorporated into rust as indexing by actual negative integers. Maybe indexing by some new enum type with a FromTheEnd variant,, or some new type around usize that simulates one's complement.

@est31
Copy link
Member

est31 commented Dec 16, 2017

Negative indices in C mean something completely different and are super scary (but are defined behaviour!!). Imitating the python behaviour will be surprising to anyone with a C/C++ background and would be a bad idea IMO.

Alternative suggestions I would prefer really much:

  • put it into a wrapping_index or wrapping_slice function that also deals with out of bounds and similar
  • what @ExpHP suggested, having a struct FromEnd(usize) tuple struct (one doesn't need an enum as the index trait can be impl'd for multiple types)

@Centril
Copy link
Contributor

Centril commented Dec 16, 2017

I think we should here first and foremost focus on semantics and not whether or not we should have a wrapper type around usize - that discussion is also nice to have, but first let's figure out if we want "indexing starting from the end" or not.

That said, seq[FromEnd(0)] seems like a nice and clear compromise - perhaps expose it as seq[end(0)] or seq[from_end(0)] so that it reads better.

@est31
Copy link
Member

est31 commented Dec 16, 2017

but first let's figure out if we want "indexing starting from the end" or not.

A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate? I believe it deserves to be in the standard library, especially given that python has support for this too.

I am only against the actual [-1] proposal of exposing this on the surface. About FromEnd vs from_end: structs in Rust are usually given in PascalCase. If from_end is a function that returns FromEnd, its fine by me :).

@Centril
Copy link
Contributor

Centril commented Dec 16, 2017

@est31 In an effort to cheat Wadler's law I will agree with you on the concrete syntax seq[-1]. Let's forego that.

To be clear, what I was proposing was (playground):

// Inside core::ops:

pub trait Index<Idx>
where
    Idx: ?Sized,
{
    type Output: ?Sized;
    fn index(&self, index: Idx) -> &Self::Output;
}

#[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)] // <- discuss
pub struct FromEnd<T> {
    pub index: T,
}

pub fn end<T>(index: T) -> FromEnd<T> { // <- bikeshed name
    FromEnd { index }
}

impl<T> From<T> for FromEnd<T> {
    fn from(index: T) -> Self { end(index) }
}

impl<T> Index<FromEnd<usize>> for Vec<T> {
    type Output = T;

    fn index(&self, index: FromEnd<usize>) -> &Self::Output {
        // Panicing is taken care of for us... so no buffer underflow.
        &self[self.len() - 1 - index.index]
    }
}

fn main() {
    let vec = (0..10).collect::<Vec<_>>();
    println!("{:?}", vec.index(end(0)));
    //                    ^- obviously vec[end(0)] but this isn't inside libcore.
}

@Centril
Copy link
Contributor

Centril commented Dec 16, 2017

Hmm.. on second thought, I think vec[from_end(0)] reads better than vec[end(0)]. The former is more verbose, but unambiguous.

@Centril
Copy link
Contributor

Centril commented Dec 16, 2017

@est31

A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate?

To add to your argument for the proposition, if you replace the trait definition above with:

use std::ops::Index;

you run into coherence troubles:

error[E0210]: type parameter `T` must be used as the type parameter for some local type (e.g. `MyStruct<T>`); only traits defined in the current crate can be implemented for a type parameter

Therefore it has to be in libcore, or no dice.

@clarfonthey
Copy link
Contributor Author

I think that having a from_end struct is a nice compromise. For the wrapping_index example, it makes sense that wrapping_index(i) is equivalent to [i % len] which could be possible but imho encourages bad behaviour.

I may go ahead with the from_end proposal.

@Centril
Copy link
Contributor

Centril commented Dec 16, 2017

@clarcharr If you want to co-author I'm interested =) You can find me at #rust-lang @ irc.mozilla.org for quicker discussions.

@clarfonthey
Copy link
Contributor Author

@Centril I will get in touch! :)

I need to rethink this for a bit though. Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, 2..from_end(1) simply won't work. And any method that involves somehow injecting the length of the container will be less useful than just manually subtracting length. In this case, negative indices make sense.

I also question the usefulness of negative indices from a C background. Doing such an index is inherently unsafe, and Rust allowing it leads to suspicion. I feel like that problem could easily be avoided by having a clippy lint that points out negative indexing, potentially enabled alongside a bunch of other "catch-ups for C Devs" lints.

@clarfonthey
Copy link
Contributor Author

@petrochenkov and @koutheir, mind adding any extra thoughts considering how you both downvoted this?

@est31
Copy link
Member

est31 commented Dec 16, 2017

Therefore it has to be in libcore, or no dice.

@Centril good point!

For the wrapping_index example, it makes sense that wrapping_index(i) is equivalent to [i % len] which could be possible but imho encourages bad behaviour.

I have personally could have used wrapping_index (which one can maybe even extend to negative indices, but then one should use eucledian modulo) in some places where I am managing a vec and need to do operations on the vector that are wrapping it. But I guess it isn't as useful as from_end so I believe in the few places its needed, manually doing modulo is more obviously showing what happens.

Doing such an index is inherently unsafe

As I've said in my comment, negative indexing is defined behaviour in C. I've linked to a discussion on stack overflow where the C standard was quoted.

char *c = malloc(40);
c += 20;
c[-1] = 10;
printf("%d\n", c[-1]);

All non-ub "safe" C here, except that you should maybe verify that malloc doesnt return NULL.

@est31
Copy link
Member

est31 commented Dec 16, 2017

Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, 2..from_end(1) simply won't work.

Right now, Range is a struct like

pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}

It could be extended to be a struct like

pub struct Range<IdxStart, IdxEnd=IdxStart> {
    pub start: IdxStart,
    pub end: IdxEnd,
}

This would be 100% backwards compatible, wouldnt it?

@est31
Copy link
Member

est31 commented Dec 16, 2017

Also, if you went with the signed indices proposal, you'd have to change how the compiler is doing inference when it sees an indexing operation in order to make stuff work.

Right now, impl'ing index for isize will totally break inference:
https://play.rust-lang.org/?gist=c26f1c71e24d878fa811836c22a1503d&version=stable

Although inference breakages are not guaranteed by the language, I think in this case the breakage would be too much to not be fixed by a compiler change :).

I'm not even 100% sure whether inference happens before or after operators are desugared, if it happens after, the required changes would be even more wide-ranging.

@ExpHP
Copy link

ExpHP commented Dec 16, 2017

wrapping_index sounds to me like it has promise, and what's more, it can be experimented with outside of the stdlib.

My thoughts on wrapping_slice (or anything similar like wrapping_index with slice arguments) are:

  • I do not think that i.. should be interpreted as equivalent to i..self.len(), because then you can run into my earlier problem. It should either be (a) not supported, or (b) interpreted as i..infinity.

  • following similar thinking, the only possible error cases should be self.len() == 0 && !range.is_empty() or range.end < range.start

  • There is the question of return type; I guess it needs to return either an iterator over contiguous slices, or an iterator over borrowed elements. I'm leaning towards the latter, because it seems an iterator over slices would once make all those ugly edge cases become front and center once again.


Edit: Actually, I can also think of an alternative model, but I'm not sure if I like it:

  • Only support single indices in -len..len and range indices in -len...len
  • Omitting either bound is equivalent to putting 0 at that bound. This avoids my issues from python...
  • ...but it also means ranges like 3.. are an error. Kinda weird, and it's even wierder for .. (which is an empty range now?!)...

Edit 2: Argh! I just noticed that neither of these models let you do something like [..-1], which is one of my favorite things to do in python. Basically, in either of these models, if you want to take a slice that is O(len) in length, you'll have to write vec.len() somewhere, which is the very thing I want to be able to avoid.

@ExpHP
Copy link

ExpHP commented Dec 16, 2017

I guess there are really multiple distinct things that python's negative indices are useful for, none of which it is really ideal for, and which may each be better served by distinct apis in rust.

My current thoughts are at least two features:

  • Something for wrapped indexing and slicing with signed indices. The slice is considered to extend off into infinity. (this is nicer than python because you don't have to worry about staying confined to the region -len..len)
  • Something for splitting off the last k elements of a slice. (This is nicer than python because it does not exhibit my footgun from earlier). Something like this:
impl<T> [T] {
    pub fn rsplit_at(&self, k: usize) -> (&[T], &[T]) {
        self.split_at(self.len() - k)
    }
}

(note: A disadvantage of rsplit_at compared to wrapping the index type is that it is less amenable to code reuse; you can easily write a function that is generic over range types)

@le-jzr
Copy link

le-jzr commented Dec 17, 2017

WRT original proposal: IMO this would reduce safety by making accidental index underflows harder to detect. Typing -i instead of x.len() - i is not enough of a convenience to offset that.

@scottmcm
Copy link
Member

I'm glad to see this leading away from indexing by mixed ranges. The thing I'm saddest about there are situations like "is v[2..-2] empty?" or "wait, what does v[-2..2] return?"

A bunch of the scenarios I've seen in the thread remind me of iterator adapters, just directly on slices. For example, v[-1] is v.rev()[0], v[..-1] is v.rev()[1..].unrev(), v.wrapping_index(i+1) is v.cycle()[i+1]. Would just need a few things like rev: &[T] -> &RevSlice<T>...

@leonardo-m
Copy link

leonardo-m commented Dec 18, 2017

The very nice solution of D language:

my_arr[$ - 2]

That is equivalent to Python:

my_arr[-2]

In D you can even overload the $ (defining opDollar) if you want to define multi-dimensional tensor-like data structures (you overload each $ for each dimension of the tensor):

https://dlang.org/spec/operatoroverloading.html#dollar

@clarfonthey
Copy link
Contributor Author

@scottmcm that's actually a nice idea! All of these adapters should be constant-time for slice iterators, and perhaps there may be a way to add convenience functions if they're desired.

@ZNackasha
Copy link

ZNackasha commented May 23, 2018

Random person from the internet here
I just wanted to expand on @scottmcm idea of thinking about ranges as iterators.

Basic

range iterator
v[-1] v.rev().nth(0)
v[-2] v.rev().nth(1)

Ranges

range iterator range equivalent
v[1...] v.skip(1) v[1...]
v[ ...-2] v.take(v.count() - 2) v[...n-2)]
v[ -2...] v.skip(v.count() - 2) v[n-2...)]
v[1...-2] v.skip(1).take(v.count() - (1 + 1)) v[1...n-2]

Reverse ranges

range iterator
v[2...0] v.rev().skip(v.count() - 3) .take(2)
v[-1...0] v.rev().take(v.count() - 1)
v[-2...2] v.rev().skip(1).take(v.count() - (1 + 2))
v[-1...-3] v.rev().take(2)
v[-2...-3] v.rev().skip(1).take(1)

Negative zero for inclusive reverse ranges

range iterator
v[-1...-0] v.rev()
v[2...-0] v.rev().skip(v.count() - 3)

I understand that some of these may seem weird but I just wanted to show what could be the equivalent using iterators.

@clarfonthey
Copy link
Contributor Author

I personally really like the idea from D of having a value like $ represent "end of array," then allowing operations like + and - on it.

For example, instead of x[from_end(5)] you'd do x[PastEnd - 5].

@roblabla
Copy link

roblabla commented Oct 8, 2018

C# 8 implemented it with ^. x[^0] is the last element, x[^5] is the fifth element from the end. x[^5..^0] to get a slice. I guess that's similar from D. I find this syntax super elegant.

@scottmcm
Copy link
Member

scottmcm commented Oct 9, 2018

I made a crate to try out the "slice adapter" idea above (#2249 (comment))

let r = [1, 2, 4, 9, 16, 25].rev();
assert_eq!(r[0], 25);
assert_eq!(r[1..3].rev(), &[9, 16]);
assert_eq!(r.split_first().unwrap().0, &25);

let mut it = r.iter().cloned().skip(2);
assert_eq!(it.next(), Some(9));
assert_eq!(it.next(), Some(4));
assert_eq!(it.next(), Some(2));

https://docs.rs/rev_slice/*/rev_slice/

@mathieucaroff
Copy link

Um, I'm from Python where we have iterators (about any container) and sequences (indexable, lengthed containers). I see iterators have a .count(), method, which is a bit confusing, but I'm too lazy to RTFM. So would someone kind link the documentation or explain what is what.

Otherwise, I like the solution used in D. It also exists in GNU Octave, where they reused the keyword end, though I think $ from D would be more appropriated for Rust.

@mark-i-m
Copy link
Member

mark-i-m commented Jan 2, 2019

In general, in Rust you can’t efficiently know the length of an iterator. You just have consume all the elements and count them. This is what the count() method does.

@ExpHP
Copy link

ExpHP commented Jan 3, 2019

Right, it's very different from python's count which in Rust you can simply use a RangeFrom for (using the built-in syntax 0..).

Iterator documentation

@scottmcm
Copy link
Member

The C# example -- which works by having a newtype that means "from the end" reminded me that we have std::cmp::Reverse, so it would be possible to allow my_slice[Reverse(4)] and similar.

(I can't decide if this is a good idea or an abuse of the type, though.)

@leonardo-m
Copy link

To me using something like the D syntax (with # instead of $) seems way better than my_slice[Reverse(4)] and similar.

@samirm
Copy link

samirm commented Dec 21, 2022

did this ever go anywhere?

@SOF3
Copy link

SOF3 commented Dec 22, 2022

You can already implement this yourself by defining a type Reverse and implementing ops::Index<Reverse>/ops::Index<ops::Range<Reverse>>. This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.

@danielfaust
Copy link

danielfaust commented Jan 11, 2023

How about using curly braces like [1..1} (=Python [1:-1]), {5..] (=Python [-5:]), {5..1} (=Python [-5:-1]) denote negative numbers? This would avoid the negative index issue from C and the error prone -0 index from Python ([..0} would equal [..]).

Or normal braces [1..1) instead of curly ones.

I really do use negative indices a lot in Python and always hate it when I have to use .length - x in JavaScript.

This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.

Let's assume you have a list of files in the format:

file_1.jpg
file_2.jpg
...
file_10.jpg
file_11.jpg

Then you can quickly extract the number via [5:-4]

This could even be applied to single elements like {5} = [-5] and {0} would be the last element in the array.


Edit: Wait a minute, I just realized that using 0 would be an out of bounds error, also in the slice examples from the beginning. Using anything with 0} would be an error.

@afetisov
Copy link

@danielfaust This would break literally every bracket matching and cold folding algorithm out there, and looks just as confusing to humans.

@willtennien
Copy link

Even if this doesn't support everything we want, can we get something out?

The discussion seems generally happy with vec[from_end(1)] and vec[2..from_end(1)].

@Quelfth
Copy link

Quelfth commented May 11, 2024

I think that one thing that's ever so very slightly confusing about from_end() is that it seems possibly unclear whether from_end(0) or from_end(1) is the last element. Thus, I propose we implement a unit struct Len and then implement Sub<T> for Len such that Len-i gives FromEnd(i) where FromEnd can be used to index from the end. So now really_long_sequence_name[really_long_sequence_name.len()-1] can be written as just really_long_sequence_name[Len-1] which I think is better than any other proposed solution in that it's just really obvious what it means. This is especially true when compared to solutions like [$-1] and [#-1] where you just have to know the arbitrary meaning of this special character. Of course it is already possible to implement this using a crate, but not if you want to allow mixed bound ranges such as [1..Len-2] which feel valuable. The current range notation won't allow the indices on the two ends to be different types, and I suspect that this has to do with type inference, so if we can't generalize Range and RangeInclusive to allow two different types, then perhaps we could just add several new types to cover all cases of a range where one bound is T and the other is FromEnd<T>.

@NumesSanguis
Copy link

NumesSanguis commented Jul 11, 2024

The discussion seems generally happy with vec[from_end(1)] and vec[2..from_end(1)].

I would suggest against a Rust-only keyword for a quite standard operation like from_end().

As a daily Python programmer, experienced in other languages and a beginner Rust learning going through Rustlings, I was surprised that negative indices are not a thing.

While from_end() sounds descriptive, as someone learning multiple programming languages, I rather not have to remember additional specific keywords only used in Rust. Therefore, in terms of mental load, I would say a negative number is more intuitive if technically possible.

If I have programmed for the past days in a different language and would try to do a negative offset in a slice in Rust, I would start with -x out of muscle memory. When that throws an error, I would think like, hmm was it end()? from_end? last()? from_last()?, tail()?, rear()? etc, as they all make semantic sense. This would be even harder to remember for non-native English speakers.

  • tail is used by e.g. Haskell (tail :: [a] -> [a]) or R (tail(vec, 1))
  • end is used by e.g. PHP end($arr)
  • last is used by e.g. Swift (myArray.last)

Also, end is used in e.g. Ruby like:

def my_method
  puts "Hello, World!"
end

If I ask ChatGPT if from_end is used in any other language:

The function from_end() is not a standard feature in any of the mainstream programming languages. In most commonly used languages, operations that involve accessing elements from the end of a sequence typically use indexing or other specific methods rather than a from_end() function.

Personally, even if from_end would be implemented, I would be slightly more inclined to use .len()-1, because at least .len() is common in other programming languages.
Even as a non-C# programmer, x[^0] seems elegant to me and is human language agnostic.

While it is just the opinion of a single person, I hope it helps the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests