Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow negative indices in indexing and slicing like python #358

Closed
baronfel opened this issue Oct 20, 2016 · 29 comments
Closed

Allow negative indices in indexing and slicing like python #358

baronfel opened this issue Oct 20, 2016 · 29 comments

Comments

@baronfel
Copy link
Contributor

baronfel commented Oct 20, 2016

Submitted by Gustavo Guerra on 11/6/2014 12:00:00 AM
39 votes on UserVoice prior to migration

Example: permit usage of a.[..-2] instead of a.[..a.Length-1] The compiler could just do that conversion behind the scenes, so it would work with existing types that have custom indexing and slicing

Original UserVoice Submission
Archived Uservoice Comments

@dsyme dsyme removed the open label Oct 29, 2016
@nosami
Copy link
Member

nosami commented Oct 30, 2016

I wanted this a couple of times in the last few months. Was surprised when I discovered that this wasn't supported.

@dsyme
Copy link
Collaborator

dsyme commented Mar 1, 2017

See also the good discussion on #518

@rmunn
Copy link

rmunn commented Mar 4, 2017

It's not always true that the compiler "could just do that conversion behind the scenes", if the value that could be negative is in a variable. Code like a.[..-2] can be converted behind the scenes, but code like let x = -2 ; a.[..x] could not.

The question is whether let x = -2 ; a.[..x] is useful or not. (Of course, this trivial example isn't really useful, but in real usage x would come from a parameter, or be calculated elsewhere.) And the second question is, if it is useful, whether it's useful enough to be worth the extra implementation cost (which would be high)?

My gut feeling is that a.[..x] (where x might be negative, but the compiler can't know for sure) does not turn out to be nearly as useful as a.[..-2]. When I have used Python's negative index feature, it has almost always been as integer literals, e.g., to remove the first and last characters from a string (s = "[brackets]" ; s2 = s[1:-1]). In F#, of course, that would correspond to s.[1..-2] since F# uses inclusive indices instead of half-open indices like Python does.

@charlesroddie
Copy link

charlesroddie commented Aug 21, 2017

This syntax is terrible.

x..y slicing means: take the integers i with 0≤i≤y and construct the subsequence corresponding to these indices. This form of slicing is implemented, makes logical sense, and cannot be changed without breaking code. This includes 0..y.

..y slicing currently has strange behaviour that leads to failing on ..y when y<0.

This suggestion is to take this strange behavior, and see an opportunity to hack it to make negative numbers do something completely different from weakly positive numbers and 0..y slicing completely different from ..y slicing.

If l = [0;1;2] then l.[..2] = [0;1;2], l[..1] = [0;1], l.[..0]=[0], and according to this suggestion l.[-1]=[0;1]. That doesn't make any sense. Also l.[.. -2] would "mean" l.[.. 1-2]=l.[.. -1], which would "mean" l.[..1]. Modulo arithmetic doesn't match up well with inequalities.

Having a shorthand to avoid a.Length could be useful, and similarly having simpler reverse slicing could be useful, but the syntax should not overlap inconsistently with existing slicing.

@Ciantic
Copy link

Ciantic commented Jul 12, 2018

My mind tried this syntax to get last three characters:

str = "abcdefg"
str.[-3..] // expected to get "efg"

and this would get third of last, and second of last:

str.[-3..-1] // expected to get "ef"

This would match the substr e.g. in PHP:

<?php

echo substr("abcdefg", -3); // efg
echo "\n";
echo substr("abcdefg", -3, -1); // ef
echo "\n";

@jwosty
Copy link
Contributor

jwosty commented Jul 12, 2018

@rmunn Perhaps the compiler could could add a branch in there, and compile a.[..x] (where x is not a literal) as:

let i = if x > 0 then x else a.Length - x
a.[0..i]

@charlesroddie
Copy link

charlesroddie commented Jul 12, 2018

Slicing was recently made into something logical with the implementation of consistent slicing rfc. Please see the current definition of the slicing operations there.

We should not descend to the level of PHP.

By all means we can define define SubStringOfLength(i,length), where i is defined modulo the string length, and length is at least 0, and SubStringBetweenPositions(i,j), where i and j are modulo the string length. But these clearly should be two functions, not one, and secondly they do not substitute for slicing. For slicing "abc".[0..-1] is valid under any reasonable definition, equal to "", and not equal to "abc".[0..2] which is "abc".

@charlesroddie
Copy link

@cartermp
Copy link
Member

The concept of negative indices; i.e., "from the end", is something we should support. There are a few things to consider:

Syntax

The Range type for C#/.NET will use ^x to mean foo.Length-x as the upper bound. In Python, -x is used to represent the same thing.

I couldn't find other examples in other languages.

Using -x has the advantage of being immediately familiar to anyone who has used Python since long ago.

Using ^x has the advantage of being consistent with the C#/.NET implementation in terms of syntax. This may be troublesome for people who define their own operator .^:

> let ( .^ ) x y = x + y;;                                                                                                                                                             
val ( .^ ) : x:int -> y:int -> int  

Given this and the ubiquity of Python I would support -x as the syntax.

Allow in the start

Python also supports negative indices as the first value. Its meaning is no different; it simply means "from the start":

>>> lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> lst[-5:-4]
[6]

The example given is a bit silly, but it is consistent. Once the end index becomes more negative than the start index, the result is an empty list.

I think it's perfectly fine to allow a negative index in the start.

Edge cases

Neither Python nor C#/.NET will support indices that are out of bounds of the collection. It should be a fairly common sense rule that giving an index of -50 to a list of size 3 results in an exception. This should be the same in F#.

In Python, if the end index is more negative than the start index (when both are negative), the result is an empty list. It could also make sense for that to be an error.

Both Python and C#/.NET have the upper bound as exclusive. F# upper bounds are inclusive, which means that code in Python or C#/.NET is not directly translatable.

@charlesroddie
Copy link

charlesroddie commented Jan 21, 2019

I vote for ^.

  • Consistency with .Net is more important than with Python.
  • Using -x as the syntax would restrict functionality, as it would only allow (end) .. (beginning) or (end) .. (end) syntax, since 0 .. -1 has to be interpreted as (beginning) .. (beginning). Using ^ would allow (beginning) .. (end) ranges.

@cartermp
Copy link
Member

Using -x as the syntax would restrict functionality, as it would only allow (end) .. (beginning) or (end) .. (end) syntax, since 0 .. -1 has to be interpreted as (beginning) .. (beginning). Using ^ would allow (beginning) .. (end) ranges.

0 .. -1 need not be interpreted as (beginning) .. (beginning). It would mean, "from the 0th place from the start to the place that is 1 from the end". This is how it works in Python:

>>> lst                                                                                                                                                                                
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                                                                                                                                                        
>>> lst[0:-1]                                                                                                                                                                          
[1, 2, 3, 4, 5, 6, 7, 8, 9]  

The question of syntax isn't necessarily driven by consistency. IIRC C# had to pick ^x because the other good options ended up being ambiguous. Nobody is particularly pleased with it, but it's what was possible. If we don't end up having the same restriction, there's no particular reason to be consistent with a new syntax for a language that is just now adopting the concept of slicing.

@charlesroddie
Copy link

charlesroddie commented Jan 21, 2019

l.[0 .. -1] must equal [] for logical reasons. This is not the case in Python which has an exclusive upper bound as you mentioned. This is the most important argument.

It's also the case that redefining l.[i .. j] in cases where it current doesn't fail would be a breaking change.

I like C#'s approach better then Python's because in Python you specify which definition you want to use inside the data, and in C# you choose which definition you want to use explicitly.

@cartermp
Copy link
Member

I don't understand what "logical reasons" means. Regardless of syntax, -1/^1 means "one from the end", where "end" is determined by existing F# semantics.

Another tricky thing to consider is that if we were to adopt the same syntax as C# with ^x, we'd still have the ability to write code that uses -x, but instead of doing the same thing it would always just produce [] as the result. That's quite the wart in the language.

It's also the case that redefining l.[i .. j] in cases where it current doesn't fail would be a breaking change.

While true, we do have a precedent for introducing breaking changes like this (hell, even source breaking changes) so long as they are communicated up front, released in advance for people to trial, and allow for some expanded functionality. An example is the byref<'T> changes in F# 4.5.

Given that the next F# language version is likely to be F# 5.0, we do have an opportunity to do a similar thing.

@charlesroddie
Copy link

charlesroddie commented Jan 22, 2019

I have posted a lot about this issue because while [0 .. -1] has always been logicaly implemented, [.. -1] hasn't; hence the issue linked to and the RFC and the in-progress work. Let me copy and paste some of this argument without reference to [..i].

-1 currently means 1 before the beginning. This meaning is completely valid as an end case. This is equivalent to 0 in a language with an exclusive end case.

let l=[1;2]
l.[0..-1] // gives [], the natural initial condition
l.[0..0] // gives [1]
l.[0..1] // gives [1;2]

If l.[0..-1] were changed to give some other set, or to fail, completely logical F# code would break and would need to be hacked out of shape.

As with all failures to use the best initial case (humans before the discovery of 0, difficulty some have with 0!=1, the product of an empty set, etc.) it creates gaps in logic that require workarounds.

For example if n is a position in the string s, you would expect s.[0..(n-1)] to be the substring prior to n. This requires s.[0..-1]="".

@cartermp
Copy link
Member

I'm afraid I still don't understand what "logical" means in this context. What logic is at play, how is that expected by users, and why is it important that it remains so aside from compatibility?

Put differently, I find the entire notion of 0..-1 to be nonsensical in the context of data types such as lists and arrays. After all, -1 is not a valid index in an array, so why would 0..-1 be a valid subarray? I don't see a terrible amount of logic in this entire feature, but it's still incredibly useful and has a precedent set by one of the most popular languages in the world.

@charlesroddie
Copy link

charlesroddie commented Jan 22, 2019

What logic is at play

Logical here: allowing a clear definition that can be reasononed about, which extends to its natural initial condition of an empty sequence.

how is that expected by users

I gave an example just above: if n is a position in the string s, you would expect s.[0..(n-1)] to be the substring prior to n. This requires s.[0..-1]="": the part of a string before the first character.

a precedent set by one of the most popular languages in the world.

Since F#'s range end is inclusive and Python's is exclusive, 0..n in F# is equivalent to 0..n+1 in Python. So 0..-1 in F# is equivalent to 0..0 in Python and both return []. So Python is not illogical. If F# adopted an exclusive end at the same time as redefining the usage of negative numbers it would be merely breaking and hacky but not illogical.

Just think about when you want slices to return empty sequences, and what numbers in F# would give this. In Python it's n..n and in F# it's n..n-1.

@cartermp
Copy link
Member

@charlesroddie Just so I've got it right, you don't have any opposition to the concept of allowing "from the end" slicing, yes? Just want to make sure syntax v. semantics is cleanly separated here.

I gave an example just above: if n is a position in the string s, you would expect s.[0..(n-1)] to be the substring prior to n. This requires s.[0..-1]="", the part of a string before the first character.

There are a three problems with this:

  • There is no definition of the empty string that defines it as a part of a string before the first character, at least not that I'm aware of. This extends to lists/arrays/etc
  • This position assumes that -1 has a set arthmetic meaning, but the proposal is to define that meaning as "1 from the end", not "the -1th index". This difference leads to apples-to-oranges comparisons later in your argument
  • Allowing for negative indices to be "range"-able is breaching into territory that is not consistent with how indexing works today

I don't really see this as a compelling argument against -x syntax and what it would mean. I think the main question comes down to compatibility with existing behavior, which we already know to be inconsistent between lists and strings/arrays/2d arrays.

@cartermp
Copy link
Member

Also to consider- negative index at the end of a range could be done, but a negative index by itself could not. This implies the need for something else like ^x, otherwise it would have an inconsistent feel and probably suboptimal experience.

@charlesroddie
Copy link

charlesroddie commented Jan 22, 2019

you don't have any opposition to the concept of allowing "from the end" slicing

No objection.

There is no definition of the empty string that defines it as a part of a string before the first character, at least not that I'm aware of. This extends to lists/arrays/etc

Do you seriously want to say that there is no position in a string such that the substring consisting of characters before this position is empty? (/List/Array...)

This will give a languge that is less logical than Python and C#, a language that is supposed to be mathsy and recursive but doesn't understand base cases.

we already know to be inconsistent between lists and strings/arrays/2d arrays.

This is only for ..x which is currently being corrected.

@cartermp

This comment has been minimized.

@charlesroddie

This comment has been minimized.

@charlesroddie

This comment has been minimized.

@cartermp
Copy link
Member

@charlesroddie This is now far off-topic. I'll ask that if you wish to continue this discussion, it be done in a separate issue, perhaps in the fslang-design repo here: fsharp/fslang-design#217

@cartermp
Copy link
Member

cartermp commented Oct 4, 2019

@dsyme any thoughts on this with ^ syntax? @nelson-wu has written a draft proposal and investigated what it would take to implement this; it's quite a bit of work but certainly tractable and something that could be done by F# 5 preview 1/.NET 5 preview 1

@charlesroddie
Copy link

^ is good and specification in the RFC looks right.

@dsyme
Copy link
Collaborator

dsyme commented Oct 9, 2019

@dsyme any thoughts on this with ^ syntax? @nelson-wu has written a draft proposal and investigated what it would take to implement this; it's quite a bit of work but certainly tractable and something that could be done by F# 5 preview 1/.NET 5 preview 1

This is acceptable with the ^ syntax.

I believe the biggest problem will be about lexing the ..^ token. It lexes as a single token today.

(TBH I would even be happy with the new feature breaking the 0.00000001% of code that might define that operator)

@TIHan
Copy link

TIHan commented Nov 13, 2019

Comments on the current design in the RFC:


Do you think GetIndexFromEnd would be a better name? I think it's clearer on its intent. GetReverseIndex is ok; though, it means getting the index from the end with the specified rank and index.


GetReverseIndexs current pseudo-signature is effectively, rank: int * index: ? -> ?.
I think this is fine and works for everything. But, there is something to consider. Take this example, assuming xs is a custom type that has implemented the slicing and from-end methods:

xs.[^2..^1]

This will call our GetReverseIndex method twice which is normal. The downside is that getting the rank/dimension data on each call could be expensive on our custom type; it will have to get it twice when we technically only need it once.

A similar thing can also be said when getting the slice when calculating the from-end index. As an example after de-sugaring:

xs.GetSlice(Some(xs.GetReverseIndex(0, 2)), Some(xs.GetReverseIndex(0, 1))

For a F# list type, we could be doing more work than what is necessary to get the end result.

We need to strongly consider a design that only makes a single call to an extension method in order to avoid issues of potentially doing a lot of work on a slice.

@cartermp
Copy link
Member

Closing this as it is now implemented and will be in the next F# vNext preview.

@stefan-schweiger
Copy link

stefan-schweiger commented Jun 7, 2022

@cartermp when will this feature be out of preview? You still can't use it with Lang Version 6.0. I would hope after 2.5 years this would maybe be ready 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants