
Lazy slicing should not stop on the first hole, or something (say @a[lazy 3..*]) #1268

Open
AlexDaniel opened this issue Nov 24, 2017 · 24 comments
Labels
data types Arrays, lists, hashes, pair objects, etc.

Comments

@AlexDaniel
Contributor

AlexDaniel commented Nov 24, 2017

After looking at Raku/doc#1681 I noticed this:

my @letters = <a b c d e f>;
@letters[4]:delete;
say @letters[3..*];
say @letters[lazy 3..*];

Result (2015.09⌁2017.09, 456358e3^):

(d)
(d)

Result (456358e, 2017.10, 2017.11, HEAD(3166400)):

(d (Any) f)
(d)

I don't understand why it should stop at the first hole. The behavior changed after 456358e, and that seems to be unintentional. I'm really confused as to what should be the right behavior, but my gut feeling is that (d (Any) f) is the right answer.

@AlexDaniel
Contributor Author

ping @lizmat @zoffixznet

@zoffixznet
Contributor

The behavior changed after 456358e, and that seems to be unintentional

It was intentional. The change was anticipated and I used it as the argument against the optimization, but you said the stopping-at-hole was a bug (there's probably an RT for that).

I agree, (d (Any) f) is what I'd expect to be shown, but I don't know of a good way to implement that because IIRC holes are currently just nqp::nulls, just as items past the end are.
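
For illustration, :exists cannot tell the hole apart from a position past the end:

my @letters = <a b c d e f>;
@letters[4]:delete;
say @letters[4]:exists;    # False (the hole left by :delete)
say @letters[10]:exists;   # False (past the end looks exactly the same)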

(IMO the whole holes-in-arrays business should not exist at all and :delete should not work on Arrays; but I guess that ship's sailed)

@AlexDaniel
Contributor Author

AlexDaniel commented Nov 24, 2017

Ah, I should've read more as I'm starting to forget things. RT #127573 seems to be identical to this ticket (RT ticket now closed in favor of this issue).

@jnthn
Member

jnthn commented Nov 24, 2017

The tricky thing, I think, is that the various slicing operations are implemented in terms of methods on the target to be indexed, such that implementing the various methods is enough to get all of the slicing behavior. We can't just call .elems, because the target might well be lazy/infinite. Stopping at a position that doesn't exist, checked by EXISTS-POS, is usually fine - it's just :delete, which is rare, that can cause a surprise.
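
As a rough illustration of that API (the Evens class below is purely hypothetical; AT-POS and EXISTS-POS are the actual hook methods), implementing just those two methods already gives you indexing and simple slicing:

class Evens does Positional {
    method AT-POS($i)     { 2 * $i }
    method EXISTS-POS($i) { $i >= 0 }   # conceptually infinite, so no .elems to call
}
my $evens := Evens.new;
say $evens[3];        # 6
say $evens[1, 2, 5];  # (2 4 10)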

I don't know of any alternative options available with the current API. So unless I've missed one, then I think our options are:

  • Just document and roast the behavior as being "stops at the first element that doesn't exist", and live with it.
  • Extend the API we expect containers to implement, in a way that we can provide a default for backwards compatibility, and use that instead of EXISTS-POS.

@tisonkun
Contributor

@letters[(lazy 3..*).eager] will block. Either make it clear that this is not recommended, or let it act as @letters[3..*].

@AlexDaniel
Contributor Author

@W4anD0eR96 no, not really. (lazy 3..*).eager by itself blocks, because you explicitly asked it to reify an infinite list. There's nothing to fix there, really. Well, unless you say that .eager on an infinite Range object should throw instead of blocking forever (in which case we probably need an LTA ticket).
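
For example, eagerizing a finite lazy sequence is fine; it is only the infinite case that can never finish:

say (lazy 3..5).eager;    # OUTPUT: (3 4 5)
# say (lazy 3..*).eager;  # would never finish: it tries to reify an infinite sequence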

@tisonkun
Contributor

@AlexDaniel Yes.

@letters[(lazy 3..5)]; # OUTPUT: (d)
@letters[(lazy 3..5).eager]; # OUTPUT: (d (Any) f)

I used to think * refers to the elems of the Array in this case.

@jstuder-gh
Contributor

From what I gather (and correct me if I am wrong on this), the behavior exhibited in the infinite Range optimization is correct as per the specification, and it is the usage of infinite Lists and Seqs as indices that is incorrect. The existing EXISTS-POS implementation doesn't distinguish between deleted values (causing a gap in the list) and undefined values past the end of the collection, which is causing this confusion.

As a stop-gap measure, during the reification process, couldn't we continue eagerization if the reified index is less than the number of elems in the collection, even if an element evaluates to undefined? I made a commit in my fork (8c0a665) that does what I'm talking about.

It feels a little bit hacky, but it appears to work, at least in the small amount of testing I've done with it.
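
Roughly, the condition described above could look like this (a hypothetical helper for illustration only; the actual commit works inside Rakudo's slicing code):

sub keep-reifying(\collection, Int $idx, Int $reified-elems) {
    collection.EXISTS-POS($idx)      # the position really holds a value
        or $idx < $reified-elems     # or it is just a hole inside the collection
}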

@zoffixznet
Contributor

The existing EXISTS-POS implementation doesn't distinguish

Yes, that's correct.

It feels a little bit hacky

Does to me too. It's going to stop at the end for non-lazy lists, yet stop at the hole on anything lazy. IMO that just moves the goalpost instead of fixing the issue completely. We could make it die like in this case:

m: my @a = lazy 1..10; @a[3]:delete; say @a[*-1]
<camelia> rakudo-moar 78caeb6bc: OUTPUT: «Cannot .elems a lazy list␤  in block <unit> at <tmp> line 1␤␤»

But is that better than stopping at the hole?

Just document and roast the behavior as being "stops at the first element that doesn't exist", and live with it.

👍 on that. This way we can effectively postpone the API extension until some later language version (and a future where we're more performant). We would also need to revert 456358e.

I used to think * refer to elems of the Array in this case.

No, here it's a Range object, and it's the laziness that takes care of reifying only up to the elems of the Array. The other form is the WhateverCode (i.e. *-1), and there the elems of the Array gets passed as an argument.
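
To make the two indexing forms concrete (no deletion involved here):

my @letters = <a b c d e f>;
say @letters[3..*];   # OUTPUT: (d e f)   (Range: reification stops at the array's end)
say @letters[*-1];    # OUTPUT: f         (WhateverCode: *-1 is called with 6, giving index 5)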

@jstuder-gh
Contributor

m: my @a = lazy 1..10; @a[3]:delete; say @a[*-1]
<camelia> rakudo-moar 78caeb6bc: OUTPUT: «Cannot .elems a lazy list␤  in block <unit> at <tmp> line 1␤␤»

This is not an issue that was introduced by this commit. I do think that the error message could probably be a bit more explicit, however, in stating that relative indexing on a lazy Positional will not work.

It's going to stop at the end for non-lazy lists, yet stop at the hole on anything lazy. IMO that just moves the goalpost instead of fixing the issue completely.

That is very true. I agree that the most consistent thing to do would be to revert 456358e and document as stopping at first undefined. I just wonder whether that isn't potentially introducing a problem in the future, if the intention is for the behavior to change. By implementing some sort of stop-gap solution now, there would be fewer potential surprises for people who have come to rely on iteration halting at the first gap, once the API extension is eventually implemented.

Has gaps:

  1. Infinite Ranges on non-lazy collections.
  2. Lazy Iterables on non-lazy collections.

Stops at first undefined:

  1. Lazy Iterables on lazy collections.

Given the above list, users would only potentially be surprised when the behavior of the last case (lazy Iterables on lazy collections) changes.

In that case, I think even an imperfect implementation that gets us mostly correct functionality is preferable until it can be solved properly and completely via an EXISTS-POS (or similar) update.

I suppose it's possible that we could have it stop at the first undefined by default, but introduce an adverb in the future that would return all elements until end of collection. Just a thought.

say @a[lazy 3..*]:allow-gaps;

@jnthn
Member

jnthn commented Nov 26, 2017

Just a quick note on:

document as stopping at first undefined

This should be "stops at the first element that doesn't exist, because it was deleted". An element that exists can contain a value for which .defined returns False.
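
For example:

my @a = 1, Any, 3;
say @a[1]:exists;    # True  (the element exists)
say @a[1].defined;   # False (but its value is undefined)
@a[1]:delete;
say @a[1]:exists;    # False (now the element no longer exists)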

@jstuder-gh
Contributor

Yeah, you're right. I was conflating existence and definedness a bit.

I don't want to stray too far off the topic, but I just want to be sure that my definitions are correct...

Existence is the presence of a value bound at a position among a List's internal reified elements.
When an element is deleted, nqp::null is bound to its position among the reified elements (is it accurate to say it is dereified?).

This is, of course, different from definedness. An undefined value can be bound at a position (e.g. a Scalar containing the type object Any), thus undefined but existing.

It just so happens that undefined values are returned when querying a position that lacks existence. But this operation differs greatly from querying a position where an undefined value was assigned (and thus exists), despite looking the same on the surface.

Correct?

@lizmat
Contributor

lizmat commented Nov 29, 2017 via email

jstuder-gh added a commit to jstuder-gh/rakudo that referenced this issue Dec 8, 2017
This commit fixes the issue in which using lazy Iterables as Positional
indices will evaluate until the first deleted (undefined) value but not
continue further.

Within the default eagerize routine, it checks the number of reified
elements upon encountering an undefined value to determine whether to
continue.

Addresses [Issue rakudo#1268](rakudo#1268)
@jstuder-gh
Contributor

I've submitted a pull request with a commit that I feel addresses this issue in a pretty reasonable manner. It's pretty similar to the commit I had posted earlier in this chain, except that:

  • It checks the number of reified elements directly
  • It hides this logic within the default "eagerize" routine defined in POSITIONS
  • It works when indexing partially-reified lazy Positionals in addition to fully reified ones

@jstuder-gh
Contributor

Hey Rakudo devs, I'd like some advice on whether this seems like a valid approach to addressing this problem. After a previous attempt that worked around the problem rather than solving it (see here), I figured I would experiment with a more fundamental differentiation of deleted elements from those that simply have not been reified yet. Note that I'm not submitting this; it's just a test on my forks.

I added a deleted representation to Moar that is essentially the same as MVMNull, but distinct. Deleted can be bound to a position on a call to DELETE-POS, and subsequent tests of existence would then return true.

MoarVM commit:
jstuder-gh/MoarVM@f567907

NQP commit:
jstuder-gh/nqp@0ef9716

Rakudo commit:
jstuder-gh@ca6a8b0

This requires that participating DELETE-POS methods bind nqp::deleted, and that AT-POS and similar methods are aware of deleted, so that they return their default Scalar when it is encountered.

Given these changes, the existing "eagerize" callable in array_slice no longer stops at the first hole when undergoing "eagerization", as EXISTS-POS explicitly checks for null and not deleted.

Of course, the big problem with this is that the redefinition of deletion and existence causes spec tests that delete an element and call :exists on it to fail, since deleted elements no longer lack existence as expected.

I was thinking that instead, we could keep the null and deleted ops (as this experiment does), modify the existspos op to have both null and deleted lack existence, and add a new operation that distinguishes between the two, to be used in the process of "eagerization". Deleted items still lack existence, but can be distinguished from nulls when necessary.

# In src/core/array_slice.pm

# method name pending
SELF.IN-BOUNDS-POS($idx)

I suppose a potential problem with this is that the IN-BOUNDS-POS method (whatever we could call it) is not documented as important to the Positional interface. Current documentation states you should implement the AT-POS and EXISTS-POS methods for Positionals, with other methods being optional. Would it make sense to add IN-BOUNDS-POS as one of the methods to implement?

Another possibility is that we use the new method in conjunction with the existing EXISTS-POS and use it as an optional fallback.

SELF.EXISTS-POS($idx) || nqp::if( nqp::can(SELF, 'IN-BOUNDS-POS'), SELF.IN-BOUNDS-POS($idx), 0 )

This way, IN-BOUNDS-POS could be implemented on a few Positionals that wish to differentiate between null and deleted (such as Array and List) but not required in order for a Positional to be functional.
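
In Raku terms, the fallback above amounts to something like this (a hypothetical helper; IN-BOUNDS-POS is only a proposed name, not an existing method):

sub position-is-usable(\SELF, Int $idx) {
    SELF.EXISTS-POS($idx)
        || (SELF.can('IN-BOUNDS-POS') ?? SELF.IN-BOUNDS-POS($idx).so !! False)
}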

Does this seem like a valid approach to the issue?

@zoffixznet
Contributor

MoarVM stuff is above my head, so I can't comment on that, but having a "two kinds of nulls" thing is a nice way to solve this problem IMO. 👍

@lizmat
Contributor

lizmat commented Dec 19, 2017 via email

@jstuder-gh
Contributor

You're right lizmat, those two statements would be different. Hmm, this lazy slicing is a tricky beast.

However, this process has taught me a lot about Rakudo and a bit about NQP and Moar, so it's been valuable. I believe in Perl 6 and Rakudo and am looking to help however I can, so feel free to nudge me in the right direction if you feel my energies are best spent elsewhere.

Having slicing return Seqs is an interesting idea. I don't know exactly what a complete implementation would entail, but I'm willing to take a closer look into it if everyone thinks it's worth pursuing.

@zoffixznet
Contributor

always generate Seq’s rather than Lists. In such an overhaul, the “stopping at null for lazy iterators” would be fixed by just generating Nils ad infinitum.

But where would the stopping be done? A Seq that produces Nils ad infinitum doesn't result in this behaviour when I imagine it in my head:

    say <a b c>[1..*]; # OUTPUT: «(b c)»

what it would entail to make a complete implementation

Keep in mind that on top of implementing anything, we also need to maintain compatibility with the language specification. We have some wiggle room with changes affecting new language versions only, but something big might be tough to fit in.

@lizmat
Contributor

lizmat commented Dec 19, 2017 via email

@zoffixznet
Contributor

Well, that would be a special case anyway, as that's a Range, and we know through and through how they work.

Ok. I still don't get it, but I'll trust your judgement.

@jstuder-gh
Contributor

jstuder-gh commented Dec 23, 2017

I've modified the experiment above to address some of the issues surrounding it, and as a result I think it could be a workable solution. I've changed the name of the new "null" value to "inactive" to separate it from deletion a bit, but the name could still be better. Unoccupied, Unassigned, Unpopulated, maybe?

Moar commit:
jstuder-gh/MoarVM@e230941

NQP commit:
jstuder-gh/nqp@b507a80

Rakudo commit:
jstuder-gh@0f6f2c7

If I understand the PR correctly, this would make a difference in the value of @a[0] between:

my @a; @a[1] = 42;

and:

my @a = 666,42; @a[0]:delete;

If so, this will only make things more complicated.

I modified Moar so that, when zeroing the memory allocated for slots in the array, it assigns pointers to the VMInactive instance as opposed to null pointers. Since Moar's array data structure also includes the number of elems (from the HLL perspective), even if all the newly allocated slots are set to VMInactive, VMNull is still returned when the index sent to atpos is greater than the number of elems. This way the two examples above are internally consistent.

my @a;
@a[1] = 42;

use nqp;
say nqp::isinactive( nqp::atpos( nqp::getattr(@a, List, '$!reified'), 0 ) );
# OUTPUT: 1

If we have two kinds of nulls, you have 2 things to check for.

I've modified the MVM_is_null function (which is used in all null-checking ops in Moar) so that VMInactive is also considered null. This way, all the ops (from Moar all the way on up to Rakudo) do not need to know about VMInactive and do not need to change.

Only in the context of slicing, where the distinction is important, do we need to modify anything to distinguish between null and inactive (via an ACTIVE-POS method). EXISTS-POS and AT-POS work as they did before, with no change to their Rakudo implementation.

with [^10] { .[0, 3, 9]:delete; with .Seq { .[lazy 0..100] } }
# OUTPUT: ((Any) 1 2 (Any) 4 5 6 7 8)

One caveat with this development is that it does not work reliably with Spesh right now (presumably due to the setting of newly allocated memory to VMInactive), and running the spectest fails. However, with the MVM_SPESH_DISABLE env var set, it works well. If we could get this working with Spesh, I think it would be a good solution, personally.

Let me know what you think.

@lizmat
Contributor

lizmat commented Dec 24, 2017 via email

@JJ
Collaborator

JJ commented May 6, 2020

Does this maybe need a consensus?
