
Lazy slicing should not stop on the first hole, or something (say @a[lazy 3..*]) #1268

Open
AlexDaniel opened this issue Nov 24, 2017 · 24 comments
Labels
data types Arrays, lists, hashes, pair objects, etc.

Comments

@AlexDaniel
Contributor

AlexDaniel commented Nov 24, 2017

After looking at Raku/doc#1681 I noticed this:

my @letters = <a b c d e f>;
@letters[4]:delete;
say @letters[3..*];
say @letters[lazy 3..*];

Result (2015.09⌁2017.09, 456358e3^):

(d)
(d)

Result (456358e, 2017.10, 2017.11, HEAD(3166400)):

(d (Any) f)
(d)

I don't understand why it should stop at the first hole. The behavior changed after 456358e, and that seems to be unintentional. I'm really confused as to what should be the right behavior, but my gut feeling is that (d (Any) f) is the right answer.

@AlexDaniel
Contributor Author

ping @lizmat @zoffixznet

@zoffixznet
Contributor

The behavior changed after 456358e, and that seems to be unintentional

It was intentional. The change was anticipated and I used it as the argument against the optimization, but you said the stopping-at-hole was a bug (there's probably an RT for that).

I agree, (d (Any) f) is what I'd expect to be shown, but I don't know of a good way to implement that because IIRC holes are currently just nqp::nulls, just as items past the end are.
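
For illustration, :exists cannot tell the hole apart from a position past the end:

my @letters = <a b c d e f>;
@letters[4]:delete;
say @letters[4]:exists;    # False (the hole left by :delete)
say @letters[10]:exists;   # False (past the end looks exactly the same)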

(IMO the whole holes-in-arrays business should not exist at all and :delete should not work on Arrays; but I guess that ship's sailed)

@AlexDaniel
Contributor Author

AlexDaniel commented Nov 24, 2017

Ah, I should've read more as I'm starting to forget things. RT #127573 seems to be identical to this ticket (RT ticket now closed in favor of this issue).

@jnthn
Member

jnthn commented Nov 24, 2017

The tricky thing, I think, is that the various slicing operations are implemented in terms of methods on the target to be indexed, such that implementing the various methods is enough to get all of the slicing behavior. We can't just call .elems, because the target might well be lazy/infinite. Stopping at a position that doesn't exist, checked by EXISTS-POS, is usually fine - it's just :delete, which is rare, that can cause a surprise.
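
As a rough illustration of that API (the Evens class below is purely hypothetical; AT-POS and EXISTS-POS are the actual hook methods), implementing just those two methods already gives you indexing and simple slicing:

class Evens does Positional {
    method AT-POS($i)     { 2 * $i }
    method EXISTS-POS($i) { $i >= 0 }   # conceptually infinite, so no .elems to call
}
my $evens := Evens.new;
say $evens[3];        # 6
say $evens[1, 2, 5];  # (2 4 10)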

I don't know of any alternative options available with the current API. So unless I've missed one, then I think our options are:

  • Just document and roast the behavior as being "stops at the first element that doesn't exist", and live with it.
  • Extend the API we expect containers to implement, in a way that we can provide a default for backwards compatibility, and use that instead of EXISTS-POS.

@tisonkun
Contributor

@letters[(lazy 3..*).eager] will block. Either make it clear that this is not recommended, or let it act as @letters[3..*].

@AlexDaniel
Contributor Author

@W4anD0eR96 no, not really. (lazy 3..*).eager by itself blocks, because you explicitly asked it to reify an infinite list. There's nothing to fix there, really. Well, unless you say that .eager on an infinite Range object should throw instead of blocking forever (in which case we probably need an LTA ticket).
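
For example, eagerizing a finite lazy sequence is fine; it is only the infinite case that can never finish:

say (lazy 3..5).eager;    # OUTPUT: (3 4 5)
# say (lazy 3..*).eager;  # would never finish: it tries to reify an infinite sequence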

@tisonkun
Contributor

@AlexDaniel Yes.

@letters[(lazy 3..5)]; # OUTPUT: (d)
@letters[(lazy 3..5).eager]; # OUTPUT: (d (Any) f)

I used to think * refers to the elems of the Array in this case.

@jstuder-gh
Contributor

From what I gather (and correct me if I am wrong on this), the behavior exhibited in the infinite Range optimization is correct as per the specification, and it is the usage of infinite Lists and Seqs as indices that is incorrect. The existing EXISTS-POS implementation doesn't distinguish between deleted values (causing a gap in the list) and undefined values past the end of the collection, which is causing this confusion.

As a stop-gap measure, during the reification process, couldn't we continue eagerization if the reified index is less than the number of elems in the collection, even if an element evaluates to undefined? I made a commit in my fork (8c0a665) that does what I'm talking about.

It feels a little bit hacky, but it appears to work, at least in the small amount of testing I've done with it.
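
Roughly, the condition described above could look like this (a hypothetical helper for illustration only; the actual commit works inside Rakudo's slicing code):

sub keep-reifying(\collection, Int $idx, Int $reified-elems) {
    collection.EXISTS-POS($idx)      # the position really holds a value
        or $idx < $reified-elems     # or it is just a hole inside the collection
}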

@zoffixznet
Contributor

The existing EXISTS-POS implementation doesn't distinguish

Yes, that's correct.

It feels a little bit hacky

Does to me too. It's going to stop at the end for non-lazy lists, yet stop at the hole on anything lazy. IMO that just moves the goalpost instead of fixing the issue completely. We could make it die like in this case:

m: my @a = lazy 1..10; @a[3]:delete; say @a[*-1]
<camelia> rakudo-moar 78caeb6bc: OUTPUT: «Cannot .elems a lazy list␤  in block <unit> at <tmp> line 1␤␤»

But is that better than stopping at the hole?

Just document and roast the behavior as being "stops at the first element that doesn't exist", and live with it.

👍 on that. This way we can effectively postpone the API extension until some later language version (and a future where we're more performant). We would also need to revert 456358e.

I used to think * refer to elems of the Array in this case.

No, here it's a Range object, and it's the laziness that takes care of reifying only up to the elems of the Array. The other form is the WhateverCode (i.e. *-1), and there the elems of the Array gets passed as an argument.
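
To make the two indexing forms concrete (no deletion involved here):

my @letters = <a b c d e f>;
say @letters[3..*];   # OUTPUT: (d e f)   (Range: reification stops at the array's end)
say @letters[*-1];    # OUTPUT: f         (WhateverCode: *-1 is called with 6, giving index 5)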

@jstuder-gh
Contributor

m: my @a = lazy 1..10; @a[3]:delete; say @a[*-1]
<camelia> rakudo-moar 78caeb6bc: OUTPUT: «Cannot .elems a lazy list␤  in block <unit> at <tmp> line 1␤␤»

This is not an issue that was introduced by this commit. I do think that the error message could probably be a bit more explicit, however, in stating that relative indexing on a lazy Positional will not work.

It's going to stop at the end for non-lazy lists, yet stop at the hole on anything lazy. IMO that just moves the goalpost instead of fixing the issue completely.

That is very true. I agree that the most consistent thing to do would be to revert 456358e and document as stopping at first undefined. I just wonder whether that isn't potentially introducing a problem in the future, if the intention is for the behavior to change. By implementing some sort of stop-gap solution now, there would be fewer potential surprises for people who have come to rely on iteration halting at the first gap, once the API extension is eventually implemented.

Has gaps:

  1. Infinite Ranges on non-lazy collections.
  2. Lazy Iterables on non-lazy collections.

Stops at first undefined:

  1. Lazy Iterables on lazy collections.

Given the above list, users would only potentially be surprised when the behavior of the last case (lazy Iterables on lazy collections) changes.

In that case, I think even an imperfect implementation that gets us mostly correct functionality is preferable until it can be solved properly and completely via an EXISTS-POS (or similar) update.

I suppose it's possible that we could have it stop at the first undefined by default, but introduce an adverb in the future that would return all elements until end of collection. Just a thought.

say @a[lazy 3..*]:allow-gaps;

@jnthn
Member

jnthn commented Nov 26, 2017

Just a quick note on:

document as stopping at first undefined

This should be "stops at the first element that doesn't exist, because it was deleted". An element that exists can contain a value for which .defined returns False.
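
For example:

my @a = 1, Any, 3;
say @a[1]:exists;    # True  (the element exists)
say @a[1].defined;   # False (but its value is undefined)
@a[1]:delete;
say @a[1]:exists;    # False (now the element no longer exists)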

@jstuder-gh
Contributor

Yeah, you're right. I was conflating existence and definedness a bit.

I don't want to stray too far off the topic, but I just want to be sure that my definitions are correct...

Existence is the presence of a value bound at a position among a List's internal reified elements.
When an element is deleted, nqp::null is bound to its position among the reified elements (is it accurate to say it is dereified?).

This is, of course, different from definedness. An undefined value can be bound at a position (e.g. a Scalar containing the type object Any), thus undefined but existing.

It just so happens that undefined values are returned when querying a position that lacks existence. But this operation differs greatly from querying a position where an undefined value was assigned (and thus exists), despite looking the same on the surface.

Correct?

@lizmat
Contributor

lizmat commented Nov 29, 2017 via email

jstuder-gh added a commit to jstuder-gh/rakudo that referenced this issue Dec 8, 2017
This commit fixes the issue in which using lazy Iterables as Positional
indices will evaluate until the first deleted (undefined) value but not
continue further.

Within the default eagerize routine, it checks the number of reified
elements upon encountering an undefined value to determine whether to
continue.

Addresses [Issue rakudo#1268](rakudo#1268)
@jstuder-gh
Contributor

I've submitted a pull request with a commit that I feel addresses this issue in a pretty reasonable manner. It's pretty similar to the commit I had posted earlier in this chain, except that:

  • It checks the number of reified elements directly
  • It hides this logic within the default "eagerize" routine defined in POSITIONS
  • It works when indexing partially-reified lazy Positionals in addition to fully reified ones

@jstuder-gh
Contributor

Hey Rakudo devs, I'd like some advice on whether this seems like a valid approach to addressing this problem. After a previous attempt that worked around the problem rather than solving it (see here), I figured I would experiment with a more fundamental differentiation of deleted elements from those that simply have not been reified yet. Note that I'm not submitting this; it's just a test on my forks.

I added a deleted representation to Moar that is essentially the same as MVMNull, but distinct. Deleted can be bound to a position on a call to DELETE-POS, and subsequent tests of existence would then return true.

MoarVM commit:
jstuder-gh/MoarVM@f567907

NQP commit:
jstuder-gh/nqp@0ef9716

Rakudo commit:
jstuder-gh@ca6a8b0

This requires that participating DELETE-POS methods bind nqp::deleted, and that AT-POS and similar methods are aware of deleted, so that they return their default Scalar when it is encountered.

Given these changes, the existing "eagerize" callable in array_slice no longer stops at the first hole when undergoing "eagerization", as EXISTS-POS explicitly checks for null and not deleted.

Of course, the big problem with this is that the redefinition of deletion and existence causes spec tests that delete an element and call :exists on it to fail, since deleted elements no longer lack existence as expected.

I was thinking that instead, we could keep the null and deleted ops (as this experiment does), modify the existspos op to have both null and deleted lack existence, and add a new operation that distinguishes between the two, to be used in the process of "eagerization". Deleted items still lack existence, but can be distinguished from nulls when necessary.

# In src/core/array_slice.pm

# method name pending
SELF.IN-BOUNDS-POS($idx)

I suppose a potential problem with this is that the IN-BOUNDS-POS method (whatever we could call it) is not documented as important to the Positional interface. Current documentation states you should implement the AT-POS and EXISTS-POS methods for Positionals, with other methods being optional. Would it make sense to add IN-BOUNDS-POS as one of the methods to implement?

Another possibility is that we use the new method in conjunction with the existing EXISTS-POS and use it as an optional fallback.

SELF.EXISTS-POS($idx) || nqp::if( nqp::can(SELF, 'IN-BOUNDS-POS'), SELF.IN-BOUNDS-POS($idx), 0 )

This way, IN-BOUNDS-POS could be implemented on a few Positionals that wish to differentiate between null and deleted (such as Array and List) but not required in order for a Positional to be functional.
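
In Raku terms, the fallback above amounts to something like this (a hypothetical helper; IN-BOUNDS-POS is only a proposed name, not an existing method):

sub position-is-usable(\SELF, Int $idx) {
    SELF.EXISTS-POS($idx)
        || (SELF.can('IN-BOUNDS-POS') ?? SELF.IN-BOUNDS-POS($idx).so !! False)
}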

Does this seem like a valid approach to the issue?

@zoffixznet
Contributor

MoarVM stuff is above my head, so I can't comment on that, but having a "two kinds of nulls" thing is a nice way to solve this problem IMO. 👍

@lizmat
Contributor

lizmat commented Dec 19, 2017 via email

@jstuder-gh
Contributor

You're right lizmat, those two statements would be different. Hmm, this lazy slicing is a tricky beast.

However, this process has taught me a lot about Rakudo and a bit about NQP and Moar, so it's been valuable. I believe in Perl 6 and Rakudo and am looking to help however I can, so feel free to nudge me in the right direction if you feel my energies are best spent elsewhere.

Having slicing return Seqs is an interesting idea. I don't know exactly what a complete implementation would entail, but I'm willing to take a closer look into it if everyone thinks it's worth pursuing.

@zoffixznet
Contributor

always generate Seq’s rather than Lists. In such an overhaul, the “stopping at null for lazy iterators” would be fixed by just generating Nils ad infinitum.

But where would the stopping be done? A Seq that produces Nils ad infinitum doesn't result in this behaviour when I imagine it in my head:

    say <a b c>[1..*]; # OUTPUT: «(b c)»

what it would entail to make a complete implementation

Keep in mind that on top of implementing anything, we also need to maintain compatibility with the language specification. We have some wiggle room with changes affecting new language versions only, but something big might be tough to fit in.

@lizmat
Contributor

lizmat commented Dec 19, 2017 via email

@zoffixznet
Contributor

Well, that would be a special case anyway, as that's a Range, and we know through and through how they work.

Ok. I still don't get it, but I'll trust your judgement.

@jstuder-gh
Contributor

jstuder-gh commented Dec 23, 2017

I've modified the experiment above to address some of the issues surrounding it, and as a result I think it could be a workable solution. I've changed the name of the new "null" value to "inactive" to separate it from deletion a bit, but the name could still be better. Unoccupied, Unassigned, Unpopulated, maybe?

Moar commit:
jstuder-gh/MoarVM@e230941

NQP commit:
jstuder-gh/nqp@b507a80

Rakudo commit:
jstuder-gh@0f6f2c7

If I understand the PR correctly, this would make a difference in the value of @a[0] between:

my @a; @a[1] = 42;

and:

my @a = 666,42; @a[0]:delete;

If so, this will only make things more complicated.

I modified Moar so that, when zeroing the memory allocated for slots in the array, it assigns pointers to the VMInactive instance as opposed to null pointers. Since Moar's array data structure also includes the number of elems (from the HLL perspective), even if all the newly allocated slots are set to VMInactive, VMNull is still returned when the index sent to atpos is greater than the number of elems. This way the two examples above are internally consistent.

my @a;
@a[1] = 42;

use nqp;
say nqp::isinactive( nqp::atpos( nqp::getattr(@a, List, '$!reified'), 0 ) );
# OUTPUT: 1

If we have two kinds of nulls, you have 2 things to check for.

I've modified the MVM_is_null function (which is used in all null-checking ops in Moar) so that VMInactive is also considered null. This way, all the ops (from Moar all the way on up to Rakudo) do not need to know about VMInactive and do not need to change.

Only in the context of slicing, where the distinction is important, do we need to modify anything to distinguish between null and inactive (via an ACTIVE-POS method). EXISTS-POS and AT-POS work as they did before, with no change to their Rakudo implementation.

with [^10] { .[0, 3, 9]:delete; with .Seq { .[lazy 0..100] } }
# OUTPUT: ((Any) 1 2 (Any) 4 5 6 7 8)

One caveat with this development is that it does not work reliably with Spesh right now (presumably due to the setting of newly allocated memory to VMInactive), and running the spectest fails. However, with the MVM_SPESH_DISABLE env var set, it works well. If we could get this working with Spesh, I think it would be a good solution, personally.

Let me know what you think.

@lizmat
Contributor

lizmat commented Dec 24, 2017 via email

@JJ
Collaborator

JJ commented May 6, 2020

Does this maybe need a consensus?
