Lazy slicing should not stop on the first hole, or something (say @a[lazy 3..*]) #1268
ping @lizmat @zoffixznet |
It was intentional. The change was anticipated and I used it as the argument against the optimization, but you said the stopping-at-hole was a bug (there's probably an RT for that). I agree, (IMO the whole holes-in-arrays business should not exist at all and |
Ah, I should've read more as I'm starting to forget things. RT #127573 seems to be identical to this ticket (RT ticket now closed in favor of this issue). |
The tricky thing, I think, is that the various slicing operations are implemented in terms of methods on the target to be indexed, such that implementing the various methods is enough to get all of the slicing behavior. We can't just call …, and I don't know of any alternative options available with the current API. So unless I've missed one, I think our options are:
|
|
@W4anD0eR96 no, not really. |
@AlexDaniel Yes.
I used to think |
From what I gather (and correct me if I am wrong on this), the behavior exhibited in the infinite Range optimization is correct as per the specification, and it is the usage of infinite Lists and Seqs as indices that is incorrect. The existing EXISTS-POS implementation doesn't distinguish between deleted values (causing a gap in the list) and the undefined values past the end of the collection, which is causing this confusion. As a stop-gap measure, during the reification process, couldn't we continue eagerization if the reified index is less than the number of elems in the collection, even if an element evaluates to undefined? I made a commit in my fork (8c0a665) that does what I'm talking about. It feels a little bit hacky, but it appears to work, at least in the small amount of testing I've done with it. |
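For illustration, the stop-gap described above might look roughly like this (a hypothetical sketch; `keep-eagerizing` is a made-up name, not the actual eagerize callable in Rakudo's array_slice):

```raku
# Hypothetical sketch: continue eagerizing past a hole as long as
# the index is still within the reified part of the collection.
sub keep-eagerizing(\target, Int:D $idx) {
    target.EXISTS-POS($idx)      # a real, existing element
      || $idx < target.elems     # a hole inside the bounds: keep going
}
```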
Yes, that's correct.
Does to me too. It's going to stop at the end for non-lazy lists, yet stop at the hole on anything lazy. IMO that just moves the goalpost instead of fixing the issue completely. We could make it die like in this case:
But is that better than stopping at the hole?
👍 on that. This way we can effectively postpone the API extension until some later language version (and the future where we're more performant). We would also need to revert 456358e
No, here it's a Range object, and it's the laziness that takes care of reifying only until the |
This is not an issue that was introduced with this commit. I do think that the error message could probably be a bit more explicit, however, in stating that relative indexing on a lazy Positional will not work.
That is very true. I agree that the most consistent thing to do would be to revert 456358e and document it as stopping at the first undefined. I just wonder if that isn't potentially introducing a problem down the road, if the intention is for the behavior to change. By implementing some sort of stop-gap solution now, there would be fewer potential surprises later, when the API extension is implemented, for people who have come to rely on iteration halting at the first undefined gap. Has gaps:
Stops at first undefined:
Given the above list, users would only potentially be surprised when the behavior of num 3 changes. In that case, I think even an imperfect implementation that gets us mostly correct functionality is preferable until it can be solved properly and completely via an EXISTS-POS (or similar) update. I suppose it's possible that we could have it stop at the first undefined by default, but introduce an adverb in the future that would return all elements until end of collection. Just a thought.
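As a hypothetical illustration of the distinction being drawn (the exact output depends on the Rakudo version, which is the subject of this thread):

```raku
my @a = <a b c d>;
@a[1]:delete;          # leaves a hole at index 1
say @a[lazy 0..*];     # pre-456358e: runs to the end; post-456358e: stops at the hole
```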
|
Just a quick note on:
This should be "stops at the first element that doesn't exist, because it was deleted". An element that exists can contain a value for which |
Yeah, you're right. I was conflating existence and definedness a bit. I don't want to stray too far off the topic, but I just want to be sure that my definitions are correct... Existence is the presence of a value bound at a position among a List's internal reified elements. This is, of course, different from definedness. An undefined value can be bound at a position (e.g. a Scalar containing the type object Any), thus undefined but existing. It just so happens that undefined values are returned when querying a position that lacks existence. But this operation differs greatly from querying a position where an undefined value was assigned (and thus exists), despite appearing the same at surface level. Correct? |
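The existence/definedness distinction can be seen directly in Raku (a small demo of the behavior described above):

```raku
my @a;
@a[2] = Any;            # assign an undefined value: the element now exists
say @a[2]:exists;       # True  — bound, though undefined
say @a[2].defined;      # False — the value itself is a type object
say @a[1]:exists;       # False — never assigned, lacks existence
say @a[1];              # (Any) — an undefined value is returned for the non-existent slot
```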
On 29 Nov 2017, at 05:53, Jeremy Studer ***@***.***> wrote:
Yeah, you're right. I was conflating existence and definedness a bit.
I don't want to stray too far off the topic, but I just want to be sure that my definitions are correct...
Existence is the presence of a value bound at a position among a List's internal reified elements.
When an element is deleted, nqp::null is bound to the position among the reified elements (accurate to say it is dereified?).
Never seen it like that, but it feels like a nice description.
This is, of course different from definedness. An undefined value can be bound at a position (eg. Scalar containing type object Any), thus undefined but existing.
It just so happens that undefined values are returned when querying at a position that lacks existence. But this operation differs greatly from querying at a position where an undefined value was assigned (thus exists), despite appearing the same at a surface-level.
It also depends on whether it is a List or an Array. If it is an Array, an unreified element should return a container that will be bound to the array as soon as it is assigned to:
$ 6 'my @A; @A[3] = 42; my $b := @A[2]; say @A[2]:exists; $b = 42; say @A[2]:exists'
False
True
If it is a List, there are no containers, and you get:
$ 6 'my @A; @A[3] = 42; my $l = List.new(@A); my $b := $l[2]; say $l[2]:exists; $b = 42'
False
Cannot assign to an immutable value
in block <unit> at -e line 1
Correct?
Yup, pretty much I think.
Liz
|
This commit fixes the issue in which using lazy Iterables as Positional indices will evaluate until the first deleted (undefined) value but not continue further. Within the default eagerize routine, it checks the number of reified elements upon encountering an undefined value to determine whether to continue. Addresses [Issue rakudo#1268](rakudo#1268)
I've submitted a pull request with a commit that I feel addresses this issue in a pretty reasonable manner. It's pretty similar to the commit I had posted earlier in this chain, except that:
|
Hey Rakudo devs, I'd like some advice on whether this seems like a valid approach to addressing this problem. After a previous attempt to solve this issue while working around the problem (see here), I figured I would experiment with a more core differentiation of deleted elements from those that simply have not been reified yet. Note that I'm not submitting this; it's just a test on my forks.

I added a deleted representation to Moar that is essentially the same as MVMNull, but distinct from it. Deleted can be bound to a position on a call to DELETE-POS, and subsequent tests of existence would then return true. MoarVM commit: NQP commit: Rakudo commit:

This requires that participating DELETE-POS methods bind nqp::deleted, and that AT-POS and similar methods are aware of deleted so that they return their default Scalar when it is encountered. Given these changes, the existing "eagerize" callable in array_slice no longer stops at the first hole during "eagerization", as EXISTS-POS explicitly checks for null and not deleted.

Of course, the big problem with this is that the redefinition of deletion and existence causes spec tests that delete an element and call :exists on it to fail, since deleted elements no longer lack existence as expected. I was thinking that instead, we could keep the null and deleted ops (as this experiment does), modify the existspos op so that both null and deleted lack existence, and add a new operation that distinguishes between the two, to be used in the process of "eagerization". Deleted items would still lack existence, but could be distinguished from nulls when necessary.
I suppose a potential problem with this is that the IN-BOUNDS-POS method (whatever we could call it) is not documented as important to the Positional interface. Current documentation states you should implement the AT-POS and EXISTS-POS methods for Positionals, with other methods being optional. Would it make sense to add IN-BOUNDS-POS as one of the methods to implement? Another possibility is that we use the new method in conjunction with the existing EXISTS-POS and use it as an optional fallback.
This way, IN-BOUNDS-POS could be implemented on a few Positionals that wish to differentiate between null and deleted (such as Array and List) but not required in order for a Positional to be functional. Does this seem like a valid approach to the issue? |
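A rough sketch of what such an optional method could look like on Array (IN-BOUNDS-POS is the hypothetical name proposed above, not an existing Rakudo method):

```raku
# Hypothetical: true while $pos falls within the reified elements,
# even if the element at $pos was deleted (i.e. is a hole).
method IN-BOUNDS-POS(Array:D: Int:D $pos --> Bool:D) {
    $pos < self.elems
}
```

Eagerization could then keep reifying while IN-BOUNDS-POS is true, and fall back to EXISTS-POS for Positionals that don't implement it.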
MoarVM stuff is above my head, so I can't comment on that, but having a "two kinds of nulls" thing is a nice way to solve this problem IMO. 👍 |
On 19 Dec 2017, at 12:21, Zoffix Znet ***@***.***> wrote:
MoarVM stuff is above my head, so I can't comment on that, but having a "two kinds of nulls" thing is a nice way to solve this problem IMO. 👍
I’m afraid I have to disagree.
If we have two kinds of nulls, you have 2 things to check for.
If I understand the PR correctly, this would make a difference of the value in @A[0] between:
my @A; @A[1] = 42;
and:
my @A = 666,42; @A[0]:delete;
If so, this will only make things more complicated.
FWIW, I think all slicing should be overhauled completely and always generate Seqs rather than Lists. In such an overhaul, the “stopping at null for lazy iterators” issue would be fixed by just generating Nils ad infinitum.
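A gather/take sketch of that Seq idea (illustrative only, with a made-up name; not how Rakudo actually implements slicing):

```raku
# Hypothetical: a slice that returns a lazy Seq, yielding Nil for holes
# instead of stopping at them.
sub seq-slice(\target, @indices) {
    gather for @indices -> $i {
        take target.EXISTS-POS($i) ?? target.AT-POS($i) !! Nil;
    }
}
```

Since the Seq is lazy, nothing beyond what the consumer requests is ever produced; where such a slice would stop for a non-lazy target is the question raised below.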
|
You're right lizmat, those two statements would be different. Hmm, this lazy slicing is a tricky beast. However, this process has taught me a lot about Rakudo and a bit about NQP and Moar, so it's been valuable. I believe in Perl 6 and Rakudo and am looking to help however I can, so feel free to nudge me in the right direction if you feel my energies are best spent elsewhere. Having slicing return Seqs is an interesting idea. I don't know exactly what it would entail to make a complete implementation, but I'm willing to take a closer look into it if everyone thinks it's worth pursuing. |
But where would the stopping be done? A Seq that produces Nils ad infinitum doesn't result in this behaviour when I imagine it in my head:
say <a b c>[1..*]; # OUTPUT: «(b c)»
Keep in mind that on top of implementing anything, we also need to maintain compatibility with the language specification. We have some wiggle room with changes affecting new language versions only, but something big might be tough to fit in. |
On 19 Dec 2017, at 16:15, Zoffix Znet ***@***.***> wrote:
always generate Seq’s rather than Lists. In such an overhaul, the “stopping at null for lazy iterators” would be fixed by just generating Nils ad infinitum.
But where would the stopping be done? A Seq that produces Nils ad infinitum doesn't result in this behaviour when I imagine it in my head:
say <a b c>[1..*]; # OUTPUT: «(b c)»
what it would entail to make a complete implementation
Keep in mind that on top of implementing anything, we also need to maintain compatibility with the language specification. We have some wiggle room with changes affecting new language versions only, but something big might be tough to fit in.
Well, that would be a special case anyway, as that’s a Range, and we know through and through how they work.
|
Ok. I still don't get it, but I'll trust your judgement. |
I've modified the experiment above to address some of the issues surrounding it, and as a result I think it could be a workable solution. I've changed the name of the new "null" value to "inactive" to separate it from deletion a bit, but the name could still be better. Unoccupied, Unassigned, Unpopulated, maybe? Moar commit: NQP commit: Rakudo commit:
I modified Moar so that when zeroing the memory allocated for slots in the array, it assigns pointers to the VMInactive instance as opposed to null pointers. Since Moar's array data structure also includes the number of elems (from the HLL perspective), even if all the newly allocated slots are set to VMInactive, VMNull is returned if the index sent to atpos is greater than the number of elems. This way the two examples above are internally consistent.
I've modified the MVM_is_null function (which is used in all null-checking ops in Moar) so that VMInactive is also considered null. This way, all the ops (from Moar all the way on up to Rakudo) do not need to know about VMInactive and do not need to change. Only in the context of slicing, where the distinction is important, do we need to modify anything to distinguish between null and inactive (via ACTIVE-POS method). EXISTS-POS and AT-POS work as they did before with no change to their Rakudo implementation.
One caveat with this development is that it does not work reliably with Spesh right now (presumably due to the setting of newly allocated memory to VMInactive) and running the spectest fails. However, when run with MVM_SPESH_DISABLE env var set, it works well. If we could get this working with Spesh, I think it would be a good solution, personally. Let me know what you think. |
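For concreteness, the ACTIVE-POS check described here might be sketched as follows (hypothetical: nqp::isinactive is the new op from this experiment and does not exist in stock NQP):

```raku
use nqp;
# Hypothetical: an element is "active" if its slot holds something
# other than the VMInactive sentinel — assigned at some point and not deleted.
method ACTIVE-POS(List:D: int $pos) {
    nqp::not_i(nqp::isinactive(
        nqp::atpos(nqp::getattr(self, List, '$!reified'), $pos)
    ))
}
```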
This is really something jnthn should look at. I like the idea of it being abstracted away as far as nqp ops are concerned, but otoh this provides an extra burden on the backends such as JVM and Javascript.
|
Does this maybe need a consensus? |
After looking at Raku/doc#1681 I noticed this:
Result (2015.09⌁2017.09, 456358e3^):
Result (456358e, 2017.10, 2017.11, HEAD(3166400)):
I don't understand why it should stop at the first hole. The behavior changed after 456358e, and that seems to be unintentional. I'm really confused as to what the right behavior should be, but my gut feeling is that
(d (Any) f)
is the right answer.