New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Linked list cursors #2570

Open
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
@4e554c4c

4e554c4c commented Oct 21, 2018

πŸ–Ό Rendered

This is my first RFC, so feel free to critique :)

πŸ“ Summary

Many of the benefits of linked lists rely on the fact that most operations (insert, remove, split, splice etc.) can be performed in constant time once one reaches the desired element. To take advantage of this, a Cursor interface can be created to efficiently edit linked lists. Furthermore, unstable extensions like the IterMut changes will be removed.

The reference implementation is here, feel free to make implementation suggestions. :)

❌ TODO

  • Resolve where split, split_before split lists (this seems to be ambiguous in the specification)
@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Oct 21, 2018

This RFC seems quite vague on the specifics. For example:

  • Do cursors have a lifetime bound to the list - this is necessary for iterators, but isn't mentioned in the RFC, and no lifetimes are present in the struct definitions.
  • Do you expect to be able to have multiple cursors into the same data structure? This is something prevented by the use of mutable cursors and lifetimes, but also one of the ways in which cursors could be most useful.

This seems like the kind of thing that might be better implemented as a crate first - given the amount of new API surface being added, I don't see how it could ever be accepted without at least a sample implementation.

@Centril Centril added the T-libs label Oct 21, 2018

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 21, 2018

@Diggsey good points
Yeah, these would have to have lifetimes bound to the list. It should be possible to have multiple immutable cursors, but having multiple mutable cursors seems much more difficult (this is why I have it so that a mutable cursor can be used to create an immutable one).

I'll work on a reference implementation so this can be accepted

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 22, 2018

Currently working on the implementation. All is done except for splitting lists. See here: https://github.com/4e554c4c/list_cursors

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 22, 2018

Perhaps this could be an eRFC to allow greater experimentation? I would really like to play around with your implementation on nightly before deciding if we want to keep it as is, modify it, or go a different route.

@sfackler

This comment has been minimized.

Member

sfackler commented Oct 22, 2018

@mark-i-m the API would land unstable regardless of the e-ness of the RFC.

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 22, 2018

@sfackler Yes, but the e-ness allows us to proceed without as clear of a vision on what API we will eventually adopt. In contrast, IIUC, if we accept a normal RFC for this, we are saying that we are reasonably confident in this approach. I think the approach has merit and is worth trying out, but I can't personally support strongly accepting the RFC because I simply don't have any experience with such an API.

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 22, 2018

@Centril

This comment has been minimized.

Contributor

Centril commented Oct 22, 2018

@4e554c4c put an "e" before "RFC" ^.^ There's not much difference other than a change in intent.

@4e554c4c 4e554c4c changed the title from RFC: Linked list cursors to eRFC: Linked list cursors Oct 22, 2018

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 22, 2018

I have edited the title to show that this is an eRFC. The API is not finalized and possibly incomplete and I am taking suggestions.

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Oct 24, 2018

The cursor API that you propose seems to be almost identical to the one that I implemented a while ago for intrusive-collections: Cursor, CursotMut and general docs. I have used this API in some of my personal projects and it has felt very ergonomic. I am not sure why this RFC has chosen to rename some of the methods.

One thing to note in particular as to how cursors work in my crate:

A cursor views an intrusive collection as a circular list, with a special null object between the last and first elements of the collection. A cursor will either point to a valid object in the collection or to this special null object.

Another difference is that I have front/front_mut/back/back_mut methods in LinkedList which return a cursor to the first/last element of the list (or the null element if the list is empty), while cursor and cursor_mut always return a cursor pointing to the null element.

Regarding IterMut vs CursorMut, I will repeat what I said in the other thread: these two types work in fundamentally different ways (a DoubleEndedIterator acts as a deque which can be popped from both ends) and serve completely different purposes (an iterator is for ... iterating, a cursor is for modifying the list).

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 25, 2018

@Amanieu thank you for commenting!
Renaming methods to make things more consistent ("before"->"prev") would probably be a good idea. Also I like your implementation's remove, this may benefit from going that route instead of pop (although doing both works too).
front/front_mut/etc. I decided against because these methods already exist in LinkedList to refer to elements. I haven't decided against adding them, I'm just not sure what the best way to go about it would be.

@xaberus

This comment has been minimized.

xaberus commented Oct 25, 2018

If I am not mistaken, the proposed API is not safe, i.e. current() can be used to create dangling references.

let mut c = list.cursor_mut();
c.move_next();
// here we create a mutable  reference to an element
let u = c.current();
c.move_prev();
// drop the element
drop(c.pop());
// here we drop this element
let mut dangle = u.unwrap();
// dereferencing a dangling reference to the now no longer existing element
**dangle = &777;
// use after free confirmed  by valgrind...

(I am still new to rust, so take the following with a grain of salt...)

The reason is that in the code above current() creates a reference that has a lifetime of the cursor ('a) but it should have a much narrower scope/lifetime. The point is, returning a reference to an element must either somehow freeze the cursor or taking a raw reference must be disallowed (like RefCell/RefMut).

Now we have basically created two mutable references to the same data: u and c.current() which is undefined behavior.

I think one way to prevent this is to separate the operation of editing the list and editing its elements, i.e. another Cursor* type... but this makes the interface quite useless, as you than cannot inspect the elements that you are editing.

If I remember correctly this is the reason, we have no similar interface in the std...

N.B. This is a nice reference highlighting some interesting problems.

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 25, 2018

Interesting! I thought that the reference would borrow the cursor (like how IndexMut borrows slices).
I believe this is fixable, we would just have to make it so that the output of current borrows the cursor.

@xaberus

This comment has been minimized.

xaberus commented Oct 25, 2018

@4e554c4c having given it a little more thought I think the prototype for current() should be

pub fn current(&mut self) -> Option<&mut T> {...}
// that is elided to
pub fn current<'b>(&'b mut self) -> Option<&'b mut T> {...}

not

pub fn current(&mut self) -> Option<&'a mut T> {...}

The first gives the reference a lifetime that is smaller than 'a (like a slice gives to a an element reference, 'a: 'b), while the second gives it the lifetime of the list, which is wrong, as the elements should not outlive the list. Hopefully I made no mistake and this makes sense. (Would be nice if somebody with more experience in lifetimes than me could comment on this...)

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 25, 2018

yep! That seems to fix the problem. The lifetime annotation I added seems to have made this worse. I'm going to look at the lifetimes more and update the RFC
Now, the question is: is this behavior valid for cursor?
I think not since this can occur

fn main() {
    let mut list = LinkedList::new();
    let mut c = list.cursor_mut();
    c.insert(3);
    c.insert(2);
    c.insert(1);
    
    let u = {
        let mut c = c.as_cursor();
        c.move_next();
        c.current();
    };
    drop(c.pop());
    // use after free!
    println!("element: {:?}", u);
}
@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 25, 2018

Ok, lifetimes should be added to the RFC and fixed in the reference implementation. Tell me what y'all think.

@xaberus

This comment has been minimized.

xaberus commented Oct 25, 2018

@4e554c4c: I think the example is actually OK. (I think you meant current() not current();, otherwise u would be ())

with the modified signatures

impl<'a, T> Cursor<'a, T> {
    ...
    pub fn current(&self) -> Option<&T> {...}
    ...
}
...
impl<'a, T> CursorMut<'a, T> {
    ...
    pub fn current(&mut self) -> Option<&mut T> {...}
    ...
}

the borrow checker (on nightly) complains about c not living long enough in the let u = expression (and it makes sense to me: Cursor "contains" the element T, by calling current() we get a reference &'b T, a': b' [i.e. as a type 'a is 'b and more, 'b cannot outlive 'a]. When we drop the Cursor no references with lifetime 'b can remain).

@xaberus

This comment has been minimized.

xaberus commented Oct 25, 2018

Ok, lifetimes should be added to the RFC and fixed in the reference implementation. Tell me what y'all think.

If my reading of the nomicon is correct (get_mut example), the explicit lifetime is not necessary in

pub fn as_cursor<'cm>(&'cm self) -> Cursor<'cm, T>;

By elision rules this should be the same as

pub fn as_cursor(&self) -> Cursor<T>;

(clippy nags about it...)

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Oct 25, 2018

To be precise, the methods on Cursor can return &'list T, but the methods on CursorMut must return &'self mut T. This means that if you want to inspect the previous/next value before any modification then you need to call as_cursor first.

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Oct 25, 2018

Actually, this would probably work:

impl<'a, T> Cursor<'a, T> {
    ...
    pub fn current(&self) -> Option<&'a T> {...}
    ...
}
...
impl<'a, T> CursorMut<'a, T> {
    ...
    pub fn current_mut(&mut self) -> Option<&mut T> {...}
    pub fn current(&self) -> Option<&T> {...}
    // Same with peek/peek_mut, etc
    ...
}

The advantage of having Cursor return &'a T instead of &'self T is that the reference remain valid after moving the cursor. This is fine since Cursor can't delete or modify any elements in the list.

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 25, 2018

Good point, I'll leave that one how it was.
Should I introduce methods like current/current_mut for CursorMut? I feel like this is somewhat unneeded since one could always call as_cursor to get immutable access to the element and it would have the same borrow semantics.

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 28, 2018

The reference implementation should be pretty much complete thanks to @xaberus
I've also started a list of what currently needs to get resolved in the RFC.

@xaberus

This comment has been minimized.

xaberus commented Oct 30, 2018

Having tinkered with the reference implementation for a while, here are my impressions and comments so far.

  • I much more prefer the API where the cursors start with the first element, because I constantly forget the initial move_next()/move_prev(). As proposed by @Amanieu, I think head()/head_mut(), tail()/tail_mut() are a better choice.
  • While the wrapping of cursors gives an elegant implementation, the API can unfortunately give unexpected and/or ambiguous results. I started a non-wrapping branch of the reference implementation to test my assumptions. When/if I am happy with the results I will try to write down a pull request for the RFC as time permits.
  • So far, after rewriting parts of reference implementation a couple of times (see the branch above) I ended up with the following simple invariants that I would like to see in the API:
    • inserting an item after the cursor position implies c.peek_after() == Some(e)
    • inserting an item before the cursor position implies c.peek_before() == Some(e)
    • inserting a list before the cursor position implies c.peek_before() == Some(inserted_tail)
    • inserting a list after the cursor position implies c.peek_after() == Some(inserted_head)
    • iterating past the head/tail with move_prev()/move_next() keeps the Cursor one position "before"/"after" the last element, allowing to reverse the iteration direction at any time. Effectively, there are now two empty elements. (This turned out a lot more complicated to implement than I imagined, but is so much harder to abuse.)

I hope to continue this list when I have time to work on it. What are your comments so far?

@4e554c4c

This comment has been minimized.

4e554c4c commented Oct 30, 2018

I think this can work. The main reason that Cursor is wrapping is for efficient movement between the front and back of the list, but head()/tail() could do this just as well.

If the cursor doesn't start at the empty element, could we just do away with the empty element altogether? We could have the invariant that the Cursor is always on an element, and remove the Option from the current method. Then head()/tail() could return an Option<Cursor>

The only problem I see with this is that we would need to add some extra method to determine whether the cursor is at the beginning/end of a list, e.g. has_next() or similar

@blankname

This comment has been minimized.

@xaberus

This comment has been minimized.

xaberus commented Oct 31, 2018

@4e554c4c, I am getting quite fond of my non-wrapping branch (read: less objective), but the ability to run past the end seem useful to me. Also, because the pop* methods never consume the cursor the zero element is necessary to drain the list completely.

The branch should be now in a shape, that it can be looked upon without falling apart immediately, so please take a look if and when you have time. I changed the function names a bit to be more consistent, but I will not insist on the naming. From the API side of things, the cursor()/cursor_mut() functions now take a StartPosition parameter (either BeforeHead or AfterTail), that defines which empty element to use. This way head()/tail() are just convenience methods around this interface. Internally, I merged the current pointer, current_len and the zero elements to a Position enum, that implements the propagation methods for all cases - this way code can be shared between Cursor and CursorMut.

I also tried to refactor most common code-paths to reduce the testing burden.

Finally, I added a couple of new tests, that enforce the invariants in the list above.

Btw. I did not intend to hijack your reference implementation, I am sorry for that. Thinking about this RFC I remembered I had an old list implementation (from C times) and thought, that parts of it might be useful for the new age... and I had a lot more fun rewriting it in rust than I should have had.

@sfackler

This comment has been minimized.

Member

sfackler commented Nov 2, 2018

The libs team discussed this RFC and decided to FCP merge as a normal RFC. The API surface area proposed here does not need the procedural overhead of an eRFC. There are some open design questions around the specific semantics (e.g. wrapping vs non-wrapping) but those can be worked out in the tracking issue for the feature.

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

rfcbot commented Nov 2, 2018

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@Centril Centril changed the title from eRFC: Linked list cursors to RFC: Linked list cursors Nov 2, 2018

@xaberus

This comment has been minimized.

xaberus commented Nov 3, 2018

As this is my first collaboration on an RFC, what is the proper way to propose changes to the RFC? Right now a RFC is a pull-request so I cannot have a pull-request to a pull request, can I?

In my non-wrapping branch/playground I ended up with a slightly modified API, that is more symmetric and explicit as to what the individual methods to. In particular:

  • cursor(StartPosition)/head()/tail() for immutable access
  • cursor_mut(StartPosition)/head_mut()/tail_mut() for mutable access
  • peek_after()/peek_before()
  • insert_after(T)/insert_before(T)
  • insert_list_after(LinkedList<T>)/insert_list_before(LinkedList<T>)
  • split_after()/split_before()

with the semantics as written in the referenced gist and invariants as listed in my post above.

@sfackler

This comment has been minimized.

Member

sfackler commented Nov 3, 2018

You could open a PR against @4e554c4c's branch if you wanted, or ask the author to update it themselves.

@4e554c4c

This comment has been minimized.

4e554c4c commented Nov 4, 2018

Thanks for the comments and sorry for not being able to respond earlier.

I'm a big fan of the convenience methods to get a cursor at the beginning or end of a list, but I think the cursor method taking an enum is a bit clunky. IMO We should only do one of them, not both.

Also this may not be a problem, but to me head/tail seem quite synonymous to front/ back for two very different functions. I just wouldn't want some Rust newbie to type list.head() expecting to get a ref to the first element and get some weird error about enabling nightly features.

@xaberus

This comment has been minimized.

xaberus commented Nov 4, 2018

@4e554c4c: Regarding cursor*(), as I have written in my PR the ability to start from an empty element is more or less necessary. The only alternative I can think of is a builder pattern, like:

list.cursor().before_head();
list.cursor().head();
list.cursor().tail();
list.cursor().after_tail();
list.cursor_mut().before_head();
list.cursor_mut().head();
list.cursor_mut().tail();
list.cursor_mut().after_tail();

(I.e. all the cases the API must cover...) To me this feels a lot more verbose than just using an enum. Any other ideas?

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Nov 4, 2018

A builder pattern won't work for mutable vs immutable cursors since they require &mut self and &self respectively. Two separate methods are needed at the very least.

An enum was suggested:

enum Pos {
    BeforeFirst,
    First, // Same as AfterLast if list is empty
    Last, // Same as BeforeFirst if list is empty
    AfterLast,
}
list.cursor(Pos::First);
list.cursor_mut(Pos::AfterLast);

Or a builder:

list.cursor().first()
list.cursor_mut().after_last();
@xaberus

This comment has been minimized.

xaberus commented Nov 4, 2018

@Amanieu:

First, // Same as AfterLast if list is empty
Last, // Same as BeforeFirst if list is empty

This is what I wrote in one of my first attempts, but it is kind of surprising in the case of an empty list:

// cursor_mut(Last)
[] <>
// insert_before(1)
[1] <>
// pop_before() -> Some(1)
[] <> 

v.s.

// cursor_mut(Last)
<> []
// insert_before(1)
[1] <>
// pop_before() -> Some(1)
[] <>

If I request a cursor at the Last position and insert and pop an element before it I expect the cursor to be still at the same position. This does not break the invariant, i.e. insert_before(1) -> pop_before() == Some(1), but the cursor value changes from BeforeFirst to AfterLast, i.e. a roundtrip is not a no-op.

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Nov 4, 2018

@xaberus Note that a wrapping cursor (where BeforeFirst is the same as AfterLast) avoids this problem. This is why I went with that approach in intrusive-collections.

@xaberus

This comment has been minimized.

xaberus commented Nov 4, 2018

@Amanieu I see. In way, this not a real problem: The invariants (sans the last one) enforce a wrapping behavior for inserts which effectively collapses the cursor states for the empty list to None. I found this issue in a test case only because I was inspecting the cursor state directly and not though the API. On the outside there is no visible difference.
I guess I may have found a complicated way to express the same behavior πŸ˜„, but for some reason wrapping cursors instill a feeling of unease in me. For example, if I want to find the nth item in the list in a wrapping implementation, I have to inspect each cursor position for None. If I forget about that I will wrap around and end up somewhere in the middle of the list.

// non-wrapping
let c = list.cursor(BeforeFirst);
for _ in 0..n {
    c.move_next();
}
c.current()

v.s.

// wrapping
let c = list.cursor(BeforeFirst);
c.move_next();
for _ in 0..n {
    if c.current().is_none() {
        return None;
    } else {
        c.move_next();
    }
}
c.current()
@rfcbot

This comment has been minimized.

rfcbot commented Nov 6, 2018

πŸ”” This is now entering its final comment period, as per the review above. πŸ””

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Nov 7, 2018

The insert_list and insert_list_before methods have O(1) time complexity right ?

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Nov 7, 2018

I would prefer if some changes were made to the method names. In particular, I think that every method name should explicitly specify the direction in which the operation is performed (forwards or backwards).

  • peek/peek_before => peek_next/peek_prev
  • insert/insert_before => insert_after/insert_before
  • pop/pop_before => remove
    • pop is a bad name since it only applies at the head or tail or a list. Also the most common operation that you would want is to remove the current element, not the next/previous one. The cursor would then be advanced to the next element after the removal.
  • insert_list/insert_list_before => splice_before/splice_after
    • Splicing is a standard term in linked list operations.
  • split/split_before => split_after/split_before
@nugend

This comment has been minimized.

nugend commented Nov 7, 2018

I'm just driving by, but @Amanieu has got it right. This is not a clunky interface at all as long as the meaning of those API methods are really clear and unambiguous about which elements are being affected.

This RFC actually strongly reminds me of the more general Zipper concept that Haskell has explored at length (and linked lists are just degenerate trees). I suspect that's not really pertinent to the matter at hand, but I think it's interesting.

@glaebhoerl

This comment has been minimized.

Contributor

glaebhoerl commented Nov 7, 2018

@Amanieu Aiming for symmetry with explicitly specified directions sounds good; however, doesn't this bit present a smidgen of discord in the plan?

Also the most common operation that you would want is to remove the current element, not the next/previous one. The cursor would then be advanced to the next element after the removal.

Why to the next element rather than the previous? Why aren't there then two versions of it, with explicitly specified directions, like the others? Of course as a matter of pragmatics, you more often want to move forwards rather than backwards, but if we're choosing not to privilege the forwards direction in the rest of the API...

@xaberus

This comment has been minimized.

xaberus commented Nov 7, 2018

@glaebhoerl: I second that. Removing an item usually can be decided by a predicate f(&T)->bool. For this

let mut c = list.cursor_mut(BeforeHead);
while c.peek_next().is_some() {
    if let Some(true) = c.peek_next().map(f) {
        c.pop_next();
    } else {
        c.move_next();
    }
}

sounds like a reasonable interface that is symmetric by replacing next by prev. Moreover, this makes the action of removing an item explicit, no need to look up in the reference that the cursor will move in some arbitrary direction.

@eaglgenes101

This comment has been minimized.

eaglgenes101 commented Nov 9, 2018

I think that it might be possible to allow for multiple mutable cursors, safely, with a method that splits out two bounded cursors. These bounded cursors will then recognize the position at which the cursor was split, and if the linked list element pointed to is that node, following that pointer through the bounded cursor API instead goes to that bounded cursor's sentinel, and looping back around from the far end will go to the node adjacent to the split-at node on the bounded cursor's side instead of to the other end.

Splitting at the sentinel doesn't make sense, so if a split is attempted at the sentinel, then Option::None is returned instead, hence the Option in the return value.

pub fn split_at_mut<'cm>(&'cm mut self) -> Option<(BoundedCursorMut<'cm, T>, &'cm mut T, BoundedCursorMut<'cm, T>)>;

(The node in the middle where the list is split is a no man's land for both bounded cursors, so that if the resulting bounded cursors are used to manipulate the nodes bordering the split, they don't have to change the pointers on the other side's nodes to retain linked list consistency, instead only having to change the middle node's pointer pointing towards their side.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment