Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std/lists: support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvements #303

Open
timotheecour opened this issue Dec 17, 2020 · 31 comments

Comments

@timotheecour
Copy link
Member

timotheecour commented Dec 17, 2020

current implementation in nim-lang/Nim#16362 is problematic because:

I've just realized that this operation is not really meaningful for doubly linked lists, as it will always leave one in an inconsistent state.

But actually it turns out there's a similar problem with SinglyLinkedList: b is left in inconsistent state after being added twice, and other problems below:

when defined case1:
  import lists, sequtils
  var a = [0,1].toSinglyLinkedList
  a.add a
  echo a # hangs, since you have a cycle

when defined case2:
  import lists, sequtils
  var a = [0,1].toSinglyLinkedList
  var b = [2,3].toSinglyLinkedList
  a.add b
  a.add b
  echo b # would hang: b is left in inconsistent state after being added twice
  # echo a # would hang

proposal

This proposal involves adding a flag to indicate that a list is cycle-free assuming user sticks add, prepend + other APIS in lists.nim; overhead is negligible.

The proposal is to support destructive add and prepend for both SinglyLinkedList and DoublyLinkedList as follows:

  • add a field owned: bool in SinglyLinkedList
  • implement add as follows:
proc add*(a, b: var SinglyLinkedList) =
  ## b degrades to be "owned"
  addImpl(a, b) # implementation from https://github.com/nim-lang/Nim/pull/16362
  if b.owned: a.owned = true else: b.owned = true
  • implement prepend in analogous way
    note that prepend is still needed even if we have add because prepend(a,b) will set b.owned but add(a,b) will set b.owned), so you can't simply implement prepend(a,b) in terms of add as that would "own" a

  • implement $ and items as follows:

proc items*(a: SinglyLinkedList): auto =
  if a.owned: # use a HashSet to avoid looping in potential cycles
  else: oldItems() # no cycles here at least not as created by correct usage of lists.nim apis

then this will make $, and toSeq correct instead of infinite loop on cycles

Note: the owned flag is added to SinglyLinkedList, not SinglyLinkedNodeObj, so it doesn't asymptotically increase memory for normal use (few lists, lots of nodes)

  • and everything above done similarly with DoublyLinkedList
@Araq
Copy link
Member

Araq commented Dec 17, 2020

While I can imagine your solution to work, I think it's too complex and I am not aware of other list implementations that use your solution.

See also: https://doc.rust-lang.org/std/collections/struct.LinkedList.html#method.append

This reuses all the nodes from other and moves them into self. After this operation, other becomes empty.
This operation should compute in O(1) time and O(1) memory.

@timotheecour
Copy link
Member Author

timotheecour commented Dec 17, 2020

that's almost the same approach, in fact that was my original idea but then i figured keeping b (but flagging it as "owned" (or "consumed")) is a bit more flexible. In either case you have a destructive add/prepend, but with "owned" you can still peek at the data that was appended.

Either way, either using an "owned" flag or destroying b would be a net improvement over nim-lang/Nim#16362

(thanks for the link btw)

@salvipeter
Copy link

This won't solve the problem of DoublyLinkedLists, because if you a.add(b) and then c.add(b), the first node of b cannot refer to the last node of a and c at the same time.

In my opinion, if someone uses list concatenation reasonably, these cases do not occur. If you create a cycle, you get an infinite loop, I think of that as a "fact of life" :) Adding a dirty bit, like the one proposed here, will result in many extra computations, and won't save the user if he does something like a.tail.next = a.head.
But this is a good reason to add an outplace operator, as well.

Also, e.g. Common Lisp has a special variable that determines whether cycles are detected or not when printing. We could at least give a predicate to decide if a given list is circular.

@salvipeter
Copy link

I like the Rust solution, because it's so simple, although takes a bit of freedom away (sometimes it is good to have shared structures).

@timotheecour
Copy link
Member Author

timotheecour commented Dec 17, 2020

This won't solve the problem of DoublyLinkedLists, because if you a.add(b) and then c.add(b), the first node of b cannot refer to the last node of a and c at the same time.

after a.add b, b is marked as owned, then when you call c.add(b) c will also be marked as owned according to what i wrote above: if b.owned: a.owned = true else: b.owned = true, to indicate c is in an invalid state (ie that may contain cycles); this may or may not be what you want, but at least it's self consistent.

If destroying b instead of using an owned flag (ie the rust approach), the semantics are instead making c.add(b) a noop since b was set to empty after a.add(b); it's different, but also self consistent.

note: special case of a.add a or similar

in rust:

fn main() {
  use std::collections::LinkedList;
  let mut list1 = LinkedList::new();
  list1.push_back('a');
  list1.push_back('b');
  println!("{:?}", list1);
  list1.append(&mut list1); // error
}

you (correctly) get an error:

error[E0499]: cannot borrow `list1` as mutable more than once at a time
 --> /Users/timothee/git_clone//nim//timn//tests/nim/all/t11524b.rs:8:3
  |
8 |   list1.append(&mut list1);
  |   ^^^^^^------^----------^
  |   |     |      |
  |   |     |      first mutable borrow occurs here
  |   |     first borrow later used by call
  |   second mutable borrow occurs here

The question is, how do we do the same for nim (in a robust way, not just this special case), with same O(1) complexity? nim-lang/Nim#14976 would work but is there something simpler?

var a = [0,1].toSinglyLinkedList
var b = [2,3].toSinglyLinkedList
a.add b
assert a.toSeq == [0,1,2,3]
assert b.toSeq == [] # with the rust behavior, ie no owned flag, just destroy b
a.add a  # this should give an error

less obvious example:

when true:
  import lists, sequtils
  var a = initDoublyLinkedList[int]()
  var n = newDoublyLinkedNode[int](0)
  a.append n
  a.append 1
  a.append 2
  var b = a
  b.remove(n)
  echo a
  echo b
  a.add b # would create a cycle

(note that with the "owned" flag, this would work (after a.add a, a would automatically be marked as owned))

remove can also create cycles:

this example uses a DoublyLinkedRing instead

  var a = initDoublyLinkedRing[int]()
  var n = newDoublyLinkedNode[int](0)
  a.append n
  a.append 1
  a.append 2
  var b = a
  b.remove(n)
  echo b
  echo a # hangs

@salvipeter
Copy link

I am not convinced about the owned solution for doubly linked lists. In the above example, the owned flag will be consistent, but the whole list will not be, because the invariant b.head.prev == nil is not satisfied.

As for handling a.add(a), I think that is also a case when the user should know better...

If someone would like to uses lists in a FP-like manner, he would need something like first and rest:

func first*[T](L: SinglyLinkedList[T]): T = L.head.value
func rest*[T](L: SinglyLinkedList[T]): SinglyLinkedList[T] =
  result = initSinglyLinkedList[T]()
  result.head = L.head.next
  result.tail = L.head.next

But then a.rest would also share structure with a, leading to similar cases. I think the solution is to don't treat these as problems.

@Araq
Copy link
Member

Araq commented Dec 17, 2020

We should follow Rust and move the elements.

@timotheecour
Copy link
Member Author

We should follow Rust and move the elements.

indeed.

We also need a copy proc for a shallow copy of the list, for non-destructive append/prepend, eg:

var a = [0,1].toSinglyLinkedList
var b = [2,3].toSinglyLinkedList
# a.add b # ok but destructive for b
a.add b.copy # not destructive for b
assert a.toSeq == [0,1,2,3]
assert b.toSeq == [2,3]
a.add a.copy

note: copy is intentionally not deepCopy here. It's O(n) where n = num elements of the list

example:

type Foo = ref object
  x: int
var f = Foo(x: 1)
var a = [f, f].toSinglyLinkedList
var b = [f, f].toSinglyLinkedList
a.add b.copy
assert b.head.value == f

@salvipeter
Copy link

Implemented the move semantics, and added a shallow copy as well (both for both singly- and doubly linked lists).
One thing that bothers me is that with the current implementation I can't write a.add(b.copy), because b.copy is immutable, I need to add another variable explicitly:

var
  a = [0, 1].toSinglyLinkedList
  b = [2, 3].toSinglyLinkedList
  c = b.copy
a.add(c)

Is there a workaround for this?

@timotheecour
Copy link
Member Author

timotheecour commented Dec 17, 2020

how about:

proc addMove[T: SinglyLinkedList | DoublyLinkedList](a: var T, b: var T) = ...
proc add[T: SinglyLinkedList | DoublyLinkedList](a: var T, b: T) =
  var b2 = b.copy
  addMove(a, b2)

ditto with prepend, prependMove

it might actually be better, because in rest of stdlib, add is not destructive.

bikeshedding: addMove or addFrom

@Araq
Copy link
Member

Araq commented Dec 17, 2020

Is there a workaround for this?

Use a sink parameter. And maybe overwrite the = operator for linked lists.

@salvipeter
Copy link

Use a sink parameter.

That's nice! Haven't heard of these before. So if the second list is a sink parameter, then even though add is implemented as a move, it is actually a copy when the list passed as second parameter is used afterwards. (Or, to be exact, when it cannot be proved that it is not used afterwards, right?)

I like this, but I don't really like the idea that this operation is O(1) or O(n) based on what the compiler thinks, unknown to the user.

@Araq
Copy link
Member

Araq commented Dec 17, 2020

You can make the = operator for lists an .error and then the copies cannot be implicit.

@salvipeter
Copy link

That sounds like a good solution, but when I actually tried it out at this simple example:

import lists
var a = [1, 2, 3].toSinglyLinkedList
let b = [4, 5].toSinglyLinkedList
a.add b.copy
assert $a == "[1, 2, 3, 4, 5]"
assert $b == "[4, 5]"
a.add b
assert $a == "[1, 2, 3, 4, 5, 4, 5]" # b is inaccessable

... I got an error for the a.add b line, saying that it's not the last read of b... (this was my whole file)

@timotheecour
Copy link
Member Author

timotheecour commented Dec 17, 2020

You can make the = operator for lists an .error and then the copies cannot be implicit.

but that would prevent valid use cases, and be a breaking change (not necessarily in the right direction).

import std/lists
type Foo = object
  list: DoublyLinkedList[int]
proc main()=
  block:
    var a = initDoublyLinkedList[int]()
    let b = a # was legal
  block:
    let a = Foo(list: initDoublyLinkedList[int]())
    let b = a.list # was legal
main()

(and {.byaddr.} still runs into the famous template bug nim-lang/Nim#15920 so isn't always a workaround)

I do prefer the terminology addMove for the overload without copy because rest of stdlib assumes an add doesn't consume the RHS

I don't really like the idea that this operation is O(1) or O(n) based on what the compiler thinks, unknown to the user.

me neither, it's error prone

@Araq
Copy link
Member

Araq commented Dec 17, 2020

... I got an error for the a.add b line, saying that it's not the last read of b... (this was my whole file)

Because b is a global variable, the compiler doesn't try too hard to reason about global variables.

@Araq
Copy link
Member

Araq commented Dec 17, 2020

I do prefer the terminology addMove for the overload without copy because rest of stdlib assumes an add doesn't consume the RHS

That's true. But addMove isn't the best name. Maybe, addConsumed?

@salvipeter
Copy link

I have put it in as addMove for the time being... maybe addMoved is the best of both worlds?

@timotheecour
Copy link
Member Author

timotheecour commented Dec 17, 2020

addMoved is a bit shorter; any name that works well with both add and prepend is fine.
so this would be:

  • addMoved
  • add (calls copy + addMoved)
  • prependMoved
  • prepend (calls copy + prependMoved)

@salvipeter
Copy link

salvipeter commented Dec 17, 2020

Do we need prepend[Moved]? Its naming suggests that the alternative is append[Moved].
And a.prependMove b is essentially just

b.addMove a
swap a, b

(or a = b.addMove a, if you don't care about b being empty afterwards), while a.prepend b is

a = b.copy.addMove a

@timotheecour
Copy link
Member Author

timotheecour commented Dec 18, 2020

add is better than append, even if we later add prepend:

with add: this works out of the box with procs assuming some add, eg collect:

import lists, sugar
# it works, I tried it after s/append/add/ in your PR
let b = collect(initSinglyLinkedList): for i in 0..<3: i

the classical "works better in generic code" is a real argument ... and it doesn't mean that "everything compiles", just the stuff that makes sense ;-).

my suggestion is:

  • don't modify pre-existing append in this PR but use add for your new procs in your PR
  • in future PR, change pre-existing append to add in lists.nim and add:
template append*[T: DoublyLinkedList or SinglyLinkedList](a: T, b: T) {.deprecated: "use add".} = add(a,b)
template append*[T](a: DoublyLinkedList[T] or SinglyLinkedList[T], b: T) {.deprecated: "use add".} = add(a,b)

(I also tried, seems to work; whereas {.deprecated: [append: add].} hits another bug that really should be fixed)

@salvipeter
Copy link

I was not really reasoning for append, more like reasoning against prepend... given that we have add, prepending is a trivial issue, as shown above. I don't consider prepending a basic operation.

@timotheecour
Copy link
Member Author

arguments for prepend(a: list,b: list):

b.addMove a
swap a, b

in any case this can be settled after your PR since it's orthogonal

@Araq
Copy link
Member

Araq commented Dec 18, 2020

the classical "works better in generic code" is a real argument ... and it doesn't mean that "everything compiles", just the stuff that makes sense ;-).

It is a real argument in this case because you gave a real example with collect that is convincing.

@timotheecour
Copy link
Member Author

@salvipeter after nim-lang/Nim#16362 (which I just LGTM'd) is merged, do you have interest in writing another PR for the following:

@timotheecour timotheecour changed the title support destructive add/prepend for SinglyLinkedList, DoublyLinkedList support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvement Dec 19, 2020
@timotheecour timotheecour changed the title support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvement support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvements Dec 19, 2020
@salvipeter
Copy link

I've created a new PR.

@timotheecour
Copy link
Member Author

timotheecour commented Jan 1, 2021

  • this gives SIGSEGV:
when true:
  import lists
  import std/enumerate
  var a = [10,11,12,13].toDoublyLinkedList
  echo a
  a.remove(nil)
  echo a

instead, we should use assert n!=nil (and not doAssert)

@timotheecour
Copy link
Member Author

  • remove(a; DoublyLinkedList, n) behavior when n notin a is error-prone; I don't think it's possible to fix (because O(1) behavior should be kept) but it should be documented (it's not a "noop")
when true:
  import lists
  block:
    var a = [10,11,12,13].toDoublyLinkedList
    var b = [20,21,22,23].toDoublyLinkedList
    a.remove(b.head)
    echo a # [10, 11, 12, 13]
    echo b # [20, 21, 22, 23]
  block:
    var a = [10,11,12,13].toDoublyLinkedList
    var b = [20,21,22,23].toDoublyLinkedList
    a.remove(b.head.next)
    echo a # [10, 11, 12, 13]
    echo b # [20, 22, 23]

@timotheecour
Copy link
Member Author

  • add(a: SinglyLinkedList, n: Node) behavior when n in a is bad, it should be as follows:
when true:
  import lists
  var a = [10,11,12].toSinglyLinkedList
  a.add a.head.next
  echo take(a, 6)
current output: @[10, 11]
expected output: @[10, 11, 12, 11, 12, 11]  # segment followed by cycle

this would be consistent with behavior for a.add a which creates a cycle

(take is defined here: timotheecour/Nim#504)

@timotheecour
Copy link
Member Author

  • add(a: DoublyLinkedList, n: Node) behavior when n in a is bad, it should be as follows:
when true:
  import lists
  var a = [10,11,12].toDoublyLinkedList
  a.add a.head.next
  echo take(a, 6)
current output: @[10, 11]
expected output: raise `ValueError`

unlike with SinglyLinkedList (#303 (comment)), there's no way to have segment followed by cycle because pred(11) would need to have 2 values (12 and 10). So the simplest is to raise an error when n is such that n.prev != nil

If user wants to add a n with n.prev!=nil, we can discuss that; one possibility is to call split (refs nim-lang/Nim#16536 (comment)) or setting n.prev = nil first.

(take is defined here: timotheecour/Nim#504)

@timotheecour timotheecour changed the title support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvements std/lists: support destructive add/prepend for SinglyLinkedList, DoublyLinkedList + other lists improvements Jan 6, 2021
@timotheecour
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants