NEP 2 Catching up with C++ and Rust: Ownership, destructors, unique pointers

Parashurama edited this page Jan 19, 2017 · 8 revisions

WARNING: Work in progress, this section might contain content that doesn't make any sense yet.

In the C++ model, every object goes through four steps in its life cycle:

  1. Allocation
  2. Initialization
  3. Deinitialization
  4. Deallocation

A new expression performs allocation on the heap followed by initialization, and a delete expression performs deinitialization followed by deallocation. A placement new performs only initialization, and an explicit destructor call (p->~T()) performs only deinitialization. malloc performs only allocation, and free performs only deallocation. The programmer is responsible for calling these symmetrically for every object: every allocation needs its deallocation, and every initialization needs its deinitialization. Any misuse is undefined behaviour. Objects must never be initialized twice, or deallocated without being deinitialized.
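For comparison, the same four steps can be carried out one by one in Nim. This is a minimal sketch assuming Nim 2.x with ARC/ORC, where a =destroy hook takes its parameter as a plain (non-var) T. The Resource type, the released counter and the lifecycle proc are hypothetical, and create/dealloc play the roles of malloc/free.

```nim
# Sketch, assuming Nim 2.x with ARC/ORC (=destroy takes a non-var T).
type
  Resource = object
    data: ptr int            # owned heap cell; nil means "owns nothing"

var released = 0             # how many deinitializations freed something

proc `=destroy`(r: Resource) =
  if r.data != nil:
    dealloc(r.data)
    inc released

proc lifecycle(): int =
  # 1. Allocation: raw, zeroed storage for one Resource (compare malloc/calloc).
  let p = create(Resource)
  # 2. Initialization: the object acquires the resource it will own.
  p.data = create(int)
  p.data[] = 42
  result = p.data[]
  # 3. Deinitialization: release what the object owns (compare p->~T()).
  `=destroy`(p[])
  # 4. Deallocation: return the raw storage; no hook runs here (compare free).
  dealloc(p)

let value = lifecycle()
```

Leaving out step 3 would leak the heap cell, and running it twice would free the cell twice: exactly the asymmetries the C++ model leaves to the programmer.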

The Nim compiler guarantees that every object is initialized by setting its entire memory to zero before use, and it performs no deinitialization at all (unless the experimental pragma is enabled). This works fine for POD (plain old data) types, and even for pointer members as long as those pointers do not express ownership, but it breaks down when an object owns some kind of resource that must be freed once the object is no longer used.
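A small sketch of that zero-initialization guarantee, using a MyType with a hypothetical resources pointer field: a variable without an initializer, the fields an object constructor leaves out, and a fresh result all start out as zero memory, so the pointer field is nil.

```nim
# Sketch: every Nim variable starts out as zeroed memory.
type
  MyType = object
    a, b: int
    resources: pointer       # starts as nil, i.e. "owns nothing"

var v: MyType                # no initializer: all bytes are zero
let w = MyType(a: 1)         # fields not mentioned are zeroed as well

proc fresh(): MyType =
  discard                    # `result` also starts out zeroed
```

Zero memory is a safe starting point for POD data, but nothing in this scheme ever releases what resources points to once it has been set.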

I presume everybody agrees that the language should do everything possible to prevent the four procedures from being called asymmetrically by accident. I am not saying it should be impossible to shoot yourself in the foot, because sometimes people really want to shoot themselves in the foot, and they need to learn that it hurts. In its current state, the Nim compiler with experimental features enabled has some defaults I do not agree with:

type 
  MyType = object
    a,b: int

proc `=destroy`(v: var MyType) =
  echo "destroy ", v

proc main() =
  var v = MyType(a: 1, b:2)
  discard MyType(a: 3, b:4)
  v = MyType(a:5, b:6)
  # let k = MyType(a: 18, b: 19)
  echo "end of main"

main()
echo "end of program"

## output
# end of main
# destroy (a: 5, b: 6)
# end of program

The obvious problem is that the discarded value is never deinitialized. This should not be so easy to do. The less obvious problem is that v is initialized twice, once with the values (1, 2) and once with the values (5, 6), but deinitialized only once.

My suggestion is to deinitialize the left operand of the assignment operator before the assignment takes place. That means v = MyType(a:5, b:6) compiles to something equivalent to this pseudocode: destroy(v); init(v, MyType(...)). This would also spare implementors of the = operator from writing logic that determines whether the left operand is already initialized or not, because the compiler would guarantee that the left operand is not initialized. Today such an implementation looks like this:

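type
  MyType = object
    resources: pointer

# freeResources and copyResources are placeholder procs for whatever
# releases and duplicates the owned resource.
proc `=`(dst: var MyType; src: MyType) =
  if dst.resources != nil:      # dst may or may not be initialized already
    freeResources(dst.resources)
  dst.resources = copyResources(src.resources)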
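To make the proposed lowering concrete, here is a sketch assuming Nim 2.x with ARC/ORC. Res, newRes, initCopy and assignProposed are hypothetical names, and a global live counter tracks how many heap cells are currently owned. Because initCopy may assume its destination is zero memory, it needs no check of what dst owned before, unlike the = implementation above.

```nim
# Sketch, assuming Nim 2.x with ARC/ORC; all names are hypothetical.
type
  Res = object
    data: ptr int            # owned heap cell; nil means "owns nothing"

var live = 0                 # heap cells currently owned by some Res

proc newRes(x: int): Res =
  result.data = create(int)
  result.data[] = x
  inc live

proc `=destroy`(r: Res) =
  if r.data != nil:
    dealloc(r.data)
    dec live

proc initCopy(dst: var Res; src: Res) =
  ## "init" step: dst is guaranteed to be zero memory,
  ## so there is nothing to check or release first.
  if src.data != nil:
    dst.data = create(int)
    dst.data[] = src.data[]
    inc live

proc assignProposed(dst: var Res; src: Res) =
  ## Proposed meaning of `dst = src`: destroy(dst); init(dst, src).
  `=destroy`(dst)            # deinitialize the old value exactly once
  wasMoved(dst)              # dst is zero memory again
  initCopy(dst, src)

proc demo(): (int, int) =
  var v = newRes(1)
  let w = newRes(5)
  assignProposed(v, w)       # the old value 1 is released before copying 5
  (v.data[], live)           # live cells: v's copy and w

let (newValue, liveInside) = demo()
```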

I have since heard that, under this rule, self-assignment (v = v) on types that have deinitialization would no longer work, because v would be destroyed before it is read. This can be handled either by implementing logic that detects self-assignment, or simply by declaring self-assignment illegal. One version that does work with self-assignment is the following (again in pseudocode, where <- denotes a move), but it still requires explaining what move semantics are.

proc `=`(dst: var MyType; src: MyType) =
  tmp <- copy(src)
  destroy(dst)
  dst <- tmp
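Assuming Nim 2.x with ARC/ORC, where the = hook is spelled =copy, the pseudocode can be sketched as a real hook. Res, its data field and the live counter are hypothetical. The wasMoved call matters: without it, the swap would hand the already freed pointer to tmp, whose destructor would then free it a second time.

```nim
# Sketch, assuming Nim 2.x with ARC/ORC; Res and `live` are hypothetical.
type
  Res = object
    data: ptr int            # owned heap cell; nil means "owns nothing"

var live = 0                 # heap cells currently owned by some Res

proc `=destroy`(r: Res) =
  if r.data != nil:
    dealloc(r.data)
    dec live

proc `=copy`(dst: var Res; src: Res) =
  var tmp: Res               # starts as zero memory
  if src.data != nil:        # tmp <- copy(src): read src before freeing anything
    tmp.data = create(int)
    tmp.data[] = src.data[]
    inc live
  `=destroy`(dst)            # destroy(dst): safe even if dst is src
  wasMoved(dst)
  swap(dst, tmp)             # dst <- tmp; tmp is left zeroed, its destructor does nothing

proc demo(): (int, int, int) =
  var v = Res(data: create(int))
  inc live
  v.data[] = 7
  `=copy`(v, v)              # explicit self-assignment keeps the value
  var w: Res
  `=copy`(w, v)              # an independent deep copy
  (v.data[], w.data[], live)

let (selfValue, copyValue, liveInside) = demo()
```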

move semantics

Move semantics were introduced into C++ (with C++11) as a new way to construct objects. Moving from b into a means that a is constructed from b, but a may take over b's resources to do so. To support this feature, C++ added new kinds of references to the language: the rvalue reference and the forwarding reference. Both are written T&&, but T&& is a forwarding reference only when T is a deduced template parameter. On top of that come move construction and move assignment.

Nim also needs some kind of move semantics if it does not want everything to be garbage collected, but I think the C++ way of doing this is too complicated and, in most cases, unnecessary. The most important context for move optimization is the return value, because nobody wants to perform a complex copy that keeps the source object alive, only to destroy that object immediately afterwards. And to get move optimization for return values, the language does not need any additional move operations.

My suggestion for the problem

My proposal is to use swap for the things that other languages (C++) solve with move. The idea comes from the fact that Nim has no concept of a constructor, and therefore cannot introduce a new kind of constructor. Nor does it have a state that marks a variable as moved-from (invalid to read).

Swap has the advantage that it never introduces new entities; it just exchanges the contents of two objects of the same type. The operation therefore amounts to two moves, without destroying any variable.
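A sketch of that property, assuming Nim 2.x with ARC/ORC: the hypothetical Res type counts how often its =copy and =destroy hooks run. Swapping two values triggers neither hook, and each value is still destroyed exactly once when demo returns.

```nim
# Sketch, assuming Nim 2.x with ARC/ORC; Res and the counters are hypothetical.
type
  Res = object
    data: ptr int            # owned heap cell; nil means "owns nothing"

var copies, destroys = 0

proc `=destroy`(r: Res) =
  if r.data != nil:
    dealloc(r.data)
    inc destroys

proc `=copy`(dst: var Res; src: Res) =
  inc copies
  if dst.data == src.data: return     # self-assignment: nothing to do
  `=destroy`(dst)
  wasMoved(dst)
  if src.data != nil:
    dst.data = create(int)
    dst.data[] = src.data[]

proc newRes(x: int): Res =
  result.data = create(int)
  result.data[] = x

proc demo(): (int, int, int, int) =
  var a = newRes(1)
  var b = newRes(2)
  swap(a, b)                 # exchange contents in place: no hook runs
  (a.data[], b.data[], copies, destroys)

let (aValue, bValue, copiesDuringSwap, destroysDuringSwap) = demo()
```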

example

without swap:

var a: MyObjectType
var b: MyMemberType
b.initialize
a.b = b

In this case the local variable b has to be copied into a.b while the local b is left intact, even though the only thing still done with the local b is probably its destruction. So the assignment performs an unnecessary duplication of b's contents.

with swap:

var a: MyObjectType
var b: MyMemberType
b.initialize
swap(a.b,b)

Now the move is implemented with a simple swap. After the swap, b contains only zero memory, because the member it was swapped with, a.b, belongs to a, which was never initialized and is therefore zero memory. If a destructor is declared for MyMemberType, it can no longer have any effect on b, because zeroed-out memory cannot possibly own anything.
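The example can be made runnable as follows, assuming Nim 2.x with ARC/ORC. The data field and the body of initialize are hypothetical stand-ins for whatever MyMemberType owns; the destructor releases something only when data is not nil, so it is a no-op on zero memory.

```nim
# Sketch, assuming Nim 2.x with ARC/ORC; `data` and `initialize` are hypothetical.
type
  MyMemberType = object
    data: ptr int            # owned heap cell; nil means "owns nothing"
  MyObjectType = object
    b: MyMemberType

var frees = 0

proc `=destroy`(m: MyMemberType) =
  if m.data != nil:          # zero memory owns nothing: nothing to release
    dealloc(m.data)
    inc frees

proc initialize(m: var MyMemberType) =
  m.data = create(int)
  m.data[] = 99

proc demo(): (bool, int) =
  var a: MyObjectType
  var b: MyMemberType
  b.initialize
  swap(a.b, b)               # move b into a.b; b receives a.b's zeroed contents
  (b.data == nil, a.b.data[])

let (bIsZero, memberValue) = demo()
```

When demo returns, a.b releases the cell and b's destructor finds nothing to do, so exactly one release happens and no copy is ever made.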

returning from a function

Returning from a function could be implemented with a swap.

proc foobar(): MyType =
  result.a = 11
  result.b = 12


var fb = foobar()

background pseudocode:

var fb : MyType
swap(fb, foobar.result)

If fb had held any content before the assignment, that content would now be in foobar.result and would therefore be destroyed properly when the stack frame is cleaned up.
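A sketch of that lowering, assuming Nim 2.x with ARC/ORC. fillFoobar and assignFromCall are hypothetical helpers that make foobar's hidden result visible as an ordinary local, and the destructor records the a field of the last non-zero MyType it destroys.

```nim
# Sketch, assuming Nim 2.x with ARC/ORC; the helper names are hypothetical.
type
  MyType = object
    a, b: int

var lastDestroyedA = 0       # `a` of the last non-zero value destroyed

proc `=destroy`(v: MyType) =
  if v.a != 0 or v.b != 0:   # zero memory owns nothing
    lastDestroyedA = v.a

proc fillFoobar(res: var MyType) =
  # the body of foobar, writing into its result
  res.a = 11
  res.b = 12

proc assignFromCall(fb: var MyType) =
  var foobarResult: MyType   # zeroed, like every fresh `result`
  fillFoobar(foobarResult)
  swap(fb, foobarResult)     # fb <- new value, foobarResult <- old value
  # foobarResult, now holding fb's old value, is destroyed on return

proc demo(): (int, int) =
  var fb = MyType(a: 1, b: 2)   # fb already has content
  assignFromCall(fb)
  (fb.a, lastDestroyedA)

let (fbA, destroyedA) = demo()
```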

swap already exists, so what does this proposal solve at all?

The language would not need new features; it would only need to improve the existing ones and encourage their use.

optimize away destructors on zero memory structs.

Variables that are statically known to contain only zero memory cannot possibly own anything that needs to be released. Therefore the compiler could simply optimize away any destructor call on a variable known to be zero memory. Unlike in C++, default initialization in Nim always yields zero memory, so there may be quite a few situations where this optimization actually applies.
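As a runtime illustration, assuming Nim 2.x (where addr also works on parameters), the hypothetical isZeroMemory below performs the check that the compiler would ideally perform statically. A destructor guarded by it skips values that provably own nothing. The sketch relies on Res having no padding bytes.

```nim
# Sketch, assuming Nim 2.x; isZeroMemory and Res are hypothetical.
proc isZeroMemory[T](x: T): bool =
  ## True if every byte of `x` is zero.
  let bytes = cast[ptr UncheckedArray[byte]](addr x)
  for i in 0 ..< sizeof(T):
    if bytes[i] != 0:
      return false
  true

type
  Res = object
    data: ptr int            # owned heap cell; nil means "owns nothing"

var releases = 0

proc `=destroy`(r: Res) =
  if not isZeroMemory(r):    # runtime stand-in for the static check
    dealloc(r.data)
    inc releases

proc demo(): (bool, bool) =
  var empty: Res             # never initialized: zero memory
  var full = Res(data: create(int))
  (isZeroMemory(empty), isZeroMemory(full))

let (emptyIsZero, fullIsZero) = demo()
```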

etc

  • Swap should be generic and should not be overridden. Exceptions in a swap should be illegal.
  • What happens on exceptions? Can there be objects that leak?
  • Assignment for types that have a destructor but no copy/clone method should be illegal.
  • Assignment should be generic and should not be overridden.
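The third point can be expressed in Nim 2.x with ARC/ORC by marking a =copy hook with the {.error.} pragma: every assignment that would require a real copy is rejected at compile time, while swap and move keep working. A sketch with a hypothetical Handle type:

```nim
# Sketch, assuming Nim 2.x with ARC/ORC; Handle is hypothetical.
type
  Handle = object
    data: ptr int            # owned heap cell: has a destructor, no copy method

proc `=destroy`(h: Handle) =
  if h.data != nil:
    dealloc(h.data)

proc `=copy`(dst: var Handle; src: Handle) {.error.}

proc demo(): (bool, int) =
  var a = Handle(data: create(int))
  a.data[] = 5
  var b: Handle
  swap(a, b)                 # allowed: swap exchanges contents, it never copies
  let c = move(b)            # allowed: b is left as zero memory
  # An assignment that needs a real copy of a Handle would not compile.
  (a.data == nil and b.data == nil, c.data[])

let (sourcesEmpty, movedValue) = demo()
```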