Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] View/slice #12

Closed
zielmicha opened this issue Apr 24, 2017 · 24 comments
Closed

[RFC] View/slice #12

zielmicha opened this issue Apr 24, 2017 · 24 comments

Comments

@zielmicha
Copy link

It's very hard to avoid accidental copies when using system.string type. This makes it unsuitable for I/O, especially when moving around large blobs of data.

There is system.shallow(string), which helps a bit, but it's semantics are underspecified and I've struggled to use it. Anyway, I think it's bad idea to mark strings as copying/non-copying at runtime instead of at type level.

Therefore I propose inclusion of new View[T] type to the standard library. View[T] essentially points to an array in memory and in most basic form View[T] = tuple[pointer: ptr T, length: int]. It is modelled after Go slice and after my implementation of View[T] in collections.nim. Similar ideas were discussed before e.g. 1, 2.

The requirements for the type are as follows:

  • non-copying slice(self: View, start: int, length: int) operating, which returns subview of self.
  • ability to construct views pointing to arbitrary memory (useful for C interop)
  • at least partial memory safety - e.g. slice checks for index of bounds errors

It's easy to implement type that fullfils these requirements. However, there are two question:

  • should there be separate ConstView and VarView types (to mark immutability in type system)?
  • should memory pointed by views by tracked by GC?

GcView?

The basic version, implemented as View[T] = tuple[pointer: ptr T, length: int], is not memory-safe when not used carefully.

It's possible to add third field gcptr: RootRef, that can be used force GC to keep memory pointed to by View alive. I've tested this approach before in go2nim branch of collections.nim: gcptrs, goslice (go2nim was my attempt at creating Go to Nim translator).

The main disadvantage is that this increases size of the type to 3 words (e.g. 24 bytes on amd64) and makes making copies a bit more expensive. Maybe we should create separate types ViewPtr and ViewRef?

@andreaferretti
Copy link

should there be separate ConstView and VarView types (to mark immutability in type system)?

I think that the type system already handles that just fine: one can define different procedures for View and var View

@dom96
Copy link
Contributor

dom96 commented Apr 27, 2017

I think that View should be memory safe, i.e. use a 'ref' instead of a 'ptr'. The unsafe version can then be called ViewPtr or similar.

@zielmicha
Copy link
Author

zielmicha commented Apr 27, 2017

I think that the type system already handles that just fine: one can define different procedures for View and var View

That won't work, View is not a value type. For View you can already modify its contest, var View would be for modifing the pointer/length.

@andreaferretti
Copy link

It should work regardless of whether View is a value or ref type. If you declare an argument of type var View and then pass in a View and Nim accepts it, I would call it a bug. var should be about mutability, not about pointers

@zielmicha
Copy link
Author

zielmicha commented Apr 27, 2017

Of course you can, that's not the point of var. It's like the difference between const char* and char* const in C++. And anyway, var is not first class in Nim (e.g. you can't have variable of type seq[var int]).

type View = object
  a: pointer
  b: int

proc bar(v: var View) =
   discard

proc foo(v: View) =
  var w = v
  bar(w)

var v = View(a: nil, b: 0)
foo(v)

@andreaferretti
Copy link

andreaferretti commented Apr 28, 2017

I am not sure what your snippets should show. In your example, w is of type var View, hence I see no contradiction with what I was saying. If you replace the last two lines with

let v = View(a: nil, b: 0)
bar(v)

you get a compilation error, as it should (even if you make View a ref).

About var not being first class: mutability is inherited (as in Rust, and unlike - say - in Scala). This means that essentially there is no such distinction between const char * and char * const. If you have a mutable reference, its content is recursively considered of var type.

It is debatable whether this is a good choice, but it seems pretty consistent.

As you have noticed, you can bypass mutability constraints by taking a mutable reference to an immutable value

@zielmicha
Copy link
Author

@andreaferretti It's theoretically possible to use var in a way you propose, but:

  • it's not done in stdlib for reference types (e.g. you can modify let f = newTable[int,int](), with you understanding of ref+var, you could only modify var f = newTable[int, int]()). seq[T] and string are value types, so they don't count.
  • it was not designed to be used it such way (e.g. 'bypassing' constness with var w = v or by assigning to field of ref/var type)

If you have a mutable reference, its content is recursively considered of var type.

Yes, but you can't have immutable references in Nim. For example:

type F = ref object
   a: int

let f = new(F) # immutable?
f.a = 5 # nope!
proc set(f: F) = # maybe non-var parameters are immutable?
  f.a = 10 # nope!
set(f)

@andreaferretti
Copy link

I was sorry to discover that

import tables

let f = newTable[int,int]()

f[5] = 3

compiles. To me, this should be a compilation error (and it could be if []= was defined on var TableRef). :-(

This whole var thing is confusing. If it is not about mutability (as I always assumed) it should get a better name and be explained better

@dom96
Copy link
Contributor

dom96 commented Apr 28, 2017

@andreaferretti You are creating a TableRef which is also mutable. If you create a Table then you will get a compile-time error:

import tables

let f = initTable[int,int]()

f[5] = 3

@andreaferretti
Copy link

My point is that being mutable or not should not be a property of TableRef.

I have seen two approaches wrt to mutability - each has its own pros and cons but I cannot understand where Nim fits in.

  • In the first model (for instance Scala) each field of an object and local variable can be mutable or not. This allows one to have - say - a mutable reference to an immutable object, an immutable reference to a mutable object and any other combination. This has a few advantages: it is clear, and one can guarantee that some objects will not be mutated (this is convenient for instance for messages sent between threads, as mutating them would not be threadsafe).

Nim seems to differ from this model, because when defining object types there is no place to mark field as mutable/immutable. instead it depends on the instantiation site:

type Foo = object
  a, b: int

var f = Foo(a: 1, b: 2)
f.b = 5 # ok
let f = Foo(a: 1, b: 2)
f.b = 5 # not ok
  • In the second model (for instance Rust) mutability is only defined for local variables. Objects themselves have no notion of mutable fields: mutability defined at the usage site and inherited - that is, one can mutate fields of an object if there is a mutable reference to that object, but not if there is an immutable reference. This has other advantages: there are no things such as immutable references to mutable objects - which may confuse someone -, it is easy to mutate objects and the freeze them by sending an immutable reference, and finally immutable objects are easy to emulate anyway using private fields and (read-only) accessor methods.

At first, Nim seems to have this approach. It even has overloading based on var types, which allows to define mutator functions only for var objects.

But it seems that this fails for reference types. This, in my opinion, makes the whole semantics confusing. I cannot see a reasonable explanation of the exact behaviour, nor a rationale for it.

I would be happy with the Rust model, and I also would be happy with the Scala model (although it would be too breaking to introduce at this point) but I cannot make head or tails of the current Nim model. Can you provide some explanation of why things are this way?

@Araq
Copy link
Member

Araq commented Apr 29, 2017

But it seems that this fails for reference types. This, in my opinion, makes the whole semantics confusing. I cannot see a reasonable explanation of the exact behaviour, nor a rationale for it.

It doesn't need a rationale because it's the natural model when you don't have immutability in the type system. I agree it has undesirable consequences but I don't understand what you don't understand about it. Of course var vs let is about mutability but it's only shallow immutability, not deep immutability, so ptr/ref accessed in this way are mutable. Otherwise you could get the very same behaviour with a temp variable:

proc doesntMutate(n: ref Node) =
  # assume the compiler prevents it:
  n.field = 5
  # but then this is allowed?
  var x = n
  x.field = 5

Furthermore if n: ref Node means an immutable parameter then how do I get a mutable one? var ref Node is different, that's a pointer to a pointer.

@andreaferretti
Copy link

It doesn't need a rationale because it's the natural model when you don't have immutability in the type system.

I am not sure I understand the argument why this is a natural model - I find it pretty contrived. Moreover it looks like Nim tracks immutability in the type system (I have certainly advertised it so!)

I agree it has undesirable consequences but I don't understand what you don't understand about it. Of course var vs let is about mutability but it's only shallow immutability, not deep immutability, so ptr/ref accessed in this way are mutable

The thing is - it is not about shallow vs deep immutability. One can have nested object types and everything is recursively immutable until you hit a pointer. It is not clear to me why going through a pointer should affect mutability. Moreover, when using libraries, one may not even know whether at some point there is a pointer indirection.

It seems some reflection of the fact that Nim compiles to C - probably things are easier this way, but I cannot figure out why exactly. But I think that Nim should have its own semantics, at least because it also compiles to javascript. Look at how NIm goes out of its way to make strings and seqs value types.

Otherwise you could get the very same behaviour with a temp variable

Yes, this makes it easy to bypass the mechanism, but at least it is explicit. I see it like a cheap way to freeze/unfreeze objects. Of course Rust would prohibit such an operation with its borrow checker, but I don't think this is necessary.

Furthermore if n: ref Node means an immutable parameter then how do I get a mutable one? var ref Node is different, that's a pointer to a pointer.

For me, var ref Node has always meant a mutable ref Node - I am not really aware whether this means using a double pointer. If this is the case, then so be it. There may be a little price in performance, but I am pretty sure that all the seqs copies due to seqs having value semantics add more cost overall.

For instance all the collections I have written use a type parameter for mutator operations, like spills or everyting in cello

@zielmicha
Copy link
Author

I also think that current semantics var are not very useful, but I doubt it's possible to change them at this point.

@Araq
Copy link
Member

Araq commented May 1, 2017

It seems some reflection of the fact that Nim compiles to C - probably things are easier this way, but I cannot figure out why exactly.

C has nothing to do with it.

For me, var ref Node has always meant a mutable ref Node - I am not really aware whether this means using a double pointer. If this is the case, then so be it. There may be a little price in performance, but I am pretty sure that all the seqs copies due to seqs having value semantics add more cost overall.

It's not about performance either, there is a subtype relation for ref T, but not one for var ref T. Also consider adding a node to the front of a singly linked list:

proc add(head: var ref Node; x: ref Node) =
  x.next = head
  head = x

Here the fact that var ref Node is a double indirection is essential for this code to work! (In fact, it's also essential for var seq or var string to work.)

@Araq
Copy link
Member

Araq commented May 1, 2017

It is not clear to me why going through a pointer should affect mutability.

Because pointer assignments only copy the pointer and so it's easy to get a mutable view on the data anyway. Note that this completely different for an object, here you need addr or even unsafeAddr to get such a mutable view on the data.

Otherwise you could get the very same behaviour with a temp variable

Yes, this makes it easy to bypass the mechanism, but at least it is explicit. I see it like a cheap way to freeze/unfreeze objects.

But such an easy escape mechanism means the immutability is not guaranteed at all. Every proc can declare "deeply immutable" and yet mutate it under the hood. For example an optimizer cannot rely on it at all then. Where is the gain?

But let's assume your proposal would work out and would produce a better Nim language. Is it still hard to accept that the current design is not "contrived"? (The current design is pretty much Ada's design fwiw.)

@andreaferretti
Copy link

@Araq So, if I understand correctly, the reason that the immutability restriction does not propagate trough pointers is that this would be easy to bypass anyway, and pretending that it does would only lead to a false sense of security (unless one has a compiler mechanism to enforce it, like in Rust).

I still happen to think that deep immutability would save some errors (bypassing the mechanism requires intention), but I now understand better the reasons for this design. I think it would be useful to document it better in the manual or tutorials - if I find the time, I will try to contribute something. Thank you!

(About the linked list example: of course sometimes double indirection is needed - this is clear and I was not discussing this)

@Varriount
Copy link

Varriount commented May 2, 2017

For, say, a string, what's wrong with tuple[bounds: Slice[int], data: ref string] ?

In any case, most cases of superfluous string copying can be solved by using ref string types. Sure, you get a (small) performance penalty because of the double indirection, but it's worth it to avoid the odd semantics shallow() causes. You would have to have a ref string in a view anyway, in order to prevent copies/guarantee memory safety.

The only alternative I can see is creating an immutable string type. I'm all for such an enhancement, but @Araq is against such extensions.

@zielmicha
Copy link
Author

zielmicha commented May 2, 2017 via email

@Varriount
Copy link

What happens if the view 'targets' a sequence, and the sequence grows? Even with the additional RootRef, I'm not quite sure if strings/sequences are implicitly or explicitly deallocated on resize.

@krux02
Copy link
Contributor

krux02 commented May 3, 2017

Well all I have to say about this I already did in this issue: nim-lang/Nim#5437

TL;DR

  • really helpful for C wrappers
  • It is basically openarry, but without the restriction to be argument only converted from array or seq type
  • open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0122r1.pdf (c++ proposal of the same feature)

I also think that current semantics var are not very useful, but I doubt it's possible to change them at this point.

Well I think that could be connected to an overuse of ref types. But that is only a guess here.

@cooldome
Copy link
Member

cooldome commented May 4, 2017

IMO, View and openarray concepts are related. When you accept it as as argument it is pretty much the same thing. If one could return an openarray from proc it would close the gap between the two.

proc subseq[T](seq[T], i: int): openarray[T] # returns a subset of seq without copying

Maybe, it is worth to extend existing openarray concept to match View features?

@krux02
Copy link
Contributor

krux02 commented May 4, 2017

I think the default slicing operator should use a type like this:

let myView: View[char] = myString[3..17]
let myViewAsString: string = myView # maybe this is a good place for a `converter` function

Doing this would cause some backwards incompatibility though :/

@mratsim
Copy link
Collaborator

mratsim commented Mar 1, 2018

Chiming in,

It would be very nice if fixed size slices did not allocate a seq on the heap.

Example:

type Uint256 = distinct array[32, byte]
  # For demo purpose, normally it should take endianness into account

proc readUint256BE*(ba: array[32, byte]): UInt256 =
  ## For demo purpose, normally it should take endianness into account
  Uint256(ba)

proc decode_public_key(pubKey: array[64, byte]): array[2, UInt256] =
  result[0] = readUint256BE pubKey[0 ..< 32]
  result[1] = readUint256BE pubKey[32 ..< 64]

var a: array[64, byte]
echo decode_public_key(a)

Error:

slice_view.nim(13, 29) Error: type mismatch: got <seq[byte]>
but expected one of:
proc readUint256BE(ba: array[32, byte]): Uint256

expression: readUint256BE pubKey[
  ## a shortcut for 'a .. (when b is BackwardsIndex: succ(b) else: pred(b))'.
  0 .. 31]

@narimiran narimiran transferred this issue from nim-lang/Nim Jan 2, 2019
@Araq
Copy link
Member

Araq commented Oct 28, 2020

We now have --experimental:view types and also a decent solution for deep immutability, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants