Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Do we need =sink operator at all? #13003

Closed
cooldome opened this issue Jan 1, 2020 · 16 comments
Closed

[Discussion] Do we need =sink operator at all? #13003

cooldome opened this issue Jan 1, 2020 · 16 comments

Comments

@cooldome
Copy link
Member

cooldome commented Jan 1, 2020

While working on #13002 I had this weird idea that I want to share. It is possible that we don't need =sink operator at all. It doesn't mean we will not move objects. I suggest we will replace =sink with the combination of the following three operations:

=destroy(dest)  # avoidable if we can prove that it is first assignment to dest
memCopy(dest, source)
wasMoved(source)   # avoidable if we can prove that source can't be double freed

Therefore =sink will always mean bitwise shallow copy. In old days sink operator contained an if statement that checked if it is self assignment. It is now gone it is done outside. Compiler also takes the liberty of replacing =sink calls with bitwise copy when it likes it. Possibly we can just simplify the stack and do bitwise copy in all cases.

It is easy to prove that I am wrong just provide one counter example that will not work without =sink operator.

@cooldome
Copy link
Member Author

cooldome commented Jan 1, 2020

I have one example that needs discussion, I don't it works well now either:

type MyObject = object
   len: int
   data: UncheckedArray[int]

Replacing =sink call with bitwise copy will not copy the data correctly as its size is unknown. We can declare just objects non copyable and non movable. Hence, users should always operate on ptr/ref to such object and not object itself.

Update: not a problem.

@Araq
Copy link
Member

Araq commented Jan 1, 2020

But MyObject cannot be put on the stack anyway. Your premise is correct -- we don't need =sink. And what you're describing is what Rust does afaict.

There is one case where it can be better though: Instead of destroy+copyMem+wasMoved you can treat =sink as an ordinary assignment and reuse an internal buffer, saving the destroy. I think...

@Clyybber
Copy link
Contributor

Clyybber commented Jan 1, 2020

This idea was discussed a few times on IRC.
I was proposing a weakSink/weakAsgn (without the "check and destroy previous location") which is needed for eliding initialization.

@mratsim
Copy link
Collaborator

mratsim commented Jan 1, 2020

I don't know about the =sink operator but the sink parameter is useful for overloading.

For example you could do the following overload to avoid an extra buffer

proc map[T](s: seq[T], p: proc[T](elem: T): T): seq[T] =
  result = newSeq[T](s.len)
  for i in 0 .. s.len:
    result[i] = p(s[i])

proc map[T](s: sink seq[T], p: proc[T](elem: T): T): seq[T] =
  for i in 0 .. s.len:
    s[i] = p(s[i])
  `=sink`(result, s)

Also as mentioned in #12631, for handling atomics users need to be able to override assignment/destructors operators with threadsafe versions.

@cooldome
Copy link
Member Author

cooldome commented Jan 1, 2020

@mratsim: sink type is clearly useful, clears stays. I need to think about atomics.

@Clyybber: I got your point. The = operator can be decomposed in three operations in the same way as sink and embedded =destroy call be elided in many cases in the same way as for =sink operator.

Looks like we moving towards a destructor spec update. So far includes everything proposed to date:

`=destroy`(dest: var T) # no changes here, user implemented destroy if not empty

`=`(dest: var T, src: T, isDestEmpty: bool)  # deepcopy into dest from src, 
                                             # if isDestEmpty is true it is safe not to destroy `dest`.
                                             # if isDestEmpty is false it is important not to use any dest fieds as they can be not initialized

`=sink`(dest: var T, src: T, isDestEmpty: bool)  # now optional operator, compiler will use memcopy if not provided. 
                                                 # default compiler implementation is efficient, hence user should provide one only at his own risk.

Let me know your thoughts?

@Clyybber
Copy link
Contributor

Clyybber commented Jan 1, 2020

@cooldome I think if isDestEmpty should be static bool; or we introduce =weakSink and =weakAsgn/=weak instead. This would make checking for the availability of init elision easier.

@cooldome
Copy link
Member Author

cooldome commented Jan 1, 2020

@Clyybber: static bool might be not bad idea, need to check how easy it is to implement though.
I concerned that we introduce too many type bounds operations users will complain. IMO, if you tell user that they need to implement =destroy, =, =sink, =weak, =weakAsgn per object type they would remove Nim straightway. Users can't bother using language that hates hates them so much. This is less usable than C++ already. What I like about move proposal that it makes =sink optional hence users need to implement only =destroy and =.

@Clyybber
Copy link
Contributor

Clyybber commented Jan 1, 2020

@cooldome If =weakAsgn/=weakSink are not provided for a type the compiler will simply not do init elision for those types. They are an optional optimization.

@cooldome
Copy link
Member Author

cooldome commented Jan 1, 2020

After more thought I am leaning towards more simple and straight forward proposal:

`=destroy`(dest: var T) # no changes here, user implemented destroy if not empty
`=`(dest: var T, src: T)  # deepcopy into dest from src, dest is always empty.
`=sink`(dest: var T, src: T)  # now optional operator, compiler will use memcopy if not provided. 
                              # dest is always empty.

This proposal is closer to what we currently have and easier to implement. In Clyybber's terms = and =sink operations are weak, there are no strong variants at all. It is compiler's job to call destructor before copy/sink invocation for dest when it is necessary,

@Araq
Copy link
Member

Araq commented Jan 1, 2020

For example you could do the following overload to avoid an extra buffer

@mratsim

Overloading based on sink is not ever required, sink was specially designed for not requiring overloading. In your example, only the sink version is required.

@cooldome
Copy link
Member Author

cooldome commented Jan 1, 2020

@Araq: could you please explain your comment a bit more I am not sure how operator overloading came into play. I suggested to make = and =sink to treat dest as uninitialised variable and don't try to access it or trying to destroy it.

Example:

let x = a

will be compiled to

var x # with sfNoInit flag
`=`(x, a)

instead of

var x 
x = default()
`=`(x, y)  #  inside `=` there is `=destroy`(x) call 

One more example:

x = a

will be compiled to:

`=destroy`(x)
`=`(x, a)

instead of:

`=`(x, a)   #  inside `=` there is `=destroy`(x) call

What do think about this idea?

@Araq
Copy link
Member

Araq commented Jan 2, 2020

What do think about this idea?

Yeah, I agree completely, it's what we observed too, @Clyybber is also working on this. :-)

I summarized it as "optimize init+sink to copyMem" on the forum. There is also "optimize away wasMoved+destructor call".

=(dest: var T, src: T) # deepcopy into dest from src, dest is always empty.

This is dangerous for performance! Inside = you can often re-use a buffer if it hasn't been destroyed yet.

@cooldome
Copy link
Member Author

cooldome commented Jan 7, 2020

Hi Gents,
I would like to discuss one more proposal here. Let's concentrate on =sink operator only.

First proposal, mild:
If user provided destructor operator, but haven't provided sink operator. Generate sink operator in the following way:

proc sink(dest: var T, src: T) = 
  `=destroy(dest)`
  dest = src # dest is cursor here, hence it is a mem copy operation

This implementation is efficient so mention in the docs that sink operator is now optional.
Compiler can generate sink operators automatically.

Second proposal more radical:
Implement genSink in injectdestructors in the following way:

proc genSink(c: var Con; dest, ri: PNode): PNode =
  if isFirstWrite(dest, c): # optimize sink call into a bitwise memcopy
    result = newTree(nkFastAsgn, dest)
  else:
    result = # not quite correct but you got the idea 
        genDestroy(dest)
        newTree(nkFastAsgn, dest)

Remove =sink hook completely, make semOverride raise warning on =sink saying that it will not be used for anything. =destroy plus memCopy combo will be used for everything: strings, seqs, ref and user types.

Benefits: shaves off large number of generated procs. Less hooks user needs to provide the better.

@Araq
Copy link
Member

Araq commented Jan 7, 2020

I completely agree with the philosphy and direction but I'm also afraid we're losing something. Consider @mratsim's remark that wasMoved might need to use atomic instructions which C++ can easily do as inside its move you are supposed to disarm the source yourself.

@cooldome
Copy link
Member Author

cooldome commented Jan 7, 2020

So my conclusion. First mild proposal is OK, second one let's wait and see if people are going to find alternative ways to use sink hook.

@cooldome
Copy link
Member Author

cooldome commented Jan 17, 2020

Discussion is closed. Mild proposal was implemented. I will keep issue open to see if we need to implement stronger proposal after it gets clearer if custom sink operator would be useful for atomics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants