proposal: runtime: add AddCleanup and deprecate SetFinalizer #67535

Open
mknyszek opened this issue May 20, 2024 · 10 comments

@mknyszek
Contributor

mknyszek commented May 20, 2024

Background

Go provides one function for object finalization in the form of runtime.SetFinalizer. Finalizers are notoriously hard to use, and the documentation of runtime.SetFinalizer describes all the caveats with a lot of detail. For instance:

  • SetFinalizer must always refer to the first word of an allocation. This means programmers must be aware of what an 'allocation' is whereas that distinction isn't generally exposed in the language.
  • There cannot be more than one finalizer on any object.
  • Objects with finalizers that are involved in any reference cycle will silently fail to be freed and the finalizer will never run.
  • Objects with finalizers require at least two GC cycles to be freed.

The last two of these caveats boil down to the fact that runtime.SetFinalizer allows object resurrection.
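To make the resurrection point concrete, here is a minimal sketch using only runtime.SetFinalizer. The session type, the revived global, and the helper names are hypothetical and exist only for illustration. Because the finalizer receives the object itself, it can store it somewhere reachable, which is why the runtime cannot free the object in the same GC cycle that queues the finalizer.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// session stands in for any heap object that owns an external resource.
type session struct {
	id      int
	payload [64]byte // keeps the object out of the tiny allocator
}

// revived is a global that the finalizer uses to resurrect the object.
var revived *session

// newSession allocates a session whose finalizer makes it reachable again.
// No reference to the session survives the call.
//
//go:noinline
func newSession(id int, ran chan<- int) {
	s := &session{id: id}
	runtime.SetFinalizer(s, func(s *session) {
		revived = s // the "dead" object is reachable again
		ran <- s.id
	})
}

// resurrectOnce reports the id seen by the finalizer, or -1 on timeout.
func resurrectOnce(id int) int {
	ran := make(chan int, 1)
	newSession(id, ran)
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC() // finalizers are queued by the GC and run on another goroutine
		select {
		case got := <-ran:
			return got
		case <-time.After(10 * time.Millisecond):
		}
	}
	return -1
}

func main() {
	if id := resurrectOnce(7); id >= 0 {
		fmt.Println("finalizer ran for session", id, "and resurrected it:", revived != nil)
	} else {
		fmt.Println("finalizer did not run within the timeout (it is not guaranteed to)")
	}
}
```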

Proposal

I propose adding the following API to the runtime package as a replacement for SetFinalizer. I also propose officially deprecating runtime.SetFinalizer.

// AddCleanup attaches a cleanup function to ptr, which is executed some time
// after ptr is no longer reachable. The cleanup function is executed with the
// argument cleanupValue.
//
// cleanupValue must not be equal to ptr and this function will panic if it is.
// If ptr is reachable from cleanup or cleanupValue, ptr will never be collected
// and the cleanup will never run.
//
// The cleanup function is not guaranteed to run in general, and also not guaranteed
// to run before program exit.
func AddCleanup[T, S any](ptr *T, cleanup func(S), cleanupValue S) Cleanup

// Cleanup is a handle to a cleanup function for a specific object.
type Cleanup struct { ... }

// Stop cancels the cleanup function. Stop will have no effect if the cleanup function
// has already been queued for execution once the object becomes unreachable. To
// guarantee that Stop removes the cleanup function, the caller must ensure that the
// pointer that was passed to AddCleanup is reachable across the call to Stop.
func (c Cleanup) Stop() { ... }

AddCleanup resolves many of the problems with SetFinalizer.

It forbids objects from being resurrected, resulting in prompt cleanup, as well as allowing cycles of objects to be cleaned up. Its definition also allows attaching cleanup functions to objects the caller does not own, and possibly attaching multiple cleanup functions to a single object.
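As a rough illustration of the cycle and multiple-cleanup claims, the sketch below attaches two cleanups to each of two objects that point at each other. It assumes a toolchain in which runtime.AddCleanup is available with the signature proposed above (Go 1.24 or newer); the node type, the record function, and the helper names are hypothetical. The wait loop is only there to make the effect observable, since cleanups are not guaranteed to run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// node participates in a reference cycle with its peer.
type node struct {
	name string
	peer *node
}

// cleanupsRun counts how many cleanup functions have executed.
var cleanupsRun atomic.Int32

// record is the shared cleanup function. It receives only a label,
// never the node, so it cannot keep the node alive or resurrect it.
func record(label string) {
	cleanupsRun.Add(1)
}

// buildCycle creates two nodes that point at each other, attaches two
// cleanups to each one, and then drops every reference to them.
//
//go:noinline
func buildCycle() {
	a := &node{name: "a"}
	b := &node{name: "b"}
	a.peer, b.peer = b, a
	for _, n := range []*node{a, b} {
		runtime.AddCleanup(n, record, n.name+"/first")
		runtime.AddCleanup(n, record, n.name+"/second")
	}
}

// waitForCleanups forces collections until want cleanups have run
// or a five second timeout expires.
func waitForCleanups(want int32) bool {
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC()
		if cleanupsRun.Load() >= want {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

func main() {
	buildCycle()
	fmt.Println("all four cleanups ran:", waitForCleanups(4))
}
```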

However, it is still fundamentally a finalization mechanism, so to avoid restricting the GC implementation, it does not guarantee that the cleanup function will ever run.

Similar to finalizers' restriction on the object not being reachable from the finalizer function, ptr must not be reachable from the value passed to the cleanup function, or from the cleanup function. Usually this results in a memory leak, but the common case of accidentally passing ptr as cleanupValue out of convenience can be easily caught.

In terms of interactions with finalizers, the cleanup function will always run the first time the value pointed to by ptr becomes unreachable. That is, if an object has both a cleanup function and a finalizer, the cleanup function is guaranteed to run before the finalizer. In other words, the cleanup function does not track object resurrection and will not run again if the finalizer does resurrect the object.

Design discussion

Avoiding allocations in the implementation of AddCleanup

AddCleanup needs somewhere to store cleanupValue until cleanup is ready to be called. Naively, it could just put that value in an any variable somewhere, but this would result in an unnecessary additional allocation.

In the actual implementation, a cleanup will be represented as a runtime "special," an off-heap manually-managed linked-list node data structure whose individual fields are sometimes explicitly inspected by the GC as roots, depending on the "special" type (for example, a finalizer special treats the finalizer function as a root).

Since each "special" is already specially (ha) treated by the GC, we can play some interesting tricks. For example, we could type-specialize specials and store cleanupValue directly in the "special." As long as we retain the type information for cleanupValue, we can get the GC to scan it directly in the special. But this is quite complex.

To begin with, I propose just specializing for word-sized cleanup values. If the cleanup value fits in a pointer-word, we store that directly in the special, and otherwise fall back on the equivalent of an any. This would cover many use-cases. For example, cleanup values that are already heap-allocated pointers wouldn't require an additional allocation. Also, simple cases like passing off a file descriptor to a cleanup function would not create an allocation.

Why func(S) and not func()?

The choice to require an explicit parameter to the cleanup function is to reduce the risk of the cleanup function accidentally closing over ptr. It also makes it easier for a caller to avoid allocating a closure for each cleanup.
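A small sketch of what this buys in practice, assuming runtime.AddCleanup is available with the proposed signature (Go 1.24 or newer). The handle table below is a hypothetical stand-in for an external resource such as file descriptors. The cleanup is an ordinary top-level function shared by every wrapper, so no closure is allocated per object and there is nothing through which the cleanup could capture the wrapper.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// A hypothetical table of external handles the GC knows nothing about.
var (
	mu   sync.Mutex
	open = map[int]bool{}
	next int
)

func acquireHandle() int {
	mu.Lock()
	defer mu.Unlock()
	next++
	open[next] = true
	return next
}

// releaseHandle is a plain function taking the handle as its only input.
func releaseHandle(h int) {
	mu.Lock()
	defer mu.Unlock()
	delete(open, h)
}

func openHandles() int {
	mu.Lock()
	defer mu.Unlock()
	return len(open)
}

// wrapper is the Go-side object whose lifetime controls the handle.
type wrapper struct {
	handle  int
	scratch []byte
}

// newWrapper registers releaseHandle with the handle as the explicit
// cleanup argument; the cleanup never sees w itself.
func newWrapper() *wrapper {
	w := &wrapper{handle: acquireHandle(), scratch: make([]byte, 32)}
	runtime.AddCleanup(w, releaseHandle, w.handle)
	return w
}

// churn allocates n wrappers and immediately drops them.
//
//go:noinline
func churn(n int) {
	for i := 0; i < n; i++ {
		newWrapper()
	}
}

// waitUntilReleased forces collections until no handles remain open
// or a five second timeout expires.
func waitUntilReleased() bool {
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC()
		if openHandles() == 0 {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

func main() {
	churn(100)
	fmt.Println("all handles released:", waitUntilReleased())
}
```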

Why func(S) and not chan S?

Channels are an attractive alternative because they allow users to build their own finalization queues. The downside however is that each channel owner needs its own goroutine for this to be composable, or some third party package needs to exist to accumulate all these channels and select over them (likely with reflection). It's much simpler if that package is just the runtime: there's already a system goroutine to handle the finalization queue. While this does mean that the handling of finalization is confined to an implementation detail, that's rarely an issue in practice and having the runtime handle it is more resource-efficient overall.
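For callers who do want a queue, one can be layered on top of the func(S) form. The following is only a sketch under assumptions: it requires runtime.AddCleanup with the proposed signature (Go 1.24 or newer), and the notice, conn, enqueue, spawn, and drain names are hypothetical. The cleanup does a non-blocking send so that a full queue drops a notification rather than stalling the goroutine that runs cleanups.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// notice is the cleanup argument: the queue to notify and the id to send.
// It holds no reference to the object being tracked.
type notice struct {
	queue chan<- int
	id    int
}

// enqueue forwards the id to the user's queue without blocking.
func enqueue(n notice) {
	select {
	case n.queue <- n.id:
	default: // queue full: drop the notification instead of blocking
	}
}

// conn is a hypothetical object whose death we want to hear about.
type conn struct {
	id  int
	buf []byte
}

// spawn creates n conns with ids 1..n, registers a cleanup for each,
// and drops them all.
//
//go:noinline
func spawn(q chan<- int, n int) {
	for i := 1; i <= n; i++ {
		c := &conn{id: i, buf: make([]byte, 16)}
		runtime.AddCleanup(c, enqueue, notice{queue: q, id: c.id})
	}
}

// drain collects ids from the queue until want distinct ids have
// arrived or a five second timeout expires.
func drain(q <-chan int, want int) map[int]bool {
	seen := map[int]bool{}
	deadline := time.Now().Add(5 * time.Second)
	for len(seen) < want && time.Now().Before(deadline) {
		runtime.GC()
		select {
		case id := <-q:
			seen[id] = true
		case <-time.After(10 * time.Millisecond):
		}
	}
	return seen
}

func main() {
	q := make(chan int, 16)
	spawn(q, 3)
	fmt.Println("distinct ids observed:", len(drain(q, 3)))
}
```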

Why return Cleanup instead of *Cleanup?

While Cleanup is a handle and it's nice to represent that handle's unique identity with an explicit pointer, it also forces an allocation of Cleanup's contents in many cases. By returning a value, we can avoid an additional allocation.

Why not have the type arguments (T and/or S) on Cleanup too?

It's not necessary for the implementation of Cleanup for the type arguments to be available, since the internal representation will not even contain a reference to ptr, cleanup, or cleanupValue directly. It does close the door to obtaining these values from Cleanup in a type-safe way, but that's OK: the caller of AddCleanup can already package those up together if it wants to.
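One way a caller might package the values up is sketched below, assuming runtime.AddCleanup and Cleanup.Stop are available as proposed (Go 1.24 or newer). The tracked, lease, track, and release names are hypothetical. The wrapper keeps the typed argument next to the untyped handle, so an explicit Close can cancel the cleanup and release the same value itself.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// tracked pairs the untyped Cleanup handle with its typed argument.
type tracked[S any] struct {
	handle runtime.Cleanup
	arg    S
}

// track registers fn(arg) as a cleanup for ptr and remembers arg.
func track[T, S any](ptr *T, fn func(S), arg S) tracked[S] {
	return tracked[S]{handle: runtime.AddCleanup(ptr, fn, arg), arg: arg}
}

var (
	mu       sync.Mutex
	released []string
)

// release records that a token was given back.
func release(token string) {
	mu.Lock()
	defer mu.Unlock()
	released = append(released, token)
}

func releasedCount() int {
	mu.Lock()
	defer mu.Unlock()
	return len(released)
}

// lease owns a token that must be released exactly once.
type lease struct {
	token string
	t     tracked[string]
}

func newLease(token string) *lease {
	l := &lease{token: token}
	l.t = track(l, release, token)
	return l
}

// Close releases the token eagerly and cancels the GC-driven cleanup.
func (l *lease) Close() {
	l.t.handle.Stop()
	runtime.KeepAlive(l) // l must stay reachable across the call to Stop
	release(l.t.arg)     // typed argument recovered from the wrapper
}

// closeExplicitly creates a lease, closes it, and drops it.
//
//go:noinline
func closeExplicitly(token string) {
	newLease(token).Close()
}

// settle runs a few collections so that any cleanup that was not
// cancelled has a chance to run.
func settle() {
	for i := 0; i < 3; i++ {
		runtime.GC()
		time.Sleep(20 * time.Millisecond)
	}
}

func main() {
	closeExplicitly("token-1")
	settle()
	fmt.Println("release calls:", releasedCount())
}
```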

@gopherbot gopherbot added this to the Proposal milestone May 20, 2024
@randall77
Contributor

Another possibility:

func AddCleanup[T any](ptr *T, cleanup func(T)) Cleanup

When cleaning up, we pass a (shallow) copy of *T to the cleanup function.

This gets rid of the need to store cleanupValue anywhere. It is effectively the final state of the object.

On the downside, this may get weird if you're putting a cleanup on a type defined in another package, as you might not be able to do much with the argument.

@cespare
Contributor

cespare commented May 20, 2024

@mknyszek I find it hard from reading the proposed signatures and doc comments to understand what S and cleanupValue are. Could you clarify that a bit? It might also help to get an example of how one would use the new API. For instance, how would one use AddCleanup to replace the use of SetFinalizer to close a file when it is GCed?

Another question: would AddCleanup replace some or all uses of SetFinalizer in the standard library? Are there instances where it cannot work, or would be undesirable?

@Merovius
Contributor

Merovius commented May 20, 2024

> In terms of interactions with finalizers, the cleanup function will always run the first time the value pointed to by ptr becomes unreachable. That is, if an object has both a cleanup function and a finalizer, the cleanup function is guaranteed to run before the finalizer. In other words, the cleanup function does not track object resurrection and will not run again if the finalizer does resurrect the object.

What's the reasoning here? This means that you cannot assume that ptr is unreachable when the cleanup runs. E.g. in the case of go4.org/intern (yes, I know that this particular case will be made obsolete by #62483, but just for illustration) that would mean you might get different Values for the same interned string. This seems to combine particularly badly with the idea that you can attach cleanups to values you don't own.

If this was done the other way around - cleanups run after finalizers and only if ptr did not get resurrected - you could rely on ptr being unreachable after the cleanup. And if there are no finalizers, ISTM that everything could work the same. So you'd get better invariants now at no cost for The Bright Future of no Finalizers™.

Or is there something (perhaps in the implementation of the runtime) that I'm unaware of?

@mknyszek
Contributor Author

> When cleaning up, we pass a (shallow) copy of *T to the cleanup function.

Unfortunately, I think that requires treating the contents of ptr as a root, so you can no longer reclaim ptr if it participates in a cycle.

> @mknyszek I find it hard from reading the proposed signatures and doc comments to understand what S and cleanupValue are. Could you clarify that a bit? It might also help to get an example of how one would use the new API.

The idea behind this API is to decouple cleanup from the object being freed, so S and cleanupValue represent that decoupling. I can understand that at first glance it seems odd, but this is really the way to avoid a lot of the footguns. Most of the time, finalizers (and cleanup functions) are necessary for cleaning up things that the GC is unaware of, like file descriptors, memory passed from C, etc. You pass that to the cleanup function, not the object itself.

> For instance, how would one use AddCleanup to replace the use of SetFinalizer to close a file when it is GCed?

f, _ := Open(...)
// Fd returns a uintptr, while syscall.Close takes an int on Unix-like systems.
runtime.AddCleanup(f, func(fd uintptr) { syscall.Close(int(fd)) }, f.Fd())

I'm taking a lot of liberties in this snippet, but that would be the general idea. I can update the proposal later.
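A fuller, runnable variant of the same idea might look like the sketch below. It makes several assumptions: a Unix-like system (syscall.Open and syscall.Close operating on int descriptors), runtime.AddCleanup with the proposed signature (Go 1.24 or newer), and a hypothetical rawFile type used instead of os.File so that the descriptor is not closed twice. The closed channel exists only so the demo can observe the cleanup.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// rawFile is a hypothetical minimal file wrapper that owns a descriptor.
type rawFile struct {
	fd   int
	name string
}

// closeReq is the cleanup argument: the descriptor plus a channel used
// by this demo to report the result of closing it.
type closeReq struct {
	fd     int
	closed chan<- error
}

// closeFD is the cleanup function. It receives the descriptor, not the file.
func closeFD(r closeReq) {
	r.closed <- syscall.Close(r.fd)
}

// openRaw opens name read-only and arranges for the descriptor to be
// closed once the returned *rawFile becomes unreachable.
func openRaw(name string, closed chan<- error) (*rawFile, error) {
	fd, err := syscall.Open(name, syscall.O_RDONLY, 0)
	if err != nil {
		return nil, err
	}
	f := &rawFile{fd: fd, name: name}
	runtime.AddCleanup(f, closeFD, closeReq{fd: fd, closed: closed})
	return f, nil
}

// openAndForget opens the file and drops the only reference to it.
//
//go:noinline
func openAndForget(name string, closed chan<- error) error {
	_, err := openRaw(name, closed)
	return err
}

// openThenDrop reports whether the cleanup ran within five seconds and,
// if it did, the error returned by syscall.Close.
func openThenDrop(name string) (cleanupRan bool, closeErr error) {
	closed := make(chan error, 1) // buffered so the cleanup never blocks
	if err := openAndForget(name, closed); err != nil {
		return false, err
	}
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC()
		select {
		case err := <-closed:
			return true, err
		case <-time.After(10 * time.Millisecond):
		}
	}
	return false, nil
}

func main() {
	ran, err := openThenDrop("/dev/null")
	fmt.Println("cleanup ran:", ran, "close error:", err)
}
```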

@randall77
Contributor

> Unfortunately, I think that requires treating the contents of ptr as a root, so you can no longer reclaim ptr if it participates in a cycle.

That's a good point.

> Most of the time, finalizers (and cleanup functions) are necessary for cleaning up things that the GC is unaware of, like file descriptors, memory passed from C, etc. You pass that to the cleanup function, not the object itself.

What about this case?

type T struct {
    buf unsafe.Pointer // pointer to some memory allocated with C.malloc
    ...
}

During the lifetime of a T, we may change buf several times, each time with a free/malloc pair.
When the T goes dead, we want to free the last value that was in buf.

How do I write a cleanup for that? I think I would need to Stop/AddCleanup each time I updated buf. I don't see a way to set a cleanup when T is allocated that does the right thing. Unless buf is indirect somehow.
But maybe that's the intention, that you have to Stop/AddCleanup each time?

@mknyszek
Contributor Author

@Merovius I think your point about interacting with existing objects using finalizers is interesting. AFAIK there's nothing in the implementation preventing the semantics you propose. I'll have to give this some more thought.

@mknyszek
Contributor Author

> Most of the time, finalizers (and cleanup functions) are necessary for cleaning up things that the GC is unaware of, like file descriptors, memory passed from C, etc. You pass that to the cleanup function, not the object itself.
>
> What about this case?
>
> type T struct {
>     buf unsafe.Pointer // pointer to some memory allocated with C.malloc
>     ...
> }
>
> During the lifetime of a T, we may change buf several times, each time with a free/malloc pair. When the T goes dead, we want to free the last value that was in buf.
>
> How do I write a cleanup for that? I think I would need to Stop/AddCleanup each time I updated buf. I don't see a way to set a cleanup when T is allocated that does the right thing. Unless buf is indirect somehow. But maybe that's the intention, that you have to Stop/AddCleanup each time?

Ah, that's interesting. Yeah, you would need to Stop/AddCleanup each time. I don't see a way around that (unless, like you say, you add an indirection on buf, and pass that to the AddCleanup), and having to do that makes the malloc/free somewhat more expensive compared to what you can do with finalizers.

I'm inclined to say that requiring the indirection isn't that bad. It's going to result in an additional allocation for each T that exists, but that allocation will be bound to the lifetime of T. In other words, it's kinda like extending T by 8 bytes. And I think the fact that you can release T's memory immediately (as opposed to waiting an extra GC cycle) may actually make AddCleanup win out in the long run.
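The indirection could be sketched as follows. This is an illustration under assumptions, not the proposal's implementation: it requires runtime.AddCleanup with the proposed signature (Go 1.24 or newer), and fakeMalloc and fakeFree are hypothetical stand-ins for C.malloc and C.free so the example runs without cgo. The bufBox is the extra allocation; it is the cleanup argument, and it holds no pointer back to T.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// A fake C heap: buffers are identified by opaque numbers.
var (
	mu      sync.Mutex
	live    = map[uintptr]bool{}
	nextBuf uintptr
	badFree int // frees of buffers that were not live
)

func fakeMalloc() uintptr {
	mu.Lock()
	defer mu.Unlock()
	nextBuf++
	live[nextBuf] = true
	return nextBuf
}

func fakeFree(p uintptr) {
	mu.Lock()
	defer mu.Unlock()
	if !live[p] {
		badFree++
		return
	}
	delete(live, p)
}

func heapState() (liveBufs, badFrees int) {
	mu.Lock()
	defer mu.Unlock()
	return len(live), badFree
}

// bufBox holds the current buffer. It must not point back to T.
type bufBox struct{ buf uintptr }

// T reaches its buffer only through the box.
type T struct{ box *bufBox }

// freeBox frees whatever buffer the box holds when T dies.
func freeBox(b *bufBox) { fakeFree(b.buf) }

// NewT registers a single cleanup, once, at allocation time.
func NewT() *T {
	t := &T{box: &bufBox{buf: fakeMalloc()}}
	runtime.AddCleanup(t, freeBox, t.box)
	return t
}

// Replace swaps the buffer with a free/malloc pair; no Stop is needed.
func (t *T) Replace() {
	fakeFree(t.box.buf)
	t.box.buf = fakeMalloc()
}

// useAndDrop replaces the buffer n times and then drops the T.
//
//go:noinline
func useAndDrop(n int) {
	t := NewT()
	for i := 0; i < n; i++ {
		t.Replace()
	}
}

// waitAllFreed forces collections until no buffers are live or a
// five second timeout expires.
func waitAllFreed() bool {
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC()
		if n, _ := heapState(); n == 0 {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

func main() {
	useAndDrop(3)
	fmt.Println("last buffer freed:", waitAllFreed())
}
```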

@randall77
Contributor

The indirect scheme is basically the same thing you would do with finalizers if, e.g., the T's were in a cycle. You would allocate a child object, point T to it, put buf in the child object, and put a finalizer on the child object.

Having the option to Stop/AddCleanup each time instead of using the indirection technique might be useful.
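For comparison, the Stop/AddCleanup variant might look like the sketch below. It assumes runtime.AddCleanup and Cleanup.Stop behave as proposed (Go 1.24 or newer); fakeMalloc and fakeFree are again hypothetical stand-ins for C.malloc and C.free. Each replacement cancels the old cleanup while t is still reachable, then registers a new one for the new buffer.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// A fake C heap: buffers are identified by opaque numbers.
var (
	mu      sync.Mutex
	live    = map[uintptr]bool{}
	nextBuf uintptr
	badFree int // frees of buffers that were not live
)

func fakeMalloc() uintptr {
	mu.Lock()
	defer mu.Unlock()
	nextBuf++
	live[nextBuf] = true
	return nextBuf
}

func fakeFree(p uintptr) {
	mu.Lock()
	defer mu.Unlock()
	if !live[p] {
		badFree++
		return
	}
	delete(live, p)
}

func heapState() (liveBufs, badFrees int) {
	mu.Lock()
	defer mu.Unlock()
	return len(live), badFree
}

// T stores the buffer directly, plus the handle of its current cleanup.
type T struct {
	buf     uintptr
	cleanup runtime.Cleanup
}

func NewT() *T {
	t := &T{buf: fakeMalloc()}
	t.cleanup = runtime.AddCleanup(t, fakeFree, t.buf)
	return t
}

// Replace cancels the cleanup for the old buffer, swaps the buffer,
// and registers a cleanup for the new one.
func (t *T) Replace() {
	t.cleanup.Stop() // t is reachable here, so the old cleanup is removed
	fakeFree(t.buf)
	t.buf = fakeMalloc()
	t.cleanup = runtime.AddCleanup(t, fakeFree, t.buf)
	runtime.KeepAlive(t)
}

// useAndDrop replaces the buffer n times and then drops the T.
//
//go:noinline
func useAndDrop(n int) {
	t := NewT()
	for i := 0; i < n; i++ {
		t.Replace()
	}
}

// waitAllFreed forces collections until no buffers are live or a
// five second timeout expires.
func waitAllFreed() bool {
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC()
		if n, _ := heapState(); n == 0 {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

func main() {
	useAndDrop(3)
	ok := waitAllFreed()
	_, bad := heapState()
	fmt.Println("last buffer freed:", ok, "double frees:", bad)
}
```

Compared with the indirection, this avoids the extra allocation per T at the cost of a Stop and an AddCleanup on every buffer swap.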

I'm not sure I'm arguing for or against anything at this point. Just trying to understand.

@RogerDilmer

What about deferCleanup as a name? Defer is already very understandable by Go developers (albeit in a different context). Add doesn't really say anything about when it happens.

@bjorndm

bjorndm commented May 21, 2024

Ruby does something similar for finalizers. Basically, the finalizer is run after the object is deallocated. Finalizers are also guaranteed to run if the object is deallocated. The Ruby finalizer is not allowed to reference the object that it finalizes. Instead, you have to reference any members through a closure. In my experience this works better than Go finalizers. I would recommend that Go adopt the Ruby way of implementing finalizers.

https://ruby-doc.org/core-3.0.0/ObjectSpace.html
