Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: weak reference maps #43615

Open
seebs opened this issue Jan 9, 2021 · 42 comments
Open

proposal: weak reference maps #43615

seebs opened this issue Jan 9, 2021 · 42 comments
Labels
Projects
Milestone

Comments

@seebs
Copy link
Contributor

@seebs seebs commented Jan 9, 2021

This is an outgrowth of #32779, and also of https://mdlayher.com/blog/unsafe-string-interning-in-go/ and ensuing discussion of this on Gopher Slack.

Occasionally, it would be desireable to have weak references. Unfortunately, you can't implement these as a user; you need hooks into the runtime to make a safe weak reference implementation, I think. The string interning hack described above probably works but is not complying with the uintptr conversion rules.

In nearly every case where weak references are desireable, it's desireable to be able to look up specific weak references, which suggests something similar to a map data type. It's also desireable to be able to find out whether the weak reference exists, or to be able to request the reference, getting an actual stable non-weak reference back, without a separation between the query and the retrieval.

Conveniently, a map with a pointer key type already provides exactly this behavior. What it doesn't provide is the potential for GC to remove objects.

Thus, my proposal:

cache := make(weak map[string]*Data)

This is equivalent to make(map[string]*Data), except that things which are in the map but not referenced anywhere else may become deleted from the map at any arbitrary time. However, retrieval from the map is guaranteed to either yield a nil (and a false in the ,ok form) or the pointer last stored for that key, and if it yields a non-nil pointer, that pointer is safe to use.

The internal implementation details aren't 100% obvious to me. You could nearly do it with SetFinalizer, except you need a way to ensure that this doesn't clash with any other finalizers used on the item. (For instance, the same value could be present in two weak maps...)

I'm not totally sure about the syntax, but I'm pretty sure that this wouldn't compile now under any possible circumstances, so it doesn't break any existing code.

Why not sync.WeakMap, parallel to sync.Map and/or sync.Pool? Because type safety is nice, and because the ability to use or not use the ", ok" form of retrieval seems desireable. Plain old maps are easier and friendlier to use, and less error-prone.

That said, sync.WeakMap, which would vaguely combine the behaviors of sync.Map and sync.Pool, would be viable too. The main issue this is intended to address is that "would like to keep this if it's convenient and/or if it's being used, but don't need to retain it if there's memory pressure" seems like a common use case.

@JAicewizard
Copy link

@JAicewizard JAicewizard commented Jan 10, 2021

I assume when iterating over a map the map would be "locked" in some way? To avoid async reads/writes to the map.

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 10, 2021

I would assume it'd be the same as any other map. Up to you to ensure that you are not modifying it while the iterator is running.

@JAicewizard
Copy link

@JAicewizard JAicewizard commented Jan 10, 2021

What about when the a weak reference gets removed by the runtime while iterating over the map?

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 10, 2021

Same thing that would happen with anything else involving pointers in a moving GC: It's up to the runtime to have suitable barriers on or around their accesses.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jan 11, 2021
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 11, 2021

@dsnet
Copy link
Member

@dsnet dsnet commented Jan 11, 2021

Some thoughts:

  • A detriment of using a string as the key is that it probably requires an allocation forming a key to look something up that may not even exist anymore. I see some benefit to an allocation-free key like [16]byte or [32]byte.
  • How complete is a sync.WeakMap? Are entries allowed to be removed even if there may be references to it (i.e., similar to sync.Pool)?
@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 11, 2021

You can use any key type you want. It's just a normal map. string is just an example.

No, entries can't be removed if there are references to them. If there's references to them, they still exist, and a weak reference should stay valid. The entire point is to have a reliable lookup for these objects if they still exist, without ensuring that they still exist.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 11, 2021

@seebs, how would weak references interact with finalizers (if at all)?

Specifically: is the weak reference removed from the map before the finalizer is run, or after, or is it forbidden to obtain a weak reference to an object that has a finalizer?

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 11, 2021

That is a fascinating question. On consideration: If we assume that nothing else ever users finalizers, finalizers might be a credible way to implement this, internally. Runtime can skip the storage for a weak map when marking, and set a finalizer for each item to delete it from the map, roughly. So one solution would be "forbidden to set a finalizer on weak references, or obtain a weak reference to an object with a finalizer". Except... that's horribly brittle, and might result in someone developing a time machine so they can go back to before it was done and add "never use this feature" to Effective Go.

But this is a really interestingly-difficult philosophical question. It is permissible for a finalizer to actually do a thing that prevents a thing from being collected. So if we remove things from the map first, the object will potentially still exist but we will report it missing (and thus, in most use cases, create a new one). If we run finalizers first, we can have objects in the map which have had their finalizers run, and which can thus end up referenced again post-finalizer. And you can't tell, from the perspective of the caller of the finalizer, whether that finalizer may have stashed the object somewhere so that it won't really be collected.

And it might be possible to fix this if finalizers had a way to indicate whether or not they were vetoing collection, but that could nonetheless be wrong...

I think the answer is: Finalizers run after deletion from the map, but if your finalizer preserves the object, it becomes possible that the object exists but the map doesn't refer to it. So don't do that. But not in the sense of "runtime is allowed to explode in a shower of demons", just "yes, so, that previous object is around, and the map doesn't refer to it but might now have a new entry for that key."

But philosophically, in general, a finalizer is supposed to run "after the object is dead and never referenced again", and it should not do something that would make the object get referenced again, it's just after-the-fact cleanup, and it seems fine for the object to no longer be in the map, and then get cleaned up, even if a new object for its former key ends up getting created.

(Although if you did, say, "directory named after the key" and the finalizer was "delete the directory" that would be a Bad Idea, and also your own fault, don't do that.)

@ingvarm-gr
Copy link

@ingvarm-gr ingvarm-gr commented Jan 12, 2021

From a purely philosophical view, is it proper to impose "at most one" on weak references to an object?

@rsc
Copy link
Contributor

@rsc rsc commented Jan 13, 2021

This is a major language and implementation change on the garbage collector side.
It is highly unlikely that we are going to add weak references any time soon.
That's an amazing amount of new complexity.

/cc @aclements @mknyszek

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 13, 2021

The "major change" thing is both why it's a hard sell to put it in runtime, and why it's impossible to put it somewhere else.

@aclements
Copy link
Member

@aclements aclements commented Jan 13, 2021

This is something I've thought about quite a bit in the past. My past thinking has led to a bit of a counter-proposal: don't allow iteration over weak maps. Right now, the garbage collector is invisible in the language semantics (other than finalizers, which are an unfortunate wart at best). The language specification is built on the illusion of infinite memory. If you have a map that allows lookups and not iteration, we maintain that illusion because the GC can delete unreachable keys from that map without any effect on program execution. It remains invisible. An un-iterable map still provides most functionality that people typically want from a weak map, including string interning and carrying side information for objects without forcing them to remain live. There is also exactly how ECMAScript's WeakMap works.

I would argue that this should at least wait on generics. At that point, we could provide this as a regular type with all the type-safety of the built-in map. I'm not sure where it should live; it's not about synchronization, so it seems odd to put it in sync.

Implementing the above for pointer keys is actually pretty easy. We'd mark the key storage as non-pointer data, use allocation specials to track the set of weak maps containing an object as a key, and delete keys from those weak maps during sweeping. Disallowing iteration simplifies a lot of questions around races with the GC.

Implementing it for non-pointer keys is harder, but probably necessary to make it useful. I think the map would have to find the pointers in a key when it's added and attach specials to each of those allocations. During tracing, if the GC reaches any of these objects, it would have to trace across to everything else referenced by that key as well. Object tracing doesn't currently have to look at specials, so I'm not sure how to do this without a performance penalty to every object scanned during GC.

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 13, 2021

That's a fascinating idea. My vague intuition is there might be cases where you'd want to iterate the map to do something about the data, but in all those cases, I think you'd also want it to be non-weak at that point.

But I think "unreachable keys" feels like a slightly different question; I was assuming ordinary keys, which might be strings or integers, and potentially-unreachable values. So for instance, with a weak map[string]*Node, if someone has associated a given string with a *Node in the map, then we want any attempt to look up that same string -- even at a different address and constructed differently -- to necessarily find that *Node, if it is still referenced anywhere. But if nothing references that *Node except for this map, then we no longer care; there's no way for anyone to detect that a new node we return isn't the one we had for that string before, in theory.

I guess the answer is that either keys or values could in principle be weak references.

@josharian
Copy link
Contributor

@josharian josharian commented Jan 13, 2021

Implementing it for non-pointer keys is harder

Could the implementation allocate for these and implicitly unbox when returning them? Then the GC could continue to deal only in pointers.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 13, 2021

A weak map has two types: a key type and a value type. Only the value type is weak. That is, if m is a weak map, then m.Set(k, v) stores v with key k. If m is the only reference to v, then the garbage collector is permitted to release v, after which m.Get(k) will indicate that there is no such key. Moreover, if v is released, then k can be released.

My point is that with those semantics I think it's fine to restrict the value type of a weak map to be a pointer. It's hard to even understand what it means to store a non-pointer in a weak map. What does it mean to say that there are no remaining references to a non-pointer value? Isn't that equivalent to saying that there are no pointers to the value?

Are people thinking of different semantics?

@josharian
Copy link
Contributor

@josharian josharian commented Jan 13, 2021

What does it mean to say that there are no remaining references to a non-pointer value? Isn't that equivalent to saying that there are no pointers to the value?

I was thinking of the value being of type (say) [2]*int. There are two semantics possible: (1) Each of the pointers is weak; they can be set to nil independently, when the int that they refer to is GC'd. (2) The entire object is GC'd or not; only when both ints referred to by the pointers are GC'd may both be set to nil.

That said, I'd be perfectly happy to restrict the value type of a weak map to be a pointer. I imagine it'd have the same restrictions as the first argument to runtime.SetFinalizer, which are even stronger than "must be a pointer".

@aclements
Copy link
Member

@aclements aclements commented Jan 13, 2021

It was pointed out to me that my point about non-pointer keys was really unclear and also wrong. I don't mean pointer-free keys; I'm not even sure what that would mean in a weak-keyed map (probably it would just act like a normal map). I mean keys that contain pointers, like string or struct {a, b *int}. However, in my post above I was thinking this was hard because you couldn't remove the key until all of the pointers were unreachable, but you can delete the key as soon as any of the pointers are unreachable (since you can no longer construct that key). That makes non-pointer keys not much harder to support than pointer-typed keys.

But I think "unreachable keys" feels like a slightly different question; I was assuming ordinary keys, which might be strings or integers, and potentially-unreachable values.

Ah, that is quite different from what I was talking about. I much less on-board with a weak-valued map than a weak-keyed (non-interable) map because it exposes the behavior and timing of the garbage collector, exactly the way finalizers do. Also, I understand concrete use cases of weak-keyed maps, but I don't actually understand concrete uses of weak-valued maps. Your example of a string-to-*Node map was rather abstract. Can you give a concrete example of what you would use that for?

Are people thinking of different semantics?

I was definitely thinking of different semantics, but maybe I'm the only one.

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 13, 2021

Well, hmm. You can't GC the int pointed to by one of the pointers unless you set that pointer to nil, because otherwise you'd have an easy way to get an invalid pointer unexpectedly. I think my intuition is that as long as the [2]*int exists, it is in fact a valid reference and the things it points to aren't collectible, but if nothing refers to it, then it can be collected, at which point the things it points to are no longer referenced (by it, anyway). But that implies that the value type really ought to be *[2]*int, because otherwise it's not really meaningful to talk about collecting it.

I am definitely leaning towards Ian's interpretation; the keys are not "weak" in any way, they're just keys, if they contain pointers they are references to them. The only difference from a regular map is that, if the values are pointers, they don't count as references, but if the thing they point to gets GCd, the key/value pair is deleted from the map when that happens.

So, a concrete example of utility: Interning strings. You want to have a lot of objects which should contain a string, but you don't want to have multiple different allocations for the same string, and you want comparisons of them for sameness to be cheap. So...

type Intern string
var interned weak map[string]*Intern
var internMu sync.Mutex

// intern a string:
func intern(k string) *Intern {
  internMu.Lock()
  defer internMu.Unlock()
  if i, ok := interned[k]; ok { return i }
  internMe := Intern(k)
  interned[k] = &internMe
  return internMe
}

The result of calling intern(foo) for any given string is a *Intern, with the convenient guarantee that anyone else who interned the same string got, not just a pointer to an equal Intern, but exactly the same pointer. You can use these as map keys, for instance, and get single-word compares instead of string comparisons. If you have a million objects that were created using slices of different strings, and all the slices compared equal, you have a single string that all of them intern to, and the things those were sliced out of aren't being pinned. So you get much better performance than you would using the original string object(s).

So why not just use a normal map? Because if the set of strings is largeish, and changes over time, and sometimes a string will stop being used, a non-weak map[string]*Intern will eventually grow without bound. You can't safely remove anything from it because you can't prove that nothing's still using that value, and having two different *Intern that actually refer to identical strings would break using them as map keys, etcetera.

With the weak map, you can discard any values that are no longer referenced anywhere, and you still have the guarantee that no two users who try to intern the same string will ever see different interned values.

This can also be used for some kinds of caching. For instance, say you were doing a naive cache of fetched URLs. If you do it as weak map[url]*body, then if you fetch a URL, and dump it in the map, another fetch while that body is still around can just grab the copy in the weak-reference map, avoiding re-fetching the body that you already have, but if no one is using the body, it can be dropped silently from the cache. (I am aware that this wouldn't be a great choice for a lot of use cases, for a number of reasons. The first hard problem is called cache invalidation.)

So basically, the use case is: (1) it would make sense to have a map with these things as values, (2) we definitely don't want to throw them away if anyone's using them (3) it's fine to throw them away if no one's using them (4) it would be annoying to try to track who might be using them.

I note that "not iterable" seems fine to me for a weak-valued map as well. It's not really useful to iterate it, generally.

I'm not getting a clear picture of how a weak-keyed map would be used. Like, for interning, what would I be storing as keys? The addresses of strings can't be right, because the whole point is that I want to get the same thing from the map given two distinct strings which happen to compare equal.

@aclements
Copy link
Member

@aclements aclements commented Jan 14, 2021

Okay, I think I understand your interning example (I certainly understand the code and intent; not sure I've completely wrapped my head around the implications :). And I think interning like this does require a structure that exposes GC behavior, since it has to distinguish whether objects still exist. So weak-keyed maps wouldn't work for interning.

But my basic concern with a weak-valued map is that its semantics are deeply tied to GC behavior. It can't be described without explaining what it means for things to become unreachable. Finalizers are the only other thing in the language like that right now, and those are an unfortunate and necessary evil. In my opinion, the bar for adding another is extremely high. (sync.Pool's semantics don't expose GC behavior. It can be described simply as a lossy set. GC behavior is key to Pool's performance characteristics, but that's true for a lot of things.)

Also, I believe weak-valued maps are ultimately no different from weak references, and I'm not sure why we would add weak-valued maps rather than just weak references. You can build a weak reference from a weak map using a single-entry map, and you can build a weak map from a weak reference by using weak references for the values (plus either finalizers or phantom reference queues, depending on the details of the weak reference API).

My other concern is implementation. First, a weak-valued map (or any form of weak reference) cracks open the assumptions Go's current write barrier is based on. It might be possible to work around that by synchronizing carefully with the garbage collector during the "Get" and "Put" operations, but I'd have to work out those details. It's particularly hairy if an unreachable value is actively being swept during a Get. Sweeping would probably have to lock the map. (Weak-keyed maps have none of these problems.)

Second, weak-valued maps have the problem I was (incorrectly) getting at above for values containing multiple pointers, like [2]*int. It doesn't make sense to collect the map entry until all of the pointers are unreachable simultaneously, but I don't know how to do that without slowing down the GC on everything. Maybe we would only allow pointer values, but does that restrict the utility of a weak-valued map? I don't know.

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 14, 2021

So, as a user, I mostly want a thing like this in a case where I don't care whether the object never existed or used to exist but got GCd, I just don't want to make an object if it already exists, but I don't want to pin a zillion objects that I might never use again. So, it sort of exposes a GC behavior, but in a very loose sense; nothing prevents the implementation from continuing to hold things for a long time. (Strictly speaking, existing maps already satisfy the expected usage behavior, in that they meet the constraint that the retrieved pointer is valid as long as the object hasn't been collected, it's just that it always turns out that the object hasn't been collected because the map counts as a reference to it. But the GC would be allowed to never get around to collecting objects anyway...)

The reason I wrote this proposal with a map rather than with references is sort of glossed over in the initial proposal. Basically:

(1) I can't think of a real use case where I want a weak-reference, and I don't want to be able to look it up by some kind of key.
(2) Weak references are naturally well suited to the , ok idiom of maps.

And then, thinking about it, it occurred to me that a lot of the things that make weak references seem like an annoying mess to track are potentially significantly mitigated by grouping them all together. If all the weak references are being kept in one object, then we have one object/region that needs special treatment. If you have a million structs each of which has a weak reference to something, you have to do whatever your special processing is for every one of those weak references, separately from whatever you do for pointers in those structs. If you have a weak map, you can do your special processing for everything in the map and nothing not in the map, and that seems like it should be simpler.

I think a weak-valued map only makes sense if the values in it are themselves pointers to things. Maps aren't addressable; nothing can be pointing-to the map entry, so the map entry can't itself be collectable or pinned. So I think it's only meaningful for a map to be weak when the value type is a pointer type itself.

My vague intuition, which I haven't fully thought through, is that if a weak-valued map didn't mark the things it contains, but did follow pointers within them and mark those, and it were prohibited to use SetFinalizer on things that live in a weak-valued map, and things lived in only one such map, it would be adequate for the runtime to SetFinalizer() on them to delete them from the map. Except, as you note, the sync issue.

I'm still not quite able to make sense of a weak-keyed map. I can describe it but I'm not sure what I'd use it for. I guess it would be a more implicit usage. With the weak-valued map, I have a key and I want to know whether there's an associated value. With the weak-keyed map, if I have a key, I know the value can't have been deleted.

@josharian
Copy link
Contributor

@josharian josharian commented Jan 14, 2021

I can't think of a real use case where I want a weak-reference, and I don't want to be able to look it up by some kind of key.

“Looking up by key” and “stored in a field alongside other data” often end up being interchangeable. Imagine an expensive immutable cached result. You store it with a weak ref alongside the raw data and return it. If no one is using it any more, it returns to nil, and you recalculate again later.

At the end of this conversation, I don’t see the advantage that maps have over plain old weak refs, except possibly in syntax. But with generics, we could have a runtime.WeakRef[T] type.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 14, 2021

Weak references are basically isomorphic to finalizers. It is possible to implement weak references using finalizers, it's just that the result is awkward to use because you can't use ordinary pointers. So if you don't like finalizers, you won't like weak references.

@josharian
Copy link
Contributor

@josharian josharian commented Jan 14, 2021

It is possible to implement weak references using finalizers, it's just that the result is awkward to use because you can't use ordinary pointers.

That was not my experience. Please see the discussion at https://github.com/go4org/intern/blob/3eb7198706b254cddea46d066da7316b2620ba7a/intern.go#L169. I’d love to be wrong about this, but the conclusion I came to was that weak refs are not implementable using finalizers.

@josharian
Copy link
Contributor

@josharian josharian commented Jan 14, 2021

@bcmills made a similar point on Gopher Slack, so cc’ing him here.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 14, 2021

@josharian I sketched it out at #41303 (comment). The problem is that everybody has to use Strong pointers, they can't use ordinary pointers. And they have to be careful about using runtime.KeepAlive.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 14, 2021

Oh, I see you linked there already. Perhaps my sketch is not useful for the intern package, but I claim that it does indeed implement weak references.

@josharian
Copy link
Contributor

@josharian josharian commented Jan 14, 2021

it does indeed implement weak references.

Sorry. You're right. However, there are limitations on how those weak pointers can be used. And those limitations make them not usable in the sole case to date in which I have wanted weak pointers.

Weak references are basically isomorphic to finalizers.

I guess this is the statement that I disagree with, because there are tasks that you can accomplish with weak refs that you can't accomplish with finalizers.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 14, 2021

Fair enough. Let me put it this way: any trouble you can get into with finalizers, you can also get into that trouble with weak references. (Except crashing in the finalizer function or something like that.)

For example, cyclical references don't work with finalizers, and they won't work with weak references either. Also, the compiler can cause a strong reference to die at times when people don't expect, and that can cause the weak reference to stop working.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 14, 2021

If I understand #41303 (comment) correctly, it does not implement what I would consider to be “weak references”.

In particular, I expect the following properties:

  • A weak reference can be obtained from any strong reference.
  • A strong reference can be obtained from any weak reference if the underlying object still exists.
  • A strong reference obtained from any weak reference to an object keeps all weak references to that object alive.

As far as I can tell, the API in that sketch does not satisfy the third property.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 14, 2021

For example, cyclical references don't work with finalizers, and they won't work with weak references either.

I don't see why that would be the case. Could you elaborate?

In reference-counted languages, weak references are sometimes used to explicitly break strong-reference cycles.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 14, 2021

Sorry, I meant specifically for Go.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 15, 2021

I don't see why that would be the case for a native, runtime-supported implementation of weak references in Go either.

If you implement weak references in terms of finalizers, then it is certainly true that they share all the same drawbacks. However, that is a conclusion that follows from the assumption that the two are equivalent — it is not evidence for their intrinsic equivalence.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 15, 2021

In a garbage collected language that doesn't use reference counting, what can you do with weak references that you can't do with finalizers? And vice-versa?

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 15, 2021

With weak references but not finalizers, you can have structs with weakly-referenced fields without rewriting all transitive references to the values to use a “strong reference” wrapper type.

With weak references but not finalizers, you can implement interning tables with weak values, but you have to manually prune the stale keys as values are collected.

With weak references plus finalizers, you can implement interning tables with weak values with automatic pruning of stale keys.

That is not to say that weak references are essential, but they are not redundant with finalizers.

@dsnet
Copy link
Member

@dsnet dsnet commented Jan 15, 2021

Maybe I'm lost in the discussion, but what's the exact relationship between "weak references versus finalizers" and what the issue was originally about, which is weak references maps? Are we suggesting we should provide weak references instead of weak reference maps or are we talking about how a weak reference map would be implemented?

It seems that that weak references alone is insufficient since a key feature of a weak reference map is that the garbage collector takes care of deleting entries that are dead. I'm not sure I know how to implement a weak reference map with weak references alone.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 15, 2021

@dsnet

Given that Go already has finalizers, I claim that we could enable users to implement weak-reference maps by adding only ordinary weak references to the language.

That seems simpler to me, because the semantics of ordinary weak references seem straightforward (just the invariants described in #43615 (comment)), in comparison to the confusion and ambiguity of “weak keys” vs. “weak values”.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 15, 2021

I don't think this addresses the point I was trying to make but as a practical matter I think it's unimportant.

@aclements
Copy link
Member

@aclements aclements commented Jan 15, 2021

For example, cyclical references don't work with finalizers, and they won't work with weak references either.

@ianlancetaylor, I don't believe this is correct. Cycles through a weak reference can be collected. (There are other forms of finalizers that can also deal with cycles, such as phantoms.)

Maybe I'm lost in the discussion, but what's the exact relationship between "weak references versus finalizers" and what the issue was originally about, which is weak references maps?

@dsnet, that was partly my fault. I brought up weak references vs weak maps here.

There are many variations on weak references. One variation is in how you tell an application that the weak reference has been collected. The runtime can just nil out the reference and leave the application to discover that later, or it can provide a callback or a "reference queue" that lets the user do finalization-style things. Usually when used with weak references, the callback or queue doesn't provide the original object, but some other bit of information you associated with it, for example, the map and key it was stored under. Doing this with weak references and callbacks/queues is nice because it's clear what happens when there are multiple weak references. While there's nothing impossible about having multiple finalizers for an object, we disallow it because it gets messy quickly (what order do they run in? if one of them revives the object, do you run the rest? how do you remove a finalizer?).

One reasons Go finalizers are so tricky in the runtime is that they provide the original object to the callback, which means the runtime has to revive the object just to run the callback. This is what makes objects with finalizers in cycles un-collectable, delays reclaiming of memory for finalized objects, and means that a chain of N objects with finalizers takes N GC cycles to clean up. If weak references had a callback, but didn't provide the original object, they don't have these problems.

@seebs
Copy link
Contributor Author

@seebs seebs commented Jan 15, 2021

Part of the appeal, to me, of the map is that you don't have to think about a specific reference. If you have a reference, it's a strong reference. The weak-reference nature is restricted to grabbing things from a map, and that already has the , ok behavior.

@RLH
Copy link
Contributor

@RLH RLH commented Jan 17, 2021

@dsnet
Copy link
Member

@dsnet dsnet commented Mar 29, 2021

I was looking into a memory leak that stemmed from dynamically created types not being garbage collected (#28783). In that issue, @randall77 comments:

Yep. This is a hard one to fix.
We depend on type uniqueness, so we need to keep track of all the types we've currently made.
To GC the type, we need some way of telling reflect we've done so, so it can remove that type from its list of created types. Probably something akin to Java's weak references.
All the interface tables are allocated in persistent memory, not on the heap, so that would need to change also.
The compiler has a bunch of assumptions about types and interface tables never being GCd (e.g. the first word of an interface isn't a pointer that GC needs to traverse), that would need finding and fixing also.

I don't know much about the implementation of reflect, but there might be overlap in fixing #28783 with this issue.

Even if dynamically constructed types could be garbage collected, we still have the problem where some libraries (e.g., encoding/json, google.golang.org/protobuf, and others) have a global sync.Map that acts like a cache keyed on reflect.Type to some expensively computed data structure. The use of a global sync.Map or even a map[reflect.Type]V would keep dynamically created types alive forever (even if #28783 were fixed) unless there was some weak reference map that allowed for the removal of certain reflect.Type entries when they are GC'd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet