Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VectorBuilder/Vector has a memory leak #10922

Closed
mkeskells opened this issue Jun 4, 2018 · 5 comments
Closed

VectorBuilder/Vector has a memory leak #10922

mkeskells opened this issue Jun 4, 2018 · 5 comments

Comments

@mkeskells
Copy link

vector builder shared the underlying array with the built vector

def result(): Vector[A] = {
    val size = this.size
    if (size == 0)
      return Vector.empty
    val s = new Vector[A](0, size, 0) // should focus front or back?
    s.initFrom(this)
    if (depth > 1) s.gotoPos(0, size - 1) // we're currently focused to size - 1, not size!
    s
  }

so if I add to the builder after a vector is built, that built vector refers to this item so the new item cannot be GCed until the vectorBuilder, and all vectors are GCed

@julienrf julienrf added this to the 2.13.0-M5 milestone Jun 4, 2018
@retronym
Copy link
Member

retronym commented Jun 5, 2018

Arguably a ReusableBuilder could have undefined behaviour if you retain the reference to it but fail to call clear().

ReusableBuilder is a marker trait that indicates that a Builder
can be reused to build more than one instance of a collection. In
particular, calling result followed by clear will produce a
collection and reset the builder to begin building a new collection
of the same type.

In the case of VectorBuilder I can't see any advantage to leaving its internals un-nulled on return from result. I guess there is a tiny cost of nulling the fields which is wasted in the common case when the builder itself is immediately released.

There is one clear (ha!) bug though: display{1, ...} are not nulled in clear:

scala> val v = collection.immutable.Vector.newBuilder[Any]
v: scala.collection.mutable.Builder[Any,scala.collection.immutable.Vector[Any]] = scala.collection.immutable.VectorBuilder@79e66b2f

scala> for (i <- 0 to 16384) v += i

scala> val r = v.result; v.clear()
r: scala.collection.immutable.Vector[Any] = Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 1...

scala> val f = v.getClass.getDeclaredField("display1"); f.setAccessible(true); f.get(v)
f: java.lang.reflect.Field = private java.lang.Object[] scala.collection.immutable.VectorBuilder.display1
res4: Object = Array(Array(16384, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null), null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)

scala> val f = v.getClass.getDeclaredField("display2"); f.setAccessible(true); f.get(v)
f: java.lang.reflect.Field = private java.lang.Object[] scala.collection.immutable.VectorBuilder.display2
res6: Object = Array(Array(Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31), Array(32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63), Array(64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95), Array(96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127), Array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, ...

@mkeskells
Copy link
Author

Arguably a ReusableBuilder could have undefined behaviour if you retain the reference to it but fail to call clear().

@retronym I am not concerned with the builder retaining a reference to an unbuild result - that is part of the behaviour of a builder, its that a pre-build result retains a reference to later added value

 val builder = Vector.newBuilder[String]
builder += "1"
builder += "2"
builder.result()

scala> res2: scala.collection.immutable.Vector[String] = Vector(1, 2)

scala> builder += "this should not be visible"

scala> res2

res4: scala.collection.immutable.Vector[String] = Vector(1, 2)

scala> res2.getClass.getDeclaredField("display0")

res9: java.lang.reflect.Field = private java.lang.Object[] scala.collection.immutable.Vector.display0

scala> res9.setAccessible(true)

scala> res9.get(res2)
res11: Object = Array(1, 2, this should not be visible, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)

@szeiger
Copy link
Member

szeiger commented Jun 5, 2018

You are relying on undefined behavior. A ReusableBuilder allows reuse by calling clear() after result(). As pointed out in the doc comment, further guarantees (in particular, calling anything other than clear() after result()) are not part of the standard contract:

  *  It is up to subclasses to implement this behavior, and to document any
  *  other behavior that varies from standard `ReusableBuilder` usage
  *  (e.g. operations being well-defined after a call to `result`, or allowing
  *  multiple calls to result to obtain different snapshots of a collection under
  *  construction).

VectorBuilder makes no such guarantees (neither in 2.12.x nor in 2.13.x). If we wish to extend the contract, IMHO the current behavior is the desirable one (but would need to be documented). Vector uses structural sharing, so if you take multiple snapshots from a VectorBuilder it makes sense that they they should share structure rather than copying all the data.

@mkeskells
Copy link
Author

@szeiger
the documentation on ReusableBuilder is out of step with its implementations, at best, e.g. ListBuffer for example has specific code to cope with the reuse pattern that I described, addition then build then further addition, so I do not believe that your statement is observed in common usage, but the documentation does not refer to this extended behaviour. Looking through the subclasses that implement this trait I cannot see any implementation that document such behaviour

Your response regarding VectorBuilder on structural sharing is however contradictory. How can you achieve any structural sharing if you are required to call clear after build. Specifically you could not even call build twice in succession, for the trivial case, and you cant add to a built data, so what is ever possibly sharable?

@szeiger
Copy link
Member

szeiger commented Jun 7, 2018

The documentation in ReusableBuilder covers the guarantees that every ReusableBuilder must make, which is allow reuse after calling clear() after result(). This is all you can rely on in general. It does not mean that subclasses cannot allow other patterns as well.

As you observed, some already do. In particular, ListBuffer.result() simply calls toList, so any interleaving of Buffer operations should be valid. We need to document the cases where this is intentional and can be relied on.

Your response regarding VectorBuilder on structural sharing is however contradictory. How can you achieve any structural sharing if you are required to call clear after build

I was referring to the non-standard use without clear(). VectorBuilder looks like a good candidate for allowing (and documenting) this pattern. There may be cases where it makes sense to hold on to some data even after clear() (see #10923). I haven't looked closely enough at the VectorBuilder implementation to check if this would be the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants