Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the Promise API documentation #11473

Closed
huntc opened this Issue Apr 5, 2019 · 25 comments

Comments

Projects
None yet
7 participants
@huntc
Copy link

commented Apr 5, 2019

Having finally understood that a Promise should nearly always be completed to avoid memory retention, I feel that it would be useful to have its API documentation mention this.

I propose the following wording:

A Promise should always be completed, whether for success or failure. Failing to complete the promise will likely lead to memory be retained for it.

cc @viktorklang

@jroper

This comment has been minimized.

Copy link

commented Apr 5, 2019

I'm not sure if memory is the right word - it's not usually memory that is the problem (or rather, memory is an inconsequential side effect of the leak), most commonly not completing a promise causes a leak, it will be because a client of the API that returned the future is depending on it completing to clean up some other resource, such as close a connection or return it to pool, or shutdown a thread pool, etc. I'd word it as:

If a Promise's Future is passed across an API boundary, the promise should always be completed. Failing to do so could result in resource leaks in cases where a client of the API relies on the Future's completion to clean up resources.

@SethTisue SethTisue added this to the Backlog milestone Apr 5, 2019

@eed3si9n

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

@huntc Would you like to send a PR for this, if you're happy with @jroper's suggestion?

@viktorklang

This comment has been minimized.

Copy link

commented Apr 5, 2019

@huntc @jroper

A Promise should always be completed, whether for success or failure. Failing to complete the promise will likely lead to memory be retained for it.

I think about it something like this:

Having a reference to a Promise represents an obligation to complete it with a value. This obligation can be shared or transfered by passing the Promise instance to other code. Failing to complete a Promise when having the obligation to do so may lead to resource leaks if its corresponding Future has been shared with other logic.

@huntc

This comment has been minimized.

Copy link
Author

commented Apr 5, 2019

Please note the motivation for this issue was because we reasoned that memory would be retained by the EC.

While the other wording here may be more accurate, I think mine cuts to the chase. The goal is to ensure that the programmer always completes a promise. Completing a Promise is benign even if it doesn’t technically need to be completed (which is rare).

@eed3si9n

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

@huntc In other words, your concern is server-side (Promiser) leakage of resource over time, not what happens to the client-side Futures?

@huntc

This comment has been minimized.

Copy link
Author

commented Apr 5, 2019

@eed3si9n Correct.

@viktorklang

This comment has been minimized.

Copy link

commented Apr 6, 2019

@huntc I'm not sure I follow then—what resource leakage is there if there are not callbacks registered and it never gets completed?

@viktorklang

This comment has been minimized.

Copy link

commented Apr 6, 2019

@huntc A Promise does not reference any ECs. Only callbacks added to its Future has a reference to any EC.

@longshorej

This comment has been minimized.

Copy link

commented Apr 6, 2019

As far as I understand, if a callback is registered on the Future, even if that callback doesn’t reference anything, the ec will hold a reference until the Promise is completed. Failure to complete it will leak memory.

Is that accurate?

@viktorklang

This comment has been minimized.

Copy link

commented Apr 6, 2019

@longshorej No—the underlying Promise of the Future stores the callbacks along with what EC to submit those callbacks to once the Promise becomes completed. So if all references to the Promise are dropped, all the callbacks are reclaimed by the GC. (Since the Promise can no longer be completed).

Does that make sense?

@longshorej

This comment has been minimized.

Copy link

commented Apr 6, 2019

Thanks @viktorklang - that helps. I have a follow-up question though -- those callbacks are only reclaimed if all references to both the Promise and associated Future are dropped, right? For instance,

def stuff() = {
  val p = Promise[Unit]
  Source.fromFuture(p.future).runWith(Sink.ignore)
}

stuff()

In this scenario, p can no longer be completed, but is it true that it'll never be reclaimed by GC?

@viktorklang

This comment has been minimized.

Copy link

commented Apr 6, 2019

@longshorej Yes, most of the time the Promise secretly IS-A Future, so it returns itself but as another type (Future). In this case the Source.fromFuture will register a callback on that Future, which means that it will be kept in memory until it completes. For your example, you'd want to put a timeout on the stream so that it does not run forever.

@huntc

This comment has been minimized.

Copy link
Author

commented Apr 7, 2019

The promise completion code often cannot make assumptions re how its associated future is consumed. Here’s a slight amendment to my original suggestion:

A Promise should always be completed, whether for success or failure. Failing to complete the promise will likely lead to memory being retained for its associated Future.

@viktorklang

This comment has been minimized.

Copy link

commented Apr 7, 2019

@huntc

This comment has been minimized.

Copy link
Author

commented Apr 7, 2019

But in principle O find that it is easier to be dogmatic about completion always having to be performed.

I’m unsure about the “O find” typo... are you agreeing with my last suggestion? Thanks.

@viktorklang

This comment has been minimized.

Copy link

commented Apr 7, 2019

@jroper

This comment has been minimized.

Copy link

commented Apr 7, 2019

To put it in more concrete terms:

Promise[Unit]().future.map(() => Unit)

Does not under any circumstance leak memory. Promises do not leak references to themselves anywhere, nor do they hold any non gc-able state, and so they do not leak memory as the change suggested implies.

However, the following leaks memory (in the form of a connection):

val is = new SocketInputStream("localhost", 80)
Promise[Unit]().future.map(() => is.close())

Since the promise created was never completed, and I as the client of the promise future attached the cleanup of a resource that is completely unrelated to the promise to the completion of the promise, it leaks. But that leak has nothing to do with the promise itself, it's only that I depended on its completion to close the connection. The leak comes from the implied contract between consumers of futures, and creators of futures, not from any specific resources that promises hold (because they don't hold any).

Bottom line, uncompleted promises by themselves in no way leak memory.

@longshorej

This comment has been minimized.

Copy link

commented Apr 7, 2019

I'm not sure I follow @jroper - Is not the Promise itself an allocation that needs to be freed? For example:

import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object App {
  def main(args: Array[String]): Unit = {
    for (n <- 0 until Int.MaxValue) {
      Await.result(run(), Duration.Inf)
    }
  }

  def run(): Future[Unit] = {
    val done = Promise[Unit]
    val leak = Promise[Unit]

    for {
      _ <- Future { done.success(()) }
      _ <- leak.future
    } yield ()

    done.future
  }
}

This will OOM on me after some time given a small heap, whereas if the leak promise is completed when the done one is, it doesn't appear to ever OOM.

I think the bigger point is that we don't know how the Promise/Future will be used, and therefore not completing a Promise can make it easy to leak memory. I agree with the earlier sentiment about being dogmatic about completion being required, even if there's one rare case where it isn't.

huntc added a commit to huntc/scala that referenced this issue Apr 8, 2019

@jroper

This comment has been minimized.

Copy link

commented Apr 8, 2019

The only reason why that will leak is because the for loop in main is creating a Seq that holds a reference to every future you created, ie, you're creating an array of infinite length, because that for loop desugars to:

val result: Seq[Future[Unit]] = (n <- 0 until Int.MaxValue).map {
  Await.result(run(), Duration.Inf)
}

Replace Await.result(run(), Duration.Inf) with 1, and you'll run out of memory too.

If you replace the for loop with a while loop, you won't run out of memory, you'll just have an infinite loop:

while(true) {
  Await.result(run(), Duration.Inf)
}

Futures don't hold any resources, in memory, they are not much more than just a single field that gets CASed on by every operation - before its redeemed, that field holds a seq of callbacks (and corresponding execution contexts) to run when it gets completed and all the map/filter/flatMap etc operations just add another callback to that seq. When it's redeemed, the seq of callbacks is swapped (via CAS) with the result (either success or failure), and then every callback found is executed (on their corresponding execution contexts). After that point, any calls to map/filter/flatMap see that the field contains the result, and just invokes the callbacks immediately.

If you look at the code, it's a little more complex, there's some optimisations in there to handle very large chains of flatMapped futures (ie in infinite streams) that we had when we used them with Play's old iteratees API. But it's not much different to what I described above.

There's no actor associated with futures (unlike streams), nothing holds a register of them, they don't create any threads. There's nothing, they're just a very ordinary data structure, like a case class, but their primary field is interacted with using a CAS with semantics around completion vs non completion state which determine the current type of that field.

@longshorej

This comment has been minimized.

Copy link

commented Apr 8, 2019

I'm not sure that we're classifying a memory leak as the same thing -- you say a Future doesn't hold any resources, it's a field that holds a seq of callbacks. Do you consider that field itself a resource?

Also, note there's no yield in my original example, so it's unclear to me how it desugars to the code you supplied. scala -Xprint:parser src/main/scala/App.scala indicates the following:

def main(args: Array[String]): Unit = 0.until(Int.MaxValue).foreach(((n) => Await.result(run(), Duration.Inf)))

As suggested, I also did try replacing Await.result(run(), Duration.Inf) with 1 and my program terminated successfully in a few seconds.

I've simplified my original example. This program fails with an OOME after ~14 minutes on my machine (given a 16MB heap):

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

object App {
  def main(args: Array[String]): Unit = {
    while (true) {
      run()
    }
  }

  def run(): Unit = {
    val leak = Promise[Unit]

    for {
      _ <- Future.successful(())
      _ <- leak.future
    } yield ()
  }
}
@huntc

This comment has been minimized.

Copy link
Author

commented Apr 8, 2019

Hi everyone,

I think we need to move on. The PR is in and the wording should help programmers avoid the problem that we observe.

Thanks.

@jroper

This comment has been minimized.

Copy link

commented Apr 9, 2019

The other possible reason why it might run out of memory is that on each iteration, you are submitting not one but multiple callbacks to an executor to run. It's possible that your main loop (which is very tight) is submitting tasks to the executor faster than they can be executed, causing the queue to slowly grow over time. While the result of that is running out of memory, it's not a memory leak, if you stop your outer loop, the queue will drain of its own accord without any further intervention, which means nothing has leaked. And whether the promise is completed or not has no bearing on it.

Fields do not inherently leak memory. Allocating any object infinitely and holding a reference to each one will cause an out of memory error, but that doesn't make it a memory leak, that's just attempting to allocate more memory than your machine has. A memory leak occurs when you allocate an object, drop all references to it, but then when GC occurs, that objects memory for some reason is unable to be reclaimed, eg because that object may have placed itself in some statically referenced cache, or because it started a thread that references it and so keeps it alive, etc. Promises don't do that, there's no opportunity for a leak in them. They do pass a reference to themselves to an executor when (and only when) there is something to execute, but the executor will execute that immediately (unless exhausted, as is probably happening here, but that's not the promises problem, and has nothing to do with not completing promises) and once executed the reference to the promise is dropped.

@jroper

This comment has been minimized.

Copy link

commented Apr 9, 2019

Also consider, if there was just a single byte leaked per iteration in your code above, then how long would it take for it to cause an out of memory error on a 16mb heap? approx 16 million iterations. But that loop probably iterated billions of times over the course of 14 minutes. That can't possibly have been a memory leak, I'm betting on queue lag as the cause.

@Jasper-M

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

Does the following also run out of memory in a loop?

def run(): Unit = {
  val leak = Promise[Unit]
  val future = leak.future

  for {
    _ <- Future.successful(())
    _ <- future
  } yield ()
}
@viktorklang

This comment has been minimized.

Copy link

commented Apr 9, 2019

The reason for talking about "leaks" is because if the reference to the Promise is dropped, but someone still has a reference to the Future, it is essentially a leak, since it will retain memory beyond the point where it could be freed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.