Support disposal of Lwt_pool.t elements #483

hcarty · 2017-10-14T05:48:30Z

An initial attempt at solving #480. Also tried to catch occurrences of thread in the doc comments and replace them with promise.

EDIT by @aantron: resolves #480.

hcarty · 2017-10-14T07:18:30Z

This implementation of clear does not invalidate pool elements which are currently in use once they have been released. I'm not sure if that's ok. I'd like to have clear invalidate all pool elements. One way to do that would be to wrap each element in a record with a mutable disposed field. If the field is set appropriately the element would be considered invalid without any further checks.

I'll hopefully have a chance early next week to update this PR with that improved functionality unless someone disagrees with the approach.

bobbypriam

Thanks @hcarty! I for one wishes this change to be merged. Left a note on the docs though 🙂

bobbypriam · 2017-10-14T08:01:55Z

src/core/lwt_pool.mli

+      to the optional function [check] as [check element callback].  [check]
+      must call [callback] exactly once with [true] if [element] is still valid
+      and [false] otherwise.  If [check] calls [callback false] then [dispose]
+      will be run on [element].


I'm a bit confused on when dispose is called. At a glance, it seems that both validate and check will call dispose. Will dispose be called twice? If not, how is the order of calling done?

Also this:

If a call to {!use} fails with a pool element

As my comment in the issue, I think it's unclear why a use would fail. Does "fail" equals "trying to use an invalid pool element" (due to validate)?

I believe "fail" means raising an exception or rejecting the 'b Lwt.t promise that use's callback returns. It should definitely be clarified in the comment, because I also am guessing about what it means.

Ah, that actually makes sense. I think it's the whole validate and invalid elements on use paragraph just above it that confused me, I thought they are still talking about the same (or at least related) thing.

I tried to tweak the wording at bit - does this make the use of each argument any clearer? Suggestions are welcome - the wording has been tricky to get right.

The new docs are pretty great!

aantron · 2017-10-14T12:18:42Z

This implementation of clear does not invalidate pool elements which are currently in use

@bobbypriambodo Can you comment on this? What should clear do for pool elements that are currently in use?

bobbypriam · 2017-10-14T15:39:07Z

What should clear do for pool elements that are currently in use?

Intuitively for me clear should dispose all elements. However, I'm not sure what the best way is to achieve it.

The way node-pool achieve this is to first call drain() which will make sure that new acquire/use calls will be disallowed when the state is "draining" while letting remaining in-use elements to finish, and resolves when all elements are returned to the pool, at which we could call clear() safely. It also hits me that when we call clear, we might want to prevent new creations of elements, or else there might be some race conditions involved.

Or, another approach that I thought of is that to leave it as it is, and let the user bear the responsibility of making sure that the pool is definitely not used anymore before clearing. In case of a pool of DB connections that I have in mind, clear would be called when the app is shutting down, and it makes sense to first make sure that the app is not accessed before shutting it down (e.g. putting it out of load balancers, mark the instance unhealthy, etc.) to prevent end-users receive all kinds of errors.

aantron · 2017-10-14T20:05:58Z

@bobbypriambodo thanks for your thoughts!

I guess that goes back to the question of whether we want a minimalistic Lwt_pool that is "agnostic" to different ways to address this, or whether there is a common behavior (which I don't know) for this aspect that users tend to expect, and we should implement that behavior.

hcarty · 2017-10-14T21:42:47Z

I agree with @bobbypriambodo - I think it's reasonable to expect clear to invalidate and dispose of any existing pool elements once they are released.

I don't want to block usage of the pool on outstanding element use after a clear- when I get back to this I'll look for a clean way to handle that too.

hcarty · 2017-10-14T21:43:10Z

And suggestions for approaches are of course welcome!

bobbypriam · 2017-10-17T11:03:21Z

@hcarty really love the new doc! It significantly clear things up. Now I understand the differences between check and validate.

Another minor note, there's an inconsistency, you defined [check element is_ok] but later If [check] calls [callback false]... If I understand correctly callback should be is_ok.

aantron · 2017-10-17T11:49:00Z

src/core/lwt_pool.mli

-    Note that pools are not for keeping Lwt threads. Lwt threads are very cheap
-    to create and are pure. It is neither desirable nor possible to reuse them.
-    If you want to have a pool of {e system} threads, consider module
+    Note that pools are not for keeping Lwt promises. Lwt promises are very


I think this paragraph is a fortunate casualty of the change in terminology from threads to promises – as in, I doubt anyone would think of keeping a "promise pool" anyway :) I suggest deleting most of this paragraph, and leaving some modified version of the last sentence about using Lwt_preemptive for a system thread pool.

aantron · 2017-10-17T11:50:51Z

src/core/lwt_pool.mli

    [Lwt_preemptive]. *)

 type 'a t
-  (** Pools containing values of type ['a]. *)
+  (** Pools containing elements of type ['a]. *)

 val create :
  int ->
  ?check : ('a -> (bool -> unit) -> unit) ->
  ?validate : ('a -> bool Lwt.t) ->


Optional, but maybe we should also change the order so that ?validate comes before ?check? On each call to use, ?validate is called first, then ?check if the promise returned by ?use is rejected.

aantron · 2017-10-17T11:51:25Z

src/core/lwt_pool.mli

+      to the optional function [check] as [check element callback].  [check]
+      must call [callback] exactly once with [true] if [element] is still valid
+      and [false] otherwise.  If [check] calls [callback false] then [dispose]
+      will be run on [element].


The new docs are pretty great!

hcarty · 2017-10-18T04:56:17Z

Thanks for the additional comments. I should get to this later this week, along with a potentially better approach to clear.

hcarty · 2017-10-18T16:01:31Z

When check returns false the current approach is to drop the existing pool element (and with this PR, dispose it) and then create a new one to take its place. Would it be ok to change that functionality to not replace the element in the pool immediately after check returns false? If a new element is required to take its place it would still be created during a subsequent use. My reasoning behind this change is:

Creating a new pool element may be costly. It seems better to save that cost until the element is required rather that paying it in the promise where it failed.
The current remake-on-check-failure logic makes clear's semantics more difficult to understand as clear could be called on a pool but an outstanding use promise could end up creating a new pool element if it resolves to an exception.

hcarty · 2017-10-18T16:20:27Z

I think the feedback on the doc comments have been addressed in the latest commit.

I've also updated clear so that any in-use pool elements will be disposed of once they are released. The logic there uses some ugly state tracking but hopefully the implementation is understandable. Please ask questions if it isn't.

Finally, I changed the logic around check so that a failed check does NOT create a new pool element. I think that this is better behavior in general - the only place we should create a new pool member is when we try to acquire one. This also makes the clear semantics and implementation much simpler.

aantron · 2017-10-18T18:33:28Z

Would it be ok to change that functionality to not replace the element in the pool immediately after check returns false?

As maintainer, changing semantics makes me feel extremely paranoid, but I think this is ok. It may affect element creation timing for some users, but given Lwt is already a concurrency library, I would imagine nobody is able to count on the timing anyway.

in the promise where it failed

Sorry to nitpick language, but I have to do this as part of putting the thread terminology to rest. It's misleading to say that anything happens in a promise (compare "in a thread," which makes sense, but Lwt is no longer called a threading library exactly because that doesn't make sense for Lwt). The only things that can be said to be in a promise are either a list of callbacks (when pending) or a value or exception (when resolved). A more right thing to say here is something like "rather than paying it right after a promise returned by the callback of use is rejected," or anything else like that.

aantron · 2017-10-18T18:39:47Z

src/core/lwt_pool.ml

@@ -36,6 +36,8 @@ type 'a t = {
  (* Validate old pool members. *)
  dispose : 'a -> unit Lwt.t;
  (* Dispose of a pool member. *)
+  mutable cleared : bool ref;


The amount of indirection here is a bit sneaky, and the below syntax

p.cleared := true; p.cleared <- ref false;

was mysterious to me when I first saw it. This

cleared = ref false;

also doesn't accurately reflect what's really going on. It's tempting to think that cleared involves only one level of indirection when seeing that.

I'd suggest making this an "immutable" field of type bool ref ref.

aantron · 2017-10-18T18:46:58Z

The docs look pretty good. For me, I'll wait on reviewing the code in detail, as the build is currently failing. I think it's just the order of the labeled arguments that needs updating in the implementation.

hcarty · 2017-10-18T19:05:11Z

A more right thing to say here is something like "rather than paying it right after a promise returned by the callback of use is rejected," or anything else like that.

I'm uncertain here - from a user's perspective it's use which pays the price of creating the new pool element. 'a Lwt.t may not be a thread but it still has a context it's responsible for, including its lifetime. I need to read your updated introduction again to make sure I have my terminology correct!

hcarty · 2017-10-18T19:05:55Z

I'll wait on reviewing the code in detail, as the build is currently failing. I think it's just the order of the labeled arguments that needs updating in the implementation.

Oh no! Sorry about that, I pushed too soon. I'll get a fix in.

aantron · 2017-10-18T19:16:04Z

from a user's perspective it's use which pays the price of creating the new pool element.

Quite true :)

'a Lwt.t may not be a thread but it still has a context it's responsible for, including its lifetime.

In a very loose sense there is a context, but it's a quite different kind of context than what is associated with a thread. That context is exactly a list of callbacks only (for a pending promise), and the promise is not "responsible" for it – it just has a list. I want to exactly avoid the notion that Lwt promises are similar to threads, including having contexts similar in any way one would ordinarily expect, so I'd rather avoid referring to both as "contexts" as well, as it's a loose use of terminology that could be confusing. The point is, promises are not "containers" for sequences of instructions, not units of execution, not something that runs, not something that can be interrupted, nothing "happens" in them. They are just smart references, that can hold lists of callbacks. Something else (a real thread, the main thread of the Lwt program) is responsible for running those callbacks at the right time, that main thread has a genuine thread context, etc. Promises are just inert data objects that are useful for synchronization and thinking about concurrency, and for getting a real thread to run things in a nice order.

aantron · 2017-10-18T19:25:00Z

Or, put another way, promises are similar to streams or events in FRP, except they are one-shot. They are some kind of objects used for factoring your concurrent program, and you can register functions on them to create some kind of promise "network." Of course, I can't write that in the Lwt introduction :p

hcarty · 2017-10-18T19:32:44Z

Working on that test failure now

hcarty · 2017-10-18T19:58:01Z

Sorry for the noise from that failing test!

Latest changes:

cleared is now a bool ref ref now as suggested by @aantron.
Add an internal dispose helper function to help ensure count stays synced up
Make the tests pass and add to the clear test to make sure even currently-in-use pool elements are disposed of once they are returned.
Lift the disposal logic out of Lwt.catch in use to avoid potential double application of dispose if dispose fails.

aantron · 2017-10-19T18:39:50Z

Thanks @hcarty, @bobbypriambodo!

Me and @hcarty decided to merge this, skipping a super-thorough nitpicky review for now, but pushing that review to an overhaul (as recently done with lwt.ml/lwt.mli). It is to be done before 3.2.0. This is because much of the mental work is the same between reviewing this and overhauling Lwt_pool, we have to untangle some issues in the pre-existing code, and there is no point in doing this work twice, and forgetting the results after the first time. Since dispose/clear will not have been released yet, we can freely address any further issues in dispose/clear as part of the overhaul.

Any comments or reviews in this PR are still welcome, though!

aantron · 2017-10-20T14:58:43Z

Finally, I changed the logic around check so that a failed check does NOT create a new pool element. I think that this is better behavior in general - the only place we should create a new pool member is when we try to acquire one. This also makes the clear semantics and implementation much simpler.

@hcarty, is it correct to say that this logic is still in the code after this PR? I do think I see it there..

aantron · 2017-10-20T15:15:45Z

Never mind, sorry. This old comment

lwt/src/core/lwt_pool.ml

Lines 106 to 108 in 2b8e531

    
           (* Verify a member is still valid before releasing it; try to replace 
        
              if it's invalid. *) 
        
           let check_elt p c =

is just 100% misleading. check_elt is called around validate (before use), not check (release after failed use).

aantron · 2017-10-20T15:30:21Z

@hcarty, @bobbypriambodo what do you think about creating elements on an as-needed basis only, i.e. never re-creating inside either validate or check? We would create them only as a direct consequence of calling use.

I guess this would change the semantics even more, as the pool wouldn't try at all to maintain the number of elements it grew to. If there are five concurrent users, the pool will tend to stay around five elements in size. If the number of concurrent users decreases to two, and elements sometimes die in validate or check, the pool will tend to decrease to two elements automatically.

I don't know if there is some wisdom behind the current semantics. I was not able to find any documented anywhere. @bobbypriambodo, do you know what is done in popular pool implementations in other languages?

aantron · 2017-10-20T15:34:01Z

This would also mean that if a pool element fails validate, we would start consuming pool elements in a loop until we find one that passes, or create a new one. The current behavior on failed validate is to leave the tail of the queue alone and create a new element right away. I'm not sure if one behavior is inherently more legitimate than the other.

hcarty · 2017-10-20T16:07:04Z

@aantron My preference is to only create on use. Creating a new element can fail. From a pool user's point of view that failure would make the most sense when there is a need for a new element.

aantron · 2017-10-20T16:23:10Z

Agreed. validate is still "on use," but I think only use itself should trigger creation.

As an example of weird behavior triggered by the current effort to maintain the element count, In the current code, if validate raises an exception, the pool checks for a waiting other promise, then tries to create an element, and fulfills/rejects that other promise depending on whether creation succeeded or also raised an exception. This was not documented, I wouldn't like to document it, and it suggests to me that even if the right thing to do is to create elements eagerly, it shouldn't be done/factored how it is now.

aantron · 2017-10-20T16:25:50Z

In practice, the above means that if validate raises an exception, Lwt_pool only tries to create a new element if the pool is already still at the max limit (since that's why there is a waiting promise).

bobbypriam · 2017-10-21T12:19:33Z

do you know what is done in popular pool implementations in other languages?

Sorry but at the moment I have no idea, I was more of a user than implementer and most of the libs I used just worked so I haven't got the chance to dig into the impls. I can look it up though, and perhaps post it into the Lwt_pool overhaul issue.

Support disposal of Lwt_pool.t elements

43e0f33

hcarty force-pushed the lets_dispose_of_pool_elements branch from 3f3068e to 43e0f33 Compare October 14, 2017 05:58

bobbypriam reviewed Oct 14, 2017

View reviewed changes

Update doc comments; fix a tiny typo

1d61f43

aantron reviewed Oct 17, 2017

View reviewed changes

Clear all pool elements. Address PR feedback.

8f6346e

aantron reviewed Oct 18, 2017

View reviewed changes

Fix Lwt_pool.create parameter order

bc59ca5

Fix failing test, address reviews, better dispose

c13fcf0

aantron merged commit 9d7002d into ocsigen:master Oct 19, 2017

hcarty deleted the lets_dispose_of_pool_elements branch October 19, 2017 18:50

aantron mentioned this pull request Oct 20, 2017

"Overhaul" Lwt_pool #485

Closed

aantron mentioned this pull request Dec 19, 2017

Lwt 3.2.0 ocaml/opam-repository#11071

Merged

Support disposal of Lwt_pool.t elements #483

Support disposal of Lwt_pool.t elements #483

Conversation

hcarty commented Oct 14, 2017 • edited by aantron

hcarty commented Oct 14, 2017

bobbypriam left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aantron commented Oct 14, 2017

bobbypriam commented Oct 14, 2017

aantron commented Oct 14, 2017

hcarty commented Oct 14, 2017

hcarty commented Oct 14, 2017

bobbypriam commented Oct 17, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hcarty commented Oct 18, 2017

hcarty commented Oct 18, 2017

hcarty commented Oct 18, 2017 • edited

aantron commented Oct 18, 2017 • edited

aantron Oct 18, 2017 • edited

Choose a reason for hiding this comment

aantron commented Oct 18, 2017

hcarty commented Oct 18, 2017

hcarty commented Oct 18, 2017

aantron commented Oct 18, 2017

aantron commented Oct 18, 2017 • edited

hcarty commented Oct 18, 2017

hcarty commented Oct 18, 2017

aantron commented Oct 19, 2017

aantron commented Oct 20, 2017

aantron commented Oct 20, 2017

aantron commented Oct 20, 2017

aantron commented Oct 20, 2017

hcarty commented Oct 20, 2017

aantron commented Oct 20, 2017 • edited

aantron commented Oct 20, 2017

bobbypriam commented Oct 21, 2017

hcarty commented Oct 14, 2017 •

edited by aantron

hcarty commented Oct 18, 2017 •

edited

aantron commented Oct 18, 2017 •

edited

aantron Oct 18, 2017 •

edited

aantron commented Oct 18, 2017 •

edited

aantron commented Oct 20, 2017 •

edited