
Consistency of the set of members in the cluster #477

Closed
purplefox opened this issue May 9, 2013 · 46 comments

@purplefox commented May 9, 2013

I've created this github issue to track this thread from the Google Group:

https://groups.google.com/group/hazelcast/browse_thread/thread/7d70e90c118f01c/

TL;DR:

We are currently using Hazelcast in Vert.x and will shortly be implementing cluster failover. We would like to use Hazelcast to detect node failure; however, it seems this is not possible, since the set of members can be concurrently updated while a membership event is being handled. This makes handling failover consistently impossible.

We would like to know if there are plans to enhance Hazelcast to cope with this use case. If not, we will have to look at other options (ZooKeeper?), but I would rather stick with Hazelcast since it's been great so far.

So we're really looking for some feedback from the Hazelcast team on this. If this feature is not planned for any release, we would greatly appreciate knowing, so we can plan accordingly and look elsewhere.

Thanks!

@pveentjer (Member) commented May 9, 2013

Hi Purple Fox,

I will have serious time for this issue this coming (long) weekend. I don't know if it will be foolproof, but I hope to get a few steps closer so you can give it a try and we can finish the issue.

I have some functionality in place that serializes the event processing after the events are received by Hazelcast, so there you won't get events out of order. But I'm not sure about the part just before I receive the events: if that is done multithreaded, then events can still get out of order.

@purplefox (Author) commented May 9, 2013

Can't you number the state changes? Then make sure you apply them in the same order?

@pveentjer (Member) commented May 9, 2013

The problem is that I'm not in control of that part :) I'll ask Mehmet.

But if the events are received single-threaded, then no ordering is lost. So I need to know exactly what happens; otherwise, resequencing is an option. But it would be nice if we don't need to change other parts.

@purplefox (Author) commented May 9, 2013

Hmm, just synchronizing when the events are received will serialize them, but it won't necessarily serialize them in the same order as on other nodes, unless they're all being read from a single socket by a single thread.
To ensure that the membership events are seen in the same order by each node, you'll either have to have some kind of master node through which all events are routed, to give a total ordering, or use a Totem-style ring protocol (?). Or you could use virtual synchrony.
But none of these is trivial to implement if Hazelcast doesn't implement them already.

@pveentjer (Member) commented May 12, 2013

I just started working on it.

It is a tricky one.. so it will take some time. Hope to complete it today or tomorrow.

@pveentjer (Member) commented May 12, 2013

There are 2 issues that need to be solved:

  • the membership listeners need to receive the events in the order they happened.
  • the membership listener should be able to see the set of members at the moment the event happened; either just before the event happened, or just after.

The ordering needs to be solved in 2 parts:

  • some kind of actor-based event listener that wraps the original listener and replays the events in the order they are received. This one is easy.
  • make sure that before sending an event to that 'actor' nothing is placed out of order, either by resequencing, or maybe that is already the case. Need to investigate this better.

The consistent view of the members can be stored in that actor and could be attached to the event. The actor will process the events in order and can also update the members list and pass it to the actual membership listener (e.g. by attaching it to the event). Need to do more investigation on how this exactly should fit into Hazelcast.

Solutions enough.. need to look for one that doesn't cause too much havoc :)
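The "actor" wrapper from the bullets above could look roughly like this. This is a sketch with stand-in types, not the actual Hazelcast code: events may be submitted from any thread, but the wrapped listener sees them one at a time, in submission order, on a single dedicated thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical actor-style wrapper: a single-thread executor replays
// events to the delegate in exactly the order they were submitted,
// and never concurrently.
class SerializingListener<E> {
    private final ExecutorService actor = Executors.newSingleThreadExecutor();
    private final Consumer<E> delegate;

    SerializingListener(Consumer<E> delegate) {
        this.delegate = delegate;
    }

    // Never blocks the caller; the single actor thread guarantees no
    // concurrent and no reordered delivery relative to submission order.
    void onEvent(E event) {
        actor.execute(() -> delegate.accept(event));
    }

    void shutdown() throws InterruptedException {
        actor.shutdown();
        actor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Note the second bullet still applies: this only preserves whatever order events arrive in; if something upstream reorders them, the wrapper cannot repair that.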

@pveentjer (Member) commented May 12, 2013

I have added the following method to the MembershipEvent:

/**
* Returns a consistent view of the the members exactly after this MembershipEvent has been processed. So if a
* member is removed, the returned set will not include this member. And if a member is added it will include
* this member.
*
* The problem with calling the {@link com.hazelcast.core.Cluster#getMembers()} is that the content could already
* have changed while processing this event so it becomes very difficult to write a deterministic algorithm since
* you can't get a deterministic view of the members. This method solves that problem.
*
* @return the members
*/
public Set getMembers() {
return members;
}
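For Vert.x-style failover, a listener built on this new method could derive the takeover node purely from the event's snapshot; since the snapshot is identical cluster-wide for a given event, every node reaches the same decision. The types and the lowest-id rule below are stand-ins for illustration, not the real Hazelcast API.

```java
import java.util.Comparator;
import java.util.Set;

// Stand-ins for the types under discussion; the real Hazelcast API may
// differ. The point: derive failover decisions from the event's member
// snapshot, never from Cluster.getMembers(), so every node processing
// the same event reaches the same answer.
record Member(String id) {}

record MembershipEvent(Member member, boolean added, Set<Member> members) {
    // members: the consistent view *after* this event was applied
}

class FailoverPicker {
    // Deterministic on every node: e.g. let the member with the lowest
    // id in the event's snapshot take over the failed member's work.
    static Member pickSuccessor(MembershipEvent e) {
        return e.members().stream()
                .min(Comparator.comparing(Member::id))
                .orElse(null);
    }
}
```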

And with the guarantee that:

  • the membership listener will not be called concurrently
  • the membership listener will receive the events in the order they occurred

Are your problems solved with these guarantees?

I need to have a closer look with Mehmet at some details in Hazelcast, to make sure that we can realize these guarantees.

@purplefox (Author) commented May 12, 2013

I believe the guarantees I need are:

Let's say all membership events are labelled En, e.g.

E1, E2, E3 etc.

  1. The set of members will never be updated concurrently with respect to the membership listener being called.
  2. For any members that are in the cluster at the same time, they will all receive the same sequence of events. I.e. no breaks.
  3. If getMembers is called from within a membership listener by any member processing event number Ep, then the exact same set of members Sp will be returned, irrespective of member.
@pveentjer

This comment has been minimized.

Copy link
Member

commented May 12, 2013

--> The set of members will never be updated concurrently with respect to the membership listener being called.

If you read the members from the MembershipEvent, then you get this guarantee. The set of members is immutable and reflects the new state immediately after the member has been added/removed (so you can reconstruct the old state yourself if you want to).

--> For any members that are in the cluster at the same time they will all receive the same sequence of events. I.e. no breaks.

"The same time" can't be guaranteed since there is no synchronization, only serialization. But all listeners will receive the same sequence of events. This is something I need to verify with Mehmet: I got part of the event serialization in, but if something in front places events out of order, the promise is broken.

--> If getMembers is called from within a membership listener by any member processing event number Ep then the exact same set of members Sp will be returned irrespective of member

Do you mean getMembers on Cluster or getMembers on MembershipEvent? The getMembers on Cluster is probably not what you want; you need the one on the MembershipEvent (a new method), which is deterministic and fits your needs.

PS:
Another related problem that I'm going to solve is the actual registration of the MembershipListener, since that also is not deterministic. As soon as a MembershipListener is registered, you also want to know about the current members in a deterministic fashion, so that the initial step you begin with is consistent. Else the listening is still broken.

@purplefox (Author) commented May 12, 2013

--> The same time can't be guaranteed since there is no synchronization, only serialization. So all listeners will receive the same sequence of events. This is something I need to verify with Mehmet. I got part of the event serialization in, but if something in front places events out of order, the promise is broken.

Right, this is what I was saying before - just synchronizing will not be sufficient - you need to provide a total ordering of these events across the cluster.

--> Another related problem that I'm going to solve is the actual registration of the MembershipListener, since that also is not deterministic. As soon as a MembershipListener is registered, you also want to know about the current members in a deterministic fashion so that the initial step you begin with is consistent. Else the listening is still broken.

Yes, this is covered under 3) in the guarantees I posted above.

@purplefox (Author) commented May 12, 2013

@pveentjer (Member) commented May 12, 2013

--> Right, this is what I was saying before - just synchronizing will not be sufficient - you need to provide a total ordering of these events across the cluster.

Correct, as long as we both agree that we are not going to provide any synchronisation on processing the events. So you and me (assume that we are membership listeners) will both receive the same ordered stream of events, but it can be that you process the events faster than I can. So you and me are not guaranteed to be in sync (also not wanted, since it would cause scalability problems).

Thanks for the link, I'll have a look.

I'm currently rewriting the whole listener registration; hope to have something usable shortly.

@purplefox (Author) commented May 12, 2013

I'm not sure what you mean by this statement "So you and me are not guaranteed to be in sync"

@pveentjer (Member) commented May 12, 2013

OK: imagine we have an event stream e1, e2, e3.
Imagine that there are 2 event listeners, listener1 and listener2.
Imagine that listener1 can process one event per second.
Imagine that listener2 can process one event per minute.

Then listener1 has processed e1, e2, e3 after 3 seconds.
But listener2 is still working on e1 after 3 seconds.

So listener1 and listener2 can get out of sync.

But... imagine that there are no further events: then after 10 minutes listener1 and listener2 are both finished and have received e1, e2, e3 in exactly the same order. So after 10 minutes they will be in sync again.

The solution I'm working on is not going to do any 'coordination' between event listeners to make sure that they wait (synchronize) on each other. The only thing it is going to do is make sure that they will receive exactly the same ordered event stream.
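The scenario above can be sketched as follows (illustrative code only, not the Hazelcast internals): each listener drains its own single-threaded queue, so a slow listener lags behind a fast one, yet both eventually observe the identical ordered stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the "same order, no cross-listener synchronization" model:
// two listeners run at different speeds on independent threads, but the
// event source feeds both queues the same ordered stream.
class IndependentListeners {
    static List<List<Integer>> run(int events, long slowDelayMillis) throws Exception {
        List<Integer> fastSeen = Collections.synchronizedList(new ArrayList<>());
        List<Integer> slowSeen = Collections.synchronizedList(new ArrayList<>());
        ExecutorService fast = Executors.newSingleThreadExecutor();
        ExecutorService slow = Executors.newSingleThreadExecutor();
        for (int i = 0; i < events; i++) {
            final int e = i;
            // Same ordered stream handed to both listener queues.
            fast.execute(() -> fastSeen.add(e));
            slow.execute(() -> {
                try { Thread.sleep(slowDelayMillis); } catch (InterruptedException ignored) {}
                slowSeen.add(e);
            });
        }
        fast.shutdown(); slow.shutdown();
        fast.awaitTermination(10, TimeUnit.SECONDS);
        slow.awaitTermination(10, TimeUnit.SECONDS);
        return List.of(fastSeen, slowSeen);
    }
}
```

Mid-run, the two lists can differ in length (out of sync); once drained, they are identical, which is exactly the guarantee being discussed.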

@purplefox (Author) commented May 12, 2013

I don't think that's relevant. Strictly speaking none of the event listeners will be processing messages at exactly the same time.

Actually, defining what "same time" means is a difficult philosophical question, but we can safely leave that one to the philosophers ;)

What's important here is the order, not that messages are being processed at exactly the same time.

@pveentjer (Member) commented May 12, 2013

It is good that we are explaining something to each other we both agree on ;)

@pveentjer (Member) commented May 12, 2013

I think I have an initial solution. I need to have a chat with Mehmet about what we are going to do next.

One of the things we need to talk about is breaking old functionality, because the current listener implementation also receives all the initial members through member-added events. This is different from the old implementation.

So perhaps we'll expose this functionality using an additional method so we don't break anything old:

public interface Cluster {

    // old way
    void addMembershipListener(MembershipListener listener);

    // new way
    void addConsistentMembershipListener(MembershipListener listener);
}

Luckily, the issues about threading in front of membership listeners don't exist. There is only a single service thread inside the ClusterManager that triggers sending the events, so events can't get out of order due to concurrency before the improved membership-listener execution is triggered.

@pveentjer (Member) commented May 12, 2013

You can see the changes here:
3b9e599
work in progress :)

@purplefox (Author) commented May 12, 2013

How are you ensuring that each node sees the events in the same order?

@pveentjer (Member) commented May 12, 2013

That is a good one and needs to be verified with Mehmet.

But AFAIK the master node takes care of sending messages to the slave nodes, and therefore it sends the events in the same order to all the slaves. And AFAIK only a single thread is receiving these events on each slave.

@purplefox (Author) commented May 12, 2013

Hmm, are you sure?

I don't know much about Hazelcast internals but I was under the impression there is no "master" node with Hazelcast.

See http://www.hazelcast.com/docs/1.9.4/manual/multi_html/ch15s03.html for example:

"It is important to note that Hazelcast is a peer to peer clustering so there is no 'master' kind of server in Hazelcast. Every member in the cluster is equal and has the same rights and responsibilities."

@mdogan (Member) commented May 12, 2013

Hazelcast doesn't have a master node in the master/slave sense. But we call the eldest node the master, and it has a few more privileges than other members: accepting join requests and deciding the partition arrangement. (Instead of decision mechanisms like voting, we preferred to leave decisions to the eldest member.) Other than that, all members are equal, and any member can become master (i.e. the eldest node) at some point if the eldest member leaves the cluster. There is no specially chosen, predefined master node.


@purplefox (Author) commented May 12, 2013

How do you determine what the "oldest" node is? Do the clocks on each node have to be exactly synchronized so they can agree which one is the oldest?

@purplefox (Author) commented May 12, 2013

If each node in the cluster already has a list of members that's consistent with every other node, so that it can determine which node is the next "master" if the "master" fails, then it seems you have already implemented what I am looking for?

@mdogan (Member) commented May 12, 2013

The oldest is determined by join order. If two nodes start simultaneously, they do a handshake to choose one as the oldest, and the other one joins it. Yes, each node always knows the exact same member list, and each knows the next master if the former leaves the cluster. Cluster.getMembers() returns that list as an ordered set.


@purplefox (Author) commented May 12, 2013

Well... if you already have this, why is Peter implementing it? ;)

For Vert.x we need access to the list of members so we can determine who is going to take over from another node if one fails.

The problem is you only expose the members as a Set, not a List.

If you already maintain a list of members such that each node always sees the list, and the changes to it, in the same order, then problem solved. No?

Can't you just expose the list?

@purplefox (Author) commented May 12, 2013

Just to clarify - does cluster.getMembers() already satisfy the 3 guarantees I posted above when accessed from a membership listener?

@mdogan (Member) commented May 12, 2013

Cluster.getMembers() is an ordered set of members. I guess exposing the member list as an ordered set should not be a problem for you; if you like, you can always create a j.u.List from that set. We guarantee the order.

Each node has the exact same list and sees the exact same set of changes to that list, in the same order.

As I understand it, the problem is that membership listeners (like all other listeners) are called asynchronously; they don't intercept/block the membership process. Although membership events are generated in order (the exact same order) and membership listeners are called in the exact same order on all of the nodes, one can see a different set of members when Cluster.getMembers() is called, because the membership process is independent of the listener-calling process.

Going over your 3 requested guarantees:

1. The set of members will never be updated concurrently with respect to the membership listener being called.

At the moment, the members (I mean Cluster.getMembers()) can change during execution of a membership listener.

2. For any members that are in the cluster at the same time they will all receive the same sequence of events. I.e. no breaks.

This is guaranteed.

3. If getMembers is called from within a membership listener by any member processing event number Ep then the exact same set of members Sp will be returned irrespective of member.

Again, no guarantee here; listener calls are async, they don't intercept/block the membership process.

What I understand Peter is trying to do is attach a snapshot of the member list to the membership event at the time the event is generated, regardless of the current member list. So, during execution of a membership listener, all callers of membershipEvent.getMembers() will see the exact same set of members.
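The snapshot idea described here can be sketched like this (assumed internals for illustration, not the actual Hazelcast code): the single thread that mutates the member list captures an immutable copy at mutation time, and that copy travels with the event, so listener code never races with later membership changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical core: mutate the member list and snapshot it in one
// step, on the single cluster-service thread. Each published snapshot
// is the consistent view *after* the corresponding event.
class MembershipCore {
    private final LinkedHashSet<String> members = new LinkedHashSet<>();
    private final List<Set<String>> published = new ArrayList<>();

    // Called only from the cluster-service thread.
    void memberAdded(String id) {
        members.add(id);
        // Immutable copy taken at mutation time, attached to the event.
        published.add(Collections.unmodifiableSet(new LinkedHashSet<>(members)));
    }

    void memberRemoved(String id) {
        members.remove(id);
        published.add(Collections.unmodifiableSet(new LinkedHashSet<>(members)));
    }

    List<Set<String>> snapshots() { return published; }
}
```

Because each snapshot is immutable, listeners can read it at any later time (asynchronously) and still see exactly the membership that held when the event was generated.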

@pveentjer (Member) commented May 12, 2013

What I did is:

a) all listeners will process the events in the order they happened
b) a listener will never be called concurrently
c) a listener will receive an event for all members that exist at the moment it registers, to make sure it is up to date
d) a listener can see the set of members that belongs to the state after dealing with the event (so in the case of a member-added event the set will contain the member, and in the case of a member-removed event it will not). The call to Cluster.getMembers() should be replaced by event.getMembers() if you want a matching view of the members for that particular event.

In theory the set of members could be maintained by the membership listener itself, since it can keep track of it, so it doesn't need to be added to the event. So in theory my d) feature could be dropped.


@purplefox (Author) commented May 12, 2013

Mehmet - I see. So it appears you have already solved the hard problem, which is making sure each member maintains a consistent list of the members in the cluster. That is good.

This implies you already have a state machine, whereby membership events arrive at the member and you mutate the state (the membership list).

The problem with the membership listeners seems to be that you call them asynchronously, not synchronously, so the state could mutate again before a listener is called.

To solve this, why not just call the membership listeners synchronously, using the same thread that you use to mutate the membership list? Doing this would satisfy my three requirements, and I think it would satisfy the "principle of least surprise" with regard to Hazelcast behaviour.

@pveentjer (Member) commented May 12, 2013

The problem with doing everything on the same thread is that one slow membership listener can cause havoc in the system.

My solution still gives the same guarantees, but it doesn't suffer from that problem.


@purplefox (Author) commented May 12, 2013

--> The problem with doing everything on the same thread is that one slow membershiplistener can cause havoc in a system.

Users can always do stupid things that will wreck the system, whether or not you execute listeners on different threads.

My choice would be to execute the listeners on the same thread - this is what we do in Vert.x - in fact we do almost everything on the same thread (it's a reactor-based model).

--> My solution still gives the same guarantees, but it doesn't suffer from that problem.

At the end of the day it's your choice, but I wouldn't consider it a problem. It would also make the solution considerably simpler, with less scope for errors, and hopefully get it into a Hazelcast release more quickly :)


@mdogan (Member) commented May 12, 2013

Of course people can always do stupid things, but one of the missions of API writers is to build a less error-prone (or say robust) library and to make the system as immune to user errors as possible.


@pveentjer (Member) commented May 12, 2013

I think we can get it into the next release, so that should imho not be a problem, but I think Mehmet needs to give a definitive go for it.

But are you satisfied with the solution provided? Does it do what you want it to do?


@Tembrel commented May 13, 2013

Regarding synch vs. asynch listener handling: Would it be possible to take a page from Guava's EventBus? In that framework, you register your listeners with an event bus that is either synchronous or asynchronous. If Hazelcast kept the asynch handling as the default but provided synch handling as an advanced option (with warnings and perhaps a more painful-to-access API), the naive user would be protected and the user who really needed synch handling would be able to get it.
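The suggestion could be sketched like this. This is a hypothetical API, modeled loosely on Guava's sync/async EventBus split, not anything Hazelcast exposes: async delivery stays the default, while synchronous delivery is an explicit, deliberately uglier opt-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical registry: async listeners are the safe default; the
// synchronous variant runs inline on the service thread and carries a
// warning name, so opting in is a conscious choice.
interface Listener { void onEvent(String event); }

class ListenerRegistry {
    private final List<Listener> sync = new ArrayList<>();
    private final List<Listener> async = new ArrayList<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void addListener(Listener l) { async.add(l); }                  // safe default
    void addSynchronousListenerUnsafe(Listener l) { sync.add(l); }  // advanced opt-in

    // Called from the service thread: sync listeners run inline (and can
    // stall it); async listeners are handed off, preserving order.
    void publish(String event) {
        for (Listener l : sync) l.onEvent(event);
        for (Listener l : async) executor.execute(() -> l.onEvent(event));
    }

    void shutdown() { executor.shutdown(); }
}
```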

@purplefox (Author) commented May 13, 2013

Peter - I think what you suggest should be ok for me :)

@pveentjer (Member) commented May 13, 2013

Hi Tim Fox,

The solution of having a synchronous event listener (and therefore safely being able to read Cluster.getMembers()), or having an asynchronous event listener (and reading event.getMembers()), will both solve your problems.

The big question is which one has a lower WTF value.

Most listeners I have seen are asynchronous, and the current implementation in Hazelcast is asynchronous. So from a user perspective, imho, asynchronous listeners don't have a higher WTF value than synchronous ones.

Also, using asynchronous event listeners protects against abuse of the server thread: there is a single thread that deals with incoming messages and updates important cluster state. If this thread is tied up by a slow/unresponsive synchronous listener implementation, the node will not be able to perform its duties, and imho that can cause big WTFs. I don't know if the server thread holds locks while calling the listeners, but in theory this could be a source of deadlocks.

Therefore I believe that the asynchronous solution is the best one. It serves your purpose, and the stability of the Hazelcast cluster is not put in any danger. The code has been implemented, and we can release it in the next release. If you want, you can cherry-pick the changes, patch one of the latest releases, and keep it in a private repo.

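The asynchronous option Peter describes relies on each event carrying its own consistent view of the membership. A minimal sketch of that idea, assuming a hypothetical `MembershipSnapshotEvent` class (the names are illustrative, not the actual Hazelcast API):

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of an asynchronous-safe membership event: each event carries an
// immutable snapshot of the member set taken at the moment the change was
// applied, so a listener running on any thread, at any later time, sees
// the membership exactly as it was when the event fired.
public class MembershipSnapshotEvent {
    public static final int MEMBER_ADDED = 1;
    public static final int MEMBER_REMOVED = 2;

    private final int type;
    private final String member;
    private final Set<String> members; // snapshot, never mutated afterwards

    public MembershipSnapshotEvent(int type, String member, Set<String> currentMembers) {
        this.type = type;
        this.member = member;
        // Defensive copy: the live member set may change after this event
        // is queued, but this snapshot will not.
        this.members = Collections.unmodifiableSet(new LinkedHashSet<>(currentMembers));
    }

    public int getType() { return type; }
    public String getMember() { return member; }
    public Set<String> getMembers() { return members; }
}
```

The key point is that `getMembers()` on the event, unlike `cluster.getMembers()`, cannot race with later membership changes.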

@purplefox

Author

commented May 13, 2013

Personally I would go for the synchronous approach every time. I believe it satisfies the principle of least surprise and is much more in keeping with the concurrency model of Vert.x (reactor pattern).

However I won't argue with you on this one.

If you prefer an async approach I don't really care; this code is not performance critical, so the overhead of delivering events on a different thread is not high.

As long as the behaviour satisfies the three guarantees I need, I am happy :)


@pveentjer

Member

commented May 13, 2013

Hi Tim Peierls,

I checked the page and I see that in some projects it can be very useful.

But in this particular case I think that having synchronous event listeners is dangerous, since the server thread (there is only one) can be abused by a bad listener implementation.

So I would strongly advise against abusing the server thread for listener notification.

Of course, there are other solutions: we could create a single thread that both sets the members and does the listener notification. The server thread can hand over work to this worker thread.

Personally I don't see a reason to make use of synchronous listener notification if we can easily add the guarantees Tim Fox is looking for. We also need to apply the same fixes to 3.0 so the load balancers (part of the new client api) are deterministic with regard to member updates.

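The "single thread that both sets the members and does the listener notification" idea can be sketched with a plain JDK single-threaded executor. Class and method names here are hypothetical, not Hazelcast's API; the point is that the server thread only enqueues work, while one worker applies each membership change and notifies listeners before touching the next, so listeners see in-order, consistent updates and a slow listener can never block the server thread itself.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical membership service: the server thread hands membership
// changes to a single worker thread, which updates the member set AND
// notifies listeners, in order, one change at a time.
public class MembershipService {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Set<String> members = new LinkedHashSet<>(); // touched only by worker
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    public void addListener(Consumer<String> listener) { listeners.add(listener); }

    // Called from the server thread; returns immediately.
    public void memberAdded(String member) {
        worker.execute(() -> {
            members.add(member);
            for (Consumer<String> l : listeners) l.accept("added:" + member);
        });
    }

    public void memberRemoved(String member) {
        worker.execute(() -> {
            members.remove(member);
            for (Consumer<String> l : listeners) l.accept("removed:" + member);
        });
    }

    public void shutdown() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the executor has exactly one thread, events can never be observed out of order, which is the guarantee the thread has been asking for.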

@pveentjer

Member

commented May 13, 2013

Good :)

Member updates are not going to be sent millions of times a second :) So I'm not worried about the performance at all for this part.

I'm waiting for a talk with Mehmet to be able to close this task (and port it to 3.0, since we need it there as well for the clients).


@pveentjer

Member

commented May 14, 2013

Hi Tim Fox,

are you using the listeners on the server side or on the client side?


@pveentjer

Member

commented May 15, 2013

Hi Tim,

a status update.

I'm currently implementing the functionality for the ClusterClientProxy so that clients can benefit from the same guarantees.

I hope that it can be included in the next release of Hazelcast.


@pveentjer

Member

commented May 16, 2013

Hi Tim,

can you confirm that you don't need the consistent listener functionality on the client?

@purplefox

Author

commented May 18, 2013

Peter,

Correct, we don't need this on the client.

Thanks again for all your efforts (and Mehmet too) :)

@pveentjer

Member

commented May 18, 2013

Great :)

Implementing it on the client is nearly impossible without major changes in the 2.x branch.

A pull request has been made; I need to wait for Mehmet to merge it. I hope it will be part of the next 2.x release.


@ghost ghost assigned pveentjer May 27, 2013

@mdogan

Member

commented May 27, 2013

Fixed by commit 672211f.

@mdogan mdogan closed this May 27, 2013
