Process system dispatch documentation
I am currently re-writing the Process.register(...) and Process.deregister(...) behaviour. If you rely on this, be aware of what's coming in the next NuGet release (it's already on the master branch).
Just to re-cap, this is how it worked previously:
- Calling Process.register(processId, name) would create a new proxy Process at the address /registered/<name>
- Process.register(...) would return the proxy's process ID
- Because /registered is outside of the scope of the node's address space (/node-name/...) it essentially acted as a DNS service for any process to find any other process by a known name
- This was achieved by calling Process.find(name), which didn't actually do any searching; it just built a ProcessId in the format /registered/<name>
- The proxy at that address would deal with the routing of messages to the registered Process
Problems with that solution are:
- There is no cluster-wide coordination of registered processes. That means if two separate nodes registered the same process name, the two proxy processes would fight over the shared inbox, leading to undefined behaviour.
- Proxy processes aren't great for subscriptions. You can't subscribe to a proxy and have it auto-subscribe to the thing it's proxying (this may change, but right now it's a limitation).
Dispatchers
Now that the Process system supports dispatchers we can implement a more advanced and robust system. Dispatchers put control on the sender's side. Here's an example:
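The code example didn't survive the formatting, but it probably resembled this sketch, which assumes language-ext's static Process API (via `using static LanguageExt.Process;`); the process names are illustrative:

```csharp
// Three ordinary worker processes (names are illustrative)
ProcessId pid1 = spawn<string>("proc1", msg => Console.WriteLine("proc1: " + msg));
ProcessId pid2 = spawn<string>("proc2", msg => Console.WriteLine("proc2: " + msg));
ProcessId pid3 = spawn<string>("proc3", msg => Console.WriteLine("proc3: " + msg));

// Group them into a single ProcessId using the broadcast dispatcher
ProcessId pid = Dispatch.broadcast(pid1, pid2, pid3);

// A tell to pid reaches all three processes
tell(pid, "hello");
```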
In that example 3 processes are grouped into one ProcessId. You can then tell, ask, subscribe, etc. because it's just a regular ProcessId. The Process system itself can spot that there are multiple processes referenced and it deals with the dispatch, without a router or proxy Process.
The disp part tells the system to use a named dispatcher; the broadcast part is the name of the dispatcher (you can register your own dispatchers via Dispatch.register(...)). There are several built-in dispatchers: broadcast, least-busy, round-robin, random, first. The name of the dispatcher selects the bespoke behaviour to run over the list of processes: [/node-name/user/proc1,/node-name/user/proc2,/node-name/user/proc3]
Back to registered processes
When you register a Process it now does one of two things:
- If the Process is local-only, then it gets registered in an in-memory map of names to ProcessIds
- If the Process is visible to the cluster, then it gets registered in a Redis map of names to ProcessIds
The result of calling Process.register(name) is still a ProcessId, but instead of it looking like this: /registered/<name> it will now look like this: /disp/reg/<name>. As you can probably see there is now a dispatcher for registered Processes called reg.
The default behaviour of this dispatcher is to get the full list of Processes that have been registered with a specific name, and dispatch to all of them (broadcast). This behaviour is more consistent overall, because it doesn't pass any judgement on who registered what when. It simply realises there are multiple processes registered with the same name, and you're trying to communicate with a Process by name, and therefore that's all of them.
When you call Process.find(name) the system behaves much as before in that it doesn't actually do any searching at that point; it merely returns /disp/reg/<name> - so as processes register or deregister, the number of possible destinations for a message grows and shrinks dynamically.
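A minimal sketch of the register/find round trip (the names are hypothetical, and loggerPid is assumed to be a Process spawned earlier):

```csharp
// Register under a well-known name; the returned ProcessId is the
// dispatcher address /disp/reg/logger, not a proxy
ProcessId regd = register("logger", loggerPid);

// Anywhere in the cluster: find() does no searching, it simply
// builds the ProcessId /disp/reg/logger and dispatch happens on send
tell(find("logger"), "something to log");
```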
The keen-eyed amongst you may realise that if you can get n processes registering themselves as 'a named thing', then you could implement high-availability strategies. To that end, you can combine a registered ProcessId with other dispatcher behaviour. For example:
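The original example is missing here; a sketch of what combining the reg dispatcher with least-busy might look like (assuming Dispatch.leastBusy accepts a dispatcher ProcessId, as the combinability described below suggests):

```csharp
// find("mail-server") yields /disp/reg/mail-server - i.e. every
// Process registered under that name. Wrapping it in the least-busy
// dispatcher delivers each message to just one of them.
ProcessId pid = Dispatch.leastBusy(find("mail-server"));
tell(pid, "deliver-mail");
```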
This is actually a general feature of dispatchers that they can be combined. You can imagine that the reg dispatcher returns a list of registered 'mail-server' ProcessIds and then the least-busy dispatcher finds out which of those mail-server processes has the smallest queue before dispatching the message.
You can take this even further and register a dispatcher. Remember when you call register(name, pid) the pid is a ProcessId and so are the special dispatcher ProcessIds. So you could do this:
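A sketch of registering a dispatcher (pid1, pid2 and pid3 are assumed to be previously spawned processes, and the name is illustrative):

```csharp
// The 'process' being registered is itself a dispatcher ProcessId,
// so the name now resolves to round-robin delivery over the group
ProcessId reg = register("my-service", Dispatch.roundRobin(pid1, pid2, pid3));
```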
If you then did a series of tell calls against reg then the messages would be sent round-robin to pid1, pid2 and pid3. This has very similar functionality to routers without the need for a router Process.
If you think the implications of that through further, let's say you had two data-centres and you wanted an 'eventually consistent' system by sending the same message to both data-centres, but you wanted the least-busy of 3 nodes in each centre to receive the message. A node in each centre could register a least-busy dispatcher ProcessId under the same registered name, and because the default behaviour of the registered dispatcher is to broadcast, you'd get the exact behaviour you wanted.
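A sketch of that two-data-centre setup (node ProcessIds and the name are illustrative):

```csharp
// Run once in each data-centre, over that centre's three local nodes.
// Both centres register under the same name.
register("orders", Dispatch.leastBusy(node1, node2, node3));

// Senders: the reg dispatcher broadcasts to both registrants, and
// each least-busy dispatcher picks one node within its own centre
tell(find("orders"), msg);
```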
Note that this isn't an aliveness system (see the Roles section later for that). A registered Process stays registered until:
- You call deregisterById(pid)
- You call kill(pid)
Killing a process wipes its state, inbox and registrations. If you want to kill a process but maintain its cluster state then call: shutdown(pid).
So if a registered Process is offline then its inbox will keep filling up until it comes back online - so that facilitates eventually consistent behaviours.
Roles
Eventually consistent isn't always the desired behaviour; often you just want to find a Process that does 'a thing' and have it do that thing now. Roles facilitate that behaviour. Each node in the cluster must have a role name. Roles use the Process.ClusterNodes property to work out which member nodes are actually available (it's updated every second, so if a node has died it's at most 3 seconds out of date).
If you had 10 mail-servers, you could find the least-busy SMTP process by doing something like this:
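The example itself is missing; it presumably resembled the following sketch, assuming the Role.LeastBusy ProcessId root and ProcessId's child indexer:

```csharp
// Least-busy instance of the smtp Process across all live nodes
// in the 'mail-server' role
ProcessId pid = Role.LeastBusy["mail-server"]["user"]["outbound"]["smtp"];
tell(pid, msg);
```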
The first child mail-server is the role name (which you specify when you call Cluster.register(...) at the start of your app); the rest of it is a relative leaf /user/outbound/smtp that will refer to N processes in the mail-server role.
The problem with that ProcessId is that you need to know about the inner workings of the mail-server node to know that the smtp Process is on the leaf /user/outbound/smtp, and that means that the Process hierarchy for the mail-server can't ever change. However because pid is just a ProcessId the mail-server node itself could register it instead:
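A sketch of that registration, run on the mail-server node itself (same assumptions as above about the Role API):

```csharp
// Publish the role-based pid under a simple name so senders needn't
// know the mail-server's internal Process hierarchy
register("smtp", Role.LeastBusy["mail-server"]["user"]["outbound"]["smtp"]);
```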
Then any other node that wanted to send a message to the least-busy smtp Process could call:
tell(find("smtp"), msg);
You'll notice also that the mail-server nodes themselves have control over how to route messages, whether it's least-busy, round-robin, etc. They can change their strategy without affecting the sender applications.
Although this SMTP example isn't a great one, it should indicate how you can use registered names to represent a dynamically changing set of nodes and processes in the cluster.
De-registering by name
You can also wipe all registrations for a name:
deregisterByName(name);
That will clear all registrations for the name specified. This is pretty brutal behaviour, because you don't know who else in the cluster has registered a Process and you're basically wiping their decision. You could use it as a type of leader election system (by deregistering everyone else and registering yourself); but one thing to note is the process wouldn't be atomic, and is therefore not particularly bulletproof.
Existing code
So how will this affect existing code?
- The signature of register has changed: there is no need for the flags or mailbox-size arguments any more (because the proxy has gone)
- Any previous attempt to use this system for either broadcast or leader election should be reviewed. If you were doing this it was probably buggy anyway, but now it will be broadcast by default.
- Process.Registered has gone, so if you were using that to build registered ProcessIds then you will need to use Process.find(...)
- Process.deregister(...) is now Process.deregisterById(...) and Process.deregisterByName(...). Try to avoid using deregisterByName
This system is significantly more robust and powerful, so hopefully you'll find that the breaking changes are worth it.