Copyediting changes

commit 837932bd0bca9cfff7e0a0c2beda4d14c688b900 1 parent d63fed7
@hintjens hintjens authored
Showing with 1,916 additions and 1,786 deletions.
  1. +1 −0  .gitignore
  2. +75 −77 chapter1.txt
  3. +163 −168 chapter2.txt
  4. +160 −160 chapter3.txt
  5. +214 −206 chapter4.txt
  6. +158 −154 chapter5.txt
  7. +236 −236 chapter6.txt
  8. +282 −282 chapter7.txt
  9. +352 −265 chapter8.txt
  10. +2 −2 examples/C/asyncsrv.c
  11. +6 −6 examples/C/bstar.c
  12. +1 −1  examples/C/bstarcli.c
  13. +3 −3 examples/C/bstarsrv.c
  14. +12 −12 examples/C/clone.c
  15. +1 −1  examples/C/clonecli2.c
  16. +3 −3 examples/C/clonecli3.c
  17. +2 −2 examples/C/clonecli4.c
  18. +2 −2 examples/C/clonecli5.c
  19. +1 −1  examples/C/clonesrv2.c
  20. +1 −1  examples/C/clonesrv3.c
  21. +1 −1  examples/C/clonesrv4.c
  22. +2 −2 examples/C/clonesrv5.c
  23. +15 −15 examples/C/clonesrv6.c
  24. +1 −1  examples/C/espresso.c
  25. +1 −1  examples/C/fileio1.c
  26. +1 −1  examples/C/fileio2.c
  27. +17 −17 examples/C/flcliapi.c
  28. +1 −1  examples/C/flclient1.c
  29. +5 −5 examples/C/flclient2.c
  30. +13 −13 examples/C/interface.c
  31. +12 −12 examples/C/kvmsg.c
  32. +11 −10 examples/C/kvsimple.c
  33. +6 −6 examples/C/lbbroker.c
  34. +3 −3 examples/C/lbbroker2.c
  35. +2 −2 examples/C/lbbroker3.c
  36. +1 −1  examples/C/lvcache.c
  37. +13 −13 examples/C/mdbroker.c
  38. +8 −8 examples/C/mdcliapi.c
  39. +4 −3 examples/C/mdwrkapi.c
  40. +1 −1  examples/C/mtserver.c
  41. +3 −3 examples/C/peering2.c
  42. +10 −10 examples/C/peering3.c
  43. +3 −3 examples/C/ppqueue.c
  44. +5 −4 examples/C/ppworker.c
  45. +1 −1  examples/C/psenvpub.c
  46. +1 −1  examples/C/psenvsub.c
  47. +1 −1  examples/C/rrbroker.c
  48. +1 −1  examples/C/rrworker.c
  49. +1 −1  examples/C/rtreq.c
  50. +3 −3 examples/C/suisnail.c
  51. +2 −2 examples/C/ticlient.c
  52. +14 −14 examples/C/titanic.c
  53. +1 −1  examples/C/tripping.c
  54. +1 −1  examples/C/udpping1.c
  55. +5 −0 notes.txt
  56. +1 −1  part1.txt
  57. +1 −1  part2.txt
  58. +54 −28 postface.txt
  59. +10 −10 preface_print.txt
  60. +2 −2 preface_web.txt
  61. +3 −0  styles.txt
1  .gitignore
@@ -16,3 +16,4 @@ book.ps
book.xml
book.tex
upload.sh
+changes.txt
152 chapter1.txt
@@ -6,23 +6,21 @@
How to explain 0MQ? Some of us start by saying all the wonderful things it does. //It's sockets on steroids. It's like mailboxes with routing. It's fast!// Others try to share their moment of enlightenment, that zap-pow-kaboom satori paradigm-shift moment when it all became obvious. //Things just become simpler. Complexity goes away. It opens the mind.// Others try to explain by comparison. //It's smaller, simpler, but still looks familiar.// Personally, I like to remember why we made 0MQ at all, because that's most likely where you, the reader, still are today.
-Programming is a science dressed up as art, because most of us don't understand the physics of software, and it's rarely if ever taught. The physics of software is not algorithms, data structures, languages and abstractions. These are just tools we make, use, throw away. The real physics of software is the physics of people.
-
-Specifically, our limitations when it comes to complexity, and our desire to work together to solve large problems in pieces. This is the science of programming: make building blocks that people can understand and use //easily//, and people will work together to solve the very largest problems.
+Programming is science dressed up as art because most of us don't understand the physics of software and it's rarely, if ever, taught. The physics of software is not algorithms, data structures, languages and abstractions. These are just tools we make, use, throw away. The real physics of software is the physics of people--specifically, our limitations when it comes to complexity, and our desire to work together to solve large problems in pieces. This is the science of programming: make building blocks that people can understand and use //easily//, and people will work together to solve the very largest problems.
We live in a connected world, and modern software has to navigate this world. So the building blocks for tomorrow's very largest solutions are connected and massively parallel. It's not enough for code to be "strong and silent" any more. Code has to talk to code. Code has to be chatty, sociable, well-connected. Code has to run like the human brain, trillions of individual neurons firing off messages to each other, a massively parallel network with no central control, no single point of failure, yet able to solve immensely difficult problems. And it's no accident that the future of code looks like the human brain, because the endpoints of every network are, at some level, human brains.
-If you've done any work with threads, protocols, or networks, you'll realize this is pretty much impossible. It's a dream. Even connecting a few programs across a few sockets is plain nasty, when you start to handle real life situations. Trillions? The cost would be unimaginable. Connecting computers is so difficult that software and services to do this is a multi-billion dollar business.
+If you've done any work with threads, protocols, or networks, you'll realize this is pretty much impossible. It's a dream. Even connecting a few programs across a few sockets is plain nasty when you start to handle real life situations. Trillions? The cost would be unimaginable. Connecting computers is so difficult that software and services to do this is a multi-billion dollar business.
So we live in a world where the wiring is years ahead of our ability to use it. We had a software crisis in the 1980s, when leading software engineers like Fred Brooks believed [http://en.wikipedia.org/wiki/No_Silver_Bullet there was no "Silver Bullet"] to "promise even one order of magnitude of improvement in productivity, reliability, or simplicity".
-Brooks missed free and open source software, which solved that crisis, enabling us to share knowledge efficiently. Today we face another software crisis, but it's one we don't talk about much. Only the largest, richest firms can afford to create connected applications. There is a cloud, but it's proprietary. Our data, our knowledge is disappearing from our personal computers into clouds that we cannot access, cannot compete with. Who owns our social networks? It is like the mainframe-PC revolution in reverse.
+Brooks missed free and open source software, which solved that crisis, enabling us to share knowledge efficiently. Today we face another software crisis, but it's one we don't talk about much. Only the largest, richest firms can afford to create connected applications. There is a cloud, but it's proprietary. Our data and our knowledge is disappearing from our personal computers into clouds that we cannot access and with which we cannot compete. Who owns our social networks? It is like the mainframe-PC revolution in reverse.
-We can leave the political philosophy [http://swsi.info for another book]. The point is that while the Internet offers the potential of massively connected code, the reality is that this is out of reach for most of us, and so, large interesting problems (in health, education, economics, transport, and so on) remain unsolved because there is no way to connect the code, and thus no way to connect the brains that could work together to solve these problems.
+We can leave the political philosophy [http://swsi.info for another book]. The point is that while the Internet offers the potential of massively connected code, the reality is that this is out of reach for most of us, and so large interesting problems (in health, education, economics, transport, and so on) remain unsolved because there is no way to connect the code, and thus no way to connect the brains that could work together to solve these problems.
-There have been many attempts to solve the challenge of connected code. There are thousands of IETF specifications, each solving part of the puzzle. For application developers, HTTP is perhaps the one solution to have been simple enough to work, but it arguably makes the problem worse, by encouraging developers and architects to think in terms of big servers and thin, stupid clients.
+There have been many attempts to solve the challenge of connected code. There are thousands of IETF specifications, each solving part of the puzzle. For application developers, HTTP is perhaps the one solution to have been simple enough to work, but it arguably makes the problem worse by encouraging developers and architects to think in terms of big servers and thin, stupid clients.
-So today people are still connecting applications using raw UDP and TCP, proprietary protocols, HTTP, Websockets. It remains painful, slow, hard to scale, and essentially centralized. Distributed P2P architectures are mostly for play, not work. How many applications use Skype or Bittorrent to exchange data?
+So today people are still connecting applications using raw UDP and TCP, proprietary protocols, HTTP, and Websockets. It remains painful, slow, hard to scale, and essentially centralized. Distributed P2P architectures are mostly for play, not work. How many applications use Skype or Bittorrent to exchange data?
Which brings us back to the science of programming. To fix the world, we needed to do two things. One, to solve the general problem of "how to connect any code to any code, anywhere". Two, to wrap that up in the simplest possible building blocks that people could understand and use //easily//.
@@ -30,7 +28,7 @@ It sounds ridiculously simple. And maybe it is. That's kind of the whole point.
++ Starting Assumptions
-We assume you are using at least release 3.2 of 0MQ. We assume you are using a Linux box or something similar. We assume you can read C code, more or less, that's the default language for the examples. We assume that when we write constants like PUSH or SUBSCRIBE you can imagine they are really called {{ZMQ_PUSH}} or {{ZMQ_SUBSCRIBE}} if the programming language needs it.
+We assume you are using at least version 3.2 of 0MQ. We assume you are using a Linux box or something similar. We assume you can read C code, more or less, as that's the default language for the examples. We assume that when we write constants like PUSH or SUBSCRIBE, you can imagine they are really called {{ZMQ_PUSH}} or {{ZMQ_SUBSCRIBE}} if the programming language needs it.
++ Getting the Examples
@@ -40,7 +38,7 @@ The examples live in a public [https://github.com/imatix/zguide GitHub repositor
git clone --depth=1 git://github.com/imatix/zguide.git
[[/code]]
-And then browse the examples subdirectory. You'll find examples by language. If there are examples missing in a language you use, you're encouraged to [http://zguide.zeromq.org/main:translate submit a translation]. This is how this text became so useful, thanks to the work of many people. All examples are licensed under MIT/X11.
+Next, browse the examples subdirectory. You'll find examples by language. If there are examples missing in a language you use, you're encouraged to [http://zguide.zeromq.org/main:translate submit a translation]. This is how this text became so useful, thanks to the work of many people. All examples are licensed under MIT/X11.
++ Ask and Ye Shall Receive
@@ -67,9 +65,9 @@ Hello | | World
#------------#
[[/code]]
-The REQ-REP socket pair is in lockstep. The client issues {{zmq_send[3]}} and then {{zmq_recv[3]}}, in a loop (or once if that's all it needs). Doing any other sequence (e.g. sending two messages in a row) will result in a return code of -1 from the {{send}} or {{recv}} call. Similarly, the service issues {{zmq_recv[3]}} and then {{zmq_send[3]}} in that order, as often as it needs to.
+The REQ-REP socket pair is in lockstep. The client issues {{zmq_send[3]}} and then {{zmq_recv[3]}}, in a loop (or once if that's all it needs). Doing any other sequence (e.g., sending two messages in a row) will result in a return code of -1 from the {{send}} or {{recv}} call. Similarly, the service issues {{zmq_recv[3]}} and then {{zmq_send[3]}} in that order, as often as it needs to.
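To make the lockstep concrete, here is a minimal sketch (not one of the Guide's named examples), assuming a ZMQ_REQ socket called {{requester}} that is already connected; the client strictly alternates one send with one receive:

[[code language="C"]]
//  Sketch only: strict send/recv alternation on a REQ socket.
//  Assumes 'requester' is a connected ZMQ_REQ socket.
int request_nbr;
for (request_nbr = 0; request_nbr != 10; request_nbr++) {
    char buffer [10];
    zmq_send (requester, "Hello", 5, 0);    //  One request out...
    zmq_recv (requester, buffer, 10, 0);    //  ...then exactly one reply back
}
[[/code]]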
-0MQ uses C as its reference language and this is the main language we'll use for examples. If you're reading this on-line, the link below the example takes you to translations into other programming languages. Let's compare the same server in C++:
+0MQ uses C as its reference language and this is the main language we'll use for examples. If you're reading this online, the link below the example takes you to translations into other programming languages. Let's compare the same server in C++:
[[code type="example" title="Hello World server" name="hwserver" language="C++"]]
[[/code]]
@@ -89,15 +87,15 @@ Here's the client code:
Now this looks too simple to be realistic, but 0MQ sockets have, as we already learned, superpowers. You could throw thousands of clients at this server, all at once, and it would continue to work happily and quickly. For fun, try starting the client and //then// starting the server, see how it all still works, then think for a second what this means.
-Let us explain briefly what these two programs are actually doing. They create a 0MQ context to work with, and a socket. Don't worry what the words mean. You'll pick it up. The server binds its REP (reply) socket to port 5555. The server waits for a request, in a loop, and responds each time with a reply. The client sends a request and reads the reply back from the server.
+Let us explain briefly what these two programs are actually doing. They create a 0MQ context to work with, and a socket. Don't worry what the words mean. You'll pick it up. The server binds its REP (reply) socket to port 5555. The server waits for a request in a loop, and responds each time with a reply. The client sends a request and reads the reply back from the server.
If you kill the server (Ctrl-C) and restart it, the client won't recover properly. Recovering from crashing processes isn't quite that easy. Making a reliable request-reply flow is complex enough that we won't cover it until [#reliable-request-reply].
-There is a lot happening behind the scenes but what matters to us programmers is how short and sweet the code is, and how often it doesn't crash, even under heavy load. This is the request-reply pattern, probably the simplest way to use 0MQ. It maps to RPC and the classic client-server model.
+There is a lot happening behind the scenes but what matters to us programmers is how short and sweet the code is, and how often it doesn't crash, even under a heavy load. This is the request-reply pattern, probably the simplest way to use 0MQ. It maps to RPC and the classic client/server model.
++ A Minor Note on Strings
-0MQ doesn't know anything about the data you send except its size in bytes. That means you are responsible for formatting it safely so that applications can read it back. Doing this for objects and complex data types is a job for specialized libraries like Protocol Buffers. But even for strings you need to take care.
+0MQ doesn't know anything about the data you send except its size in bytes. That means you are responsible for formatting it safely so that applications can read it back. Doing this for objects and complex data types is a job for specialized libraries like Protocol Buffers. But even for strings, you need to take care.
In C and some other languages, strings are terminated with a null byte. We could send a string like "HELLO" with that extra null byte:
@@ -105,13 +103,13 @@ In C and some other languages, strings are terminated with a null byte. We could
zmq_send (requester, "Hello", 6, 0);
[[/code]]
-However if you send a string from another language it probably will not include that null byte. For example, when we send that same string in Python, we do this:
+However, if you send a string from another language, it probably will not include that null byte. For example, when we send that same string in Python, we do this:
[[code language="Python"]]
socket.send ("Hello")
[[/code]]
-Then what goes onto the wire is a length (one byte for shorter strings) and the string contents, as individual characters[figure].
+Then what goes onto the wire is a length (one byte for shorter strings) and the string contents as individual characters[figure].
[[code type="textdiagram" title="A 0MQ string"]]
#-----# #-----+-----+-----+-----+-----#
@@ -121,9 +119,9 @@ Then what goes onto the wire is a length (one byte for shorter strings) and the
And if you read this from a C program, you will get something that looks like a string, and might by accident act like a string (if by luck the five bytes find themselves followed by an innocently lurking null), but isn't a proper string. When your client and server don't agree on the string format, you will get weird results.
-When you receive string data from 0MQ, in C, you simply cannot trust that it's safely terminated. Every single time you read a string you should allocate a new buffer with space for an extra byte, copy the string, and terminate it properly with a null.
+When you receive string data from 0MQ in C, you simply cannot trust that it's safely terminated. Every single time you read a string, you should allocate a new buffer with space for an extra byte, copy the string, and terminate it properly with a null.
-So let's establish the rule that **0MQ strings are length-specified, and are sent on the wire //without// a trailing null**. In the simplest case (and we'll do this in our examples) a 0MQ string maps neatly to a 0MQ message frame, which looks like the above figure, a length and some bytes.
+So let's establish the rule that **0MQ strings are length-specified and are sent on the wire //without// a trailing null**. In the simplest case (and we'll do this in our examples), a 0MQ string maps neatly to a 0MQ message frame, which looks like the above figure--a length and some bytes.
Here is what we need to do, in C, to receive a 0MQ string and deliver it to the application as a valid C string:
@@ -143,7 +141,7 @@ s_recv (void *socket) {
}
[[/code]]
-This makes a handy helper function and in the spirit of making things we can reuse profitably, let's write a similar 's_send' function that sends strings in the correct 0MQ format, and package this into a header file we can reuse.
+This makes a handy helper function and in the spirit of making things we can reuse profitably, let's write a similar {{s_send}} function that sends strings in the correct 0MQ format, and package this into a header file we can reuse.
The result is {{zhelpers.h}}, which lets us write sweeter and shorter 0MQ applications in C. It is a fairly long source, and only fun for C developers, so [https://github.com/imatix/zguide/blob/master/examples/C/zhelpers.h read it at leisure].
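To give a rough idea of the shape of that helper (the definitive version is the one in {{zhelpers.h}}), an {{s_send}} sketch can be little more than a wrapper around {{zmq_send[3]}} that takes the frame size from {{strlen}}, so no trailing null goes on the wire:

[[code language="C"]]
//  Sketch of an s_send helper; see zhelpers.h for the real thing
#include <string.h>

static int
s_send (void *socket, char *string)
{
    //  Send the string contents only, without the terminating null
    return zmq_send (socket, string, strlen (string), 0);
}
[[/code]]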
@@ -165,9 +163,9 @@ Here's the server. We'll use port 5556 for this application:
[[code type="example" title="Weather update server" name="wuserver"]]
[[/code]]
-There's no start, and no end to this stream of updates, it's like a never ending broadcast[figure].
+There's no start and no end to this stream of updates; it's like a never-ending broadcast[figure].
-Here is client application, which listens to the stream of updates and grabs anything to do with a specified zip code, by default New York City because that's a great place to start any adventure:
+Here is the client application, which listens to the stream of updates and grabs anything to do with a specified zip code, by default New York City because that's a great place to start any adventure:
[[code type="example" title="Weather update client" name="wuclient"]]
[[/code]]
@@ -197,38 +195,38 @@ Here is client application, which listens to the stream of updates and grabs any
#------------# #------------# #------------#
[[/code]]
-Note that when you use a SUB socket you **must** set a subscription using {{zmq_setsockopt[3]}} and SUBSCRIBE, as in this code. If you don't set any subscription, you won't get any messages. It's a common mistake for beginners. The subscriber can set many subscriptions, which are added together. That is, if a update matches ANY subscription, the subscriber receives it. The subscriber can also cancel specific subscriptions. A subscription is often but not necessarily a printable string. See {{zmq_setsockopt[3]}} for how this works.
+Note that when you use a SUB socket you **must** set a subscription using {{zmq_setsockopt[3]}} and SUBSCRIBE, as in this code. If you don't set any subscription, you won't get any messages. It's a common mistake for beginners. The subscriber can set many subscriptions, which are added together. That is, if an update matches ANY subscription, the subscriber receives it. The subscriber can also cancel specific subscriptions. A subscription is often, but not necessarily, a printable string. See {{zmq_setsockopt[3]}} for how this works.
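As a sketch (assuming a ZMQ_SUB socket named {{subscriber}} and the six-character zip code prefixes used by the weather example), setting and cancelling subscriptions looks like this:

[[code language="C"]]
//  Subscribe to one zip code; the prefix is matched byte by byte
zmq_setsockopt (subscriber, ZMQ_SUBSCRIBE, "10001 ", 6);
//  Subscriptions are additive, so this adds a second one
zmq_setsockopt (subscriber, ZMQ_SUBSCRIBE, "10002 ", 6);
//  And a specific subscription can be cancelled again
zmq_setsockopt (subscriber, ZMQ_UNSUBSCRIBE, "10002 ", 6);
[[/code]]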
-The PUB-SUB socket pair is asynchronous. The client does {{zmq_recv[3]}}, in a loop (or once if that's all it needs). Trying to send a message to a SUB socket will cause an error. Similarly the service does {{zmq_send[3]}} as often as it needs to, but must not do {{zmq_recv[3]}} on a PUB socket.
+The PUB-SUB socket pair is asynchronous. The client does {{zmq_recv[3]}}, in a loop (or once if that's all it needs). Trying to send a message to a SUB socket will cause an error. Similarly, the service does {{zmq_send[3]}} as often as it needs to, but must not do {{zmq_recv[3]}} on a PUB socket.
-In theory with 0MQ sockets, it does not matter which end connects, and which end binds. However in practice there are undocumented differences that I'll come to later. For now, bind the PUB and connect the SUB, unless your network design makes that impossible.
+In theory with 0MQ sockets, it does not matter which end connects and which end binds. However, in practice there are undocumented differences that I'll come to later. For now, bind the PUB and connect the SUB, unless your network design makes that impossible.
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, **the subscriber will always miss the first messages that the publisher sends**. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
-This "slow joiner" symptom hits enough people, often enough, that we're going to explain it in detail. Remember that 0MQ does asynchronous I/O, i.e. in the background. Say you have two nodes doing this, in this order:
+This "slow joiner" symptom hits enough people often enough that we're going to explain it in detail. Remember that 0MQ does asynchronous I/O, i.e., in the background. Say you have two nodes doing this, in this order:
* Subscriber connects to an endpoint and receives and counts messages.
* Publisher binds to an endpoint and immediately sends 1,000 messages.
-Then the subscriber will most likely not receive anything. You'll blink, check that you set a correct filter, and try again, and the subscriber will still not receive anything.
+Then the subscriber will most likely not receive anything. You'll blink, check that you set a correct filter and try again, and the subscriber will still not receive anything.
-Making a TCP connection involves to and from handshaking that takes several milliseconds depending on your network and the number of hops between peers. In that time, 0MQ can send very many messages. For sake of argument assume it takes 5 msecs to establish a connection, and that same link can handle 1M messages per second. During the 5 msecs that the subscriber is connecting to the publisher, it takes the publisher only 1 msec to send out those 1K messages.
+Making a TCP connection involves to-and-from handshaking that takes several milliseconds depending on your network and the number of hops between peers. In that time, 0MQ can send many messages. For the sake of argument, assume it takes 5 msecs to establish a connection, and that same link can handle 1M messages per second. During the 5 msecs that the subscriber is connecting to the publisher, it takes the publisher only 1 msec to send out those 1K messages.
-In [#sockets-and-patterns] we'll explain how to synchronize a publisher and subscribers so that you don't start to publish data until the subscriber(s) really are connected and ready. There is a simple and stupid way to delay the publisher, which is to sleep. Don't do this in a real application, though, because it is extremely fragile as well as inelegant and slow. Use sleeps to prove to yourself what's happening, and then wait for [#sockets-and-patterns] to see how to do this right.
+In [#sockets-and-patterns] we'll explain how to synchronize a publisher and subscribers so that you don't start to publish data until the subscribers really are connected and ready. There is a simple and stupid way to delay the publisher, which is to sleep. Don't do this in a real application, though, because it is extremely fragile as well as inelegant and slow. Use sleeps to prove to yourself what's happening, and then wait for [#sockets-and-patterns] to see how to do this right.
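If you want to see the effect for yourself, the naive delay is nothing more than this sketch (for experiments only, as warned above):

[[code language="C"]]
//  Fragile demonstration only: give subscribers time to connect
void *context = zmq_ctx_new ();
void *publisher = zmq_socket (context, ZMQ_PUB);
zmq_bind (publisher, "tcp://*:5556");
sleep (1);      //  Hope the subscribers have connected by now (needs unistd.h)
zmq_send (publisher, "update 0001", 11, 0);
[[/code]]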
-The alternative to synchronization is to simply assume that the published data stream is infinite and has no start, and no end. One also assumes that the subscriber doesn't care what transpired before it started up. This is how we built our weather client example.
+The alternative to synchronization is to simply assume that the published data stream is infinite and has no start and no end. One also assumes that the subscriber doesn't care what transpired before it started up. This is how we built our weather client example.
So the client subscribes to its chosen zip code and collects a thousand updates for that zip code. That means about ten million updates from the server, if zip codes are randomly distributed. You can start the client, and then the server, and the client will keep working. You can stop and restart the server as often as you like, and the client will keep working. When the client has collected its thousand updates, it calculates the average, prints it, and exits.
Some points about the publish-subscribe (pub-sub) pattern:
-* A subscriber can connect to more than one publisher, using one 'connect' call each time. Data will then arrive and be interleaved ("fair-queued") so that no single publisher drowns out the others.
+* A subscriber can connect to more than one publisher, using one connect call each time. Data will then arrive and be interleaved ("fair-queued") so that no single publisher drowns out the others.
* If a publisher has no connected subscribers, then it will simply drop all messages.
-* If you're using TCP, and a subscriber is slow, messages will queue up on the publisher. We'll look at how to protect publishers against this, using the "high-water mark" later.
+* If you're using TCP and a subscriber is slow, messages will queue up on the publisher. We'll look at how to protect publishers against this using the "high-water mark" later.
-* From 0MQ 3.x, filtering happens at the publisher side, when using a connected protocol ({{tcp://}} or {{ipc://}}). Using the {{epgm://}} protocol, filtering happens at the subscriber side. In 0MQ/2.x, all filtering happened at the subscriber side.
+* From 0MQ v3.x, filtering happens at the publisher side when using a connected protocol ({{tcp://}} or {{ipc://}}). Using the {{epgm://}} protocol, filtering happens at the subscriber side. In 0MQ v2.x, all filtering happened at the subscriber side.
This is how long it takes to receive and filter 10M messages on my laptop, which is a 2011-era Intel i5, decent but nothing special:
@@ -282,34 +280,34 @@ As a final example (you are surely getting tired of juicy code and want to delve
* A set of workers that process tasks
* A sink that collects results back from the worker processes
-In reality, workers run on superfast boxes, perhaps using GPUs (graphic processing units) to do the hard math. Here is the ventilator. It generates 100 tasks, each is a message telling the worker to sleep for some number of milliseconds:
+In reality, workers run on superfast boxes, perhaps using GPUs (graphic processing units) to do the hard math. Here is the ventilator. It generates 100 tasks, each one is a message telling the worker to sleep for some number of milliseconds:
[[code type="example" title="Parallel task ventilator" name="taskvent"]]
[[/code]]
-Here is the worker application. It receives a message, sleeps for that number of seconds, then signals that it's finished:
+Here is the worker application. It receives a message, sleeps for that number of seconds, and then signals that it's finished:
[[code type="example" title="Parallel task worker" name="taskwork"]]
[[/code]]
-Here is the sink application. It collects the 100 tasks, then calculates how long the overall processing took, so we can confirm that the workers really were running in parallel, if there are more than one of them:
+Here is the sink application. It collects the 100 tasks, then calculates how long the overall processing took, so we can confirm that the workers really were running in parallel if there is more than one of them:
[[code type="example" title="Parallel task sink" name="tasksink"]]
[[/code]]
The average cost of a batch is 5 seconds. When we start 1, 2, or 4 workers we get results like this from the sink:
-* 1 worker - total elapsed time: 5034 msecs.
-* 2 workers - total elapsed time: 2421 msecs.
-* 4 workers - total elapsed time: 1018 msecs.
+* 1 worker: total elapsed time: 5034 msecs.
+* 2 workers: total elapsed time: 2421 msecs.
+* 4 workers: total elapsed time: 1018 msecs.
Let's look at some aspects of this code in more detail:
-* The workers connect upstream to the ventilator, and downstream to the sink. This means you can add workers arbitrarily. If the workers bound to their endpoints, you would need (a) more endpoints and (b) to modify the ventilator and/or the sink each time you added a worker. We say that the ventilator and sink are 'stable' parts of our architecture and the workers are 'dynamic' parts of it.
+* The workers connect upstream to the ventilator, and downstream to the sink. This means you can add workers arbitrarily. If the workers bound to their endpoints, you would need (a) more endpoints and (b) to modify the ventilator and/or the sink each time you added a worker. We say that the ventilator and sink are //stable// parts of our architecture and the workers are //dynamic// parts of it.
-* We have to synchronize the start of the batch with all workers being up and running. This is a fairly common gotcha in 0MQ and there is no easy solution. The 'connect' method takes a certain time. So when a set of workers connect to the ventilator, the first one to successfully connect will get a whole load of messages in that short time while the others are also connecting. If you don't synchronize the start of the batch somehow, the system won't run in parallel at all. Try removing the wait, and see.
+* We have to synchronize the start of the batch with all workers being up and running. This is a fairly common gotcha in 0MQ and there is no easy solution. The {{zmq_connect}} method takes a certain time. So when a set of workers connect to the ventilator, the first one to successfully connect will get a whole load of messages in that short time while the others are also connecting. If you don't synchronize the start of the batch somehow, the system won't run in parallel at all. Try removing the wait in the ventilator, and see what happens.
-* The ventilator's PUSH socket distributes tasks to workers (assuming they are all connected //before// the batch starts going out) evenly. This is called //load-balancing// and it's something we'll look at again in more detail.
+* The ventilator's PUSH socket distributes tasks to workers (assuming they are all connected //before// the batch starts going out) evenly. This is called //load balancing// and it's something we'll look at again in more detail.
* The sink's PULL socket collects results from workers evenly. This is called //fair-queuing//[figure].
@@ -331,21 +329,21 @@ Let's look at some aspects of this code in more detail:
#-------------#
[[/code]]
-The pipeline pattern also exhibits the "slow joiner" syndrome, leading to accusations that PUSH sockets don't load balance properly. If you are using PUSH and PULL, and one of your workers gets way more messages than the others, it's because that PULL socket has joined faster than the others, and grabs a lot of messages before the others manage to connect. If you want proper load-balancing, you probably want to look at the The load-balancing pattern in [#advanced-request-reply].
+The pipeline pattern also exhibits the "slow joiner" syndrome, leading to accusations that PUSH sockets don't load balance properly. If you are using PUSH and PULL, and one of your workers gets way more messages than the others, it's because that PULL socket has joined faster than the others, and grabs a lot of messages before the others manage to connect. If you want proper load balancing, you probably want to look at the load balancing pattern in [#advanced-request-reply].
++ Programming with 0MQ
-Having seen some examples, you're eager to start using 0MQ in some apps. Before you start that, take a deep breath, chillax, and reflect on some basic advice that will save you stress and confusion.
+Having seen some examples, you must be eager to start using 0MQ in some apps. Before you start that, take a deep breath, chillax, and reflect on some basic advice that will save you much stress and confusion.
-* Learn 0MQ step by step. It's just one simple API but it hides a world of possibilities. Take the possibilities slowly, master each one.
+* Learn 0MQ step-by-step. It's just one simple API, but it hides a world of possibilities. Take the possibilities slowly and master each one.
-* Write nice code. Ugly code hides problems and makes it hard for others to help you. You might get used to meaningless variable names, but people reading your code won't. Use names that are real words, that say something other than "I'm too careless to tell you what this variable is really for". Use consistent indentation, clean layout. Write nice code and your world will be more comfortable.
+* Write nice code. Ugly code hides problems and makes it hard for others to help you. You might get used to meaningless variable names, but people reading your code won't. Use names that are real words, that say something other than "I'm too careless to tell you what this variable is really for". Use consistent indentation and clean layout. Write nice code and your world will be more comfortable.
* Test what you make as you make it. When your program doesn't work, you should know what five lines are to blame. This is especially true when you do 0MQ magic, which just //won't// work the first few times you try it.
-* When you find that things don't work as expected, break your code into pieces, test each one, see which one is not working. 0MQ lets you make essentially modular code, use that to your advantage.
+* When you find that things don't work as expected, break your code into pieces, test each one, see which one is not working. 0MQ lets you make essentially modular code; use that to your advantage.
-* Make abstractions (classes, methods, whatever) as you need them. If you copy/paste a lot of code you're going to copy/paste errors too.
+* Make abstractions (classes, methods, whatever) as you need them. If you copy/paste a lot of code, you're going to copy/paste errors, too.
+++ Getting the Context Right
@@ -353,13 +351,13 @@ Having seen some examples, you're eager to start using 0MQ in some apps. Before
**Do one {{zmq_ctx_new[3]}} at the start of your main line code, and one {{zmq_ctx_destroy[3]}} at the end.**
-If you're using the {{fork()}} system call, each process needs its own context. If you do {{zmq_ctx_new[3]}} in the main process before calling {{fork()}}, the child processes get their own contexts. In general you want to do the interesting stuff in the child processes, and just manage these from the parent process.
+If you're using the {{fork()}} system call, each process needs its own context. If you do {{zmq_ctx_new[3]}} in the main process before calling {{fork()}}, the child processes get their own contexts. In general, you want to do the interesting stuff in the child processes and just manage these from the parent process.
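A sketch of that layout, assuming the parent does no 0MQ work of its own and only waits for the child:

[[code language="C"]]
#include <zmq.h>
#include <unistd.h>
#include <sys/wait.h>

int main (void)
{
    if (fork () == 0) {
        //  Child: gets its own context and does the interesting work
        void *context = zmq_ctx_new ();
        //  ... create sockets, send and receive messages here ...
        zmq_ctx_destroy (context);
        return 0;
    }
    //  Parent: just manages the child process
    wait (NULL);
    return 0;
}
[[/code]]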
+++ Making a Clean Exit
-Classy programmers share the same motto as classy hit men: always clean-up when you finish the job. When you use 0MQ in a language like Python, stuff gets automatically freed for you. But when using C you have to carefully free objects when you're finished with them, or you get memory leaks, unstable applications, and generally bad karma.
+Classy programmers share the same motto as classy hit men: always clean up when you finish the job. When you use 0MQ in a language like Python, stuff gets automatically freed for you. But when using C, you have to carefully free objects when you're finished with them or else you get memory leaks, unstable applications, and generally bad karma.
-Memory leaks are one thing, but 0MQ is quite finicky about how you exit an application. The reasons are technical and painful but the upshot is that if you leave any sockets open, the {{zmq_ctx_destroy[3]}} function will hang forever. And even if you close all sockets, {{zmq_ctx_destroy[3]}} will by default wait forever if there are pending connects or sends. Unless you set the LINGER to zero on those sockets before closing them.
+Memory leaks are one thing, but 0MQ is quite finicky about how you exit an application. The reasons are technical and painful, but the upshot is that if you leave any sockets open, the {{zmq_ctx_destroy[3]}} function will hang forever. And even if you close all sockets, {{zmq_ctx_destroy[3]}} will by default wait forever if there are pending connects or sends unless you set the LINGER to zero on those sockets before closing them.
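In code, that exit sequence is only a few lines; a sketch, assuming a single socket and context named {{socket}} and {{context}}:

[[code language="C"]]
//  Set LINGER to zero so pending sends cannot block the shutdown
int linger = 0;
zmq_setsockopt (socket, ZMQ_LINGER, &linger, sizeof (linger));
zmq_close (socket);
zmq_ctx_destroy (context);      //  Returns promptly instead of hanging
[[/code]]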
The 0MQ objects we need to worry about are messages, sockets, and contexts. Luckily it's quite simple, at least in simple programs:
@@ -367,17 +365,17 @@ The 0MQ objects we need to worry about are messages, sockets, and contexts. Luck
* If you do use {{zmq_msg_recv[3]}}, always release the received message as soon as you're done with it, by calling {{zmq_msg_close[3]}}.
-* If you are opening and closing a lot of sockets, that's probably a sign you need to redesign your application. In some cases socket handles won't be freed until you destroy the context.
+* If you are opening and closing a lot of sockets, that's probably a sign that you need to redesign your application. In some cases socket handles won't be freed until you destroy the context.
* When you exit the program, close your sockets and then call {{zmq_ctx_destroy[3]}}. This destroys the context.
-This is at least for C development. In a language with automatic object destruction, sockets and contexts will be destroyed as you leave the scope. If you use exceptions you'll have to do the clean-up in something like a "final" block, the same as for any resource.
+This is at least the case for C development. In a language with automatic object destruction, sockets and contexts will be destroyed as you leave the scope. If you use exceptions you'll have to do the clean-up in something like a "final" block, the same as for any resource.
-If you're doing multithreaded work, it gets rather more complex than this. We'll get to multithreading in the next chapter, but because some of you will, despite warnings, will try to run before you can safely walk, below is the quick and dirty guide to making a clean exit in a //multithreaded// 0MQ application.
+If you're doing multithreaded work, it gets rather more complex than this. We'll get to multithreading in the next chapter, but because some of you will, despite warnings, try to run before you can safely walk, below is the quick and dirty guide to making a clean exit in a //multithreaded// 0MQ application.
-First, do not try to use the same socket from multiple threads. No, don't explain why you think this would be excellent fun, just please don't do it. Next, you need to shut down each socket that has ongoing requests. The proper way is to set a low LINGER value (1 second), then close the socket. If your language binding doesn't do this for you automatically when you destroy a context, I'd suggest sending a patch.
+First, do not try to use the same socket from multiple threads. Please don't explain why you think this would be excellent fun, just please don't do it. Next, you need to shut down each socket that has ongoing requests. The proper way is to set a low LINGER value (1 second), and then close the socket. If your language binding doesn't do this for you automatically when you destroy a context, I'd suggest sending a patch.
-Finally, destroy the context. This will cause any blocking receives or polls or sends in attached threads (i.e. which share the same context) to return with an error. Catch that error, and then set linger on, and close sockets in //that// thread, and exit. Do not destroy the same context twice. The zmq_ctx_destroy in the main thread will block until all sockets it knows about are safely closed.
+Finally, destroy the context. This will cause any blocking receives or polls or sends in attached threads (i.e., which share the same context) to return with an error. Catch that error, and then set linger on, and close sockets in //that// thread, and exit. Do not destroy the same context twice. The {{zmq_ctx_destroy}} in the main thread will block until all sockets it knows about are safely closed.
Voila! It's complex and painful enough that any language binding author worth his or her salt will do this automatically and make the socket closing dance unnecessary.
@@ -387,17 +385,17 @@ Now that you've seen 0MQ in action, let's go back to the "why".
Many applications these days consist of components that stretch across some kind of network, either a LAN or the Internet. So many application developers end up doing some kind of messaging. Some developers use message queuing products, but most of the time they do it themselves, using TCP or UDP. These protocols are not hard to use, but there is a great difference between sending a few bytes from A to B, and doing messaging in any kind of reliable way.
-Let's look at the typical problems we face when we start to connect pieces using raw TCP. Any reusable messaging layer would need to solve all or most these:
+Let's look at the typical problems we face when we start to connect pieces using raw TCP. Any reusable messaging layer would need to solve all or most of these:
* How do we handle I/O? Does our application block, or do we handle I/O in the background? This is a key design decision. Blocking I/O creates architectures that do not scale well. But background I/O can be very hard to do right.
-* How do we handle dynamic components, i.e. pieces that go away temporarily? Do we formally split components into "clients" and "servers" and mandate that servers cannot disappear? What then if we want to connect servers to servers? Do we try to reconnect every few seconds?
+* How do we handle dynamic components, i.e., pieces that go away temporarily? Do we formally split components into "clients" and "servers" and mandate that servers cannot disappear? What then if we want to connect servers to servers? Do we try to reconnect every few seconds?
* How do we represent a message on the wire? How do we frame data so it's easy to write and read, safe from buffer overflows, efficient for small messages, yet adequate for the very largest videos of dancing cats wearing party hats?
-* How do we handle messages that we can't deliver immediately? Particularly, if we're waiting for a component to come back on-line? Do we discard messages, put them into a database, or into a memory queue?
+* How do we handle messages that we can't deliver immediately? Particularly, if we're waiting for a component to come back online? Do we discard messages, put them into a database, or into a memory queue?
-* Where do we store message queues? What happens if the component reading from a queue is very slow, and causes our queues to build up? What's our strategy then?
+* Where do we store message queues? What happens if the component reading from a queue is very slow and causes our queues to build up? What's our strategy then?
* How do we handle lost messages? Do we wait for fresh data, request a resend, or do we build some kind of reliability layer that ensures messages cannot be lost? What if that layer itself crashes?
@@ -411,11 +409,11 @@ Let's look at the typical problems we face when we start to connect pieces using
* How do we handle network errors? Do we wait and retry, ignore them silently, or abort?
-Take a typical open source project like [http://hadoop.apache.org/zookeeper/ Hadoop Zookeeper] and read the C API code in [http://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c src/c/src/zookeeper.c]. As I write this, in 2010, the code is 3,200 lines of mystery and in there is an undocumented, client-server network communication protocol. I see it's efficient because it uses poll() instead of select(). But really, Zookeeper should be using a generic messaging layer and an explicitly documented wire level protocol. It is incredibly wasteful for teams to be building this particular wheel over and over.
+Take a typical open source project like [http://hadoop.apache.org/zookeeper/ Hadoop Zookeeper] and read the C API code in {{[http://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c src/c/src/zookeeper.c]}}. When I read this code, in January 2013, it was 4,200 lines of mystery and in there is an undocumented, client/server network communication protocol. I see it's efficient because it uses {{poll}} instead of {{select}}. But really, Zookeeper should be using a generic messaging layer and an explicitly documented wire level protocol. It is incredibly wasteful for teams to be building this particular wheel over and over.
-But how to make a reusable messaging layer? Why, when so many projects need this technology, are people still doing it the hard way, by driving TCP sockets in their code, and solving the problems in that long list, over and over[figure]?
+But how to make a reusable messaging layer? Why, when so many projects need this technology, are people still doing it the hard way by driving TCP sockets in their code, and solving the problems in that long list over and over[figure]?
-It turns out that building reusable messaging systems is really difficult, which is why few FOSS projects ever tried, and why commercial messaging products are complex, expensive, inflexible, and brittle. In 2006 iMatix designed [http://www.amqp.org AMQP] which started to give FOSS developers perhaps the first reusable recipe for a messaging system. AMQP works better than many other designs [http://www.imatix.com/articles:whats-wrong-with-amqp but remains relatively complex, expensive, and brittle]. It takes weeks to learn to use, and months to create stable architectures that don't crash when things get hairy.
+It turns out that building reusable messaging systems is really difficult, which is why few FOSS projects ever tried, and why commercial messaging products are complex, expensive, inflexible, and brittle. In 2006, iMatix designed [http://www.amqp.org AMQP] which started to give FOSS developers perhaps the first reusable recipe for a messaging system. AMQP works better than many other designs, [http://www.imatix.com/articles:whats-wrong-with-amqp but remains relatively complex, expensive, and brittle]. It takes weeks to learn to use, and months to create stable architectures that don't crash when things get hairy.
[[code type="textdiagram" title="Messaging as it Starts"]]
.------------.
@@ -435,9 +433,9 @@ It turns out that building reusable messaging systems is really difficult, which
'------------'
[[/code]]
-Most messaging projects, like AMQP, that try to solve this long list of problems in a reusable way do so by inventing a new concept, the "broker", that does addressing, routing, and queuing. This results in a client-server protocol or a set of APIs on top of some undocumented protocol, that let applications speak to this broker. Brokers are an excellent thing in reducing the complexity of large networks. But adding broker-based messaging to a product like Zookeeper would make it worse, not better. It would mean adding an additional big box, and a new single point of failure. A broker rapidly becomes a bottleneck and a new risk to manage. If the software supports it, we can add a second, third, fourth broker and make some fail-over scheme. People do this. It creates more moving pieces, more complexity, more things to break.
+Most messaging projects, like AMQP, that try to solve this long list of problems in a reusable way do so by inventing a new concept, the "broker", that does addressing, routing, and queuing. This results in a client/server protocol or a set of APIs on top of some undocumented protocol that allows applications to speak to this broker. Brokers are an excellent thing in reducing the complexity of large networks. But adding broker-based messaging to a product like Zookeeper would make it worse, not better. It would mean adding an additional big box, and a new single point of failure. A broker rapidly becomes a bottleneck and a new risk to manage. If the software supports it, we can add a second, third, and fourth broker and make some failover scheme. People do this. It creates more moving pieces, more complexity, and more things to break.
-And a broker-centric set-up needs its own operations team. You literally need to watch the brokers day and night, and beat them with a stick when they start misbehaving. You need boxes, and you need backup boxes, and you need people to manage those boxes. It is only worth doing for large applications with many moving pieces, built by several teams of people, over several years.
+And a broker-centric setup needs its own operations team. You literally need to watch the brokers day and night, and beat them with a stick when they start misbehaving. You need boxes, and you need backup boxes, and you need people to manage those boxes. It is only worth doing for large applications with many moving pieces, built by several teams of people over several years.
[[code type="textdiagram" title="Messaging as it Becomes"]]
.---. .---.
@@ -467,9 +465,9 @@ And a broker-centric set-up needs its own operations team. You literally need to
'---' '---'
[[/code]]
-So small to medium application developers are trapped. Either they avoid network programming, and make monolithic applications that do not scale. Or they jump into network programming and make brittle, complex applications that are hard to maintain. Or they bet on a messaging product, and end up with scalable applications that depend on expensive, easily broken technology. There has been no really good choice, which is maybe why messaging is largely stuck in the last century and stirs strong emotions. Negative ones for users, gleeful joy for those selling support and licenses[figure].
+So small to medium application developers are trapped. Either they avoid network programming and make monolithic applications that do not scale. Or they jump into network programming and make brittle, complex applications that are hard to maintain. Or they bet on a messaging product, and end up with scalable applications that depend on expensive, easily broken technology. There has been no really good choice, which is maybe why messaging is largely stuck in the last century and stirs strong emotions: negative ones for users, gleeful joy for those selling support and licenses[figure].
-What we need is something that does the job of messaging but does it in such a simple and cheap way that it can work in any application, with close to zero cost. It should be a library that you just link with, without any other dependencies. No additional moving pieces, so no additional risk. It should run on any OS and work with any programming language.
+What we need is something that does the job of messaging, but does it in such a simple and cheap way that it can work in any application, with close to zero cost. It should be a library with which you just link, without any other dependencies. No additional moving pieces, so no additional risk. It should run on any OS and work with any programming language.
And this is 0MQ: an efficient, embeddable library that solves most of the problems an application needs to become nicely elastic across a network, without much cost.
@@ -499,7 +497,7 @@ Specifically:
* It reduces your carbon footprint. Doing more with less CPU means your boxes use less power, and you can keep your old boxes in use for longer. Al Gore would love 0MQ.
-Actually 0MQ does rather more than this. It has a subversive effect on how you develop network-capable applications. Superficially it's a socket-inspired API on which you do {{zmq_recv[3]}} and {{zmq_send[3]}}. But message processing rapidly becomes the central loop, and your application soon breaks down into a set of message processing tasks. It is elegant and natural. And it scales: each of these tasks maps to a node, and the nodes talk to each other across arbitrary transports. Two nodes in one process (node is a thread), two nodes on one box (node is a process), or two boxes on one network (node is a box) - it's all the same, with no application code changes.
+Actually 0MQ does rather more than this. It has a subversive effect on how you develop network-capable applications. Superficially, it's a socket-inspired API on which you do {{zmq_recv[3]}} and {{zmq_send[3]}}. But message processing rapidly becomes the central loop, and your application soon breaks down into a set of message processing tasks. It is elegant and natural. And it scales: each of these tasks maps to a node, and the nodes talk to each other across arbitrary transports. Two nodes in one process (node is a thread), two nodes on one box (node is a process), or two boxes on one network (node is a box)--it's all the same, with no application code changes.
++ Socket Scalability
@@ -514,7 +512,7 @@ wuclient 45678 &
wuclient 56789 &
[[/code]]
-As the clients run, we take a look at the active processes using 'top', and we see something like (on a 4-core box):
+As the clients run, we take a look at the active processes using the {{top}} command, and we see something like this (on a 4-core box):
[[code]]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
@@ -528,15 +526,15 @@ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Let's think for a second about what is happening here. The weather server has a single socket, and yet here we have it sending data to five clients in parallel. We could have thousands of concurrent clients. The server application doesn't see them, doesn't talk to them directly. So the 0MQ socket is acting like a little server, silently accepting client requests and shoving data out to them as fast as the network can handle it. And it's a multithreaded server, squeezing more juice out of your CPU.
-++ Upgrading from 0MQ/2.2 to 0MQ/3.2
+++ Upgrading from 0MQ v2.2 to 0MQ v3.2
+++ Compatible Changes
These changes don't impact existing application code directly:
-* Pub-sub filtering is now done at the publisher side instead of subscriber side. This improves performance significantly in many pub-sub use cases. You can mix 3.2 and 2.1/2.2 publishers and subscribers safely.
+* Pub-sub filtering is now done at the publisher side instead of subscriber side. This improves performance significantly in many pub-sub use cases. You can mix v3.2 and v2.1/v2.2 publishers and subscribers safely.
-* 0MQ/3.2 has many new API methods ({{zmq_disconnect[3]}}, {{zmq_unbind[3]}}, {{zmq_monitor[3]}}, {{zmq_ctx_set[3]}}, etc.)
+* 0MQ v3.2 has many new API methods ({{zmq_disconnect[3]}}, {{zmq_unbind[3]}}, {{zmq_monitor[3]}}, {{zmq_ctx_set[3]}}, etc.)
+++ Incompatible Changes
@@ -544,7 +542,7 @@ These are the main areas of impact on applications and language bindings:
* Changed send/recv methods: {{zmq_send[3]}} and {{zmq_recv[3]}} have a different, simpler interface, and the old functionality is now provided by {{zmq_msg_send[3]}} and {{zmq_msg_recv[3]}}. Symptom: compile errors. Solution: fix up your code.
-* These two methods return positive values on success, and -1 on error. In 2.x they always returned zero on success. Symptom: apparent errors when things actually work fine. Solution: test strictly for return code = -1, not non-zero.
+* These two methods return positive values on success, and -1 on error. In v2.x they always returned zero on success. Symptom: apparent errors when things actually work fine. Solution: test strictly for return code = -1, not non-zero.
* {{zmq_poll[3]}} now waits for milliseconds, not microseconds. Symptom: application stops responding (in fact responds 1000 times slower). Solution: use the {{ZMQ_POLL_MSEC}} macro defined below, in all {{zmq_poll}} calls.
@@ -558,7 +556,7 @@ These are the main areas of impact on applications and language bindings:
+++ Suggested Shim Macros
-For applications that want to run on both 2.x and 3.2, such as language bindings, our advice is to emulate 3.2 as far as possible. Here are C macro definitions that help your C/C++ code to work across both versions (taken from [http://czmq.zeromq.org CZMQ]):
+For applications that want to run on both v2.x and v3.2, such as language bindings, our advice is to emulate v3.2 as far as possible. Here are C macro definitions that help your C/C++ code to work across both versions (taken from [http://czmq.zeromq.org CZMQ]):
[[code type="fragment" name="upgrade-shim"]]
#ifndef ZMQ_DONTWAIT
@@ -576,13 +574,13 @@ For applications that want to run on both 2.x and 3.2, such as language bindings
#endif
[[/code]]
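As a rough illustration of how these shims get used (this isn't one of the book's examples; the {{socket}} and the usual includes are assumed), a poll call written against the {{ZMQ_POLL_MSEC}} macro waits one second whether it runs on v2.x or v3.2:

[[code]]
//  Sketch only: a one-second timeout on both v2.x (usec) and v3.2 (msec)
zmq_pollitem_t items [] = { { socket, 0, ZMQ_POLLIN, 0 } };
zmq_poll (items, 1, 1000 * ZMQ_POLL_MSEC);
[[/code]]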
-++ Warning - Unstable Paradigms!
+++ Warning: Unstable Paradigms!
-Traditional network programming is built on the general assumption that one socket talks to one connection, one peer. There are multicast protocols but these are exotic. When we assume "one socket = one connection", we scale our architectures in certain ways. We create threads of logic where each thread work with one socket, one peer. We place intelligence and state in these threads.
+Traditional network programming is built on the general assumption that one socket talks to one connection, one peer. There are multicast protocols, but these are exotic. When we assume "one socket = one connection", we scale our architectures in certain ways. We create threads of logic where each thread works with one socket, one peer. We place intelligence and state in these threads.
In the 0MQ universe, sockets are doorways to fast little background communications engines that manage a whole set of connections automagically for you. You can't see, work with, open, close, or attach state to these connections. Whether you use blocking send or receive, or poll, all you can talk to is the socket, not the connections it manages for you. The connections are private and invisible, and this is the key to 0MQ's scalability.
-Because your code, talking to a socket, can then handle any number of connections across whatever network protocols are around, without change. A messaging pattern sitting in 0MQ scales more cheaply than a messaging pattern sitting in your application code.
+This is because your code, talking to a socket, can then handle any number of connections across whatever network protocols are around, without change. A messaging pattern sitting in 0MQ scales more cheaply than a messaging pattern sitting in your application code.
So the general assumption no longer applies. As you read the code examples, your brain will try to map them to what you know. You will read "socket" and think "ah, that represents a connection to another node". That is wrong. You will read "thread" and your brain will again think, "ah, a thread represents a connection to another node", and again your brain will be wrong.
331 chapter2.txt
163 additions, 168 deletions not shown
320 chapter3.txt
@@ -2,22 +2,22 @@
.bookmark advanced-request-reply
+ Advanced Request-Reply Patterns
-In [#sockets-and-patterns] we worked through the basics of using 0MQ by developing a series of small applications, each time exploring new aspects of 0MQ. We'll continue this approach in this chapter, as we explore advanced patterns built on top of 0MQ's core request-reply pattern.
+In [#sockets-and-patterns] we worked through the basics of using 0MQ by developing a series of small applications, each time exploring new aspects of 0MQ. We'll continue this approach in this chapter as we explore advanced patterns built on top of 0MQ's core request-reply pattern.
We'll cover:
-* How the request-reply mechanisms work.
-* How to combine REQ, REP, DEALER, and ROUTER sockets.
-* How ROUTER sockets work, in detail.
-* The load-balancing pattern.
-* Building a simple load-balancing message broker.
-* Designing a high-level API for 0MQ.
-* Building an asynchronous request-reply server.
-* A detailed inter-broker routing example.
+* How the request-reply mechanisms work
+* How to combine REQ, REP, DEALER, and ROUTER sockets
+* How ROUTER sockets work, in detail
+* The load balancing pattern
+* Building a simple load balancing message broker
+* Designing a high-level API for 0MQ
+* Building an asynchronous request-reply server
+* A detailed inter-broker routing example
++ The Request-Reply Mechanisms
-We already looked briefly at multi-part messages. Let's now look at a major use-case, which is //reply message envelopes//. An envelope is a way of safely packaging up data with an address, without touching the data itself. By separating reply addresses into an envelope we make it possible to write general-purpose intermediaries such as APIs and proxies that create, read, and remove addresses no matter what the message payload or structure.
+We already looked briefly at multipart messages. Let's now look at a major use case, which is //reply message envelopes//. An envelope is a way of safely packaging up data with an address, without touching the data itself. By separating reply addresses into an envelope we make it possible to write general purpose intermediaries such as APIs and proxies that create, read, and remove addresses no matter what the message payload or structure.
In the request-reply pattern, the envelope holds the return address for replies. It is how a 0MQ network with no state can create round-trip request-reply dialogs.
@@ -25,7 +25,7 @@ When you use REQ and REP sockets you don't even see envelopes; these sockets dea
+++ The Simple Reply Envelope
-A request-reply exchange consists of a //request// message, and an eventual //reply// message. In the simple request-reply pattern there's one reply for each request. In more advanced patterns, requests and replies can flow asynchronously. However, the reply envelope always works the same way.
+A request-reply exchange consists of a //request// message, and an eventual //reply// message. In the simple request-reply pattern, there's one reply for each request. In more advanced patterns, requests and replies can flow asynchronously. However, the reply envelope always works the same way.
The 0MQ reply envelope formally consists of zero or more reply addresses, followed by an empty frame (the envelope delimiter), followed by the message body (zero or more frames). The envelope is created by multiple sockets working together in a chain. We'll break this down.
@@ -41,11 +41,11 @@ Frame 2 | 5 | Hello | Data frame
The REP socket does the matching work: it strips off the envelope, up to and including the delimiter frame, saves the whole envelope, and passes the "Hello" string up the application. Thus our original Hello World example used request-reply envelopes internally, but the application never saw them.
-If you spy on the network data flowing between hwclient and hwserver, this is what you'll see: every request and every reply is in fact two frames, an empty frame and then the body. It doesn't seem to make much sense for a simple REQ-REP dialog. However you'll see the reason when we explore how ROUTERS and DEALERS handle envelopes.
+If you spy on the network data flowing between {{hwclient}} and {{hwserver}}, this is what you'll see: every request and every reply is in fact two frames, an empty frame and then the body. It doesn't seem to make much sense for a simple REQ-REP dialog. However you'll see the reason when we explore how ROUTER and DEALER handle envelopes.
+++ The Extended Reply Envelope
-Now let's extend the REQ-REP pair with a ROUTER-DEALER proxy in the middle and see how this affects the reply envelope. This is the //extended request-reply pattern// we already saw in [#sockets-and-patterns]. We can in fact insert any number of proxy steps[figure]. The mechanics are the same.
+Now let's extend the REQ-REP pair with a ROUTER-DEALER proxy in the middle and see how this affects the reply envelope. This is the //extended request-reply pattern// we already saw in [#sockets-and-patterns]. We can, in fact, insert any number of proxy steps[figure]. The mechanics are the same.
[[code type="textdiagram" title="Extended Request-Reply Pattern"]]
#-------# #-------#
@@ -83,9 +83,9 @@ The {{zmq_socket[3]}} man page describes it thus:
> When receiving messages a ZMQ_ROUTER socket shall prepend a message part containing the identity of the originating peer to the message before passing it to the application. Messages received are fair-queued from among all connected peers. When sending messages a ZMQ_ROUTER socket shall remove the first part of the message and use it to determine the identity of the peer the message shall be routed to.
-As a historical note, 0MQ/2.2 and earlier use UUIDs as identities, and 0MQ/3.0 and later use short integers. There's some impact on network performance but only when you use multiple proxy hops, which is rare. Mostly the change was to simplify building libzmq by removing the dependency on a UUID library.
+As a historical note, 0MQ v2.2 and earlier use UUIDs as identities, and 0MQ v3.0 and later use short integers. There's some impact on network performance, but only when you use multiple proxy hops, which is rare. Mostly the change was to simplify building libzmq by removing the dependency on a UUID library.
-Identies are a difficult concept to understand but essential if you want to become a 0MQ expert. The ROUTER socket //invents// an random identity for each connection it works with. If there are three REQ sockets connected to a ROUTER socket, it will invent three random identities, one for each REQ socket.
+Identities are a difficult concept to understand, but they're essential if you want to become a 0MQ expert. The ROUTER socket //invents// a random identity for each connection with which it works. If there are three REQ sockets connected to a ROUTER socket, it will invent three random identities, one for each REQ socket.
So if we continue our worked example, let's say the REQ socket has a 3-byte identity {{ABC}}. Internally, this means the ROUTER socket keeps a hash table where it can search for {{ABC}} and find the TCP connection for the REQ socket.
@@ -101,11 +101,11 @@ Frame 3 | 5 | Hello | Data frame
#---+-------#
[[/code]]
-The core of the proxy loop is 'read from one socket, write to the other', so we literally send these three frames out on the DEALER socket. If you now sniffed the network traffic, you would see these three frames flying from the DEALER socket to the REP socket. The REP socket does as before, strips off the whole envelope including the new reply address, and once again delivers the "Hello" to the caller.
+The core of the proxy loop is "read from one socket, write to the other", so we literally send these three frames out on the DEALER socket. If you now sniffed the network traffic, you would see these three frames flying from the DEALER socket to the REP socket. The REP socket does as before, strips off the whole envelope including the new reply address, and once again delivers the "Hello" to the caller.
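If you haven't written that loop before, here is a rough sketch (not one of the book's examples; {{frontend}} and {{backend}} stand in for the proxy's two sockets). It shuttles one whole message, frame by frame, preserving the MORE flag:

[[code]]
//  Sketch only: read a whole message from one socket, write it to the other
while (1) {
    zmq_msg_t frame;
    zmq_msg_init (&frame);
    zmq_msg_recv (&frame, frontend, 0);
    int more = zmq_msg_more (&frame);
    zmq_msg_send (&frame, backend, more? ZMQ_SNDMORE: 0);
    zmq_msg_close (&frame);
    if (!more)
        break;              //  Last frame of this message
}
[[/code]]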
Incidentally, the REP socket can only deal with one request-reply exchange at a time, which is why if you try to read multiple requests or send multiple replies without sticking to a strict recv-send cycle, it gives an error.
-You should now be able to visualize the return path. When the hwserver sends "World" back, the REP socket wraps that with the envelope it saved, and sends a three-frame reply message across the wire to the DEALER socket[figure].
+You should now be able to visualize the return path. When {{hwserver}} sends "World" back, the REP socket wraps that with the envelope it saved, and sends a three-frame reply message across the wire to the DEALER socket[figure].
[[code type="textdiagram" title="Reply with one Address"]]
#---+-----#
@@ -131,13 +131,13 @@ The REQ socket picks this message up, and checks that the first frame is the emp
+++ What's This Good For?
-To be honest the use-cases for strict request-reply or extended request-reply are somewhat limited. For one thing, there's no easy way to recover from common failures like the server crashing due to buggy application code. We'll see more about this in [#reliable-request-reply]. However once you grasp the way these four sockets deal with envelopes, and how they talk to each other, you can do very useful things. We saw how ROUTER uses the reply envelope to decide which client REQ socket to route a reply back to. Now let's express this another way:
+To be honest, the use cases for strict request-reply or extended request-reply are somewhat limited. For one thing, there's no easy way to recover from common failures like the server crashing due to buggy application code. We'll see more about this in [#reliable-request-reply]. However once you grasp the way these four sockets deal with envelopes, and how they talk to each other, you can do very useful things. We saw how ROUTER uses the reply envelope to decide which client REQ socket to route a reply back to. Now let's express this another way:
-* Each time ROUTER gives you a message it tells you what peer that came from, as an identity.
+* Each time ROUTER gives you a message, it tells you which peer it came from, as an identity.
* You can use this with a hash table (with the identity as key) to track new peers as they arrive.
* ROUTER will route messages asynchronously to any peer connected to it, if you prefix the identity as the first frame of the message.
-ROUTER sockets don't care about the whole envelope. They don't know anything about the empty delimiter. All they care about is that one identity frame that lets them figure out which connect to send a message to.
+ROUTER sockets don't care about the whole envelope. They don't know anything about the empty delimiter. All they care about is that one identity frame that lets them figure out which connection to send a message to.
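In code, routing to a chosen peer is just a matter of sending that identity frame first. A rough sketch (not one of the book's examples; {{router}} is assumed to be a bound ROUTER socket, and {{identity}}/{{id_size}} were captured from a message received earlier):

[[code]]
//  Sketch only: prefix the identity frame, then send the payload to that peer
zmq_send (router, identity, id_size, ZMQ_SNDMORE);
zmq_send (router, "Hello", 5, 0);
[[/code]]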
+++ Recap of Request-Reply Sockets
@@ -147,9 +147,9 @@ Let's recap this:
* The REP socket reads and saves all identity frames up to and including the empty delimiter, then passes the following frame or frames to the caller. REP sockets are synchronous and talk to one peer at a time. If you connect a REP socket to multiple peers, requests are read from peers in fair fashion, and replies are always sent to the same peer that made the last request.
-* The DEALER socket is oblivious to the reply envelope and handles this like any multi-part message. DEALER sockets are asynchronous and like PUSH and PULL combined. They distribute sent messages among all connections, and fair-queue received messages from all connections.
+* The DEALER socket is oblivious to the reply envelope and handles this like any multipart message. DEALER sockets are asynchronous and like PUSH and PULL combined. They distribute sent messages among all connections, and fair-queue received messages from all connections.
-* The ROUTER socket is oblivious to the reply envelope, like DEALER. It creates identities for its connections, and passes these identities to the caller as a first frame in any received message. Conversely, when the caller sends a message, it use the first message frame as an identity to look-up the connection to send to. ROUTERS are asynchronous.
+* The ROUTER socket is oblivious to the reply envelope, like DEALER. It creates identities for its connections, and passes these identities to the caller as a first frame in any received message. Conversely, when the caller sends a message, it uses the first message frame as an identity to look up the connection to send to. ROUTER sockets are asynchronous.
++ Request-Reply Combinations
@@ -171,44 +171,44 @@ And these combinations are invalid (and I'll explain why):
* REP to REP
* REP to ROUTER
-Here are some tips for remembering the semantics. DEALER is like an asynchronous REQ socket, and ROUTER is like an asynchronous REP socket. Where we use a REQ socket we can use a DEALER, we just have to read and write the envelope ourselves. Where we use a REP socket we can stick a ROUTER, we just need to manage the identities ourselves.
+Here are some tips for remembering the semantics. DEALER is like an asynchronous REQ socket, and ROUTER is like an asynchronous REP socket. Where we use a REQ socket, we can use a DEALER; we just have to read and write the envelope ourselves. Where we use a REP socket, we can stick a ROUTER; we just need to manage the identities ourselves.
-Think of REQ and DEALER sockets as "clients" and REP and ROUTER sockets as "servers". Mostly you'll want to bind REP and ROUTER sockets, and connect REQ and DEALER sockets to them. It's not always going to be this simple, but it is a clean and memorable place to start.
+Think of REQ and DEALER sockets as "clients" and REP and ROUTER sockets as "servers". Mostly, you'll want to bind REP and ROUTER sockets, and connect REQ and DEALER sockets to them. It's not always going to be this simple, but it is a clean and memorable place to start.
+++ The REQ to REP Combination
-We've already covered a REQ client talking to a REP server but let's take one aspect: the REQ client //must// initiate the message flow. A REP server cannot talk to a REQ client that hasn't first sent it a request. Technically it's not even possible, and the API also returns an {{EFSM}} error if you try it.
+We've already covered a REQ client talking to a REP server but let's take one aspect: the REQ client //must// initiate the message flow. A REP server cannot talk to a REQ client that hasn't first sent it a request. Technically, it's not even possible, and the API also returns an {{EFSM}} error if you try it.
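To see what that means in practice, here's a rough sketch (not one of the book's examples; {{requester}} is assumed to be a freshly connected REQ socket, with the usual includes): a receive before any send fails with {{EFSM}}:

[[code]]
//  Sketch only: a REQ socket must send before it can receive
char buffer [10];
if (zmq_recv (requester, buffer, sizeof (buffer), 0) == -1)
    assert (errno == EFSM);     //  Socket is not in the appropriate state
[[/code]]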
+++ The DEALER to REP Combination
-Now, let's replace the REQ client with a DEALER. This gives us an asynchronous client that can talk to multiple REP servers. If we rewrote the hello world client using DEALER, we'd be able to send off any number of "Hello" requests without waiting for replies.
+Now, let's replace the REQ client with a DEALER. This gives us an asynchronous client that can talk to multiple REP servers. If we rewrote the "Hello World" client using DEALER, we'd be able to send off any number of "Hello" requests without waiting for replies.
When we use a DEALER to talk to a REP socket, we //must// accurately emulate the envelope that the REQ socket would have sent, otherwise the REP socket will discard the message as invalid. So, to send a message, we:
-* send an empty message frame with the MORE flag set; then
-* send the message body.
+* Send an empty message frame with the MORE flag set; then
+* Send the message body.
And when we receive a message, we:
-* receive the first frame, if it's not empty, discard the whole message;
-* receive the next frame and pass that to the application.
+* Receive the first frame and if it's not empty, discard the whole message;
+* Receive the next frame and pass that to the application.
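Spelled out in code, those steps look roughly like this (a sketch, not one of the book's examples; {{dealer}} is assumed to be a DEALER socket already connected to a REP server):

[[code]]
//  Sketch only: emulate the REQ envelope over a DEALER socket
//  To send a request: empty delimiter frame (MORE flag set), then the body
zmq_send (dealer, "", 0, ZMQ_SNDMORE);
zmq_send (dealer, "Hello", 5, 0);

//  To receive a reply: first frame must be empty, else discard; then the body
char body [256];
int size = zmq_recv (dealer, body, 0, 0);
if (size == 0) {
    size = zmq_recv (dealer, body, sizeof (body) - 1, 0);
    if (size >= 0)
        body [size < 255? size: 255] = 0;   //  Body is now usable as a string
}
[[/code]]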
+++ The REQ to ROUTER Combination
-In the same way as we can replace REQ with DEALER, we can replace REP with ROUTER. This gives us an asynchronous server that can talk to multiple REQ clients at the same time. If we rewrote the hello world server using ROUTER, we'd be able to process any number of "Hello" requests in parallel. We saw this in the [#sockets-and-patterns] mtserver example.
+In the same way that we can replace REQ with DEALER, we can replace REP with ROUTER. This gives us an asynchronous server that can talk to multiple REQ clients at the same time. If we rewrote the "Hello World" server using ROUTER, we'd be able to process any number of "Hello" requests in parallel. We saw this in the [#sockets-and-patterns] {{mtserver}} example.
We can use ROUTER in two distinct ways:
-* As a proxy that switches messages between a frontend and backend sockets.
+* As a proxy that switches messages between frontend and backend sockets.
* As an application that reads the message and acts on it.
-In the first case the ROUTER simply reads all frames including the artificial identity frame, and passes them on blindly. In the second case the ROUTER //must// know the format of the reply envelope it's being sent. As the other peer is a REQ socket, the ROUTER gets the identity frame, an empty frame, then the data frame.
+In the first case, the ROUTER simply reads all frames, including the artificial identity frame, and passes them on blindly. In the second case the ROUTER //must// know the format of the reply envelope it's being sent. As the other peer is a REQ socket, the ROUTER gets the identity frame, an empty frame, and then the data frame.
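Here's a rough sketch of the second case (not one of the book's examples; {{router}} is assumed to be a bound ROUTER socket with a REQ peer, and the usual includes are assumed): read the identity, the empty delimiter, and the data, then replay the identity and delimiter in front of the reply:

[[code]]
//  Sketch only: a ROUTER talking directly to a REQ peer
char identity [256], body [256];
int id_size = zmq_recv (router, identity, sizeof (identity), 0);
assert (id_size >= 0);
zmq_recv (router, body, 0, 0);                      //  Empty delimiter
int size = zmq_recv (router, body, sizeof (body) - 1, 0);
assert (size >= 0);
body [size < 255? size: 255] = 0;                   //  Request, e.g. "Hello"

zmq_send (router, identity, id_size, ZMQ_SNDMORE);  //  Route back to that peer
zmq_send (router, "", 0, ZMQ_SNDMORE);
zmq_send (router, "World", 5, 0);
[[/code]]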
+++ The DEALER to ROUTER Combination
-Now, we can switch out both REQ and REP with DEALER and ROUTER to get the most powerful socket combination, which is DEALER talking to ROUTER. It gives us asynchronous clients talking to asynchronous servers, where both sides have full control over the message formats.
+Now we can switch out both REQ and REP with DEALER and ROUTER to get the most powerful socket combination, which is DEALER talking to ROUTER. It gives us asynchronous clients talking to asynchronous servers, where both sides have full control over the message formats.
-Since both DEALER and ROUTER can work with arbitrary message formats, if you hope to use these safely you have to become a little bit of a protocol designer. At the very least you must decide whether you wish to emulate the REQ/REP reply envelope. It depends on whether you actually need to send replies, or not.
+Because both DEALER and ROUTER can work with arbitrary message formats, if you hope to use these safely, you have to become a little bit of a protocol designer. At the very least you must decide whether you wish to emulate the REQ/REP reply envelope. It depends on whether you actually need to send replies or not.
+++ The DEALER to DEALER Combination
@@ -218,33 +218,33 @@ When you replace a REP with a DEALER, your worker can suddenly go full asynchron
+++ The ROUTER to ROUTER Combination
-This sounds perfect for N-to-N connections but it's the most difficult combination to use. You should avoid it until you are well-advanced with 0MQ. We'll see one example it in the Freelance pattern in [#reliable-request-reply], and an alternative DEALER to ROUTER design for peer-to-peer work in [#moving-pieces].
+This sounds perfect for N-to-N connections, but it's the most difficult combination to use. You should avoid it until you are well advanced with 0MQ. We'll see one example of this in the Freelance pattern in [#reliable-request-reply], and an alternative DEALER to ROUTER design for peer-to-peer work in [#moving-pieces].
+++ Invalid Combinations
-Mostly, trying to connect clients to clients, or servers to servers, is a bad idea and won't work. However rather than give general vague warnings, I'll explain in detail:
+Mostly, trying to connect clients to clients, or servers to servers, is a bad idea and won't work. However, rather than give general vague warnings, I'll explain in detail:
* REQ to REQ: both sides want to start by sending messages to each other, and this could only work if you timed things so that both peers exchanged messages at the same time. It hurts my brain to even think about it.
-* REQ to DEALER: you could in theory do this, but it would break if you added a second REQ, since DEALER has no way of sending a reply to the original peer. Thus the REQ socket would get confused, and/or return messages meant for another client.
+* REQ to DEALER: you could in theory do this, but it would break if you added a second REQ because DEALER has no way of sending a reply to the original peer. Thus the REQ socket would get confused, and/or return messages meant for another client.
* REP to REP: both sides would wait for the other to send the first message.
* REP to ROUTER: the ROUTER socket can in theory initiate the dialog and send a properly-formatted request, if it knows the REP socket has connected //and// it knows the identity of that connection. It's messy and adds nothing over DEALER to ROUTER.
-The common thread in this valid vs. invalid breakdown is that a 0MQ socket connection is always biased towards one peer that binds to an endpoint, and another that connects to that. Further, that which side binds and which side connects is not arbitrary, but follows natural patterns. The side which we expect to "be there" binds: it'll be a server, a broker, a publisher, a collector. The side that "comes and goes" connects: it'll be clients and workers. Remembering this will help you design better 0MQ architectures.
+The common thread in this valid versus invalid breakdown is that a 0MQ socket connection is always biased towards one peer that binds to an endpoint, and another that connects to that. Further, which side binds and which side connects is not arbitrary, but follows natural patterns. The side which we expect to "be there" binds: it'll be a server, a broker, a publisher, a collector. The side that "comes and goes" connects: it'll be clients and workers. Remembering this will help you design better 0MQ architectures.
++ Exploring ROUTER Sockets
+++ Identities and Addresses
-The //identity// concept in 0MQ refers specifically to ROUTER sockets and how they identity the connections they have to other sockets. More broadly, identities are used as addresses in the reply envelope. In most cases the identity is arbitrary and local to the ROUTER socket: it's a lookup key in a hash table. Independently, a peer can have an address that is physical (a network endpoint like "tcp://192.168.55.117:5670") or logical (a UUID or email address or other unique key).
+The //identity// concept in 0MQ refers specifically to ROUTER sockets and how they identify the connections they have to other sockets. More broadly, identities are used as addresses in the reply envelope. In most cases, the identity is arbitrary and local to the ROUTER socket: it's a lookup key in a hash table. Independently, a peer can have an address that is physical (a network endpoint like "tcp://192.168.55.117:5670") or logical (a UUID or email address or other unique key).
-An application that uses a ROUTER socket to talk to specific peers can convert a logical address to an identity if it has built the necessary hash table. Since ROUTER sockets only announce the identity of a connection (to a specific peer) when that peer sends a message, you can only really reply to a message, not spontaneously talk to a peer.
+An application that uses a ROUTER socket to talk to specific peers can convert a logical address to an identity if it has built the necessary hash table. Because ROUTER sockets only announce the identity of a connection (to a specific peer) when that peer sends a message, you can only really reply to a message, not spontaneously talk to a peer.
-This is true even if you flip the rules and make the ROUTER connect to the peer rather than wait for the peer to connect to the ROUTER. However you can force the ROUTER socket to use a logical address in place of its identity. The {{zmq_setsockopt}} reference page calls this "setting the socket identity". It works as follows:
+This is true even if you flip the rules and make the ROUTER connect to the peer rather than wait for the peer to connect to the ROUTER. However, you can force the ROUTER socket to use a logical address in place of its identity. The {{zmq_setsockopt}} reference page calls this //setting the socket identity//. It works as follows:
-* The peer application sets the {{ZMQ_IDENTITY}} option its peer socket (DEALER or REQ), //before// binding or connecting.
+* The peer application sets the {{ZMQ_IDENTITY}} option of its peer socket (DEALER or REQ) //before// binding or connecting.
* Usually the peer then connects to the already-bound ROUTER socket. But the ROUTER can also connect to the peer.
* At connection time, the peer socket tells the router socket, "please use this identity for this connection".
* If the peer socket doesn't say that, the router generates its usual arbitrary random identity for the connection.
@@ -271,33 +271,33 @@ Here is what the program prints:
+++ ROUTER Error Handling
-ROUTER sockets do have a somewhat brutal way of dealing with messages they can't send anywhere: they drop them silently. It's an attitude that makes sense in working code, but makes debugging hard. The "send identity as first frame" is tricky enough that we get this wrong when we're learning, and the ROUTER's stony silence when we mess up isn't very constructive.
+ROUTER sockets do have a somewhat brutal way of dealing with messages they can't send anywhere: they drop them silently. It's an attitude that makes sense in working code, but it makes debugging hard. The "send identity as first frame" approach is tricky enough that we often get this wrong when we're learning, and the ROUTER's stony silence when we mess up isn't very constructive.
-Since 0MQ/3.2 there's a socket option you can set to catch this error: {{ZMQ_ROUTER_MANDATORY}}. Set that on the ROUTER socket and then you provide an unroutable identity on a send call, the socket will signal an EHOSTUNREACH error.
+Since 0MQ v3.2 there's a socket option you can set to catch this error: {{ZMQ_ROUTER_MANDATORY}}. Set that on the ROUTER socket and then when you provide an unroutable identity on a send call, the socket will signal an EHOSTUNREACH error.
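A rough sketch of how that looks (not one of the book's examples; {{router}} is assumed, and the identity "Lost" stands for a peer that isn't connected):

[[code]]
//  Sketch only: make unroutable sends fail loudly instead of silently
int mandatory = 1;
zmq_setsockopt (router, ZMQ_ROUTER_MANDATORY, &mandatory, sizeof (mandatory));
if (zmq_send (router, "Lost", 4, ZMQ_SNDMORE) == -1)
    assert (errno == EHOSTUNREACH);     //  No peer with this identity
[[/code]]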
-++ The Load-balancing Pattern
+++ The Load Balancing Pattern
-Let's now look at some code. We'll see how to connect a ROUTER socket to a REQ socket, and then to a DEALER socket. These two examples follow the same logic, which is a //load-balancing// pattern. This pattern is our first exposure to using the ROUTER socket for deliberate routing, rather than simply acting as a reply channel.
+Now let's look at some code. We'll see how to connect a ROUTER socket to a REQ socket, and then to a DEALER socket. These two examples follow the same logic, which is a //load balancing// pattern. This pattern is our first exposure to using the ROUTER socket for deliberate routing, rather than simply acting as a reply channel.
-The load-balancing pattern is very common and we'll see it several times in this book. It solves the main problem with simple round-robin routing (as PUSH and DEALER offer) which is that round-robin becomes inefficient if tasks do not all roughly take the same time.
+The load balancing pattern is very common and we'll see it several times in this book. It solves the main problem with simple round robin routing (as PUSH and DEALER offer) which is that round robin becomes inefficient if tasks do not all roughly take the same time.
-It's the post office analogy. If you have one queue per counter, and you have some people buying stamps (a fast, simple transaction), and some people opening new accounts (a very slow transaction), then you will find stamp-buyers getting unfairly stuck in queues. Just as in a post office, if your messaging architecture is unfair, people will get annoyed.
+It's the post office analogy. If you have one queue per counter, and you have some people buying stamps (a fast, simple transaction), and some people opening new accounts (a very slow transaction), then you will find stamp buyers getting unfairly stuck in queues. Just as in a post office, if your messaging architecture is unfair, people will get annoyed.
-The solution in the post office is to create a single queue so that even if one or two counters get 'stuck' with slow work, other counters will continue to serve clients on a first-come, first-serve basis.
+The solution in the post office is to create a single queue so that even if one or two counters get stuck with slow work, other counters will continue to serve clients on a first-come, first-served basis.
-One reason PUSH and DEALER use the simplistic approach is sheer performance. If you arrive in any major US airport, you'll find long queues of people waiting at immigration. The border-patrol officials will send people in advance to queue up at each counter, rather than using a single queue. Having people walk fifty yards in advance saves a minute or two per passenger. And since every passport check takes roughly the same time, it's more or less fair. And this is the strategy for PUSH and DEALER: send work loads ahead of time so that there is less walking distance.
+One reason PUSH and DEALER use the simplistic approach is sheer performance. If you arrive in any major US airport, you'll find long queues of people waiting at immigration. The border patrol officials will send people in advance to queue up at each counter, rather than using a single queue. Having people walk fifty yards in advance saves a minute or two per passenger. And because every passport check takes roughly the same time, it's more or less fair. This is the strategy for PUSH and DEALER: send work loads ahead of time so that there is less travel distance.
-This is a recurring theme with 0MQ: the world's problems are diverse and you can really benefit from solving different problems each in the right way. The airport isn't the post-office and one size fits no-one, really well.
+This is a recurring theme with 0MQ: the world's problems are diverse and you can benefit from solving different problems each in the right way. The airport isn't the post office and one size fits no one, really well.
-Back to a worker (DEALER or REQ) connected to a broker (ROUTER). The broker has to know when the worker is ready, and keep a list of workers so that it can take the //least recently used// worker each time.
+Let's return to the scenario of a worker (DEALER or REQ) connected to a broker (ROUTER). The broker has to know when the worker is ready, and keep a list of workers so that it can take the //least recently used// worker each time.
-The solution is really simple in fact: workers send a "Ready" message when they start, and after they finish each task. The broker reads these messages one by one. Each time it reads a message, that is from the last used worker. And since we're using a ROUTER socket, we get an identity that we can then use to send a task back to the worker.
+The solution is really simple, in fact: workers send a "ready" message when they start, and after they finish each task. The broker reads these messages one by one. Each time it reads a message, it is from the last used worker. And because we're using a ROUTER socket, we get an identity that we can then use to send a task back to the worker.
It's a twist on request-reply because the task is sent with the reply, and any response for the task is sent as a new request. The following code examples should make it clearer.
+++ ROUTER Broker and REQ Workers
-Here is an example of the load-balancing pattern using a ROUTER broker talking to a set of REQ workers:
+Here is an example of the load balancing pattern using a ROUTER broker talking to a set of REQ workers:
[[code type="example" title="ROUTER-to-REQ" name="rtreq"]]
[[/code]]
@@ -336,22 +336,22 @@ Anywhere you can use REQ, you can use DEALER. There are two specific differences
* The REQ socket always sends an empty delimiter frame before any data frames; the DEALER does not.
* The REQ socket will send only one message before it receives a reply; the DEALER is fully asynchronous.
-The synchronous vs. asynchronous behavior has no effect on our example since we're doing strict request-reply anyhow. It is more relevant when we come to recovering from failures, which we'll come to in [#reliable-request-reply].
+The synchronous versus asynchronous behavior has no effect on our example because we're doing strict request-reply. It is more relevant when we address recovering from failures, which we'll come to in [#reliable-request-reply].
Now let's look at exactly the same example but with the REQ socket replaced by a DEALER socket:
[[code type="example" title="ROUTER-to-DEALER" name="rtdealer"]]
[[/code]]
-The code is almost identical except that the worker uses a DEALER socket, and reads and writes that empty frame before the data frame. This is the approach I'd use when I wanted to keep compatibility with REQ workers.
+The code is almost identical except that the worker uses a DEALER socket, and reads and writes that empty frame before the data frame. This is the approach I use when I want to keep compatibility with REQ workers.
-However remember the reason for that empty delimiter frame: it's to allow multihop extended requests that terminate in a REP socket, which uses that delimiter to split off the reply envelope, so it can hand the data frames to its application.
+However, remember the reason for that empty delimiter frame: it's to allow multihop extended requests that terminate in a REP socket, which uses that delimiter to split off the reply envelope so it can hand the data frames to its application.
If we never need to pass the message along to a REP socket, we can simply drop the empty delimiter frame at both sides, which makes things simpler. This is usually the design I use for pure DEALER to ROUTER protocols.
-+++ A Load-Balancing Message Broker
++++ A Load Balancing Message Broker
-[[code type="textdiagram" title="Load-Balancing Broker"]]
+[[code type="textdiagram" title="Load Balancing Broker"]]
#--------# #--------# #--------#
| Client | | Client | | Client |
+--------+ +--------+ +--------+
@@ -363,7 +363,7 @@ If we never need to pass the message along to a REP socket, we can simply drop t
.---+----.
| ROUTER | Frontend
+--------+
- | Proxy | Load-balancer
+ | Proxy | Load balancer
+--------+
| ROUTER | Backend
'---+----'
@@ -379,23 +379,23 @@ If we never need to pass the message along to a REP socket, we can simply drop t
The previous example is half-complete. It can manage a set of workers with dummy requests and replies, but it has no way to talk to clients.
-If we add a second //frontend// ROUTER socket that accepts client requests, and turn our example into a proxy that can switch messages from frontend to backend, we get a useful and reusable tiny load-balancing message broker[figure].
+If we add a second //frontend// ROUTER socket that accepts client requests, and turn our example into a proxy that can switch messages from frontend to backend, we get a useful and reusable tiny load balancing message broker[figure].
-What this broker does is:
+This broker does the following:
* Accepts connections from a set of clients.
* Accepts connections from a set of workers.
* Accepts requests from clients and holds these in a single queue.
-* Sends these requests to workers using the load-balancing pattern.
+* Sends these requests to workers using the load balancing pattern.
* Receives replies back from workers.
* Sends these replies back to the original requesting client.
-The broker code is fairly long but worth understanding:
+The broker code is fairly long, but worth understanding:
-[[code type="example" title="Load-balancing broker" name="lbbroker"]]
+[[code type="example" title="Load balancing broker" name="lbbroker"]]
[[/code]]
-The difficult part of this program is (a) the envelopes that each socket reads and writes, and (b) the load-balancing algorithm. We'll take these in turn, starting with the message envelope formats.
+The difficult part of this program is (a) the envelopes that each socket reads and writes, and (b) the load balancing algorithm. We'll take these in turn, starting with the message envelope formats.
[[code type="textdiagram" title="Message that Client Sends"]]
#---+-------#
@@ -403,7 +403,7 @@ Frame 1 | 5 | Hello | Data frame
#---+-------#
[[/code]]
-Let's walk through a full request-reply chain from client to worker and back. In this code we set the identity of client and worker sockets to make it easier to trace the message frames. In reality we'd allow the ROUTER sockets to invent identities for connections. Let's assume the client's identity is "CLIENT" and the worker's identity is "WORKER". The client application sends a single frame containing "Hello"[figure].
+Let's walk through a full request-reply chain from client to worker and back. In this code we set the identity of client and worker sockets to make it easier to trace the message frames. In reality, we'd allow the ROUTER sockets to invent identities for connections. Let's assume the client's identity is "CLIENT" and the worker's identity is "WORKER". The client application sends a single frame containing "Hello"[figure].
[[code type="textdiagram" title="Message Coming in on Frontend"]]
#---+--------#
@@ -415,7 +415,7 @@ Frame 3 | 5 | Hello | Data frame
#---+-------#
[[/code]]
-Since the REQ socket adds its empty delimiter frame, and the ROUTER socket adds its connection identity, what the proxy reads off the frontend ROUTER socket are three frames: the client address, empty delimiter frame, and the data part[figure].
+Because the REQ socket adds its empty delimiter frame and the ROUTER socket adds its connection identity, the proxy reads three frames off the frontend ROUTER socket: the client address, the empty delimiter frame, and the data part[figure].
[[code type="textdiagram" title="Message Sent to Backend"]]
#---+--------#
@@ -435,7 +435,7 @@ The broker sends this to the worker, prefixed by the address of the chosen worke
This complex envelope stack gets chewed up first by the backend ROUTER socket, which removes the first frame. Then the REQ socket in the worker removes the empty part, and provides the rest to the worker application[figure].
-The worker has to save the envelope (which is all the parts up to and including the empty message frame) and then it can do what's needed with the data part. Note that a REP socket would do this automatically but we're using the REQ-ROUTER pattern so we can get proper load-balancing.
+The worker has to save the envelope (which is all the parts up to and including the empty message frame) and then it can do what's needed with the data part. Note that a REP socket would do this automatically, but we're using the REQ-ROUTER pattern so that we can get proper load balancing.
[[code type="textdiagram" title="Message Delivered to Worker"]]
#---+--------#
@@ -447,25 +447,25 @@ Frame 3 | 5 | Hello | Data frame
#---+-------#
[[/code]]
-On the return path the messages are the same as when they come in, i.e. the backend socket gives the broker a message in five parts, and the broker sends the frontend socket a message in three parts, and the client gets a message in one part.
+On the return path, the messages are the same as when they come in, i.e., the backend socket gives the broker a message in five parts, and the broker sends the frontend socket a message in three parts, and the client gets a message in one part.
-Now let's look at the load-balancing algorithm. It requires that both clients and workers use REQ sockets, and that workers correctly store and replay the envelope on messages they get. The algorithm is:
+Now let's look at the load balancing algorithm. It requires that both clients and workers use REQ sockets, and that workers correctly store and replay the envelope on messages they get. The algorithm is:
-* Create a pollset which polls the backend always, and the frontend only if there are one or more workers available.
+* Create a pollset that always polls the backend, and polls the frontend only if there are one or more workers available.
* Poll for activity with infinite timeout.
-* If there is activity on the backend, we either have a "ready" message or a reply for a client. In either case we store the worker address (the first part) on our worker queue, and if the rest is a client reply we send it back to that client via the frontend.
+* If there is activity on the backend, we either have a "ready" message or a reply for a client. In either case, we store the worker address (the first part) on our worker queue, and if the rest is a client reply, we send it back to that client via the frontend.
* If there is activity on the frontend, we take the client request, pop the next worker (which is the last used), and send the request to the backend. This means sending the worker address, empty part, and then the three parts of the client request.
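The first of those steps is worth seeing in code. A rough sketch (with {{frontend}}, {{backend}}, and an {{available_workers}} count assumed):

[[code]]
//  Sketch only: always poll the backend, poll the frontend only when
//  at least one worker is available to take a request
zmq_pollitem_t items [] = {
    { backend,  0, ZMQ_POLLIN, 0 },
    { frontend, 0, ZMQ_POLLIN, 0 }
};
int rc = zmq_poll (items, available_workers? 2: 1, -1);
[[/code]]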
-You should now see that you can reuse and extend the load-balancing algorithm with variations based on the information the worker provides in its initial "ready" message. For example, workers might start up and do a performance self-test, then tell the broker how fast they are. The broker can then choose the fastest available worker rather than the oldest.
+You should now see that you can reuse and extend the load balancing algorithm with variations based on the information the worker provides in its initial "ready" message. For example, workers might start up and do a performance self test, then tell the broker how fast they are. The broker can then choose the fastest available worker rather than the oldest.
++ A High-Level API for 0MQ
+++ Making a Detour
-We're going to push request-reply onto the stack and open a different area, which is the 0MQ API itself. There's a reason for this detour: as we write more complex examples, the low-level 0MQ API starts to look increasingly clumsy. Look at the core of the worker thread from our load-balancing broker:
+We're going to push request-reply onto the stack and open a different area, which is the 0MQ API itself. There's a reason for this detour: as we write more complex examples, the low-level 0MQ API starts to look increasingly clumsy. Look at the core of the worker thread from our load balancing broker:
[[code type="fragment" name="lbreader"]]
while (true) {
@@ -487,7 +487,7 @@ while (true) {
}
[[/code]]
-That code isn't even reusable, because it can only handle one reply address in the envelope. And it already does some wrapping around the 0MQ API. If we used the libzmq simple message API this is what we'd have to write:
+That code isn't even reusable because it can only handle one reply address in the envelope, and it already does some wrapping around the 0MQ API. If we used the libzmq simple message API this is what we'd have to write:
[[code type="fragment" name="lowreader"]]
while (true) {
@@ -518,19 +518,19 @@ while (true) {
}
[[/code]]
-And when code is too long to write quickly, it's also too long to understand. Up to now, I've stuck to the native API because as 0MQ users we need to know that intimately. But when it gets in our way, we have to treat it as a problem to solve.
+And when code is too long to write quickly, it's also too long to understand. Up until now, I've stuck to the native API because, as 0MQ users, we need to know that intimately. But when it gets in our way, we have to treat it as a problem to solve.
-We can't of course just change the 0MQ API, which is a documented public contract that thousands of people have agreed to and depend on. Instead, we construct a higher-level API on top, based on our experience so far, and most specifically, our experience from writing more complex request-reply patterns.
+We can't of course just change the 0MQ API, which is a documented public contract on which thousands of people agree and depend. Instead, we construct a higher-level API on top based on our experience so far, and most specifically, our experience from writing more complex request-reply patterns.
What we want is an API that lets us receive and send an entire message in one shot, including the reply envelope with any number of reply addresses. One that lets us do what we want with the absolute least lines of code.
-Making a good message API is fairly difficult. We have a problem of terminology: 0MQ uses "message" to describe both multi-part messages, and individual message frames. We have a problem of expectations: sometimes it's natural to see message content as printable string data, sometimes as binary blobs. And we have technical challenges, especially if we want to avoid copying data around too much.
+Making a good message API is fairly difficult. We have a problem of terminology: 0MQ uses "message" to describe both multipart messages, and individual message frames. We have a problem of expectations: sometimes it's natural to see message content as printable string data, sometimes as binary blobs. And we have technical challenges, especially if we want to avoid copying data around too much.
-The challenge of making a good API affects all languages, though my specific use-case is C. Whatever language you use, think about how you could contribute to your language binding to make it as good (or better) than the C binding I'm going to describe.
+The challenge of making a good API affects all languages, though my specific use case is C. Whatever language you use, think about how you could contribute to your language binding to make it as good (or better) than the C binding I'm going to describe.
+++ Features of a Higher-Level API
-My solution is to use three fairly natural and obvious concepts: //string// (already the basis for our {{s_send} and {{s_recv}}) helpers, //frame// (a message frame), and //message// (a list of one or more frames). Here is the worker code, rewritten onto an API using these concepts:
+My solution is to use three fairly natural and obvious concepts: //string// (already the basis for our {{s_send}} and {{s_recv}} helpers), //frame// (a message frame), and //message// (a list of one or more frames). Here is the worker code, rewritten onto an API using these concepts:
[[code type="fragment" name="highreader"]]
while (true) {
@@ -540,27 +540,27 @@ while (true) {
}
[[/code]]
-Cutting the amount of code we need to read and write complex messages is great: the results are easy to read and understand. Let's continue this process for other aspects of working with 0MQ. Here's a wishlist of things I'd like in a higher-level API, based on my experience with 0MQ so far:
+Cutting the amount of code we need to read and write complex messages is great: the results are easy to read and understand. Let's continue this process for other aspects of working with 0MQ. Here's a wish list of things I'd like in a higher-level API, based on my experience with 0MQ so far:
* //Automatic handling of sockets.// I find it cumbersome to have to close sockets manually, and to have to explicitly define the linger timeout in some (but not all) cases. It'd be great to have a way to close sockets automatically when I close the context.
-* //Portable thread management.// Every non-trivial 0MQ application uses threads, but POSIX threads aren't portable. So a decent high-level API should hide this under a portable layer.
+* //Portable thread management.// Every nontrivial 0MQ application uses threads, but POSIX threads aren't portable. So a decent high-level API should hide this under a portable layer.
* //Piping from parent to child threads.// It's a recurrent problem: how to signal between parent and child threads. Our API should provide a 0MQ message pipe (using PAIR sockets and {{inproc}}) automatically.
* //Portable clocks.// Even getting the time to a millisecond resolution, or sleeping for some milliseconds, is not portable. Realistic 0MQ applications need portable clocks, so our API should provide them.
-* //A reactor to replace {{zmq_poll[3]}}.// The poll loop is simple but clumsy. Writing a lot of these, we end up doing the same work over and over: calculating timers, and calling code when sockets are ready. A simple reactor with socket readers, and timers, would save a lot of repeated work.
+* //A reactor to replace {{zmq_poll[3]}}.// The poll loop is simple, but clumsy. Writing a lot of these, we end up doing the same work over and over: calculating timers, and calling code when sockets are ready. A simple reactor with socket readers and timers would save a lot of repeated work.
* //Proper handling of Ctrl-C.// We already saw how to catch an interrupt. It would be useful if this happened in all applications.
+++ The CZMQ High-Level API
-Turning this wishlist into reality for the C language gives us [http://zero.mq/c CZMQ], a 0MQ language binding for C. This high-level binding in fact developed out of earlier versions of the examples. It combines nicer semantics for working with 0MQ with some portability layers, and (importantly for C but less for other languages) containers like hashes and lists. CZMQ also uses an elegant object model that leads to frankly lovely code.
+Turning this wish list into reality for the C language gives us [http://zero.mq/c CZMQ], a 0MQ language binding for C. This high-level binding, in fact, developed out of earlier versions of the examples. It combines nicer semantics for working with 0MQ with some portability layers, and (importantly for C, but less for other languages) containers like hashes and lists. CZMQ also uses an elegant object model that leads to frankly lovely code.
-Here is the load-balancing broker rewritten to use a higher-level API (CZMQ for the C case):
+Here is the load balancing broker rewritten to use a higher-level API (CZMQ for the C case):
-[[code type="example" title="Load-balancing broker using high-level API" name="lbbroker2"]]
+[[code type="example" title="Load balancing broker using high-level API" name="lbbroker2"]]
[[/code]]
One thing CZMQ provides is clean interrupt handling. This means that Ctrl-C will cause any blocking 0MQ call to exit with a return code -1 and errno set to EINTR. The high-level recv methods will return NULL in such cases. So, you can cleanly exit a loop like this:
@@ -586,12 +586,12 @@ if (zmq_poll (items, 2, 1000 * 1000) == -1)
The previous example still uses {{zmq_poll[3]}}. So how about reactors? The CZMQ {{zloop}} reactor is simple but functional. It lets you:
-* Set a reader on any socket, i.e. code that is called whenever the socket has input.
+* Set a reader on any socket, i.e., code that is called whenever the socket has input.
* Cancel a reader on a socket.
* Set a timer that goes off once or multiple times at specific intervals.
* Cancel a timer.
-{{zloop}} of course uses {{zmq_poll[3]}} internally. It rebuilds its poll set each time you add or remove readers, and it calculates the poll timeout to match the next timer. Then, it calls the reader and timer handlers for each socket and timer that needs attention.
+{{zloop}} of course uses {{zmq_poll[3]}} internally. It rebuilds its poll set each time you add or remove readers, and it calculates the poll timeout to match the next timer. Then, it calls the reader and timer handlers for each socket and timer that need attention.
When we use a reactor pattern, our code turns inside out. The main logic looks like this:
@@ -602,18 +602,18 @@ zloop_start (reactor);
zloop_destroy (&reactor);
[[/code]]
-While the actual handling of messages sits inside dedicated functions or methods. You may not like the style, it's a matter of taste. What it does help with is mixing timers and socket activity. In the rest of this text we'll use {{zmq_poll[3]}} in simpler cases, and {{zloop}} in more complex examples.
+The actual handling of messages sits inside dedicated functions or methods. You may not like the style--it's a matter of taste. What it does help with is mixing timers and socket activity. In the rest of this text, we'll use {{zmq_poll[3]}} in simpler cases, and {{zloop}} in more complex examples.
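For example, a reader handler in the CZMQ v1 style might look like this (a sketch, not one of the book's examples; {{frontend}} and the actual message handling are assumed):

[[code]]
//  Sketch only: zloop calls this whenever frontend has input
static int
s_handle_frontend (zloop_t *loop, zmq_pollitem_t *poller, void *arg)
{
    zmsg_t *msg = zmsg_recv (poller->socket);
    if (!msg)
        return -1;              //  Interrupted; end the reactor
    //  ... do something useful with msg here ...
    zmsg_destroy (&msg);
    return 0;                   //  0 = keep the reactor running
}
//  Registered with:
//      zmq_pollitem_t poller = { frontend, 0, ZMQ_POLLIN };
//      zloop_poller (loop, &poller, s_handle_frontend, NULL);
[[/code]]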
-Here is the load-balancing broker rewritten once again, this time to use {{zloop}}:
+Here is the load balancing broker rewritten once again, this time to use {{zloop}}:
[[code type="example" title="Load balancing broker using zloop" name="lbbroker3"]]
[[/code]]
-Getting applications to properly shut-down when you send them Ctrl-C can be tricky. If you use the {{zctx}} class it'll automatically set-up signal handling, but your code still has to cooperate. You must break any loop if {{zmq_poll}} returns -1 or if any of the {{zstr_recv}}, {{zframe_recv}}, or {{zmsg_recv}} methods return NULL. If you have nested loops, it can be useful to make the outer ones conditional on {{!zctx_interrupted}}.
+Getting applications to properly shut down when you send them Ctrl-C can be tricky. If you use the {{zctx}} class it'll automatically set up signal handling, but your code still has to cooperate. You must break any loop if {{zmq_poll}} returns -1 or if any of the {{zstr_recv}}, {{zframe_recv}}, or {{zmsg_recv}} methods return NULL. If you have nested loops, it can be useful to make the outer ones conditional on {{!zctx_interrupted}}.
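Put together, a cooperating loop looks roughly like this (a sketch; {{worker}} is an assumed socket created from a {{zctx}} context):

[[code]]
//  Sketch only: outer condition on zctx_interrupted, break on NULL
while (!zctx_interrupted) {
    zmsg_t *request = zmsg_recv (worker);
    if (!request)
        break;                  //  Ctrl-C or context terminated
    //  ... handle the request ...
    zmsg_destroy (&request);
}
[[/code]]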
-++ The Asynchronous Client-Server Pattern
+++ The Asynchronous Client/Server Pattern
-[[code type="textdiagram" title="Asynchronous Client-Server"]]
+[[code type="textdiagram" title="Asynchronous Client/Server"]]
#----------# #----------#
| Client | | Client |
+----------+ +----------+
@@ -631,7 +631,7 @@ Getting applications to properly shut-down when you send them Ctrl-C can be tric
#-------------#
[[/code]]
-In the ROUTER to DEALER example we saw a 1-to-N use-case where one server talks asynchronously to multiple workers. We can turn this upside-down to get a very useful N-to-1 architecture where various clients talk to a single server, and do this asynchronously[figure].
+In the ROUTER to DEALER example, we saw a 1-to-N use case where one server talks asynchronously to multiple workers. We can turn this upside down to get a very useful N-to-1 architecture where various clients talk to a single server, and do this asynchronously[figure].
Here's how it works:
@@ -642,10 +642,10 @@ Here's how it works:
Here's code that shows how this works:
-[[code type="example" title="Asynchronous client-server" name="asyncsrv"]]
+[[code type="example" title="Asynchronous client/server" name="asyncsrv"]]
[[/code]]
-The example runs in one process, with multiple threads simulating a real multi-process architecture. When you run the example, you'll see three clients (each with a random ID), printing out the replies they get from the server. Look carefully and you'll see each client task gets 0 or more replies per request.
+The example runs in one process, with multiple threads simulating a real multiprocess architecture. When you run the example, you'll see three clients (each with a random ID), printing out the replies they get from the server. Look carefully and you'll see each client task gets 0 or more replies per request.
[[code type="textdiagram" title="Detail of Asynchronous Server"]]
#---------# #---------# #---------#
@@ -687,7 +687,7 @@ Some comments on this code:
* The server uses a pool of worker threads, each processing one request synchronously. It connects these to its frontend socket using an internal queue[figure]. It connects the frontend and backend sockets using a {{zmq_proxy[3]}} call.
-Note that we're doing DEALER to ROUTER dialog between client and server, but internally between the server main thread and workers we're doing DEALER to DEALER. If the workers were strictly synchronous, we'd use REP. But since we want to send multiple replies we need an async socket. We do //not// want to route replies, they always go to the single server thread that sent us the request.
+Note that we're doing DEALER to ROUTER dialog between client and server, but internally between the server main thread and workers, we're doing DEALER to DEALER. If the workers were strictly synchronous, we'd use REP. However, because we want to send multiple replies, we need an async socket. We do //not// want to route replies; they always go to the single server thread that sent us the request.
Let's think about the routing envelope. The client sends a message consisting of a single frame. The server thread receives a two-frame message (original message prefixed by client identity). We send these two frames on to the worker, which treats them as a normal reply envelope and returns them to us as a two-frame message. We then use the first frame as an identity to route the second frame back to the client as a reply.
@@ -699,13 +699,13 @@ It looks something like this:
1 part 2 parts 2 parts
[[/code]]
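Here's a hedged sketch of the worker's side of that envelope handling, assuming CZMQ's zmsg and zframe classes; the socket name and the reply content are hypothetical:

[[code]]
//  Sketch: a worker echoing the two-frame envelope back
zmsg_t *msg = zmsg_recv (worker);           //  Frame 1: client identity, frame 2: request
zframe_t *identity = zmsg_pop (msg);
zframe_t *request  = zmsg_pop (msg);
zmsg_destroy (&msg);
//  ... compute a reply from the request ...
zframe_send (&identity, worker, ZFRAME_MORE);   //  Identity frame routes the reply
zframe_t *reply = zframe_new ("OK", 2);         //  Hypothetical reply content
zframe_send (&reply, worker, 0);
zframe_destroy (&request);
[[/code]]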
-Now for the sockets: we could use the load-balancing ROUTER to DEALER pattern to talk to workers, but it's extra work. In this case a DEALER to DEALER pattern is probably fine: the trade-off is lower-latency for each request but higher risk of unbalanced work distribution. Simplicity wins in this case.
+Now for the sockets: we could use the load balancing ROUTER to DEALER pattern to talk to workers, but it's extra work. In this case, a DEALER to DEALER pattern is probably fine: the trade-off is lower latency for each request, but higher risk of unbalanced work distribution. Simplicity wins in this case.
When you build servers that maintain stateful conversations with clients, you will run into a classic problem. If the server keeps some state per client, and clients keep coming and going, eventually it will run out of resources. Even if the same clients keep connecting, if you're using default identities, each connection will look like a new one.
-We cheat in the above example by keeping state only for a very short time (the time it takes a worker to process a request) and then throwing away the state. But that's not practical for many cases. To properly manage client state in a stateful asynchronous server you have to:
+We cheat in the above example by keeping state only for a very short time (the time it takes a worker to process a request) and then throwing away the state. But that's not practical for many cases. To properly manage client state in a stateful asynchronous server, you have to:
-* Do heartbeating from client to server. In our example we send a request once per second, which can reliably be used as a heartbeat.
+* Do heartbeating from client to server. In our example, we send a request once per second, which can reliably be used as a heartbeat.
* Store state using the client identity (whether generated or explicit) as key.
@@ -713,11 +713,11 @@ We cheat in the above example by keeping state only for a very short time (the t
++ Worked Example: Inter-Broker Routing
-Let's take everything we've seen so far, and scale things up to a real application. We'll build this step by step over several iterations. Our best client calls us urgently and asks for a design of a large cloud computing facility. He has this vision of a cloud that spans many data centers, each a cluster of clients and workers, and that works together as a whole. Because we're smart enough to know that practice always beats theory, we propose to make a working simulation using 0MQ. Our client, eager to lock down the budget before his own boss changes his mind, and having read great things about 0MQ on Twitter, agrees.
+Let's take everything we've seen so far, and scale things up to a real application. We'll build this step-by-step over several iterations. Our best client calls us urgently and asks for a design of a large cloud computing facility. He has this vision of a cloud that spans many data centers, each a cluster of clients and workers, and that works together as a whole. Because we're smart enough to know that practice always beats theory, we propose to make a working simulation using 0MQ. Our client, eager to lock down the budget before his own boss changes his mind, and having read great things about 0MQ on Twitter, agrees.
+++ Establishing the Details
-Several espressos later, we want to jump into writing code but a little voice tells us to get more details before making a sensational solution to entirely the wrong problem. "What kind of work is the cloud doing?", we ask.
+Several espressos later, we want to jump into writing code, but a little voice tells us to get more details before making a sensational solution to entirely the wrong problem. "What kind of work is the cloud doing?", we ask.
The client explains:
@@ -729,19 +729,19 @@ The client explains:
* If there are no workers in their own cluster, clients' tasks will go off to other available workers in the cloud.
-* Clients send out one task at a time, waiting for a reply. If they don't get an answer within X seconds they'll just send out the task again. This ain't our concern, the client API does it already.
+* Clients send out one task at a time, waiting for a reply. If they don't get an answer within X seconds, they'll just send out the task again. This isn't our concern; the client API does it already.
-* Workers process one task at a time, they are very simple beasts. If they crash, they get restarted by whatever script started them.
+* Workers process one task at a time; they are very simple beasts. If they crash, they get restarted by whatever script started them.
-So we double check to make sure that we understood this correctly:
+So we double-check to make sure that we understood this correctly:
* "There will be some kind of super-duper network interconnect between clusters, right?", we ask. The client says, "Yes, of course, we're not idiots."
-* "What kind of volumes are we talking about?", we ask. The client replies, "Up to a thousand clients per cluster, each doing max. ten requests per second. Requests are small, and replies are also small, no more than 1K bytes each."
+* "What kind of volumes are we talking about?", we ask. The client replies, "Up to a thousand clients per cluster, each doing at most ten requests per second. Requests are small, and replies are also small, no more than 1K bytes each."
So we do a little calculation and see that this will work nicely over plain TCP. 2,500 clients x 10/second x 1,000 bytes x 2 directions = 50MB/sec or 400Mb/sec, not a problem for a 1Gb network.
-It's a straight-forward problem that requires no exotic hardware or protocols, just some clever routing algorithms and careful design. We start by designing one cluster (one data center) and then we figure out how to connect clusters together.
+It's a straightforward problem that requires no exotic hardware or protocols, just some clever routing algorithms and careful design. We start by designing one cluster (one data center) and then we figure out how to connect clusters together.
[[code type="textdiagram" title="Cluster Architecture"]]
#--------# #--------# #--------#
@@ -772,9 +772,9 @@ It's a straight-forward problem that requires no exotic hardware or protocols, j
+++ Architecture of a Single Cluster
-Workers and clients are synchronous. We want to use the load-balancing pattern to route tasks to workers. Workers are all identical, our facility has no notion of different services. Workers are anonymous, clients never address them directly. We make no attempt here to provide guaranteed delivery, retry, etc.
+Workers and clients are synchronous. We want to use the load balancing pattern to route tasks to workers. Workers are all identical; our facility has no notion of different services. Workers are anonymous; clients never address them directly. We make no attempt here to provide guaranteed delivery, retry, and so on.
-For reasons we already looked at, clients and workers won't speak to each other directly. It makes it impossible to add or remove nodes dynamically. So our basic model consists of the request-reply message broker we saw earlier[figure].
+For reasons we already examined, clients and workers won't speak to each other directly. It makes it impossible to add or remove nodes dynamically. So our basic model consists of the request-reply message broker we saw earlier[figure].
+++ Scaling to Multiple Clusters
@@ -800,19 +800,19 @@ Now we scale this out to more than one cluster. Each cluster has a set of client
The question is: how do we get the clients of each cluster talking to the workers of the other cluster? There are a few possibilities, each with pros and cons:
-* Clients could connect directly to both brokers. The advantage is that we don't need to modify brokers or workers. But clients get more complex, and become aware of the overall topology. If we want to add, e.g. a third or forth cluster, all the clients are affected. In effect we have to move routing and fail-over logic into the clients and that's not nice.
+* Clients could connect directly to both brokers. The advantage is that we don't need to modify brokers or workers. But clients get more complex and become aware of the overall topology. If we want to add a third or fourth cluster, for example, all the clients are affected. In effect we have to move routing and failover logic into the clients and that's not nice.
-* Workers might connect directly to both brokers. But REQ workers can't do that, they can only reply to one broker. We might use REPs but REPs don't give us customizable broker-to-worker routing like load-balancing, only the built-in load balancing. That's a fail, if we want to distribute work to idle workers: we precisely need load-balancing. One solution would be to use ROUTER sockets for the worker nodes. Let's label this "Idea #1".
+* Workers might connect directly to both brokers. But REQ workers can't do that, they can only reply to one broker. We might use REPs but REPs don't give us customizable broker-to-worker routing like load balancing does, only the built-in load balancing. That's a fail; if we want to distribute work to idle workers, we precisely need load balancing. One solution would be to use ROUTER sockets for the worker nodes. Let's label this "Idea #1".
-* Brokers could connect to each other. This looks neatest because it creates the fewest additional connections. We can't add clusters on the fly but that is probably out of scope. Now clients and workers remain ignorant of the real network topology, and brokers tell each other when they have spare capacity. Let's label this "Idea #2".
+* Brokers could connect to each other. This looks neatest because it creates the fewest additional connections. We can't add clusters on the fly, but that is probably out of scope. Now clients and workers remain ignorant of the real network topology, and brokers tell each other when they have spare capacity. Let's label this "Idea #2".
-Let's explore Idea #1. In this model we have workers connecting to both brokers and accepting jobs from either[figure].
+Let's explore Idea #1. In this model, we have workers connecting to both brokers and accepting jobs from either one[figure].
-It looks feasible. However it doesn't provide what we wanted, which was that clients get local workers if possible and remote workers only if it's better than waiting. Also workers will signal "ready" to both brokers and can get two jobs at once, while other workers remain idle. It seems this design fails because again we're putting routing logic at the edges.
+It looks feasible. However, it doesn't provide what we wanted, which was that clients get local workers if possible and remote workers only if it's better than waiting. Also workers will signal "ready" to both brokers and can get two jobs at once, while other workers remain idle. It seems this design fails because again we're putting routing logic at the edges.
-So idea #2 then. We interconnect the brokers and don't touch the clients or workers, which are REQs like we're used to[figure].
+So, idea #2 then. We interconnect the brokers and don't touch the clients or workers, which are REQs like we're used to[figure].
-[[code type="textdiagram" title="Idea 1 - Cross-connected Workers"]]
+[[code type="textdiagram" title="Idea 1: Cross-connected Workers"]]
Cluster 1 : Cluster 2
:
#--------# : #--------#
@@ -833,17 +833,17 @@ So idea #2 then. We interconnect the brokers and don't touch the clients or work
#--------# #--------# #--------# :
[[/code]]
-This design is appealing because the problem is solved in one place, invisible to the rest of the world. Basically, brokers open secret channels to each other and whisper, like camel traders, "Hey, I've got some spare capacity, if you have too many clients give me a shout and we'll deal".
+This design is appealing because the problem is solved in one place, invisible to the rest of the world. Basically, brokers open secret channels to each other and whisper, like camel traders, "Hey, I've got some spare capacity. If you have too many clients, give me a shout and we'll deal".
-It is in effect just a more sophisticated routing algorithm: brokers become subcontractors for each other. Other things to like about this design, even before we play with real code:
+In effect it is just a more sophisticated routing algorithm: brokers become subcontractors for each other. There are other things to like about this design, even before we play with real code:
* It treats the common case (clients and workers on the same cluster) as default and does extra work for the exceptional case (shuffling jobs between clusters).
-* It lets us use different message flows for the different types of work. That means we can handle them differently, e.g. using different types of network connection.
+* It lets us use different message flows for the different types of work. That means we can handle them differently, e.g., using different types of network connection.
-* It feels like it would scale smoothly. Interconnecting three, or more brokers doesn't get over-complex. If we find this to be a problem, it's easy to solve by adding a super-broker.
+* It feels like it would scale smoothly. Interconnecting three or more brokers doesn't get overly complex. If we find this to be a problem, it's easy to solve by adding a super-broker.
-[[code type="textdiagram" title="Idea 2 - Brokers Talking to Each Other"]]
+[[code type="textdiagram" title="Idea 2: Brokers Talking to Each Other"]]
Cluster 1 : Cluster 2
:
.---. .---. .---. : .---. .---. .---.
@@ -861,17 +861,17 @@ It is in effect just a more sophisticated routing algorithm: brokers become subc
'---' '---' '---' : '---' '---' '---'
[[/code]]
-We'll now make a worked example. We'll pack an entire cluster into one process. That is obviously not realistic but it makes it simple to simulate, and the simulation can accurately scale to real processes. This is the beauty of 0MQ, you can design at the microlevel and scale that up to the macro level. Threads become processes, become boxes and the patterns and logic remain the same. Each of our 'cluster' processes contains client threads, worker threads, and a broker thread.
+We'll now make a worked example. We'll pack an entire cluster into one process. That is obviously not realistic, but it makes it simple to simulate, and the simulation can accurately scale to real processes. This is the beauty of 0MQ--you can design at the micro-level and scale that up to the macro-level. Threads become processes, and then become boxes and the patterns and logic remain the same. Each of our "cluster" processes contains client threads, worker threads, and a broker thread.
We know the basic model well by now:
* The REQ client (REQ) threads create workloads and pass them to the broker (ROUTER).
* The REQ worker (REQ) threads process workloads and return the results to the broker (ROUTER).
-* The broker queues and distributes workloads using the load-balancing pattern.
+* The broker queues and distributes workloads using the load balancing pattern.
-+++ Federation vs. Peering
++++ Federation Versus Peering
-There are several possible ways to interconnect brokers. What we want is to be able to tell other brokers, "we have capacity", and then receive multiple tasks. We also need to be able to tell other brokers "stop, we're full". It doesn't need to be perfect: sometimes we may accept jobs we can't process immediately, then we'll do them as soon as possible.
+There are several possible ways to interconnect brokers. What we want is to be able to tell other brokers, "we have capacity", and then receive multiple tasks. We also need to be able to tell other brokers, "stop, we're full". It doesn't need to be perfect; sometimes we may accept jobs we can't process immediately, then we'll do them as soon as possible.
[[code type="textdiagram" title="Cross-connected Brokers in Federation Model"]]
Cluster 1 : Cluster 2
@@ -893,17 +893,17 @@ There are several possible ways to interconnect brokers. What we want is to be a
:
[[/code]]
-The simplest interconnect is //federation// in which brokers simulate clients and workers for each other. We would do this by connecting our frontend to the other broker's backend socket[figure]. Note that it is legal to both bind a socket to an endpoint and connect it to other endpoints.
+The simplest interconnect is //federation//, in which brokers simulate clients and workers for each other. We would do this by connecting our frontend to the other broker's backend socket[figure]. Note that it is legal to both bind a socket to an endpoint and connect it to other endpoints.
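As a tiny hedged sketch of that (the endpoint names are invented for illustration):

[[code]]
//  One socket can bind its own endpoint and also connect out to a peer
void *frontend = zmq_socket (ctx, ZMQ_ROUTER);
zmq_bind    (frontend, "ipc://cluster1-localfe.ipc");   //  Serve our own clients
zmq_connect (frontend, "ipc://cluster2-localbe.ipc");   //  Also connect to a peer broker's backend
[[/code]]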
-This would give us simple logic in both brokers and a reasonably good mechanism: when there are no clients, tell the other broker 'ready', and accept one job from it. The problem is also that it is too simple for this problem. A federated broker would be able to handle only one task at once. If the broker emulates a lock-step client and worker, it is by definition also going to be lock-step and if it has lots of available workers they won't be used. Our brokers need to be connected in a fully asynchronous fashion.
+This would give us simple logic in both brokers and a reasonably good mechanism: when there are no clients, tell the other broker "ready", and accept one job from it. The problem is also that it is too simple for this problem. A federated broker would be able to handle only one task at a time. If the broker emulates a lock-step client and worker, it is by definition also going to be lock-step, and if it has lots of available workers they won't be used. Our brokers need to be connected in a fully asynchronous fashion.
-The federation model is perfect for other kinds of routing, especially service-oriented architectures (SOAs) which route by service name and proximity rather than load-balancing or round-robin. So don't dismiss it as useless, it's just not right for all use-cases.
+The federation model is perfect for other kinds of routing, especially service-oriented architectures (SOAs), which route by service name and proximity rather than load balancing or round robin. So don't dismiss it as useless, it's just not right for all use cases.
Instead of federation, let's look at a //peering// approach in which brokers are explicitly aware of each other and talk over privileged channels. Let's break this down, assuming we want to interconnect N brokers. Each broker has (N - 1) peers, and all brokers are using exactly the same code and logic. There are two distinct flows of information between brokers:
-* Each broker needs to tell its peers how many workers it has available at any time. This can be fairly simple information, just a quantity that is updated regularly. The obvious (and correct) socket pattern for this is pub-sub. So every broker opens a PUB socket and publishes state information on that, and every broker also opens a SUB socket and connects that to the PUB socket of every other broker, to get state information from its peers.
+* Each broker needs to tell its peers how many workers it has available at any time. This can be fairly simple information--just a quantity that is updated regularly. The obvious (and correct) socket pattern for this is pub-sub. So every broker opens a PUB socket and publishes state information on that, and every broker also opens a SUB socket and connects that to the PUB socket of every other broker to get state information from its peers.
-* Each broker needs a way to delegate tasks to a peer and get replies back, asynchronously. We'll do this using ROUTER/ROUTER (ROUTER/ROUTER) sockets, no other combination works. Each broker has two such sockets: one for tasks it receives, one for tasks it delegates. If we didn't use two sockets it would be more work to know whether we were reading a request or a reply each time. That would mean adding more information to the message envelope.
+* Each broker needs a way to delegate tasks to a peer and get replies back, asynchronously. We'll do this using ROUTER sockets; no other combination works. Each broker has two such sockets: one for tasks it receives and one for tasks it delegates. If we didn't use two sockets, it would be more work to know whether we were reading a request or a reply each time. That would mean adding more information to the message envelope.
And there is also the flow of information between a broker and its local clients and workers.
@@ -945,7 +945,7 @@ And there is also the flow of information between a broker and its local clients
#---------# #---------# #---------#
[[/code]]
-Three flows x two sockets for each flow = six sockets that we have to manage in the broker. Choosing good names is vital to keeping a multi-socket juggling act reasonably coherent in our minds. Sockets //do// something and what they do should form the basis for their names. It's about being able to read the code several weeks later on a cold Monday morning before coffee, and not feeling pain.
+Three flows x two sockets for each flow = six sockets that we have to manage in the broker. Choosing good names is vital to keeping a multisocket juggling act reasonably coherent in our minds. Sockets //do// something and what they do should form the basis for their names. It's about being able to read the code several weeks later on a cold Monday morning before coffee, and not feel any pain.
Let's do a shamanistic naming ceremony for the sockets. The three flows are:
@@ -953,23 +953,23 @@ Let's do a shamanistic naming ceremony for the sockets. The three flows are:
* A //cloud// request-reply flow between the broker and its peer brokers.
* A //state// flow between the broker and its peer brokers.
-Finding meaningful names that are all the same length means our code will align nicely. It's not a big thing, but attention to details helps. For each flow the broker has two sockets that we can orthogonally call the "frontend" and "backend". We've used these names quite often. A frontend receives information or tasks. A backend sends those out to other peers. The conceptual flow is from front to back (with replies going in the opposite direction from back to front).
+Finding meaningful names that are all the same length means our code will align nicely. It's not a big thing, but attention to detail helps. For each flow, the broker has two sockets that we can orthogonally call the //frontend// and //backend//. We've used these names quite often. A frontend receives information or tasks. A backend sends those out to other peers. The conceptual flow is from front to back (with replies going in the opposite direction from back to front).
-So in all the code we write for this tutorial will use these socket names:
+So in all the code we write for this tutorial, we will use these socket names:
* //localfe// and //localbe// for the local flow.
* //cloudfe// and //cloudbe// for the cloud flow.
* //statefe// and //statebe// for the state flow.
-For our transport and because we're simulating the whole thing on one box, we'll use {{ipc}} for everything. This has the advantage of working like {{tcp}} in terms of connectivity (i.e. it's a disconnected transport, unlike {{inproc}}), yet we don't need IP addresses or DNS names, which would be a pain here. Instead, we will use {{ipc}} endpoints called //something//-{{local}}, //something//-{{cloud}}, and //something//-{{state}}, where //something// is the name of our simulated cluster.
+For our transport and because we're simulating the whole thing on one box, we'll use {{ipc}} for everything. This has the advantage of working like {{tcp}} in terms of connectivity (i.e., it's a disconnected transport, unlike {{inproc}}), yet we don't need IP addresses or DNS names, which would be a pain here. Instead, we will use {{ipc}} endpoints called //something//-{{local}}, //something//-{{cloud}}, and //something//-{{state}}, where //something// is the name of our simulated cluster.
-You may be thinking that this is a lot of work for some names. Why not call them s1, s2, s3, s4, etc.? The answer is that if your brain is not a perfect machine, you need a lot of help when reading code, and we'll see that these names do help. It's easier to remember "three flows, two directions" than "six different sockets"[figure].
+You might be thinking that this is a lot of work for some names. Why not call them s1, s2, s3, s4, etc.? The answer is that if your brain is not a perfect machine, you need a lot of help when reading code, and we'll see that these names do help. It's easier to remember "three flows, two directions" than "six different sockets"[figure].
Note that we connect the cloudbe in each broker to the cloudfe in every other broker, and likewise we connect the statebe in each broker to the statefe in every other broker.
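For the state flow, that wiring might look like the following hedged sketch (using CZMQ sockets and the endpoint naming above; this is not the peering1 example itself, and the variable names are hypothetical):

[[code]]
//  Sketch: wiring the state flow for one broker
void *statebe = zsocket_new (ctx, ZMQ_PUB);
zsocket_bind (statebe, "ipc://%s-state.ipc", self);     //  Publish our own state

void *statefe = zsocket_new (ctx, ZMQ_SUB);
zmq_setsockopt (statefe, ZMQ_SUBSCRIBE, "", 0);          //  Subscribe to everything
int argn;
for (argn = 2; argn < argc; argn++)
    zsocket_connect (statefe, "ipc://%s-state.ipc", argv [argn]);
[[/code]]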
+++ Prototyping the State Flow
-Since each socket flow has its own little traps for the unwary, we will test them in real code one by one, rather than try to throw the whole lot into code in one go. When we're happy with each flow, we can put them together into a full program. We'll start with the state flow[figure].
+Because each socket flow has its own little traps for the unwary, we will test them in real code one by one, rather than try to throw the whole lot into code in one go. When we're happy with each flow, we can put them together into a full program. We'll start with the state flow[figure].
Here is how this works in code:
@@ -1014,15 +1014,15 @@ Here is how this works in code:
Notes about this code:
-* Each broker has an identity that we use to construct {{ipc}} endpoint names. A real broker would need to work with TCP and a more sophisticated configuration scheme. We'll look at such schemes later in this book but for now, using generated {{ipc}} names lets us ignore the problem of where to get TCP/IP addresses or names from.
+* Each broker has an identity that we use to construct {{ipc}} endpoint names. A real broker would need to work with TCP and a more sophisticated configuration scheme. We'll look at such schemes later in this book, but for now, using generated {{ipc}} names lets us ignore the problem of where to get TCP/IP addresses or names.
* We use a {{zmq_poll[3]}} loop as the core of the program. This processes incoming messages and sends out state messages. We send a state message //only// if we did not get any incoming messages //and// we waited for a second. If we send out a state message each time we get one in, we'll get message storms. (There's a small sketch of this loop just after these notes.)
-* We use a two-part pubsub message consisting of sender address and data. Note that we will need to know the address of the publisher in order to send it tasks, and the only way is to send this explicitly as a part of the message.
+* We use a two-part pub-sub message consisting of sender address and data. Note that we will need to know the address of the publisher in order to send it tasks, and the only way is to send this explicitly as a part of the message.
-* We don't set identities on subscribers, because if we did then we'd get out of date state information when connecting to running brokers.
+* We don't set identities on subscribers because if we did then we'd get outdated state information when connecting to running brokers.
-* We don't set a HWM on the publisher, but if we were using 0MQ/2.x that would be a wise idea.
+* We don't set a HWM on the publisher, but if we were using 0MQ v2.x that would be a wise idea.
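Here's a rough sketch of that "send state only after a second of silence" loop. It is not the actual peering1 code; the socket and variable names are hypothetical, and it assumes CZMQ plus a 0MQ v3.x-style millisecond poll timeout via {{ZMQ_POLL_MSEC}}:

[[code]]
//  Sketch: exchange state, but publish only when we've been idle
while (true) {
    zmq_pollitem_t items [] = { { statefe, 0, ZMQ_POLLIN, 0 } };
    if (zmq_poll (items, 1, 1000 * ZMQ_POLL_MSEC) == -1)
        break;              //  Interrupted
    if (items [0].revents & ZMQ_POLLIN) {
        //  A peer sent us its state: print it
        char *peer_name  = zstr_recv (statefe);
        char *peer_state = zstr_recv (statefe);
        printf ("%s - %s workers free\n", peer_name, peer_state);
        free (peer_name);
        free (peer_state);
    }
    else {
        //  We waited a full second with no input: send our own state
        zstr_sendm (statebe, self);     //  Frame 1: our address
        zstr_send  (statebe, "10");     //  Frame 2: hypothetical capacity
    }
}
[[/code]]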
We can build this little program and run it three times to simulate three clusters. Let's call them DC1, DC2, and DC3 (the names are arbitrary). We run these three commands, each in a separate window:
@@ -1034,9 +1034,9 @@ peering1 DC3 DC1 DC2 # Start DC3 and connect to DC1 and DC2
You'll see each cluster report the state of its peers, and after a few seconds they will all happily be printing random numbers once per second. Try this and satisfy yourself that the three brokers all match up and synchronize to per-second state updates.
-In real life we'd not send out state messages at regular intervals but rather whenever we had a state change, i.e. whenever a worker becomes available or unavailable. That may seem like a lot of traffic but state messages are small and we've established that the inter-cluster connections are super-fast.
+In real life, we'd not send out state messages at regular intervals, but rather whenever we had a state change, i.e., whenever a worker becomes available or unavailable. That may seem like a lot of traffic, but state messages are small and we've established that the inter-cluster connections are super fast.
-If we wanted to send state messages at precise intervals we'd create a child thread and open the statebe socket in that thread. We'd then send irregular state updates to that child thread from our main thread, and allow the child thread to conflate them into regular outgoing messages. This is more work than we need here.
+If we wanted to send state messages at precise intervals, we'd create a child thread and open the {{statebe}} socket in that thread. We'd then send irregular state updates to that child thread from our main thread and allow the child thread to conflate them into regular outgoing messages. This is more work than we need here.
+++ Prototyping the Local and Cloud Flows
@@ -1078,30 +1078,30 @@ If we wanted to send state messages at precise intervals we'd create a child thr
Let's now prototype the flow of tasks via the local and cloud sockets[figure]. This code pulls requests from clients and then distributes them to local workers and cloud peers on a random basis.
-Before we jump into the code, which is getting a little complex, let's sketch the core routing logic and break it down into a simple but robust design.
+Before we jump into the code, which is getting a little complex, let's sketch the core routing logic and break it down into a simple yet robust design.
We need two queues, one for requests from local clients and one for requests from cloud clients. One option would be to pull messages off the local and cloud frontends, and pump these onto their respective queues. But this is kind of pointless because 0MQ sockets //are// queues already. So let's use the 0MQ socket buffers as queues.
-This was the technique we used in the load-balancing broker, and it worked nicely. We only read from the two frontends when there is somewhere to send the requests. We can always read from the backends, since they give us replies to route back. As long as the backends aren't talking to us, there's no point in even looking at the frontends.
+This was the technique we used in the load balancing broker, and it worked nicely. We only read from the two frontends when there is somewhere to send the requests. We can always read from the backends, as they give us replies to route back. As long as the backends aren't talking to us, there's no point in even looking at the frontends.
So our main loop becomes:
-* Poll the backends for activity. When we get a message, it may be "READY" from a worker or it may be a reply. If it's a reply, route back via the local or cloud frontend.
+* Poll the backends for activity. When we get a message, it may be "ready" from a worker or it may be a reply. If it's a reply, route back via the local or cloud frontend.
* If a worker replied, it became available, so we queue it and count it.
-* While there are workers available, take a request, if any, from either frontend and route to a local worker, or randomly, a cloud peer.
+* While there are workers available, take a request, if any, from either frontend and route to a local worker, or randomly, to a cloud peer.
-Randomly sending tasks to a peer broker rather than a worker simulates work distribution across the cluster. It's dumb but that is fine for this stage.
+Randomly sending tasks to a peer broker rather than a worker simulates work distribution across the cluster. It's dumb, but that is fine for this stage.
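As a rough sketch of that loop shape (not the actual peering2 code; the socket and counter names are hypothetical):

[[code]]
//  Sketch: read frontends only when there is capacity to route requests
while (true) {
    zmq_pollitem_t backends [] = {
        { localbe, 0, ZMQ_POLLIN, 0 },
        { cloudbe, 0, ZMQ_POLLIN, 0 }
    };
    //  No capacity: block until a worker or peer talks to us
    if (zmq_poll (backends, 2, capacity? 1000 * ZMQ_POLL_MSEC: -1) == -1)
        break;              //  Interrupted
    //  ... read replies or "ready" signals here, incrementing capacity ...

    //  Route requests only while we have somewhere to send them
    while (capacity) {
        zmq_pollitem_t frontends [] = {
            { localfe, 0, ZMQ_POLLIN, 0 },
            { cloudfe, 0, ZMQ_POLLIN, 0 }
        };
        if (zmq_poll (frontends, 2, 0) < 1)
            break;          //  No requests waiting right now
        //  ... take one request, send to a local worker or a random cloud peer ...
        capacity--;
    }
}
[[/code]]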
-We use broker identities to route messages between brokers. Each broker has a name, which we provide on the command line in this simple prototype. As long as these names don't overlap with the 0MQ-generated UUIDs used for client nodes, we can figure out whether to route a reply back to a client or to a broker.
+We use broker identities to route messages between brokers. Each broker has a name that we provide on the command line in this simple prototype. As long as these names don't overlap with the 0MQ-generated UUIDs used for client nodes, we can figure out whether to route a reply back to a client or to a broker.
Here is how this works in code. The interesting part starts around the comment "Interesting part".
[[code type="example" title="Prototype local and cloud flow" name="peering2"]]
[[/code]]
-Run this by, for instance, starting two instance of the broker in two windows:
+Run this by, for instance, starting two instances of the broker in two windows:
[[code]]
peering2 me you
@@ -1112,9 +1112,9 @@ Some comments on this code:
* In the C code at least, using the zmsg class makes life much easier, and our code much shorter. It's obviously an abstraction that works. If you build 0MQ applications in C, you should use CZMQ.
-* Since we're not getting any state information from peers, we naively assume they are running. The code prompts you to confirm when you've started all the brokers. In the real case we'd not send anything to brokers who had not told us they exist.
+* Because we're not getting any state information from peers, we naively assume they are running. The code prompts you to confirm when you've started all the brokers. In the real case, we'd not send anything to brokers who had not told us they exist.
-You can satisfy yourself that the code works by watching it run forever. If there were any misrouted messages, clients would end up blocking, and the brokers would stop printing trace information. You can prove that by killing either of the brokers. The other broker tries to send requests to the cloud, and one by one its clients block, waiting for an answer.
+You can satisfy yourself that the code works by watching it run forever. If there were any misrouted messages, clients would end up blocking, and the brokers would stop printing trace information. You can prove that by killing either of the brokers. The other broker tries to send requests to the cloud, and one by one its clients block, waiting for an answer.
+++ Putting it All Together
@@ -1125,22 +1125,22 @@ This code is the size of both previous prototypes together, at 270 LoC. That's p
[[code type="example" title="Full cluster simulation" name="peering3"]]
[[/code]]
-It's a non-trivial program and took about a day to get working. These are the highlights:
+It's a nontrivial program and took about a day to get working. These are the highlights:
* The client threads detect and report a failed request. They do this by polling for a response and, if none arrives within a reasonable time (10 seconds), printing an error message.
-* Client threads don't print directly, but instead send a message to a 'monitor' socket (PUSH) that the main loop collects (PULL) and prints off. This is the first case we've seen of using 0MQ sockets for monitoring and logging; this is a big use-case we'll come back to later.
+* Client threads don't print directly, but instead send a message to a monitor socket (PUSH) that the main loop collects (PULL) and prints off. This is the first case we've seen of using 0MQ sockets for monitoring and logging; this is a big use case that we'll come back to later.
* Clients simulate varying loads so that the cluster runs at 100% capacity at random moments and tasks get shifted over to the cloud. The number of clients and workers, and the delays in the client and worker threads, control this. Feel free to play with them to see if you can make a more realistic simulation.
* The main loop uses two pollsets. It could in fact use three: information, backends, and frontends. As in the earlier prototype, there is no point in taking a frontend message if there is no backend capacity.
-These are some of the problems that hit during development of this program:
+These are some of the problems that arose during development of this program:
-* Clients would freeze, due to requests or replies getting lost somewhere. Recall that the 0MQ ROUTER/ROUTER socket drops messages it can't route. The first tactic here was to modify the client thread to detect and report such problems. Secondly, I put zmsg_dump() calls after every recv() and before every send() in the main loop, until it was clear what the problems were.
+* Clients would freeze, due to requests or replies getting lost somewhere. Recall that the ROUTER socket drops messages it can't route. The first tactic here was to modify the client thread to detect and report such problems. Secondly, I put {{zmsg_dump()}} calls after every receive and before every send in the main loop, until the origin of the problems was clear.
-* The main loop was mistakenly reading from more than one ready socket. This caused the first message to be lost. Fixed that by reading only from the first ready socket.
+* The main loop was mistakenly reading from more than one ready socket. This caused the first message to be lost. I fixed that by reading only from the first ready socket.
-* The zmsg class was not properly encoding UUIDs as C strings. This caused UUIDs that contain 0 bytes to be corrupted. Fixed by modifying zmsg to encode UUIDs as printable hex strings.
+* The {{zmsg}} class was not properly encoding UUIDs as C strings. This caused UUIDs that contain 0 bytes to be corrupted. I fixed that by modifying {{zmsg}} to encode UUIDs as printable hex strings.
-This simulation does not detect disappearance of a cloud peer. If you start several peers and stop one, and it was broadcasting capacity to the others, they will continue to send it work even if it's gone. You can try this, and you will get clients that complain of lost requests. The solution is twofold: first, only keep the capacity information for a short time so that if a peer does disappear, its capacity is quickly set to 'zero'. Second, add reliability to the request-reply chain. We'll look at reliability in the next chapter.
+This simulation does not detect disappearance of a cloud peer. If you start several peers and stop one, and it was broadcasting capacity to the others, they will continue to send it work even if it's gone. You can try this, and you will get clients that complain of lost requests. The solution is twofold: first, only keep the capacity information for a short time so that if a peer does disappear, its capacity is quickly set to zero. Second, add reliability to the request-reply chain. We'll look at reliability in the next chapter.
420 chapter4.txt
214 additions, 206 deletions not shown
312 chapter5.txt
@@ -2,67 +2,69 @@
.bookmark advanced-pub-sub
+ Advanced Pub-Sub Patterns
-In [#advanced-request-reply] and [#reliable-request-reply] we looked at advanced use of 0MQ's request-reply pattern. If you managed to digest all that, congratulations. In this chapter we'll focus on publish-subscribe ("pub-sub"), and extend 0MQ's core pub-sub pattern with higher-level patterns for performance, reliability, state distribution, and monitoring.
+In [#advanced-request-reply] and [#reliable-request-reply] we looked at advanced use of 0MQ's request-reply pattern. If you managed to digest all that, congratulations. In this chapter we'll focus on publish-subscribe and extend 0MQ's core pub-sub pattern with higher-level patterns for performance, reliability, state distribution, and monitoring.
We'll cover:
-* When to use publish-subscribe.
-* How to handle too-slow subscribers (the //Suicidal Snail// pattern).
-* How to design high-speed subscribers (the //Black Box// pattern).
-* How to monitor a pub-sub network (the //Espresso// pattern).
-* How to build a shared key-value store (the //Clone// pattern).
-* How to use reactors to simplify complex servers.
-* How to use the Binary Star pattern to add failover to a server.
+* When to use publish-subscribe
+* How to handle too-slow subscribers (the //Suicidal Snail// pattern)
+* How to design high-speed subscribers (the //Black Box// pattern)
+* How to monitor a pub-sub network (the //Espresso// pattern)
+* How to build a shared key-value store (the //Clone// pattern)
+* How to use reactors to simplify complex servers
+* How to use the Binary Star pattern to add failover to a server
++ Pros and Cons of Pub-Sub
0MQ's low-level patterns have their different characters. Pub-sub addresses an old messaging problem, which is //multicast// or //group messaging//. It has that unique mix of meticulous simplicity and brutal indifference that characterizes 0MQ. It's worth understanding the trade-offs that pub-sub makes, how these benefit us, and how we can work around them if needed.
-First, PUB sends each message to "all of many", whereas PUSH and DEALER rotate messages to "one of many". You cannot simply replace PUSH with PUB or vice-versa and hope that things will work. This bears repeating because people seem to quite often suggest doing this.
+First, PUB sends each message to "all of many", whereas PUSH and DEALER rotate messages to "one of many". You cannot simply replace PUSH with PUB or vice versa and hope that things will work. This bears repeating because people seem to quite often suggest doing this.
More profoundly, pub-sub is aimed at scalability. This means large volumes of data, sent rapidly to many recipients. If you need millions of messages per second sent to thousands of points, you'll appreciate pub-sub a lot more than if you need a few messages a second sent to a handful of recipients.
-To get scalability, pub-sub uses the same trick as push-pull, which is to get rid of back-chatter. This means, recipients don't talk back to senders. There are some exceptions, e.g. SUB sockets will send subscriptions to PUB sockets, but it's anonymous and infrequent.
+To get scalability, pub-sub uses the same trick as push-pull, which is to get rid of back-chatter. This means that recipients don't talk back to senders. There are some exceptions, e.g., SUB sockets will send subscriptions to PUB sockets, but it's anonymous and infrequent.
-Killing back-chatter is essential to real scalability. With pub-sub, it's how the pattern can map cleanly to the PGM multicast protocol, which is handled by the network switch. I.e. subscribers don't connect to the publisher at all, they connect to a multicast 'group' on the switch, to which the publisher sends its messages.
+Killing back-chatter is essential to real scalability. With pub-sub, it's how the pattern can map cleanly to the PGM multicast protocol, which is handled by the network switch. In other words, subscribers don't connect to the publisher at all, they connect to a multicast //group// on the switch, to which the publisher sends its messages.
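As an illustration, connecting a subscriber over PGM looks something like the following hedged sketch; it requires a libzmq built with PGM support, and the interface name and multicast group address here are made-up values:

[[code]]
//  Sketch: subscribing via a multicast group instead of the publisher's address
void *sub = zmq_socket (ctx, ZMQ_SUB);
zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);
//  epgm://interface;multicast-group:port
zmq_connect (sub, "epgm://eth0;239.192.1.1:5555");
[[/code]]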
When we remove back-chatter, our overall message flow becomes //much// simpler, which lets us make simpler APIs, simpler protocols, and in general reach many more people. But we also remove any possibility to coordinate senders and receivers. What this means is:
-* Publishers can't tell when subscribers are successfully connected, both on initial connections, and on re-connections after network failures.
-* Subscribers can't tell publishers anything that would allow publishers to control the rate of messages they send. Publishers only have one setting, which is //full-speed//, and subscribers must either keep up, or lose messages.
-* Publishers can't tell when subscribers have disappeared due to processes crashing, networks breaking, etc.
+* Publishers can't tell when subscribers are successfully connected, both on initial connections, and on reconnections after network failures.
+
+* Subscribers can't tell publishers anything that would allow publishers to control the rate of messages they send. Publishers only have one setting, which is //full-speed//, and subscribers must either keep up or lose messages.
+
+* Publishers can't tell when subscribers have disappeared due to processes crashing, networks breaking, and so on.
The downside is that we actually need all of these if we want to do reliable multicast. The 0MQ pub-sub pattern will lose messages arbitrarily when a subscriber is connecting, when a network failure occurs, or just if the subscriber or network can't keep up with the publisher.
-The upside is that there are many use-cases where //almost// reliable multicast is just fine. When we need this back-chatter, we can either switch to using ROUTER-DEALER (which I tend to do for most normal volume cases), or we can add a separate channel for synchronization (we'll see an example of this later in this chapter).
+The upside is that there are many use cases where //almost// reliable multicast is just fine. When we need this back-chatter, we can either switch to using ROUTER-DEALER (which I tend to do for most normal volume cases), or we can add a separate channel for synchronization (we'll see an example of this later in this chapter).
-Pub-sub is like a radio broadcast, you miss everything before you join, and then how much information you get depends on the quality of your reception. Surprisingly this model is useful and wide-spread, because it maps perfectly to real-world distribution of information. Think of Facebook and Twitter, the BBC World Service, and the sports results.
+Pub-sub is like a radio broadcast; you miss everything before you join, and then how much information you get depends on the quality of your reception. Surprisingly, this model is useful and widespread because it maps perfectly to real world distribution of information. Think of Facebook and Twitter, the BBC World Service, and the sports results.
As we did for request-reply, let's define //reliability// in terms of what can go wrong. Here are the classic failure cases for pub-sub:
-* Subscribers join late, so miss messages the server already sent.
+* Subscribers join late, so they miss messages the server already sent.
* Subscribers can fetch messages too slowly, so queues build up and then overflow.
* Subscribers can drop off and lose messages while they are away.
-* Subscribers can crash, and restart, and lose whatever data they already received.
+* Subscribers can crash and restart, and lose whatever data they already received.
* Networks can become overloaded and drop data (specifically, for PGM).
-* Networks can become too slow, so publisher-side queues overflow, and publishers crash.
+* Networks can become too slow, so publisher-side queues overflow and publishers crash.
-A lot more can go wrong but these are the typical failures we see in a realistic system. Since 3.x, 0MQ forces default limits on its internal buffers (the so-called 'high-water mark' or HWM), so publisher crashes are rarer unless you deliberately set the HWM to infinite.
+A lot more can go wrong but these are the typical failures we see in a realistic system. Since v3.x, 0MQ forces default limits on its internal buffers (the so-called high-water mark or HWM), so publisher crashes are rarer unless you deliberately set the HWM to infinite.
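If you do want to change that limit, it's a single socket option. A hedged sketch using the 0MQ v3.x option name (a value of 0 means no limit, which brings back the old crash-the-publisher behavior):

[[code]]
//  Sketch: setting the publisher's send high-water mark
int hwm = 100000;       //  Messages to buffer per subscriber; 0 = unlimited
zmq_setsockopt (publisher, ZMQ_SNDHWM, &hwm, sizeof (hwm));
[[/code]]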
-All of these failure cases have answers, though not always simple ones. Reliability means complexity, and most of us don't need reliability, most of the time, which is why 0MQ doesn't attempt to provide it out of the box (even if there was one global design for reliability, which there isn't).
+All of these failure cases have answers, though not always simple ones. Reliability requires complexity that most of us don't need, most of the time, which is why 0MQ doesn't attempt to provide it out of the box (even if there were one global design for reliability, which there isn't).
-++ Pub-sub Tracing (Espresso Pattern)
+++ Pub-Sub Tracing (Espresso Pattern)
-Let's start this chapter by looking at a way to trace pub-sub networks. In [#sockets-and-patterns] we saw a simple proxy that used these to do transport bridging. The {{zmq_proxy[3]}} method has three arguments; a //front-end// and //back-end// socket that it bridges together, and a //capture// socket that it will send all messages to.
+Let's start this chapter by looking at a way to trace pub-sub networks. In [#sockets-and-patterns] we saw a simple proxy that used these to do transport bridging. The {{zmq_proxy[3]}} method has three arguments: a //frontend// and //backend// socket that it bridges together, and a //capture// socket to which it will send all messages.
The code is deceptively simple:
[[code type="example" title="Espresso Pattern" name="espresso"]]
[[/code]]
-Espresso works by creating a listener thread that reads a PAIR socket and prints anything it gets. That PAIR socket is one end of a pipe; the other end (another PAIR) is the socket we pass to {{zmq_proxy[3]}}. In practice you'd filter interesting messages to get the essence of what you want to track (hence the name of the pattern).
+Espresso works by creating a listener thread that reads a PAIR socket and prints anything it gets. That PAIR socket is one end of a pipe; the other end (another PAIR) is the socket we pass to {{zmq_proxy[3]}}. In practice, you'd filter interesting messages to get the essence of what you want to track (hence the name of the pattern).
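The core wiring is small. Here's a hedged sketch of it (not the espresso example itself; the inproc endpoint name is made up):

[[code]]
//  Sketch: a proxy whose traffic is copied to a capture socket
void *capture = zmq_socket (ctx, ZMQ_PAIR);
zmq_connect (capture, "inproc://espresso-pipe");    //  Listener thread binds the other end
//  Every message passing between frontend and backend is also sent to capture
zmq_proxy (frontend, backend, capture);
[[/code]]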
-The subscriber thread subscribes to "A" and "B", receives five messages, and then destroys its socket. When you run example, the listener prints two subscription messages, five data messages, two unsubscribe messages, and then silence:
+The subscriber thread subscribes to "A" and "B", receives five messages, and then destroys its socket. When you run the example, the listener prints two subscription messages, five data messages, two unsubscribe messages, and then silence:
[[code]]
[002] 0141
@@ -76,19 +78,19 @@ The subscriber thread subscribes to "A" and "B", receives five messages, and the
[002] 0042
[[/code]]
-Which shows neatly how the publisher socket stops sending data when there are no subscribers for it. The publisher thread is still sending messages. The socket just drops them silently.
+This shows neatly how the publisher socket stops sending data when there are no subscribers for it. The publisher thread is still sending messages. The socket just drops them silently.
++ Last Value Caching
-If you've used commercial pub-sub systems you may be used to some features that are missing in the fast and cheerful 0MQ pub-sub model. One of these is "last value caching" (LVC). The problem this solves is how a new subscriber catches up when it joins the network. The theory is that publishers get notified when a new subscriber joins and subscribes to some specific topics. The publisher can then re-broadcast the last message for those topics.
+If you've used commercial pub-sub systems, you may be used to some features that are missing in the fast and cheerful 0MQ pub-sub model. One of these is //last value caching// (LVC). This solves the problem of how a new subscriber catches up when it joins the network. The theory is that publishers get notified when a new subscriber joins and subscribes to some specific topics. The publisher can then rebroadcast the last message for those topics.
-I've already explained why publishers don't get notified when there are new subscribers, because in large pub-sub systems the volumes of data make it pretty much impossible. To make really large-scale pub-sub networks you need a protocol like PGM that exploits an upscale Ethernet switch's ability to multicast data to thousands of subscribers. Trying to do a TCP unicast from the publisher to each of thousands of subscribers just doesn't scale. You get weird spikes, unfair distribution (some subscribers getting the message before others), network congestion, and general unhappiness.
+I've already explained why publishers don't get notified when there are new subscribers: in large pub-sub systems, the volumes of data make it pretty much impossible. To make really large-scale pub-sub networks, you need a protocol like PGM that exploits an upscale Ethernet switch's ability to multicast data to thousands of subscribers. Trying to do a TCP unicast from the publisher to each of thousands of subscribers just doesn't scale. You get weird spikes, unfair distribution (some subscribers getting the message before others), network congestion, and general unhappiness.
PGM is a one-way protocol: the publisher sends a message to a multicast address at the switch, which then rebroadcasts that to all interested subscribers. The publisher never sees when subscribers join or leave: this all happens in the switch, which we don't really want to start reprogramming.
-However in a lower-volume network, with a few dozen subscribers, and a limited number of topics, we can use TCP and then the XSUB and XPUB sockets //do// talk to each other, as we just saw in the Espresso pattern.
+However, in a lower-volume network with a few dozen subscribers and a limited number of topics, we can use TCP and then the XSUB and XPUB sockets //do// talk to each other as we just saw in the Espresso pattern.
-Can we make an LVC using 0MQ? The answer is "yes", if we make a proxy that sits between the publisher and subscribers; an analog for the PGM switch, but one we can program ourselves.
+Can we make an LVC using 0MQ? The answer is yes, if we make a proxy that sits between the publisher and subscribers; an analog for the PGM switch, but one we can program ourselves.
I'll start by making a publisher and subscriber that highlight the worst-case scenario. This publisher is pathological. It starts by immediately sending messages to each of a thousand topics, and then it sends one update a second to a random topic. A subscriber connects, and subscribes to a topic. Without LVC, a subscriber would have to wait an average of 500 seconds to get any data. To add some drama, let's pretend there's an escaped convict called Gregor threatening to rip the head off Roger the toy bunny if we can't fix that 8.3-minute delay.
@@ -109,12 +111,12 @@ Try building and running these: first the subscriber, then the publisher. You'll
./pathopub
[[/code]]
-It's when you run a second subscriber that you understand Roger's predicament. You have to leave it an awful long time before it reports getting any data. So, here's our last value cache. As I promised, it's a proxy that binds to two sockets and then handles messages on both them:
+It's when you run a second subscriber that you understand Roger's predicament. You have to leave it an awful long time before it reports getting any data. So, here's our last value cache. As I promised, it's a proxy that binds to two sockets and then handles messages on both:
[[code type="example" title="Last Value Caching Proxy" name="lvcache"]]
[[/code]]
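The key piece of the proxy is its backend XPUB socket, which hands us each subscription as a message: one byte (1 for subscribe, 0 for unsubscribe) followed by the topic. Here's a sketch of just that piece, assuming a {{cache}} hash table that maps each topic to the last value seen from the publisher, and two-frame (topic, value) messages as pathopub sends them:
[[code]]
//  Sketch: when a new subscription arrives on the backend XPUB socket,
//  replay the cached last value for that topic (if we have one)
zframe_t *frame = zframe_recv (backend);
if (frame) {
    byte *event = zframe_data (frame);
    if (zframe_size (frame) > 0 && event [0] == 1) {
        //  Copy out the topic; zmalloc zeroes the buffer, so the
        //  extra byte gives us a null terminator for free
        char *topic = zmalloc (zframe_size (frame));
        memcpy (topic, event + 1, zframe_size (frame) - 1);
        char *previous = (char *) zhash_lookup (cache, topic);
        if (previous) {
            zstr_sendm (backend, topic);        //  Topic frame
            zstr_send (backend, previous);      //  Cached last value
        }
        free (topic);
    }
    zframe_destroy (&frame);
}
[[/code]]
The other half of the proxy simply reads each (topic, value) pair off the frontend SUB socket, stores it in the cache, and passes it through to the backend.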
-Now, run the proxy, then the publisher:
+Now, run the proxy, and then the publisher:
[[code]]
./lvcache &
@@ -127,9 +129,9 @@ And now run as many instances of the subscriber as you want to try, each time co
./pathosub tcp://localhost:5558
[[/code]]
-Each subscriber happily reports "Save Roger", and Gregor the Escaped Convict slinks back to his seat for dinner and a nice cup of hot milk, which is all he really wanted anyhow.
+Each subscriber happily reports "Save Roger", and Gregor the Escaped Convict slinks back to his seat for dinner and a nice cup of hot milk, which is all he really wanted in the first place.
-One note: the XPUB socket by default does not report duplicate subscriptions, which is what you want when you're naively connecting an XPUB to an XSUB. Our example sneakily gets around this by using random topics so the chance of it not working is one in a million. In a real LVC proxy you'll want to use the {{ZMQ_XPUB_VERBOSE}} option that we implement in [#the-community] as an exercise.
+One note: by default, the XPUB socket does not report duplicate subscriptions, which is what you want when you're naively connecting an XPUB to an XSUB. Our example sneakily gets around this by using random topics so the chance of it not working is one in a million. In a real LVC proxy, you'll want to use the {{ZMQ_XPUB_VERBOSE}} option that we implement in [#the-community] as an exercise.
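For reference, once your libzmq build does provide {{ZMQ_XPUB_VERBOSE}}, switching it on is a one-line socket option (a sketch; check that your version actually supports the option before relying on it):
[[code]]
//  Ask the XPUB socket to pass duplicate subscriptions through as well
//  (requires a libzmq version that implements ZMQ_XPUB_VERBOSE)
int verbose = 1;
zmq_setsockopt (backend, ZMQ_XPUB_VERBOSE, &verbose, sizeof (int));
[[/code]]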
++ Slow Subscriber Detection (Suicidal Snail Pattern)
@@ -137,37 +139,38 @@ A common problem you will hit when using the pub-sub pattern in real life is the
How do we handle a slow subscriber? The ideal fix is to make the subscriber faster, but that might take work and time. Some of the classic strategies for handling a slow subscriber are:
-* **Queue messages on the publisher**. This is what Gmail does when I don't read my email for a couple of hours. But in high-volume messaging, pushing queues upstream has the thrilling but unprofitable result of making publishers run out of memory and crash. Especially if there are lots of subscribers and it's not possible to flush to disk for performance reasons.
+* **Queue messages on the publisher**. This is what Gmail does when I don't read my email for a couple of hours. But in high-volume messaging, pushing queues upstream has the thrilling but unprofitable result of making publishers run out of memory and crash--especially if there are lots of subscribers and it's not possible to flush to disk for performance reasons.
-* **Queue messages on the subscriber**. This is much better, and it's what 0MQ does by default if the network can keep up with things. If anyone's going to run out of memory and crash, it'll be the subscriber rather than the publisher, which is fair. This is perfect for "peaky" streams where a subscriber can't keep up for a while, but can catch up when the stream slows down. However it's no answer to a subscriber that's simply too slow in general.
+* **Queue messages on the subscriber**. This is much better, and it's what 0MQ does by default if the network can keep up with things. If anyone's going to run out of memory and crash, it'll be the subscriber rather than the publisher, which is fair. This is perfect for "peaky" streams where a subscriber can't keep up for a while, but can catch up when the stream slows down. However, it's no answer to a subscriber that's simply too slow in general.
-* **Stop queuing new messages after a while**. This is what Gmail does when my mailbox overflows its 7.554GB, no 7.555GB of space. New messages just get rejected or dropped. This is a great strategy from the perspective of the publisher, and it's what 0MQ does when the publisher sets a high water mark or HWM. However it still doesn't help us fix the slow subscriber. Now we just get gaps in our message stream.
+* **Stop queuing new messages after a while**. This is what Gmail does when my mailbox overflows its precious gigabytes of space. New messages just get rejected or dropped. This is a great strategy from the perspective of the publisher, and it's what 0MQ does when the publisher sets a HWM. However, it still doesn't help us fix the slow subscriber. Now we just get gaps in our message stream.
-* **Punish slow subscribers with disconnect**. This is what Hotmail (remember that?) did when I didn't login for two weeks, which is why I was on my fifteenth Hotmail account when it hit me that there was perhaps a better way. It's a nice brutal strategy that forces subscribers to sit up and pay attention, and would be ideal, but 0MQ doesn't do this, and there's no way to layer it on top since subscribers are invisible to publisher applications.
+* **Punish slow subscribers with disconnect**. This is what Hotmail (remember that?) did when I didn't log in for two weeks, which is why I was on my fifteenth Hotmail account when it hit me that there was perhaps a better way. It's a nice brutal strategy that forces subscribers to sit up and pay attention and would be ideal, but 0MQ doesn't do this, and there's no way to layer it on top because subscribers are invisible to publisher applications.
-None of these classic strategies fit. So we need to get creative. Rather than disconnect the publisher, let's convince the subscriber to kill itself. This is the Suicidal Snail pattern. When a subscriber detects that it's running too slowly (where "too slowly" is presumably a configured option that really means "so slowly that if you ever get here, shout really loudly because I need to know, so I can fix this!"), it croaks and dies.
+None of these classic strategies fit, so we need to get creative. Rather than disconnect the publisher, let's convince the subscriber to kill itself. This is the Suicidal Snail pattern. When a subscriber detects that it's running too slowly (where "too slowly" is presumably a configured option that really means "so slowly that if you ever get here, shout really loudly because I need to know, so I can fix this!"), it croaks and dies.
-How can a subscriber detect this? One way would be to sequence messages (number them in order), and use a HWM at the publisher. Now, if the subscriber detects a gap (i.e. the numbering isn't consecutive), it knows something is wrong. We then tune the HWM to the "croak and die if you hit this" level.
+How can a subscriber detect this? One way would be to sequence messages (number them in order) and use a HWM at the publisher. Now, if the subscriber detects a gap (i.e., the numbering isn't consecutive), it knows something is wrong. We then tune the HWM to the "croak and die if you hit this" level.
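As a sketch of that check, assume the publisher puts a 64-bit sequence number in the first frame of every message (byte ordering and the declaration of {{last_sequence}} are left out for brevity):
[[code]]
//  Sketch: croak and die when we detect a gap in the sequence numbers
zmsg_t *msg = zmsg_recv (subscriber);
if (msg) {
    zframe_t *frame = zmsg_first (msg);
    int64_t sequence = 0;
    memcpy (&sequence, zframe_data (frame), sizeof (sequence));
    if (last_sequence && sequence != last_sequence + 1) {
        fprintf (stderr, "E: gap in message stream, aborting\n");
        exit (EXIT_FAILURE);        //  The Suicidal Snail croaks here
    }
    last_sequence = sequence;
    zmsg_destroy (&msg);
}
[[/code]]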
There are two problems with this solution. First, if we have many publishers, how do we sequence messages? The solution is to give each publisher a unique ID and add that to the sequencing. Second, if subscribers use {{ZMQ_SUBSCRIBE}} filters, they will get gaps by definition. Our precious sequencing will be for nothing.
-Some use-cases won't use filters, and sequencing will work for them. But a more general solution is that the publisher timestamps each message. When a subscriber gets a message it checks the time, and if the difference is more than, say, one second, it does the "croak and die" thing. Possibly firing off a squawk to some operator console first.
+Some use cases won't use filters, and sequencing will work for them. But a more general solution is that the publisher timestamps each message. When a subscriber gets a message, it checks the time, and if the difference is more than, say, one second, it does the "croak and die" thing, possibly firing off a squawk to some operator console first.
The Suicidal Snail pattern works especially well when subscribers have their own clients and service-level agreements and need to guarantee certain maximum latencies. Aborting a subscriber may not seem like a constructive way to guarantee a maximum latency, but it's the assertion model. Abort today, and the problem will be fixed. Allow late data to flow downstream, and the problem may cause wider damage and take longer to appear on the radar.
-So here is a minimal example of a Suicidal Snail:
+Here is a minimal example of a Suicidal Snail:
[[code type="example" title="Suicidal Snail" name="suisnail"]]
[[/code]]
-Notes about this example:
+Here are some things to note about the Suicidal Snail example:
-* The message here consists simply of the current system clock as a number of milliseconds. In a realistic application you'd have at least a message header with the timestamp, and a message body with data.
-* The example has subscriber and publisher in a single process, as two threads. In reality they would be separate processes. Using threads is just convenient for the demonstration.
+* The message here consists simply of the current system clock as a number of milliseconds. In a realistic application, you'd have at least a message header with the timestamp and a message body with data.
-++ High-speed Subscribers (Black Box Pattern)
+* The example has subscriber and publisher in a single process as two threads. In reality, they would be separate processes. Using threads is just convenient for the demonstration.
-Now lets look at one way to make our subscribers faster. A common use-case for pub-sub is distributing large data streams like market data coming from stock exchanges. A typical set-up would have a publisher connected to a stock exchange, taking price quotes, and sending them out to a number of subscribers. If there are a handful of subscribers, we could use TCP. If we have a larger number of subscribers, we'd probably use reliable multicast, i.e. PGM.
+++ High-Speed Subscribers (Black Box Pattern)
+
+Now let's look at one way to make our subscribers faster. A common use case for pub-sub is distributing large data streams like market data coming from stock exchanges. A typical setup would have a publisher connected to a stock exchange, taking price quotes, and sending them out to a number of subscribers. If there are a handful of subscribers, we could use TCP. If we have a larger number of subscribers, we'd probably use reliable multicast, i.e., PGM.
[[code type="textdiagram" title="The Simple Black Box Pattern"]]
#-----------#
@@ -198,18 +201,19 @@ Now lets look at one way to make our subscribers faster. A common use-case for p
'----------------------------------------'
[[/code]]
-Let's imagine our feed has an average of 100,000 100-byte messages a second. That's a typical rate, after filtering market data we don't need to send on to subscribers. Now we decide to record a day's data (maybe 250 GB in 8 hours), and then replay it to a simulation network, i.e. a small group of subscribers. While 100K messages a second is easy for a 0MQ application, we want to replay //much faster//.
+Let's imagine our feed has an average of 100,000 100-byte messages a second. That's a typical rate after filtering out the market data that we don't need to send on to subscribers. Now we decide to record a day's data (maybe 250 GB in 8 hours), and then replay it to a simulation network, i.e., a small group of subscribers. While 100K messages a second is easy for a 0MQ application, we want to replay it //much faster//.
-So we set-up our architecture with a bunch of boxes, one for the publisher, and one for each subscriber. These are well-specified boxes, eight cores, twelve for the publisher.
+So we set up our architecture with a bunch of boxes--one for the publisher and one for each subscriber. These are well-specified boxes--eight cores, twelve for the publisher.
And as we pump data into our subscribers, we notice two things:
# When we do even the slightest amount of work with a message, it slows down our subscriber to the point where it can't catch up with the publisher again.
-# We're hitting a ceiling, at both publisher and subscriber, to around say 6M messages a second, even after careful optimization and TCP tuning.
-The first thing we have to do is break our subscriber into a multithreaded design so that we can do work with messages in one set of threads, while reading messages in another. Typically we don't want to process every message the same way. Rather, the subscriber will filter some messages, perhaps by prefix key. When a message matches some criteria, the subscriber will call a worker to deal with it. In 0MQ terms this means sending the message to a worker thread.
+# We're hitting a ceiling, at both publisher and subscriber, of around 6M messages a second, even after careful optimization and TCP tuning.
+
+The first thing we have to do is break our subscriber into a multithreaded design so that we can do work with messages in one set of threads, while reading messages in another. Typically, we don't want to process every message the same way. Rather, the subscriber will filter some messages, perhaps by prefix key. When a message matches some criteria, the subscriber will call a worker to deal with it. In 0MQ terms, this means sending the message to a worker thread.
-So the subscriber looks something like a queue device. We could use various sockets to connect the subscriber and workers. If we assume one-way traffic, and workers that are all identical, we can use PUSH and PULL, and delegate all the routing work to 0MQ[figure]. This is the simplest and fastest approach.
+So the subscriber looks something like a queue device. We could use various sockets to connect the subscriber and workers. If we assume one-way traffic and workers that are all identical, we can use PUSH and PULL and delegate all the routing work to 0MQ[figure]. This is the simplest and fastest approach.
The subscriber talks to the publisher over TCP or PGM. The subscriber talks to its workers, which are all in the same process, over {{inproc://}}.
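Here's a minimal sketch of that single-stream design: one subscriber thread fanning messages out over {{inproc://}} to a few attached worker threads. The endpoint, the worker count, and the absence of any filtering are all simplifications:
[[code]]
#include "czmq.h"

//  Worker thread: pulls messages from the subscriber thread over inproc
static void
worker_task (void *args, zctx_t *ctx, void *pipe)
{
    void *worker = zsocket_new (ctx, ZMQ_PULL);
    zsocket_connect (worker, "inproc://workers");
    while (true) {
        zmsg_t *msg = zmsg_recv (worker);
        if (!msg)
            break;              //  Interrupted
        //  ... the real processing work would happen here ...
        zmsg_destroy (&msg);
    }
}

int main (void)
{
    zctx_t *ctx = zctx_new ();
    //  SUB socket facing the publisher (endpoint is an assumption)
    void *subscriber = zsocket_new (ctx, ZMQ_SUB);
    zsocket_connect (subscriber, "tcp://localhost:5556");
    zsocket_set_subscribe (subscriber, "");

    //  PUSH socket fanning messages out to the workers over inproc
    void *pusher = zsocket_new (ctx, ZMQ_PUSH);
    zsocket_bind (pusher, "inproc://workers");

    int worker_nbr;
    for (worker_nbr = 0; worker_nbr < 4; worker_nbr++)
        zthread_fork (ctx, worker_task, NULL);

    //  Read from the publisher, hand matching messages to the workers
    while (true) {
        zmsg_t *msg = zmsg_recv (subscriber);
        if (!msg)
            break;              //  Interrupted
        zmsg_send (&msg, pusher);
    }
    zctx_destroy (&ctx);
    return 0;
}
[[/code]]
A real subscriber would inspect each message and push only those that match its criteria, and the workers would do the actual processing.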
@@ -242,9 +246,9 @@ The subscriber talks to the publisher over TCP or PGM. The subscriber talks to i
'----------------------------------------'
[[/code]]
-Now to break that ceiling. What happens is that the subscriber thread hits 100% of CPU, and since it is one thread, it cannot use more than one core. A single thread will always hit a ceiling, be it at 2M, 6M, or more messages per second. We want to split the work across multiple threads that can run in parallel.
+Now to break that ceiling. The subscriber thread hits 100% of CPU and, because it is one thread, it cannot use more than one core. A single thread will always hit a ceiling, be it at 2M, 6M, or more messages per second. We want to split the work across multiple threads that can run in parallel.
-The approach used by many high-performance products, which works here, is //sharding//, meaning we split the work into parallel and independent streams. E.g. half of the topic keys are in one stream, half in another. We could use many streams, but performance won't scale unless we have free cores. So let's see how to shard into two streams[figure].
+The approach used by many high-performance products, which works here, is //sharding//. Using sharding, we split the work into parallel and independent streams, such as half of the topic keys in one stream, and half in another. We could use many streams, but performance won't scale unless we have free cores. So let's see how to shard into two streams[figure].
With two streams, working at full speed, we would configure 0MQ as follows:
@@ -256,7 +260,7 @@ With two streams, working at full speed, we would configure 0MQ as follows:
* The remaining cores assigned to worker threads.
* Worker threads connected to both subscriber PUSH sockets.
-With ideally, no more threads in our architecture than we had cores. Once we create more threads than cores, we get contention between threads, and diminishing returns. There would be no benefit, for example, in creating more I/O threads.
+Ideally, we want to match the number of fully-loaded threads in our architecture with the number of cores. When threads start to fight for cores and CPU cycles, the cost of adding more threads outweighs the benefits. There would be no benefit, for example, in creating more I/O threads.
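If your setup needs more than the default single I/O thread per context, that's a context option; with CZMQ, you set it before creating any sockets (a sketch):
[[code]]
zctx_t *ctx = zctx_new ();
zctx_set_iothreads (ctx, 2);    //  Must happen before we create sockets
[[/code]]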
++ Reliable Pub-Sub (Clone Pattern)
@@ -267,33 +271,33 @@ As a larger worked example, we'll take the problem of making a reliable pub-sub
* These applications must share a single eventually-consistent //state//.
* Any application can update the state at any point in time.
-Let's say that updates are reasonably low-volume. We don't have real-time goals. The whole state can fit into memory. Some plausible use cases are:
+Let's say that updates are reasonably low-volume. We don't have real time goals. The whole state can fit into memory. Some plausible use cases are:
* A configuration that is shared by a group of cloud servers.
* Some game state shared by a group of players.
-* Exchange rate data that is updated in real-time and available to applications.
+* Exchange rate data that is updated in real time and available to applications.
-+++ Centralized vs. Decentralized
++++ Centralized Versus Decentralized
-A first decision we have to make is whether we work with a central server or not. It makes a big difference in the design we end up with. The trade-offs are these:
+A first decision we have to make is whether we work with a central server or not. It makes a big difference in the resulting design. The trade-offs are these:
-* Conceptually, a central server is simpler to understand, since networks are not naturally symmetrical. With a central server we avoid all questions of discovery, bind vs. connect, etc.
+* Conceptually, a central server is simpler to understand because networks are not naturally symmetrical. With a central server, we avoid all questions of discovery, bind versus connect, and so on.
-* Generally, a fully-distributed architecture is technically more challenging but ends up with simpler protocols. That is, each node has to act as server and client in the right way, which is delicate. But when done right the results are simpler than using a central server. We saw this in the Freelance pattern in [#reliable-request-reply].
+* Generally, a fully-distributed architecture is technically more challenging but ends up with simpler protocols. That is, each node must act as server and client in the right way, which is delicate. When done right, the results are simpler than using a central server. We saw this in the Freelance pattern in [#reliable-request-reply].
-* A central server will become a bottleneck in high-volume use cases. If we really have to handle //scale//, i.e. millions of messages a second, we should aim for decentralization right away.
+* A central server will become a bottleneck in high-volume use cases. If handling scale on the order of millions of messages a second is required, we should aim for decentralization right away.
-* A central server will, ironically, scale to more nodes more easily than decentralization. That is, it's easier to connect 10,000 nodes to one server than to each other.
+* Ironically, a centralized architecture will scale to more nodes more easily than a decentralized one. That is, it's easier to connect 10,000 nodes to one server than to each other.
So, for the Clone pattern we'll work with a //server// that publishes state updates and a set of //clients// that represent applications.
+++ Representing State as Key-Value Pairs
-We'll develop Clone in stages, solving one problem at a time. First, let's look at how to update a shared state across a set of clients. We need to decide how to represent our state, and the updates. The simplest plausible format is a key-value store, where one key-value pair represents an atomic unit of change in the shared state.
+We'll develop Clone in stages, solving one problem at a time. First, let's look at how to update a shared state across a set of clients. We need to decide how to represent our state, as well as the updates. The simplest plausible format is a key-value store, where one key-value pair represents an atomic unit of change in the shared state.
We have a simple pub-sub example in [#basics], the weather server and client. Let's change the server to send key-value pairs, and the client to store these in a hash table. This lets us send updates from one server to a set of clients using the classic pub-sub model[figure].
-An update is either a new key-value pair, a modified value for an existing key, or a deleted key. We can assume for now that the whole store fits in memory and that applications access it by key, e.g. using a hash table or dictionary. For larger stores and some kind of persistence we'd probably store the state in a database, but that's not relevant here.
+An update is either a new key-value pair, a modified value for an existing key, or a deleted key. We can assume for now that the whole store fits in memory and that applications access it by key, such as by using a hash table or dictionary. For larger stores and some kind of persistence we'd probably store the state in a database, but that's not relevant here.
This is the server:
@@ -324,32 +328,32 @@ And here is the client:
#--------# #--------# #--------#
[[/code]]
-Some notes about this code:
+Here are some things to note about this first model:
-* All the hard work is done in a **kvmsg** class. This class works with key-value message objects, which are multi-part 0MQ messages structured as three frames: a key (a 0MQ string), a sequence number (64-bit value, in network byte order), and a binary body (holds everything else).
+* All the hard work is done in a {{kvmsg}} class. This class works with key-value message objects, which are multipart 0MQ messages structured as three frames: a key (a 0MQ string), a sequence number (64-bit value, in network byte order), and a binary body (holds everything else).
* The server generates messages with a randomized 4-digit key, which lets us simulate a large but not enormous hash table (10K entries).
* We don't implement deletions in this version: all messages are inserts or updates.
-* The server does a 200 millisecond pause after binding its socket. This is to prevent "slow joiner syndrome" where the subscriber loses messages as it connects to the server's socket. We'll remove that in later versions of the Clone code.
+* The server does a 200 millisecond pause after binding its socket. This is to prevent //slow joiner syndrome//, where the subscriber loses messages as it connects to the server's socket. We'll remove that in later versions of the Clone code.
-* We'll use the terms 'publisher' and 'subscriber' in the code to refer to sockets. This will help later when we have multiple sockets doing different things.
+* We'll use the terms //publisher// and //subscriber// in the code to refer to sockets. This will help later when we have multiple sockets doing different things.
-Here is the kvmsg class, in the simplest form that works for now:
+Here is the {{kvmsg}} class, in the simplest form that works for now:
[[code type="example" title="Key-value message class" name="kvsimple"]]
[[/code]]
-Later we'll make a more sophisticated kvmsg class that will work in in real applications.
+Later, we'll make a more sophisticated {{kvmsg}} class that will work in real applications.
-Both the server and client maintain hash tables, but this first model only works properly if we start all clients before the server, and the clients never crash. That's very artificial.
+Both the server and client maintain hash tables, but this first model only works properly if we start all clients before the server and the clients never crash. That's very artificial.
-+++ Getting an Out-of-band Snapshot
++++ Getting an Out-of-Band Snapshot
So now we have our second problem: how to deal with late-joining clients or clients that crash and then restart.
-In order to allow a late (or recovering) client to catch up with a server it has to get a snapshot of the server's state. Just as we've reduced "message" to mean "a sequenced key-value pair", we can reduce "state" to mean "a hash table". To get the server state, a client opens a DEALER socket and asks for it explicitly[figure].
+In order to allow a late (or recovering) client to catch up with a server, it has to get a snapshot of the server's state. Just as we've reduced "message" to mean "a sequenced key-value pair", we can reduce "state" to mean "a hash table". To get the server state, a client opens a DEALER socket and asks for it explicitly[figure].
To make this work, we have to solve a problem of timing. Getting a state snapshot will take a certain time, possibly fairly long if the snapshot is large. We need to correctly apply updates to the snapshot. But the server won't know when to start sending us updates. One way would be to start subscribing, get a first update, and then ask for "state for update N". This would require the server to store one snapshot for each update, which isn't practical.
@@ -380,7 +384,7 @@ So we will do the synchronization in the client, as follows:
* The client waits for the server to reply with state, and meanwhile queues all updates. It does this simply by not reading them: 0MQ keeps them queued on the socket queue.
-* When the client receives its state update, it begins once again to read updates. However it discards any updates that are older than the state update. So if the state update includes updates up to 200, the client will discard updates up to 201.
+* When the client receives its state update, it begins once again to read updates. However, it discards any updates that are older than the state update. So if the state update includes updates up to 200, the client will discard updates up to 201.
* The client then applies updates to its own state snapshot.
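Here's a sketch of that client-side logic using the {{kvmsg}} helpers. The socket and variable names ({{snapshot}} for the DEALER socket, {{kvmap}} for the hash table) and the "ICANHAZ?"/"KTHXBAI" strings follow the example code; they're just conventions:
[[code]]
//  Sketch: request a snapshot and apply it, remembering its sequence
zstr_send (snapshot, "ICANHAZ?");
int64_t sequence = 0;
while (true) {
    kvmsg_t *kvmsg = kvmsg_recv (snapshot);
    if (!kvmsg)
        break;                  //  Interrupted
    if (streq (kvmsg_key (kvmsg), "KTHXBAI")) {
        sequence = kvmsg_sequence (kvmsg);
        kvmsg_destroy (&kvmsg);
        break;                  //  Snapshot is complete
    }
    kvmsg_store (&kvmsg, kvmap);
}
//  From here on, read updates from the SUB socket and discard any whose
//  sequence number is not higher than the snapshot's sequence
[[/code]]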
@@ -394,31 +398,31 @@ And here is the client:
[[code type="example" title="Clone client, Model Two" name="clonecli2"]]
[[/code]]
-Some notes about this code:
+Here are some things to note about these two programs:
-* The server uses two tasks. One thread produces the updates (randomly) and sends these to the main PUB socket, while the second thread handles state requests on the ROUTER socket. The two communicate across PAIR sockets over an {{inproc://}} connection.
+* The server uses two tasks. One thread produces the updates (randomly) and sends these to the main PUB socket, while the other thread handles state requests on the ROUTER socket. The two communicate across PAIR sockets over an {{inproc://}} connection.
-* The client is really simple. In C, about fifty lines of code. A lot of the heavy lifting is done in the kvmsg class, but still, the basic Clone pattern is easier to implement than it seemed at first.
+* The client is really simple. In C, it consists of about fifty lines of code. A lot of the heavy lifting is done in the {{kvmsg}} class. Even so, the basic Clone pattern is easier to implement than it seemed at first.
-* We don't use anything fancy for serializing the state. The hash table holds a set of kvmsg objects, and the server sends these, as a batch of messages, to the client requesting state. If multiple clients request state at once, each will get a different snapshot.
+* We don't use anything fancy for serializing the state. The hash table holds a set of {{kvmsg}} objects, and the server sends these, as a batch of messages, to the client requesting state. If multiple clients request state at once, each will get a different snapshot.
-* We assume that the client has exactly one server to talk to. The server **must** be running; we do not try to solve the question of what happens if the server crashes.
+* We assume that the client has exactly one server to talk to. The server must be running; we do not try to solve the question of what happens if the server crashes.
Right now, these two programs don't do anything real, but they correctly synchronize state. It's a neat example of how to mix different patterns: PAIR-PAIR, PUB-SUB, and ROUTER-DEALER.
+++ Republishing Updates from Clients
-In our second model, changes to the key-value store came from the server itself. This is a centralized model, useful for example if we have a central configuration file we want to distribute, with local caching on each node. A more interesting model takes updates from clients, not the server. The server thus becomes a stateless broker. This gives us some benefits:
+In our second model, changes to the key-value store came from the server itself. This is a centralized model that is useful, for example, if we have a central configuration file we want to distribute, with local caching on each node. A more interesting model takes updates from clients, not the server. The server thus becomes a stateless broker. This gives us some benefits:
-* We're less worried about the reliability of the server. If it crashes, we can start a new instance, and feed it new values.
+* We're less worried about the reliability of the server. If it crashes, we can start a new instance and feed it new values.
* We can use the key-value store to share knowledge between active peers.
-To send updates from clients back to the server we could use a variety of socket patterns. The simplest plausible solution is a PUSH-PULL combination[figure].
+To send updates from clients back to the server, we could use a variety of socket patterns. The simplest plausible solution is a PUSH-PULL combination[figure].
-Why don't we allow clients to publish updates directly to each other? While this would reduce latency, it would remove the guarantee of consistency. You can't get consistent shared state if you allow the order of updates to change depending on who receives them. Say we have two clients, changing different keys. This will work fine. But if the two clients try to change the same key at roughly the same time, they'll end up with different notions of what its value is.
+Why don't we allow clients to publish updates directly to each other? While this would reduce latency, it would remove the guarantee of consistency. You can't get consistent shared state if you allow the order of updates to change depending on who receives them. Say we have two clients, changing different keys. This will work fine. But if the two clients try to change the same key at roughly the same time, they'll end up with different notions of its value.
-There are a few strategies for getting consistency when changes happen in multiple places at once. We'll use the approach of centralizing all change. No matter the precise timing of the changes that clients make, they are all pushed through the server, which enforces a single sequence according to the order in which it gets updates.
+There are a few strategies for obtaining consistency when changes happen in multiple places at once. We'll use the approach of centralizing all change. No matter the precise timing of the changes that clients make, they are all pushed through the server, which enforces a single sequence according to the order in which it gets updates.
[[code type="textdiagram" title="Republishing Updates"]]
#----------------------#
@@ -443,7 +447,7 @@ There are a few strategies for getting consistency when changes happen in multip
#---------------------# #---------------------#
[[/code]]
-By meditating all changes, the server can also add a unique sequence number to all updates. With unique sequencing, clients can detect the nastier failures - network congestion and queue overflow. If a client discovers that its incoming message stream has a hole, it can take action. It seems sensible that the client contact the server and ask for the missing messages, but in practice that isn't useful. If there are holes, they're caused by network stress, and adding more stress to the network will make things worse. All the client can really do is warn its users "Unable to continue", and stop, and not restart until someone has manually checked the cause of the problem.
+By mediating all changes, the server can also add a unique sequence number to all updates. With unique sequencing, clients can detect the nastier failures, including network congestion and queue overflow. If a client discovers that its incoming message stream has a hole, it can take action. It seems sensible that the client contact the server and ask for the missing messages, but in practice that isn't useful. If there are holes, they're caused by network stress, and adding more stress to the network will make things worse. All the client can do is warn its users that it is "unable to continue", stop, and not restart until someone has manually checked the cause of the problem.
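In code, the server's side of this is small. Here's a sketch of the republishing step using the {{kvmsg}} helpers; the {{collector}} (PULL), {{publisher}} (PUB), {{kvmap}}, and {{sequence}} names follow the example code:
[[code]]
//  Sketch: accept an update from any client, stamp it, and republish it
kvmsg_t *kvmsg = kvmsg_recv (collector);
if (kvmsg) {
    kvmsg_set_sequence (kvmsg, ++sequence);  //  The server owns the sequence
    kvmsg_send (kvmsg, publisher);           //  Fan out to all subscribers
    kvmsg_store (&kvmsg, kvmap);             //  And apply to our own state
}
[[/code]]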
We'll now generate state updates in the client. Here's the server:
@@ -455,24 +459,24 @@ And here is the client:
[[code type="example" title="Clone client, Model Three" name="clonecli3"]]
[[/code]]
-Some notes about this code:
+Here are some things to note about this third design:
* The server has collapsed to a single task. It manages a PULL socket for incoming updates, a ROUTER socket for state requests, and a PUB socket for outgoing updates.
-* The client uses a simple tickless timer to send a random update to the server once a second. In a real implementation we would drive updates from application code.
+* The client uses a simple tickless timer to send a random update to the server once a second. In a real implementation, we would drive updates from application code.
+++ Working with Subtrees
As we grow the number of clients, the size of our shared store will also grow. It stops being reasonable to send everything to every client. This is the classic story with pub-sub: when you have a very small number of clients, you can send every message to all clients. As you grow the architecture, this becomes inefficient. Clients specialize in different areas.
-So even a shared store, some clients will want to work only with a part of that store, which we call a //subtree//. The client has to request the subtree when it makes a state request, and it has to specify the same subtree when it subscribes to updates.
+So even when working with a shared store, some clients will want to work only with a part of that store, which we call a //subtree//. The client has to request the subtree when it makes a state request, and it must specify the same subtree when it subscribes to updates.
-There are a couple of common syntaxes for trees. One is the "path hierarchy", and another is the "topic tree". These look like this:
+There are a couple of common syntaxes for trees. One is the //path hierarchy//, and another is the //topic tree//. These look like this:
* Path hierarchy: {{/some/list/of/paths}}
* Topic tree: {{some.list.of.topics}}
-We'll use the path hierarchy, and extend our client and server so that a client can work with a single subtree. Working with multiple subtrees is not much more difficult but not necessary for an example.
+We'll use the path hierarchy, and extend our client and server so that a client can work with a single subtree. Once you see how to work with a single subtree, you'll be able to extend this yourself to handle multiple subtrees, if your use case demands it.
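On the client side, working with a subtree comes down to two things: subscribing with the subtree as a prefix filter, and sending the same subtree along with the snapshot request. A sketch, using {{/client/}} as an arbitrary example subtree:
[[code]]
//  Sketch: scope everything this client does to one subtree
char *subtree = "/client/";
//  Only receive updates whose keys start with our subtree
zsocket_set_subscribe (subscriber, subtree);
//  And ask for a snapshot of that subtree only
zstr_sendm (snapshot, "ICANHAZ?");
zstr_send (snapshot, subtree);
[[/code]]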
Here's the server implementing subtrees, a small variation on Model Three:
@@ -488,20 +492,20 @@ And here is the corresponding client:
An ephemeral value is one that expires automatically unless regularly refreshed. If you think of Clone being used for a registration service, then ephemeral values would give you dynamic registration. A node joins the network, publishes its address, and refreshes this regularly. If the node dies, its address eventually gets removed.
-The usual abstraction for ephemeral values is to attach them to a "session", and delete them when the session ends. In Clone, sessions would be defined by clients, and would end if the client died. A simpler alternative is just to attach a "time to live" (TTL) to ephemeral values, which the server uses to expire values that haven't been refreshed in time.
+The usual abstraction for ephemeral values is to attach them to a //session//, and delete them when the session ends. In Clone, sessions would be defined by clients, and would end if the client died. A simpler alternative is to attach a //time to live// (TTL) to ephemeral values, which the server uses to expire values that haven't been refreshed in time.
-It's a good design principle, and one that I use whenever possible, to //not invent concepts// that are not absolutely essential. What sessions offer is better performance if we have very large numbers of ephemeral values. If we use a handful of ephemeral values, it's fine to set a TTL on each one. If we use masses of ephemeral values, it's more efficient to attach them to sessions, and expire them in bulk. This isn't a problem we face at this stage, and may never face, so sessions go out the window.
+A good design principle that I use whenever possible is to //not invent concepts that are not absolutely essential//. If we have very large numbers of ephemeral values, sessions will offer better performance. If we use a handful of ephemeral values, it's fine to set a TTL on each one. If we use masses of ephemeral values, it's more efficient to attach them to sessions and expire them in bulk. This isn't a problem we face at this stage, and may never face, so sessions go out the window.
-Now to implement ephemeral values. First, we need a way to encode the TTL in the key-value message. We could add a frame. The problem with using 0MQ frames for properties is that each time we want to add a new property, we have to change the message structure. It breaks compatibility. So let's add a 'properties' frame to the message, and code to let us get and put property values.
+Now we will implement ephemeral values. First, we need a way to encode the TTL in the key-value message. We could add a frame. The problem with using 0MQ frames for properties is that each time we want to add a new property, we have to change the message structure. It breaks compatibility. So let's add a properties frame to the message, and write the code to let us get and put property values.
-Next, we need a way to say, "delete this value". Up to now servers and clients have always blindly inserted or updated new values into their hash table. We'll say that if the value is empty, that means "delete this key".
+Next, we need a way to say, "delete this value". Up until now, servers and clients have always blindly inserted or updated new values into their hash table. We'll say that if the value is empty, that means "delete this key".
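As a sketch of that convention in isolation (this is not the {{kvmsg}} class itself, just the idea): treat an empty value as a delete, and anything else as an insert or update:
[[code]]
//  Sketch: apply one update to a hash table, where an empty value
//  means "delete this key"
static void
s_store_update (zhash_t *hash, char *key, char *value)
{
    if (*value == 0)
        zhash_delete (hash, key);
    else {
        zhash_update (hash, key, strdup (value));
        zhash_freefn (hash, key, free);
    }
}
[[/code]]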
-Here's a more complete version of the kvmsg class, which implements a 'properties' frame (and adds a UUID frame, which we'll need later on). It also handles empty values by deleting the key from the hash, if necessary:
+Here's a more complete version of the {{kvmsg}} class, which implements the properties frame (and adds a UUID frame, which we'll need later on). It also handles empty values by deleting the key from the hash, if necessary:
-[[code type="example" title="Key-value message class - full" name="kvmsg"]]
+[[code type="example" title="Key-value message class: full" name="kvmsg"]]
[[/code]]
-The Model Five client is almost identical to Model Four. It uses the full kvmsg class now, and sets a randomized 'ttl' property (measured in seconds) on each message:
+The Model Five client is almost identical to Model Four. It uses the full {{kvmsg}} class now, and sets a randomized {{ttl}} property (measured in seconds) on each message:
[[code type="fragment" name="kvsetttl"]]
kvmsg_set_prop (kvmsg, "ttl", "%d", randof (30));
@@ -509,9 +513,9 @@ kvmsg_set_prop (kvmsg, "ttl", "%d", randof (30));
+++ Using a Reactor
-Up to now we used a poll loop in the server. In this next model of the server we switch to using a reactor. In C, we use CZMQ's zloop class. Using a reactor makes the code more verbose but easier to understand and build out, since each piece of the server is handled by a separate reactor handler.
+Until now, we have used a poll loop in the server. In this next model of the server, we switch to using a reactor. In C, we use CZMQ's {{zloop}} class. Using a reactor makes the code more verbose, but easier to understand and build out because each piece of the server is handled by a separate reactor handler.
-We use a single thread, and pass a server object around to the reactor handlers. We could have organized the server as multiple threads, each handling one socket or timer, but that works better when threads don't have to share data. Here all work is centered around the server's hashmap, so one thread is simpler.
+We use a single thread and pass a server object around to the reactor handlers. We could have organized the server as multiple threads, each handling one socket or timer, but that works better when threads don't have to share data. In this case all work is centered around the server's hashmap, so one thread is simpler.
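If you haven't used {{zloop}} before, here's a minimal sketch of the style: one poller handler for a socket and one timer handler, both sharing a server object. The handler names, endpoints, and the trivial {{server_t}} structure are placeholders; the real Model Six server does much more in each handler:
[[code]]
#include "czmq.h"

typedef struct {
    void *publisher;            //  Socket we publish updates to
    int64_t sequence;           //  Last sequence number used
} server_t;

//  Handle one update arriving on the collector socket
static int
s_collector (zloop_t *loop, zmq_pollitem_t *poller, void *args)
{
    server_t *server = (server_t *) args;
    zmsg_t *msg = zmsg_recv (poller->socket);
    if (msg) {
        server->sequence++;
        zmsg_send (&msg, server->publisher);
    }
    return 0;                   //  0 = keep going, -1 = stop the reactor
}

//  Periodic housekeeping, e.g., expiring stale values
static int
s_housekeeping (zloop_t *loop, zmq_pollitem_t *poller, void *args)
{
    //  ... work on the shared server state here ...
    return 0;
}

int main (void)
{
    zctx_t *ctx = zctx_new ();
    server_t server = { 0 };
    server.publisher = zsocket_new (ctx, ZMQ_PUB);
    zsocket_bind (server.publisher, "tcp://*:5557");
    void *collector = zsocket_new (ctx, ZMQ_PULL);
    zsocket_bind (collector, "tcp://*:5558");

    zloop_t *loop = zloop_new ();
    zmq_pollitem_t poller = { collector, 0, ZMQ_POLLIN };
    zloop_poller (loop, &poller, s_collector, &server);
    zloop_timer (loop, 1000, 0, s_housekeeping, &server);
    zloop_start (loop);         //  Runs until interrupted

    zloop_destroy (&loop);
    zctx_destroy (&ctx);
    return 0;
}
[[/code]]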
There are three reactor handlers:
@@ -524,33 +528,33 @@ There are three reactor handlers:
+++ Adding the Binary Star Pattern for Reliability
-The Clone models up to now are relatively simple. However we're now going to get into unpleasantly complex territory here that has me getting up for another espresso. You should appreciate that making "reliable" messaging is complex enough that you always need to ask, "do we actually need this?" before jumping into it. If you can get away with unreliable, or "good enough" reliability, you can make a huge win in terms of cost and complexity. Sure, you may lose some data now and then. It is often a good trade-off. Having said, that, and... sips... since the espresso is really good, let's jump in.
+The Clone models we've explored up to now have been relatively simple. Now we're going to get into unpleasantly complex territory, which has me getting up for another espresso. You should appreciate that making "reliable" messaging is complex enough that you always need to ask, "Do we actually need this?" before jumping into it. If you can get away with unreliable or with "good enough" reliability, you can make a huge win in terms of cost and complexity. Sure, you may lose some data now and then. It is often a good trade-off. Having said that, and... sips... because the espresso is really good, let's jump in.
-As you play with the last model, you'll stop and restart the server. It might look like it recovers, but of course it's applying updates to an empty state, instead of the proper current state. Any new client joining the network will get just the latest updates, instead of the full historical record.
+As you play with the last model, you'll stop and restart the server. It might look like it recovers, but of course it's applying updates to an empty state instead of the proper current state. Any new client joining the network will only get the latest updates instead of the full historical record.
-What we want is a way for the server to recover from being killed, or crashing. We also need to provide some backup in case the server is out of commission for any length of time. When someone asks for "reliability", ask them to list the failures they want to handle. In our case, these are:
+What we want is a way for the server to recover from being killed, or crashing. We also need to provide backup in case the server is out of commission for any length of time. When someone asks for "reliability", ask them to list the failures they want to handle. In our case, these are:
* The server process crashes and is automatically or manually restarted. The process loses its state and has to get it back from somewhere.
-* The server machine dies and is off-line for a significant time. Clients have to switch to an alternate server somewhere.
+* The server machine dies and is offline for a significant time. Clients have to switch to an alternate server somewhere.
-* The server process or machine gets disconnected from the network, e.g. a switch dies or a datacenter gets knocked out. It may come back at some point, but in the meantime clients need an alternate server.
+* The server process or machine gets disconnected from the network, e.g., a switch dies or a datacenter gets knocked out. It may come back at some point, but in the meantime clients need an alternate server.
Our first step is to add a second server. We can use the Binary Star pattern from [#reliable-request-reply] to organize these into primary and backup. Binary Star is a reactor, so it's useful that we already refactored the last server model into a reactor style.
We need to ensure that updates are not lost if the primary server crashes. The simplest technique is to send them to both servers. The backup server can then act as a client, and keep its state synchronized by receiving updates as all clients do. It'll also get new updates from clients. It can't yet store these in its hash table, but it can hold onto them for a while.
-So, Model Six introduces these changes over Model Five:
+So, Model Six introduces the following changes over Model Five:
* We use a pub-sub flow instead of a push-pull flow for client updates sent to the servers. This takes care of fanning out the updates to both servers. Otherwise we'd have to use two DEALER sockets.
* We add heartbeats to server updates (to clients), so that a client can detect when the primary server has died. It can then switch over to the backup server.
-* We connect the two servers using the Binary Star {{bstar}} reactor class. Binary Star relies on the clients to 'vote' by making an explicit request to the server they consider "active". We'll use snapshot requests as the voting mechanism.
+* We connect the two servers using the Binary Star {{bstar}} reactor class. Binary Star relies on the clients to vote by making an explicit request to the server they consider active. We'll use snapshot requests as the voting mechanism.
-* We make all update messages uniquely identifiable by adding a UUID field. The client generates this, and the server propagates it back on re-published updates.
+* We make all update messages uniquely identifiable by adding a UUID field. The client generates this, and the server propagates it back on republished updates.
-* The passive server keeps a "pending list" of updates that it has received from clients, but not yet from the active server. Or, updates it's received from the active, but not yet clients. The list is ordered from oldest to newest, so that it is easy to remove updates off the head.
+* The passive server keeps a "pending list" of updates that it has received from clients, but not yet from the active server; or updates it's received from the active server, but not yet from the clients. The list is ordered from oldest to newest, so that it is easy to remove updates off the head.
[[code type="textdiagram" title="Clone Client Finite State Machine"]]
#-----------#
@@ -580,35 +584,35 @@ Request|snapshot | |
It's useful to design the client logic as a finite state machine. The client cycles through three states:
-* The client opens and connects its sockets, and then requests a snapshot from the first server. To avoid request storms, it will ask any given server only twice. One request might get lost, that'd be bad luck. Two getting lost would be carelessness.
+* The client opens and connects its sockets, and then requests a snapshot from the first server. To avoid request storms, it will ask any given server only twice. One request might get lost, which would be bad luck. Two getting lost would be carelessness.
* The client waits for a reply (snapshot data) from the current server, and if it gets it, it stores it. If there is no reply within some timeout, it fails over to the next server.
* When the client has gotten its snapshot, it waits for and processes updates. Again, if it doesn't hear anything from the server within some timeout, it fails over to the next server.
-The client loops forever. It's quite likely during startup or fail-over that some clients may be trying to talk to the primary server while others are trying to talk to the backup server. The Binary Star state machine handles this[figure], hopefully accurately. It's hard to prove software correct; instead we hammer it until we can't prove it wrong.
+The client loops forever. It's quite likely during startup or failover that some clients may be trying to talk to the primary server while others are trying to talk to the backup server. The Binary Star state machine handles this[figure], hopefully accurately. It's hard to prove software correct; instead we hammer it until we can't prove it wrong.
-Fail-over happens as follows:
+Failover happens as follows:
-* The client detects that primary server is no longer sending heartbeats, so has died. The client connects to the backup server and requests a new state snapshot.
+* The client detects that the primary server is no longer sending heartbeats, and concludes that it has died. The client connects to the backup server and requests a new state snapshot.
-* The backup server starts to receive snapshot requests from clients, and detects that primary server has gone, so takes over as primary.
+* The backup server starts to receive snapshot requests from clients, and detects that the primary server has gone, so it takes over as primary.
* The backup server applies its pending list to its own hash table, and then starts to process state snapshot requests.
-When the primary server comes back on-line, it will:
+When the primary server comes back online, it will:
* Start up as passive server, and connect to the backup server as a Clone client.
* Start to receive updates from clients, via its SUB socket.
-We make some assumptions:
+We make a few assumptions:
-* That at least one server will keep running. If both servers crash, we lose all server state and there's no way to recover it.
+* At least one server will keep running. If both servers crash, we lose all server state and there's no way to recover it.