Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
Already on GitHub? Sign in to your account
Replace HTTP replication with TCP replication (Server side part) #2082
Conversation
erikjohnston
added some commits
Mar 27, 2017
erikjohnston
assigned
richvdh
Mar 30, 2017
| + """Returns a deferred which resolves when there is new data for | ||
| + replication to handle. | ||
| + """ | ||
| + return self.replication_deferred.observe() |
erikjohnston
Mar 30, 2017
Owner
It may be nicer to let the replication resource register a callback rather than bouncing through a deferred?
| + connections if there are. | ||
| + | ||
| + This should get called each time new data is available, even if it | ||
| + is currently being executed, so that nothing gets missed |
erikjohnston
Mar 30, 2017
Owner
Maybe it would be a lot clearer if this function looped until there were no changes? Although that is less immediately obvious is correct.
It may be more obvious if we can register this function directly with the notifer, rather than bouncing via a deferred in a loop.
richvdh
Mar 30, 2017
Member
+1 to registering a callback
-1 to looping until there are no changes. It will complicate this function and I don't think it will make anything clearer.
erikjohnston
referenced this pull request
Mar 30, 2017
Closed
Replace HTTP replication with TCP replication #2069
erikjohnston
added some commits
Mar 30, 2017
| +TCP Replication | ||
| +=============== | ||
| + | ||
| +This describes the TCP replication protocol that replaces the HTTP protocol. |
| +Motivation | ||
| +---------- | ||
| + | ||
| +The HTTP API used long poll from the workers to the master, this has the problem |
richvdh
Mar 30, 2017
Member
This paragraph is going to look out of date real soon. I would go straight for:
Previously the workers used an HTTP long poll mechanism to get updates from the master, which had the problem of causing a lot of duplicate work on the server. This TCP protocol replaces those APIs with the aim of increased efficiency.
[or something]
| +-------- | ||
| + | ||
| +The protocol is based on fire and forget, line based commands. An example flow | ||
| +would be (where '>' indicates master->worker and '<' worker->master flows):: |
richvdh
Mar 30, 2017
Member
can you find some other way of writing "master->worker" so that github doesn't put a linebreak in the middle of "->"
| +The example shows the server accepting a new connection and sending its identity | ||
| +with the ``SERVER`` command, followed by the client asking to subscribe to the | ||
| +``events`` stream from the token ``53``. The server then periodically sends ``RDATA`` | ||
| +commands which have the format ``RDATA <stream_name> <token> <row>```, where the |
| + | ||
| +Blank lines are ignored. | ||
| + | ||
| + |
richvdh
Mar 30, 2017
Member
it would be nice to give a complete list of the commands here, with the command syntax, the direction of transmission, a quick summary and a reference to the section where it is explained in more detail.
| +Reliability | ||
| +~~~~~~~~~~~ | ||
| + | ||
| +In general the replication stream should be consisdered an unreliable transport |
| + updates.append(prev_state.copy_and_replace( | ||
| + last_user_sync_ts=time_now_ms, | ||
| + )) | ||
| + process_presence.discard(user_id) |
richvdh
Mar 30, 2017
Member
we'll hit this if is_syncing and user_id in process_presence. I think it's incorrect.
| + if updates: | ||
| + yield self._update_states(updates) | ||
| + | ||
| + self.external_process_last_updated_ms[process_id] = self.clock.time_msec() |
| + """Returns a deferred which resolves when there is new data for | ||
| + replication to handle. | ||
| + """ | ||
| + return self.replication_deferred.observe() |
| + > RDATA events 55 ["$foo4:bar.com", ...] | ||
| + | ||
| +The example shows the server accepting a new connection and sending its identity | ||
| +with the `SERVER` command, followed by the client asking to subscribe to the |
richvdh
Mar 30, 2017
Member
I'm not going to insist you go through and change them, but for future reference: AIUI docstrings are interpreted as RST and really ought to have double-backticks.
| + * resource.py - the server classes that accepts and handle client connections | ||
| + * streams.py - the definitons of all the valid streams | ||
| + | ||
| +Further details can be found in docs/tcp_replication.rst |
richvdh
Mar 30, 2017
Member
suggest moving this up a bit to where you describe the protocol. I nearly missed it here.
Tbh I'm not sure you actually need the description of the protocol in this file - just referring to the doc would be fine imho
| + connections if there are. | ||
| + | ||
| + This should get called each time new data is available, even if it | ||
| + is currently being executed, so that nothing gets missed |
richvdh
Mar 30, 2017
Member
+1 to registering a callback
-1 to looping until there are no changes. It will complicate this function and I don't think it will make anything clearer.
| + True then limit is provided, otherwise it's not. | ||
| + | ||
| + Returns: | ||
| + list(tuple): the first entry in the tuple is the token for that |
| + """Get all the pushers that have changed between the given tokens. | ||
| + | ||
| + Returns: | ||
| + list(tuple): each tuple consists of: |
richvdh
Mar 30, 2017
Member
No, it returns a Deferred, which resolves to that lot.
(I'd prefer it if we even put this on @defer.inlineCallbacks, but I won't insist there because it's "obvious". Here though, there is no clue that it actually returns a Deferred, rather than a list.)
richvdh
assigned
erikjohnston
and unassigned
richvdh
Mar 30, 2017
erikjohnston
added some commits
Mar 31, 2017
|
I think I've addressed all the issues raised. I also cheekily added a |
erikjohnston commentedMar 30, 2017
•
edited
This is the server side component of #2069, including the requested changes.
Docs rendered