response-2018-10-10
SohumB committed Oct 9, 2018
commit 42e8cfd
responses/7868-response-2018-10-10.md

This is the original paper on Lamport clocks! It's one of those really fascinating older
papers that seem to have a lot more license to discuss implications and internal
digressions than modern papers do.

> A system is distributed if the message transmission delay is not negligible compared to the
> time between events in a single process.

This is a really good definition of what a distributed system is. Actually—

> …a single process is defined to be a set of events with an a priori total ordering.

This is a _really_ good definition of a process.

I guess that's what happens when you're in a position to essentially lay out a lot of the
fundamental assumptions about a field.

From these foundations the paper builds up a partial-order-based framework for analysing
distributed systems (the "happened before" relation), and then moves on to Lamport clocks. A
Lamport clock is a system of logical clocks (one per process) that respects this partial order
on events: i.e., it strictly increases across a single process's events, and from the sending
of a message to its receipt by another process.
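In the paper's notation, $a \to b$ means "a happened before b". It is the smallest relation satisfying the three conditions on the first two lines below. $C\langle a\rangle$ is the clock value assigned to event $a$, and $C_i$ is the clock of process $P_i$. This is transcribed from my reading, so it's worth checking against the original:

$$
\begin{aligned}
&\text{(1) } a \text{ comes before } b \text{ in the same process} \implies a \to b; \quad \text{(2) } a \text{ sends a message, } b \text{ receives it} \implies a \to b;\\
&\text{(3) } a \to c \text{ and } c \to b \implies a \to b.\\
&\text{Clock Condition: } a \to b \implies C\langle a\rangle < C\langle b\rangle.\\
&\text{C1: } a \text{ comes before } b \text{ in } P_i \implies C_i\langle a\rangle < C_i\langle b\rangle.\\
&\text{C2: } a \text{ is the sending of } m \text{ by } P_i,\ b \text{ its receipt by } P_j \implies C_i\langle a\rangle < C_j\langle b\rangle.
\end{aligned}
$$

C1 and C2 together imply the Clock Condition. The converse does not hold. Two concurrent events ($a \not\to b$ and $b \not\to a$) still receive comparable clock values.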

This is fairly easy to implement if every message carries a logical timestamp. Each process
maintains its own local clock (incrementing it between events) and, upon receiving a message,
advances its clock past the message's timestamp if necessary.
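Here is a minimal Python sketch of that scheme. It assumes integer counters and treats a "message" as nothing more than the sender's timestamp. `LamportProcess` and its methods are hypothetical names for illustration, not anything from the paper:

```python
from dataclasses import dataclass


@dataclass
class LamportProcess:
    """One process with its own logical clock (a plain integer counter)."""
    pid: int
    clock: int = 0

    def local_event(self) -> int:
        # Tick the clock for every event, so successive events in this
        # process get strictly increasing timestamps.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is itself an event; the returned value is the timestamp
        # that travels with the message.
        return self.local_event()

    def receive(self, msg_timestamp: int) -> int:
        # Catch up with the sender if we are behind, then tick for the
        # receive event itself, so the receipt is later than the send.
        self.clock = max(self.clock, msg_timestamp)
        return self.local_event()


# Usage: p sends to q, whose clock is behind.
p, q = LamportProcess(pid=0), LamportProcess(pid=1)
p.local_event()  # p.clock is now 1
t = p.send()     # t == 2
q.local_event()  # q.clock is now 1
q.receive(t)     # q.clock jumps to 3, later than the send at 2
```

Taking the max and then ticking is one way to satisfy "later than the message's timestamp". Any strictly larger value would also meet the clock condition.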

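Lamport argues that the total ordering on events induced by these clocks (with ties between
equal timestamps broken by an arbitrary ordering of the processes) is valuable for many kinds
of distributed systems. And indeed it is, though of course, as we mentioned in class, it
doesn't quite correctly capture actual causality in a distributed system: $C(a) < C(b)$ does
not imply that $a$ happened before $b$.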
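A small sketch of that induced total order, and of why it isn't causality. It assumes events already carry their clock values, and uses process ids as the arbitrary tie-breaking order. `Event` and `total_order_key` are hypothetical names:

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int  # Lamport clock value assigned to the event
    pid: int        # process the event occurred on; doubles as the tie-breaker
    label: str


def total_order_key(e: Event) -> tuple:
    # Order by clock value, breaking ties by process id; the ids stand in for
    # Lamport's arbitrary total ordering of the processes.
    return (e.timestamp, e.pid)


# Two processes that never exchange messages, so no event on p happened before
# any event on q or vice versa: every cross-process pair is concurrent.
p_events = [Event(1, 0, "p1"), Event(2, 0, "p2"), Event(3, 0, "p3")]
q_events = [Event(1, 1, "q1")]

ordered = sorted(p_events + q_events, key=total_order_key)
print([e.label for e in ordered])  # ['p1', 'q1', 'p2', 'p3']
```

The result is a perfectly consistent total order. Still, it places `q1` before `p2` and `p3`, and gives it a smaller timestamp, even though nothing causally connects them.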

Lamport then extends this to a scheme for synchronising physical (wallclock) time across
multiple machines. It's the same idea: when you receive a message, if its timestamp plus the
minimum expected transmission delay is later than what your own clock reads, move your clock
forward to match. Apparently this really does keep the clocks in sync, within a bound the paper
derives; the proof is rather nontrivial, and I'm not going to try to parse its subtleties.
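This is the physical-clock version of the rule as I read it in the paper. $T_m$ is the timestamp carried by message $m$, and $\mu_m \ge 0$ is a lower bound on its transmission delay that the receiver knows in advance. Again, worth double-checking against the original:

$$
\begin{aligned}
&\text{IR2}'\text{(a): if } P_i \text{ sends } m \text{ at physical time } t, \text{ then } T_m = C_i(t);\\
&\text{IR2}'\text{(b): on receiving } m \text{ at time } t',\ P_j \text{ sets } C_j(t') = \max\bigl(C_j(t'-0),\ T_m + \mu_m\bigr).
\end{aligned}
$$

The new value is a max with the current reading $C_j(t'-0)$. So a clock can jump forward on receipt, but it is never set backwards.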

I will, however, raise a question: this scheme doesn't keep the distributed system in sync
with _actual time_, right? Lamport needs the requirement that clocks never be set
backwards for Very Understandable Reasons, but surely this means that the system will
synchronise itself to the fastest clock in the network, right?
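To poke at that question, here is a toy simulation of my own, not from the paper. Three clocks run at fixed rates, the actual delay is fixed, and every process periodically sends its reading to all the others. Receivers apply the $\max(C_j, T_m + \mu_m)$ rule. The function `simulate` and all its parameters are hypothetical choices for illustration:

```python
def simulate(rates, ticks, period=10, delay=2, mu=1.0):
    """Toy model: return every clock's reading after `ticks` units of real time.

    rates[i] -- how far clock i advances per unit of real time
    period   -- every `period` ticks, each process sends its reading to all others
    delay    -- actual transmission delay, in ticks
    mu       -- receiver's assumed minimum delay (kept below the actual delay)
    """
    clocks = [0.0] * len(rates)
    in_flight = []  # (arrival_tick, receiver, timestamp T_m)
    for now in range(1, ticks + 1):
        # Between messages, each clock simply runs at its own (possibly wrong) rate.
        for i, rate in enumerate(rates):
            clocks[i] += rate
        # Deliver messages arriving now with max(local reading, T_m + mu):
        # a clock can jump forward but is never set back.
        pending = []
        for arrival, j, stamp in in_flight:
            if arrival == now:
                clocks[j] = max(clocks[j], stamp + mu)
            else:
                pending.append((arrival, j, stamp))
        in_flight = pending
        # Periodically, every process timestamps a message to every other process.
        if now % period == 0:
            for i in range(len(rates)):
                for j in range(len(rates)):
                    if i != j:
                        in_flight.append((now + delay, j, clocks[i]))
    return clocks


# One accurate clock, one 1% fast, one 1% slow; 1000 units of real time.
synced = simulate([1.0, 1.01, 0.99], ticks=1000)
unsynced = simulate([1.0, 1.01, 0.99], ticks=1000, period=10**9)  # no messages sent
print([round(c, 1) for c in synced])
print([round(c, 1) for c in unsynced])  # [1000.0, 1010.0, 990.0]
```

Under these assumptions, the accurate and slow clocks end up just behind the fast one, all between roughly 1008 and 1010. That puts every clock ahead of the real time of 1000, rather than letting them spread out to 990/1000/1010. This is consistent with the worry, at least in this toy setting.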

I'd also like to have a look at what the state of the art was at the time for cooperative
protocols that handled failure/latency. Byzantine protocols I assume come much later, but
it would be interesting to discuss their “implementation of reliable multiprocess systems”
and how it differs from state of the art today.
