Potential problems with distributed lock manager (DLM) #1
Hi @stIncMale. First, I would like to thank you for the thread and for the amazing comment! I'm using Redis as a DLM to avoid two bookings for the same room on the same days. So, let's imagine this situation: Request 1 books room A from 01/01/2020 to 10/01/2020, and Request 2 books room A from 08/01/2020 to 15/01/2020. Now let's imagine that we receive those requests in parallel at the same time. The lock will reject one of those requests, since the method responsible for booking a room validates that the room is available. To handle the locks I'm using a lib created by @alturkovic, and it is working fine, as you can check using this JMeter file. Do you think it is possible to reproduce the GC error? Maybe we can use some chaos engineering tool to do it, like Perses (a tool that I co-created with two friends).
I have a suspicion that bookings are stored in a single ACID DBMS. If this is not the case, i.e. bookings are not stored in an ACID DBMS, or sharding is used in such a way that bookings for the same room are stored in different shards (I would be highly surprised to discover that this is the case), then could you please describe the booking storage?
So, we have only a single node that handles writes and stores all the bookings. We also have a replica of this DB, but only for read-only purposes. About your suggestion: are you talking about handling this lock inside Postgres instead of Redis? If so, in that case Postgres will be my DLM and I'll have the same issue, correct?
Yes, you may be able to do this with just the guarantees given to you by the A (atomicity) and I (isolation) properties (see https://www.postgresql.org/docs/current/transaction-iso.html), without resorting to explicit locking (though if you want to, PostgreSQL has advisory locks).
No, your single writable PG node is going to act as a centralized locking manager (as opposed to a distributed one), which is not only much less likely to have correctness issues but also is combined with the data store, meaning that the asynchrony of the network cannot cause troubles. Room booking is one of the most basic examples of what an ACID DBMS can help to solve easily. And since you already store bookings there, there is no reason whatsoever to control concurrency by means of any external system.
Yes, I agree that it is possible to get this behavior using, for example, the SERIALIZABLE isolation level in the database.
If you don't want transactions to fail as a result of concurrent requests (this is a downside of MVCC), you can use explicit PostgreSQL advisory locks. A centralized LM implemented by the DBMS that stores the booking data is enough here, because there is no need for an LM to be available when the DBMS is not available (who needs a lock that protects a resource that is unavailable anyway?). I see downsides, and do not see any upside, of using a DLM for concurrency control in a single instance of an ACID DBMS.
Good point @stIncMale! The lib that I'm using already supports this kind of lock, so I'll create another branch and test a new implementation using Postgres to handle the locks. But I really want to continue the discussion around the Redlock issue! Maybe we can come up with a scenario where it makes sense to use a DLM. What do you think?
Just make sure that the locks you acquire (whether at session level or transaction level) are acquired in the same session/transaction that does the booking (the availability check and the write).
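To see why the lock must span both the check and the write, here is a minimal in-process Python sketch (this is not the project's code, and the per-room lock is a stand-in for a PostgreSQL advisory lock acquired in the booking transaction): if the lock were released between the availability check and the insert, a concurrent request could slip in and double-book.

```python
# Toy model: the lock covers the WHOLE check-then-write sequence, mirroring
# a PostgreSQL advisory lock held by the same transaction that books the room.
import threading

bookings = []                           # (room, start_day, end_day) tuples
room_locks = {"A": threading.Lock()}    # one lock per room (illustrative)

def overlaps(room, start, end):
    # True if any existing booking for this room intersects [start, end].
    return any(r == room and s <= end and start <= e for r, s, e in bookings)

def book(room, start, end):
    with room_locks[room]:              # held across check AND write
        if overlaps(room, start, end):
            return False                # conflicting booking already exists
        bookings.append((room, start, end))
        return True

# Two overlapping requests arriving in parallel, as in the example above.
threads = [
    threading.Thread(target=book, args=("A", 1, 10)),
    threading.Thread(target=book, args=("A", 8, 15)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(bookings))  # exactly one of the two overlapping requests succeeds
```

Whichever thread wins the lock books the room; the other observes the overlap inside the same critical section and is rejected.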
It's not so much about Redlock specifically as it is about the feasibility of using a DLM for correctness in general.
Well, I probably do not have enough imagination/experience to come up with one. I tried asking Martin in the comments under his article, and tried connecting with him via LinkedIn, with no luck. I'll try asking Kyle Kingsbury, a.k.a. "Aphyr", what he thinks about this topic (he is the author of Jepsen), and I'll try asking a question on Stack Overflow (if it fits the format).
@toktarev, thank you very much for your contribution! I don't have experience with consensus protocols, but I think ZooKeeper is the most used consensus system that we have today (Kafka, Hadoop, Mesos, Cassandra, Solr, etc. are using ZooKeeper). So, your suggestion for this case is: change Redis to ZooKeeper, correct? And about your note: I totally agree, but I cannot see another option here.
I added the distributed lock to handle this situation: #1 (comment)
If I do this I will ignore the load balancer, and I cannot see how it will solve the problem. Can you explain this, please?
Hmm, I'll try to implement something like this to validate the idea. But I cannot see how to handle the requests inside my bucket. Do you have an example?
About the GC problem, I have a point! When we talk about Java 11+, we can use ZGC as the garbage collector, and GC pauses will not be a problem anymore, since those pauses are limited to 10 ms.
To be honest, I don't like this approach. It will limit our ability to scale.
I liked the approach of using ZooKeeper and I'll create a branch with this change! Again, thank you very much for your contribution, @toktarev!
So, for now we have five ways to handle this distributed lock:
1 - Using Redis as a DLM
2 - Using ZooKeeper as a DLM
3 - Using the isolation level of a relational ACID DB like Postgres
4 - Configuring the load balancer to send the requests for the same room to one instance and synchronizing the code
5 - Using a message broker and sending the messages to only one consumer, since the default behavior is FIFO
Now it would be really nice to have the pros and cons of each approach.
@toktarev amazing!!!!! So, now we have six ideas:
1 - Using Redis as a DLM
2 - Using ZooKeeper as a DLM
3 - Using the isolation level of a relational DB like Postgres
4 - Configuring the load balancer to send the requests for the same room to one instance and synchronizing the code
5 - Using a message broker and sending the messages to only one consumer, since the default behavior is FIFO
6 - Lock-free with Hazelcast or Apache Flink
And I really liked this last option! I agree with what you say. Can you help me implement this strategy in this scenario? I'm working on a branch with ZooKeeper, and I really think that with all the options implemented we can have a great sandbox! Again, thank you very much for bringing amazing ideas to this issue!
No problem! I'll implement all the options, and when I finish I'll tag you in the PR!
Alex, there are no multiple nodes here; there is a single writable PostgreSQL instance that stores bookings (see #1 (comment)). So essentially the system is not distributed (in fact, this is a very primitive situation). Using any way of handling booking conflicts external to the DBMS in this specific situation only adds complexity and extra network communication without improving correctness.
I suspect that the reason this was not implemented via the PostgreSQL instance that the booking data is written to is unfamiliarity with basic ACID DBMS functionality.
Júlio, what do you mean by the "GC problem"? If you are referring to the example that Martin mentioned to illustrate how an arbitrary delay may happen in a non-hard-realtime system, then this is not the point you need to pay attention to. The actual point is that we develop algorithms and software for systems that do not have hard realtime guarantees (both hardware and software) and networks that are asynchronous. This means that you cannot make any assumptions about timing when reasoning about correctness of an algorithm.
The point I made is not about a specific DLM implementation, it is about how DLM must be used to actually guarantee mutual exclusion and whether such usage makes practical sense.
There is a single PostgreSQL instance accepting writes in the application, which makes it an obvious point that cannot be scaled horizontally (and also a SPOF). Therefore the booking functionality cannot be scaled horizontally regardless of the way conflicts resulting from concurrent requests are handled.
I think I misunderstood; when I read "node" I was thinking about the service node, not the Postgres node.
About this: it could be! But I agree, maybe there is a better way to do this using ACID DBMS functionality. Can you help me implement it?
@toktarev, here is a diagram of the architecture: https://github.com/juliofalbo/complete-microservices-env/blob/master/HotelBookingSystemArchitecture.png?raw=true I think the solution is simpler than what we are discussing here, but the goal is to provide a great discussion with possible solutions for the community.
If I understood @stIncMale and @toktarev correctly, both point out that relying on the RDBMS's default transaction and concurrency mechanism (MVCC) would solve 99% of the possible problems, including the room booking example. Please, guys, correct me if I'm wrong. A common pattern IMO would be to let the DB handle the concurrency itself and scale horizontally (adding read replicas) in the case of a read-heavy application, and this would probably already be reliable and safe enough for the huge majority of possible scenarios.
I was reading about MVCC on Postgres and I found this explanation:
Reference: https://www.postgresql.org/docs/9.1/mvcc-intro.html Another nice article about MVCC: https://vladmihalcea.com/how-does-mvcc-multi-version-concurrency-control-work/ So, I think it is a good idea to implement this approach, but my question is: how can I implement this using JPA (Spring Data)?
If you are using the serializable transaction isolation level in PostgreSQL, then you do not need to explicitly lock anything at all. The specific DBMS implementation is important, as not all of them implement serializability correctly; e.g. Oracle provides snapshot isolation (the same isolation level that PostgreSQL provides at repeatable read) instead of serializable, but calls it serializable. In fact, using the serializable isolation level to correctly implement booking is a no-brainer in this system. And because this approach is the simplest one, I suggest using it. Once it is successfully implemented, and only if you find performance problems with it, would I suggest looking at other approaches (like using relaxed isolation levels, potentially with explicit locks in PostgreSQL, either directly or via the API that your ORM provides).
Just do all booking actions (checking that a room is available during the requested period and marking it as unavailable) related to a single booking request in a serializable transaction. And if the transaction fails with sqlstate 40001 (serialization failure in PG), then retry it again (no changes needed, just do exactly the same actions in a new transaction again). Of course, the retry must be transparent to a user and thus should be done by the same application that started the failed transaction and not by a user that requested booking.
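The retry loop described above can be sketched as follows. This is an illustrative Python sketch, not the project's code: `SerializationFailure` and `run_booking_transaction` are stand-ins for a driver exception carrying sqlstate 40001 and for the real booking transaction.

```python
# Sketch: transparently retry a serializable transaction that fails with
# a serialization failure (sqlstate 40001 in PostgreSQL).
class SerializationFailure(Exception):
    """Stand-in for a driver error carrying sqlstate 40001."""

def with_serializable_retry(txn, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            # Re-run exactly the same actions in a brand-new transaction.
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller

# Demo: a fake transaction that fails twice with 40001 before committing.
attempts = {"n": 0}
def run_booking_transaction():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "booked"

print(with_serializable_retry(run_booking_transaction))  # booked
print(attempts["n"])                                     # 3
```

The key point, matching the comment above, is that the retry happens inside the application, so the user who requested the booking never sees the transient failure.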
Whether you are using JPA, or JDBC, or any other way of accessing PostgreSQL (see https://www.postgresql.org/docs/current/external-interfaces.html), the answer is the same as mentioned above.
I was talking in general and about a different thing when I filed this issue, but the discussion turned out to be almost completely unrelated to what I was saying in the issue, and it is rather about the booking implementation in this project. I am saying that in this specific system where all writes caused by booking go to a single PostgreSQL instance, there is no reason to not use the concurrency control provided by this instance and instead invent anything else. I am also saying (again, for this specific situation) that any other approach to handle concurrency control would be much more complicated and would involve additional network communications that could only result in higher latencies or lower throughput of processing booking requests.
I agree completely provided that by "it" you mean the writable instance of PostgreSQL, and I proposed exactly this in the very beginning of this discussion (#1 (comment)) and have been advocating for this approach ever since. It is good that you emphasized that only reads can be horizontally scaled in this system. As a general note, serializable (and linearizable for that matter) read-write operations cannot be scaled horizontally in principle, and cannot be made CAP-available. This is the sole reason why databases with more relaxed consistency models exist.
Generally speaking, locks may be used for two different purposes: efficiency (avoiding needlessly doing the same work multiple times) and correctness (preventing concurrent processes from violating an invariant of the system).
Here I am going to talk only about DLMs used to achieve correctness in an asynchronous model. This means that safety properties of algorithms cannot rely on any timing assumptions; liveness properties still can rely on such assumptions; see [1]. All real networks are as poor as the asynchronous model (unless you are Google, which built physical infrastructure for its Cloud Spanner to be able to guarantee, for example, that the clock skew on different nodes is limited by a known bound; see "Spanner, TrueTime and the CAP Theorem" by Eric Brewer, and "Spanner: Google's Globally-Distributed Database"). Moreover, I will hardly say anything new compared to what has already been said in the sources I refer to below as [1], [2], and [3].
The idea of DLM, as opposed to a centralized locking manager (which is easy to implement), is to provide a lock manager that is more available than a lock manager with a single point of failure (SPOF). This means that a DLM must be able to release a lock that is being held by an unresponsive process. If a DLM does not have such a feature, then a stuck DLM client that fails to release its lock in a timely manner effectively renders the whole DLM unavailable (all of this is mentioned in [2]). In order to provide this DLM-side lock release feature, the majority (all?) of DLMs introduce lock expiry.
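This expiry feature is exactly what undermines mutual exclusion, as the next paragraph explains. A toy Python illustration (no real DLM here; `ExpiringLock` is a deliberately simplistic stand-in, with time passed in explicitly to keep the demo deterministic):

```python
# Toy model of a DLM lock with expiry: client 1 acquires the lock, then
# stalls past the TTL (e.g. a long GC pause), the "DLM" re-grants the lock
# to client 2, and afterwards BOTH clients perform their writes.
class ExpiringLock:
    def __init__(self, ttl):
        self.ttl = ttl            # lock expiry, in "seconds"
        self.owner = None
        self.acquired_at = 0.0

    def try_acquire(self, client, now):
        # Grant the lock if it is free OR the current holder's lease expired.
        if self.owner is None or now - self.acquired_at > self.ttl:
            self.owner, self.acquired_at = client, now
            return True
        return False

lock = ExpiringLock(ttl=1.0)
writes = []

assert lock.try_acquire("client-1", now=0.0)  # client 1 holds the lock
# ...client 1 stalls for 2 "seconds", longer than the TTL...
assert lock.try_acquire("client-2", now=2.0)  # DLM re-grants the expired lock
writes.append("client-2")
writes.append("client-1")  # the stalled client resumes and writes anyway
print(writes)              # both clients wrote: mutual exclusion violated
```

The stalled client has no way of knowing its lease expired before its write lands, which is why the protected resource itself must get involved (e.g. via fencing tokens, discussed next).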
The problem with expiring locks is that as soon as we use them, the DLM alone cannot guarantee mutual exclusion anymore, and the resource protected by a lock has to actively participate so that mutual exclusion is preserved. Both [1] and [3] propose using fencing tokens as a solution: together with acquiring a lock from a DLM, a DLM client also receives a unique fencing token that it sends to the resource protected by the lock together with a request. The protected resource handles the request only if the fencing token is greater than the last token it has seen (for monotonically increasing tokens), or has never been seen before (for unique random tokens); otherwise the request is rejected.
It appears that the simplest way of achieving this is to use always-increasing token values (as mentioned in [1]), but [2] also mentions that sufficiently large random tokens are fine to guarantee uniqueness (with which I agree, but with this approach the protected resource has to remember and search through all the values it has ever seen, as opposed to remembering only the last value).
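The monotonically-increasing variant can be sketched in a few lines of Python (`FencedResource` and the token values are illustrative, not from any real DLM):

```python
# Sketch of a protected resource enforcing monotonically increasing
# fencing tokens: a request carrying a stale token is rejected, because
# it must come from a client whose lock has already expired.
class FencedResource:
    def __init__(self):
        self.last_token = 0  # highest fencing token handled so far
        self.log = []        # requests actually applied

    def handle(self, token, request):
        if token <= self.last_token:
            return False     # stale client: its lock expired meanwhile
        self.last_token = token
        self.log.append(request)
        return True

res = FencedResource()
print(res.handle(33, "write A"))  # True: fresh token
print(res.handle(34, "write B"))  # True: newer token
print(res.handle(33, "write C"))  # False: token 33 is now stale
```

Note that the comparison and the update of `last_token` must themselves be atomic in a real resource, which is precisely the "active participation" burden described above.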
Implementing this technique requires the following:
Let's look at the situation again. In order to correctly use DLM to exclude concurrent access to a resource, we must:
It appears to me that once we have the first two conditions satisfied, we could have programmed the correct handling of concurrent requests in the resource, which would be simpler and more robust than correctly achieving the same with a DLM.
As a result of everything I wrote above, I think that using a DLM for correctness is a hypothetical idea with little to no practical value. I expect that in the majority of situations where a DLM is used for correctness, the solution is incorrect but appears to work most of the time. [3] mentions a specific bug in Apache HBase caused by exactly the scenario that fencing tokens are supposed to help deal with. And at https://aphyr.com/tags/jepsen one may find many stories of how distributed DBs often miserably fail to work correctly.
It would be interesting to see/discuss scenarios where a DLM can be applied correctly and at the same time its use is necessary. I am not talking here about the coordination mechanisms used inside a distributed DBMS (i.e. for accessing the state stored in it), like Apache ZooKeeper or Cassandra (for LWTs), but rather about situations when a DLM implemented in one system is used by another system for correctness. @juliofalbo May I ask you to please describe in detail a scenario in which a DLM is used in this project?
Update 2021-Feb-10: I posted a comment a while ago about the practical usability of the fencing approach for DLM explained in "How to do distributed locking" by Martin Kleppmann. Recently someone else posted another comment expressing the same concern, and Martin replied (I don't want to copy-paste the text of the concerns and of the reply here, so I linked all the relevant comments and the reply). Essentially, while trying to come up with a correct DLM algorithm, we may only end up having one more consensus algorithm.