Introduce Jepsen tests for the new CP Subsystem module of Hazelcast IMDG#325
Introduce Jepsen tests for the new CP Subsystem module of Hazelcast IMDG#325aphyr merged 2 commits intojepsen-io:masterfrom
Conversation
aphyr
left a comment
There was a problem hiding this comment.
Hey! Happy to merge this, but I've left some comments that might help improve the tests going forward.
| lockConfig1 (FencedLockConfig. "jepsen.cpLock1" 1) | ||
| lockConfig2 (FencedLockConfig. "jepsen.cpLock2" 2) | ||
|
|
||
| _ (.setLeaderElectionTimeoutInMillis raftAlgorithmConfig 1000) |
There was a problem hiding this comment.
Since you're not using any of these return values, and there's no bindings after them, you can move all of these expressions into the body of the let expression, and drop the _s.
hazelcast/src/jepsen/hazelcast.clj
Outdated
| (def pid-file (str dir "/server.pid")) | ||
| (def log-file (str dir "/server.log")) | ||
|
|
||
| (def CLIENT_ID_SEPARATOR "xxxxxxxxxx") |
There was a problem hiding this comment.
This feels a bit out of place for Clojure constants; consider (def client-id-separator ...)
hazelcast/src/jepsen/hazelcast.clj
Outdated
| (info err) | ||
| (info exit) | ||
| (assert (zero? exit))))) | ||
| ) |
There was a problem hiding this comment.
It looks like this might be mis-formatted--did you mean to take this back a level of indentation?
hazelcast/src/jepsen/hazelcast.clj
Outdated
| :cas (let [[currentV newV] (:value op)] | ||
| (if (.compareAndSet atomic-long currentV newV) | ||
| (assoc op :type :ok) | ||
| (assoc op :type :fail :error :cas-failed) |
There was a problem hiding this comment.
This isn't super important, but idomatically, you'd close off your parens at the end of this line, instead of scattering them onto later lines. You might want to do a pass through the source to clean these up; not critical, but it'll help you blend in with normal Clojure style. :)
hazelcast/src/jepsen/hazelcast.clj
Outdated
| (defrecord OwnerAwareMutex [owner] | ||
| Model | ||
| (step [this op] | ||
| (let [client (getClient op)] |
There was a problem hiding this comment.
Ditto here; I think you can get rid of external client IDs and use processes IDs instead.
hazelcast/src/jepsen/hazelcast.clj
Outdated
| (defrecord FencedMutex [owner lockFence prevOwner] | ||
| Model | ||
| (step [this op] | ||
| (let [client (getClient op) fence (getFence op)] |
hazelcast/src/jepsen/hazelcast.clj
Outdated
|
|
||
|
|
||
|
|
||
| (defrecord ReentrantFencedMutex [owner lockCount currentFence highestObservedFence highestObservedFenceOwner] |
There was a problem hiding this comment.
hooo boy, OK, you're on your own here haha
| :checker (checker/linearizable) | ||
| :model (model/mutex)} | ||
| :non-reentrant-cp-lock {:client (fenced-lock-client "jepsen.cpLock1") | ||
| :generator (->> [{:type :invoke, :f :acquire :value (.toString (UUID/randomUUID))} |
There was a problem hiding this comment.
Did you mean for every acquire to use the same value? Why choose a random value at all?
|
|
||
| round="1" | ||
|
|
||
| while [ ${round} -le ${repeat} ]; do |
There was a problem hiding this comment.
You can also use the built-in --test-count n option.
Thanks for your review! We will address them and will update the PR accordingly. |
Please see the README file for details Co-authored-by: Mehmet Dogan <mehmet@hazelcast.com> Co-authored-by: Ensar Basri Kahveci <basri@hazelcast.com>
* Follow clojure's style rules * Get rid of sleep calls and use gen/stagger instead * Track Hazelcast clients name between test clients and models via an atom map * Simplify the repeat script * No logging in models * Simplify logging and Jepsen op result creation in CP test clients * Other minor improvements based on the PR review comments Co-authored-by: Mehmet Dogan <mehmet@hazelcast.com> Co-authored-by: Ensar Basri Kahveci <basri@hazelcast.com>
|
Hi @aphyr Thanks for all your comments. They have been quite useful for improving quality of our test suite. I think we handled all of them now. You can see the description in the second commit for details. Would you like to have another look before merging the PR? Thanks in advance, |
|
Hey! I'm short on time and this kind of review takes a while, but skimming, this looks okay! Thanks for the update. :) |
CP Subsystem contains linearizable & distributed implementations of the java.util.concurrent APIs offered by Hazelcast. We have been testing it with a new test suite we wrote while developing the new system. We would like to submit these tests to the official Jepsen repo.
Javadoc for the CP Subsystem: https://docs.hazelcast.org/docs/3.12-BETA-1/javadoc/com/hazelcast/cp/CPSubsystem.html
Reference Manual for the CP Subsystem: https://docs.hazelcast.org/docs/3.12-BETA-1/manual/html-single/index.html#cp-subsystem
Our CP subsystem test suite is as follows:
Non-reentrant lock (
--workload non-reentrant-cp-lock): In this test case, we test if the newFencedLockdata structure behaves as a non-reentrant mutex, i.e., it can be held by a single endpoint at a time and only the lock holder endpoint can release it. Moreover, the lock cannot be acquired by the same endpoint reentrantly. It means that while a client holds the lock, it cannot acquire the lock again, without releasing the lock first.Reentrant lock (
--workload reentrant-cp-lock): We test if the newFencedLockdata structure behaves as a reentrant mutex. The lock instance can be held by a single endpoint at a time and only the lock holder endpoint can release it. Moreover, the current lock holder can reentrantly acquire the lock one more time. Reentrant lock acquire limit is 2 for this test.Non-reentrant fenced lock (
--workload non-reentrant-fenced-lock):FencedLockorders lock holders by a monotonic fencing token, which is incremented each time the lock switches from the free state to the held state. In this test case, we validate monotonicity of fencing tokens assigned to subsequent lock holders. Moreover, the lock cannot be acquired by the same endpoint reentrantly. It means that while a client holds the lock, it cannot acquire the lock again, without releasing the lock first.Reentrant fenced lock (
--workload reentrant-fenced-lock):FencedLockorders lock holders by a monotonic fencing token, which is incremented each time the lock switches from the free state to the held state. However, if the current lock holder acquires the lock reentrantly, it will get the same fencing token. Reentrant lock acquire limit is 2 for this test.Semaphore (
--workload semaphore): In this test, we initialize our new linearizableISemaphorewith 2 permits. Each client acquires and releases a permit in a loop and we validate permits are held by at most 2 clients at a time.Unique ID Generation with the new linearizable
IAtomicLong(--workload cp-id-gen-long): In this test, each client generates a unique long id by using a linearizable IAtomicLong instance and we validate uniqueness of generated ids.Compare-and-swap Register with the new linearizable
IAtomicLong(--workload cp-cas-long): In this test, clients randomly perform write and compare-and-swap operations.Compare-and-swap Register with the new linearizable
IAtomicReference(--workload cp-cas-reference): In this test, clients randomly perform write and compare-and-swap operations.Please see the README file for more details.
We are happy to answer any questions related to the new impls and tests.
Thanks in advance for your review and happy testing!
Co-authored-by: Mehmet Dogan mehmet@hazelcast.com
Co-authored-by: Ensar Basri Kahveci basri@hazelcast.com