Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Jepsen tests for the new CP Subsystem module of Hazelcast IMDG #325

Merged
merged 2 commits into from Mar 4, 2019

Conversation

Projects
None yet
2 participants
@metanet
Copy link
Contributor

commented Feb 28, 2019

CP Subsystem contains linearizable & distributed implementations of the java.util.concurrent APIs offered by Hazelcast. We have been testing it with a new test suite we wrote while developing the new system. We would like to submit these tests to the official Jepsen repo.

Javadoc for the CP Subsystem: https://docs.hazelcast.org/docs/3.12-BETA-1/javadoc/com/hazelcast/cp/CPSubsystem.html

Reference Manual for the CP Subsystem: https://docs.hazelcast.org/docs/3.12-BETA-1/manual/html-single/index.html#cp-subsystem

Our CP subsystem test suite is as follows:

  • Non-reentrant lock (--workload non-reentrant-cp-lock): In this test case, we test if the new FencedLock data structure behaves as a non-reentrant mutex, i.e., it can be held by a single endpoint at a time and only the lock holder endpoint can release it. Moreover, the lock cannot be acquired by the same endpoint reentrantly. It means that while a client holds the lock, it cannot acquire the lock again, without releasing the lock first.

  • Reentrant lock (--workload reentrant-cp-lock): We test if the new FencedLock data structure behaves as a reentrant mutex. The lock instance can be held by a single endpoint at a time and only the lock holder endpoint can release it. Moreover, the current lock holder can reentrantly acquire the lock one more time. Reentrant lock acquire limit is 2 for this test.

  • Non-reentrant fenced lock (--workload non-reentrant-fenced-lock): FencedLock orders lock holders by a monotonic fencing token, which is incremented each time the lock switches from the free state to the held state. In this test case, we validate monotonicity of fencing tokens assigned to subsequent lock holders. Moreover, the lock cannot be acquired by the same endpoint reentrantly. It means that while a client holds the lock, it cannot acquire the lock again, without releasing the lock first.

  • Reentrant fenced lock (--workload reentrant-fenced-lock): FencedLock orders lock holders by a monotonic fencing token, which is incremented each time the lock switches from the free state to the held state. However, if the current lock holder acquires the lock reentrantly, it will get the same fencing token. Reentrant lock acquire limit is 2 for this test.

  • Semaphore (--workload semaphore): In this test, we initialize our new linearizable ISemaphore with 2 permits. Each client acquires and releases a permit in a loop and we validate permits are held by at most 2 clients at a time.

  • Unique ID Generation with the new linearizable IAtomicLong (--workload cp-id-gen-long): In this test, each client generates a unique long id by using a linearizable IAtomicLong instance and we validate uniqueness of generated ids.

  • Compare-and-swap Register with the new linearizable IAtomicLong (--workload cp-cas-long): In this test, clients randomly perform write and compare-and-swap operations.

  • Compare-and-swap Register with the new linearizable IAtomicReference (--workload cp-cas-reference): In this test, clients randomly perform write and compare-and-swap operations.

Please see the README file for more details.

We are happy to answer any questions related to the new impls and tests.

Thanks in advance for your review and happy testing!

Co-authored-by: Mehmet Dogan mehmet@hazelcast.com
Co-authored-by: Ensar Basri Kahveci basri@hazelcast.com

@aphyr

aphyr approved these changes Feb 28, 2019

Copy link
Collaborator

left a comment

Hey! Happy to merge this, but I've left some comments that might help improve the tests going forward.

lockConfig1 (FencedLockConfig. "jepsen.cpLock1" 1)
lockConfig2 (FencedLockConfig. "jepsen.cpLock2" 2)

_ (.setLeaderElectionTimeoutInMillis raftAlgorithmConfig 1000)

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

Since you're not using any of these return values, and there's no bindings after them, you can move all of these expressions into the body of the let expression, and drop the _s.

@@ -48,38 +50,44 @@
(def pid-file (str dir "/server.pid"))
(def log-file (str dir "/server.log"))

(def CLIENT_ID_SEPARATOR "xxxxxxxxxx")

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

This feels a bit out of place for Clojure constants; consider (def client-id-separator ...)

(info err)
(info exit)
(assert (zero? exit)))))
)

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

It looks like this might be mis-formatted--did you mean to take this back a level of indentation?

Show resolved Hide resolved hazelcast/src/jepsen/hazelcast.clj
:cas (let [[currentV newV] (:value op)]
(if (.compareAndSet atomic-long currentV newV)
(assoc op :type :ok)
(assoc op :type :fail :error :cas-failed)

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

This isn't super important, but idomatically, you'd close off your parens at the end of this line, instead of scattering them onto later lines. You might want to do a pass through the source to clean these up; not critical, but it'll help you blend in with normal Clojure style. :)

(defrecord OwnerAwareMutex [owner]
Model
(step [this op]
(let [client (getClient op)]

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

Ditto here; I think you can get rid of external client IDs and use processes IDs instead.

(defrecord FencedMutex [owner lockFence prevOwner]
Model
(step [this op]
(let [client (getClient op) fence (getFence op)]

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

Again here.




(defrecord ReentrantFencedMutex [owner lockCount currentFence highestObservedFence highestObservedFenceOwner]

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

hooo boy, OK, you're on your own here haha

:checker (checker/linearizable)
:model (model/mutex)}
:non-reentrant-cp-lock {:client (fenced-lock-client "jepsen.cpLock1")
:generator (->> [{:type :invoke, :f :acquire :value (.toString (UUID/randomUUID))}

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

Did you mean for every acquire to use the same value? Why choose a random value at all?


round="1"

while [ ${round} -le ${repeat} ]; do

This comment has been minimized.

Copy link
@aphyr

aphyr Feb 28, 2019

Collaborator

You can also use the built-in --test-count n option.

@metanet

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2019

Hey! Happy to merge this, but I've left some comments that might help improve the tests going forward.

Thanks for your review! We will address them and will update the PR accordingly.

@metanet metanet force-pushed the hazelcast:cp-subsystem branch from 3613e3e to 2aa62e5 Mar 4, 2019

metanet and others added some commits Feb 27, 2019

Introduce Jepsen tests for the new CP Subsystem module of Hazelcast IMDG
Please see the README file for details

Co-authored-by: Mehmet Dogan <mehmet@hazelcast.com>
Co-authored-by: Ensar Basri Kahveci <basri@hazelcast.com>
Handle the PR review comments
* Follow clojure's style rules
* Get rid of sleep calls and use gen/stagger instead
* Track Hazelcast clients name between test clients and models via an atom map
* Simplify the repeat script
* No logging in models
* Simplify logging and Jepsen op result creation in CP test clients
* Other minor improvements based on the PR review comments

Co-authored-by: Mehmet Dogan <mehmet@hazelcast.com>
Co-authored-by: Ensar Basri Kahveci <basri@hazelcast.com>

@metanet metanet force-pushed the hazelcast:cp-subsystem branch from 2aa62e5 to 6b2d9ef Mar 4, 2019

@metanet

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

Hi @aphyr

Thanks for all your comments. They have been quite useful for improving quality of our test suite. I think we handled all of them now. You can see the description in the second commit for details.

Would you like to have another look before merging the PR?

Thanks in advance,

@aphyr

This comment has been minimized.

Copy link
Collaborator

commented Mar 4, 2019

Hey! I'm short on time and this kind of review takes a while, but skimming, this looks okay! Thanks for the update. :)

@aphyr aphyr merged commit 2d7e042 into jepsen-io:master Mar 4, 2019

1 check passed

Travis CI - Pull Request Build Passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.