Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test crashed when failure with :type :info #485

Closed
yito88 opened this issue Aug 20, 2020 · 3 comments
Closed

Test crashed when failure with :type :info #485

yito88 opened this issue Aug 20, 2020 · 3 comments

Comments

@yito88
Copy link

yito88 commented Aug 20, 2020

I got an error from independent concurrent-generator when I'm migrating my tests from Jepsen 0.1.18 to 0.2.0.

WARN [2020-08-20 09:05:46,585] main - jepsen.core Test crashed!
clojure.lang.ExceptionInfo: Generator threw  #error {
 :cause Key must be integer
 :via
 [{:type java.lang.IllegalArgumentException
   :message Key must be integer
   :at [clojure.lang.APersistentVector assoc APersistentVector.java 347]}]
 :trace
 [[clojure.lang.APersistentVector assoc APersistentVector.java 347]
  [clojure.lang.APersistentVector assoc APersistentVector.java 18]
  [clojure.lang.RT assoc RT.java 827]
  [clojure.core$assoc__5416 invokeStatic core.clj 191]
  [clojure.core$update invokeStatic core.clj 6202]
  [clojure.core$update invoke core.clj 6188]
  [jepsen.independent.ConcurrentGenerator update independent.clj 210]
  [jepsen.generator.OnThreads update generator.clj 851]
  [jepsen.generator.OnThreads update generator.clj 851]
  [jepsen.generator.Any$updater__3853 invoke generator.clj 923]
  [clojure.core$mapv$fn__8445 invoke core.clj 6912]
  [clojure.lang.PersistentVector reduce PersistentVector.java 343]
  [clojure.core$reduce invokeStatic core.clj 6827]
  [clojure.core$mapv invokeStatic core.clj 6903]
  [clojure.core$mapv invoke core.clj 6903]
  [jepsen.generator.Any update generator.clj 923]
  [jepsen.generator.TimeLimit update generator.clj 1231]
  [jepsen.generator.FriendlyExceptions update generator.clj 675]
  [jepsen.generator.Validate update generator.clj 648]
  [jepsen.generator.interpreter$run_BANG_ invokeStatic interpreter.clj 232]
  [jepsen.generator.interpreter$run_BANG_ invoke interpreter.clj 180]
  [jepsen.core$run_case_BANG_ invokeStatic core.clj 220]
  [jepsen.core$run_case_BANG_ invoke core.clj 212]
  [jepsen.core$run_BANG_$fn__9722$fn__9729 invoke core.clj 345]
  [jepsen.core$run_BANG_$fn__9722 invoke core.clj 342]
  [jepsen.core$run_BANG_ invokeStatic core.clj 328]
  [jepsen.core$run_BANG_ invoke core.clj 254]
  [cassandra.runner$test_cmd$fn__2816 invoke runner.clj 108]
  [jepsen.cli$run_BANG_ invokeStatic cli.clj 316]
  [jepsen.cli$run_BANG_ invoke cli.clj 246]
  [cassandra.runner$_main invokeStatic runner.clj 120]
  [cassandra.runner$_main doInvoke runner.clj 118]
  [clojure.lang.RestFn invoke RestFn.java 703]
  [clojure.lang.Var invoke Var.java 447]
  [user$eval140 invokeStatic form-init4635373681402498005.clj 1]
  [user$eval140 invoke form-init4635373681402498005.clj 1]
  [clojure.lang.Compiler eval Compiler.java 7177]
  [clojure.lang.Compiler eval Compiler.java 7167]
  [clojure.lang.Compiler load Compiler.java 7636]
  [clojure.lang.Compiler loadFile Compiler.java 7574]
  [clojure.main$load_script invokeStatic main.clj 475]
  [clojure.main$init_opt invokeStatic main.clj 477]
  [clojure.main$init_opt invoke main.clj 477]
  [clojure.main$initialize invokeStatic main.clj 508]
  [clojure.main$null_opt invokeStatic main.clj 542]
  [clojure.main$null_opt invoke main.clj 539]
  [clojure.main$main invokeStatic main.clj 664]
  [clojure.main$main doInvoke main.clj 616]
  [clojure.lang.RestFn applyTo RestFn.java 137]
  [clojure.lang.Var applyTo Var.java 705]
  [clojure.main main main.java 40]]}
...
Context:

{:time 6236131371,
 :free-threads #object[io.lacuna.bifurcan.Set 0x59d5a6fd "{1}"],
 :workers {0 0, 1 4, 2 2, :nemesis :nemesis}}
Event:

{:type :info,
 :f :write,
 :value [0 1],
 :time 6236131371,
 :process 1,
 :error :write-timed-out}
...
Caused by: java.lang.IllegalArgumentException: Key must be integer
	at clojure.lang.APersistentVector.assoc(APersistentVector.java:347)
	at clojure.lang.APersistentVector.assoc(APersistentVector.java:18)
	at clojure.lang.RT.assoc(RT.java:827)
	at clojure.core$assoc__5416.invokeStatic(core.clj:191)
	at clojure.core$update.invokeStatic(core.clj:6202)
	at clojure.core$update.invoke(core.clj:6188)
	at jepsen.independent.ConcurrentGenerator.update(independent.clj:210)
	at jepsen.generator.OnThreads.update(generator.clj:851)
	at jepsen.generator.OnThreads.update(generator.clj:851)
	at jepsen.generator.Any$updater__3853.invoke(generator.clj:923)
	at clojure.core$mapv$fn__8445.invoke(core.clj:6912)
	at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
	at clojure.core$reduce.invokeStatic(core.clj:6827)
	at clojure.core$mapv.invokeStatic(core.clj:6903)
	at clojure.core$mapv.invoke(core.clj:6903)
	at jepsen.generator.Any.update(generator.clj:923)
	at jepsen.generator.TimeLimit.update(generator.clj:1231)
	at jepsen.generator.FriendlyExceptions.update(generator.clj:675)
	... 33 common frames omitted

The error always happens when an operation fails and returns op with {:type :info}. I think :info is required when a test is not sure whether a write has succeeded or not.

In my investigation, each thread-process pair of ctx's worker is basically a pair of a number like {0 0, 1 1, 2 2} when concurrency 3. But, when {:type :info}, the pair has different number like {0 0, 1 4, 2 2}.
I found the ctx's worker is updated when {:type :info} as below.

I guess {:type :info} can be used with 0.2.0 because it is used in other tests like scylla or stolen tests.
How should I use {:type :info} for unknown writes?

https://github.com/jepsen-io/jepsen/blame/master/jepsen/src/jepsen/generator/interpreter.clj#L225-L230

@aphyr
Copy link
Collaborator

aphyr commented Aug 20, 2020

I thiiiink what's happening here is that ConcurrentGenerator's update calls jepsen.generator/thread->group based on the generated event, but it's being given the new context resulting from process 1's crash. If you take a look at

gen (gen/update gen test ctx op')

There's where we perform the generator update--and we use the new context, rather than the same context that produced the operation. This is... a little tricky, but I think the right call here is to use some, but not all, of the context changes: specifically, we want to preserve the new timestamp and thread availability, but leave the thread->process mapping in place for the update, and only after update cycle to a new thread.

I'm busy with a client right now so I don't have time to write a full test case, but I put together a small patch which doesn't break any of the existing tests. Want to try out 8b9f04c and let me know if that helps?

@aphyr
Copy link
Collaborator

aphyr commented Aug 20, 2020

Just wanted to let you know--I've had a chance to test this properly, and I've released it under 0.2.1-SNAPSHOT--you should be able to bump your Jepsen dependency and have it work. :)

@yito88
Copy link
Author

yito88 commented Aug 21, 2020

@aphyr Thank you so much!
Our test worked well with 8b9f04c without the error 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants