This is a very small feature release with support for some upcoming tests.
New Features
- jepsen.cli's test runner commands now accept a --leave-db-running flag, which leaves the database available for inspection at the end of a test run. Helpful for in-place debugging!
- jepsen.util/ex-root-cause extracts the root cause of an exception, which is useful for clients that wrap their errors in deep or unpredictable layers of exceptions.
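A root-cause helper follows an exception's chain of causes down to the innermost one. The following minimal Python sketch shows only that walk. Python's __cause__ stands in for the JVM's getCause() chain. The names root_cause and flaky_client are hypothetical and are not part of jepsen.util.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow an exception's explicit cause chain to its innermost cause.

    Hypothetical analogue of a root-cause helper: Python records
    `raise X from Y` by setting X.__cause__ = Y, much as Java's getCause()
    links wrapped exceptions. Only explicit chaining (__cause__) is followed.
    """
    seen = set()
    # Walk down the chain; the seen-set guards against (unusual) cycles.
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))
        exc = exc.__cause__
    return exc


def flaky_client() -> None:
    """Simulates a client that buries a socket error under two wrappers."""
    try:
        try:
            raise ConnectionResetError("socket closed")
        except ConnectionResetError as e:
            raise RuntimeError("request failed") from e
    except RuntimeError as e:
        raise ValueError("invoke! failed") from e


try:
    flaky_client()
except ValueError as outer:
    cause = root_cause(outer)

print(type(cause).__name__, cause)  # prints: ConnectionResetError socket closed
```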
This release focuses on making it easier to write complex tests with many types of failures and workloads. New protocols in jepsen.db provide hooks for killing, starting, pausing, and resuming databases, as well as identifying current primary nodes. A combined nemesis makes it easy to write test suites which hammer a system with random mixtures of faults, and a new test runner takes some of the busywork out of writing comprehensive test suites. There are no significant API changes, but there are several important bugfixes, including correctness fixes in the experimental jepsen.tests.cycle.append. These issues did not affect any Jepsen report, but may have led to false positives for other users.
Thanks, as always, to everyone who contributed patches and feedback. :)
New Features
- jepsen.cli/test-all-cmd: a CLI command for running a whole test suite in one pass, with unified error reporting.
- jepsen.nemesis.combined: A nemesis (and generator) which mixes process kills, pauses, clock skew, and network partitions. Faults, intervals, and target nodes are tunable. Also provides functions for composing nemesis+generator packages.
- jepsen.db/Primary: an optional protocol for databases that can identify primary nodes.
- jepsen.db/Pause: an optional protocol for databases that can be paused and resumed.
- jepsen.db/Process: an optional protocol for databases that can be killed and started.
- jepsen.generator/flip-flop: alternates between two generators.
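To make "alternates between two generators" concrete, here is a rough Python model that uses plain iterators. Jepsen generators are Clojure values with their own semantics, so flip_flop is only a hypothetical analogue. The sketch also assumes that alternation stops as soon as one source runs out.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def flip_flop(a: Iterable[T], b: Iterable[T]) -> Iterator[T]:
    """Yield one item from a, then one from b, then a again, and so on.

    Sketch assumption: stops as soon as the source whose turn it is runs out.
    """
    sources = [iter(a), iter(b)]
    turn = 0
    while True:
        try:
            item = next(sources[turn])
        except StopIteration:
            return
        yield item
        turn = 1 - turn  # switch to the other source


# Usage: interleave an endless stream of reads with numbered writes.
reads = ({"f": "read"} for _ in count())
writes = ({"f": "write", "value": v} for v in count())
print([op["f"] for op in islice(flip_flop(reads, writes), 4)])
# prints ['read', 'write', 'read', 'write']
```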
Minor Changes
- Integration tests are now much quieter in their logging.
- checker.perf/plot! now includes output from gnuplot when throwing gnuplot-related exceptions.
- docker compose can now expose DB ports for inspection from your docker host.
- jepsen.control.util/grepkill can now take keyword signals, like :kill.
- jepsen.util/parse-long: it's long past time.
- jepsen.control.util/wget can now take usernames and passwords
Dependency Upgrades
- codox 0.10.7
- knossos 0.3.6
- gnuplot 0.1.2
- Fipp 0.6.14
Bugfixes
- jepsen.util/name+: fixed a bug where this function always used pr-str.
- jepsen.tests.cycle.append: fix a bug where internal consistency checks could compute incorrect expected orders after reading nil values.
- jepsen.tests.cycle.append: we no longer incorrectly find duplicates and incompatible orders in aborted reads.
- jepsen.os.centos: update dpkg version used for installing start-stop-daemon.
- jepsen.nemesis: fixed a misleading error message which said it expected :type :ok, not :type :info
This is mostly an ergonomics & bugfix release, with a few new minor checkers. In particular, you may want to try the stats and unhandled-exceptions checkers, which can help you avoid issues where your test passes because every operation failed! We also track exceptions thrown by client operations, and can summarize them to tell you what kinds of exceptions you aren't catching. This can help make your tests more robust without endless scrolling through logs. We've added some retries for flaky SCP downloads, and made logged exceptions more useful in some places. Plus more!
Special thanks to Vojtech Juranek for JDK12 compatibility, and to everyone else who contributed patches and feedback. :)
New Features
- jepsen.checker/stats, jepsen.checker/unhandled-exceptions: some basic statistics and error reporting which can be applied to almost any test.
- jepsen.tests.cycle.append can now detect internal consistency violations within transactions.
API Changes
- jepsen.tests.cycle.append now has a generator and a default test map, which make it easier to build append tests.
- When exceptions are thrown by Client/invoke!, we attach the exception (as Clojure data) to the generated :info op under the :exception key.
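Exceptions from Client/invoke! are now attached to :info ops, so a checker can count which kinds of exceptions a client isn't handling. The Python sketch below produces that kind of summary. The dict layout (type, exception, and a class field inside the exception data) is a hypothetical stand-in for Jepsen's Clojure maps. It is not the real output of jepsen.checker/unhandled-exceptions.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Op = Dict[str, Any]


def summarize_exceptions(history: List[Op]) -> List[Tuple[str, int]]:
    """Count exception classes attached to "info" ops, most frequent first.

    Hypothetical layout: crashed ops have "type": "info" and carry exception
    data under "exception", with the exception's class name under "class".
    """
    counts = Counter(
        op["exception"].get("class", "unknown")
        for op in history
        if op.get("type") == "info" and "exception" in op
    )
    return counts.most_common()


history = [
    {"type": "ok", "f": "read", "value": 3},
    {"type": "info", "f": "write", "exception": {"class": "java.net.SocketTimeoutException"}},
    {"type": "info", "f": "cas", "exception": {"class": "java.net.SocketTimeoutException"}},
    {"type": "info", "f": "write", "exception": {"class": "clojure.lang.ExceptionInfo"}},
    {"type": "info", "f": "read"},  # indeterminate, but no exception attached
]
print(summarize_exceptions(history))
# prints [('java.net.SocketTimeoutException', 2), ('clojure.lang.ExceptionInfo', 1)]
```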
Minor Changes
- jepsen.control/download now retries some SCP failures. These have traditionally been flaky.
- Tests now log much less garbage to the console.
- jepsen.nemesis.time now stops the ntp service, in addition to ntpd, during setup.
- We're now compatible with JDK12.
- jepsen.tests.cycle.append doesn't generate empty transactions as a part of its workload.
- jepsen.reconnect now logs the full exception when a reconnectable error occurs.
Bugfixes
- Exceptions thrown in (e.g.) OS and DB setup could be propagated incorrectly as BrokenBarrierExceptions, which, while technically correct, didn't provide much useful information about what went wrong. We now make a special effort to provide useful exceptions.
- jepsen.control.util/grepkill now properly catches "no such process" errors, which could happen when racing to kill a process.
This is a big release! We've got a bunch of bugfixes--most importantly, an issue which allowed tests.long-fork to fail to find long fork anomalies, and a long-standing bug which wrote invalid .fressian files. Plot rendering has been totally re-worked, which standardizes behavior between several types of plot that used to do their own thing, and adds colorizable nemeses to all the usual plots. There's also a new, somewhat experimental set of tests for cycle detection in jepsen.tests.cycle. Finally, we've got several quality-of-life improvements around debuggability and error handling: better log messages for jepsen.control errors, crashes in analyses, and more careful choices about which exceptions to throw when more than one occurs concurrently, or sequentially, during a test.
Special thanks to Kit Patella and Peter Alvaro for their work and discussion around plotting and cycle detection, and to Craig Pastro, who made several documentation fixes and improvements to Docker support.
New Features
- Totally re-worked plots. Latency, rate, clock, and bank plots now have a unified system for rendering nemesis operations. Nemesis ops have nice colors that match their legends. Fixed a whole bunch of edge cases with rendering outside plot ranges. Fixed issues with autoscaling. Fixed issues with multiple start operations followed by a single stop. Fixed several crashes when plotting short or empty histories. Fixed issues with nemeses which extended to the end of the test. You can now add a :plot map to tests, specifying how to classify, label, and colorize nemeses in plots.
- jepsen.tests.cycle: New tests based on cycle detection between operations. It's functional, finds bugs, and the bugs it finds all check out so far, but expect some API changes and re-organization as we refine it for use in additional databases.
API Changes
- jepsen.tests.linearizable-register: now uses process-limit, rather than limit, by default. This should make register tests better at finding bugs, and less likely to become incredibly expensive to analyze.
Minor Changes
- jepsen.txn 0.1.1: adds some additional support functions for transactional histories
- dom-top 1.0.5
- jepsen.core: when threads race to abort, try to throw more meaningful exceptions, rather than broken barriers/interrupts.
- Fixed a bug in internal integration tests for teardown
- jepsen.control.util/wget now retries NXDOMAIN wget errors. I know, this shouldn't happen. EC2's DNS is apparently awful?
- jepsen.faketime: add an installer for our custom build of faketime supporting CLOCK_MONOTONIC_COARSE
- jepsen.control/exec exceptions now include a human & logger-friendly error message as well as data
- jepsen.db/cycle no longer swallows exceptions that occur during teardown!. This created hard-to-debug situations.
- jepsen.control now uses real-pmap from dom-top.
- jepsen.core: log a special message when :valid? is :unknown
- jepsen.control.util/start-daemon! now logs the escaped command line used to start the daemon, which is really helpful for debugging startup issues.
- jepsen.core/run! now throws the original exception if a test aborts and, during cleanup, an error occurs while snarfing logs. The log-snarfing message is still logged, but it's probably not as useful as the original error that interrupted the test!
- jepsen.txn/reduce-mops: a helper for writing reductions over every micro-op in a history
- Docker support now uses networks instead of links, along with some assorted updates for Debian Stretch.
- jepsen.control.util/signal! sends a signal to a process by name
- jepsen.os.debian now installs dirmngr by default.
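A reduction over every micro-op in a history, as described for jepsen.txn/reduce-mops, can be sketched in Python as a nested fold. The sketch assumes an op layout in which the value holds micro-ops like ["append", key, value]. The argument order of f is also an assumption, not the Clojure signature.

```python
from typing import Any, Callable, Dict, List, TypeVar

A = TypeVar("A")
Op = Dict[str, Any]
Mop = List[Any]


def reduce_mops(f: Callable[[A, Op, Mop], A], init: A, history: List[Op]) -> A:
    """Fold f(acc, op, mop) over every micro-op of every op in history.

    Hypothetical layout: a transactional op's "value" is a list of micro-ops
    like ["append", key, value] or ["r", key, read_value]; ops whose value
    is None (e.g. some failed ops) contribute no micro-ops.
    """
    acc = init
    for op in history:
        for mop in op.get("value") or []:
            acc = f(acc, op, mop)
    return acc


def count_appends(acc: Dict[str, int], op: Op, mop: Mop) -> Dict[str, int]:
    """Example reducer: count append micro-ops per key."""
    if mop[0] == "append":
        acc[mop[1]] = acc.get(mop[1], 0) + 1
    return acc


history = [
    {"type": "ok", "value": [["append", "x", 1], ["r", "y", [2]]]},
    {"type": "ok", "value": [["append", "x", 3]]},
    {"type": "fail", "value": None},
]
print(reduce_mops(count_appends, {}, history))  # prints {'x': 2}
```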
Bugfixes
- jepsen.tests.long-fork often failed to find long forks. This was a serious issue which could have allowed tests to pass when they should have failed. Now fixed.
- jepsen.store: fixed a longstanding bug with writing invalid .fressian files for tests containing sets
- jepsen.faketime/wrap!: if a test crashes during wrap!, don't get stuck with wrapper scripts without mode +x
- jepsen.control.util/grepkill! no longer throws when no processes match
- jepsen.tests.sequential: fixed a misleading namespace docstring which mischaracterized the invariants the test looked for. The test itself was OK; the docs were just wrong.
- Don't copy temporary files into the control docker image, which could cause errors when copying symlinks referring to directories created during previous runs.
This is a small release to provide support for Debian Stretch. Debian Jessie mirrors were shut down recently; I thought that as amd64 users we'd be supported via LTS, but this was not the case.
Minor Changes
- Tests now log a message when test relative time begins
- Support for Debian Stretch
Bugfixes
- Fixed a bug causing the smartos namespace to fail to compile
New Features
- When tests crash, Jepsen will write the stacktrace to that test's jepsen.log file for you.
- Named locks: a concurrency primitive for locking a dynamic pool of resources by some identifier
- Knossos now supports timeouts to help bound search time
- jepsen.checker/set-full reads can now use vectors, not just sets, and will flag duplicate values
- Tests now take a :logging map, which can override package log levels. Helpful for tracing, or noisy clients
- Latency and rate graphs now render different kinds of nemesis operations in separate tracks, like gantt charts, with colors and customizable legends.
- jepsen.generator/process-limit: bounds number of processes, rather than number of operations. Helpful for linearizable tests, where process concurrency is the dominant factor in complexity.
- jepsen.generator/seq-all: like seq, but emits every element of each generator it's given. Useful for constructing an infinite series of finite generators.
- New type of test: jepsen.tests.causal-reverse, which looks for incompatible read orders in serializable systems.
- jepsen.os.ubuntu: supports running tests on Ubuntu.
- Docker scripts can also set up Ubuntu nodes with --ubuntu
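A named-lock primitive can be approximated in a few lines: keep one lock per identifier and create it lazily under a guard. This Python sketch only illustrates the concept. NamedLocks, lock_for, and locked are hypothetical names, not Jepsen's API.

```python
import threading
from contextlib import contextmanager
from typing import Hashable, Iterator


class NamedLocks:
    """Hypothetical pool of re-entrant locks, created lazily per name."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the name -> lock dict
        self._locks = {}  # name -> threading.RLock()

    def lock_for(self, name: Hashable):
        """Return the lock for name, creating it on first use."""
        with self._guard:
            # Checking and inserting under the guard means two threads can
            # never end up with different locks for the same name.
            if name not in self._locks:
                self._locks[name] = threading.RLock()
            return self._locks[name]

    @contextmanager
    def locked(self, name: Hashable) -> Iterator[None]:
        """Hold the lock for name for the duration of a with-block."""
        with self.lock_for(name):
            yield


# Usage: four threads increment a per-node counter under the "n1" lock.
locks = NamedLocks()
counter = {"n1": 0}


def bump() -> None:
    for _ in range(1000):
        with locks.locked("n1"):
            counter["n1"] += 1


threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["n1"])  # prints 4000
```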
API Changes
- Keyword nodes (:n1) are no longer supported. We've all been using string node names for a few years now.
- Tests no longer take a :model key. Only a few checkers used them; you provide models to checkers on construction now.
- jepsen.control.util now throws ex-info exception maps using Slingshot, which means you can pattern-match return codes in exception handlers for shell commands using slingshot/try+. Exceptions from exec also have their stdout, stderr, and command neatly separated into different fields, which should cut down on regex tomfoolery.
Minor Changes
- Additional test coverage
- Knossos 0.3.4
- jepsen.checker/counter is now more precise: it filters out failed operations
- TravisCI integration means we'll be more rigorous about tests and PRs
- Performance improvements to jepsen.checker/set-full
Bugfixes
- Clock skew plots no longer explode on empty histories
- Bank plots no longer explode on empty histories
- Fixed a NullPointerException when serializing empty multisets
- Documentation fixes
- Typo fixes in the tutorial
Notes
jepsen.generator/time-limit continues to be bad; it contains at least two race conditions around nested time limits, which mostly work but can also ruin your life. We need to fundamentally redesign generators.
New Features
- jepsen.checker/set-full: A new set checker which supports reads throughout the lifecycle, as well as linearizable and eventually consistent checker modes. This checker provides quantitative bounds on stale and lost reads, including latency quantiles for visibility. This is still somewhat experimental--in particular, it may, for systems with stale reads, report spurious lost records at the end of history--records which would have appeared if given some time to cool off before reading. Check the output and history carefully.
- Jepsen now maintains a "current" symlink for the test which is presently running; "latest" refers to the last completed test. Helpful for looking through logfiles as tests are running, and debugging tests which crashed.
- checker/clock-plot plots relative clock offsets, as recorded by nemesis.time ops
- generator/map: applies a function to each operation generated by some other generator
- nemesis/timeout: wrap any nemesis and force its operations to time out. Helpful when nemeses can get stuck doing something to a database.
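Wrapping a nemesis with a timeout means running each operation with a deadline and giving up once the deadline passes. The Python sketch below shows the idea with a thread pool. The names with_timeout and stuck_nemesis and the "timed out" result value are hypothetical. Python cannot interrupt the stuck worker thread, so the sketch simply abandons it.

```python
import concurrent.futures
import time
from typing import Any, Callable, Dict

Op = Dict[str, Any]

# Shared worker pool for the sketch. Timed-out calls are abandoned, not killed:
# their threads keep running until the wrapped function returns on its own.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def with_timeout(invoke: Callable[[Op], Op], timeout_s: float) -> Callable[[Op], Op]:
    """Wrap a hypothetical nemesis invoke function with a deadline."""

    def wrapped(op: Op) -> Op:
        future = _pool.submit(invoke, op)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # We don't know whether the fault took effect, so report "info".
            return {**op, "type": "info", "value": "timed out"}

    return wrapped


def stuck_nemesis(op: Op) -> Op:
    time.sleep(0.5)  # simulates a nemesis hanging on an unresponsive node
    return {**op, "type": "info", "value": "partitioned"}


print(with_timeout(stuck_nemesis, 0.1)({"f": "start"}))
# prints {'f': 'start', 'type': 'info', 'value': 'timed out'}
```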
API Changes
- jepsen.adya renamed to jepsen.tests.adya. We're moving reusable workload-specific support namespaces under jepsen.tests.
Minor Changes
- Upgrade to tools.cli 0.4.1
- Upgrade to tools.logging 0.4.1
- Upgrade to tea-time 1.0.1
- Upgrade to dom-top 1.0.4
- Debian now installs apt-transport-https by default
- nemesis/partitioner can now take grudges as values, allowing generators to control partition topologies
- nemesis/node-start-stopper targeting fns can optionally take a test map
- Use Fipp, a faster pretty-printer, for writing EDN output like histories and analysis results. Significant speedups!
- jepsen.util/map-keys: transforms keys in a map by applying a function to each
- jepsen.store: can now serialize java.util.Instants
- jepsen.web logs parse errors when reading results
- jepsen.store/load-results can now read back defrecords. May come to regret this...
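A grudge maps each node to the nodes it should drop messages from. The Python sketch below shows that data shape by building a complete grudge from a list of network components, so that every node ignores all nodes outside its own side. The names complete_grudge and bisect loosely follow jepsen.nemesis helpers with similar names, but this code is only an approximation.

```python
from typing import Dict, List, Set

Grudge = Dict[str, Set[str]]  # node -> nodes it drops messages from


def complete_grudge(components: List[List[str]]) -> Grudge:
    """Every node drops messages from every node outside its own component."""
    all_nodes = {n for comp in components for n in comp}
    grudge: Grudge = {}
    for comp in components:
        outsiders = all_nodes - set(comp)
        for node in comp:
            grudge[node] = set(outsiders)
    return grudge


def bisect(nodes: List[str]) -> List[List[str]]:
    """Cut nodes in half, smaller half first."""
    mid = len(nodes) // 2
    return [nodes[:mid], nodes[mid:]]


nodes = ["n1", "n2", "n3", "n4", "n5"]
grudge = complete_grudge(bisect(nodes))
print(sorted(grudge["n1"]), sorted(grudge["n4"]))
# prints ['n3', 'n4', 'n5'] ['n1', 'n2']
```

A generator could emit different grudges over time to exercise different partition topologies.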
Bugfixes
- Jepsen now catches all Exceptions when reopening clients, not just RuntimeException.
- debian/install! now tells Debian that the frontend is noninteractive, which fixes occasional dpkg-preconfigure errors.
- Fixed several race conditions in jepsen.core which allowed workers to deadlock or move on to new phases, like a nemesis running operations while clients were still setting up the test.
- jepsen.nemesis.time no longer crashes horribly when setting up on nodes where NTP is not yet installed
- jepsen.generator.f-map now passes through nils instead of calling f on them, since nil represents the end of a generator
New Features
- nemesis.time now supports CentOS as well
API Changes
Minor Changes
- CentOS now installs compiler tools by default
Bugfixes
- Performance graphs no longer crash on empty histories
New Features
- A new namespace, jepsen.tests.bank, provides support for running snapshot-isolated bank tests, including visualizations!
- A new namespace, jepsen.tests.long-fork, looks for long forks, an anomaly possible under parallel snapshot isolation
- For flaky databases, you can now throw a particular type of exception to trigger automatic retries in DB setup
- jepsen.control/daemon-running? checks to see if pidfiles are alive
- Jepsen now downloads logs automatically if the JVM is interrupted; e.g. if a test crashes or if you ^C
- Generators now provide a somewhat more helpful prn/pprint representation
- jepsen.generator/time-limit now interrupts threads when the time limit expires, rather than waiting up to dt seconds
- jepsen.cli/single-test-cmd now also provides an analyze command for re-running an analysis on the last test. This is a gross hack, but really helpful.
- CLI tests can now pass --nodes n1,n2,n3 instead of passing --node n1 --node n2, ...
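Checking whether a pidfile is alive usually means reading the pid and probing it with signal 0. Here is a local, POSIX-only Python sketch of that check. The name daemon_running is hypothetical. Jepsen's helper is presumably meant to run against test nodes through jepsen.control, not against the local machine.

```python
import os
import tempfile


def daemon_running(pidfile: str) -> bool:
    """True if pidfile names a process that is still alive (POSIX only)."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no pidfile, or it doesn't contain a number
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a daemon
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only; nothing is sent
    except ProcessLookupError:
        return False  # stale pidfile: no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "db.pid")
    with open(path, "w") as f:
        f.write(f"{os.getpid()}\n")
    alive = daemon_running(path)
print(alive)  # prints True: the pid is this Python process itself
```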
API Changes
- jepsen.checker/set and queue now emit absolute counts, rather than ambiguous fractions.
Minor Changes
- jepsen.checker/set now accepts any type of collection, not just sets. We're laying the groundwork for multiset checkers.
- The web interface now shows test times with punctuation
- jepsen.nemesis default nemeses now use the Nemesis protocol
- jepsen.generator/delay and stagger now validate that their arguments are non-negative integers
Bugfixes
- jepsen.checker.perf/qs->colors works with more than six quantiles now.
- jepsen.generator/phases and concat no longer make you wade through every phase in order to get ops from later phases. This fixes a common issue where operations in a test with a waiting phase had to wait on every final operation.
This is a minor release for performance and usability. Stuff that should have made it into 0.1.7, but I only remembered while re-writing the tutorial. The tutorial is also updated for 0.1.8, and includes two totally new chapters on command line parameters and breaking up tests into composable workloads.
Improvements
- Network partitions are now significantly faster, thanks to an optional protocol for making all network changes at once, instead of via separate commands.
- control.net/ip resolution is now memoized; also speeds up partitions.
- Checking, writing histories, and writing results are now parallelized to take full advantage of multi-core systems. On my 48-way Xeon, this cuts analysis phases from 10+ minutes to ~20 seconds.
- checker/concurrency-limit can limit parallelization if you have an expensive checker to run, or if you're running in a CI context where turnaround time is less important.
- The web interface now serves common files (like logs) with utf-8 content types, fixing some encoding bugs. Emoji table flips look correct now!
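Limiting how many expensive checks run at once amounts to wrapping the check in a counting semaphore. This Python sketch shows that idea. The names concurrency_limit and slow_check are hypothetical. The real checker/concurrency-limit is a Clojure checker wrapper whose details may differ.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, TypeVar

R = TypeVar("R")


def concurrency_limit(n: int, check: Callable[..., R]) -> Callable[..., R]:
    """Wrap check so that at most n calls to it run at the same time."""
    sem = threading.BoundedSemaphore(n)

    def limited(*args: Any, **kwargs: Any) -> R:
        with sem:  # blocks while n other calls are in flight
            return check(*args, **kwargs)

    return limited


# Usage: record the peak number of simultaneous calls to a slow "checker".
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def slow_check(history: list) -> Dict[str, Any]:
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)  # stand-in for an expensive analysis
    with stats_lock:
        stats["active"] -= 1
    return {"valid?": True}


limited_check = concurrency_limit(2, slow_check)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(limited_check, [[] for _ in range(8)]))
print(stats["peak"])  # at most 2, even with 8 worker threads
```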