Rework API to better work with async and sync calls #46

robhruska · 2015-11-16T04:22:33Z

These changes will all be backwards compatible, so this ought to result in a minor version update where clients can move code from the old Command over to the two new types at their leisure. Not sure if Command will get an [Obsolete] tag yet, though.

For an example of some of the API changes, check out Example.cs.

Code Review: Two good entry points for review are CommandInvoker.cs and BaseCommand.cs. CommandInvoker delegates to a couple of other invokers (BulkheadInvoker and BreakerInvoker) that could use review as well.

Separate sync/async Command APIs

Instead of exposing Invoke() and InvokeAsync() on Command, implementations need to inherit from SyncCommand or AsyncCommand. This avoids the forced async-to-sync conversion that's built into the library, and spares callers from making problematic async-to-sync conversions themselves. Though the built-in conversion is convenient for callers, it's often a source of confusion, and it also means that Mjolnir is less efficient with async than it ought to be.

Currently, Commands are used like this:

// MyCommand : Command<TResult>

var command = new MyCommand(foo, bar, baz);
var value = await command.InvokeAsync();
// or
var value = command.Invoke(); // Sync equivalent.

This PR offers an alternative API:

var invoker = new CommandInvoker(); // Injectable via ICommandInvoker

// MyAsyncCommand : AsyncCommand<TResult>

var command = new MyAsyncCommand(foo, bar, baz);
var value = await invoker.InvokeThrowAsync(command, 1000); // Throws exceptions

var command = new MyAsyncCommand(foo, bar, baz);
var wrapped = await invoker.InvokeReturnAsync(command, 1000); // Wraps exceptions and returns
var value = wrapped.Value;

// MySyncCommand : SyncCommand<TResult>

var command = new MySyncCommand(foo, bar, baz);
var result = invoker.InvokeReturn(command);
var value = result.Value;

Throw and Return variants

The Return variants will catch and wrap Exceptions, preferring to always return a result to the caller (even when errors occur). Since Mjolnir's intent is to introduce fast failure to avoid failure cascading in unpredictable ways, it's useful for callers to handle that failure in ways beyond just re-throwing the exception upward. This could be handled by the caller using try/catch, but baking it into the Mjolnir API helps callers realize that they need to think about failure and make conscious decisions about how to handle it.

The Return variants return a CommandResult<TResult> instead of just TResult. The wrapper contains the result, or the causing exception if the command failed. The Throw variants don't return a wrapped result, since that would add an unnecessary "unwrapping" step for callers.
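
To make the Throw/Return distinction concrete, here is a minimal sketch. It is not Mjolnir's code: SketchResult<T>, SketchInvoker, and all of their members are hypothetical stand-ins. The only behavior taken from this PR is that a Return variant catches the exception and hands back a wrapper, while a Throw variant lets the exception propagate. Having Value rethrow the cause on a failed result is an assumption.

```csharp
using System;

// Hypothetical stand-in for CommandResult<TResult>; not the library's type.
public sealed class SketchResult<T>
{
    private readonly T _value;

    private SketchResult(T value, Exception error)
    {
        _value = value;
        Error = error;
    }

    public static SketchResult<T> Success(T value)
    {
        return new SketchResult<T>(value, null);
    }

    public static SketchResult<T> Failure(Exception error)
    {
        return new SketchResult<T>(default(T), error);
    }

    // Null when the command succeeded.
    public Exception Error { get; }

    public bool WasSuccess => Error == null;

    // Assumption: reading Value from a failed result throws the cause.
    public T Value
    {
        get
        {
            if (Error != null) throw Error;
            return _value;
        }
    }
}

// Hypothetical invoker showing the two calling styles side by side.
public static class SketchInvoker
{
    // Throw variant: exceptions propagate and the caller gets the bare value.
    public static T InvokeThrow<T>(Func<T> execute)
    {
        return execute();
    }

    // Return variant: exceptions are caught and wrapped, so a result always comes back.
    public static SketchResult<T> InvokeReturn<T>(Func<T> execute)
    {
        try
        {
            return SketchResult<T>.Success(execute());
        }
        catch (Exception e)
        {
            return SketchResult<T>.Failure(e);
        }
    }
}
```

With this shape, a caller of the Return variant checks WasSuccess (or Error) and picks a fallback, instead of wrapping every call site in try/catch.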

Timeout override

Default and configured timeouts still apply, but callers can also provide a per-call timeout if they desire more lenient/rigid SLAs.
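
One way to picture how the three timeout sources could combine is sketched below. The TimeoutSketch class and its methods are hypothetical, and the precedence (per-call value first, then configured value, then default) is an assumption rather than something this PR spells out. The 2000 ms default is the value listed in the release notes section of this description.

```csharp
using System;
using System.Threading;

// Hypothetical helper; names and precedence are assumptions for illustration.
public static class TimeoutSketch
{
    // Default for new commands, per the release notes in this PR.
    public const long DefaultMillis = 2000;

    // Assumed precedence: per-call value, then configured value, then the default.
    public static TimeSpan Resolve(long? perCallMillis, long? configuredMillis)
    {
        long millis = perCallMillis ?? configuredMillis ?? DefaultMillis;
        return TimeSpan.FromMilliseconds(millis);
    }

    // Builds a token source that cancels itself once the timeout elapses.
    // The caller owns the returned source and should dispose it.
    public static CancellationTokenSource CreateSource(TimeSpan timeout)
    {
        return new CancellationTokenSource(timeout);
    }
}
```

A call such as TimeoutSketch.Resolve(1000, null) would mirror the InvokeThrowAsync(command, 1000) usage shown earlier, where the caller passes 1000 explicitly.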

Improved testability

This will also make commands more unit-testable, since ICommandInvoker will be injectable (where injecting Command behavior is currently a challenge).

Use semaphore bulkheads (without queues) by default

I'm not sold on the use of thread pools as a bulkhead. Their strongest advantage is the ability to have a small queue in front that allows for absorbing some latency and bursts. However, there are a number of disadvantages.

- They're suboptimal for async. Fully async code means that threads are returned to the thread pool until the async work is done. Introducing an entire thread to "wrap" that and limit concurrency feels nonsensical to me.
- SmartThreadPool is a bit rough. We currently have to deliver our own forked copy of STP with Mjolnir because the main repo has gone stale and not merged PRs with functionality that we need. We've also had struggles with context flowing across STP threads, and the library isn't really well-equipped for async work.
- Queuing and pooling make Mjolnir harder to reason about. Since the message gets queued off the current thread, keeping track of it and getting the result back to the appropriate context is a bit tricky. It works, but means a fair amount of hackery around delegates and WorkItems. It also means that we've forced both synchronous and async code down the same code path, which makes the async bits a lot heavier (see the first bullet).

I'd like to try using semaphores as the default bulkhead. It does mean (for now, at least) getting rid of the queues, but they can be re-introduced if needed in the future.
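
As a rough illustration of a queue-less semaphore bulkhead, the sketch below wraps .NET's SemaphoreSlim. SketchSemaphoreBulkhead and its Execute helper are hypothetical, not the class in this PR; TryEnter, Release, and the available-count property only mirror the member names in the SemaphoreBulkhead.cs excerpt reviewed further down.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a semaphore-backed bulkhead with no queue in front of it.
public sealed class SketchSemaphoreBulkhead : IDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public SketchSemaphoreBulkhead(int maxConcurrent)
    {
        _semaphore = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Wait(0) never blocks: it returns false right away when no slot is free.
    public bool TryEnter()
    {
        return _semaphore.Wait(0);
    }

    public void Release()
    {
        _semaphore.Release();
    }

    public int CountAvailable => _semaphore.CurrentCount;

    // Runs the work if a slot is free; otherwise rejects immediately instead of queuing.
    public T Execute<T>(Func<T> work)
    {
        if (!TryEnter())
        {
            throw new InvalidOperationException("Rejected by bulkhead");
        }

        try
        {
            return work();
        }
        finally
        {
            Release();
        }
    }

    public void Dispose()
    {
        _semaphore.Dispose();
    }
}
```

Because nothing is handed to another thread, the calling thread (or async continuation) keeps its own context, which is the property the thread-pool approach made awkward.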

Better interface for hooking into metrics

The existing interface, IStats (see its docs), is pretty vague and free-form, and makes it difficult to adapt events into collectors that use standardized metric types like timers and meters.

A new interface, IMetricEvents, is available that should make that easier. It provides specific methods for events that occur, and recommends a metric type for each. A couple of example methods:

void CommandInvoked(string commandName, double invokeMillis, double executeMillis, string status, string failureAction);
void RejectedByBulkhead(string bulkheadName, string commandName);
void BulkheadConfigGauge(string bulkheadName, string bulkheadType, int maxConcurrent);

The original IStats were targeted both at profiling and operational events; IMetricEvents focuses less on method profiling and more on relevant bulkhead, breaker, and command events. Parity between IStats and IMetricEvents wasn't a goal, so they're fairly different.

The new BaseCommand/CommandInvoker pipeline will not publish the old IStats events.

The adapter we use will also be made available (once complete).
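
For a sense of what an adapter might look like, here is a small in-memory sketch. ISketchMetricEvents is a local copy of only the three methods quoted above, declared so the example compiles on its own. InMemoryMetricEvents and its key naming are hypothetical, and a real adapter would forward to a metrics library's timers, meters, and gauges rather than to dictionaries.

```csharp
using System.Collections.Concurrent;

// Local copy of the three example methods quoted in this PR description.
public interface ISketchMetricEvents
{
    void CommandInvoked(string commandName, double invokeMillis, double executeMillis, string status, string failureAction);
    void RejectedByBulkhead(string bulkheadName, string commandName);
    void BulkheadConfigGauge(string bulkheadName, string bulkheadType, int maxConcurrent);
}

// Hypothetical collector that keeps counts and gauge values in memory.
public sealed class InMemoryMetricEvents : ISketchMetricEvents
{
    public ConcurrentDictionary<string, long> Counters { get; } =
        new ConcurrentDictionary<string, long>();

    public ConcurrentDictionary<string, double> Gauges { get; } =
        new ConcurrentDictionary<string, double>();

    public void CommandInvoked(string commandName, double invokeMillis, double executeMillis, string status, string failureAction)
    {
        // Count invocations per command and status; keep the last observed timings.
        Increment("command." + commandName + "." + status);
        Gauges["command." + commandName + ".invokeMillis"] = invokeMillis;
        Gauges["command." + commandName + ".executeMillis"] = executeMillis;
    }

    public void RejectedByBulkhead(string bulkheadName, string commandName)
    {
        Increment("bulkhead." + bulkheadName + ".rejected");
    }

    public void BulkheadConfigGauge(string bulkheadName, string bulkheadType, int maxConcurrent)
    {
        Gauges["bulkhead." + bulkheadName + ".maxConcurrent"] = maxConcurrent;
    }

    private void Increment(string key)
    {
        // AddOrUpdate is atomic per key, so concurrent events don't lose counts.
        Counters.AddOrUpdate(key, 1, (_, current) => current + 1);
    }
}
```

Since each event arrives as its own typed method call, the adapter decides per event whether it becomes a counter, a timer, or a gauge, which is the mapping that was hard to do with the free-form IStats calls.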

For Release Notes

A list of items that should end up in the release notes.

Differences between old and new Command

- The INFO log written on invocation is "Invoke" for both sync and async calls (Invoke Command={0} Breaker={1} Pool={2} Timeout={3}). In the old Command, it's "InvokeAsync" for the async call.
- Exceptions aren't wrapped in a CommandFailedException. Instead, the root cause exception is simply rethrown with some Command-specific data attached to it.
- "Pool" is now "Bulkhead" in a handful of configs, logs, and properties. If porting old Commands over to the new base classes, be sure to duplicate any configured pool values with the "bulkhead" configuration key.
- New commands (sync/async) have a timeout configuration key that starts with "mjolnir.command.". The old commands were inconsistent with the rest of the library and just started with "command.". Examples:
  - Old: command.my-group.MyCommand.Timeout=1000
  - New: mjolnir.command.my-group.MyCommand.Timeout=1000
- New commands default to a 2000ms timeout instead of 15000ms.

Possible Remaining Work

- [Command] support for sync and async commands
  - This may go into a future release that also involves pulling the invoker out of the main Mjolnir project and into its own (e.g. Hudl.Mjolnir.Interceptor), which would help get rid of a dependency on Castle.Core.
- Ability to set separate bulkhead and breaker groups on new commands
- Support for returning Task (non-generic) from execute methods
  - This is tricky - if anyone's got suggestions on how to do it without heavily duplicating code paths, I'd love some ideas. Everything I've tried ends up kind of lame.
- Ability to roll out commands in a "dry-run" or "measure-only" mode, where nothing will get rejected, but MetricEvents will still fire. This would allow tuning before committing to a set of failure thresholds.

lewislabs · 2015-11-16T10:59:39Z

Hudl.Mjolnir/Command/CommandInvoker.cs

+            var executeStopwatch = Stopwatch.StartNew();
+            var status = CommandCompletionStatus.RanToCompletion;
+
+            var cts = timeout.HasValue


Where are you passing through the Command's CancellationToken? i.e. if one has been explicitly passed into the call.
This might be happening somewhere else, but I can't find it at quick glance.

The GetActualTimeout() call on L61 should get the right value given configuration and default values - is that what you're referring to? I'm still kind of rethinking this a bit, too, for testing and logging purposes.

Ignore me, I was thinking that there was a user specified CancellationToken passed through here, but actually there's not.....carry on 👀

lewislabs · 2015-11-16T11:00:41Z

This looks so much cleaner and easier to debug!

Timeout tests are finicky and non-deterministic. Working on that.

deflume1 · 2015-12-02T16:13:32Z

Hudl.Mjolnir/Breaker/FailurePercentageCircuitBreaker.cs


        // ReSharper disable NotAccessedField.Local
        // Don't let these get garbage collected.
        private readonly GaugeTimer _timer;
+        private readonly GaugeTimer _timer2;


Could we maybe rename this one to _metricsTimer, and the other one to _statsTimer? I was confused about why we had two, until I read the comment below.

Renamed these.

deflume1 · 2015-12-02T17:04:10Z

This looks sweet, nice work.

lewislabs · 2015-12-03T11:04:57Z

Hudl.Mjolnir/Util/ConcurrentDictionaryExtensions.cs

@@ -12,5 +12,11 @@ internal static class ConcurrentDictionaryExtensions
            var lazy = dictionary.GetOrAdd(key, new Lazy<V>(() => valueFactory(key), LazyThreadSafetyMode.PublicationOnly));


Maybe call the method below? Also, I think we spoke about PublicationOnly before. Should we make the default ExecutionAndPublication, since the most common use case is to not allow the valueFactory to be called more than once?

Yeah, I might even get rid of this one altogether and have the caller make a decision about the concurrency they'd like.
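
The difference under discussion can be sketched with standard .NET types. LazyCacheSketch and GetOrAddOnce are hypothetical names, not the extension method in this PR. The sketch assumes the goal is that the value factory runs at most once per key, even when several threads race on the same key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper; a plain static method is used here instead of an extension method.
public static class LazyCacheSketch
{
    public static V GetOrAddOnce<K, V>(
        ConcurrentDictionary<K, Lazy<V>> dictionary,
        K key,
        Func<K, V> valueFactory)
    {
        // Under contention GetOrAdd may build several Lazy wrappers, but it stores and
        // returns only one of them. ExecutionAndPublication makes that stored wrapper
        // run its factory at most once, with other threads blocking until it finishes.
        var lazy = dictionary.GetOrAdd(
            key,
            k => new Lazy<V>(() => valueFactory(k), LazyThreadSafetyMode.ExecutionAndPublication));

        return lazy.Value;
    }
}
```

With PublicationOnly, racing threads may each run the factory and only the first published value is kept. With ExecutionAndPublication, the factory runs once, but an exception it throws is cached and rethrown on later reads, and the factory executes under a lock.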

lewislabs · 2015-12-03T18:05:04Z

General comment: It looks like the new default timeout will be 2000ms, rather than 15000. Should it be clear somewhere in code/release notes that this is changing when you switch to the new CommandInvoker approach?

robhruska · 2015-12-03T18:08:21Z

@lewislabs Yeah, good call. I added a bullet to the release notes. Any thoughts on the change to 2000 as a default?

lewislabs · 2015-12-04T11:45:17Z

I think it's a good idea to bring it down. Did @marcusdarmstrong have some analysis on timeouts that we could use for a suitable default?

tlil · 2015-12-07T18:14:34Z

Hudl.Mjolnir/Bulkhead/SemaphoreBulkhead.cs

+    {
+        void Release();
+        bool TryEnter();
+        int Available { get; }


Do we want to rename this to CountAvailable or something indicative of it being a count?

Renamed to CountAvailable.

This should be okay. Two risks: 1) if locking code ever gets introduced into the actual constructors (unlikely), there could be deadlocks, and 2) we're changing the exception caching mode so that exceptions will be cached and re-thrown on subsequent attempts to initialize the Lazy, which is probably fine. We don't see any initialization errors today.

robhruska added 10 commits November 15, 2015 09:26

Prototyping a more appropriate sync/async API

640dc1f

Copypasta code through both sync and async paths

a54f908

Extract BreakerInvoker and BulkheadInvoker

20c0c13

Extract CommandInvoker

4d7039c

Rework exception handling and remove fallbacks

703cdec

Remove commented out fallback code

6ef2941

Fix interfaces and instantiation

0dd80d4

Use semaphore bulkhead for new command pipeline

9d1a99a

Wire in timeouts, rework timeout calculation

3e0bffc

Extract a couple more classes

bd8bdf5

robhruska mentioned this pull request Nov 16, 2015

Version 2.6.0 #47

Merged

robhruska added 3 commits November 15, 2015 22:27

Merge remote-tracking branch 'origin/next' into ReworkSyncAndAsyncApi

74aac73

Begin unit test work, add ConfigureAwait calls

e3fbc31

Remove some fixed TODOs

2c65f26

lewislabs reviewed Nov 16, 2015

robhruska added 14 commits November 16, 2015 11:59

Rework cancellation support in CommandInvoker

b2f5c8e

A couple unit tests on InvokeAsync()

e282df9

More invoke unit tests

177dd5c

Timeout tests are finicky and non-deterministic. Working on that.

Create canceled token if timeout is 0

3b74dc5

Add IsIgnored to custom token, add more tests

9e70009

Make ignore timeouts config injectable

5e398eb

Rework descriptions for cancellation

2ee3af7

Unit tests for invoker when rejected

4a2ce3c

Fix ignored cancellation not applying to tokens

8b1e8aa

Synchronous versions of all the invoker unit tests

1611516

Minor comment updates

2fc189e

When exception is ignored, still bump metrics

a828f97

Replace example app with a single Example.cs

e866494

Remove ExampleConsole

364bcd9

Update tests for exception-throwing .Value

5aa2a44

deflume1 reviewed Dec 2, 2015

lewislabs reviewed Dec 3, 2015

tlil reviewed Dec 7, 2015

robhruska mentioned this pull request Dec 23, 2015

Question about usage #51

Closed

robhruska added 8 commits March 5, 2016 09:04

Minor renames, doc comments

b191d47

Rename timer variables for clarity

aa010d8

Rename timer variables for clarity

a6a7521

Add comments/thoughts on invocation timeouts

01b69e1

More renames, comments

0708b47

Convenience invoker provided by CommandContext

20539e3

Gut README, docs have been moved to the Wiki

64e3610

robhruska added a commit that referenced this pull request Mar 5, 2016

Merge pull request #46 from hudl/ReworkSyncAndAsyncApi

0d1500a

Rework API to better work with async and sync calls

robhruska merged commit 0d1500a into next Mar 5, 2016

robhruska deleted the ReworkSyncAndAsyncApi branch March 5, 2016 23:11
