Rework API to better work with async and sync calls #46

Merged
merged 83 commits into from
Mar 5, 2016

robhruska
Member

These changes will all be backwards compatible, so this ought to result in a minor version update where clients can move code from the old Command over to the two new types at their leisure. Not sure if Command will get an [Obsolete] tag yet, though.

For an example of some of the API changes, check out Example.cs.

Code Review: Two good entry points for review are CommandInvoker.cs and BaseCommand.cs. CommandInvoker delegates to a couple of other invokers (BulkheadInvoker and BreakerInvoker) that could use review as well.

Separate sync/async Command APIs

Instead of calling Invoke() and InvokeAsync() on Command, implementations need to inherit from SyncCommand or AsyncCommand. This removes the forced async-to-sync conversion that's built into the library, and callers no longer need to make problematic async-to-sync conversions themselves. Though convenient for callers, the built-in conversion is often a source of confusion, and it also means that Mjolnir is less efficient with async than it ought to be.

Currently, Commands are used like this:

// MyCommand : Command<TResult>

var command = new MyCommand(foo, bar, baz);
var value = await command.InvokeAsync();
// or
var value = command.Invoke(); // Sync equivalent.

This PR offers an alternative API:

var invoker = new CommandInvoker(); // Injectable via ICommandInvoker

// MyAsyncCommand : AsyncCommand<TResult>

var command = new MyAsyncCommand(foo, bar, baz);
var value = await invoker.InvokeThrowAsync(command, 1000); // Throws exceptions

var command = new MyAsyncCommand(foo, bar, baz);
var wrapped = await invoker.InvokeReturnAsync(command, 1000); // Wraps exceptions and returns
var value = wrapped.Value;

// MySyncCommand : SyncCommand<TResult>

var command = new MySyncCommand(foo, bar, baz);
var result = invoker.InvokeReturn(command);
var value = result.Value;

Throw and Return variants

The Return variants will catch and wrap Exceptions, preferring to always return a result to the caller (even when errors occur). Since Mjolnir's intent is to introduce fast failure to avoid failure cascading in unpredictable ways, it's useful for callers to handle that failure in ways beyond just re-throwing the exception upward. This could be handled by the caller using try/catch, but baking it into the Mjolnir API helps callers realize that they need to think about failure and make conscious decisions about how to handle it.

The Return variants return a CommandResult<TResult> instead of just TResult. The wrapper contains either the result or, if the command failed, the causing exception. The Throw variants don't return a wrapped result, since that would add an unnecessary "unwrapping" step for callers.
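
To make the Return behavior concrete, here is a minimal sketch of catching a failure and handing it back in a wrapper. SketchResult<T>, WasSuccess, and ReturnStyle.Invoke are hypothetical names chosen for illustration; they are not Mjolnir's CommandResult<TResult> or its invoker code.

```csharp
using System;

// Hypothetical stand-in for a wrapped result; not the library's real type.
public sealed class SketchResult<T>
{
    public T Value { get; }
    public Exception Exception { get; }
    public bool WasSuccess => Exception == null;

    public SketchResult(T value, Exception exception = null)
    {
        Value = value;
        Exception = exception;
    }
}

public static class ReturnStyle
{
    // Runs the work and always returns a result.
    // A failure is captured in the wrapper instead of propagating to the caller.
    public static SketchResult<T> Invoke<T>(Func<T> work)
    {
        try
        {
            return new SketchResult<T>(work());
        }
        catch (Exception e)
        {
            return new SketchResult<T>(default(T), e);
        }
    }
}
```

A caller would then branch on WasSuccess and pick a fallback explicitly, which is the "conscious decision" the Return variants are meant to encourage.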

Timeout override

Default and configured timeouts still apply, but callers can also provide a per-call timeout if they desire more lenient/rigid SLAs.
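
One way the three timeout sources could be resolved is sketched below. The precedence (per-call value, then configured value, then default) is an assumption made for illustration, and TimeoutSketch.Resolve is a hypothetical helper, not the library's GetActualTimeout().

```csharp
using System;

public static class TimeoutSketch
{
    // 2000ms is the default for new commands according to the release notes in this PR.
    public static readonly TimeSpan DefaultTimeout = TimeSpan.FromMilliseconds(2000);

    // Assumed precedence: per-call override, then configured value, then the default.
    public static TimeSpan Resolve(long? perCallMillis, long? configuredMillis)
    {
        if (perCallMillis.HasValue)
        {
            return TimeSpan.FromMilliseconds(perCallMillis.Value);
        }

        if (configuredMillis.HasValue)
        {
            return TimeSpan.FromMilliseconds(configuredMillis.Value);
        }

        return DefaultTimeout;
    }
}
```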

Improved testability

This will also make commands more unit-testable, since ICommandInvoker will be injectable (where injecting Command behavior is currently a challenge).
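
As a rough illustration of why an injectable invoker helps, the sketch below swaps in a test double that fails every call so the caller's fallback path can be exercised. ISketchInvoker is a deliberately simplified, hypothetical interface (it takes a delegate rather than a command object) and does not match the real ICommandInvoker signature; PriceService is likewise made up.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical, simplified invoker interface used only for this sketch.
public interface ISketchInvoker
{
    Task<TResult> InvokeThrowAsync<TResult>(Func<Task<TResult>> command);
}

// Code under test: depends on the interface, not on a concrete invoker.
public sealed class PriceService
{
    private readonly ISketchInvoker _invoker;

    public PriceService(ISketchInvoker invoker)
    {
        _invoker = invoker;
    }

    public async Task<string> DescribeAsync()
    {
        try
        {
            var price = await _invoker.InvokeThrowAsync(() => Task.FromResult(10));
            return "price=" + price;
        }
        catch (Exception)
        {
            // Fallback when the invoker rejects, times out, or the command fails.
            return "unavailable";
        }
    }
}

// Test double that simply runs the work.
public sealed class PassThroughInvoker : ISketchInvoker
{
    public Task<TResult> InvokeThrowAsync<TResult>(Func<Task<TResult>> command)
    {
        return command();
    }
}

// Test double that simulates a failure without running the work.
public sealed class FailingInvoker : ISketchInvoker
{
    public Task<TResult> InvokeThrowAsync<TResult>(Func<Task<TResult>> command)
    {
        return Task.FromException<TResult>(new TimeoutException("simulated"));
    }
}
```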

Use semaphore bulkheads (without queues) by default

I'm not sold on the use of thread pools as a bulkhead. Their strongest advantage is the ability to have a small queue in front that allows for absorbing some latency and bursts. However, there are a number of disadvantages.

  • They're suboptimal for async. Fully async code means that threads are returned to the thread pool until the async work is done. Introducing an entire thread to "wrap" that and limit concurrency feels nonsensical to me.
  • SmartThreadPool is a bit rough. We currently have to deliver our own forked copy of STP with Mjolnir because the main repo has gone stale and not merged PRs with functionality that we need. We've also had struggles with context flowing across STP threads, and the library isn't really well-equipped for async work.
  • Queuing and pooling make Mjolnir harder to reason about. Since the message gets queued off the current thread, keeping track of it and getting the result back to the appropriate context is a bit tricky. It works, but means a fair amount of hackery around delegates and WorkItems. It also means that we've forced both synchronous and async code down the same code path, which makes the async bits a lot heavier (see bullet 1).

I'd like to try using semaphores as the default bulkhead. It does mean (for now, at least) getting rid of the queues, but they can be re-introduced if needed in the future.
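
A queue-less semaphore bulkhead can be sketched with SemaphoreSlim and a zero-millisecond wait, so a caller either gets a slot immediately or is rejected. SketchSemaphoreBulkhead is a hypothetical class written for illustration; its member names follow the semaphore interface discussed in the review comments on this PR, but the body is not the PR's implementation.

```csharp
using System.Threading;

public sealed class SketchSemaphoreBulkhead
{
    private readonly SemaphoreSlim _semaphore;

    public SketchSemaphoreBulkhead(int maxConcurrent)
    {
        // Initial count and maximum count are both the concurrency limit.
        _semaphore = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Zero-millisecond wait: take a slot if one is free, otherwise fail immediately.
    // There is no queue, so callers never block waiting for capacity.
    public bool TryEnter()
    {
        return _semaphore.Wait(0);
    }

    // Must be called exactly once for every successful TryEnter().
    public void Release()
    {
        _semaphore.Release();
    }

    public int CountAvailable
    {
        get { return _semaphore.CurrentCount; }
    }
}
```

Callers would typically pair a successful TryEnter() with Release() in a finally block, and treat a false return as a bulkhead rejection.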

Better interface for hooking into metrics

The existing interface (IStats) is pretty vague and free-form, which makes it difficult to adapt events into collectors that use standardized metric types like timers and meters.

A new interface, IMetricEvents, is available that should make that easier. It provides specific methods for events that occur, and recommends a metric type for each. A couple of example methods:

void CommandInvoked(string commandName, double invokeMillis, double executeMillis, string status, string failureAction);
void RejectedByBulkhead(string bulkheadName, string commandName);
void BulkheadConfigGauge(string bulkheadName, string bulkheadType, int maxConcurrent);

The original IStats were targeted both at profiling and operational events; IMetricEvents focuses less on method profiling and more on relevant bulkhead, breaker, and command events. Parity between IStats and IMetricEvents wasn't a goal, so they're fairly different.
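
To show how event-specific methods map onto standard metric types, here is a small sketch of a collector that turns two of the events above into counters. ISketchMetricEvents is a hypothetical two-method subset declared only so the example compiles on its own; the counter key format and the status string used by callers are also assumptions.

```csharp
using System.Collections.Concurrent;

// Hypothetical subset of the interface, copied from the example signatures above.
public interface ISketchMetricEvents
{
    void CommandInvoked(string commandName, double invokeMillis, double executeMillis, string status, string failureAction);
    void RejectedByBulkhead(string bulkheadName, string commandName);
}

// Adapts events into simple named counters (the key format is made up for this sketch).
public sealed class CountingMetricEvents : ISketchMetricEvents
{
    private readonly ConcurrentDictionary<string, long> _counters =
        new ConcurrentDictionary<string, long>();

    public void CommandInvoked(string commandName, double invokeMillis, double executeMillis, string status, string failureAction)
    {
        // A real adapter would likely also record the two durations in a timer.
        Increment("command." + commandName + "." + status);
    }

    public void RejectedByBulkhead(string bulkheadName, string commandName)
    {
        Increment("bulkhead." + bulkheadName + ".rejected");
    }

    public long Count(string key)
    {
        long value;
        return _counters.TryGetValue(key, out value) ? value : 0;
    }

    private void Increment(string key)
    {
        _counters.AddOrUpdate(key, 1, (k, current) => current + 1);
    }
}
```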

The new BaseCommand/CommandInvoker pipeline will not publish the old IStats events.

The adapter we use will also be made available (once complete).

For Release Notes

A list of items that should end up in the release notes.

Differences between old and new Command

  • The Invoke INFO log is Invoke for both sync and async calls (Invoke Command={0} Breaker={1} Pool={2} Timeout={3}). In the old Command, it's InvokeAsync for the async call.
  • Exceptions aren't wrapped in a CommandFailedException. Instead, the root cause exception is simply rethrown with some Command-specific data attached to it.
  • "Pool" is now "Bulkhead" in a handful of configs, logs, and properties. If porting old Commands over to the new base classes, be sure to duplicate any configured pool values with the "bulkhead" configuration key.
  • New commands (sync/async) have a timeout configuration key that starts with the prefix "mjolnir.command". The old commands were inconsistent with the rest of the library and used the prefix "command" on its own. Examples:
    • Old: command.my-group.MyCommand.Timeout=1000
    • New: mjolnir.command.my-group.MyCommand.Timeout=1000
  • New commands default to a 2000ms timeout instead of 15000ms.
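
For the timeout key change listed above, a helper along these lines could translate old keys when porting configuration. TimeoutKeySketch.ToNewKey is hypothetical and only handles the prefix change shown in the Old/New example; it does not cover the pool-to-bulkhead renames.

```csharp
using System;

public static class TimeoutKeySketch
{
    // Turns "command.my-group.MyCommand.Timeout" into
    // "mjolnir.command.my-group.MyCommand.Timeout".
    // Keys that don't use the old prefix are returned unchanged.
    public static string ToNewKey(string oldKey)
    {
        const string oldPrefix = "command.";

        if (oldKey == null)
        {
            throw new ArgumentNullException(nameof(oldKey));
        }

        if (!oldKey.StartsWith(oldPrefix, StringComparison.Ordinal))
        {
            return oldKey;
        }

        return "mjolnir." + oldKey;
    }
}
```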

Possible Remaining Work

  • [Command] support for sync and async commands
    • This may go into a future release that also involves pulling the invoker out of the main Mjolnir project and into its own (e.g. Hudl.Mjolnir.Interceptor), which would help get rid of a dependency on Castle.Core.
  • Ability to set separate bulkhead and breaker groups on new commands
  • Support for returning Task (non-generic) from execute methods
    • This is tricky - if anyone's got suggestions on how to do it without heavily duplicating code paths, I'd love some ideas. Everything I've tried ends up kind of lame.
  • Ability to roll out commands in a "dry-run" or "measure-only" mode, where nothing will get rejected, but MetricEvents will still fire. This would allow tuning before committing to a set of failure thresholds.

@robhruska robhruska mentioned this pull request Nov 16, 2015
var executeStopwatch = Stopwatch.StartNew();
var status = CommandCompletionStatus.RanToCompletion;

var cts = timeout.HasValue
Contributor

Where are you passing through the Command's CancellationToken? i.e. if one has been explicitly passed into the call.
This might be happening somewhere else, but I can't find it at quick glance.

Member Author

The GetActualTimeout() call on L61 should get the right value given configuration and default values - is that what you're referring to? I'm still kind of rethinking this a bit, too, for testing and logging purposes.

Contributor

Ignore me, I was thinking that there was a user specified CancellationToken passed through here, but actually there's not.....carry on 👀

@lewislabs
Contributor

This looks so much cleaner and easier to debug!


// ReSharper disable NotAccessedField.Local
// Don't let these get garbage collected.
private readonly GaugeTimer _timer;
private readonly GaugeTimer _timer2;

Could we maybe rename this one to _metricsTimer, and the other one to _statsTimer? I was confused about why we had two, until I read the comment below.

Member Author

👍

Member Author

Renamed these.

@deflume1

deflume1 commented Dec 2, 2015

This looks sweet, nice work. :shipit:

@@ -12,5 +12,11 @@ internal static class ConcurrentDictionaryExtensions
var lazy = dictionary.GetOrAdd(key, new Lazy<V>(() => valueFactory(key), LazyThreadSafetyMode.PublicationOnly));
Contributor

Maybe call the method below? Also, I think we spoke about PublicationOnly before. Should we make the default ExecutionAndPublication, since the most common use case is to not allow the valueFactory to be called more than once?

Member Author

Yeah, I might even get rid of this one altogether and have the caller make a decision about the concurrency they'd like.
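
The trade-off being discussed can be sketched with an extension method that makes the caller pick the LazyThreadSafetyMode. GetOrAddLazy is a hypothetical name, not the method in ConcurrentDictionaryExtensions. With ExecutionAndPublication the value factory runs at most once per stored Lazy, and an exception it throws is cached and rethrown on later access; with PublicationOnly several threads may run the factory concurrently and only the first result is kept.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class LazyDictionarySketch
{
    // The dictionary stores Lazy<V> so that racing GetOrAdd calls only ever
    // publish one wrapper; the chosen mode then governs how that wrapper
    // runs the value factory.
    public static V GetOrAddLazy<K, V>(
        this ConcurrentDictionary<K, Lazy<V>> dictionary,
        K key,
        Func<K, V> valueFactory,
        LazyThreadSafetyMode mode)
    {
        // Extra Lazy instances created by racing threads are discarded unevaluated.
        var lazy = dictionary.GetOrAdd(key, k => new Lazy<V>(() => valueFactory(k), mode));
        return lazy.Value;
    }
}
```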

@lewislabs
Contributor

General comment: It looks like the new default timeout will be 2000ms, rather than 15000. Should it be clear somewhere in code/release notes that this is changing when you switch to the new CommandInvoker approach?

@robhruska
Member Author

@lewislabs Yeah, good call. I added a bullet to the release notes. Any thoughts on the change to 2000 as a default?

@lewislabs
Contributor

I think it's a good idea to bring it down. Did @marcusdarmstrong have some analysis on timeouts that we could use for a suitable default?

{
void Release();
bool TryEnter();
int Available { get; }

Do we want to rename this to CountAvailable or something indicative of it being a count?

Member Author

👍

Member Author

Renamed to CountAvailable.

@robhruska robhruska mentioned this pull request Dec 23, 2015
This should be okay. Two risks: 1) if locking code ever gets introduced
into the actual constructors (unlikely), there could be deadlocks, and 2)
we're changing the exception caching mode so that exceptions will be
cached and re-thrown on subsequent attempts to initialize the Lazy, which
is probably fine. We don't see any initialization errors today.
robhruska added a commit that referenced this pull request Mar 5, 2016
Rework API to better work with async and sync calls
@robhruska robhruska merged commit 0d1500a into next Mar 5, 2016
@robhruska robhruska deleted the ReworkSyncAndAsyncApi branch March 5, 2016 23:11

4 participants