One less leaky abstraction #87

Closed
wants to merge 3 commits into from
@ploeh

This refactors a dependency on Func to a dependency on IViewRepository without sacrificing the lifecycle issues surrounding Entity Framework object contexts.

Each commit is accompanied with extensive notes about my reasoning.

ploeh added some commits Mar 21, 2012
@ploeh ploeh Refactored ConferenceController,
in order to remove a couple of leaky abstractions. Taking a dependency on
Func<IViewRepository> indicates knowledge about a particular
implementation.

In addition, the implementation violated the Liskov Substitution Principle
by attempting to downcast the created repository to IDisposable. This was
also a leaky abstraction.

Much to my surprise, NO UNIT TESTS BROKE because of this refactoring.
Either this project has surprisingly loosely coupled tests, or else the
coverage could be better.
8327608
@ploeh ploeh Added a Virtual Proxy for OrmViewRepository.
The OrmViewRepositoryProxy is a pure infrastructure component, so I
decided to place it in the Composition Root of the application.

It's a Humble Object, so I didn't add any unit tests.

All unit tests are still green.
7b91213
@ploeh ploeh Removed yet a dependency on Func<IViewRepository>.
AFACT, this was the last dependency on this particular factory.

All in all, I think this refactoring removed more lines of code than it
added, and it also reduced the cyclomatic complexity of the code.

All tests are still green.
375aefd
@kzu

Are you sure the using around the Query will work? 'cause the query will be delay-executed when something else is tacked on after the Query() method call. With a disposed context, I'm not sure the query will actually succeed. Hence the need for the Func?

I'd add this as an integratino test of OrmViewRepositoryProxy (which you don't seem to have).

Thanks!

Oh, I didn't even notice that the method returns IQueryable because that's such a huge leaky abstraction that it didn't even cross my mind to look for it. Interfaces should never do that, because you can pretty much always add a projection or filter which can't be translated to all data stores. Thus, publicly exposing IQueryable violates the Liskov Substitution Principle and should be avoided.

If you need to look up data based on anything else than an Aggregate Root's ID, it most likely implies that the Aggregate Root is incorrectly partitioned. If CQRS has taught me anything so far it is that every time I've ever felt the need to query an Aggregate Root by anything else than its ID, it meant one of two things:

  • There's a different Aggregate Root with that ID waiting to be discovered and modeled
  • I should query a Read Model instead

We're doing CQRS here... Repositories/Aggregate Roots are not supposed to support complex queries. We have Read Models for that.

Complex queries are never going to scale against an event store. To access an aggregate root, you must know its ID.

Interesting... You said order ID and then reservation ID... Sounds to me like they are two different Domain concepts. If this is so, they should be modeled as two different Aggregate Roots, each accessible by their ID.

That would solve the problem.

This is a typical example of the kind of feedback (in the GOOS style) an API can give. If something in the API design seems problematic, it's often a sign that there's an implicit concept waiting to be discovered. As DDD (the book) describes, such an insight can help make the Domain Model (and hence the code) much simpler. I believe this is such an opportunity.

Let me ask something in return: how do you plan to be able to allow arbitrarily complex queries against an Event Store? My bet is that if you go down that route, you'll end up with a big number of NotSupportedException/NotImplementedException.

There's a few things to note here:

  • With regard to looking up sagas by something else than a saga-correlation-id, I noticed (from experience) this only becomes necessary in an "open" system (open in the sense that one communicates with other parties, that can or will not provide you the saga-correlation-id). In case of an open system, it's a matter of having consistent secondary indexes to be able to perform the lookup on something else than the saga-correlation-id. Having these will solve most correlation problems. Using the "sagastate<->message mapping" paradigm (which is basically what you get in NSB and MT(?)) you can write some pretty complex correlations. I can't really tell if that's needed here, but I doubt it.
  • With regard to ReservationId and OrderId: from what I can see in the code (aggregates, value objects and sagas), these have been modeled as different concepts, which seems the right thing to do. The fact that an Order is inside the Reservation BC seems odd and might well be the crux of the problem (if it was in another BC OrderId would just become an attribute of the Reservation, used for future correlation with that other BC).
  • Eventstores should be simple. From a puristic POV one could argue that retrieving events for replay in read models is a violation of a system level SRP, but YMMV and I'm sure lots of people would disagree. I tend to model them with only 2 functions "get the events for this aggregate id" and "store these events (or changeset/commit) for this aggregate id". As such, performing IQueryable style queries against the event store used by the domain is not my cup of tea. It might be better to denormalize it into something that allows you such freedom should you seek it (a domain log comes to mind).

Doesn't a Saga (workflow, I'd rather not call it a saga but will to stay consistent) require a combination of IDs in many cases to retrieve the state? If not I think that is a smell that there is a misuse of what may be a request/response pattern.

@yreynhout even so, IQueryable over events might be very valuable to business too, like http://kzu.to/srVn3P

But clearly not for performing "read style" queries (i.e. all aggregates of type T with some property == some value).

@danielkzu Whether you load up that state (ORM) or obtain it by replaying (ES) really is a matter of taste/preference. That's orthogonal to the use of that state to perform saga lookup (which is obviously easier if you have that state ready to query in the first place). Reusing the same state to satisfy two responsibilities (finding sagas and sending messages to aggregates), might or might not become a problem. Using consistent secondary indexes is just another tool in the box to get there (which coincidentally can be obtained by replaying as well), i.e. as an alternative to storing state.

Mark's (ploeh) CLA on file. Approved contributor.

@yreynhout how would you query saga state if all you have are ES events? reply the entire set in memory to get their current values and then filter in-memory? Doesn't seem feasible. To query by correlation ids, you need the current correlation ids for the saga, for all of them, to find the one you want. No?

This still goes to the same thing: why would you want ES to persist the saga state? (even more if you make it standard, like ISaga { Id, State, CorrelationIds } )...

@danielkzu It seems to me you don't know what secondary indexes are. I'd be storing the events AS WELL AS the correlation identifiers (as secondary index values) that correlate back to the saga id. The correlation identifiers are just another byproduct of the event (message) coming in or the commands (messages) going out. Just not storing the rest of the state you use internally (other responsibility as I pointed out earlier). Think of the "correlation identifiers to saga id mapping data" as a synchronously updated read model of the saga. Would it be more work? It depends, it might be/it might not be. Unlike the saga code in this sample, when dealing with sagas I only COPY/TRANSFORM data (inside the saga instance handlers) from event to command(s). I don't create "new data" inside a saga. Data should come from the outside. I admit, it's not easy living by this rule, but found it rewarding and challenging in terms of where to put things.

I'm not religious about these things. There's just multiple ways of going about it. If you want to use an ORM, fine (in which case I'd argue why not use an fx out there that does all these things for you).

Another, more subtle problem you might not have considered is that sagas COULD be long running. Evolving them is easy when using a projection based on events (i.e. replay to get your saga state).

[P.S. I'm using the term saga here but I think the word workflow is closer of a definition]

On another note: if you know that saga persistence is always going to be done by an ORM, then why hide it behind an interface?

@ploeh hide it how? IQueryable is very much THE ORM API (for ORMs that matter, anyway ;)).

@yreynhout again, if the workflow has a unified storage shape (ID, State, CorrelationIDs) why would evolution be harder?. Regarding using an fx for the correlation: absolutely. But picking one would be tough, and many (including Greg) usually say that "plain" CQRS doesn't need any kind of framework, and that you should be able to write a "saga framework" in an afternoon ;). So that's what we're trying to do: see how it looks all manual, see what an fx would buy us (it's part of the journey :)).

@danielkzu Please elaborate how you're about to evolve State (it's a blob or document)? When you're done, question why you'd bother in the first place ;)

@yreynhout: sagas could an ORM persistence of

abstract class Saga
{
Guid Id;
int State;
ICollection CorrelationIds;
}

That's it. How you evolve the saga? There's no schema to evolve. Was that your question?

Effectively, I gain the same flexibility as using ES, but I question the need for ES for a saga. What would you gain? Again? ;)

The ORM is hidden behind a Repository interface. It creates the illusion that it can be replaced with something else (e.g. an Event Store, document database, etc.).

All the arguments I've seen so far are based on the assumption that saga state will be supported by an ORM. That's fine, but then make that architectural decision explicit in the API. Right now the API is dishonest. Make it honest :)

@danielkzu I stand corrected ;-) The fact that state denotes "current state the statemachine is in" as opposed to something like SagaState in NSB, makes evolution a whole lot easier. However, in an ES based solution this field would not be required and allow the statemachine behind the saga to evolve without being tied to stored state. Not saying it is better, just different.

Wouldn't GetBySagaId() and GetByCorrelationId() suffice as methods on the repository?

@yreynhout What does the correlation id refer to in this case?

I'm not sure we went down the correct path or not but we have our workflow state in a separate class than the workflow. IWorkflowState has an Id property which allows you to create composite keys in a way that makes sense for that particular workflow. This makes the repository implementation dead simple and all we have is workflowRepository.GetById() even though different sagas have different ways to uniquely identify themselves.

This is working for us really well but something makes me feel like its not ideal :)

@ploeh IQueryable is pretty honest, I'd say. And RavenDB (document database) does support queryables.

Not sure what you'd like to turn it into, to make it more explicit that the underlying storage needs to support flexible querying...

@yreynhout correlation ids in our registration saga are: Saga ID, (seat) Registration (the lock on the seats) ID, and Order ID (the entire order, which may be booked, canceled, etc.).

GetById and GetByCorrelationId(string kind, guid id) would be more explicit, yes. But then again, we'd need to make the saga API again use CorrelationIds (generic), and the saga code will be doing lookups internally on that collection too in order to send commands.

For example: when the order is cancelled, the saga needs to send a command to release the reservation by ReservationID. So the "router" (the component -like NSB, MT, manually in our case- that lookups the saga instance to pass on the event) will lookup by OrderID and forward the OrderCancelled event, and the saga will do a reservationID lookup with, say this.CorrelationIds["ReservationId"], and use that to send the command to cancel the (seats) reservation.

I'm OK with that. In fact, that's the approach we had initially, but went for strong typed properties on the saga class itself as it seemed clearer, but I'm not so sure now ;)

This whole discussion reminds me of a workshop Udi Dahan once gave.

Basically, if you look at what an Order is, you will find that it most likely goes through a series of states during its lifetime: Created, Paid, Cancelled, Whatever... Would it make any difference if you added one more state to the Order aggregate: Potential/Basket/Reservation?

In Udi Dahan's SOA workshop, that's basically how an Amazon shopping basket works. There's no separate Service for a basket. A basket is just an order which may or may not be placed. Once it's placed, its state changes, but the ID remains the same.

If you did the same thing here, you could vastly simplify the code. In fact, you wouldn't need a State Machine/Saga at all, and correlation would be irrelevant.

Well, if you must correlate, either don't hide the ORM behind an interface, or provide an accessor method for each ID you'll need:

Reservation FindReservation(int reservationId);

Order FindOrder(int orderId);

Each option is more honest, simpler, and provides appropriate guarantees.

Well, then we agree. The IViewRepository interface is unnecessary. Get rid of it and take a direct dependency on the Entity Framework. The interface serves no purpose.

FWIW, I ended up writing a blog post about why exposing IQueryable is a bad idea... http://blog.ploeh.dk/2012/03/26/IQueryableIsTightCoupling.aspx

@yevhen

OMG, why do you need IViewRepository in Controller? You should use your (InMemory)Bus object to make a query. Just make queries explicit, ie Query/Result objects (which is almost the same as Command). In fact Commands are Queries which return Void.

using (something as IDIsposable) is not violating anything. If the downcast doesn't succeed, nothing gets done (and no exception is thrown either)

That's the nature of the "using" language construct in C#. We could as well put the IDisposable interface as a requirement of the IRepository interface, but that would be leaking a particular implementation detail (i.e. EF implements it, but others might not). I decided not to leak that implementation detail into the interface. But still the consuming code can make provisions for the case where the actual impl. DOES implement IDisposable. Precisely what the "using" keyword was invented for.

@jdom

I agree with kzu that it could be useful to provide simple query capabilities for the saga and hence having the saga repository be queryable, or at least a much more reduced querying mechanism... either query by saga id, or correlation id, being the correlation a property of the sage if we want to keep it strongly typed).
Nevertheless, I'm with Mark with respect to the IViewRepository (the one used to query the read model, as opposed to lookup sagas). I do believe that the read model access should not be done in a completely generic way, and allowing the UI (MVC controllers) to provide any custom query by using the IQueryable interface. This puts a lot of constraints into the underlying data access, and at the same time allowing less opportunity to optimize the read access. The read model should be tailored to make it easy to lookup and render each of the views, so I'd say that having a lot of flexibility would not be a great thing to do, because it can take you to the path of doing classical non-CQRS work, and having the projections and joins and all be done at the UI level.
The only place I think would still be very valuable to return an IQueryable, is if you had a grid for example, and the user might filter, sort, page, etc. But I'd make this the exception and not the rule.

@kzu

@ploeh Why didn't my two comments on your post ever showed up? Did I screw it with the formatting or something? (the < > of the generics?)

@ploeh

Well, the angle brackets are usually the reason. You didn't screw up - my blog engine did. Sorry about that :$ (I do have a long-term project of migrating to a better blog engine...)

If you can re-post without 'dangerous' angle bracket, it ought to get through. Sorry about the inconvenience.

@kzu

Re-posted.

@jdom

@yevhen really?? would you do a query through a (asynchronous?) bus? Isn't that why you have a thin read model that acts as a fast and easy to query cache??? BTW, I do not agree about having IQueryable everywhere though, but the notion of having a query go through a bus seems completely off.

@yevhen

No, I wouldn't. I don't have async in-memory or "real" buses in my app. I'm only using synchronous in-memory Bus in my app, for sending commands, queries and publishing events. You can think of it as a generic Message Router. But I do model queries as explicit classes (that helps to preserve intent of the query).

@jdom

@yevhen may I ask why you prefer to use the synchronous in-memory bus to do a query? What would you be gaining? I prefer to be explicit and say OrderSummaryDAO.FindByUserId(userId), and be explicit in the intent there. Why is the intent preserved better with the query as a message? Also, this makes it easy to know the type of the response in order to render the UI, otherwise you will be just casting response messages and hoping the correct handler responds to your query and use the type you expect.
I'm really trying to think a scenario where I'd take advantage of this routing overhead, but I can't.

@kzu

I understand what @yevhen is saying. I always thought that CQRS/ES could also be applied as an internal decoupling architectural pattern, with in-memory buses too. I like that too.

But I'd still have a denormalized view/DTO that was eagerly populated as events flow through the bus...

@yevhen

@danielkzu I use denormalized views. Having explicit query objects doesn't cancel projection DTOs. The projection DTOs are merely a payload of Response.

Yes, having an in-memory routing should be enough for a majority of apps where people aren't being blinded by Enterprise Service Buses. If you don't need to distribute physically - you just model services (Bounded Contexts) as logical components. They have an external interface defined in terms of Commands, Queries, Events, and Views (and perhaps Exceptions). You just plug them into in-memory bus to enable routing. So you only have to reference a Bus in order to communicate across layers and services.

Udi wrote about difference between logical and physical architecture here. You always start from logical separation and only then think about physical distribution, if it's necessary at all.

@yevhen

@jdom

What would you be gaining?

I can reuse cross-cutting things like: authentication, authorization, validation (yes, for queries it's also make sense, like required criterion fields etc), logging, multi-tenancy, tracking message\result size for billing purposes ... it could be a lot of such stuff that can be reused between command and query handling pipelines.

I prefer to be explicit and say OrderSummaryDAO.FindByUserId(userId), and be explicit in the intent there.

Yes, but the scenario you've presented is too simple. How your code will look like if you add list queries and things like pagination, sorting etc? Or how it would look like when you will need to support variable number of criteria fields?

With explicit queries it's simple. You can just make base class like ListQuery for all of your list queries and encapsulate those properties there, without copy\pasting the all over the place, like in the approach you have shown.

However, result discoverability is an issue with explicit queries approach.

An example:

[DataContract]
public class GetInvoice : Query
{
    [DataMember] public Guid InvoiceId;
}

[DataContract]
public class GetInvoiceResult : Result
{
    [DataMember] public InvoiceView Invoice;
}

It's dead simple to generate documentation on queries and their results. And consumers know by simple convention what will be the result.

The code in query handlers is also dead simple and a lot of reuse can happen on things like pagination, sorting etc.

public class InvoiceQueryHandlers :
    Answers<GetInvoice, GetInvoiceResult>,
    Answers<GetInvoiceSummaries, GetInvoiceSummariesResult>
{
    readonly ProjectionStoreSession session;

    public InvoiceQueryHandlers(ProjectionStoreSession session)
    {
        this.session = session;
    }

    public GetInvoiceResult Answer(GetInvoice q)
    {
        var result = session.Load<InvoiceView>(q.InvoiceId);

        return new GetInvoiceResult { Invoice = result };
    }

And the consumers (ie UI layer is using it like this):

    public ActionResult Index()
    {
        var query = new GetInvoiceSummaries();

        var result = Query<GetInvoiceSummariesResult>(query);

        return View(result.Invoices.Select(i => i.ToModel()).ToList());
    }

And my bus query method looks like:

    public Result Query(QueryEnvelope envelope)
    {
        List<Func<Envelope, Result>> handlers;

        if (!routes.TryGetValue(envelope.Query.GetType(), out handlers))
            throw new HandlerNotFoundException(envelope.Query.GetType());

        if (handlers.Count != 1)
            throw new DuplicateHandlerException(envelope.Query.GetType());

        return handlers[0](envelope);
    }

Actually, it looks almost the same as my Send(command) method :)

Because commands are handled exactly like the queries, but they just return Void.

@jbogard

So...what was the point of doing Func again? Seems like a pointless abstraction in the controllers to "know" I need a Func rather than just the IViewRepository

@jdom

@yevhen I see your point and how you can find it useful, although I do not fully agree. Seems like you got into messaging, and trying to solve all your problems through messaging.
All of the things you mentioned can be solved by other means of doing AOP, and still keep the class count low and fully typed. For example, by using Unity interception, you could get, authZ, authZ, validation, logging, multitenancy, and so on, but still have a very explicit interface for querying.
Of course when you need to pass a more complex criterion, you pass it to the DAO in the same way as you would create a Query object. If the query is very complex (in order to serve lists of items with complex sorting, grouping and other things specified by the end user, then this is a particular case I would have a DAO that takes an expression similar to what you can do with IQueryable, and assume a relational database or something that supports IQueryable:
DAO.GetInvoiceSummaries(Expression> filter)
Doing that with a Query object could lead you to replicate what the Expression syntax already allows you to do (one reason is that you need not to only support any query allowable by the user, but also be able to serialize that).
I completely fail to see how discoverability is an issue in this approach. Instead of having to "discover" all the possible query objects you can send to the buss, you just get the correct DAO as a dependency. These would all be in fixed namespaces, right?

@jdom

@jbogard That's the problem being discussed actually. Now it is a Func so you can create a new instance of the repository or context when you want to use it, use the IQueryable thingy that it returns, and then dispose the repository/context.
I agree with Mark that this shouldn't be the case, and it should be treating this as a stateless querying service, and hence not require disposal after use.

@kzu
@yevhen

@jdom

I completely fail to see how discoverability is an issue in this approach. Instead of having to "discover" all the possible query objects you can send to the buss, you just get the correct DAO as a dependency. These would all be in fixed namespaces, right?

Sorry, typo, I meant result discoverablity problem in approach with explicit queries, not in you approach. Initially, we were returning views directly as the result of the query and so it wasn't obvious what to put into generic argument here

    protected TResult Query<TResult>(Query q) where TResult : Result
    {
        return (TResult) Bus.Query(CreateEnvelope(q));
    }

By having an explicit GetXXX\GetXXXResult pair it's now obvious how to close this generic method - ie by convention.

Seems like you got into messaging, and trying to solve all your problems through messaging.

I understand that it look like doing more work, defining all those additional query\result messages but it's easily fixed with the simple DSL. This crufty stuff is auto-generated.

All of the things you mentioned can be solved by other means of doing AOP, and still keep the class count low and fully typed. For example, by using Unity interception, you could get, authZ, authZ, validation, logging, multitenancy, and so on, but still have a very explicit interface for querying.

I don't use containers. I've tried both dynamic and static registration of handlers and I've found that static registration with monadic composition is much easier to use then dynamic. Zero infrastructure and full control over pipeline.

DAO.GetInvoiceSummaries(Expression> filter)

It's a potential hole. You don't want to expose it to external consumers. And expressions are not serializable.

Explicit queries gives you the same level of decoupling as commands. Consumers don't need to know how to instantiate DAO's correctly and how to satisfy their dependencies.

@moodmosaic

Here are my concerns on why the the code should depend on an interface (or an abstract class) rather to a Func(Of TResult) Delegate:

A developer can (improperly) invoke the delegates using the APM (via BeginXxx/EndXxx methods, creating an instance of a type implementing the IAsyncResult interface, a reference type, adding overhead, introducing unwanted asynchrony).

It's up to the consumers to control the creation and destruction of the returned value of the type specified by the TResult parameter. Do we really want to deal with this?

Is it possible for the compiler to inline the Func methods? (Because this may improve the performance on small methods.)

@jdom

@yevhen

I don't use containers. I've tried both dynamic and static registration of handlers and I've found that static registration with monadic composition is much easier to use then dynamic. Zero infrastructure and full control over pipeline.

This looks to me that containers did not work for you in a specific place (for registering the handlers, which I am not suggesting you to do, although you could), so you are now avoiding them everywhere (in this case, in the client code).
On the other hand, I'm comfortable with using containers for what they are useful. This also means doing simple AOP.
I hate to say it, but using a bus, and having a bunch of extra classes just in case you might want to later add a cross cutting concern... that's adding a lot of infrastructure and restrictions up front, not "zero infrastrcture". With an explicit interface you just created 1 class you can refactor and replace with mocks when doing TDD, and only IF you need to add those concerns, you configure the container correctly when bootstrapping it.

DAO.GetInvoiceSummaries(Expression> filter)

It's a potential hole. You don't want to expose it to external consumers. And expressions are not serializable.

And I agree with Mark Seeman on avoiding this for most situations, but for other reasons (which are state in my previous responses). Serialization is not one of those reasons, as I am suggesting NOT sending the query through a message bus or anything, I'm just using C# (or any) language constructs, not sending the query through the wire.

Explicit queries gives you the same level of decoupling as commands. Consumers don't need to know how to instantiate DAO's correctly and how to satisfy their dependencies.

This is why containers are so useful. I guess we have a different perspective... containers for me are so mainstream and easy to use, that it's not adding a burden at all, nor hindering maintainability, so it helps me deal with not complicating these issues, and keeping my code in focus.

@chrisnicola

Just getting to this now, I've only got time to read a bit of the beginning to get context @ploeh seems to be focused on the idea that the Query side must be partitioned in a DDD way the same as the command side. I don't see why. The whole point of CQRS is that this separation allows for flexible and changeable query models that are based on the needs of data analysis and not on the behaviour of the system. The partitioning doesn't need to match in the least for the purposes of querying data, it just tends to.

In fact because of this separation, this whole discussion may really be bikeshedding. The query side is not the most important aspect of CQRS design. It can be represented by just about anything that makes sense. What's important is that it's completely decoupled from the command side.

@jbogard
@chrisnicola

I would wager a guess that on the read side CQRS folks build apps a number of different ways. Me, I like document databases personally.

@jdom jdom referenced this pull request Apr 18, 2012
Merged

#86 Remove leaky abstractions #165

@gmelnik

Implemented the proposed idea in PR #165. Acknowledging with thanks.

@gmelnik gmelnik closed this May 14, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment