Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing Ecto.Multi #1114

Closed
josevalim opened this issue Dec 1, 2015 · 60 comments
Closed

Introducing Ecto.Multi #1114

josevalim opened this issue Dec 1, 2015 · 60 comments
Milestone

Comments

@josevalim
Copy link
Member

Ecto.Multi is a data structure that stores multiple operations to run inside a transaction. It will also help clean up many cases of Repo.transaction as well as help reduce the use of model callbacks.

Let's see an example:

user_changeset = User.changeset(%User{}, params)
log_changeset = Log.changeset(%Log{}, event: "user updated")

multi =
  Ecto.Multi.new
  |> Ecto.Multi.update(:user, user_changeset)
  |> Ecto.Multi.insert(:log, log_changeset)
  |> Ecto.Multi.run(:charge_credit_card, fn changes_so_far -> {:ok, _} | {:error _} end)

case Repo.transaction(multi) do
  {:ok, %{user: user, log: log}} ->
    ...
  {:error, key_that_errored, %{user: user_changeset}} ->
    ...
end

All operations in Ecto.Multi are tagged and stored in the order they are defined. Duplicated keys should raise. Once built, the underlying Ecto.Multi struct can be given to Repo.transaction which will return {:ok, map_of_structs} or {:error, key_that_errored, map_of_changesets} where key_that_errored is the first key that failed the operation.

API

  • Ecto.Multi.new() - returns a new Ecto.Multi struct
  • Ecto.Multi.insert(multi, key, changeset_or_struct, opts) - stores an insert to be performed on transaction
  • Ecto.Multi.update(multi, key, changeset, opts) - stores an update to be performed on transaction
  • Ecto.Multi.delete(multi, key, changeset_or_struct, opts) - stores a delete to be performed on transaction
  • Ecto.Multi.run(multi, key, fun) and Ecto.Multi.run(multi, key, mod, fun, args) - stores a function that will be executed if the previous operations succeed. The function will receive a map with the result of all the previous operations (which must have succeeded up to this point). Returns {:ok, value} or {:error, value}
  • Ecto.Multi.all|update_all|delete_all(query, opts) - query based operations
  • Ecto.Multi.to_list(multi) - converts it to a list in the format [user: {:update, args}, log: {:insert, args}, ...]
  • Ecto.Multi.prepend|append(multi, other_multi) - prepends or appends the operations in other multi

Ecto.Repo.transaction/2 will be changed to accept Ecto.Multi structs with the semantics specified above. It is worth pointing out that, although insert and delete accept "structs", it always return changesets in {:error, key_that_errored, map_of_changesets}. The only exceptions are the run/3 and run/5 functions which will return the value in {:ok, value} and {:error, value}, or :skipped if the operation failed before the function was executed.

Implementation

I was thinking Ecto.Multi could be a struct with a list, that keeps the reverse ordering of operations, and a map, which keeps the key-value mapping. It is better than a keyword list because every time we add a new operation we want to check the key is new and that would be faster against a map.

FAQ

A: Why Ecto.Multi and not something like Ecto.Transaction?

Because Repo.transaction/2 is potentially only one of the many functions that may use it. For example, Mongo does not provide transaction support but supports performing multiple operations at once.

Acknowledgements

/cc @solnic @MSch @chrismccord @jeregrine

@MSch
Copy link
Contributor

MSch commented Dec 2, 2015

That is awesome! 👍

Is there a way for Ecto.Multi.run to dynamically add/remove operations from the Ecto.Multi?

My use case is that almost all cases where we use transactions right now we start by first checking if the user a) has access b) has the last state (version column). In case a where the user just doesn't have access returning {:error, :access_denied} works, but in case b we want to return the last state from the DB, while skipping the following inserts/updates/deletes which are the 'actual' changes. Is there a way to do that?

Maybe allowing run to return e.g. {:instead, other_multi} which will skip processing the remaining operations and instead process the operations from other_multi and then return a merged map of the results of operations of both Ecto.Multis?

@josevalim
Copy link
Member Author

@MSch theoretically you should be able to just call Repo.transaction with the new multi inside the run function. Though I am not sure if it is an actual improvement over calling Repo.transaction with a function. We could make it complex in the future if necessary but best to ship a simple version for now and build on that based on usage. Thank you!

@parkerl
Copy link
Contributor

parkerl commented Dec 2, 2015

👍 This sounds fantastic. It has been great watching the conversation around this unfold.

@parkerl
Copy link
Contributor

parkerl commented Dec 2, 2015

Feel free to ignore this comment but I'm wondering if this is an Operationset to go along with Changeset? I find the name Multi a tiny bit awkward, especially when talking about it in the plural.

@divmgl
Copy link

divmgl commented Dec 7, 2015

Thanks @josevalim this is exactly what I was looking for

@solnic
Copy link

solnic commented Dec 8, 2015

This kind of functionality is often called "session". Maybe that could be a better name?

@divmgl
Copy link

divmgl commented Dec 8, 2015

@solnic wouldn't it be called a transaction? i think multi is a fitting name. maybe sequence? not sure. i actually like operationset

@robconery
Copy link

The architectural pattern is Unit of Work if… we still care about Fowler’s stuff http://martinfowler.com/eaaCatalog/unitOfWork.html

On Dec 7, 2015, at 5:11 PM, Dexter Miguel notifications@github.com wrote:

@solnic https://github.com/solnic wouldn't it be called a transaction? i think multi is a fitting name. maybe sequence? not sure. i actually like operationset


Reply to this email directly or view it on GitHub #1114 (comment).

@solnic
Copy link

solnic commented Dec 8, 2015

Hmm. Not sure if the intention of this feature is to provide a mechanism like UoW with dependency resolution for operations where they depend on each other and making sure that execution order is correct? From the examples Jose gave it looks like an explicit way of setting up a pipeline of operations for later execution. That's why I thought about a "Session", since that's what you do. You start a persistence session where one or more operations will be executed, possibly within a transaction, but the session does not imply a transaction as that can be db-specific.

@josevalim
Copy link
Member Author

I find the name "Session" confusing, there are too many things called Session. Calling it Ecto.Session would mean almost every time the concept is introduced it will be followed up by a disclaimer like "not your browser's session".

I am mostly ok with "Unit of Work" but it can introduce the mismatch @solnic has presented and that sometimes happens with Repo. If someone takes the definition to its heart, it will be confusing because it isn't a direct match.

My preferences so far are Ecto.Multi or Ecto.OperationSet. Thoughts? Also, feedback on the functionality besides the name is very welcome.

Thanks everyone for the feedback so far!

@jeregrine
Copy link
Contributor

I like OperationSet more than multi... but I also don't want to be typing
Ecto.OperationSet everywhere.

On Tue, Dec 8, 2015 at 5:41 AM, José Valim notifications@github.com wrote:

I find the name "Session" confusing, there are too many things called
Session. Calling it Ecto.Session would mean almost every time the concept
is introduced it will be followed up by a disclaimer like "not your
browser's session".

I am mostly ok with "Unit of Work" but it can introduce the mismatch
@solnic https://github.com/solnic has presented and that sometimes
happens with Repo. If someone takes the definition to its heart, it will be
confusing because it isn't a direct match.

My preferences so far are Ecto.Multi or Ecto.OperationSet. Thoughts?
Also, feedback on the functionality besides the name is very welcome.

Thanks everyone for the feedback so far!


Reply to this email directly or view it on GitHub
#1114 (comment).

@michalmuskala
Copy link
Member

Ecto.Transaction was also an option, but we decided, we can't use it because this could work also with MongoDB (packing several changesets together), where there are no transactions.

I like UnitOfWork, but I'm not exactly sure we do all the things this pattern describes, so for people familiar with it, it would be quite confusing.
I like OperationSet more than Multi. And when it comes to typing: the situation I see use for this feature is to create sort of a service/use case layer that would extract business logic from the controller, leaving controller only with executing actual operations, talking to db, retrieving params from request, etc.
When used like this, you could easily import the OperationSet module and avoid typing all together.

One of the features I initially imagined for something like this was a way to simulate Repo.transaction call at the end, without actually going to the database. It would be really useful in tests for unit testing the business logic/service/use case/whatever layer in isolation. It should be analogous to how Ecto.Changeset.apply_changes/1 works for a single changeset.

@divmgl
Copy link

divmgl commented Dec 8, 2015

@jeregrine alias Ecto.OperationSet, as: OSet

@josevalim I will do some testing later today with PostgreSQL. As far as naming, I'm perfectly fine with Ecto.Multi. Ecto.Operation and Ecto.OperationSet also make a lot of sense.

@michalmuskala
Copy link
Member

Oh, I forgot about one thing.

If we decide to go with OperationSet, let's name it Operationset for consistency with Changeset.

@solnic
Copy link

solnic commented Dec 8, 2015

OperationSet is not a good name. Set semantics is that the order of its elements is irrelevant, which is the exact opposite of what we have here. I agree that Session is a common name used in different contexts, although it is a common concept in database libraries, esp. in Data Mapper pattern-based ones (not really relevant here, but there's a chance many people are familiar with this name because of that pattern).

As far as functionality goes, I'm not sure why Multi.run is needed. What's the reasoning behind this interface?

@josevalim
Copy link
Member Author

@solnic in a conversation with @michalmuskala I do remember we had a use case for running custom code but I can't recall it on top of my mind. Do you have any concern regarding Multi.run?

@solnic
Copy link

solnic commented Dec 8, 2015

@josevalim it feels like out of scope of Ecto's functionality. Why would you want to run arbitrary functions using Ecto's interface? I think the result of executing Ecto's operations should be enough to decide what to do next.

@michalmuskala
Copy link
Member

I think the use case for run was coming from @chrismccord. If I recall correctly it was basically about deciding based on some external state to either continue or cancel the transaction.

The difference between running a function before call to Repo.transaction or after it, and via Multi.run would be that the later would be running inside the transaction.

@josevalim
Copy link
Member Author

@solnic I see. In the starting example:

  Ecto.Multi.new
  |> Ecto.Multi.update(:user, user_changeset)
  |> Ecto.Multi.insert(:log, log_changeset)
  |> Ecto.Multi.run(:charge_credit_card, fn changes_so_far -> {:ok, _} | {:error _} end)

The idea is that you may need some previous information, like the log.id, before continuing. This could also happen with dependent records, maybe you need to insert two records and need both of their ids to introduce the third. run would allow you to access the data so far while still being part of the transaction.

@chrismccord
Copy link
Contributor

The main driver for the run idea, is so we can wrap up operations that should only be run given the previous being successful, and most importantly, rollback the transaction should the run fail. In the charge_credit_card case say for a shopping system, we would want to rollback an Order insert/update if the charge failed (stripe is down, payment processing error, etc). If we handled this code in the {:ok, clause of the Repo.transaction, it's too late as we've already committed.

@robconery
Copy link

@chrismccord If I understand you right - the charge_credit_card method will call out to stripe, yes? If this throws in the current transaction callback (not the Multi stuff here) - won't that rollback the entire thing? From what I understand it's only on a successful callback that COMMIT is called.

I guess I'm getting confused as to how this is better than using what's already there.

Then again my mind could be locked in Postgres land. Mongo wouldn't support this.

@chrismccord
Copy link
Contributor

@chrismccord If I understand you right - the charge_credit_card method will call out to stripe, yes? If this throws in the current transaction callback (not the Multi stuff here) - won't that rollback the entire thing? From what I understand it's only on a successful callback that COMMIT is called.

Yes, calls out to some payment processor for this example and if it fails or raises then it rolls back the whole thing. The issue with manual rollbacks or raises is you have to compose that yourself and it has turned into a nested series of repo calls and case statements, where all I want is to continue on success or rollback the whole sequence on failure. @robconery just so I'm not missing something, can you turn the following code sample into existing code that raises/rollbacks, so we can compare to what's already there?:

opts = 
  Multi.new
  |> Multi.insert(:order, order_changeset)
  |> Multi.run(:products, %{order: order} -> decrement_stock_counts(order) end)
  |> Multi.update(:user, %{user | last_order_placed_at: DateTime.utc()})
  |> Multi.insert(:payment, payment)
  |> Multi.run(:charge_card, fn %{payment: %{trans_id: id} -> {:ok, ...} end)

case Repo.transaction(opts) do
  {:ok, %{user: user, charge_card: log}} ->
    ...
  {:error, :charge_card, %{charge_card: _errors}} ->
    conn
    |> put_flash("There was an error processing your payment")
    |> render("checkout.html")
  {:error, key, %{...}} ->
    ...
end

@divmgl
Copy link

divmgl commented Dec 8, 2015

I agree with @chrismccord. I am already running into the issue where I ask "what if this query is dependent on the results of another query and upon execution it fails? How do I revert the database back to a state before the previous query?" I've been writing transactions in SQL to overcome this pitfall and Ecto.Multi solves that problem.

@sorentwo
Copy link
Contributor

sorentwo commented Dec 8, 2015

Ecto.Multi.run is a welcome addition that will help with issues I face on a daily basis. I like that it is shaping up along the lines of distributed transactions (kind of).

@robconery
Copy link

@chrismccord Sorry I'm confused - probably as my question is a bit muddled. Let's try again...

If I run Repo.transaction and it fails anywhere in there, doesn't that result in COMMIT never being called? If COMMIT isn't called (in a system other than Mongo) then that's the same as the transaction dying. In fact I think you guys issue a ROLLBACK in there if an exception is thrown.

I was thinking something like this, with the current codebase:

case Repo.transaction fn -> 
  charge = Stripe.charge_card(...)
  user = User.insert!(%User{email: "..."})
  log = Logs.insert!(%Log{user_id: user.id, entry: "Logged"})
end

If the call to Stripe.charge fails, that results in a throw which I believe will be caught by Ecto, which will then issue a ROLLBACK so nothing is written.

This can be a problem, of course, if you call out to another app - but I'm just talking about the DB at this point. Nothing manual needs to happen here.

Given this, I guess I'm unclear on what Multi is actually doing? I know I'm missing something, sorry for being dense.

@josevalim
Copy link
Member Author

@robconery Multi is doing what we have discussed on #1009. Your code is perfectly fine when you don't expect things to fail but, if you assume things can fail along the way because of data or constraints, then you need to wrap each operation in multiple case blocks with explicit rollbacks. Something like this:

    Repo.transaction fn ->
      case User.insert(%User{email: "..."}) do
        {:ok, user} ->
          case Log.insert(%Log{...}) do
            {:ok, log} ->
              Stripe.charge(...)
              %{user: user, log: log}
            {:error, log_changeset} ->
              Repo.rollback(%{log_changeset: log_changeset})
          end
        {:error, user_changeset} ->
          Repo.rollback(%{user_changeset: user_changeset})
      end
    end

In other words, multi is executing every step one by one and returning exactly which step is failing, accumulating the data along the way.

PS: Mongo can support Multi but, due to lack of transactions, it should not support Ecto.Multi that have run functions in it. This makes it incomplete in Mongo but if you need this feature, you need transactions, then don't use mongo. I am also ok with calling it Ecto.Transaction and just saying it does not work with mongo.

@robconery
Copy link

@josevalim Ah - the rollback needs to be explicit! For some reason I thought I saw in your source that you caught the error and let it bubble. That's what we do with Moebius:
https://github.com/robconery/moebius/blob/master/lib/moebius/runner.ex#L36

Then in the transaction method:
https://github.com/robconery/moebius/blob/master/lib/moebius/query.ex#L898

The exact error is returned with the {:error} and so far its working well.

It seems odd to me that you would need to rollback exclusively for each step and changeset. Doesn't a rollback apply to a transaction?

@doomspork
Copy link
Contributor

+1 for Ecto.Transaction

@divmgl
Copy link

divmgl commented Dec 9, 2015

Ecto.Transaction +1

Would we be able to use Ecto.Transaction/Ecto.Multi with a database like MongoDB?

@josevalim
Copy link
Member Author

Would we be able to use Ecto.Transaction/Ecto.Multi with a database like MongoDB?

No. I believe it simply can't work with the same semantics on Mongo (or maybe it can now that it ships with PG :P). It is ultimately a question mongo_ecto should answer though.

@paulcsmith
Copy link
Contributor

I like Ecto.Multi more than Ecto.Transaction, but I think I like Ecto.Sequence a bit more. Thoughts on that?

Overall I like the idea a lot. Can't wait to give it a try

@michalmuskala michalmuskala self-assigned this Dec 15, 2015
@michalmuskala
Copy link
Member

No. I believe it simply can't work with the same semantics on Mongo (or maybe it can now that it ships with PG :P). It is ultimately a question mongo_ecto should answer though.

I believe it can work. It obviously won't give transactional guarantees, but it will allow you to batch multiple operations together.

@mgwidmann
Copy link
Contributor

This flow seems to fit the new with keyword perfectly. It seems setting this functionality up to utilize with and making Ecto require Elixir 1.2 would be the best option. That shouldn't be a problem since this is intended for Ecto 2.0 correct?

Just seems as if the code example is trying to reinvent the wheel when it comes to a transactional pipeline (which with solves beautifully).

@mgwidmann
Copy link
Contributor

Also, from the way this is handled it seems like it's used almost exactly like Streams are used. Perhaps Ecto.Stream is a fitting name that will align to Elixir's brand as well.

@josevalim
Copy link
Member Author

@mgwidmann could you clarify? this is definitely not a stream. :) it is not a collection of data of any sorts. Maybe run gives an idea of a callback but even the arguments and expect results does not map to something like a stream.

@mgwidmann
Copy link
Contributor

Was just thinking its built up as a data structure like a stream is and then executed with run like a stream is. Was thinking of it like a stream of transactions to be wrapped together, where transactions are supported of course, otherwise its just a stream of commands to be executed in the DB.

Was just a naming thought :)

@josevalim
Copy link
Member Author

Well, working with data structures is common throughout Elixir, not specific to streams. And it is not executed by calling run, it is executed by calling Repo.transaction. run is a function that will be executed later inside the transaction. :)

@archseer
Copy link
Contributor

How about Ecto.Procedure? Agree that we shouldn't bikeshed on the name too much though.

@auxbuss
Copy link

auxbuss commented Dec 16, 2015

Two more suggestions: Ecto.OperationChain or, more simply, Ecto.Chain

I presume Multi came from Redis (or another tool, or coincidence). I’ve never much liked it in Redis.

I like OperationSet, which I agree could imply a lack of ordering. Since OrderedOperationSet is too verbose (and OperationPoset too obscure and ugly) instead I suggest Chain and OperationChain, both of which imply order.

That said, I love terse code, so I could learn to live with Multi, but perhaps not love it.

@robconery
Copy link

I think we need a sense of humor on this:

Ecto.Donam

Or maybe...

Ecto.MonadNotMonad

Or...

Ecto.Maybe

@divmgl
Copy link

divmgl commented Dec 16, 2015

I know! How about Ecto.Multi?????

@parkerl
Copy link
Contributor

parkerl commented Dec 17, 2015

@robconery nice. How about Ecto.OpNoop or Ecto.IfOopsNoop?

Actually, sorry to have initiated the naming controversy on this thread. 😞

I ❤️ Ecto.Multi!

@almassapargali
Copy link

Just I quick question, can new with operator make this snippet better?

    Repo.transaction fn ->
      case User.insert(%User{email: "..."}) do
        {:ok, user} ->
          case Log.insert(%Log{...}) do
            {:ok, log} ->
              Stripe.charge(...)
              %{user: user, log: log}
            {:error, log_changeset} ->
              Repo.rollback(%{log_changeset: log_changeset})
          end
        {:error, user_changeset} ->
          Repo.rollback(%{user_changeset: user_changeset})
      end
    end

If yes, and if I understand correctly the main advantage of using Multi over it is that we're getting key_that_errored, is that right?

@michalmuskala
Copy link
Member

No, the biggest win for me is ability to pass the whole operation around as a data structure.

This means you could create it in one place (having the functions pure, and extremely easy to test), and defer actually running everything until the controller. This allows you to separate business logic - manipulating data, from impure operations such as interfacing with database.

@almassapargali
Copy link

Oh, you're right. Didn't think about that, but on the other hand, I think if I'm trying to do many things inside one transaction, probably best place to create that operation is controller, anyway.

@almassapargali
Copy link

Ok, couple of new thoughts/questions:

user_changeset = User.changeset(%User{}, params)
log_changeset = Log.changeset(%Log{}, event: "user updated")

multi =
  Ecto.Multi.new
  |> Ecto.Multi.update(:user, user_changeset)
  |> Ecto.Multi.insert(:log, log_changeset)
  |> Ecto.Multi.run(:charge_credit_card, fn changes_so_far -> {:ok, _} | {:error _} end)

case Repo.transaction(multi) do
  {:ok, %{user: user, log: log}} ->
    ...
  {:error, key_that_errored, %{user: user_changeset}} ->
    ...
end

In this example, can I get reference to user updated in :user operation in :log operation? That's the case I encounter quite often, when subsequent operations needs a data (often id) from previous inserts. Example: let's say our log should state "User with id #{id} created" and first operation is insert not update.

Second question, currently I have this snippet of code in my app:

case Repo.transaction fn ->
  location_params = game_params["location"] || %{}
  case Repo.insert Location.create_changeset(location_params) do
    {:ok, location} ->
      case Repo.insert Game.create_changeset(game_params, location.id) do
        {:ok, game} ->
          case Repo.insert Attendance.host_changeset(game.id, conn.assigns.current_user.id) do
            {:ok, _attendance} ->
              case Repo.update Game.players_count_changeset(game, :inc) do
                {:ok, game} -> %{game | location: location}
                {:error, changeset} -> Repo.rollback(changeset)
              end
            {:error, changeset} -> Repo.rollback(changeset)
          end
        {:error, changeset} -> Repo.rollback(changeset)
      end
    {:error, changeset} -> Repo.rollback(changeset)
  end
end do
  {:ok, game} -> conn |> render("show.json", game: game)
  {:error, changeset} -> conn |> error(changeset)
end

That Game has many validations, and I'd love to validate game_changeset even before transaction, so if user send not valid game_params I don't start transaction or don't create Location, but just respond with error.

@josevalim
Copy link
Member Author

In this example, can I get reference to user updated in :user operation in :log operation? That's the case I encounter quite often, when subsequent operations needs a data (often id) from previous inserts.

If they are associations, you should use the association support that exists in changesets today. Other than that, you can always use run.

That Game has many validations, and I'd love to validate game_changeset even before transaction, so if user send not valid game_params I don't start transaction or don't create Location, but just respond with error.

I think Ecto.Multi could do that automatically for the first entry, yes.

@almassapargali
Copy link

If they are associations, you should use the association support that exists in changesets today. Other than that, you can always use run.

For example, here:

case Repo.insert Location.create_changeset(location_params) do
  {:ok, location} ->
    case Repo.insert Game.create_changeset(game_params, location.id) do
      {:ok, game} ->
        # some code
      {:error, changeset} -> Repo.rollback(changeset)
    end
  {:error, changeset} -> Repo.rollback(changeset)
end

Game belongs_to Location, since I want to store location_id within games (that Location class in facts belongs to other models too, User, for example). AFAIK association support for belongs_to didn't come yet. Btw, there is no problem at all. I see run receives changes_so_far so we're good here.

I think Ecto.Multi could do that automatically for the first entry, yes.

Not sure what you mean by first entry, but I want to validate changeset of second operation. Actually it'd be great if Multi validated all changesets before starting transaction. (Of course I understand some changesets may become invalid during transaction (constraints), or some changesets can't be validated if created with run since they're dynamic).

@josevalim
Copy link
Member Author

We are going to support belongs_to in 2.0, which is the same version Milti will land but you could always flip them. There is nothing stopping you from adding the location association to the game changeset and inserting the game changeset. :)

@michalmuskala
Copy link
Member

Not sure what you mean by first entry, but I want to validate changeset of second operation. Actually it'd be great if Multi validated all changesets before starting transaction. (Of course I understand some changesets may become invalid during transaction (constraints), or some changesets can't be validated if created with run since they're dynamic).

This is exactly what Ecto.Multi will do. We will traverse all the operations before starting the transaction, and if we know any is not valid, we won't even start the transaction and return the error right away.

@almassapargali
Copy link

@michalmuskala wow, that's great. I wasn't sure yet how much Multi would improve over Repo.transaction + new with form, but now I start to see some advantages.

@phanimahesh
Copy link

A hearty +1.
Regarding the name, Ecto.Multi is as good as any other suggested name, and clearly indicates the intent IMO. Let's go with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests