Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace NpgsqlTransaction.IsCompleted with NpgsqlConnection.IsTransactionInProgress #2618

Open
roji opened this issue Sep 11, 2019 · 20 comments
Assignees
Milestone

Comments

@roji
Copy link
Member

roji commented Sep 11, 2019

In #2416 we added transaction recycling for a given connector, since PostgreSQL only supports one transaction at any given time. This change is being reverted for 4.1 in #2617, because it introduced a breaking change around NpgsqlTransaction.IsCompleted (see #2607).

Transaction recycling remains in the dev branch for 5.0, but we need to decide exactly what we want:

  • Reintroduce NpgsqlConnector.IsTransactionInProgress instead, which more closely represents the actual state of things and wouldn't cause any confusion. This would simply expose NpgsqlConnector.TransactionStatus publicly.
  • Reintroduce IsCompleted on NpgsqlTransaction, and require users to dispose transactions before creating new ones. This would provide stronger and safer semantics for IsCompleted, but would be another breaking change (since transaction didn't need to be disposed until now).
  • Not reintroduce IsCompleted. It's an Npgsql-specific property, and it can be argued that it's the user's responsibility to do any tracking. @baronfel any comments here, any specific reason why you need this from Npgsql?
  • Remove transaction recycling for 5.0, which would be a shame for perf.
@roji roji added this to the 5.0 milestone Sep 11, 2019
@roji roji self-assigned this Sep 11, 2019
@baronfel
Copy link

I'm the wrong person to ask, I don't have any special knowledge about how marten tracks the underlying NpgsqlTransaction. I just happened to notice the usage as I was trialing 4.1.0!

@jeremydmiller @oskardudycz @mysticmind do you all have more context here for how Marten uses the transaction lifecycle?

@roji
Copy link
Member Author

roji commented Sep 11, 2019

Thanks @baronfel.

I think reintroducing something like NpgsqlConnector.IsTransactionInProgress would make sense in 5.0, Marten would have to to make the change to use that.

@oskardudycz
Copy link
Contributor

oskardudycz commented Sep 11, 2019

@roji @baronfel in Marten we're using IsCompleted Transaction property to make sure that we're not Rollbacking or Disposing already Completed transaction. See more in https://github.com/JasperFx/marten/blob/master/src/Marten/Services/TransactionState.cs#L123.

It seems that for this case NpgsqlConnector.IsTransactionInProgress should be perfectly fine.

Btw. @roji do you have some place with written stuff that you're working on 5.0 version (especially potential breaking change would be great to know)? We'd like to prepare in advance to not have big migration issues as we had in 3=>4 migration.

p.s.
@baronfel thank you for giving heads up and great catch!
@roji thank you for reverting the change to keep compatibility 👍

@roji
Copy link
Member Author

roji commented Sep 11, 2019

@roji @baronfel in Marten we're using IsCompleted Transaction property to make sure that we're not Rollbacking or Disposing already Completed transaction. See more in https://github.com/JasperFx/marten/blob/master/src/Marten/Services/TransactionState.cs#L123.

It seems that for this case NpgsqlConnector.IsTransactionInProgress should be perfectly fine.

I think it does make sense to have NpgsqlConnector.IsTransactionInProgress, but is there any specific reason you need Npgsql to track the transaction status rather than just doing it yourself? Am wondering also because it's an Npgsql-specific feature, so you simply wouldn't have something like this on, say, SqlClient.

Btw. @roji do you have some place with written stuff that you're working on 5.0 version (especially potential breaking change would be great to know)? We'd like to prepare in advance to not have big migration issues as we had in 3=>4 migration.

The best way to follow this is to look at issues in the 5.0 milestone which have the "breaking change" label. Of course it's also advisable to occasionally run your tests against preview (or even nightly) versions of Npgsql, to make sure nothing slips in by accident.

@oskardudycz
Copy link
Contributor

oskardudycz commented Sep 11, 2019

I think it does make sense to have NpgsqlConnector.IsTransactionInProgress, but is there any specific reason you need Npgsql to track the transaction status rather than just doing it yourself? Am wondering also because it's an Npgsql-specific feature, so you simply wouldn't have something like this on, say, SqlClient.

As there might be multiple connections using same transaction then imho it's better to use the that information from the source of truth - so Npgsql. Probably we could manage that by ourselves, but if it's possible to keep that then it would be less costly for us. We'll need to add this code plus do some extensive testing if it's working the same way as yours implementation to not break anything in our users codebase.

The best way to follow this is to look at issues in the 5.0 milestone which have the "breaking change" label. Of course it's also advisable to occasionally run your tests against preview (or even nightly) versions of Npgsql, to make sure nothing slips in by accident.

Thanks, I'll start to look into that. Is there an ETA for v5.0?

@roji
Copy link
Member Author

roji commented Sep 11, 2019

As there might be multiple connections using same transaction then imho it's better to use the that information from the source of truth - so Npgsql

Do you mean to say you are using distributed transactions (multiple physical connections, one transaction)? If so, I don't think that the current property (NpgsqlTransaction.IsCompleted) tracks that in any way (that's a good point for #2618). Could you please double-check on this point?

Probably we could manage that by ourselves

I'm mainly trying to understand how/why people need this, I really have no particular objection to exposing the information.

@oskardudycz
Copy link
Contributor

As there might be multiple connections using same transaction then imho it's better to use the that information from the source of truth - so Npgsql

Do you mean to say you are using distributed transactions (multiple physical connections, one transaction)? If so, I don't think that the current property (NpgsqlTransaction.IsCompleted) tracks that in any way (that's a good point for #2618). Could you please double-check on this point?

Probably we could manage that by ourselves

I'm mainly trying to understand how/why people need this, I really have no particular objection to exposing the information.

I probably wasn’t precise enough. I meant that there might be multiple NpgsqConnections sharing the same transaction. If eg. 2 of them fail then to only one Rollback should be made. Being able to get that from the Transaction would be imho easier and less error prone than manually manage that information.

@roji
Copy link
Member Author

roji commented Sep 12, 2019

I probably wasn’t precise enough. I meant that there might be multiple NpgsqConnections sharing the same transaction. If eg. 2 of them fail then to only one Rollback should be made. Being able to get that from the Transaction would be imho easier and less error prone than manually manage that information.

I still don't understand - what does it mean for two NpgsqlConnections to share the same transaction, without that being a distributed transaction? An NpgsqlTransaction instance is obtained by calling NpgsqlConnection.BeginTransaction; it represents the transaction started on that connection and is bound to it.

@oskardudycz
Copy link
Contributor

oskardudycz commented Sep 12, 2019

I still don't understand - what does it mean for two NpgsqlConnections to share the same transaction, without that being a distributed transaction? An NpgsqlTransaction instance is obtained by calling NpgsqlConnection.BeginTransaction; it represents the transaction started on that connection and is bound to it.

@roji you're right, by bad. I should have wrote multiple commands and not multiple connections.

@roji roji changed the title Decide on API contract around transaction recycling Replace NpgsqlTransaction.IsCompleted with NpgsqlConnection.IsTransactionInProgress Sep 15, 2019
@roji
Copy link
Member Author

roji commented Sep 15, 2019

@oskardudycz thanks for clarifying. Even when multiple commands are associated to the same connection, only one can be active at a given point.

In any case, for 5.0 we can just have NpgsqlConnection.IsTransactionInProgress.

@oskardudycz
Copy link
Contributor

@roji thank you for clarifications.

Having NpgsqlConnection.IsTransactionInProgress in 5.0 would be great. Thank you!

@YohDeadfall
Copy link
Contributor

@oskardudycz What'is a reason to explicitly rollback the transaction? When you call Dispose it internally invokes Rollback if necessary. If you're disposing already disposed transaction, then there nothing will happen.

@roji
Copy link
Member Author

roji commented Oct 15, 2019

@YohDeadfall your question makes sense and you're right that Dispose internally rolls back if necessary - although I don't think it should affect including NpgsqlConnection.IsTransactionInProgress as a way of knowing whether a transaction is in progress or not.

@YohDeadfall
Copy link
Contributor

Less public API we have - less places to introduce breaking changes. I'm not against introducing a new API, but I want to see its value first. In this specific case I don't see it.

@oskardudycz
Copy link
Contributor

As @roji mentioned, having the information about the transaction being in progress will be nice, although I'll try to check if we'd be able to remove this usage as you (@YohDeadfall) suggested, then we'd give you more options.

I'll get back to you with the results. I agree with the point that less is more in case of the public API and the maintenance burden.

@roji
Copy link
Member Author

roji commented Oct 17, 2019

Less public API we have - less places to introduce breaking changes

I think of it a bit differently - PostgreSQL has chosen to expose this information (transaction status) to clients in the wire protocol, so we should do the same. They could have said the same thing, i.e. that it's only the user's responsibility to track this. I can think of scenarios where it would be reasonable to ask the connection whether a transaction is in progress, and we need to track this anyway for internal purposes...

@YohDeadfall
Copy link
Contributor

If we are going to do this, let's do it without ugly quick wins and with a better design decisions. The flag has limitations - it doesn't allow to control the current server side transaction from API while it's possible to control the client side transaction (call BeginTransaction and commit it via raw SQL for instance). Another issue I see in the proposed solution is that it doesn't suite the provider API. I mean that since there is a transaction object it should be responsible to track the status.

A good way to go forward is to introduce a Transaction property which will have a value when and only when there is an open transaction. Otherwise, it must be null. Why so?

  • allows the user to know is there an open transaction;
  • allows the user to control an open transaction even if it's opened via SQL;
  • allows the user to create a command and execute it while he has only a connection object (no need to pass a transaction via an argument to the call site, it could be taken from the conntection).

The last one is a defect of ADO.NET because even SQL Server with turned on MARS doesn't allow to have multiple transactions, and there is no way to understand which transaction is associated with one of command scopes.

/cc @kevin-osborne

@roji
Copy link
Member Author

roji commented Oct 28, 2019

I'm not really against exposing NpgsqlTransaction on the connection instead of simply exposing the information direction - both options are somewhat equivalent, but see some replies to your specific comments below.

The flag has limitations - it doesn't allow to control the current server side transaction from API while it's possible to control the client side transaction (call BeginTransaction and commit it via raw SQL for instance).

I'm not sure exactly what you mean by "controlling server side transaction" - all client side control operations also affect the "server side transaction" as well (whether they go through NpgsqlTransaction or via raw SQL)...

In any case, the idea for this flag wasn't to allow controlling anything - only to expose the connection's current transactional state, exactly in the same way that PostgreSQL exposes it via the Ready For Query message's transaction status indicator. However, one thing I realized thanks to your comment, is that it would be better to also expose the "failed transaction" state as well, so an enum would be better than a bool here.

Another issue I see in the proposed solution is that it doesn't suite the provider API. I mean that since there is a transaction object it should be responsible to track the status.

In general in ADO.NET, DbTransaction is produced by DbConnection.BeginTransaction - it's not exposed via any other means, so we're already outside the standard provider API. The question is whether there's anything to be gained by forcing users to get an NpgsqlTransaction from NpgsqlConnection.

Also, note that to represent the "transaction failed" state we'd have to add some PostgreSQL-specific API in any case.

A good way to go forward is to introduce a Transaction property which will have a value when and only when there is an open transaction.
[..]
allows the user to know is there an open transaction;

Right, but so would a flag or enum.

allows the user to control an open transaction even if it's opened via SQL;

The user can control the open transaction - via SQL; if the transaction was created via SQL, what reason would there be to allow controlling it via an NpgsqlTransaction object?

allows the user to create a command and execute it while he has only a connection object (no need to pass a transaction via an argument to the call site, it could be taken from the conntection).

Npgsql already supports this - the transaction argument to the command is ignored, and commands are always implicitly executed within the connection's current transaction. This is simply the way PostgreSQL works.

The last one is a defect of ADO.NET because even SQL Server with turned on MARS doesn't allow to have multiple transactions, and there is no way to understand which transaction is associated with one of command scopes.

I don't exactly understand this one. If you're using ADO.NET DbTransaction and your database supports multiple concurrent transactions, you can call DbConnection.BeginTransaction multiple times, and pass different resulting DbTransaction objects to different DbCommands to determine which command should run under which transaction. With System.Transactions (TransactionScope) I'm not sure how to manage multiple concurrent transactions, but in any case I'm not sure how this is relevant to the discussion on what Npgsql should do.

@YohDeadfall
Copy link
Contributor

However, one thing I realized thanks to your comment, is that it would be better to also expose the "failed transaction" state as well, so an enum would be better than a bool here.

In which scenarios this could be used?

In general in ADO.NET, DbTransaction is produced by DbConnection.BeginTransaction - it's not exposed via any other means, so we're already outside the standard provider API. The question is whether there's anything to be gained by forcing users to get an NpgsqlTransaction from NpgsqlConnection.

Splitting responsibility between types. Overloading the connection class isn't a good idea, I think.

The user can control the open transaction - via SQL; if the transaction was created via SQL, what reason would there be to allow controlling it via an NpgsqlTransaction object?

Whoa! After SQL Client I thought that it must be specified on an open transaction.

With System.Transactions (TransactionScope) I'm not sure how to manage multiple concurrent transactions, but in any case I'm not sure how this is relevant to the discussion on what Npgsql should do.

This was about bad API design which I'm trying to avoid. Microsoft made ADO.NET keeping SQL Server in mind and not other DBMS'es, so probably this is the answer why DbConnection doesn't expose the current transaction.

@roji roji removed this from the 5.0 milestone Sep 17, 2020
@igorzhilin

This comment has been minimized.

@roji roji added this to the Backlog milestone Jan 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants