Can't run two queries in a transaction? #81
I did some additional digging. Turns out this isn't a pq problem, it's a problem with database/sql in Go. Here's the ticket: https://code.google.com/p/go/issues/detail?id=3857 |
With the latest version of Go (1.1), they sort of fixed this issue, so now I think it's a problem with pq. If I have two queries running simultaneously in a transaction, pq gets confused by the responses and returns weird error messages. Again, this is perfectly reasonable functionality, and it works in every other language but Go. We don't get deadlocks anymore. Instead, here's what I'm seeing now:
var err error
// Where lookupExisting, updateExisting, and insertNew are prepared statements, a la tx.Prepare(...)
rows, _ := lookupExisting.Query(id)
defer rows.Close()
if rows.Next() {
    _, err = updateExisting.Exec(newValue, id)
} else {
    _, err = insertNew.Exec(id, newValue)
}
fmt.Printf("Errors? %s\n", err)
If you create something like this, when lookupExisting doesn't find anything, an insert occurs and everything is ok. If a record already exists and you try to update it, you'll get the error message "Errors? pq: unexpected bind response: 'C'". My guess is the two prepared statements are "crossing streams" in the pq driver and the driver is confused. This means that my old example is still problematic too, since if you try to query and update in the loop, you'll run into this same overlapping of information. If you make this change, everything will work (there's no fix for my original example though; that just won't work in Go with pq):
if rows.Next() {
    rows.Close()
    _, err = updateExisting.Exec(newValue, id)
} else {
    rows.Close()
    _, err = insertNew.Exec(id, newValue)
} |
I am having the exact same problem, but with a different error message: pq: unexpected describe rows response: 'D'. Looking at the PostgreSQL manual, 'D' is "DataRow", which is not handled by pq at that point; without using transactions, this doesn't happen. Here is a complete program that fails with this error.
Outputs: panic: pq: unexpected describe rows response: 'D' on this line:
|
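The original program isn't quoted above, but a minimal sketch of the pattern it describes (a second query issued on a transaction while the first result set is still open) might look like this; the connection string and query text are illustrative only:
package main

import (
    "database/sql"
    "fmt"

    _ "github.com/lib/pq"
)

func main() {
    // Illustrative connection string.
    db, err := sql.Open("postgres", "dbname=test sslmode=disable")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    tx, err := db.Begin()
    if err != nil {
        panic(err)
    }
    defer tx.Rollback()

    // First query: note that we do NOT close rows before issuing the next query.
    rows, err := tx.Query("SELECT generate_series(1, 10)")
    if err != nil {
        panic(err)
    }
    defer rows.Close()

    // Second query on the same transaction while the first result set is still
    // open; this is where the errors and hangs reported in this thread show up.
    var n int
    if err := tx.QueryRow("SELECT 1").Scan(&n); err != nil {
        panic(err)
    }
    fmt.Println(n)
}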
This may be the way the PostgreSQL protocol works, according to this link, maybe there is no solution for this. This is not good. |
Hmm, it's an old post, so I don't know if it's still an issue. If you follow my link to the Google bug about this, I uploaded some sample code where I ran this scenario through JavaScript, Java, and Ruby (maybe Python too), and every other database library in those languages had no problems running through these types of scenarios. Now maybe they were doing something sneaky behind the scenes, but I don't think so. Also, every language but Java uses libpq to interact with the database, so maybe libpq is the one being sneaky. I wish there was a libpq wrapper for Go. On Jul 18, 2013, at 9:42 AM, Rangel Reale notifications@github.com wrote:
|
I am investigating this in depth, and one of the things I discovered is that libpq is in fact sneaky: it always loads result sets entirely into memory. If you want to return row by row, then you can't issue a new query before you close the previous one: http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html
|
On Thu, Jul 18, 2013 at 1:32 PM, Sean Bowman notifications@github.com wrote:
Not sure if this meets your needs or not, but at least one cgo based |
Found another user with this problem. Although the email is from 2005, my testing shows that this is still the case: http://www.mail-archive.com/pgsql-general@postgresql.org/msg71802.html On the same thread, there is a suggestion of using a cursor.
I did a quick test in Golang, and it seems to work (see the sketch below), but we would need to FETCH 1 by 1, each fetch generating a new exec, so it could become expensive.
I think that the way to go is to emulate libpq, that is, load all rows on Query(). This looks like the "official" way. |
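For reference, a rough sketch of the cursor approach described above, assuming tx is an open *sql.Tx and database/sql is imported; the table and column names (items, name, seen) are made up for illustration:
// Declare a cursor inside the transaction and FETCH one row at a time, so
// only one result set is ever in flight on the connection.
if _, err := tx.Exec("DECLARE item_cur CURSOR FOR SELECT name FROM items"); err != nil {
    return err
}
for {
    var name string
    err := tx.QueryRow("FETCH NEXT FROM item_cur").Scan(&name)
    if err == sql.ErrNoRows {
        break // cursor exhausted
    }
    if err != nil {
        return err
    }
    // The FETCH result is fully consumed above, so it is safe to run another
    // statement on the same transaction before the next FETCH.
    if _, err := tx.Exec("UPDATE items SET seen = true WHERE name = $1", name); err != nil {
        return err
    }
}
if _, err := tx.Exec("CLOSE item_cur"); err != nil {
    return err
}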
Does it always load all rows on query, or only if another query is issued while one is still pending? |
libpq always fetches all records on query, except if "PQsetSingleRowMode" was called. |
I think I would rather it stay with pulling data as it does, and just document the library to suggest opening a second connection if you need to do this kind of interwoven processing. I don't like the idea of loading all the results at once. Defeats the purpose of going row by row, presumably because there are so many results it could overflow the memory. On Jul 18, 2013, at 1:28 PM, Rangel Reale notifications@github.com wrote:
|
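A sketch of the "open a second connection" workaround mentioned above, with made-up table names; the writes go through the *sql.DB pool on a different connection, so they are not part of the reading transaction:
rows, err := tx.Query("SELECT id FROM source_table")
if err != nil {
    return err
}
defer rows.Close()
for rows.Next() {
    var id int64
    if err := rows.Scan(&id); err != nil {
        return err
    }
    // db is the *sql.DB; this statement runs on a different connection, so it
    // does not interleave with the result set still pending on tx.
    if _, err := db.Exec("UPDATE target_table SET processed = true WHERE id = $1", id); err != nil {
        return err
    }
}
return rows.Err()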
I implemented it to be like libpq, with a fallback to the current mode, using the same name as libpq (singlerowmode). |
Thanks for the patch, @RangelReale. I think your exact approach (i.e., a flag for the old behavior) would be problematic, but after reading through the various linked messages, bugs, and docs (thanks to you and @sbowman for linking these), and considering my own experience with the protocol, a lot of this comes down to the underlying behavior in the Postgres wire protocol: it doesn't support multiple concurrent standard statements because it's fundamentally synchronous (ignoring things like cancel). So to resolve this issue, we could either: (1) document that an open result set must be closed before another query is issued on the same connection, (2) buffer query results, or (3) fetch rows one at a time.
I think (3) is right out because it would be disastrous to performance, requiring a round-trip per row. (1) is simple but shifts the burden onto the pq user. (2) may be reasonable, but I think we'd need to approach it differently: rather than just offering an option for buffering, we could use the current row-by-row processing in the common case, but immediately buffer the remainder of the previous query if the user issues a second query in the same transaction. This might be a little too magical, but it seems fairly reasonable. It might also be nice to have some combination of (2) and (3), fetching k rows at a time. Thoughts? |
To help our decision, I made a study of how other platforms' native PostgreSQL drivers behave in this case:
PHP (libpq): default method: fetch all on query
Python (http://python.projects.pgfoundry.org/): default method: fetch all on query
node.js (https://github.com/brianc/node-postgres): default method: fetch all on query; queues queries so no 2 queries run at the same time
Java (http://jdbc.postgresql.org/): default method: fetch all on query
C# (http://npgsql.projects.pgfoundry.org/): default method: fetches row by row, but warns that only one query can be made at a time, with an option to preload all rows
Ruby (libpq - https://github.com/ged/ruby-pg): default method: fetch all on query |
The Python driver is a little more complicated than that. Of course this doesn't deal with multiple simultaneous queries in any way; presumably you'd need to deal with that on a higher level. Similarly, the JDBC driver supports fetching rows with a cursor: http://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor |
Yes, I was mostly looking for the "default" way of doing things, what most people would use. In the Java case, "setFetchSize" is part of the standard Java API, so that API has some notion of cursors, which the Go one doesn't. |
Right, but in any case pq needs to provide a means of working with streaming results, whether it is the default or not. Otherwise it makes working with large datasets unmanageable. It would be good to maintain the current behaviour for existing code that just performs a single query on a connection so that we don't cause any surprises. Since multiple simultaneous queries don't work right now anyway, we have a bit more leeway in how to handle that. |
Also, is this library thread safe, or does it need to be? If it needs locks, I imagine it would be difficult to return streaming results in a thread-safe manner. |
In typical usage database/sql will use a new sql.Conn for each query so it's not typically a concern. Transactions are the exception, since a transaction context is tied to a particular Conn for obvious reasons. The library doesn't specify that a transaction is safe for concurrent use, which typically means it is not. I don't think we really need to worry about concurrency in that case. |
I tend to agree. I think we should either outright block this as unsupported, or implement the just-in-time buffering I suggested above (i.e., only buffer if we're in the middle of processing a Rows object and another query is being issued--in that case, buffer the outstanding rows so we can issue the next query). |
I can implement it like that, only buffering if a new query arrives, but for that I would need to cache the last rows object in the connection. The golang documentation says that a driver connection is not used concurrently by multiple goroutines.
So I can safely assume that a connection will never be used by two goroutines, and just store the last rows object on the connection, to fetch all data in case a new query arrives? |
@RangelReale yeah, I think so. Keep in mind that you'll need to do this dynamically: e.g., if multiple queries are started-but-not-yet-finished, you'll need to buffer the one currently in progress and push that onto the stack of buffered queries, then unravel the stack as necessary once things complete. |
I don't like the idea that the library is going to "magically" do something behind the scenes for me, so I vote for not supporting this functionality (yes, with all this new information, I'm switching my original vote). Seems contrary to the Tao of Go to start helping me when I didn't ask for it. On Jul 24, 2013, at 12:04 AM, Maciek Sakrejda notifications@github.com wrote:
|
Maybe we could ask on golang-nuts what other people think about this? |
If we don't go with the buffering solution, we should at least raise a panic if there is an outstanding query in a transaction and another one is attempted. Currently it just locks up and makes for hard-to-track-down problems; at least a panic would indicate that there's been a programming error. @sbowman may be right that automatic buffering is a little too magical |
@kisielk Are you using Go 1.0 or 1.1? There was an issue with database/sql in 1.0 that is fixed in 1.1. In 1.1 you'll get a weird error message, but it shouldn't lock up anymore. |
I don’t think this is the same problem. Looks like you’re creating a prepared statement but trying to put multiple SQL commands into it (“BEGIN”, “UPDATE”, “UPDATE”, “COMMIT”). I don’t think PostgreSQL supports that (though I’ll admit I’ve never tried).
|
pq: unexpected Parse response 'D' happened!
|
Before we do the next query or update, we must close the previous rows. If we do not close the rows, the next update will fail.
|
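In database/sql terms, that fix looks roughly like the following sketch; the query text, table, and variables are placeholders:
rows, err := tx.Query("SELECT id FROM widgets WHERE id = $1", id)
if err != nil {
    return err
}
exists := rows.Next()
// Close (or fully drain) this result set BEFORE running the next statement
// on the same transaction.
rows.Close()

if exists {
    _, err = tx.Exec("UPDATE widgets SET value = $1 WHERE id = $2", value, id)
} else {
    _, err = tx.Exec("INSERT INTO widgets (id, value) VALUES ($1, $2)", id, value)
}
return err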
This is still a problem under Go 1.8 with the latest pq driver. We see all of the errors posted above in unit tests that hold a tx open for a long time while some parallel goroutine kicks off to perform a query after a Close has been called. It's a race between rows.Close() and the start of another query. |
This one was tricky to debug and fix; @whrool's #81 (comment) worked for me. I'm ok with the current behavior as long as |
An option to fetch and buffer all the rows, would be simple to implement, non-magical, consistent with the majority of other Postgres drivers, and give people a simple way forward when they hit this problem. It could also improve performance on both the client and server sides for some workloads. It would also let this issue be closed after more than 4 years. |
Hello, here's what I'm doing in a nutshell, inside func UpsertRS(transaction *db.Tx, queries RSUpsertQueries, args ...interface{}) (db.Result, error):
redshiftTransaction, rsTransactionError := db.Begin()
func checkCount(rows *db.Rows) (count int) {
if rows == nil {
return 0
}
for rows.Next() {
err := rows.Scan(&count)
rows.Close()
log.Debug(fmt.Sprintf("Error while counting rows: %+v", err), "SYSTEM")
}
return count
}
func checkDataExistance(transaction *db.Tx, query string, args ...interface{}) bool {
result, err := transaction.Query(query, args...)
if err != nil {
log.Error(fmt.Errorf("Redshift Upsert: error while checing existance: %+v", err), "SYSTEM")
}
return (checkCount(result) > 0)
}
query := queries.Insert
if checkDataExistance(transaction, queries.Check, args...) {
fmt.Println("--> Data exists, switching to update query")
query = queries.Update
} else {
fmt.Println("--> Data does not exists, we can keep the insert")
}
result, err := transaction.Exec(query, args...)
and here is where everything goes BOOM and I get: sql: Transaction has already been committed or rolled back, where the commit is actually outside this scope, so I'm really puzzled about who committed it 🤔 Googling took me to this issue, which seems kind of appropriate, but honestly, after going through it all, I fail to understand the outcome, what was chosen out of the whole "buffering/non-buffering" debate, and most of all: what's the solution to this problem? |
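A hedged sketch of a more defensive existence check for the snippet above (db is assumed to alias database/sql, as in that code): fully consume, error-check, and close the result set before the caller runs its Exec on the same transaction:
func dataExists(tx *db.Tx, query string, args ...interface{}) (bool, error) {
    rows, err := tx.Query(query, args...)
    if err != nil {
        return false, err
    }
    defer rows.Close()

    count := 0
    for rows.Next() {
        if err := rows.Scan(&count); err != nil {
            return false, err
        }
    }
    // Surface any error that ended the loop early; the deferred Close runs
    // before the caller issues its next statement on the same transaction.
    if err := rows.Err(); err != nil {
        return false, err
    }
    return count > 0, nil
}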
Has this moved at all in favor of @timbunce's solution? |
(or any solution, rather?) |
…ctTX() We have to call rows.Close() before we issue another query in the same transaction. See lib/pq#81
This PR introduces dbutil.DBTx which implements the dbutil.DB interface backed by either a sql.DB or a sql.Tx, with additional support for nested transactions using Postgres savepoints. This allows us to run a suite of integration tests in the context of single transaction (that will be rolled back at the end), even if those tests start transactions themselves. It also allows us to explicitly rollback to a previous savepoint in tests when we know a given test will result in a SQL level exception / error and proceed safely from there within the context of a parent transaction. One situation where using a single transaction for integration tests isn't currently supported is when concurrency is involved, due to this limitation: lib/pq#81 That's why we didn't change the integration tests of the Bitbucket Server provider store, which do lots of concurrency and fail when operating with a DBTx.
This commit adds a `Synchronized` flag to `support/db.Session`. When it is set to `true` and `Session` runs a transaction, all `Exec*` and `Select*` methods are protected by a mutex, which allows running them in multiple goroutines. This is an experimental feature (see below) and not a breaking change (the default is `false`).
The Postgres protocol does not allow sending Exec query results if the results of a previously sent Query haven't been fully read. This issue manifested itself when a PR that sends read and write queries in multiple goroutines was merged. More info: lib/pq#81 lib/pq#635
Known limitations:
* It's possible that this will not be needed if we decide to remove concurrency from the ingestion pipeline (see #1983). We are adding it to unblock development of new ingestion processors with more readable code (currently all database processors are merged into one big processor that is hard to read and test). Splitting `DatabaseProcessor` will be done in a separate PR/PRs.
* During the Horizon Team meeting we agreed to create a new `SynchronizedSession` struct embedding `db.Session` that would not be placed in the shared `support` package. The problem is that `history.Q` embeds `db.Session`. Changing this to `db.SessionInterface` requires updating a lot of files that create `history.Q` objects, plus tests and mocks. Given that we may want to revert this change in the future, the change in this commit seems simpler.
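A minimal sketch of that idea, assuming "sync" and "database/sql" are imported; the type and method names here are illustrative, not the actual support/db.Session API. Each method fully consumes its result set before unlocking, so a query and an exec are never in flight at the same time on the shared transaction:
type syncTx struct {
    mu sync.Mutex
    tx *sql.Tx
}

func (s *syncTx) Exec(query string, args ...interface{}) (sql.Result, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.tx.Exec(query, args...)
}

func (s *syncTx) SelectInts(query string, args ...interface{}) ([]int64, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    rows, err := s.tx.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var out []int64
    for rows.Next() {
        var v int64
        if err := rows.Scan(&v); err != nil {
            return nil, err
        }
        out = append(out, v)
    }
    // All rows are read before the mutex is released, so no other goroutine
    // can observe a half-read result set on this transaction.
    return out, rows.Err()
}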
I'm a bit late, but I just hit the The only official (AFAIK) client is PostgreSQL 14 added pipeline support, but I think that just means This left me believing the only way to support large query responses in PostgreSQL is to implement chunking (or "custom cursors"):
Any other solutions people have been employing? As for |
func (o *OrderService) UpdateOvershootProduct(products model.UpdateOvershootRequest, sellerID *uuid.UUID) error {
}
For this code in my update query, I am getting this error: "pq: unexpected Parse response 'D'"
You can't call a new SQL command when you're in the middle of reading the results for another. In the middle of looping over the results of your query, you're trying to call |
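The usual way around this, sketched below with made-up table and column names: finish reading (and close) the SELECT's result set first, then run the updates on the same transaction:
rows, err := tx.Query("SELECT id FROM order_products WHERE seller_id = $1", sellerID)
if err != nil {
    return err
}
defer rows.Close()
var ids []int64
for rows.Next() {
    var id int64
    if err := rows.Scan(&id); err != nil {
        return err
    }
    ids = append(ids, id)
}
if err := rows.Err(); err != nil {
    return err
}
// Explicitly close before issuing more statements on the same transaction
// (the deferred Close above is a no-op after this).
rows.Close()

for _, id := range ids {
    if _, err := tx.Exec("UPDATE order_products SET overshoot = true WHERE id = $1", id); err != nil {
        return err
    }
}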
thanks |
I have some data I've loaded into a table from a CSV file that I want to massage into a master table. When I try to do this in a transaction, pq just hangs on the second query I perform for no reason I can ascertain.
Here's my code. I chopped out everything after the second query; that query checks to see if the data is already migrated, and if not, runs all the create code and such. If you comment out the "select * from community_gnis" line, the loop runs to completion, printing every city in the table. If I leave the query in, it just hangs after the first city name.
I'm new to Go, so maybe I'm doing things incorrectly here. However, if I move the "select * from community_gnis" up above the "select name, state..." statement, the community_gnis statement runs fine and the "select name, state..." query will then be the one that hangs.
If I remove the transaction code and run everything against "db", e.g. "db.Query" instead of "tx.Query", etc., everything runs completely fine (though it doesn't run in a transaction, which is what I want).
I've tried various combinations, using prepared statements, adding Close calls everywhere, etc. Nothing works. I'm 99% sure it's not the database, because I'm migrating this code from a Node JS app to test whether we should use Go or Node for our project, and the Node version runs completely in a transaction.