
Large / slow transaction terminates process without throwing exception #658

Closed
mschneider opened this issue Aug 10, 2023 · 11 comments
Labels: bug Something isn't working

Comments

@mschneider

I don't have a minimal reproduction yet, but I wanted to report this issue anyways.

Basically what I'm trying to do is insert 100,000 entries into a database in a single transaction, with rollback in case I hit a data mapping issue somewhere along the way.

My code is roughly:

console.log("tx begin");
await sql.begin("READ WRITE", async (tsql) => {
    await Promise.all([...].map(async (x) => { 
         console.log("insert begin");
         await tsql`insert into table ${x}`;
         console.log("insert end");
    }));
    console.log("tx end"
 });
console.log("await end");

The output I get is:

tx begin
insert begin
insert end
...
insert begin
✨  Done in 102.00s.

I started looking a bit inside the begin implementation; inside scope, this promise never resolves or rejects:

result = await new Promise((resolve, reject) => {
  const x = fn(sql)
  Promise.resolve(Array.isArray(x) ? Promise.all(x) : x).then(resolve, reject)
})

On my postgres server I get the following error in the log:

[info] postgres | 2023-08-10 00:36:57.064 UTC [14596] LOG: unexpected EOF on client connection with an open transaction

The interesting part here is that the promise seemingly just stops executing and then the process exits. I wonder if this is a load-balancing issue on my Postgres server; I'll need to investigate more. In any case I would expect at least a client-side error message.

@porsager
Owner

porsager commented Aug 10, 2023

That error in your db logs appears when the client connection is severed, with no further information available.

That should result in a connection error from your sql.begin`` promise, but it appears your process exits cleanly? Is there any chance something else in your code is swallowing the error?
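
One quick way to rule that out is to attach process-level handlers and wrap the sql.begin call directly; if the process still exits cleanly without any of them firing, the rejection is being lost further down. A rough diagnostic sketch (the inserts are placeholders from your snippet above):

// Diagnostic only: surface anything that would otherwise end the process silently.
process.on('unhandledRejection', err => console.error('unhandledRejection:', err))
process.on('uncaughtException', err => console.error('uncaughtException:', err))
process.on('beforeExit', code => {
  // Fires when the event loop drains - a promise that was dropped (never settled) ends up here.
  console.error('beforeExit, code', code)
})

try {
  await sql.begin('READ WRITE', async tsql => {
    // ... the inserts from above ...
  })
  console.log('tx committed')
} catch (err) {
  console.error('sql.begin rejected:', err)
}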

@mschneider
Author

I wrapped nearly everything in try/catch as much as I can, and none of the exception handlers get triggered.

@daviddanialy

daviddanialy commented Aug 18, 2023

I am experiencing something which may be related - I have an application which calls a transaction-opening function at high volume, potentially concurrently. I have observed the following behavior multiple times:

  1. Transaction A begins
  2. Transaction A begins an upsert query
  3. Transaction B begins immediately after
  4. Transaction B begins the same upsert query on a different row
  5. Transaction A returns from the query
  6. Transaction A proceeds normally, executing another insert and committing
  7. Transaction B never resolves the initial query, and the transaction remains open in the database

I can see this being expressed in AWS RDS performance insights, where the query is shown as load on Client:ClientRead, meaning that the transaction remains open, waiting for the client to read the data:
[screenshot: AWS RDS Performance Insights showing the query as load on Client:ClientRead]

It seems like either the promise never resolves, or the process dies without throwing an error or resolving the transaction function's promise to close it out.

I have mitigated this for the time being by using a mutex lock on that function, but concurrency on that function is necessary for scaling (a rough sketch of the lock follows the code below).

Here is the flow of the code for that function:

await this.sql.begin(async (sql) => {
  // the upsert query
  object = await sql`
    INSERT...
  `
  if (!object) {
    throw new Error(`upsert returned null`)
  }
  if (/* boolean condition */) {
    object2.property = object.property
  }
  // the insert query
  object3 = await sql`
    INSERT INTO "table2"...
  `
  if (!object3) {
    throw new Error(`insert returned null`)
  }
  return
})
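
The lock itself is nothing special; it just serializes calls to this function so only one transaction is in flight at a time. A simplified sketch of the idea (not the actual code, names are made up for illustration):

// Simplified sketch of the mutex mitigation: queue calls so only one
// transaction-opening call runs at a time.
let queue = Promise.resolve()

function withLock(fn) {
  const run = queue.then(() => fn())
  queue = run.catch(() => {}) // a failed call must not block the next one
  return run
}

// usage (upsertAndInsert stands in for the function shown above):
// await withLock(() => upsertAndInsert(args))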

@mschneider
Author

mschneider commented Aug 18, 2023 via email

I also use multiple transactions in concurrency. Forgot to mention that.

@daviddanialy

Do you know if your issue happens only during concurrent transactions?

@boromisp

Is it possible that the underlying TCP connection got silently dropped? AWS does that after a few minutes of inactivity if you didn't set up TCP keepalives.
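
If that's what is happening, enabling client-side TCP keepalives and/or closing idle connections yourself before the load balancer does should work around it. Something along these lines; I'm going from memory, so treat the exact option names and units as assumptions to verify against the postgres.js README:

import postgres from 'postgres'

// Rough sketch, option names/units to be checked against the README for your version.
const sql = postgres(process.env.DATABASE_URL, {
  keep_alive: 60,      // client-side TCP keepalive (seconds), so AWS doesn't see an idle socket
  idle_timeout: 20,    // close idle connections ourselves before the load balancer drops them
  connect_timeout: 30  // fail fast if a fresh connection can't be established
})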

@mschneider
Author

Is it possible that the underlying TCP connection got silently dropped? AWS does that after a few minutes of inactivity if you didn't set up TCP keepalives.

I don't see how that could be the root cause of this issue. It could be an adjacent issue, but in any case it shouldn't cause the client process to exit with return code 0, completely outside of JavaScript control flow.

@daviddanialy

Just swapped the query to execute through a different library and it works fine. There is almost certainly some bug in this library causing connection drops.

@akalitenya

Yesterday I had a similar issue. I was using Postgres.js through Drizzle ORM over an SSL connection to Postgres 16 (Neon), located very far from my local development server (for testing purposes).
The small transaction contained one insert into table A and one update in table B. Due to the long distance to the database server, the transaction was slow (as expected).
Sometimes the transaction succeeded, sometimes it didn't (no rows inserted into the database), but it always failed silently (no errors detected). The Drizzle log always showed the generated queries (without errors).
I don't remember whether the promises resolved successfully (when the transaction failed) or hung indefinitely, but there were certainly no errors caught.
Switching to node-postgres solved this issue.

@thupi

thupi commented Oct 22, 2023

I am currently debugging a similar issue. It is fairly consistent to reproduce in my case, and it seems to be related to slow queries/connections/servers. I'll explain:

I have a local test server in Denmark. I also have a production server and a Postgres server in the Netherlands. The production server and the Postgres server run within the same network and therefore have a fast connection. Due to the distance between the test server in Denmark and the Postgres server in the Netherlands, the connection between those two can be rather slow.

What I noticed is that after a while the test server gets into a state where queries no longer resolve. All the queries stay pending and never throw any errors, which causes the services to time out. Debugging from the Postgres server indicates that the connection is closed and that the server doesn't receive anything from the client. Restarting the app resolves the issue until it happens again.

Exactly why this happens I don't know yet, but it seems to be related to the closing and opening of connections. Lowering 'idle_timeout' to 30 makes the test server get into the broken state faster (around 15 minutes). However, the exact same code running on the production server hasn't experienced these issues.

I'll be investigating a bit more, feel free to reach out if I can be of any assistance.

I guess the first thing to look into would be to avoid the queries failing silently, and to ensure that they throw an error if they can't succeed :-)
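
As a stopgap on the application side, wrapping the transaction call in a timeout at least turns a silent hang into an error you can act on. A rough sketch (the helper name and the 30-second value are arbitrary, and note this only rejects the caller's promise, it does not cancel the query on the server):

// Rough sketch: reject loudly if a transaction hangs instead of ever settling.
function withTimeout(promise, ms, label = 'operation') {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// usage:
// await withTimeout(sql.begin(async tsql => { /* queries */ }), 30000, 'transaction')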

@porsager
Owner

I finally found a little time to dig into this, and there is indeed currently no handling in transactions if the connection drops while no query is running. If the connection drops, the transaction is broken, so we should of course throw in such a scenario.

I'm working on a fix now.

@porsager porsager added the bug Something isn't working label Oct 26, 2023