Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple queries hang silently #395

Closed
CorvusCorrax opened this issue Mar 30, 2016 · 31 comments
Closed

Multiple queries hang silently #395

CorvusCorrax opened this issue Mar 30, 2016 · 31 comments
Labels

Comments

@CorvusCorrax
Copy link

Hi,
This code crash before commits

var nbCnx = 20;
var oracledb = require('oracledb');
var ident = 0;

for (var i = 0; i < nbCnx; i++) {
  oracledb.getConnection( {
      user: "user",
      password: "password",
      connectString: "(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = test-host)(PORT = 1521))(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = myservice)))"
    },
  function(err, connection) {
    if (err) {
      console.error(err.message);
      return;
    }
    connection.ident = ident++;
    connection.execute('update "TEST" set "INTEGERTEST" = 10, "STRINGTEST" = \'Update test string\' where "IDTEST" > 1661', [],{autoCommit: false}, function(err, result) {
      if (err) {
        console.log('ERROR : ', err);
      }
      console.log('RESULT UPDATE : ', result);
      connection.commit(function(err) {
        if(err) {
          console.log('COMMIT ERR : ', err);
        }
        console.log('COMMIT ID : ', connection.ident);
      });
    });
  });
}

Here is the only result :

RESULT UPDATE :  { rowsAffected: 6,
  outBinds: undefined,
  rows: undefined,
  metaData: undefined }

This is working when nbCnx is lower than 8, but higher than that it just silently hangs.
Where am i wrong ? Could you help me ?

@dmcghan
Copy link

dmcghan commented Mar 30, 2016

@CorvusCorrax You need to release connections when you're done with them: connection.release(). You should do this inside your commit callback (whether or not an error occurs).

Also, keep in mind that you only have 4 background threads by default in Node.js. If you need more you'll need to set UV_THREADPOOL_SIZE higher.

Look into using async to better control execution flow in Node.js.

@CorvusCorrax
Copy link
Author

Thanks for your help, UV_THREADPOOL_SIZE solved the problem.
Maybe getConnection() should return an error when this fails ?

@dmcghan
Copy link

dmcghan commented Mar 30, 2016

I don't think it's failing, I think it's waiting for a thread to become available to handle your request for another connection. Again, you need to release connections when you're done with them. Upping the UV_THREADPOOL_SIZE for your example has only masked a much larger problem.

@CorvusCorrax
Copy link
Author

This code fails too...

var oracledb = require('oracledb');
var ident = 0;
for (var i = 0; i < 20; i++) {
  oracledb.getConnection( {
      user: "user",
      password: "password",
      connectString: "(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = test-host)(PORT = 1521))(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = myservice)))"
    },
  function(err, connection) {
    if (err) {
      console.error(err.message);
      return;
    }
    connection.ident = ident++;
    connection.execute('update "TEST" set "INTEGERTEST" = 10, "STRINGTEST" = \'Update test string\' where "IDTEST" > 1661', [],{autoCommit: false}, function(err, result) {
      if (err) {
        console.log('ERROR : ', err);
      }
      console.log('RESULT UPDATE : ', result);
      connection.commit(function(err) {
        if(err) {
          console.log('COMMIT ERR : ', err);
        }
        console.log('COMMIT ID : ', connection.ident);
        connection.release(function(err) {
          console.log('RELEASE ERR : ', err);
        });
      });
    });
  });
}

Connections are never released, it hangs at commit.
Here are the results :

RESULT UPDATE :  { rowsAffected: 6,
  outBinds: undefined,
  rows: undefined,
  metaData: undefined }
RESULT UPDATE :  { rowsAffected: 6,
  outBinds: undefined,
  rows: undefined,
  metaData: undefined }
COMMIT ID :  0

@CorvusCorrax CorvusCorrax reopened this Mar 30, 2016
@cjbj
Copy link
Member

cjbj commented Mar 30, 2016

I believe it hangs for the same reason @dmcghan mentioned https://github.com/oracle/node-oracledb/blob/master/doc/api.md#number-of-threads

@cjbj cjbj changed the title Multiple queries crash silently Multiple queries hang silently Mar 30, 2016
@cjbj cjbj added the question label Mar 30, 2016
@dmcghan
Copy link

dmcghan commented Mar 30, 2016

Yeah, that makes sense. This is a unique case since the update statements are updating the same rows. The first update goes to the table and updates the rows but doesn't commit (that means it's placed row level locks on them). Then 3 more updates are executed on the same set of rows but they are blocked waiting on the first transaction to commit. But the first transaction can't commit because all of the threads to do so are used up by execute statements.

It's the Node.js version of a deadlock. :)

The code should be updated to serialize the statements or at least provide and utilize threads sufficiently to handle this "unique" workload.

@dmcghan
Copy link

dmcghan commented Mar 30, 2016

Another option, now that I think about it, would be to have the update statements first attempt to lock the rows with select for update. Then you could use the nowait option to tell Oracle not to wait and instead throw an error.

@CorvusCorrax
Copy link
Author

The issue was indeed quite specific. A more careful thread handling solved the problem.
Thanks for your help guys !

@timon
Copy link

timon commented Jun 11, 2016

Hi all, we have just had experience the same problem.
Suppose you have to update an user entry in database each time he logs on.

So for this straightforward task you simply do

// ... user is found and authorized already
db.execute(
  'UPDATE users SET last_logon = :0 WHERE id = :1', 
  { '0': new Date(), '1': user.id },
  commitAndPassTheUserBack
);`

Then whole application stops responding, because someone have just opened Chrome with your app loaded in multiple tabs.

This is a potential way for causing denial of service.
Yes, we could enable auto commit (for this case yes, but generally no).
Yes, we could speak to security & audit and tell them that 'logging this event once per ten seconds interval is fine too'
But besides that, how should the issue be addressed?

@dmcghan
Copy link

dmcghan commented Jun 11, 2016

@timon What is your pool config? What is your UV_THREADPOOL_SIZE?

This is not an Oracle Database problem, but rather a problem of resource utilization (threads) with Node.js. Here are some thoughts:

  • Why wouldn't you use autoCommit? This would not only prevent the issue, it would also increase performance, scalability, etc.
  • Increase the number of threads you're allowing Node.js to use (the default of 4 is very low) and increase connections in the connection pool. However, this will only get you so far before you have to put a queue in front of your API to prevent overloading available resources.
  • Push it do the database via PL/SQL: https://jsao.io/2015/04/to-javascript-or-not-to-javascript-that-is-the-question/ This is similar to autoCommit in that it reduces round trips and uses less threads. If each transaction is a single call to the database then you can ensure that the problem you're running into can't happen as you'll never need another thread to get back and finish the work.

@cjbj
Copy link
Member

cjbj commented Jun 11, 2016

There are a few notes on how you can end up locking all the threads and on setting UV_THREADPOOL_SIZE in the doc: https://github.com/oracle/node-oracledb/blob/v1.9.3/doc/api.md#connpoolmonitor

You should commit when appropriate for your data integrity but use the autoCommit option in preference to an explicit commit(). (Overcommitting reduces overall scalability because of the extra work done by the DB. )

@atiertant
Copy link

@dmcghan you are right :

This is not an Oracle Database problem, but rather a problem of resource utilization (threads) with Node.js

so this is a node-oracledb problem ...

@dmcghan
Copy link

dmcghan commented Jun 14, 2016

If you know how all the moving pieces work (node.js, node-oracledb, and Oracle Database) and you design an application in a way that it requires additional resources to finish work (by not using autoCommit, PL/SQL, etc.) AND you don't ensure those additional resources are available to finish the work, then I would see it as an application design problem.

Are you aware of other RDBMS drivers that offer solutions for this? The only solution I could imagine would be to ensure that the driver never uses all the background threads for async operations. In theory, that would mean we could always have some threads to process commit or rollback operations (which unlock rows).

Perhaps it would be possible to accomplish this in the JS layer. We'd have to route all async requests through a queue with a concurrency set no higher than (UV_THREADPOOL_SIZE - 1). Then we could let commit and rollback operations to bypass the queue.

One problem I see is that you'd be limiting overall concurrency on systems where people leave the default UV_THREADPOOL_SIZE of 4.

It's worth thinking about some more...

@atiertant
Copy link

@dmcghan never seen this problem with other RDBMS drivers...
this could be handle in a pool: when trying to acquire a ressource not available then throw an error.
this could solve the "silently" part of the problem.
if this is possible to make sure this never appear this would be the best !

@dmcghan
Copy link

dmcghan commented Jun 14, 2016

@atiertant Just because you haven't seen it doesn't mean it doesn't exist. ;)

this could be handle in a pool: when trying to acquire a ressource not available then throw an error.

By "acquire a ressource", I assume you mean a lock on the rows that someone else has already locked, right? In that case we don't need to make any changes to the driver. One could change the transaction isolation level from READ COMMITTED to SERIALIZABLE or use a little PL/SQL with the NOWAIT option. But then you just get an error that you have to handle - what would you do with that? Retry???

Why would we do any of that when the driver already provides this super simple autoCommit feature which solves this problem beautifully? This is a single statement transaction, so autoCommit is perfect. If this were a multiple statement transaction then a little PL/SQL could solve the problem an only use 1 thread/connection/round trip for the transaction.

@atiertant
Copy link

@dmcghan

Just because you haven't seen it doesn't mean it doesn't exist. ;)

only known bug can be fixed ! if this bug whould produce only once a century noone talk you about...

By "acquire a ressource", i mean check that we can use a thead.(thread count is lower than UV_THREADPOOL_SIZE)

But then you just get an error that you have to handle - what would you do with that? Retry???

say user the action he does is cancelled and free memory.
this not because a request crash that all my application should crash !

@cjbj
Copy link
Member

cjbj commented Jun 16, 2016

@atiertant thanks for pushing us on this. We will, of course, improve where we can. I don't know whether this particular scenario can be resolved: how will the n-1 thread be known to be blocking or not until it is actually used?

@atiertant
Copy link

@cjbj where does this threads are created in js code?

@cjbj
Copy link
Member

cjbj commented Jun 16, 2016

@atiertant threading is handled by the C++ layer

@atiertant
Copy link

hummm there is one big C++ exec who do all job and interact with nodejs... maybe not the best way.
whould it be the same problem if js start threads himself ?

@dmcghan
Copy link

dmcghan commented Jun 16, 2016

threading is handled by the C++ layer

As I understand it, threading it handled by libuv. libuv creates a pool of threads then C applications (such as the driver) interface with it using uv_queue_work. In that sense, you're just asking libuv to add your "work" to the queue and it will run it as soon as a thread is available.

The only way to prevent the driver from using more than n-1 threads, is to not add more than n-1 requests to the queue at any time. I think it would be possible to do this in the JS layer by routing all async calls through a new queue that doesn't go beyond n-1 at any point in time. The exception to the rule would be for commit and rollback operations, which would skip the new queue.

In theory, unless some some other background job was using the 1 open thread (say writing to a large file), the commit/rollback operations should be able to use the open thread to release locks, thus allowing the blocked connections to finish their work and ensure that deadlock doesn't occur.

The biggest downside to this approach is that you limit the driver to n-1 threads for most work. With the default of 4, this means you're reducing the number by 25%, which could have an adverse affect on overall performance.

@atiertant
Copy link

The exception to the rule would be for commit and rollback operations, which would skip the new queue.

doesn't commit and rolback done before operations on which it should operate ?

this means you're reducing the number by 25%, which could have an adverse affect on overall performance

in fact with the actual code without changing UV_THREADPOOL_SIZE, perf are already low...
and i suppose noone would like their app to crash.

@dmcghan
Copy link

dmcghan commented Jun 16, 2016

doesn't commit and rolback done before operations on which it should operate ?

I'm not sure what you're asking here. Could you please clarify?

in fact with the actual code without changing UV_THREADPOOL_SIZE, perf are already low...
and i suppose noone would like their app to crash.

Yeah, but defaults are defaults, and many people aren't aware of them and how to adjust them. I'd be in favor of some kind of "warning messages" feature that tries to help users with these kinds of things.

@atiertant
Copy link

I'm not sure what you're asking here. Could you please clarify?

well, if i use more than n-1 threads for insert,delete,update and i commit after.
some insert,delete,update ... are in the queue but as commit skip this queue, when commit is done, is there some insert,delete,update in queue who was related to this commit?

Yeah, but defaults are defaults

every other products i use work well with defaults...

I'd be in favor of some kind of "warning messages" feature that tries to help users with these kinds of things.

install oracledb is complex,
install node-oracledb is complex (because of licenses),
using it could crash sometime or have poor performance but don't worry read all logs to be sure there is no "warning messages"...

ps: sorry to be ironic ;)

@dmcghan
Copy link

dmcghan commented Jun 16, 2016

well, if i use more than n-1 threads

You wouldn't be able to as the queue would prevent it. You'd just call normal methods like getConnection and execute - the queue would make sure that no more than n-1 operations are running at any given point in time.

is there some insert,delete,update in queue who was related to this commit?

That wouldn't happen as your code wouldn't invoke commit unless the other work had already been completed.

every other products i use work well with defaults...

I don't see where you're going with this. The default UV_THREADPOOL_SIZE is not under our control - it's set by Node.js.

At any rate, a colleague recently explained to me why the n-1 concept wouldn't work as the complexity of the work scales up - so that's not a solution that would work anyway.

@atiertant
Copy link

https://github.com/Atlantis-Software/knex/blob/2a1c33b69eb3446b045b33a2ae7be15f966d4d51/src/dialects/oracledb/index.js#L22
you could increase it like this (tested on 0.10 only) and handle leak in JS layer
or
handle multiple C++ process in JS layer (nodejs should handle many many async C++ process)

@dmcghan
Copy link

dmcghan commented Jun 16, 2016

you could increase it like this

UV_THREADPOOL_SIZE must be set very early on, before async calls are made. As such, there's no guarantee your code will work. Also, you shouldn't be doing this. What if some other module did this too, only they set it lower than you did?

Finally, this doesn't truly solve the problem. What if I have multiple pools?

and handle leak in JS layer

What do you mean by this?

handle multiple C++ process in JS layer

This isn't not the problem. The problem is hitting the limit of background threads available in libuv.

@atiertant
Copy link

UV_THREADPOOL_SIZE must be set very early on, before async calls are made.

you are right ! i just made this to make my code work before you fix.

Also, you shouldn't be doing this. What if some other module did this too, only they set it lower than you did?

when you know you don't have enougth available thread, you got a choice:

  • do nothing and be sure process crash !
  • only alert user then process crash and user must set it himself (exactly the same problem if and other module need lower value)
  • add one to UV_THREADPOOL_SIZE and why not alert user default value was to low

Finally, this doesn't truly solve the problem.

i don't think writting threading app with nodejs is good idea.

What if I have multiple pools?

exacly the same ! if node-oacledb need more thread then add one...

and handle leak in JS layer
What do you mean by this?

there is one thing worst than a app crashing: a system crash !
if every time you add more and more and more thread, system will go slower and could crash !
so you need to detect anormal behavior

handle multiple C++ process in JS layer
This isn't not the problem. The problem is hitting the limit of background threads available in libuv.

if you use multiple synchronous process instead of multiple theads in a single process.
no thread so no UV_THREADPOOL_SIZE limit...

@dmcghan
Copy link

dmcghan commented Jun 20, 2016

you are right ! i just made this to make my code work before you fix.

There's a difference between writing code as an end user and for an end user. Knex meant for an end user. That's why you can't guarantee it will work - you don't know if the end user will have done any async calls prior to using your module. The fact that it works for you because you know how it needs to be used doesn't help end users.

We don't yet know if there is a reasonable fix that can be done on our side save the tools everyone already has at their disposal, such as:

  • autoCommit
  • PL/SQL based transactions

We could kick around the idea of an execute method that works for an entire transaction based on a queue. But even that would require proper understanding and use.

do nothing and be sure process crash !

This issue is not about a "process crash", it's about hanging/waiting due to a client induced deadlock. The crashing reported earlier was a separate issue.

why not alert user default value was to low

I suggested a "warning messages" feature earlier but you didn't seem to like the idea.

@cjbj
Copy link
Member

cjbj commented Jun 21, 2016

@atiertant, If I understand correctly, you are trying to work out the best way to stop a deadlock when Node runs out of threads. The basic premise is that a node-oracledb worker thread will get blocked when it is doing DB work; this isn't going to change in the foreseeable future. And since I can't solve the Halting Problem, I don't know how node-oracledb can be expected to predict what DB operations the app is going to do, and when not to allow the app to block itself.

@atiertant
Copy link

@cjbj
this could be a temporary solution:
if UV_THREADPOOL_SIZE env is not set, set it at 128 at start and warn user UV_THREADPOOL_SIZE was too low. (warn user without doing anything is not user friendly)
this will not solve the problem but this could give you some time to solve the real problem.

libuv/libuv#382

why do you need to lock thread?
isn't it possible to lock only the operation on the thread and assign it a new task until operation unlock?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants