
Connection.execute quietly hangs without calling the callback #481

Closed
balabanshik opened this issue Jul 25, 2016 · 7 comments
@balabanshik

TL;DR In some cases, connection.execute() can just hang indefinitely without calling the callback.

Slightly more details:
We have code that inserts a bunch of records into a bunch of tables. Some time ago, we noticed an unpleasant number of orphan records. So we wrapped the whole process into a transaction. A few days after that, we noticed that autoCommit was set in the options, so we set that to false. That's when the whole process started to hang. Vigorous debugging led to a failing query (a dependent record being created outside of a transaction, causing a foreign key constraint failure) that simply does not call its callback at all.

Process:
The business logic is isolated by a model layer, so the query execution code is fairly generic. The overall sequence of events is roughly as follows:
1. Create a transaction connection (T).
2. Insert a master object inside T; let's call it 'user'.
3. Insert a few other rows (a dozen or so) into a couple of other tables, all using the user ID acquired above, and all but one operating inside T.
4. One of the inserts uses an incorrect connection (i.e. it attempts to use the main connection, not the transaction); this should produce an integrity constraint error, but the query never finishes and never returns to the callback. A sketch of this sequence follows the list.
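To make the sequence concrete, here is a minimal sketch of it (the `pool` variable, table names and bind values are invented for illustration; the real code goes through the generic model-layer helper shown further down):

```js
// Minimal sketch of the failing sequence; names are illustrative only.
pool.getConnection(function (err, txConn) {            // transaction connection T
  if (err) return console.error(err)
  txConn.execute(
    'INSERT INTO users (id, name) VALUES (:id, :name)',
    { id: 1, name: 'alice' },
    { autoCommit: false },                             // parent row stays uncommitted in T
    function (err) {
      if (err) return console.error(err)
      // The bug being described: the dependent insert runs on a different
      // connection, outside T, so it cannot see the uncommitted parent row.
      pool.getConnection(function (err, otherConn) {
        if (err) return console.error(err)
        otherConn.execute(
          'INSERT INTO user_settings (user_id, name, value) VALUES (:uid, :n, :v)',
          { uid: 1, n: 'theme', v: 'dark' },
          { autoCommit: false },
          function (err, result) {
            // In the failing case, execution never reaches this callback.
            console.log('dependent insert finished', err, result)
          })
      })
    })
})
```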

Technical specs:
Initially, this was observed using a fairly old set of software:
OS X,
oracle binaries macos.x64-11.2.0.4.0,
oracledb module 1.2.0
Then we updated to oracledb 1.10.1 and more recent binaries, but that didn't change the behaviour.

Code snippet (some parts removed for simplicity):

```js
// options, isTransaction and the connection helpers are defined elsewhere
// (removed for simplicity).
function query (sql, values, fn) {
  var queryParts = sql.split(' ')
  var queryWord = queryParts[0]
  getOracleConnection(function (error, connection) {
    if (error) {
      process.log.warn('getOracleConnection failed with: ' + JSON.stringify(error))
      return fn(error)
    }
    process.log.info(queryWord + ' QUERY executing')  /// LINE 1
    connection.execute(sql, values, options, function (error, result) {
      process.log.info(queryWord + ' QUERY resolved')  /// LINE 2
      if (!error) {
        return fn(undefined, result)
      }
      process.log.warn('SQL Statement Failed: \n' + sql + '\n' + JSON.stringify(values), error)
      if (isTransaction && isConnectionError(error.message)) {
        // Our transaction has permanently failed.
        closeOracleConnection(function (error) {
          if (error) {
            process.log.warn('Failed to close Oracle connection.', error)
          }
        })
        return fn(error)
      }
      return fn(error)
    })
  })
}
```
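For context, a call site looks roughly like this (the statement and binds here are hypothetical, just to show the shape of the callback):

```js
// Hypothetical caller; fn receives (error, result) exactly as in the helper above.
query('INSERT INTO user_settings (user_id, name, value) VALUES (:uid, :n, :v)',
      { uid: 42, n: 'theme', v: 'dark' },
      function (error, result) {
        if (error) return process.log.warn('insert failed', error)
        process.log.info('rows affected: ' + result.rowsAffected)
      })
```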

So what happens is that there would be a bunch of console lines about queries executing (LINE 1) and resolving (LINE 2), but the failing query only produces the first of them, never the second. Therefore, for whatever reason, the callback is not called. Further, in some cases Node itself becomes unstable and does not respond to ^C. This hints at the snag being somewhere in the native code.

@cjbj
Member

cjbj commented Jul 25, 2016

Are you locking up all the threads? See https://github.com/oracle/node-oracledb/blob/master/doc/api.md#numberofthreads
If you're using a pool, what does _logStats() show?
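If it helps, here is a quick sketch of how to dump those numbers (statistics need to be enabled when the pool is created, and _logStats() is an internal, undocumented method in these releases; credentials below are placeholders):

```js
// Sketch: enable and print pool queue statistics (internal API).
oracledb.createPool(
  {
    user: 'scott',              // placeholder credentials
    password: 'tiger',
    connectString: 'localhost/XE',
    _enableStats: true          // collect queue statistics
  },
  function (err, pool) {
    if (err) throw err
    // ... run the workload, then, while it hangs:
    pool._logStats()            // prints a statistics block like the one below
  })
```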

@cjbj cjbj added the question label Jul 25, 2016
@balabanshik
Author

balabanshik commented Jul 25, 2016

Yes, we are using a pool. Here are the stats, just after running the last (hanging) query:

Pool statistics:
...total up time (milliseconds): 40398
...total connection requests: 2
...total requests enqueued: 0
...total requests dequeued: 0
...total requests failed: 0
...total request timeouts: 0
...max queue length: 0
...sum of time in queue (milliseconds): 0
...min time in queue (milliseconds): 0
...max time in queue (milliseconds): 0
...avg time in queue (milliseconds): 0
...pool connections in use: 2
...pool connections open: 2
Related pool attributes:
...queueRequests: true
...queueTimeout (milliseconds): 60000
...poolMin: 1
...poolMax: 16
...poolIncrement: 1
...poolTimeout (seconds): 30
...stmtCacheSize: 30
Related environment variables:
...process.env.UV_THREADPOOL_SIZE: undefined

The zeroes are not very inspiring. Did I do something wrong?

Changing UV_THREADPOOL_SIZE doesn't seem to affect anything; setting it to 20, for example, makes no difference.
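For reference, this is roughly how I'm setting it (from the shell before starting Node, since libuv reads the variable when its thread pool is first created):

```js
// From the shell, before starting the process:
//   UV_THREADPOOL_SIZE=20 node app.js
//
// Setting it at the very top of the script, before anything touches the
// libuv thread pool, may also work on some platforms:
process.env.UV_THREADPOOL_SIZE = '20'
var oracledb = require('oracledb')
```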

@cjbj
Member

cjbj commented Jul 26, 2016

If you are getting blocked with only 2 connections, perhaps you are simply deadlocking yourself? Sounds like a chance to do some more debugging.
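One quick check while it is hanging (a sketch, assuming an account that can read V$SESSION): see whether one session is simply waiting on a lock held by the other.

```js
// Sketch: while the insert is hanging, look for blocked sessions.
// Requires privileges to query V$SESSION.
connection.execute(
  'SELECT sid, serial#, blocking_session, event ' +
  'FROM v$session WHERE blocking_session IS NOT NULL',
  [], {},
  function (err, result) {
    if (err) return console.error(err)
    // Non-empty rows here mean an ordinary lock wait, which hangs
    // indefinitely, rather than an ORA-00060 deadlock.
    console.log(result.rows)
  })
```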

@balabanshik
Author

  1. None of the queries involved touch the same tables
  2. None of the queries use locking of any kind. The only queries are inserts and selects (without 'for update')
  3. Wouldn't it fail with ORA-00060 after a little bit, instead of hanging indefinitely?
  4. Even if no error is encountered, shouldn't there be a timeout of some sort, to ensure the callback gets called?

@dmcghan

dmcghan commented Jul 26, 2016

@balabanshik I know it's a pain, but could you try to put together a reproducible test case? Without that we have to wonder about how you might have written your application and where the issue really exists.

Chris' comments on deadlocking may be spot on. Try setting your UV_THREADPOOL_SIZE to a higher number, like 20 or 30. Test and let us know how that goes.

See #395 for more details.

@cjbj
Member

cjbj commented Jul 26, 2016

@balabanshik You could also experiment with turning autocommit on and see what effect that has.
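For example, either globally or on the individual execute() call (a quick sketch; `sql` and `binds` are whatever statement you are running):

```js
// Global default for all executions:
oracledb.autoCommit = true

// ... or per statement:
connection.execute(sql, binds, { autoCommit: true }, function (err, result) {
  // ...
})
```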

@balabanshik
Author

Well, I've spent some time trying to find the suggested deadlock and couldn't find any. In the meantime, we refactored some things and the problem somehow stopped. Wish I had a more enlightening resolution.
