Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queries hang after long idle period (~hr), until service restarted #460

Closed
CollinEstes opened this issue Jun 17, 2016 · 13 comments
Closed

Comments

@CollinEstes
Copy link

CollinEstes commented Jun 17, 2016

I'm running an Express API who's only purpose is to execute queries using node-oracledb against our Oracle 11g DB. Everything works perfectly, I'm even running ETL jobs hammering the API with requests and everything executes fine.

The ETL jobs all finish but then usually about an hour or so after that, any request to that API simply timeout. Debugging it, it seems the queries just never return any results. I restart the Node process (in my case bounce the docker container) and everything starts working again. I'm using a resultSet (although I'm moving towards using the stream interface soon) and connection pool,

I'm wondering if this is related to this issue:
#395

Or is there something else I'm doing incorrectly here.

Here is my code where I setup my connection pool (I use a singleton of sorts...)

'use strict';
var oracledb = require('oracledb')
  , config = require('../config/config')
  , DB_ENV = process.env.NODE_ENV || 'staging'
  , connectObj = config[DB_ENV]
  ;

let instance = {};

module.exports = function OracleConnectionPool(cb) {
  var createInstance = function createInstance(ctx, done) {
    oracledb.createPool(connectObj, function (err, pool) {
      if (err) { return done(err); }
      ctx.pool = pool;
      done(null, ctx);
    })
  };


  if (!instance.pool) {
    createInstance(instance, function (err, ctx) {
      instance = ctx;
      cb(err, instance.pool);
    });
  } else {
    cb(null, instance.pool);
  }
};

And here is my code for executing my queries:

execute(cb) {
    let sqlToExecute = this.query
      , pool = this.pool
      , queryParams = this.queryInputParameters
      , queryResults = []
      ;

    function doRelease(connection) {
      connection.release(
        function(err)
        {
          if (err) { console.error(err.message); }
        });
    }

    function doClose(connection, resultSet) {
      resultSet.close(
        function(err)
        {
          if (err) { console.error(err.message); }
          doRelease(connection);
        });
    }

    checkCallback(cb);

    // connect to oracle db
    pool.getConnection(function(err, connection) {
      if (err) {
        return cb(new Error('Unable to establish connection'));
      }

      connection.execute(
        sqlToExecute,
        queryParams,
        { resultSet: true,
          outFormat: oracledb.OBJECT
        },
        function(err, result) {
          if (err) {
            doRelease(connection);
            return cb(err);
          }

          let fetchRowsFromRS = function (connection, resultSet, numRows) {
            resultSet.getRows( // get numRows rows
              numRows,
              function (err, rows)
              {
                if (err) {
                  doClose(connection, resultSet);
                  return cb(err);
                }

                if (rows.length > 0) {
                  rows.forEach((row) => queryResults.push(row));
                  if (rows.length == numRows) {
                    fetchRowsFromRS(connection, resultSet, numRows);
                  } else {
                    rows.forEach((row) => queryResults.push(row));
                    cb(null, queryResults);
                    doClose(connection, resultSet);
                    return;
                  }
                } else {
                  cb(null, queryResults);
                  doClose(connection, resultSet);
                  return;
                }
              });
          };

          fetchRowsFromRS(connection, result.resultSet, ROW_COUNT)

        });
    });
  }

(There is obviously some state in that code, I'm using ES6 classes, which I plan on refactoring out because they stink...)

I'm running everything in a docker container (https://hub.docker.com/r/collinestes/docker-node-oracle/).

Debian, Node 4.4

Any help is much appreciated, would love to get this API to a more stable point but right now I have to make sure I bounce it every hour or so. Memory usage looks fine no memory leaks from what I can see.

@CollinEstes CollinEstes changed the title Queries hangs after long idle period (~hr) Queries hang after long idle period (~hr) Jun 17, 2016
@CollinEstes CollinEstes changed the title Queries hang after long idle period (~hr) Queries hang after long idle period (~hr), until service restarted Jun 17, 2016
@dmcghan
Copy link

dmcghan commented Jun 17, 2016

@CollinEstes Sorry, just noticed you did say the results don't seem to be coming back...

Can you elaborate on that point?

@CollinEstes
Copy link
Author

So I've done a good portion of this, will revisit and document with some fresh examples.

Thanks @dmcghan... more to come...

@CollinEstes
Copy link
Author

@dmcghan I'm trying to pinpoint where exactly it just hangs until it the req is timeout is exceeded. I believe the connection.execute() callback is never called but trying to get some logging in there now so I can confirm. Either there or the pool is never getting a connection for that request.

@dmcghan
Copy link

dmcghan commented Jun 17, 2016

Sounds good. Logging just before and in the callback of connection.execute would help...

@cjbj
Copy link
Member

cjbj commented Jun 17, 2016

I'd suspect the connection is being killed by something in your environment. You can tweak various net parameters so Oracle gets network errors back faster. If this is the cause, the ping feature mentioned in #300 should help; it's high on my wish-list. Also #443 has some related info.

@CollinEstes
Copy link
Author

Ok I was able to reproduce with logging, and see the following:

GET /sarah/components?page=1&per_page=100 200 1009.037 ms - -
about to get a connection
got a connection about to execute query
got a query result about to execute fetchRows
done with result callingback
GET /sarah/master/9239/billofmaterials 200 931.899 ms - -
about to get a connection

The first request at the top there completed a couple hours ago and you can see the console logs for when a request completes correctly.

The second request there does not ever complete and it appears that it never gets into the the callback from the pool.getConnection() call.

My pool size is the default 4 and I am running now with "UV_THREADPOOL_SIZE=6" env variable.

I restart the service and that request completes as expected.

Any ideas?

@CollinEstes
Copy link
Author

I'm looking at #300 and #443

@CollinEstes
Copy link
Author

Closing as duplicate of #300.

+1 for the ping feature @cjbj, wish my C++ was good enough to help on it and I would. We are using this library at NASA and right now I have a scheduler bouncing my container every 1hr or so. Now that I understand the issue better I'm sure I can put in a interim solution. Thanks for the work and the library regardless you are helping NASA and specifically the EVA program.

@ecowden
Copy link

ecowden commented Jun 18, 2016

Hey @CollinEstes,

I'm still not sure if it's a perfect approach, but if you're looking for an "interim solution," you may want to consider using generic-pool or similar in place of the driver's pool. Just give it a validate function and it can verify that a connection is healthy before vending it. I've generally found it to be pretty well behaved.

@CollinEstes
Copy link
Author

@ecowden yeah thanks that is a good idea, I was thinking of some sort of ping or health check function on an interval in my app, it isn't heavy use at this point but the intention is to keep growing it

@dmcghan
Copy link

dmcghan commented Jun 18, 2016

@ecowden I'm not convinced this is a dup of #300? Folks there were getting ORA-03113s, ORA-03135s, etc. You were reporting no errors, just hanging...

@CollinEstes
Copy link
Author

@dmcghan I confirmed I also receive the "Error: ORA-03113: end-of-file on communication channel". However it does take quite a while for the callback on that pool.getConnection() to to return (like several mins).

@cjbj
Copy link
Member

cjbj commented Jun 21, 2016

Tuning OS network timeouts or using FAN (with an appropriate Oracle DB setup, e.g. RAC) are options, depending on your quality-of-service requirements and how complex your system is. I think connection pinging will solve most basic use cases I've seen, and will be a good start. For more complex and demanding HA requirements, tools like FAN can be used, or custom app logic can be implemented if you want to DIY.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants