
Why is this execute Promise steadily increasing in execution/resolve time (resulting in NJS-040: connection request timeout error)? #474

Closed
jonesnc opened this issue Jul 12, 2016 · 20 comments


jonesnc commented Jul 12, 2016

I'm writing an ETL tool that interacts with an Oracle database, using node-oracledb 1.10 and RxJS to handle all of the various asynchronous data streams I'm tossing around. I'm running into an issue where the longer my application runs, the longer a call to node-oracledb's .execute() takes; the time seems to increase linearly for the same query. Hopefully you can spot the mistake in the code below.

First let me show how I'm running Oracle queries. I created my own .execute() function that acts as a wrapper around node-oracledb's .execute().

import oracledb from 'oracledb';
// `config` (imported elsewhere, not shown) holds the pool settings listed below.
var oraConnPool;

export function execute(sql, bind, opts) {
  if (!oraConnPool) {
    createOraPool();
  }
  return oraConnPool
    .then(pool => pool.getConnection())
    .then(conn => conn.execute(sql, bind, opts));
}

function createOraPool() {
  let oraPool = oracledb.createPool(config.database.oracle);
  oraConnPool = oraPool;
  return oraPool;
}

And my config.database.oracle (without the credentials):

{
  "poolTimeout": 60,
  "poolMin": 10,
  "poolMax": 25,
  "queueRequests": true,
  "queueTimeout": 600000,
  "_enableStats": true
}

Below is an example of me invoking my .execute() function. There's a lot happening here, so let me annotate it a bit for clarity. rnd is used to create a unique id for console.time(), so I can keep track of the time it takes for the .execute() Promise to resolve (let me know if there's a flaw in this time-measuring technique). The :ssids bind variable passed to the SELECT statement is a CSV string of ssid identifiers, and a list of matches is returned. This lets me batch-process records instead of issuing a separate query for each individual row, hopefully saving some execution time. The first .then() lowercases every key in the resulting array of objects; the second .then() ends the console.time() tracking.

const rnd = Math.random() * 100;
console.time(rnd);
return execute(`
  SELECT
    ssid_input.ssid AS ssid,
    students.id AS student_id,
    students.student_number AS student_number
  FROM (
         SELECT REGEXP_SUBSTR(
                    :ssids,
                    '[^,]+', 1, level) AS ssid
         FROM dual
         CONNECT BY REGEXP_SUBSTR(
                        :ssids,
                        '[^,]+', 1, level) IS NOT NULL
  ) ssid_input
  LEFT JOIN students ON students.state_studentnumber = ssid_input.ssid`, {
    ssids: {
      val: ssids.join(','),
      dir: orawrap.BIND_IN,
      type: orawrap.STRING
    }
  }, {
    outFormat: orawrap.OBJECT,
    maxRows: ssids.length
  })
  .then(results => {
    return results.rows.map(result => {
      let newObj = {};
      Object.keys(result).forEach(key => {
        newObj[key.toLowerCase()] = result[key];
      });
      return newObj;
    });
  })
  .then(result => {
    console.timeEnd(rnd);
    return result;
  });

Below is the console.time() output, which increases steadily until it hits the 60000 ms queueTimeout limit.

97.24179652744425: 12226.930ms
38.14057213652584: 14583.518ms
46.19793585774834: 16024.785ms
16.12600313565251: 17820.694ms
87.73720584788988: 20809.461ms
54.711100085462604: 22652.638ms
42.474404414891744: 24037.868ms
49.09845121453702: 26521.596ms
87.70258724764568: 29461.480ms
1.0731996619882223: 31210.875ms
90.33430329792829: 32259.944ms
37.4829457960367: 34076.824ms
9.731832830291932: 35292.281ms
/home/nathanjones/Projects/test-forge/node_modules/@reactivex/rxjs/dist/cjs/util/subscribeToResult.js:41
            root_1.root.setTimeout(function () { throw err; });
                                                 ^

Error: NJS-040: connection request timeout

I added a console.log(pool._logStats()) statement every time the .execute() function is called. I've included the output of the last time it was printed before the NJS-040 error:

Pool statistics:
...total up time (milliseconds): 62823
...total connection requests: 1794
...total requests enqueued: 1769
...total requests dequeued: 0
...total requests failed: 0
...total request timeouts: 0
...max queue length: 1769
...sum of time in queue (milliseconds): 0
...min time in queue (milliseconds): 0
...max time in queue (milliseconds): 0
...avg time in queue (milliseconds): 0
...pool connections in use: 25
...pool connections open: 25
Related pool attributes:
...queueRequests: true
...queueTimeout (milliseconds): 60000
...poolMin: 10
...poolMax: 25
...poolIncrement: 1
...poolTimeout (seconds): 60
...stmtCacheSize: 30

Related environment variables:
...process.env.UV_THREADPOOL_SIZE: undefined
undefined

It's strange that ...total requests dequeued is 0, shouldn't that be a non-zero value?

I've tried to include most of the relevant code; please let me know if you need more context.

jonesnc changed the title from "Why is my execute Promise steadily increasing in execution/resolve time (resulting in NJS-040: connection request timeout error)?" to "Why is this execute Promise steadily increasing in execution/resolve time (resulting in NJS-040: connection request timeout error)?" on Jul 12, 2016

dmcghan commented Jul 12, 2016

@jonesnc I haven't pored over the code much, but I did notice that you don't seem to have a call to conn.close anywhere. When you call conn.close on a connection obtained from a pool, the connection is released back to the pool so it can be used later.

See the section of the doc related to promises for an example (uses close even though the connection isn't from a pool):
https://github.com/oracle/node-oracledb/blob/master/doc/api.md#promiseoverview
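
To illustrate (just a sketch, not the project's code), the wrapper from the issue could release connections back to the pool like this:

export function execute(sql, bind, opts) {
  if (!oraConnPool) {
    createOraPool();
  }
  return oraConnPool
    .then(pool => pool.getConnection())
    .then(conn =>
      conn.execute(sql, bind, opts)
        .then(results => {
          // Release the connection back to the pool before resolving.
          return conn.close().then(() => results);
        })
        .catch(err => {
          // Release on failure too, then re-throw the original error.
          return conn.close().then(() => { throw err; });
        })
    );
}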

Let's start by addressing that and then we can look at other potential issues.

Also, I'll look into the dequeued count...


dmcghan commented Jul 12, 2016

@jonesnc I just ran a couple of tests and the dequeued count seems to be working fine on my end, which means there's something strange going on on your side...

Could you share the code that drives this? What is calling execute?


jonesnc commented Jul 12, 2016

I have modified my .execute() wrapper function to call .close() on the connection. This fixed the NJS-040 error, but I think I'm still seeing some slowness with .execute().

My project can be found here.


dmcghan commented Jul 12, 2016

I don't see your call to close in that project. Can you update it with the current code?


jonesnc commented Jul 12, 2016

Ah, sorry. I'll push that change to the repo shortly.


dmcghan commented Jul 12, 2016

Ignore my deleted comment, silly mistake on my end.

Let me know when you've pushed the update.


dmcghan commented Jul 12, 2016

Also, could you show the logStats output after making that change? Here's how I would describe what happened in the output you showed before:

  1. poolMax was set to 25.
  2. The first 25 requests came in; they were satisfied immediately and skipped the queue (no enqueue or dequeue).
  3. The next 1769 requests came in. Because you were not releasing connections, those requests were never satisfied - hence the dequeued count of 0.


jonesnc commented Jul 12, 2016

I just pushed an update to the develop branch of the repo I linked to above. Is there a problem with the large disparity between total requests enqueued and total requests dequeued?

Pool statistics:
...total up time (milliseconds): 88808
...total connection requests: 20320
...total requests enqueued: 20295
...total requests dequeued: 67
...total requests failed: 0
...total request timeouts: 0
...max queue length: 20228
...sum of time in queue (milliseconds): 3284889
...min time in queue (milliseconds): 0
...max time in queue (milliseconds): 72310
...avg time in queue (milliseconds): 162
...pool connections in use: 24
...pool connections open: 25
Related pool attributes:
...queueRequests: true
...queueTimeout (milliseconds): 600000
...poolMin: 10
...poolMax: 25
...poolIncrement: 1
...poolTimeout (seconds): 60
...stmtCacheSize: 30
Related environment variables:
...process.env.UV_THREADPOOL_SIZE: undefined
undefined


jonesnc commented Jul 12, 2016

Is this the best strategy for doing a large number of queries with small result size?


dmcghan commented Jul 12, 2016

Is this the best strategy for doing a large number of queries with small result size?

I'm not sure exactly what you mean... If you mean using execute with maxRows rather than ResultSets, then yes, I'd say that's fine. If you mean using a pool for a lot of queries, then I would say yes again - definitely.

Please show me the updated logStats output now that you're closing connections.

Also, note that close is async. As you're using it, it will return a promise but you don't have a catch block checking for errors.
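
For example (a sketch, not the project's code), the release call could at least log failures:

conn.close().catch(err => {
  // Log close failures so they don't silently disappear.
  console.error('Error releasing connection:', err);
});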


jonesnc commented Jul 12, 2016

The logStats output above is from after I started closing connections.

If you mean using execute with maxRows over ResultSets

I'm not sure what you mean by that. Could you elaborate? Where does maxRows and ResultSets get set?


dmcghan commented Jul 12, 2016

Oh, sorry. GitHub hadn't shown me your update... It's good that we're starting to see some dequeuing. :)

Is there a problem with the large disparity between total requests enqueued and total requests dequeued?

Yes, this would be problematic. In the most recent _logStats output, you issued 20,320 requests over roughly 89 seconds - about 230 requests per second. How long do you think these queries are taking to execute?

In any case, it seems like you have a flow problem. You're sending requests to the database faster than you can process them. Whatever's driving the flow should slow down and move at the speed of your database. Then you can explore means of tuning.

In the example query you provided you used CONNECT BY with REGEXP_SUBSTR. See this for an alternative means of doing this:
#137

Let us know which performs better.

Where does maxRows and ResultSets get set?

You're already setting maxRows - check your source. That gives you your results in an array, which is fine in your case. If you had large result sets, you'd want to switch to the ResultSet class, which is described here:
https://github.com/oracle/node-oracledb/blob/master/doc/api.md#resultsethandling
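
For reference, here's a rough sketch of ResultSet-based fetching with the promise API (the helper name and batch size are placeholders, not from your project):

function fetchAllRows(resultSet, rows = []) {
  // Pull rows in batches of 100 until the result set is exhausted.
  return resultSet.getRows(100).then(batch => {
    if (batch.length === 0) {
      return resultSet.close().then(() => rows);
    }
    return fetchAllRows(resultSet, rows.concat(batch));
  });
}

// Usage sketch: pass resultSet: true instead of maxRows.
// conn.execute(sql, binds, { resultSet: true, outFormat: oracledb.OBJECT })
//   .then(result => fetchAllRows(result.resultSet))
//   .then(rows => { /* process rows, then release the connection */ });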

For clarity in your code, could you please remove all references to Orawrap? I assume you were using it at some point but have since switched over to the core driver (which is good).


jonesnc commented Jul 13, 2016

How long do you think these queries are taking to execute?

It probably varies between the "batch" queries (those that use CONNECT BY with REGEXP_SUBSTR, as you mentioned) and the "single" queries. A single "batch" query, for example, reported something like: 79 rows retrieved starting from 1 in 175ms (execution: 132ms, fetching: 43ms), which seems far too slow to be completing at a rate of 231 reqs/sec.

I'll try the method you suggested and report back the results, hopefully sometime tomorrow.

In any case, it seems like you have a flow problem. You're sending requests to the database faster than you can process them. Whatever's driving the flow should slow down and move at the speed of your database. Then you can explore means of tuning.

You're right. I'm using RxJS, and I need to find a way to effectively use backpressure to slow down the requests.
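
Something like this is what I have in mind - a rough sketch with placeholder names (ssidBatches, lookupStudents, handleRows); mergeMap's concurrency argument would keep only a few execute() calls in flight at a time:

import { Observable } from '@reactivex/rxjs';

Observable.from(ssidBatches)
  .mergeMap(ssids => lookupStudents(ssids), 4) // lookupStudents would wrap the execute() call above
  .subscribe(
    rows => handleRows(rows),
    err => console.error(err)
  );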

You're already setting maxRows, check your source.

Ah, I see. I misunderstood what you were saying with that. I thought you meant that it was getting set elsewhere and that I didn't need to set it.

Thanks for your help, you've already helped the performance on this project greatly!

cjbj added the question label on Jul 13, 2016

cjbj commented Jul 13, 2016

UV_THREADPOOL_SIZE is shown as undefined, which means the default of 4 worker threads is being used. However, your connection pool allows up to 25 connections. You may want to experiment with increasing the number of threads. See https://github.com/oracle/node-oracledb/blob/v1.10.0/doc/api.md#number-of-threads
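
As an illustration only (this check isn't part of your project, and it assumes the config object from your wrapper), you could warn at startup when the thread pool is smaller than the connection pool:

// UV_THREADPOOL_SIZE must be set in the environment before Node.js starts, e.g.:
//   UV_THREADPOOL_SIZE=32 node index.js
const threadPoolSize = parseInt(process.env.UV_THREADPOOL_SIZE, 10) || 4; // libuv defaults to 4
const poolMax = config.database.oracle.poolMax;
if (threadPoolSize < poolMax) {
  console.warn('UV_THREADPOOL_SIZE (' + threadPoolSize + ') is smaller than poolMax (' +
    poolMax + '); database calls will queue waiting for a worker thread');
}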

If you are instrumenting your code, stats on connection times as well as stats on execution times would be useful.

A side note: the pool stats' time-in-queue values only record the time that dequeued requests spent in the queue; anything still sitting in the queue is not counted in these stats.
You can augment the stats to traverse the queue if you want a more accurate figure, but even without that you can see from the queue length that the app is under more load than it is configured to handle.


jonesnc commented Jul 13, 2016

If you are instrumenting your code

I'm not sure what you mean by "instrumenting". Could you elaborate?


dmcghan commented Jul 13, 2016

I'm not sure what you mean by "instrumenting". Could you elaborate?

Here's an old Tom Kyte post on that (the first couple paragraphs will answer your question):
http://tkyte.blogspot.com/2005/06/instrumentation.html

You've already started with some basic instrumentation via console.time. I think Chris wants to see it enhanced a little. For example, if you move your console.timeEnd(rnd) to the first then block after the execute call, you'll be closer to measuring the actual execute time.

Another measurement could be added to see how long it's taking to get connections prior to execution. Then you'd have more data to help you understand what's happening.
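
For example, here's a sketch of splitting the timing inside your wrapper (the labels are arbitrary; error handling is omitted for brevity):

export function execute(sql, bind, opts) {
  if (!oraConnPool) {
    createOraPool();
  }
  const id = Math.random() * 100;
  console.time(`getConnection ${id}`);
  return oraConnPool
    .then(pool => pool.getConnection())
    .then(conn => {
      console.timeEnd(`getConnection ${id}`); // time spent waiting for a pooled connection
      console.time(`execute ${id}`);
      return conn.execute(sql, bind, opts)
        .then(results => {
          console.timeEnd(`execute ${id}`); // time spent executing the statement
          return conn.close().then(() => results);
        });
    });
}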

Chris has a really good point on the UV_THREADPOOL_SIZE. I would try bumping that up to be at least as high as your poolMax. Then try adjusting the size of the pool and UV_THREADPOOL_SIZE up and down to see how things perform. Just remember, more isn't always better. Try both at 50, 75, 100 - look for the sweet spot. Let us know what you get.


cjbj commented Jul 21, 2016

@jonesnc how are you going with this?


jonesnc commented Jul 21, 2016

@cjbj Sorry I haven't updated this issue yet. I've been at a conference all week, so I haven't had a chance to carry out any of the suggestions made here.

I'll try to provide an update on Friday.


cjbj commented Jul 21, 2016

@jonesnc no problems.


cjbj commented Oct 19, 2016

Closing due to lack of activity.

cjbj closed this as completed on Oct 19, 2016