reuse connections on table.insert #41

Closed
alexander-fenster opened this issue Dec 18, 2017 · 14 comments · Fixed by googleapis/teeny-request#54
Labels: api: bigquery · perf · type: feature request

Comments

@alexander-fenster
Member

alexander-fenster commented Dec 18, 2017

TL;DR: the current implementation of the BigQuery client establishes a new connection on each table.insert() call. This causes problems for Cloud Functions because of the connection quota that our users hit. Please update the client libraries to reuse connections.

Why new connections cause problems:

Cloud Functions is a serverless platform for executing snippets of code. Many of our customers use functions to store events in BigQuery. GCF has a quota on connections, and this quota is relatively low because each connection requires a NAT port to be allocated and maintained until the TCP packet timeout, and there are only 32k ports per server.

Why BigQuery is worse than other client libraries:

I found three types of clients:

  1. Clients that always reuse connections - even if we construct the client object in a local scope - @google-cloud/storage
  2. Clients that reuse connections only if we declare the client object at global scope - @google-cloud/pubsub, @google-cloud/language, @google-cloud/spanner
  3. Clients that never reuse connections - @google-cloud/bigquery

It would be nice if all libraries behaved like type 1.

How to reproduce the problem:

Every call to the function exported by the code below creates a new connection.

******************* function.js ****************

const bigquery = require('@google-cloud/bigquery')();
const table = bigquery.dataset('Tests').table('RandomNumbers');

exports.function = function(req, res) {
  var r = Math.floor(Math.random() * 100);
  console.log("Generated random number " + r);
  var row = {number: r};
  table.insert(row, function(err, apiResponse) {
    if (err) {
      var result = 'Error inserting data into BigQuery';
      console.log(result + ": " + JSON.stringify(err, null, 2));
      res.status(500).send("Error: " + err);
    } else {
      res.status(200).send("" + r);
    }
  });
};

************* package.json ****************

{
  "dependencies": {
    "@google-cloud/bigquery": "^0.10.0"
  }
}

We also found that BigQuery uses IPv4, while other libraries prefer IPv6. This should also be fixed.

[Googlers: internal tracking id b/68240537]

@alexander-fenster added the priority: p2 label on Dec 18, 2017
@stephenplusplus
Contributor

I believe what we want is a Keep-Alive header on our requests. This is possible with the npm request module that we use, and is actually its default behavior. However, we turned it off specifically for Google Cloud Functions because users were receiving lots of socket hang up errors (see here and the rest of the discussion in the issue: googleapis/google-cloud-node#2254 (comment)).
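
For reference, here's a minimal sketch (assuming request v2.x; example.com is just a placeholder URL, not anything our libraries call) of the two usual ways keep-alive is controlled with request:

const request = require('request');
const https = require('https');

// Option 1: request's own `forever` flag, which makes it use keep-alive agents.
const foreverClient = request.defaults({forever: true});
foreverClient.get('https://example.com/', (err, res) => {
  if (!err) console.log('forever client status:', res.statusCode);
});

// Option 2: pass an explicit agent with keepAlive enabled.
const keepAliveAgent = new https.Agent({keepAlive: true});
request.get({url: 'https://example.com/', agent: keepAliveAgent}, (err, res) => {
  if (!err) console.log('explicit agent status:', res.statusCode);
});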

@stephenplusplus
Contributor

@shannb

shannb commented Dec 29, 2017

Any resolution to this? @alexander-fenster were you able to resolve the issue by setting the forever agent to false on a fork of google-cloud/storage?

I have run into the same issue using bigquery. Is there some part of the google-cloud/bigquery api that also uses google-cloud/storage?

If not @stephenplusplus could you provide a reference for where in the bigquery api repository this update needs to occur?

@stephenplusplus
Contributor

If we confirm that this PR was a mistake, we would disable it in the same place, and all APIs would be immediately corrected.

For any users that want to undo the change from that PR, you can use request overrides to turn Keep-Alive back on:

const BigQuery = require('@google-cloud/bigquery')
const bigquery = new BigQuery({...})

bigquery.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})

@danoscarmike

Ping. Next steps?

@emilyofficial

Any updates? @stephenplusplus

@stephenplusplus
Contributor

Sorry, I was hoping @alexander-fenster would advise.

@JustinBeckwith added the type: bug and perf labels on May 31, 2018
@JustinBeckwith added the 🚨 This issue needs some love. label on Jun 16, 2018
@amineharbaoui

Any news on this issue?

@callmehiphop
Contributor

@stephenplusplus @alexander-fenster is this still an issue? I haven't seen any chatter about it in a long while.

@JustinBeckwith
Contributor

Just to be safe, it would be great to run a quick system test that does a bunch of inserts back-to-back and monitors the number of open network connections. I have no idea how the move to teeny-request could have affected things here, tbh.
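
A rough sketch of what that check could look like (not the actual system test; it assumes a Tests.RandomNumbers table exists, a recent @google-cloud/bigquery where the constructor is a named export, and a Unix-like environment with lsof available):

const {execSync} = require('child_process');
const {BigQuery} = require('@google-cloud/bigquery');

const table = new BigQuery().dataset('Tests').table('RandomNumbers');

function openTcpConnections() {
  try {
    // Count established TCP sockets owned by this process.
    const out = execSync(
      `lsof -a -p ${process.pid} -i TCP -s TCP:ESTABLISHED`,
      {encoding: 'utf8'}
    );
    return out.trim().split('\n').length - 1; // subtract the header row
  } catch (e) {
    return 0; // lsof exits non-zero when nothing matches
  }
}

async function main() {
  for (let i = 0; i < 20; i++) {
    await table.insert({number: Math.floor(Math.random() * 100)});
    console.log(`insert #${i + 1}: ${openTcpConnections()} established connection(s)`);
  }
}

main().catch(console.error);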

@shannb

shannb commented Mar 18, 2019 via email

@sduskis added the type: feature request label and removed the 🚨 This issue needs some love. and type: bug labels on May 20, 2019
@JustinBeckwith added the 🚨 This issue needs some love. label on May 20, 2019
@sduskis

sduskis commented May 28, 2019

@callmehiphop, can you please add a system test?

@sduskis removed the 🚨 This issue needs some love. and priority: p2 labels on May 28, 2019
@shannb

shannb commented May 28, 2019 via email

@romaincointepas

romaincointepas commented Jun 21, 2019

Bumping, as connections do not seem to be reused anymore (anywhere, not only in Cloud Functions).

I'm wondering if the switch from request to teeny-request could be the culprit, as teeny-request does not support the forever and pool options (request did), and the default behavior of Node HTTP(S) agents is to not send the Connection: keep-alive header (they send Connection: close).

node-fetch (used by teeny-request) accepts an agent passed as an option, but teeny-request neither exposes this option nor provides a way to forward it to node-fetch.

+ @JustinBeckwith
+ googleapis/nodejs-common#394
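
For illustration, a minimal sketch (assuming node-fetch v2; example.com is only a placeholder URL) of the agent option that teeny-request would need to expose and forward:

const fetch = require('node-fetch');
const https = require('https');

// A shared keep-alive agent: sockets are pooled and reused across requests.
const keepAliveAgent = new https.Agent({keepAlive: true});

async function main() {
  // Two sequential requests through the same agent should reuse one socket.
  for (let i = 0; i < 2; i++) {
    const res = await fetch('https://example.com/', {agent: keepAliveAgent});
    console.log(`request ${i + 1}: HTTP ${res.status}`);
    await res.text(); // drain the body so the socket can return to the pool
  }
  keepAliveAgent.destroy(); // let the process exit cleanly
}

main().catch(console.error);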


Some digging:

@google-cloud/bigquery relies on @google-cloud/common for requests, with the following defaults:

const requestDefaults = {
  timeout: 60000,
  gzip: true,
  forever: true,
  pool: {
    maxSockets: Infinity,
  },
};

https://github.com/googleapis/nodejs-common/blob/master/src/util.ts#L35-L42

@google-cloud/common in turn uses teeny-request for requests, which itself uses node-fetch.

Both the forever and pool options seem to be ignored (they are not converted and passed on to node-fetch):

function requestToFetchOptions(reqOpts: r.Options) {
  const options: f.RequestInit = {
    method: reqOpts.method || 'GET',
    ...(reqOpts.timeout && {timeout: reqOpts.timeout}),
    ...(reqOpts.gzip && {compress: reqOpts.gzip}),
  };

  if (typeof reqOpts.json === 'object') {
    // Add Content-type: application/json header
    reqOpts.headers = reqOpts.headers || {};
    reqOpts.headers['Content-Type'] = 'application/json';

    // Set body to JSON representation of value
    options.body = JSON.stringify(reqOpts.json);
  } else {
    if (typeof reqOpts.body !== 'string') {
      options.body = JSON.stringify(reqOpts.body);
    } else {
      options.body = reqOpts.body;
    }
  }

  options.headers = reqOpts.headers as Headers;

  let uri = ((reqOpts as r.OptionsWithUri).uri ||
    (reqOpts as r.OptionsWithUrl).url) as string;
  if (reqOpts.useQuerystring === true || typeof reqOpts.qs === 'object') {
    const qs = require('querystring');
    const params = qs.stringify(reqOpts.qs);
    uri = uri + '?' + params;
  }

  if (reqOpts.proxy || process.env.HTTP_PROXY || process.env.HTTPS_PROXY) {
    const proxy = (process.env.HTTP_PROXY || process.env.HTTPS_PROXY)!;
    options.agent = new HttpsProxyAgent(proxy);
  }

  return {uri, options};
}

https://github.com/googleapis/teeny-request/blob/master/src/index.ts#L22-L60

node HTTP(S) agents default to not reusing sockets:

  • keepAlive defaults to false
  • maxSockets defaults to Infinity
  • The Connection: keep-alive header is always sent when using an agent except when the Connection header is explicitly specified or when the keepAlive and maxSockets options are respectively set to false and Infinity, in which case Connection: close will be used. (https://nodejs.org/api/http.html#http_new_agent_options)
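
A quick, self-contained way to see those defaults in action, using a throwaway local server (a sketch only, not code from any of the libraries above):

const http = require('http');

// The server echoes back the Connection header each client variant sends.
const server = http.createServer((req, res) => {
  res.end(`server saw Connection: ${req.headers.connection}\n`);
});

server.listen(0, () => {
  const port = server.address().port;
  const agents = {
    'default agent': new http.Agent(), // keepAlive: false, maxSockets: Infinity
    'keep-alive agent': new http.Agent({keepAlive: true}),
  };

  let pending = Object.keys(agents).length;
  for (const [name, agent] of Object.entries(agents)) {
    http.get({port, agent}, res => {
      res.setEncoding('utf8');
      res.on('data', chunk => console.log(`${name}: ${chunk.trim()}`));
      res.on('end', () => {
        if (--pending === 0) {
          // Destroy pooled keep-alive sockets so the process can exit cleanly.
          Object.values(agents).forEach(a => a.destroy());
          server.close();
        }
      });
    });
  }
});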
