
bigquery:createWriteStream() should propagate insertErrors #2635

Closed · chrishiestand opened this issue Sep 29, 2017 · 8 comments

@chrishiestand
Contributor

The gist is that BigQuery sometimes silently drops one or more streams when many streams are used in parallel (42 in this example). If I'm hitting a quota, an error ought to be thrown. But no errors are thrown: the program runs as expected, yet in the end there are rows missing from the BigQuery data.

If there is a bug, it might be in gcloud-node, or it might be in the BigQuery API. Both seem less likely than me having made a mistake, so I hope you can find something I've done wrong.

The tricky part is that the bug doesn't always reproduce. When it does, n streams of size s are dropped, so BigQuery ends up with n * s missing rows; if n = 2 and s = 150, BigQuery will be missing 300 rows. In other words, the problem does not appear to be that a subset of stream data is missing, but rather that one or more entire streams are missing.

Sometimes this reproduces reliably, and other times it reliably does not. To get the opposite result, try again later and/or change the stream load via the env variables.

This small bug reproduction project, https://github.com/chrishiestand/gcloud-node-bigquery-manystreams-bug, is the result of troubleshooting missing BigQuery data in a production system where 50 streams are processed in parallel.

Below is a screenshot from the reproduction repository showing the issue reproducing:

[screenshot: failing test run showing missing rows]

In contrast, if I reduce the number of streams from 42 to 10, the tests pass as below:
[screenshot: passing test run with 10 streams]

Environment details

  • OS: OS X 10.12.6
  • Node.js version: 8.6.0
  • npm version: 5.3.0
  • google-cloud-node version: @google-cloud/bigquery = 0.9.6

Steps to reproduce

Please go here for a reproduction project: https://github.com/chrishiestand/gcloud-node-bigquery-manystreams-bug

@stephenplusplus
Contributor

Well, I tried something simple, and it worked! Can you try replacing the complete event name with finish? As for why that works, I'm not exactly sure. In theory they should both be the same, although finish might fire when the internal stream is actually done, as opposed to complete, which fires when request gets the full response. The distinction might be important with this many simultaneous requests.
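
That is, in the repro's test code (writeStream being whatever the test names its stream):

    // Before: resolves when `request` has received the full response
    writeStream.on('complete', resolve)

    // After: resolves when the underlying stream is actually finished
    writeStream.on('finish', resolve)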

@chrishiestand
Contributor Author

@stephenplusplus thanks for the quick reply; it's great that you consistently respond so quickly.

So I tried your suggestion (side note: I've often wondered why gcloud-node used the complete event instead of the standard finish):

    writeStream.on('finish', resolve)

It didn't work for me, so my guess is that this is one of those "hard to reproduce" aspects of this issue. I also checked the BigQuery UI and double-checked that the rows didn't stream in later.

[screenshot: failing run using the 'finish' event]

Give it another try with 'finish' and/or tweak totalStreams and streamLength; I expect you'll eventually see it fail as well.

@chrishiestand
Contributor Author

I've now discovered that the issue doesn't simply require many parallel streams; it can also be reproduced with many serial streams. I've pushed some new commits to the test repo; tag v0.1.0 has these changes.

If I stream 100 streams one at a time (totalStreams = 100, streamConcurrency = 1), usually no streams are dropped, but in perhaps 20% of trials one of the 100 streams (streamLength rows) was dropped, as pictured:
[screenshot: run with streamConcurrency = 1 showing one dropped stream]

If I stream 100 streams two at a time (totalStreams = 100, streamConcurrency = 2), several more streams are dropped, as pictured:

[screenshot: run with streamConcurrency = 2 showing several dropped streams]

It seems unlikely that I'd be the first to hit an issue like this if it were the same for everyone; I can't be the only one setting up parallel or serial streams to the same table. Perhaps there is some issue with how I am wrapping the stream in a promise? Or something else specific to my environment? But as far as I can tell, the code ought to work as written.
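
For reference, the wrapping looks roughly like this (a sketch, not the verbatim test code; the function name and parameters here are mine):

    // A sketch of wrapping one stream's lifecycle in a promise, assuming
    // rows are streamed as newline-delimited JSON (per the job config below).
    function streamRows(table, rows) {
      return new Promise((resolve, reject) => {
        const writeStream = table.createWriteStream({
          sourceFormat: 'NEWLINE_DELIMITED_JSON'
        })
        writeStream.on('error', reject)
        writeStream.on('finish', resolve) // or 'complete', per the discussion above
        rows.forEach(row => writeStream.write(JSON.stringify(row) + '\n'))
        writeStream.end()
      })
    }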

@stephenplusplus
Contributor

Looked into this, and here's what I found: sometimes the data just fails to insert.

This isn't necessary, but by using the job's events we can skip the optimistic delay later in the test and instead know for certain whether the rows were inserted.
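
Roughly like this (a sketch; the write stream's complete event hands back the load Job, which is an Operation with its own complete and error events):

    writeStream.on('complete', job => {
      job.on('error', reject)
      job.on('complete', metadata => {
        // Logging the job's final state is what surfaced the failures below.
        console.log(metadata.status)
        resolve(metadata)
      })
    })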

This also gives us what turned out to be the key component of this issue: logging the job object's state. Most of the time, everything works as expected. However, an occasional test run returns:

  Rejected promise returned by test. Reason:

  ErrorClass {
    code: 503,
    errors: [
      {
        domain: 'global',
        message: 'Error encountered during execution. Retrying may solve the problem.',
        reason: 'backendError',
      },
    ],
    message: 'Error encountered during execution. Retrying may solve the problem.',
    response: undefined,
  }

  Object.parseHttpRespBody (node_modules/@google-cloud/common/src/util.js:192:30)
  Object.handleResp (node_modules/@google-cloud/common/src/util.js:132:18)
  Request._callback (node_modules/@google-cloud/common/src/util.js:255:14)
  Request.self.callback (node_modules/request/request.js:186:22)
  Request.<anonymous> (node_modules/request/request.js:1163:10)
  IncomingMessage.<anonymous> (node_modules/request/request.js:1085:12)
  Test.__dirname [as fn] (test/many-streams.test.js:189:38)

So, a couple questions:

  • Should we retry these failed inserts for the user automatically?
  • Should we communicate this job event as an error, instead of "complete"? The reason we don't is that (AFAIK) the "Job" metadata can come in many different formats, so it's impossible to reliably know where to find an "error"-like response.

cc @callmehiphop @lukesneeringer

@chrishiestand
Contributor Author

chrishiestand commented Oct 2, 2017

Thank you very much @stephenplusplus. I've tagged your latest version of the reproduction project as v0.1.1.

Now I am finally seeing rate limit errors with streamConcurrency = 2, totalStreams = 100, and streamLength = 150:

...
DONE 150
DONE 150
TypeError: Cannot read property 'outputRows' of undefined
    at Job.job.on.job (/private/tmp/gcloud-node-bigquery-manystreams-bug/test/many-streams.test.js:158:37)
    at emitOne (events.js:115:13)
    at Job.emit (events.js:210:7)
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/operation.js:188:10
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/bigquery/src/job.js:331:5
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/service-object.js:272:5
    at Object.handleResp (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/util.js:135:3)
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/util.js:465:12
    at Request.onResponse [as _callback] (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/retry-request/index.js:160:7)
    at Request.self.callback (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/request/request.js:186:22)
    at emitTwo (events.js:125:13)
    at Request.emit (events.js:213:7)
    at Request.<anonymous> (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/request/request.js:1163:10)
    at emitOne (events.js:115:13)
    at Request.emit (events.js:210:7)
    at Gunzip.<anonymous> (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/request/request.js:1085:12)
    at Object.onceWrapper (events.js:314:30)
    at emitNone (events.js:110:20)
    at Gunzip.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1059:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
{ kind: 'bigquery#job',
  etag: '"Z8nD8CIuj_TqmPrS_MNV-O9x2rU/RoVaRZnuqFcy-MsZbrpdHrL7cy4"',
  id: 'my-project:job_xwxZ4JQ7ESbaCf9Aw0xkFyfVcxl7',
  selfLink: 'https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_xwxZ4JQ7ESbaCf9Aw0xkFyfVcxl7',
  jobReference:
   { projectId: 'my-project',
     jobId: 'job_xwxZ4JQ7ESbaCf9Aw0xkFyfVcxl7' },
  configuration:
   { load:
      { destinationTable: [Object],
        sourceFormat: 'NEWLINE_DELIMITED_JSON' } },
  status:
   { state: 'DONE',
     errorResult:
      { reason: 'rateLimitExceeded',
        location: 'table.write',
        message: 'Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors' },
     errors: [ [Object] ] },
  statistics:
   { creationTime: '1506969989800',
     startTime: '1506969989869',
     endTime: '1506969989869' },
  user_email: 'user@my-project.iam.gserviceaccount.com' }
/private/tmp/gcloud-node-bigquery-manystreams-bug/test/many-streams.test.js:40
  throw error;
  ^

TypeError: Cannot read property 'outputRows' of undefined
    at Job.job.on.job (/private/tmp/gcloud-node-bigquery-manystreams-bug/test/many-streams.test.js:163:39)
    at emitOne (events.js:115:13)
    at Job.emit (events.js:210:7)
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/operation.js:188:10
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/bigquery/src/job.js:331:5
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/service-object.js:272:5
    at Object.handleResp (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/util.js:135:3)
    at /private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/@google-cloud/common/src/util.js:465:12
    at Request.onResponse [as _callback] (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/retry-request/index.js:160:7)
    at Request.self.callback (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/request/request.js:186:22)
    at emitTwo (events.js:125:13)
    at Request.emit (events.js:213:7)
    at Request.<anonymous> (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/request/request.js:1163:10)
    at emitOne (events.js:115:13)
    at Request.emit (events.js:210:7)
    at Gunzip.<anonymous> (/private/tmp/gcloud-node-bigquery-manystreams-bug/node_modules/request/request.js:1085:12)
    at Object.onceWrapper (events.js:314:30)
    at emitNone (events.js:110:20)
    at Gunzip.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1059:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
   test/many-streams.test.js exited with a non-zero exit code: 7

  0 tests passed [11:46:30]
  1 uncaught exception

npm ERR! Test failed.  See above for more details.
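
Incidentally, the TypeError above is the test assuming metadata.statistics.load exists. As the job dump shows, a failed load still reaches state DONE, but statistics.load is absent and status.errorResult is populated instead. A defensive version of the handler (a sketch based on the metadata shape above):

    job.on('complete', metadata => {
      // A failed load still reports state DONE, so check for an error
      // result before reading the load statistics.
      if (metadata.status && metadata.status.errorResult) {
        return reject(new Error(metadata.status.errorResult.message))
      }
      console.log('DONE', metadata.statistics.load.outputRows)
      resolve(metadata)
    })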

@chrishiestand
Contributor Author

It's unclear to me which rate limit I am hitting. It looks like I shouldn't be hitting any of the streaming insert limits:

[screenshot: streaming insert limits from the BigQuery quota documentation]

My guess is I'm hitting the "Maximum rate of table update operations: 1 operation every 2 seconds" quota, but it isn't obvious to me that this applies to streaming inserts too. Does this mean I have to wait 2 seconds between the end of one streaming request and the start of another? Or can I start another two seconds after I start one, even if the first hasn't finished?

[screenshot: table update rate limit from the BigQuery quota documentation]
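
If that quota does apply here, one workaround would be to space the table updates out, e.g. (a sketch assuming the limit counts from the start of each operation; streamRows is the promise wrapper sketched earlier):

    const delay = ms => new Promise(resolve => setTimeout(resolve, ms))

    // Run the streams strictly one at a time, leaving at least 2 seconds
    // between consecutive table update operations.
    async function streamAllSerially(table, batches) {
      for (const rows of batches) {
        await streamRows(table, rows)
        await delay(2000)
      }
    }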

@chrishiestand
Contributor Author

Regarding solving this issue: I believe that errors like quota/rate-limit violations ought to throw an error or emit an error event. But if for some reason that behavior cannot be changed, they should at the very least be logged to the error console.

The lack of any errors thrown in the typical, documented fashion, along with the lack of documentation at https://googlecloudplatform.github.io/google-cloud-node/#/docs/google-cloud/0.56.0/bigquery/table?method=createWriteStream on how to surface such errors, has burned a lot of my time. Let's spare the next people from this problem :-)
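
Concretely, what I'd expect as a user is the standard stream error path (a sketch of the desired behavior, not of what the library does today):

    const writeStream = table.createWriteStream('json')

    // Desired: quota and rate-limit failures surface here instead of
    // being dropped silently.
    writeStream.on('error', err => {
      console.error(err) // e.g. rateLimitExceeded, backendError
    })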

@stephenplusplus added labels "api: bigquery" and "type: bug" Nov 3, 2017
@stephenplusplus self-assigned this Nov 3, 2017
@stephenplusplus changed the title from "bigquery createWriteStream() sometimes silently drops streams" to "bigquery:createWriteStream() should propagate insertErrors" Nov 3, 2017
@stephenplusplus
Contributor

This issue was moved to googleapis/nodejs-bigquery#8
