Dataflow job throws exception when reading from Bigtable #1921

Closed
pvijaypatrf opened this issue Aug 30, 2018 · 5 comments
Labels: api: bigtable


pvijaypatrf commented Aug 30, 2018

Hi,

I have a Dataflow job that reads from Bigtable (table size approx. 70 GB, 18 million rows) and writes back to Bigtable. The job always fails after processing a few million records. I have checked that mutations per write are below the 100,000 limit, and I am reading only the latest version of the required cells. I am using a high-memory VM with 8 vCPUs and 52 GB of memory.

I get multiple exceptions as below:

java.io.IOException: Failed to advance reader of source: Split start: 'US-20110184', end: 'US-2011021', size: 805306368.
	at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605)
	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:398)
	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:193)
	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
	at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
	at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
	at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
	at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.IOExceptionWithStatus: Error in response stream
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.scanner.ResultQueueEntry$ExceptionResultQueueEntry.getResponseOrThrow(ResultQueueEntry.java:100)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.scanner.ResponseQueueReader.getNextMergedRow(ResponseQueueReader.java:105)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.scanner.ResponseQueueReader.getNextMergedRow(ResponseQueueReader.java:111)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.scanner.ResumingStreamingResultScanner.next(ResumingStreamingResultScanner.java:78)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.scanner.ResumingStreamingResultScanner.next(ResumingStreamingResultScanner.java:35)
	at com.google.cloud.bigtable.beam.CloudBigtableIO$Reader.advance(CloudBigtableIO.java:631)
	at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602)
	... 14 more
Caused by: com.google.bigtable.repackaged.io.grpc.StatusRuntimeException: CANCELLED: Canceled due to idle connection
	at com.google.bigtable.repackaged.io.grpc.Status.asRuntimeException(Status.java:517)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.async.AbstractRetryingOperation.onError(AbstractRetryingOperation.java:212)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.async.AbstractRetryingOperation.onClose(AbstractRetryingOperation.java:193)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.scanner.RetryingReadRowsOperation.onClose(RetryingReadRowsOperation.java:229)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.ChannelPool$InstrumentedChannel$2.onClose(ChannelPool.java:210)
	at com.google.bigtable.repackaged.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.Watchdog$WatchedCall$1.onClose(Watchdog.java:172)
	at com.google.bigtable.repackaged.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor$UnAuthResponseListener.onClose(RefreshingOAuth2CredentialsInterceptor.java:85)
	at com.google.bigtable.repackaged.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.bigtable.repackaged.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684)
	at com.google.bigtable.repackaged.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at com.google.bigtable.repackaged.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.bigtable.repackaged.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:403)
	at com.google.bigtable.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
	at com.google.bigtable.repackaged.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
	at com.google.bigtable.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
	at com.google.bigtable.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
	at com.google.bigtable.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
	at com.google.bigtable.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at com.google.bigtable.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	... 3 more
Caused by: com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.Watchdog$StreamWaitTimeoutException
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.Watchdog$WatchedCall.cancelIfStale(Watchdog.java:213)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.Watchdog$WatchedCall.access$000(Watchdog.java:126)
	at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.Watchdog.run(Watchdog.java:101)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	... 3 more

The final failure message I get is:

Workflow failed. Causes: 
S01:Read(Source)+ParDo(...)+CloudBigtableIO.CloudBigtableWriteTransform/ParDo(CloudBigtableSingleTableBufferedWrite) failed., A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on: 
  claimandspecpipeline-pvij-08292115-8plh-harness-4g6j,
  claimandspecpipeline-pvij-08292115-8plh-harness-4g6j,
  claimandspecpipeline-pvij-08292115-8plh-harness-4g6j,
  claimandspecpipeline-pvij-08292115-8plh-harness-4g6j

Appreciate your help.

Thanks,
Padmini

kevinsi4508 (Contributor) commented:

Could you please create a Cloud support ticket? Thanks!

mbrukman changed the title from "Dataflow job throws exception when reading from BigTable" to "Dataflow job throws exception when reading from Bigtable" on Aug 30, 2018
sduskis (Contributor) commented Aug 30, 2018

This exception comes from the Watchdog, which checks for idle connections. It happens when the client does not get a response for a minute or so, which can occur for a few reasons:

  1. Your Dataflow job is running in a different region than your Cloud Bigtable cluster. Please make sure that your job is configured with the region or zone of your Cloud Bigtable cluster (see the sketch after this list).

  2. You set a "sparse filter" on your read, so the server is processing the request but not actually returning results.

  3. You have large rows, and response times on those rows are longer than 1 minute.
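
For point 1, a minimal sketch of pinning the worker location, assuming a Beam pipeline launched with DataflowPipelineOptions inside the pipeline's main(String[] args); the region and zone values are placeholders for wherever your Bigtable cluster lives:

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Placeholders: use the region/zone of your Cloud Bigtable cluster.
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
options.setRegion("us-central1");
options.setZone("us-central1-f");
```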

You should open a support ticket so that the support team can investigate issue 3. You can also call conf.withConfiguration("google.bigtable.grpc.read.partial.row.timeout.ms", String.valueOf(10 * 60 * 1000)); to increase the timeout to 10 minutes, as sketched below.
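
A minimal sketch of where that setting goes, assuming the bigtable-hbase-beam connector's CloudBigtableScanConfiguration; the project, instance, and table IDs below are placeholders:

```java
import com.google.cloud.bigtable.beam.CloudBigtableScanConfiguration;

CloudBigtableScanConfiguration scanConfig =
    new CloudBigtableScanConfiguration.Builder()
        .withProjectId("my-project")     // placeholder
        .withInstanceId("my-instance")   // placeholder
        .withTableId("my-table")         // placeholder
        // Raise the partial-row read timeout from the default (~1 minute) to 10 minutes.
        .withConfiguration(
            "google.bigtable.grpc.read.partial.row.timeout.ms",
            String.valueOf(10 * 60 * 1000))
        .build();
```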

@igorbernstein2, FYI.

pvijaypatrf (Author) commented:

Bigtable and the Dataflow job are both in us-central1-f, and I don't have a sparse filter set. I have increased the gRPC timeout to 10 minutes as suggested and will update whether the job succeeds or fails. I have also opened a support ticket for this: https://issuetracker.google.com/113559508

sduskis (Contributor) commented Sep 4, 2018

As per the issue, you're running a large Dataflow job on a development cluster. There's not much more the client can do, so I'm closing this bug. Please feel free to reopen if you have new information.

sduskis (Contributor) commented Mar 15, 2019

I potentially just fixed this with #2124. See #2123 which shows similar symptoms.

google-cloud-label-sync bot added the api: bigtable label on Jan 31, 2020
gcf-owl-bot bot added a commit that referenced this issue on Jan 30, 2024:

chore: update renovate bot configs to update the sdk-platform-java-config artifact (#1921)
Source-Link: googleapis/synthtool@d7828c0
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-java:latest@sha256:0d1bb26a1a99ae0456176bf891b8490e9aab424a5cb4e4d301d9703c4dc43b58