Cannot read from BQ table #35

Closed
JD-V opened this issue Jul 2, 2019 · 9 comments
JD-V commented Jul 2, 2019

I am trying to read data from BigQuery in my Dataproc Spark job using the code below:

val df_bpp: DataFrame = spark.read.bigquery("mintreporting.TEST_DS.spr_sample")
df_bpp.printSchema()
df_bpp.show(10)

The schema for my table is printed correctly, but the code fails while executing the last line (show()). The stack trace is below.

Any help is highly appreciated.

19/07/02 08:02:47 ERROR io.grpc.internal.ManagedChannelImpl: [Channel<1>: (bigquerystorage.googleapis.com:443)] Uncaught exception in the SynchronizationContext. Panic!
java.lang.NoSuchMethodError: io.grpc.internal.ClientTransportFactory$ClientTransportOptions.getProxyParameters()Lio/grpc/internal/ProxyParameters;
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder$NettyTransportFactory.newClientTransport(NettyChannelBuilder.java:542)
at io.grpc.internal.CallCredentialsApplyingTransportFactory.newClientTransport(CallCredentialsApplyingTransportFactory.java:48)
at io.grpc.internal.InternalSubchannel.startNewTransport(InternalSubchannel.java:263)
at io.grpc.internal.InternalSubchannel.obtainActiveTransport(InternalSubchannel.java:216)
at io.grpc.internal.ManagedChannelImpl$SubchannelImpl.requestConnection(ManagedChannelImpl.java:1452)
at io.grpc.internal.PickFirstLoadBalancer.handleResolvedAddressGroups(PickFirstLoadBalancer.java:59)
at io.grpc.internal.AutoConfiguredLoadBalancerFactory$AutoConfiguredLoadBalancer.handleResolvedAddressGroups(AutoConfiguredLoadBalancerFactory.java:148)
at io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl$1NamesResolved.run(ManagedChannelImpl.java:1326)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:101)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:130)
at io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl.onAddresses(ManagedChannelImpl.java:1331)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:318)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:220)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
at repackaged.com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1070)
at repackaged.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at repackaged.com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1139)
at repackaged.com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
at repackaged.com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:507)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:699)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.createReadSession(BigQueryStorageClient.java:237)
at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation.buildScan(DirectBigQueryRelation.scala:84)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:403)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3359)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2544)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2758)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
at org.apache.spark.sql.Dataset.show(Dataset.scala:745)
at org.apache.spark.sql.Dataset.show(Dataset.scala:704)
at transformations.SPRBarcPPJob$.flattenSPRBARC(SPRBarcPPJob.scala:42)
at LoadData$.main(LoadData.scala:34)
at LoadData.main(LoadData.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
at io.grpc.Status.asRuntimeException(Status.java:532)
... 23 more
Caused by: java.lang.NoSuchMethodError: io.grpc.internal.ClientTransportFactory$ClientTransportOptions.getProxyParameters()Lio/grpc/internal/ProxyParameters;
at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder$NettyTransportFactory.newClientTransport(NettyChannelBuilder.java:542)
at io.grpc.internal.CallCredentialsApplyingTransportFactory.newClientTransport(CallCredentialsApplyingTransportFactory.java:48)
at io.grpc.internal.InternalSubchannel.startNewTransport(InternalSubchannel.java:263)
at io.grpc.internal.InternalSubchannel.obtainActiveTransport(InternalSubchannel.java:216)
at io.grpc.internal.ManagedChannelImpl$SubchannelImpl.requestConnection(ManagedChannelImpl.java:1452)
at io.grpc.internal.PickFirstLoadBalancer.handleResolvedAddressGroups(PickFirstLoadBalancer.java:59)
at io.grpc.internal.AutoConfiguredLoadBalancerFactory$AutoConfiguredLoadBalancer.handleResolvedAddressGroups(AutoConfiguredLoadBalancerFactory.java:148)
at io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl$1NamesResolved.run(ManagedChannelImpl.java:1326)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:101)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:130)
at io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl.onAddresses(ManagedChannelImpl.java:1331)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:318)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:220)
... 3 more
19/07/02 08:02:47 INFO org.spark_project.jetty.server.AbstractConnector: Stopped Spark@6fca5907{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [920e05ddb57643fbbc6bf980e7b351bb] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found in 'gs://uat-mint-dataproc/google-cloud-dataproc-metainfo/9d999556-1033-457c-a954-05c6c4b3dcaf/jobs/920e05ddb57643fbbc6bf980e7b351bb/driveroutput'.

pmkc (Contributor) commented Jul 2, 2019

com.google.api.gax.* should be shaded/relocated in our connector, but it isn't shaded in your stack trace. Are you using gs://spark-lib/bigquery/spark-bigquery-latest.jar or building it yourself?

If you build it yourself, you should use sbt assembly.
If you are compiling against it, I would mark it provided and use our shaded jar.
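
A minimal build.sbt sketch of that setup (the 0.7.0-beta coordinates are taken from the build.sbt posted later in this thread):

// Compile against the connector, but keep it out of the assembled jar;
// the shaded connector jar is supplied to the cluster at run time instead.
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery" % "0.7.0-beta" % "provided"

The shaded jar can then be passed to the job at submit time, for example via the --jars flag of gcloud dataproc jobs submit spark, pointing at gs://spark-lib/bigquery/spark-bigquery-latest.jar.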

pmkc (Contributor) commented Jul 3, 2019

This is the same as #36. I think I will update the compilation instructions to compile against the shaded profile.

pmkc self-assigned this Jul 3, 2019

JD-V (Author) commented Jul 12, 2019

@pmkc I am building it myself using sbt assembly, yet I am still seeing this error.

pmkc (Contributor) commented Jul 12, 2019

Can you give me your approximate build and run commands?
If you are compiling against the connector, could you show me your build.sbt?

JD-V (Author) commented Jul 15, 2019

My build command is sbt assembly, and this is what my build.sbt looks like:

name := "sparklib"

version := "0.1"

scalaVersion := "2.11.12"


libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.0" % "provided"
libraryDependencies += "com.typesafe" % "config" % "1.3.4" % "provided"

libraryDependencies += "log4j" % "log4j" % "1.2.15" excludeAll( ExclusionRule(organization = "com.sun.jdmk"), ExclusionRule(organization = "com.sun.jmx"), ExclusionRule(organization = "javax.jms") )

libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-1.9.17"  % "provided"
libraryDependencies += "com.google.cloud.bigdataoss" % "bigquery-connector" % "hadoop2-0.13.5" % "provided"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery" % "0.7.0-beta"


resolvers += Opts.resolver.sonatypeReleases

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll
)

mainClass in assembly := Some("LoadData")
assemblyJarName := "assembly_sparklib_2.11-0.1.jar"

achelimed commented Jul 15, 2019

Could you try this?

 ShadeRule.rename("com.google.guava.**"        -> "repackaged.com.google.guava.@1").inAll,
  ShadeRule.rename("com.google.common.guava.**" -> "repackaged.com.google.common.guava.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**"     -> "repackaged.com.google.protobuf.@1").inAll

pmkc (Contributor) commented Jul 15, 2019

I'm not actually sure why that is giving you a NoSuchMethodError for gRPC; you could try sbt dependencyTree to find the conflict, but I would just sidestep that.

I would mark "spark-bigquery" provided and use gs://spark-lib/bigquery/spark-bigquery-latest.jar, OR compile against the shaded qualifier.
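
For the second option, a hypothetical sketch; whether the shaded artifact is published under a "shaded" classifier is an assumption about the connector's publishing, not something confirmed in this thread:

// Hypothetical: compile against a shaded variant of the connector,
// assuming it is published with a "shaded" classifier alongside the regular jar.
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery" % "0.7.0-beta" classifier "shaded"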

JD-V (Author) commented Jul 16, 2019

@achelimed I am seeing the same error even after applying the shade rules you mentioned above.

JD-V (Author) commented Aug 12, 2019

This issue seems to be resolved in the 0.8.0 release. Thank you @achelimed and @pmkc for your help.
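
For later readers, a sketch of the corresponding build.sbt change; "0.8.0-beta" is an assumption that the release follows the naming scheme of the 0.7.0-beta artifact above:

// Assumed coordinates for the release that resolved this issue.
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery" % "0.8.0-beta"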
