ExternalAccountCredentials serialization is broken #1347

Closed
kohsuke opened this issue Dec 28, 2023 · 1 comment · Fixed by #1358

kohsuke commented Dec 28, 2023

ExternalAccountCredentials declares protected transient HttpTransportFactory transportFactory, which becomes null once the object is serialized and restored. The design intent appears to be described in #67, but the implementation in ExternalAccountCredentials lacks the crucial part, quoted below:

When serializing an option object we only transmit the class name for the transport factory and try to instantiate the factory from its classname upon deserialization.

The same problem was seen and fixed in #132. I believe we need to bring the same fix to ExternalAccountCredentials.
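
For illustration, here is a minimal, self-contained sketch of the pattern quoted above: the factory field stays transient, only its class name is serialized, and the factory is re-instantiated from that name on deserialization. This is not the library's code or the actual fix. TransportFactory, DefaultTransportFactory, DemoCredentials, and roundTrip are hypothetical names standing in for HttpTransportFactory and ExternalAccountCredentials, and the sketch assumes the factory class has a public no-arg constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFactoryDemo {

  /** Hypothetical stand-in for HttpTransportFactory; not the real interface. */
  public interface TransportFactory {
    String create();
  }

  /** Assumed to have a public no-arg constructor so it can be rebuilt by name. */
  public static class DefaultTransportFactory implements TransportFactory {
    @Override
    public String create() {
      return "transport";
    }
  }

  /** Hypothetical credentials class illustrating the pattern. */
  public static class DemoCredentials implements Serializable {
    private static final long serialVersionUID = 1L;

    // Serialized in place of the factory itself.
    private final String transportFactoryClassName;

    // Skipped by Java serialization; would be null after a plain round trip.
    private transient TransportFactory transportFactory;

    public DemoCredentials(TransportFactory factory) {
      this.transportFactory = factory;
      this.transportFactoryClassName = factory.getClass().getName();
    }

    // Invoked by ObjectInputStream; rebuilds the transient field from the class name.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      try {
        transportFactory =
            (TransportFactory)
                Class.forName(transportFactoryClassName).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot recreate " + transportFactoryClassName, e);
      }
    }

    public String openTransport() {
      // Mirrors the failing call site: dereferences the factory.
      return transportFactory.create();
    }
  }

  /** Serializes and deserializes the object in memory. */
  public static DemoCredentials roundTrip(DemoCredentials credentials)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(credentials);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (DemoCredentials) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    DemoCredentials restored = roundTrip(new DemoCredentials(new DefaultTransportFactory()));
    System.out.println(restored.openTransport());
  }
}
```

Without the readObject hook, the transient field would come back as null after the round trip and openTransport() would throw a NullPointerException, which is the failure mode reported here.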

More details

The NPE happens at the following call site:

HttpRequestFactory requestFactory = transportFactory.create().createRequestFactory();

Full stack trace below:

com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Failed computing credential metadata
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:116)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.ExceptionResponseObserver.onErrorImpl(ExceptionResponseObserver.java:82)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.StateCheckingResponseObserver.onError(StateCheckingResponseObserver.java:84)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcDirectStreamController$ResponseObserverAdapter.onClose(GrpcDirectStreamController.java:148)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:546)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:489)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:453)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:486)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:567)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:71)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:735)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:716)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
	Suppressed: java.lang.RuntimeException: Asynchronous task failed
		at com.google.cloud.bigquery.connector.common.StreamCombiningIterator.hasNext(StreamCombiningIterator.java:152)
		at com.google.cloud.bigquery.connector.common.ReadRowsResponseInputStreamEnumeration.loadNextResponse(ReadRowsResponseInputStreamEnumeration.java:57)
		at com.google.cloud.bigquery.connector.common.ReadRowsResponseInputStreamEnumeration.<init>(ReadRowsResponseInputStreamEnumeration.java:37)
		at com.google.cloud.spark.bigquery.v2.context.ArrowColumnBatchPartitionReaderContext.makeSingleInputStream(ArrowColumnBatchPartitionReaderContext.java:234)
		at com.google.cloud.spark.bigquery.v2.context.ArrowColumnBatchPartitionReaderContext.<init>(ArrowColumnBatchPartitionReaderContext.java:224)
		at com.google.cloud.spark.bigquery.v2.context.ArrowInputPartitionContext.createPartitionReaderContext(ArrowInputPartitionContext.java:89)
		at com.google.cloud.spark.bigquery.v2.Spark32BigQueryPartitionReaderFactory.createColumnarReader(Spark32BigQueryPartitionReaderFactory.java:21)
		at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
		at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
		at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
		at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
		at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35)
		at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hasNext(Unknown Source)
		at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:968)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205)
		at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
		at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
		at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
		at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
		at org.apache.spark.scheduler.Task.run(Task.scala:138)
		at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
		at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
		at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
		... 3 more
Caused by: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Failed computing credential metadata
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.Status.asRuntimeException(Status.java:537)
	... 17 more
Caused by: java.lang.NullPointerException: Cannot invoke "com.google.cloud.spark.bigquery.repackaged.com.google.auth.http.HttpTransportFactory.create()" because "this.transportFactory" is null
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.retrieveResource(AwsCredentials.java:213)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.retrieveResource(AwsCredentials.java:202)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.getAwsRegion(AwsCredentials.java:338)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.retrieveSubjectToken(AwsCredentials.java:173)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.refreshAccessToken(AwsCredentials.java:152)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:269)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:266)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.OAuth2Credentials$RefreshTask.run(OAuth2Credentials.java:633)
	... 3 more
lsirac assigned BigTailWolf and unassigned lsirac Jan 11, 2024
BigTailWolf added a commit to BigTailWolf/google-auth-library-java that referenced this issue Jan 24, 2024
BigTailWolf added a commit that referenced this issue Jan 24, 2024
…1358)

* fix: Issue #1347: ExternalAccountCredentials serialization is broken

* fix test

* fix lint

* Update oauth2_http/java/com/google/auth/oauth2/ExternalAccountCredentials.java

Co-authored-by: Leo <39062083+lsirac@users.noreply.github.com>

* address the removal of redundant public

* move the getter to test class

---------

Co-authored-by: Leo <39062083+lsirac@users.noreply.github.com>
BigTailWolf added a commit that referenced this issue Feb 6, 2024
BigTailWolf added a commit that referenced this issue Feb 8, 2024

kohsuke commented Mar 13, 2024

Any chance a new release can be created? It's been quite some time since the fix was created.
