Skip to content
This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-14936: pulsar-io Sinks & Sources tests fail (flake?) with "Failed to open ledger" #3996

Open
sijie opened this issue Mar 29, 2022 · 0 comments
Labels

Comments

@sijie
Copy link
Member

sijie commented Mar 29, 2022

Original Issue: apache#14936


Describe the bug

"CI - Integration - Pulsar-IO Sinks and Sources / pulsar-io" job fails on multiple PRs and fail on re-runs.
Failed integration test seems to not be affected by the changes in the PR per se.

Most commonly I see testRabbitMQSink failures, but sometimes testKinesis or testKafka.

Example PRs: apache#14922 apache#14912 apache#14916

To Reproduce

Rerun "CI - Integration - Pulsar-IO Sinks and Sources / pulsar-io" on PRs, the job fails quite repeatedly.

Expected behavior

The job pass.

Log

2022-03-29T14:25:38,055+0000 INFO  [main] o.a.p.t.MockitoCleanupListener@40 - Cleaning up Mockito's ThreadSafeMockingProgress.MOCKING_PROGRESS_PROVIDER thread local state. {}
Error:  Tests run: 12, Failures: 3, Errors: 0, Skipped: 6, Time elapsed: 866.072 s <<< FAILURE! - in TestSuite
Error:  testKafkaSink(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)  Time elapsed: 132.423 s  <<< FAILURE!
org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger","reqId":1253756777559247167, "remote":"localhost/127.0.0.1:49251", "local":"/127.0.0.1:48208"}
	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:1074)
	at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:88)
	at org.apache.pulsar.tests.integration.io.sinks.SinkTester.produceMessage(SinkTester.java:118)
	at org.apache.pulsar.tests.integration.io.PulsarIOTestRunner.produceMessagesToInputTopic(PulsarIOTestRunner.java:93)
	at org.apache.pulsar.tests.integration.io.sinks.PulsarIOSinkRunner.runSinkTester(PulsarIOSinkRunner.java:116)
	at org.apache.pulsar.tests.integration.io.PulsarIOTestBase.testSink(PulsarIOTestBase.java:50)
	at org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest.testKafkaSink(PulsarSinksTest.java:34)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger","reqId":1253756777559247167, "remote":"localhost/127.0.0.1:49251", "local":"/127.0.0.1:48208"}
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:86)
	... 17 more
Caused by: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger","reqId":1253756777559247167, "remote":"localhost/127.0.0.1:49251", "local":"/127.0.0.1:48208"}
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1196)
	at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:726)
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:178)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more

Error:  testKinesis(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)  Time elapsed: 0 s  <<< FAILURE!
org.apache.pulsar.tests.FailFastNotifier$FailFastSkipException: Skipped after failure since testFailFast system property is set.
	at org.apache.pulsar.tests.FailFastNotifier.beforeInvocation(FailFastNotifier.java:88)

Error:  testRabbitMQSink(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)  Time elapsed: 0 s  <<< FAILURE!
org.apache.pulsar.tests.FailFastNotifier$FailFastSkipException: Skipped after failure since testFailFast system property is set.
	at org.apache.pulsar.tests.FailFastNotifier.beforeInvocation(FailFastNotifier.java:88)

[INFO] 
[INFO] Results:
[INFO] 
Error:  Failures: 
Error:  org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest.testKafkaSink(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)
[INFO]   Run 1: PASS
Error:    Run 2: PulsarSinksTest.testKafkaSink:34->PulsarIOTestBase.testSink:50 » PulsarClient ...
[INFO] 
Error:    PulsarSinksTest.testKinesis » FailFastSkip Skipped after failure since testFai...
Error:    PulsarSinksTest.testRabbitMQSink » FailFastSkip Skipped after failure since te...
[INFO] 
Error:  Tests run: 9, Failures: 3, Errors: 0, Skipped: 3
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Pulsar :: Tests 2.10.0-SNAPSHOT:
[INFO] 
[INFO] Apache Pulsar :: Tests ............................. SUCCESS [  2.587 s]
[INFO] Apache Pulsar :: Tests :: Integration .............. FAILURE [14:35 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  14:38 min
[INFO] Finished at: 2022-03-29T14:25:38Z
[INFO] ------------------------------------------------------------------------
Error:  Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) on project integration: There are test failures.
Error:  
Error:  Please refer to /home/runner/work/pulsar/pulsar/tests/integration/target/surefire-reports for the individual test results.
Error:  Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Error:  -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please read the following articles:
Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :integration
Error: Process completed with exit code 1.
6s

@sijie sijie added the type/bug label Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

1 participant