4.x: DefaultMetadataTabletMapIT leaves multiple hanging sessions #678

@Bouncheck

Description


Observed, for example (though not only), in:
https://github.com/scylladb/java-driver/actions/runs/18193758711/job/51794652116
where it causes AdvancedShardAwarenessIT to fail with:

Error:  Failures: 
Error:    AdvancedShardAwarenessIT.should_initialize_all_channels:133->lambda$should_initialize_all_channels$2:133->assertNoLogMatches:276 Logs should not contain pattern [.*Scheduling next reconnection in.*] but found in [[s92|/127.0.18.1:9042] Scheduling next reconnection in PT1M]
Error:    AdvancedShardAwarenessIT.should_initialize_all_channels:133->lambda$should_initialize_all_channels$2:133->assertNoLogMatches:276 Logs should not contain pattern [.*Scheduling next reconnection in.*] but found in [[s72] Scheduling next reconnection in PT1M]
Error:    AdvancedShardAwarenessIT.should_not_struggle_to_fill_pools:248->assertMatchesAtMost:331 Expected to find pattern at most 4 times but found it 5 times. Pattern: [.*Scheduling next reconnection in.*]

However, this cluster can be traced back to:

[INFO] Running com.datastax.oss.driver.core.metadata.RandomTokenVnodesIT
13:10:08.614 [main] INFO  c.d.o.d.api.testinfra.ccm.CcmBridge - Executing: [ccm, create, ccm_1, -i, 127.0.18., -n, 2:0, -v, release:6.2.3, --scylla, --config-dir=/tmp/ccm3788213650875537416]
13:10:08.960 [Exec Stream Pumper] INFO  c.d.o.d.api.testinfra.ccm.CcmBridge - ccmout> Current cluster is now: ccm_1
13:10:08.993 [main] INFO  c.d.o.d.api.testinfra.ccm.CcmBridge - Executing: [ccm, updateconf, auto_snapshot:false, experimental_features:[consistent-topology-changes,tablets], --config-dir=/tmp/ccm3788213650875537416]
13:10:09.304 [main] INFO  c.d.o.d.api.testinfra.ccm.CcmBridge - Executing: [ccm, start, --wait-for-binary-proto, --wait-other-notice, --config-dir=/tmp/ccm3788213650875537416]
Warning:  Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 6.87 s - in com.datastax.oss.driver.core.metadata.RandomTokenVnodesIT
[INFO] Running com.datastax.oss.driver.core.metadata.DefaultMetadataTabletMapIT
13:10:24.703 [main] INFO  c.d.o.d.api.testinfra.ccm.CcmBridge - Executing: [ccm, remove, --config-dir=/tmp/ccm3788213650875537416]
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.529 s - in com.datastax.oss.driver.core.metadata.DefaultMetadataTabletMapIT
[INFO] Running com.datastax.oss.driver.core.metadata.ByteOrderedTokenIT
Warning:  Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.007 s - in com.datastax.oss.driver.core.metadata.ByteOrderedTokenIT

RandomTokenVnodesIT is currently marked with annotations that should make it irrelevant for Scylla anyway:

@BackendRequirement(
    type = BackendType.CASSANDRA,
    maxExclusive = "4.0-beta4",
    // TODO Re-enable when CASSANDRA-16364 is fixed
    description = "TODO Re-enable when CASSANDRA-16364 is fixed")
@ScyllaSkip(description = "scylladb/java-driver#568 - fails to start scylla")
public class RandomTokenVnodesIT extends TokenITBase {

Since the failures started appearing, RandomTokenVnodesIT takes around 6.8 seconds instead of no more than 1.
Even though the placement of the cluster-creation log lines (127.0.18.) makes it look as if the cluster is created for RandomTokenVnodesIT, I think it is actually created for the next test in the logs, DefaultMetadataTabletMapIT: the updateconf command matches that test's requirements but not those of RandomTokenVnodesIT, and the node count (2 rather than 3) matches DefaultMetadataTabletMapIT as well.

public class DefaultMetadataTabletMapIT {
  private static final Logger LOG = LoggerFactory.getLogger(DefaultMetadataTabletMapIT.class);
  private static final CustomCcmRule CCM_RULE =
      CustomCcmRule.builder()
          .withNodes(2)
          .withCassandraConfiguration(
              "experimental_features", "['consistent-topology-changes','tablets']")
          .build();
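
If the root cause is simply that the test never closes the sessions it opens against this cluster, the fix may be as simple as closing each of them before CcmBridge runs ccm remove. A minimal sketch of the cleanup pattern (not the actual test code; the contact point and datacenter name below are placeholders):

import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class SessionCleanupSketch {
  public static void main(String[] args) {
    // CqlSession implements AutoCloseable: try-with-resources guarantees
    // close(), which also cancels any scheduled reconnection tasks.
    try (CqlSession session =
        CqlSession.builder()
            .addContactPoint(new InetSocketAddress("127.0.18.1", 9042)) // placeholder contact point
            .withLocalDatacenter("datacenter1") // placeholder DC name
            .build()) {
      System.out.println("Connected nodes: " + session.getMetadata().getNodes().size());
    } // past this point the session no longer schedules reconnections
  }
}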

Throughout the job it seems that the leaked sessions keep trying to communicate with the shut-down cluster for several minutes, until they cause a different test to fail.


01:34:47.635 [s77-admin-0] DEBUG c.d.o.d.i.core.pool.ChannelPool - [s77|/127.0.18.1:9042] Reconnection attempt complete, 0/2 channels
01:34:47.635 [s77-admin-0] DEBUG c.d.o.d.i.c.u.c.Reconnection - [s77|/127.0.18.1:9042] Scheduling next reconnection in PT58.2S
01:34:48.265 [s68-admin-1] DEBUG c.d.o.d.i.core.pool.ChannelPool - [s68|/127.0.18.2:9042] Trying to create 2 missing channels
01:34:48.267 [s68-admin-1] WARN  c.d.o.d.i.core.pool.ChannelPool - [s68|/127.0.18.2:9042]  Error while opening new channel
com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s68|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException)
	at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.fail(ProtocolInitHandler.java:375)
	at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.writeListener(ChannelHandlerRequest.java:89)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:185)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
	at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.send(ChannelHandlerRequest.java:78)
	at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.send(ProtocolInitHandler.java:209)
	at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler.onRealConnect(ProtocolInitHandler.java:136)
	at com.datastax.oss.driver.internal.core.channel.ConnectInitHandler.lambda$connect$0(ConnectInitHandler.java:59)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:326)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:342)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:750)
	Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.18.2:9042
	Caused by: java.net.ConnectException: Connection refused
		at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
		at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
		at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
		at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
		at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
		at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
		at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.StacklessClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0()(Unknown Source)
01:34:48.269 [s68-admin-1] WARN  c.d.o.d.i.core.pool.ChannelPool - [s68|/127.0.18.2:9042]  Error while opening new channel
com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s68|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException)
	at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.fail(ProtocolInitHandler.java:375)
	at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.writeListener(ChannelHandlerRequest.java:89)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:185)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
	at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.send(ChannelHandlerRequest.java:78)
	at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.send(ProtocolInitHandler.java:209)
	at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler.onRealConnect(ProtocolInitHandler.java:136)
	at com.datastax.oss.driver.internal.core.channel.ConnectInitHandler.lambda$connect$0(ConnectInitHandler.java:59)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:326)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:342)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:750)
	Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.18.2:9042
	Caused by: java.net.ConnectException: Connection refused
		at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
		at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
		at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
		at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
		at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
		at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
		at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.StacklessClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0()(Unknown Source)
01:34:48.269 [s68-admin-1] DEBUG c.d.o.d.i.core.pool.ChannelPool - [s68|/127.0.18.2:9042] Reconnection attempt complete, 0/2 channels
01:34:48.269 [s68-admin-1] DEBUG c.d.o.d.i.c.u.c.Reconnection - [s68|/127.0.18.2:9042] Scheduling next reconnection in PT56.4S
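
For context, the PT1M cadence in these logs matches what I believe are the 4.x driver defaults: an ExponentialReconnectionPolicy capped at a 60-second max delay, so a leaked session keeps scheduling a new attempt roughly once a minute indefinitely until it is closed. The equivalent programmatic configuration would look roughly like this (the values shown are the presumed defaults):

import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import java.time.Duration;

public class ReconnectionDefaultsSketch {
  public static void main(String[] args) {
    // Presumed 4.x defaults: exponential backoff from base-delay up to
    // max-delay (60 s), which is where the "PT1M" in the logs comes from.
    DriverConfigLoader loader =
        DriverConfigLoader.programmaticBuilder()
            .withString(
                DefaultDriverOption.RECONNECTION_POLICY_CLASS, "ExponentialReconnectionPolicy")
            .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofSeconds(1))
            .withDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY, Duration.ofSeconds(60))
            .build();
    // Pass to CqlSession.builder().withConfigLoader(loader) to apply.
  }
}

Tuning these values would only change the logging cadence; the actual fix is still to close the leaked sessions.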
