Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ZooKeeper client handling in migration code #10045

Conversation

scholzj
Copy link
Member

@scholzj scholzj commented Apr 30, 2024

Type of change

  • Bugfix

Description

The ZooKeeper client used to delete the /controller node during migration rollback is not closed properly when it fails. As a result, it might consumer resources or trying to connect long after its certificates might be deleted. This might cause issues if this happens in production. But it is also producing a large amounts of errors like this:

2024-04-30 23:02:37 WARN  ChannelInitializer:97 - Failed to initialize a channel. Closing: [id: 0xee25611e]
org.apache.zookeeper.common.X509Exception$SSLContextException: Failed to create KeyManager
	at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:371) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.X509Util.createSSLContextAndOptions(X509Util.java:349) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.X509Util.createSSLContext(X509Util.java:277) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientPipelineFactory.initSSL(ClientCnxnSocketNetty.java:449) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientPipelineFactory.initChannel(ClientCnxnSocketNetty.java:439) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientPipelineFactory.initChannel(ClientCnxnSocketNetty.java:423) ~[zookeeper-3.8.4.jar:3.8.4]
	at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:1130) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: org.apache.zookeeper.common.X509Exception$KeyManagerException: java.nio.file.NoSuchFileException: /var/folders/v4/gqvkpgh95yn3p5xn8dh9m0j80000gn/T/io.strimzi.operator.cluster.operator.assembly.KRaftMigrationUtils11628936327686183756pem
	at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:492) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:369) ~[zookeeper-3.8.4.jar:3.8.4]
	... 24 more
Caused by: java.nio.file.NoSuchFileException: /var/folders/v4/gqvkpgh95yn3p5xn8dh9m0j80000gn/T/io.strimzi.operator.cluster.operator.assembly.KRaftMigrationUtils11628936327686183756pem
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:380) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:432) ~[?:?]
	at java.nio.file.Files.readAllBytes(Files.java:3288) ~[?:?]
	at org.apache.zookeeper.util.PemReader.loadPrivateKey(PemReader.java:141) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.util.PemReader.loadKeyStore(PemReader.java:103) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.PEMFileLoader.loadKeyStore(PEMFileLoader.java:50) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.X509Util.loadKeyStore(X509Util.java:425) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:481) ~[zookeeper-3.8.4.jar:3.8.4]
	at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:369) ~[zookeeper-3.8.4.jar:3.8.4]
	... 24 more

in the unit tests where it seems to be able to easily produce hundreds of them while running the cluster-operator module unit tests. This PR uses the try-with-resource construct to make sure the client is closed.

Checklist

  • Make sure all tests pass

…Scaler tests

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj added this to the 0.41.0 milestone Apr 30, 2024
@scholzj scholzj requested a review from ppatierno April 30, 2024 23:33
@scholzj scholzj changed the title Improve ZooKeeper client handling in migration code and in ZooKeeper Scaler tests Improve ZooKeeper client handling in migration code Apr 30, 2024
@scholzj
Copy link
Member Author

scholzj commented May 1, 2024

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@tinaselenge tinaselenge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@scholzj scholzj merged commit faa3e91 into strimzi:main May 2, 2024
15 checks passed
@scholzj scholzj deleted the auto-close-zookeeper-client-in-migration-rollback branch May 2, 2024 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants