Version 1.2.9 reports "too many open files" after running for a while #142

Closed
zhaolizhi opened this Issue Feb 15, 2019 · 9 comments

zhaolizhi commented Feb 15, 2019

ERROR [KafkaServiceImpl.org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-3] - Get kafka old version logsize has error, msg is Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files]
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:229)
at org.smartloli.kafka.eagle.core.factory.KafkaServiceImpl.getLogSize(KafkaServiceImpl.java:1007)
at org.smartloli.kafka.eagle.web.quartz.AlertQuartz$Consumer.consumer(AlertQuartz.java:119)
at org.smartloli.kafka.eagle.web.quartz.AlertQuartz.alertJobQuartz(AlertQuartz.java:71)
at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:269)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:257)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:75)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:122)
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205)
at javax.naming.InitialContext.lookup(InitialContext.java:417)
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1957)
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1924)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
... 13 more
Caused by: java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:631)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:342)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
... 18 more
Caused by: java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:460)
at java.net.Socket.<init>(Socket.java:431)
at java.net.Socket.<init>(Socket.java:211)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:148)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 23 more
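
For context: the `java.net.SocketException` at the bottom of this trace (its localized message means "Too many open files") is what a JVM reports when the process has reached its open-file-descriptor limit, and sockets count against that limit. Below is a minimal sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like system, of how a Java process can report its own descriptor usage. The class name `FdUsage` is hypothetical and not part of Kafka Eagle, and the code only measures the JVM it runs in, so it would have to run inside the affected process to say anything about it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Kafka Eagle.
class FdUsage {
    /**
     * Returns {open, max} file-descriptor counts for the current JVM,
     * or null when the platform does not expose the Unix-specific bean.
     */
    static long[] read() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = read();
        System.out.println(fd == null
                ? "descriptor counts not available on this platform"
                : "open=" + fd[0] + " max=" + fd[1]);
    }
}
```

Logging these two numbers periodically would show whether the open count climbs steadily toward the limit, which is the pattern described later in this thread.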

Author

zhaolizhi commented Feb 15, 2019

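After running for a while, I found that the number of connections to Kafka was surging: in under 2 hours it reached nearly 1,000. The test Kafka cluster has 3 nodes.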

Owner

smartloli commented Feb 18, 2019

@zhaolizhi
Use ke.sh stats to count the Linux handles that Kafka Eagle is occupying:

[hadoop@dn1 ~]$ ke.sh stats
===================== TCP Connections Count  ==========================
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
      2 2081/java
===================== ESTABLISHED/TIME_OUT Status  ====================
      3 
      1 10.211.55.2
===================== Connection Number Of Different States ===========
TIME_WAIT 3
ESTABLISHED 4
LISTEN 27
===================== End =============================================

zhaolizhi commented Feb 18, 2019

===================== TCP Connections Count ==========================
981 23475/java
===================== ESTABLISHED/TIME_OUT Status ====================
1509
1 172.31.90.46
1 172.31.90.24
1 10.2.5.13
===================== Connection Number Of Different States ===========
TIME_WAIT 3
FIN_WAIT2 2
ESTABLISHED 1512
LISTEN 14
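
A per-state tally like the "Connection Number Of Different States" section above can be reproduced from `netstat -ant`-style output. The sketch below is an illustration only: it is not how `ke.sh stats` is implemented, the class name `TcpStateTally` is hypothetical, and it assumes the usual six-column layout in which the TCP state is the last column.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka Eagle or ke.sh.
class TcpStateTally {
    /**
     * Counts TCP states in netstat-style lines. Assumed columns:
     * Proto Recv-Q Send-Q Local-Address Foreign-Address State
     */
    static Map<String, Integer> tally(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("tcp")) {
                continue; // skip header lines and non-TCP sockets
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 6) {
                continue; // no state column on this line
            }
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }
}
```

Running such a tally at intervals and comparing the ESTABLISHED count over time is one way to confirm steady growth rather than a one-off spike.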


smartloli commented Feb 19, 2019

Your process is holding too many Linux handles (connections).
What is your Kafka cluster doing? What tests have been run? For example, how many consumers are active?

Also, what is your Kafka version?


zhaolizhi commented Feb 19, 2019

Kafka version: 2.1
Consumers: 10


smartloli commented Feb 19, 2019

With Kafka version 2.1, have you set *.kafka.eagle.offset.storage to zookeeper?
The error "Get kafka old version logsize has error" comes from the method getLogSize(), which is reached when *.kafka.eagle.offset.storage=zookeeper.
So you need to set *.kafka.eagle.offset.storage=kafka.
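
To see which clusters in a configuration still use the zookeeper setting, the properties can be scanned for keys ending in `.kafka.eagle.offset.storage`. This is a minimal sketch: the class name `OffsetStorageCheck` is hypothetical, and the `cluster1` / `cluster2` prefixes in the usage comment are assumed aliases standing in for the `*` in the advice above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper, not part of Kafka Eagle.
class OffsetStorageCheck {
    static final String SUFFIX = ".kafka.eagle.offset.storage";

    /** Returns the key prefixes (cluster aliases) whose offset storage equals the given mode. */
    static List<String> clustersUsing(String propertiesText, String mode) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> aliases = new ArrayList<>();
        // TreeSet gives a stable, sorted iteration order.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            String value = props.getProperty(key).trim();
            if (key.endsWith(SUFFIX) && mode.equalsIgnoreCase(value)) {
                aliases.add(key.substring(0, key.length() - SUFFIX.length()));
            }
        }
        return aliases;
    }
}

// Usage (assumed aliases):
//   clustersUsing("cluster1.kafka.eagle.offset.storage=kafka\n"
//               + "cluster2.kafka.eagle.offset.storage=zookeeper\n", "zookeeper")
//   returns a list containing only "cluster2"
```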


zhaolizhi commented Feb 20, 2019

I use both modes: one is kafka and the other is zookeeper.


kivis-online commented Feb 21, 2019

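I'm seeing a similar problem: after startup, the number of TCP ESTABLISHED connections held by the kafka-eagle Java process keeps growing. In just a few days it has passed 5,000, and it is still growing.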


zhaolizhi commented Feb 22, 2019

The connections all appear to be connections to Kafka; I suspect a connection leak.
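
The thread does not identify where a leak would be. The stack trace in the first comment does show `KafkaServiceImpl.getLogSize()` opening a JMX connection through `JMXConnectorFactory.connect`, so one possibility, offered purely as an assumption, is a connector that is not closed on every code path. A minimal sketch of the usual guard follows: `JMXConnector` is `Closeable`, so try-with-resources closes it even when the query throws. The names `JmxRead` and `Query` are hypothetical and this is not Kafka Eagle's code.

```java
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Kafka Eagle.
class JmxRead {
    /** A read against the MBean server that may throw. */
    interface Query<T> {
        T apply(MBeanServerConnection connection) throws Exception;
    }

    /** Runs the query, then closes the connector whether or not the query threw. */
    static <T> T queryAndClose(JMXConnector connector, Query<T> query) throws Exception {
        try (JMXConnector c = connector) {
            return query.apply(c.getMBeanServerConnection());
        }
    }

    /** Opens a connection to the given JMX service URL, runs the query, and closes the connection. */
    static <T> T query(String jmxUrl, Query<T> query) throws Exception {
        return queryAndClose(JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl)), query);
    }
}

// Usage (host and port are placeholders):
//   Integer beans = JmxRead.query(
//       "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi",
//       conn -> conn.getMBeanCount());
```

With this shape, each scheduled job run opens one connection and releases it before returning, so the socket count should stay flat between runs if nothing else is holding connections.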
