Version 1.2.9 reports "too many open files" after running for a while #142

Closed
zhaolizhi opened this issue Feb 15, 2019 · 9 comments

Comments

@zhaolizhi

ERROR [KafkaServiceImpl.org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-3] - Get kafka old version logsize has error, msg is Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files]
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:229)
at org.smartloli.kafka.eagle.core.factory.KafkaServiceImpl.getLogSize(KafkaServiceImpl.java:1007)
at org.smartloli.kafka.eagle.web.quartz.AlertQuartz$Consumer.consumer(AlertQuartz.java:119)
at org.smartloli.kafka.eagle.web.quartz.AlertQuartz.alertJobQuartz(AlertQuartz.java:71)
at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:269)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:257)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:75)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:122)
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205)
at javax.naming.InitialContext.lookup(InitialContext.java:417)
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1957)
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1924)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
... 13 more
Caused by: java.rmi.ConnectIOException: Exception creating connection to: 172.16.81.153; nested exception is:
java.net.SocketException: Too many open files
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:631)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:342)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
... 18 more
Caused by: java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:460)
at java.net.Socket.<init>(Socket.java:431)
at java.net.Socket.<init>(Socket.java:211)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:148)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 23 more
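
For context, the call that fails in this trace (KafkaServiceImpl.getLogSize via JMXConnectorFactory.connect) is a plain JMX-over-RMI connect, and every such connect opens sockets that count against the process file-descriptor limit until the connector is closed. Below is a minimal sketch of that pattern, assuming a hypothetical JMX port 9999 and an illustrative MBean name; it is not Kafka Eagle's actual code.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxLogSizeProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical broker JMX endpoint; the host comes from the trace above,
        // the port 9999 is only a common default and not taken from this issue.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://172.16.81.153:9999/jmxrmi");

        // try-with-resources closes the connector on exit, releasing its RMI sockets.
        // Connecting repeatedly without closing eventually exhausts the fd limit,
        // which surfaces as java.net.SocketException: Too many open files.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Illustrative log-size MBean; topic and partition are placeholders.
            ObjectName logSize = new ObjectName(
                    "kafka.log:type=Log,name=Size,topic=test,partition=0");
            System.out.println("log size = " + mbsc.getAttribute(logSize, "Value"));
        }
    }
}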

@zhaolizhi
Author

After running it for a while, I found the number of connections to Kafka surging: it reached almost 1000 in under 2 hours. The test cluster has 3 Kafka nodes.

@smartloli
Owner

@zhaolizhi
Use ke.sh stats to count how many Linux handles Kafka Eagle is occupying.

[hadoop@dn1 ~]$ ke.sh stats
===================== TCP Connections Count  ==========================
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
      2 2081/java
===================== ESTABLISHED/TIME_OUT Status  ====================
      3 
      1 10.211.55.2
===================== Connection Number Of Different States ===========
TIME_WAIT 3
ESTABLISHED 4
LISTEN 27
===================== End =============================================

@zhaolizhi
Author

===================== TCP Connections Count ==========================
981 23475/java
===================== ESTABLISHED/TIME_OUT Status ====================
1509
1 172.31.90.46
1 172.31.90.24
1 10.2.5.13
===================== Connection Number Of Different States ===========
TIME_WAIT 3
FIN_WAIT2 2
ESTABLISHED 1512
LISTEN 14

@smartloli
Owner

smartloli commented Feb 19, 2019

Your Linux process is holding far too many connections/handles.
What is your Kafka cluster doing? What tests have been run? For example, how many consumers are active?

&&

What is your kafka version number?

@zhaolizhi
Author

kafka version 2.1
consumers 10

@smartloli
Owner

smartloli commented Feb 19, 2019

Kafka version 2.1: did you set *.kafka.eagle.offset.storage to zookeeper?
The error "Get kafka old version logsize has error" comes from the getLogSize() method, which is used when *.kafka.eagle.offset.storage=zookeeper.
So you need to set *.kafka.eagle.offset.storage=kafka.
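
For reference, a minimal sketch of that setting as it would look in Kafka Eagle's conf/system-config.properties, assuming a hypothetical cluster alias cluster1 in place of the * prefix:

# conf/system-config.properties (cluster alias "cluster1" is hypothetical)
# Kafka 0.10+ (including 2.1) stores consumer offsets in Kafka itself,
# so read offsets from kafka rather than zookeeper:
cluster1.kafka.eagle.offset.storage=kafka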

@zhaolizhi
Author

I have both modes configured: one is kafka, the other is zookeeper.

@kivis-online

I am running into a similar problem. After the program starts, the number of TCP ESTABLISHED connections held by the kafka-eagle java process keeps growing; it has passed 5000 in just a few days and is still climbing.

@zhaolizhi
Author

Looking at the connections, they are all connections to Kafka, so I suspect a connection leak.
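
For what it's worth, the symptom described here (ESTABLISHED connections to the brokers growing on every scheduled run) is what you would see if a Kafka client were created per run and never closed. The sketch below is illustrative only, not Kafka Eagle's actual code; it uses the standard kafka-clients AdminClient to show the leaky pattern next to the closed one.

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ConnectionLeakSketch {

    // Leaky pattern: a new client per scheduled run, never closed. Its TCP
    // connections to the brokers stay ESTABLISHED and pile up over time.
    static void leakyRun(String bootstrapServers) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        AdminClient client = AdminClient.create(props); // opens sockets to the cluster
        client.listTopics().names().get();              // does some work
        // missing client.close() -> connections leak, as counted by ke.sh stats
    }

    // Fixed pattern: close the client (or reuse a single long-lived one).
    static void closedRun(String bootstrapServers) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        try (AdminClient client = AdminClient.create(props)) {
            client.listTopics().names().get();
        } // sockets released here
    }
}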
