[io] Thread leak after HazelcastInstance.shutdown() in 3.8.2 so JVM won't exit #10886

wilevers · 2017-07-07T11:23:30Z

Hi all,

On my system, (Ubuntu 14.04, Java 1.8.0_131-b11, hazelcast-all.jar version 3.8.2), the following program often hangs after printing 'all instances shut down':

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class Main 
{
	public static void main(String[] args) {
		System.out.println("creating instances...");
		Config config = new Config();
		HazelcastInstance[] instances = new HazelcastInstance[3];
		for (int i = 0; i < instances.length; ++i) {
			instances[i] = Hazelcast.newHazelcastInstance(config);
		}
		System.out.println("instances created. shutting them down...");
		for (int i = instances.length - 1; i >= 0; --i) {
			instances[i].shutdown();
		}
		System.out.println("all instances shut down");
	}
}

A jstack dump of the hanging Java process reveals a non-daemon Hazelcast thread hanging in NonBlockingIOThread.selectLoop():

"hz._hzInstance_2_dev.IO.thread-in-0" #84 prio=5 os_prio=0 tid=0x00007ff02c951800 nid=0x65a3 runnable [0x00007fef7cccd000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000776359288> (a sun.nio.ch.Util$3)
	- locked <0x0000000776359298> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000007762c5a28> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at com.hazelcast.internal.networking.nonblocking.NonBlockingIOThread.selectLoop(NonBlockingIOThread.java:248)
	at com.hazelcast.internal.networking.nonblocking.NonBlockingIOThread.run(NonBlockingIOThread.java:203)
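As an in-process alternative to jstack, a small helper like the one below could list the live non-daemon threads that would keep the JVM from exiting. This is a hypothetical diagnostic sketch, not part of Hazelcast; the class and method names are assumptions. It only uses `Thread.getAllStackTraces()` from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not Hazelcast API).
final class ThreadLeakCheck {
    // Returns the names of live non-daemon threads other than the caller.
    // Any such thread still present after every HazelcastInstance.shutdown()
    // call is one that can keep the JVM alive.
    static List<String> liveNonDaemonThreads() {
        List<String> names = new ArrayList<>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }
}
```

Calling `System.out.println(ThreadLeakCheck.liveNonDaemonThreads());` right after the shutdown loop in `Main.main` would be expected to show a thread such as `hz._hzInstance_2_dev.IO.thread-in-0` in the hanging case.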

Is this a bug in Hazelcast?

Kind regards,

Wil Evers


jerrinot · 2017-07-07T11:30:16Z

It looks like it, thank you for reporting!

mdogan · 2017-07-07T11:45:25Z

I think we already fixed this issue in 3.8.3. See #10651

@wilevers, can you try with 3.8.3 or 3.9-SNAPSHOT?

wilevers · 2017-07-07T12:44:41Z

Just tried 3.8.3, and the issue does not appear. Thanks!
Not sure I like the solution direction taken in #10651, though. It seems to me the root cause is a task clearing the thread's interrupt status. It shouldn't. What if this task first clears the thread's interrupt status and then enters a blocking call, before returning to NonBlockingIOThread.selectLoop()?

Regards,
Wil
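The mechanism described above can be reproduced outside Hazelcast with a plain `java.nio.channels.Selector`. The sketch below is illustrative only: the class name, the `handleWork` stand-in for a task run on the IO thread, and the do-while loop shape are assumptions, not Hazelcast's actual `NonBlockingIOThread` code. A loop that relies on the interrupt flag to leave `select()` exits when the flag survives, and goes back into a blocking `select()` when the task clears it.

```java
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Illustrative sketch, not Hazelcast code.
final class SelectLoopDemo {
    // Hypothetical stand-in for work done on the IO thread between selects.
    // With swallowInterrupt == true it clears the interrupt status, which is
    // what catching and ignoring an InterruptedException effectively does.
    static void handleWork(boolean swallowInterrupt) {
        if (swallowInterrupt) {
            Thread.interrupted(); // clears the flag and discards it
        }
    }

    // Returns true if the loop thread terminated after being interrupted.
    static boolean runLoop(boolean swallowInterrupt) throws Exception {
        Selector selector = Selector.open();
        Thread io = new Thread(() -> {
            try {
                do {
                    // select() returns early when this thread is interrupted
                    // (also if the flag was already set before the call)
                    selector.select();
                    handleWork(swallowInterrupt);
                } while (!Thread.currentThread().isInterrupted());
            } catch (IOException | ClosedSelectorException e) {
                // selector closed: leave the loop
            }
        }, "demo-IO-thread");
        io.setDaemon(true); // the demo itself must not keep the JVM alive
        io.start();
        io.interrupt();     // the signal a shutdown relies on
        io.join(500);
        boolean exited = !io.isAlive();
        selector.close();   // wakes a stuck select(); the next select() throws
        io.join(1000);
        return exited;
    }
}
```

With `swallowInterrupt` set to `false` the loop terminates; with `true` it is still blocked in `select()` after the join timeout, mirroring the jstack output above.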

pveentjer · 2017-07-10T04:12:41Z

Good points. I'll have a closer look; the amount of code called from the IO thread is limited. Let's see if we can find the violating code.

pveentjer · 2017-07-10T04:26:23Z

I see one potential problem already. If an InterruptedException is thrown in the SelectionHandler.handle method, it isn't handled correctly: it is just caught as a Throwable, never checked for InterruptedException, and the interrupt flag isn't restored.
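A minimal sketch of the handling this comment asks for, assuming a generic wrapper around one unit of IO-thread work: the class name `InterruptAwareHandling` and the `Callable`-based signature are hypothetical, not Hazelcast's `SelectionHandler` API. The point is that an `InterruptedException` gets its own catch clause that restores the flag before the catch-all for everything else.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper, not Hazelcast's SelectionHandler.
final class InterruptAwareHandling {
    // Runs one unit of IO-thread work. An InterruptedException no longer
    // disappears inside a catch-all: the interrupt status is restored so the
    // surrounding select loop can still see it and terminate.
    static void runSafely(Callable<?> work) {
        try {
            work.call();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag
        } catch (Throwable t) {
            // other failures keep the IO thread alive; this sketch just reports them
            System.err.println("IO task failed: " + t);
        }
    }
}
```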

pveentjer · 2017-07-10T05:24:21Z

Even though the above InterruptedException handling isn't correct, it isn't the cause. Another problem is the PacketDispatcherImpl, which also catches all exceptions and doesn't handle InterruptedException specifically.

pveentjer · 2017-07-10T05:39:41Z

The PacketDispatcherImpl isn't the (last) cause either. The search continues.

pveentjer · 2017-07-10T07:03:53Z

Provided a custom java.lang.InterruptedException that gives some more info:

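package java.lang;

// Debugging patch, not production code: replaces the JDK's
// java.lang.InterruptedException and prints a stack trace whenever one is
// created on a thread whose name contains "in-" (e.g. the Hazelcast input IO
// thread "hz._hzInstance_2_dev.IO.thread-in-0"). To take effect it has to be
// loaded ahead of the JDK class, for example via -Xbootclasspath/p on Java 8.
public class InterruptedException extends Exception {
    private static final long serialVersionUID = 6700697376100628473L;

    /**
     * Constructs an <code>InterruptedException</code> with no detail message.
     */
    public InterruptedException() {
        super();
        logStackTrace();
    }

    private static void logStackTrace() {
        String name = Thread.currentThread().getName();
        if (name.contains("in-")) {
            // Print where the exception is being created on an IO input thread.
            new Exception().printStackTrace();
        } else {
            System.out.println("-------------------------------" + name + "--------------------------------");
        }
    }

    /**
     * Constructs an <code>InterruptedException</code> with the
     * specified detail message.
     *
     * @param   s   the detail message.
     */
    public InterruptedException(String s) {
        super(s);
        logStackTrace();
    }
}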

And it found one more place that swallows the interrupt:

java.lang.Exception
	at java.lang.InterruptedException.logStackTrace(InterruptedException.java:65)
	at java.lang.InterruptedException.<init>(InterruptedException.java:59)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1245)
	at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
	at com.hazelcast.util.executor.CachedExecutorServiceDelegate.addNewWorkerIfRequired(CachedExecutorServiceDelegate.java:142)
	at com.hazelcast.util.executor.CachedExecutorServiceDelegate.execute(CachedExecutorServiceDelegate.java:116)
	at com.hazelcast.spi.impl.executionservice.impl.ExecutionServiceImpl.execute(ExecutionServiceImpl.java:248)
	at com.hazelcast.nio.NodeIOService.onDisconnect(NodeIOService.java:162)
	at com.hazelcast.nio.tcp.TcpIpConnection.close(TcpIpConnection.java:274)
	at com.hazelcast.internal.networking.nonblocking.AbstractHandler.onFailure(AbstractHandler.java:128)
	at com.hazelcast.internal.networking.nonblocking.NonBlockingIOThread.handleSelectionKey(NonBlockingIOThread.java:349)
	at com.hazelcast.internal.networking.nonblocking.NonBlockingIOThread.handleSelectionKeys(NonBlockingIOThread.java:332)
	at com.hazelcast.internal.networking.nonblocking.NonBlockingIOThread.selectLoop(NonBlockingIOThread.java:250)
	at com.hazelcast.internal.networking.nonblocking.NonBlockingIOThread.run(NonBlockingIOThread.java:203)

Which leads to CachedExecutorServiceDelegate.addNewWorkerIfRequired():

    @SuppressFBWarnings("VO_VOLATILE_INCREMENT")
    private void addNewWorkerIfRequired() {
        if (size < maxPoolSize) {
            try {
                if (lock.tryLock(TIME, TimeUnit.MILLISECONDS)) {
                    try {
                        if (size < maxPoolSize && getQueueSize() > 0) {
                            size++;
                            cachedExecutor.execute(new Worker());
                        }
                    } finally {
                        lock.unlock();
                    }
                }
            } catch (InterruptedException ignored) {
                // <---- the interrupt is swallowed here: tryLock() cleared the
                // flag when it threw, and nothing restores it
                EmptyStatement.ignore(ignored);
            }
        }
    }
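One plausible way to stop swallowing the interrupt in this pattern is to restore the flag in the catch block. The sketch below is a self-contained reduction, not the actual change made in #10893/#10894: the class `WorkerSpawner`, the `TIME_MS` value and the boolean return are hypothetical, and the real method also checks the queue size and submits a `Worker`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reduction of the addNewWorkerIfRequired() pattern.
final class WorkerSpawner {
    private static final long TIME_MS = 250; // assumed timeout, not Hazelcast's TIME
    private final ReentrantLock lock = new ReentrantLock();
    private final int maxPoolSize;
    private volatile int size; // only incremented while holding the lock

    WorkerSpawner(int maxPoolSize) {
        this.maxPoolSize = maxPoolSize;
    }

    // Returns true if a worker slot was reserved.
    boolean addNewWorkerIfRequired() {
        if (size >= maxPoolSize) {
            return false;
        }
        try {
            if (lock.tryLock(TIME_MS, TimeUnit.MILLISECONDS)) {
                try {
                    if (size < maxPoolSize) {
                        size++;
                        return true;
                    }
                } finally {
                    lock.unlock();
                }
            }
        } catch (InterruptedException e) {
            // tryLock() cleared the flag when it threw; put it back so the
            // caller (e.g. an IO thread's select loop) still sees the interrupt
            Thread.currentThread().interrupt();
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

Note that `tryLock(long, TimeUnit)` throws immediately if the calling thread is already interrupted, even when the lock is free, which matches the `tryAcquireNanos` frame in the stack trace above.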


jerrinot added Team: Core Type: Defect labels Jul 7, 2017

jerrinot added this to the 3.9 milestone Jul 7, 2017

pveentjer self-assigned this Jul 10, 2017

pveentjer added a commit to pveentjer/hazelcast that referenced this issue Jul 10, 2017

Fixes various IO thread interupt status eating

35ac2b7

Fix hazelcast#10886

pveentjer mentioned this issue Jul 10, 2017

[BACKPORT} Fixes various IO thread interupt status eating #10893

Merged

nilskp added the in progress label Jul 10, 2017

pveentjer added a commit to pveentjer/hazelcast that referenced this issue Jul 10, 2017

Fixes various IO thread interupt status eating

e4185b5

Fix hazelcast#10886

pveentjer added a commit to pveentjer/hazelcast that referenced this issue Jul 10, 2017

Fixes various IO thread interupt status eating

8b43849

Fix hazelcast#10886

pveentjer mentioned this issue Jul 10, 2017

Fixes various IO thread interupt status eating #10894

Merged

mmedenjak changed the title ~~Thread leak after HazelcastInstance.shutdown() in 3.8.2 so JVM won't exit~~ [io] Thread leak after HazelcastInstance.shutdown() in 3.8.2 so JVM won't exit Jul 11, 2017

jerrinot closed this as completed in #10894 Jul 21, 2017

tombujok pushed a commit that referenced this issue Jul 31, 2017

Fixes various IO thread interupt status eating (#10893)

d576318

Fix #10886

mmedenjak added the Source: Community label Jan 28, 2020
