EOFException: Remote socket closed! #1551

Closed
brunolellis opened this issue Jan 11, 2014 · 9 comments

@brunolellis

Hello,

More details about this issue here: https://groups.google.com/forum/#!msg/hazelcast/snHGeHaQmDU/t7AnZx9JdxwJ

  • Test code to reproduce:
package com.hazelcast.map.mapstore;

import static org.junit.Assert.assertTrue;

import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.log4j.Logger;
import org.junit.Test;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;

import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.mapstore.MapStoreTest.SimpleMapStore;
import com.hazelcast.query.SampleObjects.Employee;
import com.hazelcast.test.HazelcastJUnit4ClassRunner;
import com.hazelcast.test.HazelcastTestSupport;
import com.hazelcast.test.TestHazelcastInstanceFactory;
import com.hazelcast.test.annotation.ParallelTest;

@RunWith(HazelcastJUnit4ClassRunner.class)
@Category(ParallelTest.class)
public class BigMapStoreTest extends HazelcastTestSupport {

    private static final Logger LOG = Logger.getLogger(BigMapStoreTest.class);

    @Test   
    public void testTwoMembersWriteBehind() throws Exception {

        int size = 400000;

        ThreadSleepTestMapStore tstms = new ThreadSleepTestMapStore(size);

        Config config = new Config();
        config.getMapConfig("map")
                .setMapStoreConfig(new MapStoreConfig()
                        .setWriteDelaySeconds(30)
                        .setImplementation(tstms));

        TestHazelcastInstanceFactory nodeFactory = createHazelcastInstanceFactory(2);

        HazelcastInstance h1 = nodeFactory.newHazelcastInstance(config);
        HazelcastInstance h2 = nodeFactory.newHazelcastInstance(config);
//        HazelcastInstance h3 = nodeFactory.newHazelcastInstance(config);

        IMap<Integer, Employee> map = h1.getMap("map");
        for (int i = 0; i < size; i++) {
            map.put(i, new Employee("joe", i, true, 100.00));
        }

        LOG.info("done loading map; waiting MapStore write delay...");

        tstms.latchStoreAll.await();

        assertTrue(String.valueOf(tstms.storeAllCallCount.get()), tstms.storeAllCallCount.get() == size);
        assertTrue(map.size() == size);

    }


    public static class ThreadSleepTestMapStore extends SimpleMapStore {

        final AtomicInteger storeAllCallCount = new AtomicInteger();
        final CountDownLatch latchStoreAll;

        public ThreadSleepTestMapStore(int size) {
            latchStoreAll = new CountDownLatch(size);
        }

        public void storeAll(Map map) {
            storeAllCallCount.addAndGet(map.size());

            for (int i = 0; i < map.size(); i++) {
                latchStoreAll.countDown();
            }

            try {
                long d = 200;
                LOG.info("storeAll.size() = " + map.size() + "; delaying " + d + "ms to simulate database...");
                Thread.sleep(d);
            } catch (InterruptedException e) {
                // restore the interrupt status instead of swallowing it
                Thread.currentThread().interrupt();
            }
        }

    }

}
  • Log:
12:49:24,914  WARN [ClusterService] - [127.0.0.1]:5002 [dev] Master node has timed out its heartbeat and will be removed
12:49:24,914  INFO [BigMapStoreTest] - storeAll.size() = 2335; delaying 200ms to simulate database...
12:49:24,914  INFO [ClusterService] - [127.0.0.1]:5002 [dev] Master Address[127.0.0.1]:5001 left the cluster. Assigning new master Member [127.0.0.1]:5002 this
12:49:24,914 DEBUG [Node] - [127.0.0.1]:5002 [dev] ** setting master address to Address[127.0.0.1]:5002
12:49:24,914 DEBUG [ClusterService] - [127.0.0.1]:5002 [dev] Now Master Address[127.0.0.1]:5002
12:49:24,914  INFO [ClusterService] - [127.0.0.1]:5002 [dev] Removing Member [127.0.0.1]:5001
12:49:24,928  INFO [BigMapStoreTest] - storeAll.size() = 6; delaying 200ms to simulate database...
12:49:24,930  WARN [ClusterService] - [127.0.0.1]:5001 [dev] Added Address[127.0.0.1]:5002 to list of dead addresses because of timeout since last read
12:49:24,931 DEBUG [ClusterService] - [127.0.0.1]:5001 [dev] No heartbeat should remove Address[127.0.0.1]:5002
12:49:24,931  INFO [ClusterService] - [127.0.0.1]:5001 [dev] Removing Member [127.0.0.1]:5002
12:49:24,954  WARN [PartitionService] - [127.0.0.1]:5002 [dev] This is the master node and received a PartitionRuntimeState from Address[127.0.0.1]:5001. Ignoring incoming state! 
12:49:24,970 DEBUG [Backup] - [127.0.0.1]:5002 [dev] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! caller:Address[127.0.0.1]:5001, partitionId: 158, operation: com.hazelcast.spi.impl.Backup, service: hz:impl:mapService
12:49:24,974  INFO [BigMapStoreTest] - storeAll.size() = 4383; delaying 200ms to simulate database...
12:49:24,982 DEBUG [ClusterService] - [127.0.0.1]:5001 [dev] Member [127.0.0.1]:5002 is dead. Sending remove to all other members.
12:49:24,982  INFO [ClusterService] - [127.0.0.1]:5001 [dev] 

Members [1] {
    Member [127.0.0.1]:5001 this
}

12:49:24,982 DEBUG [ClusterService] - [127.0.0.1]:5002 [dev] Member [127.0.0.1]:5001 is dead. Sending remove to all other members.
12:49:24,982  INFO [ClusterService] - [127.0.0.1]:5002 [dev] 

Members [1] {
    Member [127.0.0.1]:5002 this
}
@mdogan
Contributor

mdogan commented Jan 11, 2014

This issue is fixed by commit bb4f2d4.

Please try using version 3.1.4 and re-open if you are still able to reproduce the issue.

@mdogan mdogan closed this as completed Jan 11, 2014
@brunolellis
Author

Actually that code was tested with 3.1.4... I will do some more tests and let you know.

@mdogan
Contributor

mdogan commented Jan 11, 2014

Yeah, actually, by increasing the sleep time and decreasing the heartbeat timeout (hazelcast.max.no.heartbeat.seconds), it's reproducible on 3.1.4 too.
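
For context, a minimal sketch of those two tweaks (the class name, the 5-second timeout, and the longer sleep are illustrative, not taken from the original test):

import com.hazelcast.config.Config;

public class HeartbeatRepro {

    // Hypothetical helper: shortens the heartbeat timeout so that a
    // long-blocking MapStore.storeAll() causes members to be declared dead.
    public static Config shortHeartbeatConfig() {
        Config config = new Config();
        // Members that miss heartbeats for this many seconds are removed from the cluster.
        config.setProperty("hazelcast.max.no.heartbeat.seconds", "5"); // illustrative value
        return config;
    }

    // In ThreadSleepTestMapStore.storeAll(), raise the simulated database
    // delay above the timeout, e.g. long d = 6000; // was 200
}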

@mdogan mdogan reopened this Jan 11, 2014
mdogan added a commit to mdogan/hazelcast that referenced this issue Jan 13, 2014
@mdogan mdogan closed this as completed in 65064af Jan 13, 2014
@ravindraku

I am using version 3.5.1 and still see the EOFException issue:
INFO: [ma-ilogt-lapp45.corp.apple.com]:5701 [dev] [3.5.1] Connection [/:38432] lost. Reason: java.io.EOFException[Remote socket closed!]
Dec 6, 2015 8:29:57 AM com.hazelcast.nio.tcp.ReadHandler
WARNING: [ma-ilogt-lapp45.corp.apple.com]:5701 [dev] [3.5.1] hz._hzInstance_1_dev.IO.thread-in-1 Closing socket to endpoint null, Cause:java.io.EOFException: Remote socket closed!
Dec 6, 2015 8:29:57 AM com.hazelcast.nio.tcp.TcpIpConnectionManager

Is this a known issue?

@cooljam

cooljam commented Mar 31, 2016

I am using version 3.6.1 and still see it.
2016-03-31 14:43:48.460 | INFO | hz._hzInstance_1_caster-eageye-session.IO.thread-in-1 | com.hazelcast.nio.tcp.TcpIpConnection | [172.19.85.115]:5712 [caster-eageye-session] [3.6.1] Connection [/172.19.104.51:47423] lost. Reason: java.io.EOFException[Remote socket closed!]
My client is Node.js, and it connects through nginx for REST.

@jerrinot
Contributor

@ravindraku: Do you have a firewall between your members? How often do you see the warning?

@cooljam: I am not an nginx expert, but it appears the proxy is closing connections to Hazelcast. This could possibly help you: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive Keeping the connections alive between requests should also improve performance; see the sketch below.
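
For illustration, a minimal nginx sketch of that keepalive setup (the upstream name, address, and ports are placeholders, not taken from this thread):

# Hypothetical upstream pointing at a Hazelcast member's REST endpoint.
upstream hazelcast_rest {
    server 127.0.0.1:5701;
    keepalive 16;                        # keep up to 16 idle connections per worker
}

server {
    listen 8080;
    location / {
        proxy_pass http://hazelcast_rest;
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the default "Connection: close"
    }
}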

Also, you might want to have a look at our native Node.js client (https://github.com/hazelcast/hazelcast-nodejs-client). It's still a work in progress, but it could be interesting for you.

@rbalabomm

I am still facing the above issue when clustering 4 nodes. Please clarify: is this a bug in Hazelcast, or is there a way it can be resolved?

@bhavitsengar

Hello,

Is there any update on this? I am also getting the same error:

com.hazelcast.client.connection.nio.ClientConnection
WARNING: Connection [/10.0.0.1:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

If anyone has found a workaround, please comment to help me out.

Thanks and regards,
Bhavit

@mmedenjak
Contributor

Hi @bhavitsengar and @rbalabomm !

I wouldn't say this is necessarily an issue with Hazelcast, as the connection was simply closed, which can happen for various reasons.
Are you able to reproduce the issue consistently? Do you have any other logs indicating network issues?

Also see: https://stackoverflow.com/questions/43032042/error-java-io-eofexception-remote-socket-closed
