EOFException: Remote socket closed! #1551

Closed
brunolellis opened this issue Jan 11, 2014 · 9 comments

@brunolellis

Hello,

More details about this issue here: https://groups.google.com/forum/#!msg/hazelcast/snHGeHaQmDU/t7AnZx9JdxwJ

  • Test code to reproduce:
package com.hazelcast.map.mapstore;

import static org.junit.Assert.assertTrue;

import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.log4j.Logger;
import org.junit.Test;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;

import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.mapstore.MapStoreTest.SimpleMapStore;
import com.hazelcast.query.SampleObjects.Employee;
import com.hazelcast.test.HazelcastJUnit4ClassRunner;
import com.hazelcast.test.HazelcastTestSupport;
import com.hazelcast.test.TestHazelcastInstanceFactory;
import com.hazelcast.test.annotation.ParallelTest;

@RunWith(HazelcastJUnit4ClassRunner.class)
@Category(ParallelTest.class)
public class BigMapStoreTest extends HazelcastTestSupport {

    private static final Logger LOG = Logger.getLogger(BigMapStoreTest.class);

    @Test   
    public void testTwoMembersWriteBehind() throws Exception {

        int size = 400000;

        ThreadSleepTestMapStore tstms = new ThreadSleepTestMapStore(size);

        Config config = new Config();
        config.getMapConfig("map")
                .setMapStoreConfig(new MapStoreConfig()
                        .setWriteDelaySeconds(30)
                        .setImplementation(tstms));

        TestHazelcastInstanceFactory nodeFactory = createHazelcastInstanceFactory(2);

        HazelcastInstance h1 = nodeFactory.newHazelcastInstance(config);
        HazelcastInstance h2 = nodeFactory.newHazelcastInstance(config);
//        HazelcastInstance h3 = nodeFactory.newHazelcastInstance(config);

        IMap<Integer, Employee> map = h1.getMap("map");
        for (int i = 0; i < size; i++) {
            map.put(i, new Employee("joe", i, true, 100.00));
        }

        LOG.info("done loading map; waiting MapStore write delay...");

        tstms.latchStoreAll.await();

        assertTrue(String.valueOf(tstms.storeAllCallCount.get()), tstms.storeAllCallCount.get() == size);
        assertTrue(map.size() == size);

    }


    public static class ThreadSleepTestMapStore extends SimpleMapStore {

        final AtomicInteger storeAllCallCount = new AtomicInteger();
        final CountDownLatch latchStoreAll;

        public ThreadSleepTestMapStore(int size) {
            latchStoreAll = new CountDownLatch(size);
        }

        public void storeAll(Map map) {
            storeAllCallCount.addAndGet(map.size());

            for (int i = 0; i < map.size(); i++) {
                latchStoreAll.countDown();
            }

            try {
                long d = 200;
                LOG.info("storeAll.size() = " + map.size() + "; delaying " + d + "ms to simulate database...");
                Thread.sleep(d);
            } catch (InterruptedException e) {
                // restore the interrupt status instead of swallowing it
                Thread.currentThread().interrupt();
            }
        }

    }

}
  • Log:
12:49:24,914  WARN [ClusterService] - [127.0.0.1]:5002 [dev] Master node has timed out its heartbeat and will be removed
12:49:24,914  INFO [BigMapStoreTest] - storeAll.size() = 2335; delaying 200ms to simulate database...
12:49:24,914  INFO [ClusterService] - [127.0.0.1]:5002 [dev] Master Address[127.0.0.1]:5001 left the cluster. Assigning new master Member [127.0.0.1]:5002 this
12:49:24,914 DEBUG [Node] - [127.0.0.1]:5002 [dev] ** setting master address to Address[127.0.0.1]:5002
12:49:24,914 DEBUG [ClusterService] - [127.0.0.1]:5002 [dev] Now Master Address[127.0.0.1]:5002
12:49:24,914  INFO [ClusterService] - [127.0.0.1]:5002 [dev] Removing Member [127.0.0.1]:5001
12:49:24,928  INFO [BigMapStoreTest] - storeAll.size() = 6; delaying 200ms to simulate database...
12:49:24,930  WARN [ClusterService] - [127.0.0.1]:5001 [dev] Added Address[127.0.0.1]:5002 to list of dead addresses because of timeout since last read
12:49:24,931 DEBUG [ClusterService] - [127.0.0.1]:5001 [dev] No heartbeat should remove Address[127.0.0.1]:5002
12:49:24,931  INFO [ClusterService] - [127.0.0.1]:5001 [dev] Removing Member [127.0.0.1]:5002
12:49:24,954  WARN [PartitionService] - [127.0.0.1]:5002 [dev] This is the master node and received a PartitionRuntimeState from Address[127.0.0.1]:5001. Ignoring incoming state! 
12:49:24,970 DEBUG [Backup] - [127.0.0.1]:5002 [dev] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! caller:Address[127.0.0.1]:5001, partitionId: 158, operation: com.hazelcast.spi.impl.Backup, service: hz:impl:mapService
12:49:24,974  INFO [BigMapStoreTest] - storeAll.size() = 4383; delaying 200ms to simulate database...
12:49:24,982 DEBUG [ClusterService] - [127.0.0.1]:5001 [dev] Member [127.0.0.1]:5002 is dead. Sending remove to all other members.
12:49:24,982  INFO [ClusterService] - [127.0.0.1]:5001 [dev] 

Members [1] {
    Member [127.0.0.1]:5001 this
}

12:49:24,982 DEBUG [ClusterService] - [127.0.0.1]:5002 [dev] Member [127.0.0.1]:5001 is dead. Sending remove to all other members.
12:49:24,982  INFO [ClusterService] - [127.0.0.1]:5002 [dev] 

Members [1] {
    Member [127.0.0.1]:5002 this
}
@mdogan
Contributor

mdogan commented Jan 11, 2014

This issue is fixed by commit bb4f2d4.

Please try using version 3.1.4 and re-open if you are still able to reproduce the issue.

@mdogan mdogan closed this as completed Jan 11, 2014
@brunolellis
Author

Actually that code was tested with 3.1.4... I will do some more tests and let you know.

@mdogan
Contributor

mdogan commented Jan 11, 2014

Yeah, actually, by increasing the sleep time and decreasing the heartbeat timeout (hazelcast.max.no.heartbeat.seconds), it's reproducible on 3.1.4 too.
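
For context, a minimal sketch of those two tweaks (the class name, the 5-second timeout, and the longer sleep are illustrative, not taken from the original test):

import com.hazelcast.config.Config;

public class HeartbeatRepro {

    // Hypothetical helper: shortens the heartbeat timeout so that a
    // long-blocking MapStore.storeAll() causes members to be declared dead.
    public static Config shortHeartbeatConfig() {
        Config config = new Config();
        // Members that miss heartbeats for this many seconds are removed from the cluster.
        config.setProperty("hazelcast.max.no.heartbeat.seconds", "5"); // illustrative value
        return config;
    }

    // In ThreadSleepTestMapStore.storeAll(), raise the simulated database
    // delay above the timeout, e.g. long d = 6000; // was 200
}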

@mdogan mdogan reopened this Jan 11, 2014
mdogan added a commit to mdogan/hazelcast that referenced this issue Jan 13, 2014
@mdogan mdogan closed this as completed in 65064af Jan 13, 2014
@ravindraku

I am using version 3.5.1 and still see the EOFException issue:
INFO: [ma-ilogt-lapp45.corp.apple.com]:5701 [dev] [3.5.1] Connection [/:38432] lost. Reason: java.io.EOFException[Remote socket closed!]
Dec 6, 2015 8:29:57 AM com.hazelcast.nio.tcp.ReadHandler
WARNING: [ma-ilogt-lapp45.corp.apple.com]:5701 [dev] [3.5.1] hz._hzInstance_1_dev.IO.thread-in-1 Closing socket to endpoint null, Cause:java.io.EOFException: Remote socket closed!
Dec 6, 2015 8:29:57 AM com.hazelcast.nio.tcp.TcpIpConnectionManager

Is this a known issue?

@cooljam

cooljam commented Mar 31, 2016

I am using version 3.6.1 and still see it.
2016-03-31 14:43:48.460 | INFO | hz._hzInstance_1_caster-eageye-session.IO.thread-in-1 | com.hazelcast.nio.tcp.TcpIpConnection | [172.19.85.115]:5712 [caster-eageye-session] [3.6.1] Connection [/172.19.104.51:47423] lost. Reason: java.io.EOFException[Remote socket closed!]
My client is Node.js, and it connects through nginx for REST.

@jerrinot
Contributor

@ravindraku: Do you have a firewall between your members? How often do you see the warning?

@cooljam: I am not an nginx expert, but it appears the proxy is closing connections to Hazelcast. This could possibly help you: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive Keeping the connections alive between requests should also improve performance; see the sketch below.
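
For illustration, a minimal nginx sketch of that keepalive setup (the upstream name, address, and ports are placeholders, not taken from this thread):

# Hypothetical upstream pointing at a Hazelcast member's REST endpoint.
upstream hazelcast_rest {
    server 127.0.0.1:5701;
    keepalive 16;                        # keep up to 16 idle connections per worker
}

server {
    listen 8080;
    location / {
        proxy_pass http://hazelcast_rest;
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the default "Connection: close"
    }
}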

Also, you might want to have a look at our native Node.js client (https://github.com/hazelcast/hazelcast-nodejs-client). It's still a work in progress, but it could be interesting for you.

@rbalabomm

I am still facing the above issue when clustering 4 nodes. Please clarify: is this a bug in Hazelcast, or is there a way it can be resolved?

@bhavitsengar

Hello,

Is there any update on this? I am also getting the same error:

com.hazelcast.client.connection.nio.ClientConnection
WARNING: Connection [/10.0.0.1:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

If anyone has found a workaround, please comment to help me out.

Thanks and regards,
Bhavit

@mmedenjak
Contributor

Hi @bhavitsengar and @rbalabomm !

I wouldn't say this is necessarily an issue with Hazelcast, as the connection was simply closed, which can happen for various reasons.
Are you able to reproduce the issue consistently? Do you have any other logs indicating network issues?

Also see: https://stackoverflow.com/questions/43032042/error-java-io-eofexception-remote-socket-closed
