Hazelcast client produces OutOfMemoryError when there is no hazelcast server available at startup #13186

Closed
sottos opened this issue May 27, 2018 · 10 comments
@sottos commented May 27, 2018

I am using Hazelcast 3.9. The error also exists in 3.10, but it does not produce an OOM there.
I am using Windows 10 and Java 1.8.0_171.

The attached hazelcast-error.zip contains a project; just unpack it and run mvn clean install. The unit test logs will normally show OOMs after about half a minute.
The project uses Hazelcast 3.9.

Changing the Hazelcast version in the pom to 3.10 will not produce OOMs, but the buffer error is still there.

Starting the client when there is no data grid to connect to produces a buffer
that seems to have a length larger than the buffer's capacity. This is certainly an error.

The error appears in com.hazelcast.client.impl.protocol.ClientMessage::readFrom, where
Bits.readIntL(src) returns a length greater than src.capacity(). The length bytes in the buffer seem
to be corrupted somehow (the length bytes are actually offset by 3 bytes in this case).

public boolean readFrom(ByteBuffer src) {
    int frameLength = 0;
    if (this.buffer == null) {
        // init internal buffer
        final int remaining = src.remaining();
        if (remaining < Bits.INT_SIZE_IN_BYTES) {
            // we don't have even the frame length ready
            return false;
        }
        // In some cases we get frameLength > src.capacity() here; it seems three bytes
        // have been inserted at the start of the buffer and are handled as part of the length!
        frameLength = Bits.readIntL(src);
        // we need to restore the position; as if we didn't read the frame-length
        src.position(src.position() - Bits.INT_SIZE_IN_BYTES);
        if (frameLength < HEADER_SIZE) {
            throw new IllegalArgumentException("Client message frame length cannot be smaller than header size.");
        }
        wrap(new byte[frameLength], 0, USE_UNSAFE);
    }
    frameLength = frameLength > 0 ? frameLength : getFrameLength();
    accumulate(src, frameLength - index());
    return isComplete();
}

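To illustrate why an implausible frame length turns into an OutOfMemoryError, here is a minimal, self-contained sketch. It is not Hazelcast code and not a proposed fix. It assumes the first four bytes in the buffer are not a client-protocol frame at all (the ASCII text of an HTTP response is used purely as an example; the thread does not establish what the bytes actually were). The names `FrameLengthSketch`, `peekFrameLength`, `isPlausible`, `MAX_FRAME_LENGTH` and `MIN_FRAME_LENGTH` are hypothetical. Since `readFrom` accumulates a frame across several reads, the sketch checks against a fixed upper bound rather than against `src.capacity()`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class FrameLengthSketch {
    // Hypothetical upper bound for a sane frame; not a Hazelcast constant.
    static final int MAX_FRAME_LENGTH = 16 * 1024 * 1024;
    // Hypothetical stand-in for HEADER_SIZE in the snippet above.
    static final int MIN_FRAME_LENGTH = 22;

    // Reads a little-endian int at the current position without moving it.
    // The byte order is set on a duplicate, so src itself is left untouched.
    static int peekFrameLength(ByteBuffer src) {
        return src.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(src.position());
    }

    // True only if the length could be allocated without risking an OOM.
    static boolean isPlausible(int frameLength) {
        return frameLength >= MIN_FRAME_LENGTH && frameLength <= MAX_FRAME_LENGTH;
    }

    public static void main(String[] args) {
        // Assumed foreign bytes: the start of an HTTP response.
        byte[] foreign = "HTTP/1.1 400 Bad Request\r\n".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer src = ByteBuffer.wrap(foreign);

        int frameLength = peekFrameLength(src);
        // 'H','T','T','P' read as a little-endian int is 0x50545448.
        System.out.println(frameLength);              // prints 1347703880
        System.out.println(isPlausible(frameLength)); // prints false
    }
}
```

With a length like this, `new byte[frameLength]` asks for roughly 1.3 GB per message, which would explain an OOM on a default-sized heap after a few connection attempts.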
@mdogan added the Team: Client label May 28, 2018
@mdogan added this to the 3.10.2 milestone May 28, 2018
@mmedenjak modified the milestones: 3.10.2, 3.10.3 May 29, 2018
@ahmetmircik (Member) commented May 29, 2018

This seems to be the same issue as hazelcast/hazelcast-enterprise#1382.

@sancar self-assigned this May 30, 2018
@sancar (Member) commented May 30, 2018

Hi @sottos,
I am trying to reproduce the issue. When I ran your zip as is, it did not hit an OOME.
Should I start a cluster alongside it to reproduce? Or maybe I should start the cluster a little later?

When there is no cluster, clientMessage.readFrom is actually never called.

@sottos (Author) commented May 30, 2018

@sottos (Author) commented May 30, 2018

@sancar (Member) commented May 30, 2018

Can you share the logs of your run?

@sottos (Author) commented May 30, 2018

@sancar (Member) commented May 30, 2018

Either this is caused by the Windows environment, which is very unlikely (I will test this; I am currently on a Mac),
or, as I suspect, something is responding on the endpoints the client is trying to connect to (127.0.1-3/1958-1961). Can you check and confirm that nothing is running on these addresses?
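
One way to run that check, as a minimal sketch: try a plain TCP connect to each candidate port and see whether anything accepts it. The class name `PortProbe`, the helper `isListening`, the 500 ms timeout, and the use of 127.0.0.1 with ports 1958 to 1961 are assumptions for illustration; adjust them to the addresses in your client configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port is accepted within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed host and port range, taken from the discussion above.
        for (int port = 1958; port <= 1961; port++) {
            boolean open = isListening("127.0.0.1", port, 500);
            System.out.println("127.0.0.1:" + port + " listening=" + open);
        }
    }
}
```

If any of these ports reports `listening=true` while no Hazelcast member is running, some other process is answering the client's connection attempt.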

@sottos (Author) commented May 30, 2018

If I use 127.0.0.1 in the list of hostnames (with any port, it seems), it fails as I described. If I remove 127.0.0.1 from the list of hosts and insert "xyz.somewhere.no,xyz.elsewhere.se" instead, no call to readFrom occurs (with the same port range, 1958 -> 1961).

But I will stop using such low port numbers; I forgot that they are most likely reserved for something.

@sancar (Member) commented May 31, 2018

Since we could not reproduce it on our side, and you say the ports might be mixed up with something else, I am closing the issue for now. Feel free to reopen it if you see a related problem.

@sancar closed this May 31, 2018
@sottos (Author) commented Aug 8, 2018

We have tried it on a Mac: it produces no error, and ClientMessage::readFrom is not called.
On Windows and Ubuntu we end up with an OutOfMemoryError.
We are now using port 19580 instead of 1958. It makes no difference; both produce OOMs.

I do not know how to reopen this issue. Maybe you have to change the state?
