Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in ReliableTopicListenerRunner.next API-1514 #1377

Merged

Conversation

srknzl
Copy link
Member

@srknzl srknzl commented Sep 28, 2022

The leak was due to nested promises triggering each other. In the leak scenario, the catch block of this.ringbuffer.readMany call in next() calls next immediately because it gets a ClientOfflineError due to ReconnectMode.ASYNC. Then infinite number of next() calls happen and the error object and the promises leak memory.

See:

recursion

See screenshots that are taken using Chrome Dev Tools memory snapshot feature:

memoryleak
memoryleak2
memoryleak3

The infinitely calling next() behaviour is the same with Java. However, looks like in Node.js, creating a promise in a promise's catch() block makes the promise non-garbage collectible somehow.

I have tried to use process.nextTick(this.next.bind(this)) instead of calling this.next() as well but it led to the same leak of calling this.next():

memoryUsageWithProcessNextTick

I am not exactly sure why this is the case but it may be due to the execution order of things in the event loop:

  1. Process next tick
  2. Promises callbacks
  3. Macro tasks (setTimeout, setImmediate)

Source: https://nodejs.dev/en/learn/understanding-setimmediate/

Event loop executes tasks in process.nextTick queue first, and then executes promises microtask queue, and then executes macrotask queue.

Using setImmediate(this.next.bind(this)) fixed the issue:

memoryUsageAfter

I realized that garbage collection started to work again.


Script used to test the leak:

'use strict';
const { Client } = require('./lib');

async function main() {

    const client = await Client.newHazelcastClient({
        clusterName: 'dev',
        network: {
            clusterMembers: [
                'localhost:5701'
            ]
        },
        connectionStrategy: {
            asyncStart: false,
            reconnectMode: 'ASYNC', // <----
            connectionRetry: {
                clusterConnectTimeoutMillis: -1
            }
        },
        // properties: {
        //     'hazelcast.logging.level': 'TRACE'
        // }
    });
    const topic = await client.getReliableTopic('topic-reconnect');
    topic.addMessageListener((message) => console.log(message.messageObject));

    setInterval(async () => {
        try {
            console.log('*** PUBLISHING MESSAGE ***');
            await topic.publish('hello world!');
            console.log('*** MESSAGE WAS PUBLISHED ***');
        } catch (error) {
            console.log(`Error in setInterval: ${error}`);
        }
    }, 1000);

}

main().catch(err => {
    console.error(err);
    process.exit(1);
});

Fixes #1347

@srknzl srknzl added this to the 5.2.0 milestone Sep 28, 2022
@srknzl srknzl changed the title Fix leak in ReliableTopicListenerRunner.next API-1564 Fix leak in ReliableTopicListenerRunner.next API-1514 Sep 28, 2022
@codecov
Copy link

codecov bot commented Sep 28, 2022

Codecov Report

Merging #1377 (2adbf2f) into master (cae0c7d) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1377      +/-   ##
==========================================
+ Coverage   93.25%   93.27%   +0.01%     
==========================================
  Files         464      464              
  Lines       16375    16375              
  Branches     1330     1330              
==========================================
+ Hits        15271    15274       +3     
+ Misses        805      803       -2     
+ Partials      299      298       -1     
Impacted Files Coverage Δ
src/proxy/topic/ReliableTopicListenerRunner.ts 82.22% <100.00%> (ø)
src/util/Util.ts 87.89% <0.00%> (-0.64%) ⬇️
src/invocation/InvocationService.ts 95.29% <0.00%> (+0.78%) ⬆️
src/network/Connection.ts 95.11% <0.00%> (+0.88%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@srknzl srknzl changed the title Fix leak in ReliableTopicListenerRunner.next API-1514 Fix memory leak in ReliableTopicListenerRunner.next API-1514 Sep 29, 2022
Copy link

@utku-caglayan utku-caglayan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Contributor

@mdumandag mdumandag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srknzl I think it totally makes sense.
consider the following example.

function next() {
    readMany()
        .catch(() => {
            //next();
            //process.nextTick(() => next());
            //setImmediate(() => next());
        });
}

function readMany() {
    return Promise.reject('client offline');
}

next();
setInterval(() => console.log('interval'), 10);

Now if you use anything other than the setImmediate in the catch block, you wouldn't see the interval printed out.

I believe your reasoning is correct.

  • If you use next, it becomes something of a recursive function that never ends. Created invocation promise is immediately rejected since the client is offline and reconnect mode is async, and we immediately execute the catch block.
  • same happens if you use the process.nextTick. That will effectively call next once this function ends.

These two options basically block the event loop, and can cause various problems.

setImmediate allows us to not block the event loop, and still call next infinitely many times.

@srknzl srknzl merged commit 8215b48 into hazelcast:master Sep 30, 2022
@srknzl
Copy link
Member Author

srknzl commented Sep 30, 2022

Thanks for the quick review guys!

@srknzl
Copy link
Member Author

srknzl commented Sep 30, 2022

I will send backports of this if the issue happens in older clients as well.

srknzl added a commit to srknzl/hazelcast-nodejs-client that referenced this pull request Sep 30, 2022
srknzl added a commit to srknzl/hazelcast-nodejs-client that referenced this pull request Sep 30, 2022
srknzl added a commit to srknzl/hazelcast-nodejs-client that referenced this pull request Sep 30, 2022
srknzl added a commit to srknzl/hazelcast-nodejs-client that referenced this pull request Sep 30, 2022
srknzl added a commit to srknzl/hazelcast-nodejs-client that referenced this pull request Sep 30, 2022
harunalpak pushed a commit to harunalpak/hazelcast-nodejs-client that referenced this pull request Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Heap out of memory when using ReliableTopic with reconnect mode "ASYNC" [API-1514]
3 participants