
Fixes a memory leak caused by un-cleared stream connections. #50752

Merged
merged 1 commit into saltstack:2017.7 on Dec 13, 2018

Conversation

@kstreee (Contributor) commented Dec 5, 2018

What does this PR do?

Complements the memory leak fixes of #46094.

Previous versions of Tornado (4.4.x) garbage-collected asynchronous request handlers' variables frequently, but the current version (5.x) does not collect those variables reliably.

Garbage collection matters here because the saltclients variable holds connections to the publish channel of EventPublisher; if those connections are not cleaned up as soon as possible, they occupy unnecessary memory.
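As a rough illustration of the pattern this PR moves toward (a minimal sketch, not the actual diff; the handler class name and initialization details are assumptions):

import tornado.web
import salt.client

class JobHandler(tornado.web.RequestHandler):  # hypothetical name, not the real saltnado handler
    def initialize(self, opts):
        # Build the clients per request instead of sharing a
        # process-wide singleton.
        self.saltclients = {'local': salt.client.get_local_client(mopts=opts)}

    def on_finish(self):
        # Tornado calls on_finish() once the response is complete.
        # Dropping the reference lets the LocalClient, and the IPC
        # streams it holds to EventPublisher, be garbage-collected.
        self.saltclients = None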

What issues does this PR fix or reference?

#46094

Previous Behavior

(Logged by adding the following code to salt/transport/ipc.py)

def buf_log(stream):
    # Report a stream's read/write buffer sizes as 'r<bytes>-w<bytes>'.
    if stream:
        return 'r{}-w{}'.format(len(stream._read_buffer), stream._write_buffer._size)

log.info('stream size is : {}'.format(len(self.streams)))
log.info(' '.join('{}: {}, '.format(idx, buf_log(s)) for idx, s in enumerate(self.streams)))
2018-12-05 14:33:48,272 [salt.transport.ipc:541            ][INFO    ] stream size is : 14
2018-12-05 14:33:48,272 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w0,  3: r0-w0,  4: r0-w0,  5: r0-w0,  6: r0-w0,  7: r0-w0,  8: r0-w0,  9: r0-w0,  10: r0-w0,  11: r0-w0,  12: r0-w0,  13: r0-w0,
...
2018-12-05 14:34:10,359 [salt.transport.ipc:541            ][INFO    ] stream size is : 18
2018-12-05 14:34:10,360 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w0,  3: r0-w0,  4: r0-w0,  5: r0-w0,  6: r0-w0,  7: r0-w0,  8: r0-w14003,  9: r0-w29063,  10: r0-w0,  11: r0-w0,  12: r0-w4735,  13: r0-w16017,  14: r0-w0,  15: r0-w0,  16: r0-w18198,  17: r0-w0,
...
2018-12-05 14:34:39,278 [salt.transport.ipc:541            ][INFO    ] stream size is : 24
2018-12-05 14:34:39,278 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w8986,  2: r0-w0,  3: r0-w0,  4: r0-w0,  5: r0-w0,  6: r0-w11021,  7: r0-w0,  8: r0-w30536,  9: r0-w32717,  10: r0-w43582,  11: r0-w0,  12: r0-w0, 13: r0-w0,  14: r0-w28522,  15: r0-w0,  16: r0-w0,  17: r0-w0,  18: r0-w0,  19: r0-w0,  20: r0-w14026,  21: r0-w19254,  22: r0-w0,  23: r0-w0,
...
2018-12-05 14:34:44,984 [salt.transport.ipc:541            ][INFO    ] stream size is : 40
2018-12-05 14:34:44,984 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w56690,  2: r0-w0,  3: r0-w0,  4: r0-w4318,  5: r0-w9625,  6: r0-w58725,  7: r0-w5953,  8: r0-w0,  9: r0-w0,  10: r0-w0,  11: r0-w78240,  12: r0-w80421,  13: r0-w0,  14: r0-w0,  15: r0-w91286,  16: r0-w0,  17: r0-w0,  18: r0-w0,  19: r0-w0,  20: r0-w0,  21: r0-w76226,  22: r0-w2000,  23: r0-w0,  24: r0-w0,  25: r0-w0,  26: r0-w0,  27: r0-w0,  28: r0-w0,  29: r0-w61730,  30: r0-w0, 31: r0-w0,  32: r0-w0,  33: r0-w0,  34: r0-w0,  35: r0-w0,  36: r0-w66958,  37: r0-w0,  38: r0-w0,  39: r0-w0,
...

New Behavior

The stream count and the read/write buffer sizes remain stable.

2018-12-05 14:42:00,618 [salt.transport.ipc:541            ][INFO    ] stream size is : 6
2018-12-05 14:42:00,618 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w0,  3: r0-w0,  4: r0-w0,  5: r0-w0,
...
2018-12-05 14:46:02,033 [salt.transport.ipc:541            ][INFO    ] stream size is : 6
2018-12-05 14:46:02,034 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w0,  3: r0-w0,  4: r0-w0,  5: r0-w0,
...
2018-12-05 14:50:40,155 [salt.transport.ipc:541            ][INFO    ] stream size is : 6
2018-12-05 14:50:40,155 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w0,  3: r0-w0,  4: r0-w0,  5: r0-w0,

Tests written?

No

Commits signed with GPG?

Yes

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@cachedout requested a review from dwoz on Dec 6, 2018

@cachedout (Collaborator) commented Dec 6, 2018

@dwoz This sounds similar to the leak you were chasing.

@kstreee (Contributor, Author) commented Dec 7, 2018

@cachedout @dwoz

FYI, the main cause of this memory leak is in LocalClient. For example, if the API uses self.saltclients as a singleton variable, it leaks as follows.

2018-12-05 14:29:37,702 [salt.transport.ipc:541            ][INFO    ] stream size is : 14
2018-12-05 14:29:37,702 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w72514,  3: r0-w60884,  4: r0-w886,  5: r0-w0,  6: r0-w0,  7: r0-w72514,  8: r0-w0,  9: r0-w0,  10: r0-w0,  11: r0-w56973,  12: r0-w0,  13: r0-w0, 
...
2018-12-05 14:29:38,694 [salt.transport.ipc:541            ][INFO    ] stream size is : 14
2018-12-05 14:29:38,694 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w75262,  3: r0-w63632,  4: r0-w3634,  5: r0-w0,  6: r0-w0,  7: r0-w75262,  8: r0-w0,  9: r0-w0,  10: r0-w1353,  11: r0-w59721,  12: r0-w0,  13: r0-w0,
...
2018-12-05 14:29:41,211 [salt.transport.ipc:541            ][INFO    ] stream size is : 14
2018-12-05 14:29:41,212 [salt.transport.ipc:545            ][INFO    ] 0: r0-w0,  1: r0-w0,  2: r0-w86874,  3: r0-w75244,  4: r0-w15246,  5: r0-w0,  6: r0-w0,  7: r0-w86874,  8: r0-w0,  9: r0-w0,  10: r0-w12965,  11: r0-w71333,  12: r0-w0,  13: r0-w0,
...

The stream count does not change, but the write buffer sizes keep increasing, because certain LocalClient usage patterns never consume the buffer.
I tried to fix LocalClient itself, but that was too risky and complex, so I just changed saltclients from a singleton to a local variable, which gets cleaned up in this case. Similar leaks may still be possible elsewhere due to misuse of LocalClient.
I hope this information helps you. :)
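For contrast, a minimal sketch of the leaky singleton pattern described above (the module-level variable and function names are hypothetical):

import salt.client

# Leaky pattern: one LocalClient shared for the whole process.
# The IPC connection it keeps to EventPublisher is never released,
# so the publisher's per-stream write buffer can grow without bound.
SALTCLIENT = salt.client.get_local_client()

def handle_request(tgt, fun):
    # Every request reuses the same long-lived client and connection.
    return SALTCLIENT.cmd(tgt, fun)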

@cachedout (Collaborator) commented Dec 10, 2018

I also think that @mattp- would like to look at this.

@cachedout merged commit e28b2ba into saltstack:2017.7 on Dec 13, 2018

9 of 10 checks passed

jenkins/pr/py2-windows-2016: The py2-windows-2016 job has failed
WIP: Ready for review
continuous-integration/jenkins/pr-merge: This commit looks good
jenkins/pr/docs: The docs job has passed
jenkins/pr/lint: Python lint test has passed
jenkins/pr/py2-centos-7: The py2-centos-7 job has passed
jenkins/pr/py2-ubuntu-1604: The py2-ubuntu-1604 job has passed
jenkins/pr/py3-centos-7: The py3-centos-7 job has passed
jenkins/pr/py3-ubuntu-1604: The py3-ubuntu-1604 job has passed
jenkins/pr/py3-windows-2016: The py3-windows-2016 job has passed
@kstreee (Contributor, Author) commented Dec 14, 2018

I figured out that this commit can raise redundant errors because of the cleaned-up saltclients. It isn't wrong behavior, but the error logs could confuse people.

For example:

  1. A client sends a request to salt-api.
  2. salt-api triggers the payload to a Salt minion and does some work asynchronously.
  3. The client closes the connection for some reason.
  4. salt-api cleans up the saltclients object via Tornado's on_finish callback (added by this PR).
  5. salt-api tries to do something using saltclients, and an error occurs because saltclients has already been cleaned up.

I don't think this is critical, but it keeps leaving error logs. To prevent the redundant error logs, we might add a condition check before using saltclients, e.g. if not hasattr(self, 'saltclients') or not self.saltclients: ...
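A minimal sketch of such a guard, assuming a handler method that may run after on_finish has already cleared saltclients (the method name and call convention are hypothetical):

def _run_with_saltclients(self, low):
    # on_finish() may already have cleared self.saltclients if the
    # client disconnected mid-job; bail out quietly instead of raising.
    if not hasattr(self, 'saltclients') or not self.saltclients:
        return None
    # 'local' and the call signature are illustrative assumptions.
    return self.saltclients['local'](low)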
