
Anomalous RAM use by TLSSocket #5469

Closed
Zarel opened this issue Feb 28, 2016 · 6 comments
Labels
memory, tls

Comments

@Zarel
Contributor

Zarel commented Feb 28, 2016

Hi, this is an issue that, again, only seems to occur under load.

I first found it here: #3072 (comment) – Further testing revealed that it's probably unrelated to #3072.

Summary: In my app (actually an online game, but for the purposes of this bug report it can be treated as a WebSocket chatroom app with HTTPS support), a single TLSSocket's write buffer is taking up hundreds of megabytes of RAM.

Approximately equal (non-anomalous) amounts of data are being written to every socket, but every once in a while, a few TLSSockets' write buffers will get backed up.

Details and screenshots to follow in comments.

My app is fairly often targeted by DoS attacks, so that is one possible explanation. Preliminary inspection suggests the sockets in question are legitimate, and consistent with the theory that a user disconnects and reconnects, but their first socket never gets properly closed; nothing conclusive, though.

  • Version: v5.7.0, v4.3.1 (bug occurs in both)
  • Platform: Linux smogon-sim 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: https?
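
To make the write-buffer symptom concrete, here is a minimal monitoring sketch (not from the original report; server, the 1 MB threshold, and the 30-second interval are all assumptions for illustration). It relies on bufferSize, which TLSSocket inherits from net.Socket and which reports data queued for writing but not yet flushed to the operating system.

// Sketch: track every TLS socket and periodically log the ones whose
// write buffers have backed up. `server` is assumed to be the app's
// https/tls server; the threshold and interval are arbitrary.
const sockets = new Set();

server.on('secureConnection', (socket) => {
  sockets.add(socket);
  socket.on('close', () => sockets.delete(socket));
});

setInterval(() => {
  for (const socket of sockets) {
    if (socket.bufferSize > 1024 * 1024) {
      console.warn('backed-up socket:', socket.remoteAddress,
        'buffered bytes:', socket.bufferSize);
    }
  }
}, 30 * 1000);

A socket that keeps showing up here with a growing bufferSize matches the behaviour described above: writes keep being queued, but nothing is draining to the peer.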
@Zarel
Contributor Author

Zarel commented Feb 28, 2016

[Screenshots of heap dumps illustrating the issue were attached here.]

I can send heapdump files to interested collaborators or Node.js Foundation members, but I don't want to publish them, since they're from a production server and could include data that my users consider sensitive/private.
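
For anyone wanting to capture similar snapshots, a small sketch of how a heap dump can be written from a running process using the third-party heapdump package (an assumption here; any V8 heap-snapshot tool would do). The resulting .heapsnapshot file can be loaded into the Memory tab of Chrome DevTools.

// Sketch: write a V8 heap snapshot from a running process using the
// third-party `heapdump` module (npm install heapdump). The output
// path is an arbitrary placeholder.
const heapdump = require('heapdump');

function dumpHeap() {
  const file = '/tmp/heap-' + Date.now() + '.heapsnapshot';
  heapdump.writeSnapshot(file, (err, filename) => {
    if (err) {
      console.error('heapdump failed:', err);
    } else {
      console.log('heap snapshot written to', filename);
    }
  });
}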

@Zarel
Contributor Author

Zarel commented Feb 28, 2016

The RAM usage has caused an out of memory crash in the past:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

<--- Last few GCs --->

93688918 ms: Mark-sweep 863.5 (1434.3) -> 855.0 (1434.3) MB, 902.8 / 0 ms [allocation failure] [GC in old space requested].
93689692 ms: Mark-sweep 855.0 (1434.3) -> 855.0 (1434.3) MB, 773.9 / 0 ms [allocation failure] [GC in old space requested].
93690476 ms: Mark-sweep 855.0 (1434.3) -> 855.0 (1434.3) MB, 783.4 / 0 ms [last resort gc].
93691234 ms: Mark-sweep 855.0 (1434.3) -> 854.3 (1434.3) MB, 757.9 / 0 ms [last resort gc].


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x14a012bd4871 <JS Object>
    1: stringify [native json.js:159] [pc=0x2657a83dadc4] (this=0x14a012bcf9e9 <a JSON with map 0x3cc85d70ab39>,u=0x26a63e761fa9 <Very long string[7262]>,v=0x14a012b04189 <undefined>,I=0x14a012b04189 <undefined>)
    2: arguments adaptor frame: 1->3
    3: _send [internal/child_process.js:595] [pc=0x2657ab81d891] (this=0x4bd14231a79 <a ChildProcess with map 0x30e91dc815c1>,message=0x26a63e761fa...

@Zarel
Contributor Author

Zarel commented Feb 28, 2016

In the other thread, @indutny wrote:

I think what could help is a core dump during one of these OOM crashes. I should be able to inspect stuff more closely after that.

Unfortunately, I've only been able to reproduce this in production, and it only happens after the server has become laggier and laggier (as GC pauses get longer and longer), which is an experience I wouldn't want to intentionally inflict on my users (hundreds of thousands daily, millions monthly).

@Zarel
Contributor Author

Zarel commented Feb 28, 2016

The best way to work around this issue in the short term would be to have some way to query the size of the write buffer, so I can detect this condition and stop writing to the socket before its write buffer gets too huge.

Is that something that can be done?
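
For illustration, here is a rough sketch of what such a guard could look like with what already exists: bufferSize (inherited from net.Socket) reports the amount of data buffered for writing, and write() itself returns false once the internal buffer exceeds the stream's highWaterMark. The helper name and the threshold are placeholders, not an API proposal.

// Sketch: refuse to write (and drop the connection) once a socket's
// write buffer grows past a limit. MAX_BUFFERED is arbitrary.
const MAX_BUFFERED = 1024 * 1024; // 1 MB

function safeSend(socket, data) {
  if (socket.bufferSize > MAX_BUFFERED) {
    // The peer is presumably gone or stalled; stop buffering for it.
    socket.destroy();
    return false;
  }
  // write() returns false when the data had to be buffered beyond the
  // stream's highWaterMark; callers could also pause until 'drain' fires.
  return socket.write(data);
}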

@ChALkeR added the memory and tls labels Feb 28, 2016
@Zarel
Contributor Author

Zarel commented Mar 2, 2016

edit: nevermind, this appears to be an unrelated bug

@Zarel
Contributor Author

Zarel commented Apr 1, 2016

#5713 seems to have fixed this.

@Zarel closed this as completed Apr 1, 2016