Dist binary memory leak in tcp_inet in Erlang OTP 24 and 25 #7834

Closed
nickva opened this issue Nov 7, 2023 · 8 comments
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@nickva
Contributor

nickva commented Nov 7, 2023

Describe the bug

There seems to be a binary memory leak when sending data from a file across a dist channel.

To Reproduce

Reading file data across a dist channel while the reading processes time out and exit early seems to leave binary memory allocated under the tcp_inet driver. Disconnecting the nodes doesn't seem to release the memory.

instrument:allocations().
{ok,{128,0, ...
       tcp_inet =>
           #{binary => {6,29,37,93,186,413,732,1543,3129,794203,0,0,0,0,0,0,0,0},
             driver_tid => {14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
             driver_tsd => {14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
             drv_binary => {0,0,0,1,0,0,0,0,0,37,0,0,0,0,0,0,0,0},
             drv_internal => {0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}}
}}
erlang:memory().
[{total,52375171672},
 {processes,43839592},
 {processes_used,43839336},
 {system,52331332080},
 {atom,491713},
 {atom_used,462311},
 {binary,52286491928},
 {code,8859086},
 {ets,523320}]
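
For watching these numbers over time, a small helper along the following lines may be convenient. This is only a sketch and not part of the reproducer: the module name tcp_inet_allocs and its functions are made up for illustration. It assumes the tools application (which provides the instrument module) is available and that driver and binary allocations are tagged, which is the default.

```erlang
-module(tcp_inet_allocs).
-export([snapshot/0, tcp_inet/1, total_blocks/1]).

%% Hypothetical helper, not taken from the issue or the gist.
%% Returns the tcp_inet allocation histograms together with the
%% total binary memory reported by erlang:memory/1.
snapshot() ->
    {ok, {_HistStart, _UnscannedSize, Allocations}} = instrument:allocations(),
    #{tcp_inet => tcp_inet(Allocations),
      binary_bytes => erlang:memory(binary)}.

%% Pick the tcp_inet origin out of the allocations map.
%% Returns an empty map if the origin is not present.
tcp_inet(Allocations) when is_map(Allocations) ->
    maps:get(tcp_inet, Allocations, #{}).

%% Sum the block counts over all size buckets of one histogram tuple,
%% e.g. the value stored under the binary or drv_binary key.
total_blocks(Histogram) when is_tuple(Histogram) ->
    lists:sum(tuple_to_list(Histogram)).
```

Calling tcp_inet_allocs:snapshot() every few seconds on the affected node and comparing the binary histogram totals should show whether the block count keeps growing after distblockleak:stop().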

This was detected when attempting to reproduce blocking behavior in #7801

The reproducer is in https://gist.github.com/nickva/31232b3ed9f57ce8f96b2e4ecfdec524

It was run on two nodes started on the same server.

$ erl -name otp252@127.0.0.1
> c(distblockleak).
% $ erl -name otp251@127.0.0.1
> c(distblockleak), distblockleak:go('otp252@127.0.0.1', "./junk.bin", 10000, 100, 200).
.... wait some time, taking care not to overrun memory completely ...
> distblockleak:stop().

Expected behavior

It's expected that the binary memory would not increase and leak without bounds. Some memory usage is expected but it should be released at least on distblockleak:stop() and when the nodes disconnect.

Affected versions

Latest OTP 24 and 25 versions.

Does not affect OTP 23 and OTP 26.

On OTP 23 the memory usage stays constant with the above reproducer.

...
process_info:
  > procs: 152 msec: 64
  > dist tcp_inet allocations: #{driver_tid =>
                                     {13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
                                 driver_tsd =>
                                     {13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
                                 drv_binary =>
                                     {0,0,0,0,0,0,0,0,0,2402,0,0,0,0,0,0,0,0},
                                 drv_internal =>
                                     {0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
(otp231@127.0.0.1)4> distblockleak:stop().

On OTP 26:

process_info:
  > procs: 155 msec: 49
  > dist tcp_inet allocations: #{drv_binary =>
                                     {0,0,0,0,0,0,0,0,0,2168,0,0,0,0,0,0,0,0},
                                 drv_internal =>
                                     {0,17,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
                                 driver_tid =>
                                     {17,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
                                 driver_tsd =>
                                     {17,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}

Additional context

Since the reproducer in https://gist.github.com/nickva/31232b3ed9f57ce8f96b2e4ecfdec524 was designed to reproduce the blocking issue in #7801, it may have additional/irrelevant parts for this issue.

@nickva nickva added the bug Issue is reported as a bug label Nov 7, 2023
@nickva nickva changed the title Dist binary memory leak in tcp_inet in Erlang OTP 24+ Dist binary memory leak in tcp_inet in Erlang OTP 24 and 25 Nov 7, 2023
@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label Nov 7, 2023
@nickva
Contributor Author

nickva commented Nov 27, 2023

Since this issue seems to happen on 24 and 25, but not on 23 and 26, and seemingly has to do with the dist channel, I wonder if it's related to #6947 and the pid references table, which was removed in 26.0: #6073.

@RaimoNiskanen
Contributor

RaimoNiskanen commented Nov 28, 2023

Sure sounds reasonable, I have just started to have a look...
Edit: @rickard-green has started to look...

@rickard-green
Contributor

@nickva Thanks, it was related to the removed pid-ref table!

PR #7915 should fix this issue. It has not been very well tested yet, but I'm quite sure it is correct. The PR is based on OTP 24.3.4 and should merge forward cleanly. Unless something unexpected happens, it should soon be released in patches.

@rickard-green rickard-green removed the team:PS Assigned to OTP team PS label Nov 29, 2023
@nickva
Contributor Author

nickva commented Nov 29, 2023

That's perfect. Thanks for taking a look and fixing it, Rickard!

@rickard-green
Contributor

When writing test cases I found that a crash could occur, so I've added a fix for that as well in #7915. The test cases made it impossible to have one branch for all releases, so the branch in #7915 is now based on OTP 26.1.2. There is also a rickard/frag-unaliased-leak/24/OTP-18885 branch based on OTP 24.3.4 (which does not merge cleanly to the top of maint-24) and a rickard/frag-unaliased-leak/25/OTP-18885 branch based on OTP 25.3.2 in my github repo; these will be used for patches on those releases.

@jaydoane

rickard/frag-unaliased-leak/25/OTP-18885 branch based on OTP 25.3.2 in my github repo which will be used for patches on those releases.

Thank you for the fixes, Rickard!

In order to test them ASAP, I patched OTP-25.3.2.7 with rickard/frag-unaliased-leak/25/OTP-18885:

❯ git log
commit d3c9eeb17042e5567337e1f87c924b25186daae0 (HEAD -> OTP-25.3.2.7-frag-unaliased-leak, tag: OTP-25.3.2.7, origin/maint-25, maint-25)
Author: Erlang/OTP <otp@erlang.org>
Date:   Thu Oct 12 09:53:34 2023 +0200

    Updated OTP version

❯ git cherry-pick d154a389cada12bed67e35c4875965d6a8933d2c

Auto-merging erts/emulator/beam/dist.c
Auto-merging erts/emulator/beam/erl_proc_sig_queue.c
Auto-merging erts/emulator/test/process_SUITE.erl
CONFLICT (content): Merge conflict in erts/emulator/test/process_SUITE.erl

error: could not apply d154a389ca... [erts] Fix memory leak of fragmented message to unaliased process

Fortunately, it seems like a simple conflict resolution, and the subsequent git cherry-pick d94af94dde8331fb5352a5a2dd0b7e62824e92ae applies cleanly after that.

Also, it looks like everything builds successfully after the patches are applied, so I think we can move forward with our testing 🎉

@rickard-green
Contributor

This issue has been fixed in OTP 26.2, OTP 25.3.2.8, and OTP 24.3.4.15

@nickva
Contributor Author

nickva commented Dec 19, 2023

Thank you, Rickard
