Memory growth in Channel usage #2803

Open · drclaw1394 opened this issue Mar 30, 2019 · 7 comments

Labels: exotictest, leak, tests needed (Issue is generally resolved but tests were not written yet)

Comments

@drclaw1394

The Problem

This is related to my SO question https://stackoverflow.com/questions/55352666/huge-memory-usage-with-circular-pipeline, regarding rapidly growing memory usage for a circular pipeline.

My 'circular' pipeline/queue reacts to a message from a Channel and places a new message onto the same Channel. The process is bootstrapped by sending a single message onto the Channel before the react block.

The problem is that memory usage is not stable and continues to grow quickly.

Expected Behavior

There is only ever one message in the Channel at any one time. As such, the buffer of messages should not grow, and the memory usage of the program should be stable.
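
To make that concrete, the at-most-one-pending-message claim can be checked directly with Channel.poll. This is just an illustrative sketch, not part of the original report:

my $channel = Channel.new;
$channel.send(0);
for ^10 {
    my $got = $channel.poll;                           # take the single pending message
    die "more than one message queued" with $channel.poll;
    $channel.send($got + 1);                           # put the next one back
}
say "never more than one pending message";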

Actual Behavior

Memory grows quickly and does not stabilise. Lots of allocations and garbage collection activity.

Steps to Reproduce

  1. Run the following code:
my $channel=Channel.new;    #create a new channel
$channel.send(0);           #kickstart the circular pipeline
react {
    whenever $channel {
        say $_;
        $channel.send($_ + 1); #send back to same pipeline
        #sit back and watch the memory usage grow
    }
}
  2. Monitor the memory usage of the moar process with the top command

Environment

  • Operating system: macOS 10.14.3
  • Compiler version (perl6 -v): Rakudo version 2019.03 built on MoarVM version 2019.03

@Kaiepi (Contributor) commented Mar 30, 2019

I ran ktrace on this code on OpenBSD and saw a lot of mquery syscalls failing with EINVAL, which, according to mquery(2), means:

     [EINVAL]           MAP_FIXED was specified and the requested memory area
                        is unavailable.

MAP_FIXED is an mmap flag, described in mmap(2) as:

           MAP_FIXED      Demand that the mapping is placed at addr, rather
                          than having the system select a location.  addr, len
                          and offset (in the case of fd mappings) must be
                          multiples of the page size.  Existing mappings in
                          the address range will be replaced.  Use of this
                          option is discouraged.

This leak also happens with Supplier.
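
The Supplier variant isn't shown here; a rough reconstruction of the same self-feeding shape (a sketch only, not the exact code that was tested; it uses a LEAVE bootstrap because a live Supplier drops values emitted before any tap exists) looks like this:

my $supplier = Supplier.new;
react {
    whenever $supplier.Supply {
        say $_;
        $supplier.emit($_ + 1);   # feed the next value straight back in
    }
    LEAVE $supplier.emit(0);      # bootstrap once the whenever tap exists
}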

For the record, you can get more accurate memory usage results using the Telemetry module:

use Telemetry;

my Channel   $channel .= new;
my atomicint $count    = 0;

react {
    whenever $channel {
        $channel.send: $_ + 1;
        say T<max-rss> / 1024 ~ 'MB max-rss' if ++$count %% 5000;
    }
    LEAVE $channel.send: 0;
}

@jnthn (Member) commented Apr 5, 2019

Starting to investigate this. This construct does not leak:

for $channel.Seq {
    say $_;
    $channel.send($_ + 1);
}

Which means it isn't something to do with the underlying blocking queue. By contrast, this does leak:

$channel.Supply.tap: {
    say $_;
    $channel.send($_ + 1); #send back to same pipeline
}
sleep;

Meaning the react/whenever construct's implementation isn't to blame.

@lizmat (Contributor) commented Apr 5, 2019

for $channel.Seq {
    say $_;
    $channel.send($_ + 1);
}

Maybe this code doesn't leak because it's not doing anything? I don't see any output from it, and a run with -Msnapper shows it's not doing any CPU either (except for the .15% for running the snapper, I guess):

Telemetry Report of Process #73156 (2019-04-05T13:36:04Z)
Number of Snapshots: 9
Initial/Final Size: 61472 / 85328 Kbytes
Total Time: 0.76 seconds
Total CPU Usage: 0.11 seconds
Supervisor thread ran the whole time

wallclock  util%  max-rss
   100717  12.48    23704
   100162   0.05       20
   104154   0.10       16
   104579   0.16       20
   100699   0.19       24
   103011   0.15       20
   104362   0.18       20
    43734   0.54       32
--------- ------ --------
   761418   1.80    23856

@jnthn (Member) commented Apr 5, 2019

@lizmat Only if you forget to include the initial send, I think? :)

Looking further into it: the queue in Lock::Async::Holder seems to become ever longer, keeping things alive. Now to figure out exactly why.

@jnthn (Member) commented Apr 5, 2019

Ah, I shoulda spotted this much sooner. Here's how Channel.Supply is implemented:

    multi method Supply(Channel:D:) {
        supply {
            # Tap the async notification for new values supply.
            whenever $!async-notify.unsanitized-supply.schedule-on($*SCHEDULER) {
                my Mu \got = self.poll;
                if nqp::eqaddr(got, Nil) {
                    if $!closed_promise {
                        $!closed_promise.status == Kept
                            ?? done()
                            !! die $!closed_promise.cause
                    }
                }
                else {
                    emit got;
                }
            }

            # Grab anything that's in the channel and emit it. Note that
            # it's important to do this after tapping the supply, or a
            # value sent between us draining it and doing the tap would
            # not result in a notification, and so we'd not emit it on
            # the supply. This lost event can then cause a deadlock.
            loop {
                my Mu \got = self.poll;
                last if nqp::eqaddr(got, Nil);
                emit got;
            }
            self!peek();
            if $!closed_promise {
                $!closed_promise.status == Kept
                    ?? done()
                    !! die $!closed_promise.cause
            }
        }
    }

It correctly avoids the potential race/data-loss bug, as the comments note; however, that leads to a leak in the situation in this bug report. Each time around the loop something gets added to the channel, so we never relinquish control. At the same time, having subscribed to the async notifications, we also get one for each item that is sent. Those notifications can't be handled right away, since the supply block is still busy running its initialization logic. Thus they pile up in the scheduler, and that is what creates the runaway memory use.

So the ingredients are Channel.Supply being used, and the thing that tapped it always sending something else on the Channel synchronously with the handling.
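
Given that diagnosis, one user-level way to sidestep both ingredients (a sketch of the idea only, not a fix and not something verified in this thread) is to avoid sending synchronously from inside the handler, for example by cueing the send on the scheduler:

my $channel = Channel.new;
$channel.send(0);
react {
    whenever $channel {
        say $_;
        my $next = $_ + 1;
        # Cue the send on the thread pool instead of sending synchronously,
        # so the handler returns before the next value is in the channel.
        $*SCHEDULER.cue: { $channel.send($next) };
    }
}

With the send deferred, the setup loop in Channel.Supply can drain the channel and complete, rather than finding a new value on every poll.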

jnthn added a commit that referenced this issue Apr 5, 2019
If the code that `tap`s the Supply then does a `send` on the Channel,
then that will happen synchronously with the loop pulling from the
Channel. The result is that the loop never ends. However, to avoid
races in other situations, we subscribe to the async notifications of
new items in the Channel before starting that loop. Since we cannot
process any `whenever` events until setup logic is complete, this will
cause those events to queue up, causing memory growth.

This helps with #2803.

jnthn added a commit to MoarVM/MoarVM that referenced this issue Apr 5, 2019
We need it for backtraces and context introspection. Otherwise, it can
go. Preserving it can cause leaks involving taken closures: the taking
of the closure has an ->outer frame that in turn points to a ->caller
and keeps it alive. This was involved in the leak that was reported in
rakudo/rakudo#2803.

@jnthn (Member) commented Apr 8, 2019

With the two fixes above, and having run it for around 5 minutes to observe the memory behavior, it seems that it now reaches a plateau and stays there. So, hopefully resolved.

jnthn added the exotictest and tests needed (Issue is generally resolved but tests were not written yet) labels on Apr 8, 2019

@JJ (Collaborator) commented Jul 26, 2020

Please include tests for Supplier.unsanitized-supply, which is not tested here or in roast.
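
As a rough illustration of what such a test could look like (a sketch only, not an existing roast test), something along these lines would at least exercise the method:

use Test;

my $supplier = Supplier.new;
my @seen;
$supplier.unsanitized-supply.tap: { @seen.push($_) };
$supplier.emit($_) for 1, 2, 3;
is-deeply @seen, [1, 2, 3], 'unsanitized-supply delivers emitted values in order';
done-testing;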
