Memory growth in Channel usage #2803

Open · drclaw1394 opened this issue Mar 30, 2019 · 7 comments

Labels: exotictest, leak, tests needed (Issue is generally resolved but tests were not written yet)

Comments

@drclaw1394

The Problem

This is related to my SO question https://stackoverflow.com/questions/55352666/huge-memory-usage-with-circular-pipeline, regarding rapidly growing memory usage for a circular pipeline.

My 'circular' pipeline/queue reacts to a message from a Channel and places a new message onto the same Channel. The process is bootstrapped by sending a single message onto the Channel before the react block.

The problem is that memory usage is not stable and continues to grow quickly.

Expected Behavior

There is only ever one message in the Channel at any one time. As such, the buffer of messages should not grow, and the memory usage of the program should be stable.
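
To make that concrete, the at-most-one-pending-message claim can be checked directly with Channel.poll. This is just an illustrative sketch, not part of the original report:

my $channel = Channel.new;
$channel.send(0);
for ^10 {
    my $got = $channel.poll;                           # take the single pending message
    die "more than one message queued" with $channel.poll;
    $channel.send($got + 1);                           # put the next one back
}
say "never more than one pending message";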

Actual Behavior

Memory grows quickly and does not stabilise. Lots of allocations and garbage collection activity.

Steps to Reproduce

  1. Run the following code:
my $channel=Channel.new;    #create a new channel
$channel.send(0);           #kickstart the circular pipeline
react {
    whenever $channel {
        say $_;
        $channel.send($_ + 1); #send back to same pipeline
        #sit back and watch the memory usage grow
    }
}
  2. Monitor the memory usage of the moar process with the top command

Environment

  • Operating system: macOS 10.14.3
  • Compiler version (perl6 -v): Rakudo version 2019.03 built on MoarVM version 2019.03

@Kaiepi (Contributor) commented Mar 30, 2019

I ran ktrace on this code on OpenBSD and saw a lot of mquery syscalls failing with EINVAL, which, according to mquery(2), means:

     [EINVAL]           MAP_FIXED was specified and the requested memory area
                        is unavailable.

MAP_FIXED is an mmap flag, described in mmap(2) as:

           MAP_FIXED      Demand that the mapping is placed at addr, rather
                          than having the system select a location.  addr, len
                          and offset (in the case of fd mappings) must be
                          multiples of the page size.  Existing mappings in
                          the address range will be replaced.  Use of this
                          option is discouraged.

This leak also happens with Supplier.
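
The Supplier variant isn't shown here; a rough reconstruction of the same self-feeding shape (a sketch only, not the exact code that was tested; it uses a LEAVE bootstrap because a live Supplier drops values emitted before any tap exists) looks like this:

my $supplier = Supplier.new;
react {
    whenever $supplier.Supply {
        say $_;
        $supplier.emit($_ + 1);   # feed the next value straight back in
    }
    LEAVE $supplier.emit(0);      # bootstrap once the whenever tap exists
}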

For the record, you can get more accurate memory usage results using the Telemetry module:

use Telemetry;

my Channel   $channel .= new;
my atomicint $count    = 0;

react {
    whenever $channel {
        $channel.send: $_ + 1;
        say T<max-rss> / 1024 ~ 'MB max-rss' if ++$count %% 5000;
    }
    LEAVE $channel.send: 0;
}

@jnthn (Member) commented Apr 5, 2019

Starting to investigate this. This construct does not leak:

for $channel.Seq {
    say $_;
    $channel.send($_ + 1);
}

Which means it isn't something to do with the underlying blocking queue. By contrast, this does leak:

$channel.Supply.tap: {
    say $_;
    $channel.send($_ + 1); #send back to same pipeline
}
sleep;

Meaning the react/whenever construct's implementation isn't to blame.

@lizmat (Contributor) commented Apr 5, 2019

for $channel.Seq {
    say $_;
    $channel.send($_ + 1);
}

Maybe this code doesn't leak because it's not doing anything? I don't see any output from it, and a run with -Msnapper shows it's not doing any CPU either (except for the .15% for running the snapper, I guess):

Telemetry Report of Process #73156 (2019-04-05T13:36:04Z)
Number of Snapshots: 9
Initial/Final Size: 61472 / 85328 Kbytes
Total Time: 0.76 seconds
Total CPU Usage: 0.11 seconds
Supervisor thread ran the whole time

wallclock  util%  max-rss
   100717  12.48    23704
   100162   0.05       20
   104154   0.10       16
   104579   0.16       20
   100699   0.19       24
   103011   0.15       20
   104362   0.18       20
    43734   0.54       32
--------- ------ --------
   761418   1.80    23856

@jnthn (Member) commented Apr 5, 2019

@lizmat Only if you forget to include the initial send, I think? :)

Looking further into it: the queue in Lock::Async::Holder seems to become ever longer, keeping things alive. Now to figure out exactly why.

@jnthn (Member) commented Apr 5, 2019

Ah, I shoulda spotted this much sooner. Here's how Channel.Supply is implemented:

    multi method Supply(Channel:D:) {
        supply {
            # Tap the async notification for new values supply.
            whenever $!async-notify.unsanitized-supply.schedule-on($*SCHEDULER) {
                my Mu \got = self.poll;
                if nqp::eqaddr(got, Nil) {
                    if $!closed_promise {
                        $!closed_promise.status == Kept
                            ?? done()
                            !! die $!closed_promise.cause
                    }
                }
                else {
                    emit got;
                }
            }

            # Grab anything that's in the channel and emit it. Note that
            # it's important to do this after tapping the supply, or a
            # value sent between us draining it and doing the tap would
            # not result in a notification, and so we'd not emit it on
            # the supply. This lost event can then cause a deadlock.
            loop {
                my Mu \got = self.poll;
                last if nqp::eqaddr(got, Nil);
                emit got;
            }
            self!peek();
            if $!closed_promise {
                $!closed_promise.status == Kept
                    ?? done()
                    !! die $!closed_promise.cause
            }
        }
    }

It correctly avoids the potential race/data-loss bug, as the comments note; however, that leads to a leak in the situation in this bug report. Each time around the loop something gets added to the channel, so we never relinquish control. At the same time, having subscribed to the async notifications, we also get one for each item that is sent. Those notifications can't be handled right away, since the supply block is still busy running its initialization logic. Thus they pile up in the scheduler, and that is what creates the runaway memory use.

So the ingredients are Channel.Supply being used, and the thing that tapped it always sending something else on the Channel synchronously with the handling.
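
Given that diagnosis, one user-level way to sidestep both ingredients (a sketch of the idea only, not a fix and not something verified in this thread) is to avoid sending synchronously from inside the handler, for example by cueing the send on the scheduler:

my $channel = Channel.new;
$channel.send(0);
react {
    whenever $channel {
        say $_;
        my $next = $_ + 1;
        # Cue the send on the thread pool instead of sending synchronously,
        # so the handler returns before the next value is in the channel.
        $*SCHEDULER.cue: { $channel.send($next) };
    }
}

With the send deferred, the setup loop in Channel.Supply can drain the channel and complete, rather than finding a new value on every poll.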

jnthn added a commit that referenced this issue Apr 5, 2019
If the code that `tap`s the Supply then does a `send` on the Channel,
then that will happen synchronously with the loop pulling from the
Channel. The result is that the loop never ends. However, to avoid
races in other situations, we subscribe to the async notifications of
new items in the Channel before starting that loop. Since we cannot
process any `whenever` events until setup logic is complete, this will
cause those events to queue up, causing memory growth.

This helps with #2803.

jnthn added a commit to MoarVM/MoarVM that referenced this issue Apr 5, 2019
We need it for backtraces and context introspection. Otherwise, it can
go. Preserving it can cause leaks involving taken closures: the taking
of the closure has an ->outer frame that in turn points to a ->caller
and keeps it alive. This was involved in the leak that was reported in
rakudo/rakudo#2803.

@jnthn (Member) commented Apr 8, 2019

With the two fixes above, and having run it for around 5 minutes to observe the memory behavior, it seems that it now reaches a plateau and stays there. So, hopefully resolved.

jnthn added the exotictest and tests needed (Issue is generally resolved but tests were not written yet) labels on Apr 8, 2019

@JJ (Collaborator) commented Jul 26, 2020

Please include tests for Supplier.unsanitized-supply, which is not tested here or in roast.
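
As a rough illustration of what such a test could look like (a sketch only, not an existing roast test), something along these lines would at least exercise the method:

use Test;

my $supplier = Supplier.new;
my @seen;
$supplier.unsanitized-supply.tap: { @seen.push($_) };
$supplier.emit($_) for 1, 2, 3;
is-deeply @seen, [1, 2, 3], 'unsanitized-supply delivers emitted values in order';
done-testing;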
