Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive memory usage after upgrading to Sympa 6.2.62 #1170

Closed
drkspc opened this issue May 11, 2021 · 56 comments · Fixed by #1172
Closed

Excessive memory usage after upgrading to Sympa 6.2.62 #1170

drkspc opened this issue May 11, 2021 · 56 comments · Fixed by #1172
Assignees

Comments

@drkspc
Copy link

drkspc commented May 11, 2021

After upgrading to Sympa 6.2.62 we observed a drastic increase in memory usage by Sympa processes, in particular (but not limited to) task_manager.pl and sympa_msg.pl. On our production server memory usage increased by roughly 1GB per minute, resulting in things crashing after some time. On a test system (no postfix running, so no incoming mail traffic) I can observe this behaviour, too, obviously limited to task_manager.pl, though.

Version

Sympa 6.2.62, Perl 5.32.1, Jails on FreeBSD 12.2

Installation method

FreeBSD ports, packaged locally with poudriere

Expected behavior

Memory usage of Sympa processes stays reasoble, allowing the software to run over an extented period of time.

Actual behavior

Memory usage increases rapidly, crashing Sympa (and eventually other services) due to the server running out of memory.

Additional information

After some fiddling around I reverted commit 6371f6c and got back to a more usual behaviour. Memory usage still isn't great, but we can leave Sympa running again. I suspect the actual problem is not introduced by that commit itself, but just shows up much more clearly. Also, the named daemons would have grown very large after a long enough runtime (multiple weeks) with previous versions, too.

Sympa processes on our test system after running for about an hour (6371f6c applied):

... $ ps aux | grep sympa
sympa  86737  4.6  9.8 13072372 13074300  0  SJ   13:43   7:07.60 /usr/local/bin/perl /usr/local/libexec/sympa/task_manager.pl
root    7484  0.0  0.0    11296     2276  0  S+J  14:45   0:00.00 grep sympa
sympa  86703  0.0  0.1   161472   150136  0  SJ   13:43   0:02.00 /usr/local/bin/perl /usr/local/libexec/sympa/sympa_msg.pl
sympa  86707  0.0  0.1   112240    99264  0  SJ   13:43   0:02.35 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa  86710  0.0  0.1    97168    83860  0  SJ   13:43   0:00.13 /usr/local/bin/perl /usr/local/libexec/sympa/archived.pl
sympa  86715  0.0  0.1    95964    82712  0  SJ   13:43   0:00.12 /usr/local/bin/perl /usr/local/libexec/sympa/bounced.pl

Sympa processes on our test system after running for about an hour (6371f6c reverted):

... $ ps aux | grep sympa
sympa   9264  1.1  0.2 262748 247796  0  SJ   14:50   2:27.85 /usr/local/bin/perl /usr/local/libexec/sympa/task_manager.pl
sympa   9209  0.0  0.1  96804  83636  0  SJ   14:50   0:00.33 /usr/local/bin/perl /usr/local/libexec/sympa/sympa_msg.pl
sympa   9231  0.0  0.1  95728  82256  0  SJ   14:50   0:02.28 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa   9236  0.0  0.1  97172  83996  0  SJ   14:50   0:00.13 /usr/local/bin/perl /usr/local/libexec/sympa/archived.pl
sympa   9239  0.0  0.1  96264  82860  0  SJ   14:50   0:00.13 /usr/local/bin/perl /usr/local/libexec/sympa/bounced.pl
root   30204  0.0  0.0  11296   2276  0  S+J  15:54   0:00.00 grep sympa

The test system is set up identically to the production system and runs on copies of the production database and list home directory.

Please let me know if you need any other information.

@ikedas
Copy link
Member

ikedas commented May 11, 2021

Hi @drkspc ,

From what version you updated Sympa?

@drkspc
Copy link
Author

drkspc commented May 11, 2021

Hi @ikedas ,

we upgraded from 6.2.60.

@racke
Copy link
Contributor

racke commented May 11, 2021

That is really odd. How many lists/subscribers do you have? Do you use the list inclusions? Do you have merge feature enabled? Can you provide an excerpt of the task manager logs?

@drkspc
Copy link
Author

drkspc commented May 11, 2021

We have

  • around 3000 open lists
  • 458800 entries in the subscriber table
  • the largests lists have around 34000/35000 subscribers
  • around 70 lists fed by SQL queries (some of them have around 5000 subscribers, the 35000 one is also fed via SQL)
  • 172 list includes done in 45 lists
  • merge/personalization_feature allowed

I attached an anonymized excerpt of the task_manager log from the first run I mentioned above. I did not spot anything different over the rest of the log (the full hour makes up almost 9000 lines). If you have something particular to search for, I'll try to find and post that.
task_manager_excerpt1.log

The errors about badly formed mail addresses are correct in the sense that there are actually garbage entries in one the inclusion sources. While these were already present before the upgrade, I will try to have them fixed.

@ikedas
Copy link
Member

ikedas commented May 13, 2021

Hi @drkspc ,

Could you please apply this patch and check if the problem will be solved? Thanks.

@drkspc
Copy link
Author

drkspc commented May 13, 2021

Hey @ikedas ,

thanks for the patch! I will be back at work on monday and do some testing then.

@drkspc
Copy link
Author

drkspc commented May 17, 2021

@ikedas I applied your patch now and gave it a test run. Unfortunately, memory consumption doesn't look that much better yet:

... $ ps aux | grep sympa
sympa  13350  8.4  7.6 10221988 10200184  1  SJ   08:23   5:36.63 /usr/local/bin/perl /usr/local/libexec/sympa/task_manager.pl
sympa  13333  0.0  0.1   157768   146264  1  SJ   08:23   0:01.84 /usr/local/bin/perl /usr/local/libexec/sympa/sympa_msg.pl
sympa  13337  0.0  0.1    95556    82160  1  SJ   08:23   0:01.03 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa  13340  0.0  0.1    97164    84020  1  SJ   08:23   0:00.11 /usr/local/bin/perl /usr/local/libexec/sympa/archived.pl
sympa  13347  0.0  0.1    96248    82884  1  SJ   08:23   0:00.11 /usr/local/bin/perl /usr/local/libexec/sympa/bounced.pl
root   31661  0.0  0.0    11148     2268  1  R+J  09:21   0:00.00 grep sympa

Here is another excerpt from the from task_manager log.

@ikedas
Copy link
Member

ikedas commented May 17, 2021

Maybe my patch wasn't related to the behavior you reported.

Can you determine which list(s) are consuming the memory (Or, each list is consuming the memoey) ?

@drkspc
Copy link
Author

drkspc commented May 17, 2021

Okay, I did some more testing now, still with your patch applied. What I did:

  • remove all includes from all lists so that the task_manager does not need to process any include
  • run Sympa (except for start/stop messages, the log is empty here)
  • stop task_manager.pl and run it manually with /usr/local/libexec/sympa/task_manager.pl --foreground --log_level=2
    This gives me (very detailed) info about regular tasks and after that
debug2 Sympa::List::get_robots() 
debug2 Sympa::List::get_lists(lists.my.domain, )

followed by

debug2 Sympa::List::_load_edit_list_conf(Sympa::List <***@lists.my.domain>)
debug2 Sympa::Robot::load_topics(lists.my.domain)

once for every list -- so like 3000 times. While these are running through, I can watch task_manager.pl blowing up in top. Does that help?

@ikedas
Copy link
Member

ikedas commented May 18, 2021

  • Is the memory consumed by particular list(s) or every lists?
  • What are the contents of edit_list.conf and topic.conf?

@drkspc
Copy link
Author

drkspc commented May 18, 2021

Here are our edit_list.conf and topics.conf (we only have a single list domain).

From what I can see, memory consumption is not bound to any particular list(s) but rather results from the number of lists. As soon as I purge some lists (closed ones still have an impact), task_manager's memory footprint goes down. For this, I purged around 500 lists in three iterations and after each purge run and the next call to get_lists(), memory consumption went down in relation to the number of lists deleted.

Loading edit_list.conf and topics.conf for every list seems very odd to me, as these are specific to robots and not to individual lists (correct me, if I am overlooking something). For how we use them, there is also not really a need to reload them, as we restart Sympa anyway after changing something (which hardly ever happens). I see that there has already been some discussion on that in #1090, though. Also, reloading these files to recognize changes done via wwsympa does seem like a good idea. The current behaviour would even be ok (not great, but ok), if allocated memory was freed after reloading the configuration the next time.

Any thougts/ideas?

@ikedas
Copy link
Member

ikedas commented May 18, 2021

N.B.: I suppose this issue is another instance of #584 .

@ikedas
Copy link
Member

ikedas commented May 18, 2021

From what I can see, memory consumption is not bound to any particular list(s) but rather results from the number of lists. As soon as I purge some lists (closed ones still have an impact), task_manager's memory footprint goes down. For this, I purged around 500 lists in three iterations and after each purge run and the next call to get_lists(), memory consumption went down in relation to the number of lists deleted.

I understand the problem, but I couldn't reproduce such problem by my Sympa installation with more than 1000 lists.

Loading edit_list.conf and topics.conf for every list seems very odd to me, as these are specific to robots and not to individual lists (correct me, if I am overlooking something). For how we use them, there is also not really a need to reload them, as we restart Sympa anyway after changing something (which hardly ever happens). I see that there has already been some discussion on that in #1090, though. Also, reloading these files to recognize changes done via wwsympa does seem like a good idea. The current behaviour would even be ok (not great, but ok), if allocated memory was freed after reloading the configuration the next time.

My patch changed edit_list.conf to be loaded only once, and topics.conf has been treated as such.

To be sure, could you please try undoing the customization of topics.conf?

Any thougts/ideas?

@racke
Copy link
Contributor

racke commented May 18, 2021

IMHO we need to profile the task manager to get to the bottom of the problem.

Suggestions:

  • Use Devel::Cycle to detect circular references which are causing memory leaks
  • Use Devel::Gladiator or related modules inside the infinite loop $spindle->spin to determine memory usage
  • Use Devel::NYTProf which doesn't profile memory but is great tool to determine hotspots in the source code

Going to give it a try when I have some spare time.

@drkspc
Copy link
Author

drkspc commented May 18, 2021

For reference, do you also get

debug2 Sympa::List::_load_edit_list_conf(Sympa::List <somelist@lists.my.domain>)
debug2 Sympa::Robot::load_topics(lists.my.domain)

for each of your lists? And how much memory does your task_manager need after running get_lists() for the first time (a minute after startup or so)? I am at ~1.3 GB with now ~2600 lists.

I moved both edit_list.conf and topics.conf out of place so that the defaults should be used, but that did not change anything. To be sure, my /usr/local/libexec/sympa/Sympa/List.pm.

@ikedas ikedas removed this from the 6.2.64 milestone May 21, 2021
@ghalse
Copy link

ghalse commented May 24, 2021

I just wanted to mention that we see similar behaviour with Sympa 6.2.62 on FreeBSD 12.2.

In our case, sympa_msg.pl is consuming memory and memory use ramps up over a period of several hours or days before sympa_msg.pl eventually dies. It seems that the eventual (and currently inevitable) demise of sympa_msg.pl is related to the scheduler trying to swap it out. This is new behaviour that didn't exist in 6.2.60.

However, unlike what's here, our installation is very small -- we only have about 50 lists, and very low traffic.

@ikedas
Copy link
Member

ikedas commented May 24, 2021

Hi @ghalse,

  • What is the installation method?
  • Is the memory consumed by particular list(s) or every list?
  • When the consumption of memory increases? Can you find any special events in the log?

@ghalse
Copy link

ghalse commented May 24, 2021

  • What is the installation method?

FreeBSD pkg

  • Is the memory consumed by particular list(s) or every list?

I'm not sure how to tell that.

  • When the consumption of memory increases? Can you find any special events in the log?

In the sympa logs see a regular repeatition of this pattern of tasks on an otherwise idle Sympa:

May 23 04:03:53 lists task_manager[729]: notice Sympa::Spindle::ProcessTask::do_next() --> new task Sympa::Task <date=1621739033;label=INIT;model=sync_include;context=list-re-10@example.ac.za>
May 23 04:03:53 lists task_manager[729]: notice Sympa::Spindle::ProcessTask::_execute() The task Sympa::Task <date=1621735423;label=INIT;model=sync_include;context=list-re-10@example.ac.za> is now useless. Removing it
May 23 05:04:02 lists task_manager[729]: notice Sympa::Spindle::ProcessTask::_execute() Running task Sympa::Task <date=1621739033;label=INIT;model=sync_include;context=list-re-10@example.ac.za>
May 23 05:04:02 lists task_manager[729]: notice Sympa::Spindle::ProcessRequest::_twist() Processing Sympa::Request <action=include;context=list-re-10@example.ac.za;role=member>
May 23 05:04:02 lists task_manager[729]: info Sympa::Request::Handler::include::_twist() Sympa::DataSource::List <context=list-re-10@example.ac.za;id=2421b4a5;role=member;name=Gauteng>: 0 included, 0 deleted, 1 updated, 0 kept
May 23 05:04:02 lists task_manager[729]: info Sympa::Request::Handler::include::_twist() Sympa::Request <action=include;context=list-re-10@example.ac.za;role=member>: Success, 0 added, 0 deleted, 1 updated
May 23 05:04:02 lists task_manager[729]: notice Sympa::Spindle::ProcessTask::do_next() line 5 of sync_include: next(1621742642, INIT)
May 23 05:04:02 lists task_manager[729]: notice Sympa::Spool::store() Sympa::Task <date=1621742642;label=INIT;model=sync_include;context=list-re-10@example.ac.za> is stored into Sympa::Spool::Task as <1621742642.INIT.sync_include.list-re
-10@example.ac.za>
May 23 05:04:02 lists task_manager[729]: notice Sympa::Spindle::ProcessTask::do_next() --> new task Sympa::Task <date=1621742642;label=INIT;model=sync_include;context=list-re-10@example.ac.za>
May 23 05:04:02 lists task_manager[729]: notice Sympa::Spindle::ProcessTask::_execute() The task Sympa::Task <date=1621739033;label=INIT;model=sync_include;context=list-re-10@example.ac.za> is now useless. Removing it

What's particularly interesting about the above is that the list in question was part of a set of lists created some time ago to test list inclusion, and only has a handful of subscribers. However, it always shows the the source as 1 updated when there have been absolutely no changes to that source list in about six months (and indeed, the source list only has a single subscriber).

@racke
Copy link
Contributor

racke commented May 24, 2021

So the problem was probably introduced by some changes between 6.2.60 and 6.2.62 and seems to be related to list inclusions.

@drkspc @ghalse Can you please provide more details about inclusion type and configuration?

@ikedas
Copy link
Member

ikedas commented May 24, 2021

@ghalse , on your environment, sympa_msg is consuming the memory. Are there any special events in the log about sympa_msg, when it is increasing memory consumption?

@drkspc
Copy link
Author

drkspc commented May 25, 2021

@racke I did my last tests with all list inclusions removed and I did not notice a difference, so I fear this is not related...

@drkspc
Copy link
Author

drkspc commented May 25, 2021

@ikedas @racke I now managed to tame the memory bumps down to around 15 to 20 MB (instead of 1GB+) by applying
this patch. This seems to work on vanilla 6.2.62 as well as on top of @ikedas suggested patch. After the first call to get_lists() I am at around 1.3GB for the task_manager (unpatched 6.2.62), or around 190MB (with patches), respectively. Thereby, I can now also confirm, that your patch does what you intended (@ikedas ).

Memory still piles up over time, though -- just not as fast as before.

@ikedas
Copy link
Member

ikedas commented May 25, 2021

@drkspc , if possible, can you attach context-style diff?

@drkspc
Copy link
Author

drkspc commented May 27, 2021

@ikedas This seems to solve the issue on my idle system. By "idle" I mean that there is also nothing logged. As soon as I enable debug logging (--log_level=2), memory bumps (15-20 MB) come back and memory piles up again. I will run a couple more tests with more activity on the system (i.e. letting list inludes run and thus more non-debug logging output) and report back how things look.

@ikedas
Copy link
Member

ikedas commented May 28, 2021

With my environment:

  • CentOS 7
  • sympa 6.2.62 (RPM) without any patches
  • Perl 5.16.3 & Clone 0.34.
  • Approximately 1000 lists, some are including from data sources.

There are some (not more than 100 Megabytes) ups and downs in memory usage (i.e. VSZ & RSS), but it is generally stable.

$ ps auwwx| grep sympa
sympa      906  0.3  3.3 479948 137176 ?       S    May27   3:56 /usr/bin/perl /usr/sbin/task_manager.pl
ikedas   10994  0.0  0.0 112828   960 pts/3    S+   09:41   0:00 /usr/bin/grep sympa
sympa    20670  0.4  0.7 412740 31780 ?        S    May22  34:15 /usr/bin/perl /usr/sbin/sympa_msg.pl
sympa    20685  0.0  0.9 289716 36468 ?        S    May22   0:27 /usr/bin/perl /usr/sbin/bounced.pl
sympa    20686  0.0  1.0 289084 41968 ?        S    May22   2:41 /usr/bin/perl /usr/sbin/bulk.pl
sympa    20689  0.0  1.0 395020 40872 ?        S    May22   0:26 /usr/bin/perl /usr/sbin/archived.pl
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

Also, I tried running task_manager with Perl 5.32.1 and Clone 0.45. However, memory usage did not change much.

@racke
Copy link
Contributor

racke commented May 28, 2021

I don't see this memory problem either in my test VMs. Maybe it is a specific configuration option that causes the memory leak.

@ghalse
Copy link

ghalse commented May 28, 2021

@ikedas I think I can concur with @drkspc. Over the 18 hours or so I've been running that patch (with log_level back to default) I've not seen any significant growth in the process sizes. I'm now sitting at ~ 100MB as well.

I do see some growth with a log_level set to >=2, but it's slower than before.

@ikedas
Copy link
Member

ikedas commented May 28, 2021

@ghalse Thanks for information. Suspicious thing is that the problem seems only occurring in the environment of FreeBSD people: No problems have been reported by Linux people so far.
Please continue to let us know if you have any information.

@drkspc
Copy link
Author

drkspc commented May 28, 2021

I did some more tests with different log levels (with list includes). With no logging at all (I changed the levels in Log.pm) there was barely a difference after running for an about an hour. With normal logging the growth is a bit larger, but still not a problem. Then, with debug logging there are very noticeable bumps (15-20MB, comparable to what I mentioned in my previous comment).

To me, this seems to be somewhat unrelated to the changes introduced by Sympa 6.2.62. Also, the problem of growing processes existed before (I'm fine as long as nothing else runs out of memory, so I mostly ignored it >.>).

Here are some values on our production server with the new patch:

... $ ps aux | grep sympa | sort -rk 6
sympa   63289  0.8  0.5 637620 625404  -  SJ   12:05   0:24.67 /usr/local/bin/perl /usr/local/libexec/sympa/wwsympa.fcgi
sympa   81254  8.4  0.5 632472 622320  2  SJ   09:45   6:53.43 /usr/local/bin/perl /usr/local/libexec/sympa/sympa_msg.pl
sympa   81276  3.9  0.3 476888 462340  2  SJ   09:45   7:26.33 /usr/local/bin/perl /usr/local/libexec/sympa/task_manager.pl
sympa   81266  0.0  0.1 143508 125892  2  SJ   09:45   0:15.58 /usr/local/bin/perl /usr/local/libexec/sympa/archived.pl
sympa   81257  0.0  0.1 114924 103316  2  SJ   09:45   0:33.00 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa   83837  0.0  0.1 111884 100276  2  SJ   09:49   0:31.66 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa   83855  0.0  0.1 111876 100232  2  SJ   09:49   0:32.26 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa   83805  0.0  0.1 111324  99704  2  SJ   09:49   0:33.48 /usr/local/bin/perl /usr/local/libexec/sympa/bulk.pl
sympa   81269  0.0  0.1 100116  88916  2  SJ   09:45   0:02.20 /usr/local/bin/perl /usr/local/libexec/sympa/bounced.pl
root    92656  0.0  0.0  11296   2576  2  S+J  12:56   0:00.00 grep sympa

wwsympa is large, but will be replaced with a new process by apache after serving 300 requests. task_manager started at 260 MB, sympa_msg at around 200 MB. The log file for that time makes up only 8.9 MB, though, so I guess there is still some growth going on elsewhere, too. I am not entirely sure what to think about that, yet.

@drkspc
Copy link
Author

drkspc commented May 28, 2021

@ikedas @racke I just spun up a fresh CentOS 7, installed Sympa, copied my list home/arc/spool/etc over from the test server and got Sympa up and running there. List inclusions are done, logging is normal, otherwise nothing happening (same as my test setup on FreeBSD). Memory usage didn't increase for about an hour now (I'll just leave it running until next week). I'd support @ikedas ' suspicion and say that there is either something odd with @ghalse 's and my setup or there is generally something wrong on FreeBSD.

@alvar-freude
Copy link

alvar-freude commented May 28, 2021

I can confirm the memory leak on FreeBSD (12.2).
My Server has only a few low frequency lists, but task_manager.pl grows to 2 GB after one day.

I use spawn-fcgi for starting wwsympa.fcgi (according to the documentation, but with TCP instead of unix domain sockets) and with every request on /sympa/, memory consumption grows about 2 MB. It grows even when I call /sympa/ unauthenticated.

I used Devel::Gladiator for some debugging; copying differences between runs (like in synopsis of Devel::Gladiator) is not a good idea, because the copies itself increase memory usage; but counting numbers shows, that the leak doesn't seem to be in Perl:

    use Devel::Gladiator qw(walk_arena);
    use List::Util qw(sum);

    my $count = 0;
    my $sum = sum map { $count++; length; } @{walk_arena()};
    $log->syslog("notice", "Number of keys before run $loop_count: $count; Sum Length: $sum" );

Result (unauthenticated):

May 28 22:29:05 ml wwsympa-debug[4472]: notice main:: Number of keys before run 2: 456450; Sum Length: 8384644 
May 28 22:29:07 ml wwsympa-debug[4472]: notice main:: Number of keys before run 3: 456446; Sum Length: 8384568 
May 28 22:29:09 ml wwsympa-debug[4472]: notice main:: Number of keys before run 4: 456445; Sum Length: 8384549 
May 28 22:29:10 ml wwsympa-debug[4472]: notice main:: Number of keys before run 5: 456445; Sum Length: 8384549 
May 28 22:29:12 ml wwsympa-debug[4472]: notice main:: Number of keys before run 6: 456445; Sum Length: 8384549 
May 28 22:29:14 ml wwsympa-debug[4472]: notice main:: Number of keys before run 7: 456445; Sum Length: 8384549 

Maybe spawn-fcgi is the origin for the leak in wwsympa.fcgi, but top shows growing size for wwsympa.fcgi.

@alvar-freude
Copy link

When using Apache to spawn wwsympa.fcgi, the size of wwsympa.fcgi still grows per request about 2 MB. Hmmm. Maybe I made a mistake in the usage of Devel::Gladiator?

@alvar-freude
Copy link

Memory Consumption of task_manager.pl and sympa_msg.pl even grows, when there is no traffic. Two days later:

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
30276 sympa        25    0  9776M  9738M nanslp   6  11:03   0.00% /usr/local/bin/perl /usr/local/libexec/sympa/task_manager.pl
22047 sympa        20    0  5571M  5555M nanslp   1   7:03   0.00% /usr/local/bin/perl /usr/local/libexec/sympa/sympa_msg.pl
34828 sympa        23    0   140M   127M accept  14   0:03   0.00% /usr/local/bin/perl /usr/local/libexec/sympa/wwsympa.fcgi

Since this error does not occur with CentOS, perhaps it is due to a newer CPAN library than typical Linux distros?

@racke
Copy link
Contributor

racke commented May 31, 2021

Yes that is of course possible.

@alvar-freude
Copy link

alvar-freude commented Jun 1, 2021

Since this error does not occur with CentOS, perhaps it is due to a newer CPAN library than typical Linux distros?

Update on this:

I upgraded my Sympa jail and this Perl Libs (new version on the right side):

p5-Crypt-OpenSSL-X509-1.902        <   needs updating (remote has 1.908)
p5-Digest-HMAC-1.03_1              <   needs updating (remote has 1.04)
p5-Exception-Class-1.44            <   needs updating (remote has 1.45)
p5-File-ReadBackwards-1.05_1       <   needs updating (remote has 1.06)
p5-HTML-Parser-3.75                <   needs updating (remote has 3.76)
p5-IO-Socket-SSL-2.070             <   needs updating (remote has 2.071)
p5-MIME-Types-2.18                 <   needs updating (remote has 2.20)
p5-Net-CIDR-0.20                   <   needs updating (remote has 0.21)
p5-Net-SSLeay-1.88                 <   needs updating (remote has 1.90)
p5-XML-LibXML-2.0206,1             <   needs updating (remote has 2.0207,1)

In the CHANGES of the CPAN modules above, nothing sounds like a memory leak, and as expected there is no difference in memory usage.

I removed the sleep 60 in task_manager.pl: with this, it is easy to reproduce the memory leak, memory consumption grows fast.

And I used Level::NYTProf to look into hotspots. Archive with the result is attached. nytprof-sympa-taskmanager.zip

@alvar-freude
Copy link

I also tried the patch above via #1172. No change in memory consumption. task_manager without sleep needs about 23 Seconds to grow to 1 GB in my Test, with and without patch.

@ghalse
Copy link

ghalse commented Jun 2, 2021

So if it is any help, these are the FreeBSD Perl packages that updated at the same time as Sympa on our system:

p5-CGI-4.52
p5-DBD-CSV-0.58,1
p5-DateTime-Locale-1.32
p5-Net-DNS-1.31
p5-libwww-6.54

There were some that updated two weeks earlier too, but I don't recall experiencing the problem at that stage.

@ikedas
Copy link
Member

ikedas commented Jun 2, 2021

Thanks for information. What are the versions of following things, both before and after upgrading Sympa?

  • FreeBSD
  • Perl (the package such as perl5)

@ghalse
Copy link

ghalse commented Jun 2, 2021

  • FreeBSD

12.2-RELEASE-p6

  • Perl (the package such as perl5)

perl5-5.32.1_1 (updated to that in early April, so prior to these problems).

@alvar-freude
Copy link

What are the versions of following things, both before and after upgrading Sympa?

My Installation is a fresh installation, using config and lists from an old Sympa (running since at least 2013, latest installed version 6.2.52) on another server.

  • FreeBSD

uname -a:
FreeBSD ml 12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 GENERIC amd64

  • Perl (the package such as perl5)

perl5-5.32.1_1

perl -v:
This is perl 5, version 32, subversion 1 (v5.32.1) built for amd64-freebsd-thread-multi

Complete package list of my Sympa Jail (Sympa, Exim, Apache) and output of perl -V is here attached:
pkg-list--sympa-jail.txt

@alvar-freude
Copy link

Good news, but it does not help to find the reason:

I updated my Server and the Sympa jail to FreeBSD 13.0.

After reinstalling all packages (same version of all packages, but new FreeBSD-13-packages, compiled locally with poudriere), memory consumption is stable.

When running a modified task manager with removed "sleep 60" (as mentioned above: #1170 (comment)) /usr/local/libexec/sympa/task_manager-debug.pl -F the memory usage is stable at 121 MB.
Memory usage of wwsympa.fcgi stays stable at 141 MB.

This is very strange.

Running the jail with FreeBSD 12.2 binaries, the process site of the task manager was growing; maybe a little bit slower as before.

@ikedas
Copy link
Member

ikedas commented Jun 13, 2021

I'm Just guessing...

This bug in FreeBSD 12.2-RELEASE seems to be somehow related to our problem.

It was reproduced with Perl not earlier than 5.26:

Just an idea, if we set "lang en" in Sympa (in sympa.conf/robot.conf and config of each list), would it stop the problem? I'm not sure if it would do, though.

@alvar-freude
Copy link

Oh! Sounds very reasonable; that would also explain why I haven't found anything with Devel::Gladiator.

However, I wonder why this was not more noticeable. Possibly there are a lot of uselocale-libc-calls in the combination of Perl and Sympa that cause this. This could then possibly also be a performance issue?

@ikedas
Copy link
Member

ikedas commented Jun 13, 2021

Oh! Sounds very reasonable; that would also explain why I haven't found anything with Devel::Gladiator.

However, I wonder why this was not more noticeable. Possibly there are a lot of uselocale-libc-calls in the combination of Perl and Sympa that cause this. This could then possibly also be a performance issue?

I don't know why. But I think this is worth trying:

  • Set lang parameter in sympa.conf, robot.conf and list config to be en.
  • Set environment variable LC_ALL=C.
    And then start the services of Sympa.

@ikedas
Copy link
Member

ikedas commented Jun 22, 2021

However, I wonder why this was not more noticeable. Possibly there are a lot of uselocale-libc-calls in the combination of Perl and Sympa that cause this. This could then possibly also be a performance issue?

I found that Sympa/Language.pm uses POSIX::setlocale frequently.

@drkspc
Copy link
Author

drkspc commented Jun 23, 2021

I don't know why. But I think this is worth trying:

  • Set lang parameter in sympa.conf, robot.conf and list config to be en.
  • Set environment variable LC_ALL=C.
    And then start the services of Sympa.

I tried different combinations of setting the lang parameter in sympa.conf, robot.conf and list config files and setting LC_ALL (I put that into Sympa's rc script). The size of the task_manager varied in how much memory was allocated right after startup (either ~260MB or ~1.5GB) and how fast it grew. The suggested combination (as well as not specifying anything at all) looked comparably well, but --unfortunately -- still did not keep a stable memory footprint.

If you have any other suggestions on what to try/test, I'll try do that. Otherwise my plan is to upgrade to FreeBSD 13 as well.

@ikedas
Copy link
Member

ikedas commented Jun 23, 2021

I don't know why. But I think this is worth trying:

  • Set lang parameter in sympa.conf, robot.conf and list config to be en.
  • Set environment variable LC_ALL=C.
    And then start the services of Sympa.

I tried different combinations of setting the lang parameter in sympa.conf, robot.conf and list config files and setting LC_ALL (I put that into Sympa's rc script). The size of the task_manager varied in how much memory was allocated right after startup (either ~260MB or ~1.5GB) and how fast it grew. The suggested combination (as well as not specifying anything at all) looked comparably well, but --unfortunately -- still did not keep a stable memory footprint.

If you have any other suggestions on what to try/test, I'll try do that. Otherwise my plan is to upgrade to FreeBSD 13 as well.

Thanks for confirming!

If setlocale on FreeBSD 12.2 (or so) is the cause of the memory leak, this issue is considered not problem of Sympa, then I think we may recommend upgrading FreeBSD and may close this issue.

@ikedas
Copy link
Member

ikedas commented Jun 23, 2021

FYI

This user also experiences growing memory usage with

  • FreeBSD 12.1
  • Sympa 6.2.56 or 6.2.60

@drkspc
Copy link
Author

drkspc commented Jun 24, 2021

FYI

This user also experiences growing memory usage with

* FreeBSD 12.1

* Sympa 6.2.56 or 6.2.60

That also supports that the problems weren't actually introduced in Sympa 6.2.62, but just became much more obvious. I wonder about the issues on SLES, though, but that might be something different.

From my side, feel free to close this issue for now. There seems to be a way around with FreeBSD 13 and eventually 12.3 (if and whenever released).
Thank you (all) for looking into this!

@ikedas
Copy link
Member

ikedas commented Jun 30, 2021

Okey, I'll close this issue for now. Thank you all for information!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants