clamd using all memory and getting oom killed since 567064e #3761

Closed
ThinIce opened this issue Sep 17, 2020 · 33 comments

@ThinIce

ThinIce commented Sep 17, 2020

Prior to placing the issue, please check the following (fill out each checkbox with an X once done):

  • [x] I understand that not following or deleting the below instructions will result in the immediate closing and deletion of my issue.
  • [x] I have understood that answers are voluntary and community-driven, and not commercial support.
  • [x] I have verified that my issue has not already been answered in the past. I also checked previous issues.

Description of the bug:

Since the last update containing 567064e, clamd has been using much more memory, to the extent that the server OOM-kills it. The server has 4 GB of RAM and has been running without issue (regularly updated).

I'm unsure whether this is triggered by a particular message or by the clamd update process. After the instance logged below, it ran fine all day before causing problems again mid-evening. I've had to disable clamd in the config file for now.

Docker container logs of affected containers:

clamd-mailcow_1      | Wed Sep 16 03:08:28 2020 -> ClamAV update process started at Wed Sep 16 03:08:28 2020
clamd-mailcow_1      | Wed Sep 16 03:08:30 2020 -> daily.cvd database is up to date (version: 25930, sigs: 4317819, f-level: 63, builder: raynman)
clamd-mailcow_1      | Wed Sep 16 03:08:30 2020 -> main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
clamd-mailcow_1      | Wed Sep 16 03:08:30 2020 -> bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
clamd-mailcow_1      | Wed Sep 16 03:46:46 2020 -> SelfCheck: Database status OK.
clamd-mailcow_1      | Wed Sep 16 04:11:40 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Sep 16 04:33:26 2020 -> instream(172.22.1.13@46950): OK
clamd-mailcow_1      | Wed Sep 16 04:48:14 2020 -> SelfCheck: Database status OK.
clamd-mailcow_1      | receiving incremental file list
clamd-mailcow_1      | ./
clamd-mailcow_1      | blurl.ndb
clamd-mailcow_1      | jurlbl.ndb
clamd-mailcow_1      | phishtank.ndb
clamd-mailcow_1      | rogue.hdb
clamd-mailcow_1      | 
clamd-mailcow_1      | sent 23,180 bytes  received 350,104 bytes  746,568.00 bytes/sec
clamd-mailcow_1      | total size is 18,793,645  speedup is 50.35
clamd-mailcow_1      | RELOADING
clamd-mailcow_1      | Wed Sep 16 05:00:29 2020 -> Reading databases from /var/lib/clamav
clamd-mailcow_1      | Wed Sep 16 05:00:37 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Sep 16 06:55:26 2020 -> ClamAV update process started at Wed Sep 16 06:55:26 2020
clamd-mailcow_1      | Wed Sep 16 06:55:26 2020 -> daily.cvd database is up to date (version: 25930, sigs: 4317819, f-level: 63, builder: raynman)
clamd-mailcow_1      | Wed Sep 16 06:55:26 2020 -> main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
clamd-mailcow_1      | Wed Sep 16 06:55:26 2020 -> bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
clamd-mailcow_1      | /clamd.sh: line 97:    31 Killed                  nice -n10 clamd
clamd-mailcow_1      | /clamd.sh: line 98: kill: (31) - No such process
clamd-mailcow_1      | Worker 31 died, stopping container waiting for respawn...
clamd-mailcow_1      | Cleaning up tmp files...
clamd-mailcow_1      | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   Size: 142       	Blocks: 8          IO Block: 4096   regular file
clamd-mailcow_1      | Device: 31h/49d	Inode: 2076528     Links: 1
clamd-mailcow_1      | Access: (0644/-rw-r--r--)  Uid: (  700/  clamav)   Gid: (  700/  clamav)
clamd-mailcow_1      | Access: 2020-09-15 16:35:09.048000000 +0000
clamd-mailcow_1      | Modify: 2020-09-16 06:56:19.268000000 +0000
clamd-mailcow_1      | Change: 2020-09-16 06:56:19.328000000 +0000
clamd-mailcow_1      |  Birth: -
clamd-mailcow_1      | dos2unix: converting file /var/lib/clamav/whitelist.ign2 to Unix format...
clamd-mailcow_1      | Running freshclam...
clamd-mailcow_1      | Wed Sep 16 06:56:19 2020 -> ClamAV update process started at Wed Sep 16 06:56:19 2020
clamd-mailcow_1      | Wed Sep 16 06:56:19 2020 -> daily.cvd database is up to date (version: 25930, sigs: 4317819, f-level: 63, builder: raynman)
clamd-mailcow_1      | Wed Sep 16 06:56:19 2020 -> main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
clamd-mailcow_1      | Wed Sep 16 06:56:19 2020 -> bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: Global time limit set to 120000 milliseconds.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: Global size limit set to 52428800 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: File size limit set to 26214400 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: Recursion level limit set to 5.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: Files limit set to 200.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxEmbeddedPE limit set to 10485760 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxHTMLNormalize limit set to 10485760 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxHTMLNoTags limit set to 2097152 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxScriptNormalize limit set to 5242880 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxZipTypeRcg limit set to 1048576 bytes.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxPartitions limit set to 50.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxIconsPE limit set to 100.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: MaxRecHWP3 limit set to 16.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: PCREMatchLimit limit set to 100000.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: PCRERecMatchLimit limit set to 2000.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Limits: PCREMaxFileSize limit set to 26214400.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Archive support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> AlertExceedsMax heuristic detection disabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Heuristic alerts enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Portable Executable support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> ELF support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Mail files support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> OLE2 support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> PDF support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> SWF support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> HTML support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> XMLDOCS support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> HWP3 support enabled.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Heuristic: precedence enabled
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Self checking every 3600 seconds.
clamd-mailcow_1      | Wed Sep 16 06:58:09 2020 -> Set stacksize to 8454144
clamd-mailcow_1      | Wed Sep 16 07:02:28 2020 -> instream(172.22.1.13@53526): OK
clamd-mailcow_1      | Wed Sep 16 07:02:34 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Sep 16 07:04:49 2020 -> instream(172.22.1.13@53952): OK
clamd-mailcow_1      | Wed Sep 16 07:04:51 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Sep 16 07:05:19 2020 -> instream(172.22.1.13@54110): OK
clamd-mailcow_1      | Wed Sep 16 07:05:46 2020 -> instream(local): OK
clamd-mailcow_1      | #####################################################
clamd-mailcow_1      | Welcome to Sanesecurity mirror for Clamav signatures.
clamd-mailcow_1      | 
clamd-mailcow_1      | Service brough you www.virusfree.cz, all activity is
clamd-mailcow_1      | logged and evaluated. Abuse of the service will result
clamd-mailcow_1      | in permanent ban and legal prosecution.
clamd-mailcow_1      | 
clamd-mailcow_1      | Feel free to contact us at support@virusfree.cz
clamd-mailcow_1      | ####################################################
clamd-mailcow_1      | 
clamd-mailcow_1      | receiving incremental file list
clamd-mailcow_1      | ./
clamd-mailcow_1      | blurl.ndb
clamd-mailcow_1      | phishtank.ndb
clamd-mailcow_1      | 
clamd-mailcow_1      | sent 10,800 bytes  received 96,536 bytes  30,667.43 bytes/sec
clamd-mailcow_1      | total size is 18,801,462  speedup is 175.16
clamd-mailcow_1      | RELOADING
clamd-mailcow_1      | Wed Sep 16 07:06:22 2020 -> Reading databases from /var/lib/clamav
clamd-mailcow_1      | /clamd.sh: line 97:    21 Killed                  nice -n10 clamd
clamd-mailcow_1      | /clamd.sh: line 98: kill: (21) - No such process
clamd-mailcow_1      | Worker 21 died, stopping container waiting for respawn...
clamd-mailcow_1      | Cleaning up tmp files...
clamd-mailcow_1      | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   Size: 142       	Blocks: 8          IO Block: 4096   regular file
clamd-mailcow_1      | Device: 31h/49d	Inode: 2076528     Links: 1
clamd-mailcow_1      | Access: (0644/-rw-r--r--)  Uid: (  700/  clamav)   Gid: (  700/  clamav)
clamd-mailcow_1      | Access: 2020-09-16 06:56:19.392000000 +0000
clamd-mailcow_1      | Modify: 2020-09-16 07:19:28.624000000 +0000
clamd-mailcow_1      | Change: 2020-09-16 07:19:28.648000000 +0000
clamd-mailcow_1      |  Birth: -
clamd-mailcow_1      | dos2unix: converting file /var/lib/clamav/whitelist.ign2 to Unix format...
clamd-mailcow_1      | Running freshclam...
clamd-mailcow_1      | Wed Sep 16 07:19:28 2020 -> ClamAV update process started at Wed Sep 16 07:19:28 2020
clamd-mailcow_1      | Wed Sep 16 07:19:28 2020 -> daily.cvd database is up to date (version: 25930, sigs: 4317819, f-level: 63, builder: raynman)
clamd-mailcow_1      | Wed Sep 16 07:19:28 2020 -> main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
clamd-mailcow_1      | Wed Sep 16 07:19:28 2020 -> bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: Global time limit set to 120000 milliseconds.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: Global size limit set to 52428800 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: File size limit set to 26214400 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: Recursion level limit set to 5.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: Files limit set to 200.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxEmbeddedPE limit set to 10485760 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxHTMLNormalize limit set to 10485760 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxHTMLNoTags limit set to 2097152 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxScriptNormalize limit set to 5242880 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxZipTypeRcg limit set to 1048576 bytes.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxPartitions limit set to 50.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxIconsPE limit set to 100.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: MaxRecHWP3 limit set to 16.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: PCREMatchLimit limit set to 100000.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: PCRERecMatchLimit limit set to 2000.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Limits: PCREMaxFileSize limit set to 26214400.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Archive support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> AlertExceedsMax heuristic detection disabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Heuristic alerts enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Portable Executable support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> ELF support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Mail files support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> OLE2 support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> PDF support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> SWF support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> HTML support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> XMLDOCS support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> HWP3 support enabled.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Heuristic: precedence enabled
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Self checking every 3600 seconds.
clamd-mailcow_1      | Wed Sep 16 07:20:09 2020 -> Set stacksize to 8454144
clamd-mailcow_1      | Wed Sep 16 07:20:24 2020 -> instream(172.22.1.13@55192): OK
clamd-mailcow_1      | Wed Sep 16 07:24:05 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Sep 16 07:26:48 2020 -> instream(172.22.1.13@56336): OK
clamd-mailcow_1      | Wed Sep 16 07:27:42 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Sep 16 07:27:46 2020 -> instream(172.22.1.13@56530): OK
clamd-mailcow_1      | receiving incremental file list
clamd-mailcow_1      | ./
clamd-mailcow_1      | 
clamd-mailcow_1      | sent 230 bytes  received 260 bytes  980.00 bytes/sec
clamd-mailcow_1      | total size is 18,801,462  speedup is 38,370.33
clamd-mailcow_1      | RELOADING
clamd-mailcow_1      | Wed Sep 16 07:29:29 2020 -> Reading databases from /var/lib/clamav
clamd-mailcow_1      | /clamd.sh: line 97:    21 Killed                  nice -n10 clamd
clamd-mailcow_1      | /clamd.sh: line 98: kill: (21) - No such process
clamd-mailcow_1      | Worker 21 died, stopping container waiting for respawn...
clamd-mailcow_1      | Cleaning up tmp files...
clamd-mailcow_1      | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   Size: 142       	Blocks: 8          IO Block: 4096   regular file
clamd-mailcow_1      | Device: 31h/49d	Inode: 2076528     Links: 1
clamd-mailcow_1      | Access: (0644/-rw-r--r--)  Uid: (  700/  clamav)   Gid: (  700/  clamav)
clamd-mailcow_1      | Access: 2020-09-16 07:19:28.676000000 +0000
clamd-mailcow_1      | Modify: 2020-09-16 07:40:32.700000000 +0000
clamd-mailcow_1      | Change: 2020-09-16 07:40:32.712000000 +0000

Reproduction of said bug:

Logged into the server, stopped the clamd container, and rebooted to make sure the server wasn't in an inconsistent state after the OOM. Observed the server for the working day with no problem; the issue then recurred mid-evening.

System information:

My operating system: Ubuntu 18.04.5 LTS

Is Apparmor, SELinux or similar active? Output of cat /sys/kernel/security/apparmor/profiles:

docker-default (enforce)
/usr/sbin/tcpdump (enforce)
/usr/lib/snapd/snap-confine (enforce)
/usr/lib/snapd/snap-confine//mount-namespace-capture-helper (enforce)
man_groff (enforce)
man_filter (enforce)
/usr/bin/man (enforce)
/usr/bin/lxc-start (enforce)
/usr/lib/connman/scripts/dhclient-script (enforce)
/usr/lib/NetworkManager/nm-dhcp-helper (enforce)
/usr/lib/NetworkManager/nm-dhcp-client.action (enforce)
/sbin/dhclient (enforce)
lxc-container-default-with-nesting (enforce)
lxc-container-default-with-mounting (enforce)
lxc-container-default-cgns (enforce)
lxc-container-default (enforce)

Virtualization technology (KVM, VMware, Xen, etc. - LXC and OpenVZ are not supported): KVM

Server/VM specifications (Memory, CPU cores): 4 GB, 1 core

Docker version (docker version): 19.03.12

Docker-Compose version (docker-compose version): docker-compose version 1.27.2, build 18f557f9
docker-py version: 4.3.1
CPython version: 3.7.7
OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019

Reverse proxy (custom solution): Nope

  • Output of git diff origin/master, any other changes to the code? If so, please post them.
    No, nothing.
  • All third-party firewalls and custom iptables rules are unsupported. Please check the Docker docs about how to use Docker with your own ruleset. Nevertheless, iptables output can help us to help you: iptables -L -vn, ip6tables -L -vn, iptables -L -vn -t nat and ip6tables -L -vn -t nat.
    Nope.
  • DNS problems? Please run docker exec -it $(docker ps -qf name=acme-mailcow) dig +short stackoverflow.com @172.22.1.254 (set the IP accordingly, if you changed the internal mailcow network) and post the output.

151.101.1.69
151.101.193.69
151.101.129.69
151.101.65.69

@Hedders

Hedders commented Sep 17, 2020

I'm also seeing this, also on a 4GB server. Seems to happen periodically (no pattern that I can detect) making the server essentially unresponsive for periods of 8-14 mins.

@andryyy
Contributor

andryyy commented Sep 17, 2020

I mean... we can try to give it a memory limit. Or remove signatures.
Limiting resources via compose may introduce new fancy problems on some systems.
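
As a rough sketch only (numbers are arbitrary and untested, and this assumes a docker-compose.override.yml next to the main compose file), a cap would look something like:

version: '2.1'
services:
  clamd-mailcow:
    mem_limit: 1536m
    memswap_limit: 1536m

With a hard cap the OOM killer acts inside that container's cgroup only, instead of picking whatever happens to be the biggest process on the host.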

Any ideas anybody?

@mkuron ?

@Hedders

Hedders commented Sep 17, 2020

If it helps, this is new - it only seems to be an issue with the latest version of the clamav container.

@andryyy
Contributor

andryyy commented Sep 17, 2020

We just updated to 0.103.0; it is possible this version needs more memory.

@Hedders

Hedders commented Sep 17, 2020

Ah! This is from the release notes for 0.103:

" clamd can now reload the signature database without blocking scanning. This multi-threaded database reload improvement was made possible thanks to a community effort.
Non-blocking database reloads are now the default behavior. Some systems that are more constrained on RAM may need to disable non-blocking reloads, as it will temporarily consume double the amount of memory. We added a new clamd config option ConcurrentDatabaseReload, which may be set to no."

@andryyy
Contributor

andryyy commented Sep 17, 2020

Good catch. :)

@Hedders

Hedders commented Sep 17, 2020

I wonder if there is a way to expose that ConcurrentDatabaseReload setting as an option in mailcow.conf?

@andryyy
Contributor

andryyy commented Sep 17, 2020

I set it to no by default for now and will add it to the docs. All options can be set via data/conf/clamav/clamd.conf.
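
For anyone who wants to apply it by hand before pulling the change, it is roughly: add the line below to data/conf/clamav/clamd.conf and restart the container (service name taken from the logs above):

ConcurrentDatabaseReload no

docker-compose restart clamd-mailcow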

@Hedders

Hedders commented Sep 17, 2020

Perfect! Thank you very much indeed. Will drop another donation in the morning :)

@andryyy
Contributor

andryyy commented Sep 17, 2020

I thank you. :)

@scryptio

I set it to no by default for now and will add it to the docs. All options can be set via data/conf/clamav/clamd.conf.

Just saw this by coincidence. I would prefer managing any kind of config via mailcow.conf if possible. It's a good approach to bundle config settings in one file where possible instead of having to edit several files lying in "unknown" locations. It would make managing things easier.

@andryyy
Contributor

andryyy commented Sep 21, 2020

We cannot handle every single config there.

We use git for this reason. It will not kill your changes as long as we didn't change the same file; if we did, we need to overwrite it for compatibility.

All config files share the same location by the way: data/conf

@wblondel
Contributor

wblondel commented Sep 26, 2020

My ClamAV has also been running OOM when updating for about a week now, even with ConcurrentDatabaseReload set to NO; it just happened 10 minutes ago.

@andryyy
Contributor

andryyy commented Sep 26, 2020

How much RAM?

@wblondel
Contributor

4GB. I know it's not a lot but it has been working fine for 2 years, until now =)

@Hedders

Hedders commented Sep 26, 2020

Silly question, but I noticed you say it's set to NO. Does your clamd.conf say "ConcurrentDatabaseReload no" or "ConcurrentDatabaseReload NO"? IIRC clamd.conf directives are case sensitive.

@wblondel
Contributor

Yes, it's in lower case. It was added by a recent commit:

ConcurrentDatabaseReload no

@andryyy
Contributor

andryyy commented Sep 26, 2020

"no" is fine.

You can try to decrease SOGo worker count to 10.
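
If I remember the layout right, that is the WOWorkersCount value in data/conf/sogo/sogo.conf (path and service name from memory, so double-check on your own install), roughly:

WOWorkersCount = 10;

followed by docker-compose restart sogo-mailcow.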

@ThinIce
Author

ThinIce commented Oct 7, 2020

This has been fine since the ConcurrentDatabaseReload change, but it just bombed again this evening:

clamd-mailcow_1      | receiving incremental file list
clamd-mailcow_1      | ./
clamd-mailcow_1      | blurl.ndb
clamd-mailcow_1      | jurlbl.ndb
clamd-mailcow_1      | phishtank.ndb
clamd-mailcow_1      | rogue.hdb
clamd-mailcow_1      | 
clamd-mailcow_1      | sent 23,606 bytes  received 314,924 bytes  225,686.67 bytes/sec
clamd-mailcow_1      | total size is 18,879,877  speedup is 55.77
clamd-mailcow_1      | RELOADING
clamd-mailcow_1      | Wed Oct  7 17:16:27 2020 -> Reading databases from /var/lib/clamav
clamd-mailcow_1      | Wed Oct  7 17:17:28 2020 -> Database correctly reloaded (9073104 signatures)
clamd-mailcow_1      | Wed Oct  7 17:17:28 2020 -> Database reload completed.
clamd-mailcow_1      | Wed Oct  7 17:17:28 2020 -> Activating the newly loaded database...
clamd-mailcow_1      | Wed Oct  7 17:17:28 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Oct  7 17:17:28 2020 -> instream(172.22.1.13@44646): OK
clamd-mailcow_1      | Wed Oct  7 17:17:28 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Oct  7 17:19:09 2020 -> instream(172.22.1.13@45048): OK
clamd-mailcow_1      | Wed Oct  7 17:24:06 2020 -> instream(local): OK
clamd-mailcow_1      | Wed Oct  7 17:27:21 2020 -> ClamAV update process started at Wed Oct  7 17:27:21 2020
clamd-mailcow_1      | Wed Oct  7 17:27:26 2020 -> daily database available for update (local version: 25949, remote version: 25950)
clamd-mailcow_1      | Wed Oct  7 17:27:59 2020 -> Testing database: '/var/lib/clamav/tmp.c550237f77/clamav-fa1482832880b3b414a882962cbfb28f.tmp-daily.cld' ...
clamd-mailcow_1      | /clamd.sh: line 97:    23 Killed                  nice -n10 clamd
clamd-mailcow_1      | /clamd.sh: line 98: kill: (23) - No such process
clamd-mailcow_1      | Worker 23 died, stopping container waiting for respawn...
clamd-mailcow_1      | Cleaning up tmp files...
clamd-mailcow_1      | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1      |   Size: 142       	Blocks: 8          IO Block: 4096   regular file
clamd-mailcow_1      | Device: 5fh/95d	Inode: 1048148     Links: 1
clamd-mailcow_1      | Access: (0644/-rw-r--r--)  Uid: (  700/  clamav)   Gid: (  700/  clamav)
clamd-mailcow_1      | Access: 2020-10-07 17:16:27.404000000 +0000
clamd-mailcow_1      | Modify: 2020-10-07 18:12:30.404000000 +0000
clamd-mailcow_1      | Change: 2020-10-07 18:12:30.460000000 +0000
clamd-mailcow_1      |  Birth: -
clamd-mailcow_1      | dos2unix: converting file /var/lib/clamav/whitelist.ign2 to Unix format...
clamd-mailcow_1      | Running freshclam...
clamd-mailcow_1      | Wed Oct  7 18:12:30 2020 -> ClamAV update process started at Wed Oct  7 18:12:30 2020
clamd-mailcow_1      | Wed Oct  7 18:12:31 2020 -> daily database available for update (local version: 25949, remote version: 25950)
clamd-mailcow_1      | Wed Oct  7 18:12:48 2020 -> Testing database: '/var/lib/clamav/tmp.46a08b2ade/clamav-b81f532048bf594a68b1079705518bf7.tmp-daily.cld' ...
clamd-mailcow_1      | Wed Oct  7 18:13:19 2020 -> Database test passed.
clamd-mailcow_1      | Wed Oct  7 18:13:19 2020 -> daily.cld updated (version: 25950, sigs: 4328320, f-level: 63, builder: raynman)
clamd-mailcow_1      | Wed Oct  7 18:13:19 2020 -> main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
clamd-mailcow_1      | Wed Oct  7 18:13:19 2020 -> bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
clamd-mailcow_1      | Wed Oct  7 18:13:19 2020 -> ^Clamd was NOT notified: Can't connect to clamd through /run/clamav/clamd.sock: Connection refused

I can decrease the SOGo worker count as you've recommended, but I don't actually have any users using it, which I guess makes a difference?

@mkuron
Member

mkuron commented Oct 7, 2020

Could you run docker stats every minute and check whether the memory size of the clamd container (or any other container) grows significantly over time? clamd is probably the process with the largest memory usage on your server, so the OOM killer kills it even if it's not the culprit.
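
Something quick and dirty along these lines would do (the log path is just an example):

# sample per-container memory usage once a minute
while true; do
  date >> /root/docker-stats.log
  docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}" >> /root/docker-stats.log
  sleep 60
done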

@andryyy
Contributor

andryyy commented Oct 7, 2020

The SOGo workers can still eat some RAM when you switch between them with each new request.
Please try docker stats as mkuron suggested and also reduce the worker count. :)

@Geitenijs
Contributor

Same issue here, even with ConcurrentDatabaseReload set to no. Every few hours it'll freeze for a minute or two.

Up until this point, mailcow has been running great for over a year on this server.

(screenshots attached)

Clamd log:
https://pastebin.com/TBjBUPn9

@andryyy
Contributor

andryyy commented Oct 20, 2020

So? I cannot change that. If you want to keep using ClamAV, you need more RAM. 👍

Or reduce the SOGo workers. I cannot change that I'm afraid. :/

We will update the requirements.

@Geitenijs
Contributor

:(

I get it of course.. Just hoped you'd have a solution for me :)

Thanks anyway, I'll upgrade the RAM of my server.

@andryyy
Contributor

andryyy commented Oct 20, 2020

Or at least try with less workers in SOGo first. :)

@ThinIce
Author

ThinIce commented Oct 20, 2020

Could you run docker stats every minute and check whether the memory size of the clamd container (or any other container) grows significantly over time? clamd is probably the process with the largest memory usage on your server, so the OOM killer kills it even if it's not the culprit.

I disabled the clamav container and have been watching the others since your post. It looks like on my single system, over a working week, Solr grows slowly, but only by about 100 MiB from where it starts (350-450 MiB). rspamd spikes quite severely at times from a resting ~250 MiB; about 650 MiB is the highest I've seen it. Redis also seems to vary by a few hundred MiB, presumably depending on what's going on at the time, but nothing has an obvious memory leak.

I'll have to find a way to do this monitoring more scientifically over a longer period and produce some graphs, but it seems like the issue may simply be clamd plus other things happening to use more memory at the same time.

For now I'll reduce the SOGo workers and re-enable clamd and keep an eye on it.

@mkuron
Member

mkuron commented Oct 20, 2020

Just hoped you'd have a solution for me

Unfortunately, ClamAV is quite a memory hog because it loads all its virus definitions into memory, and those obviously get larger with every update. You'll need to reduce the set of virus definitions to reduce memory usage. Or reconsider whether you actually need a virus scanner: we block .exe attachments and MS Office documents with macros, which should already take care of most virus distribution vectors.

It looks like on my single system over a working week solr grows slowly, but only by about 100MiB from where it starts (350-450).

SOLR needs quite a lot of memory, depending on how many messages you have. It is recommended to be kept disabled unless you have a lot of memory and very few users.

It looks like rspamd spikes quite severely at times from a resting ~250, I think about 650 is the highest I've seen it.

Rspamd uses Lua, which is garbage-collected. You can reduce the garbage collection timeout (#3049 (comment)) to keep its memory usage more constant.

Redis also seems to be capable of varying by a few hundred MiB, presumably depending on what's going on at the time,

Redis is an in-memory database that periodically dumps its state to a file. Its memory footprint probably grows as it accumulates transactions between dumps, but I have not seen it consume an unreasonable amount of memory.

Adorfer changed the title from "clamd using all memory and getting oom killed since last update" to "clamd using all memory and getting oom killed since 567064e" on Oct 21, 2020
@jjkondrat

I run my server in AWS with 2 GB of RAM, since I'm running this for personal sites and some friends' domains and it's not very high traffic. I did, however, create a 4 GB swap file and have had no issues... Not saying that's suitable for everyone, but it may be an option for you if it's not a high-traffic server.

My server does need to use the swap file, and I see no reason to pay for more memory:

root@mail:/opt/mailcow-dockerized# free -m
              total        used        free      shared  buff/cache   available
Mem:           1949        1342         115           9         491         444
Swap:          4095        2087        2008
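
For anyone wanting to try the same, the usual recipe is roughly this (size and path are just an example):

fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # keep it across reboots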

@Adorfer

Adorfer commented Nov 26, 2020

@jjkondrat And you are suffering from the same issue "clamd getting oom-killed", so the swap file does not help?

@jjkondrat

@Adorfer No, and I have never had any memory problems on any of the containers while using the large swap file. I've been using a large swap file since I built my server well over a year ago.

@Adorfer

Adorfer commented Nov 27, 2020

So what is your point in posting to this thread? "Adding swap may resolve the issue"?

@jjkondrat

Yes. Although more memory would be better, if the user defines or increases the swap file size they may avoid the crash.

MAGICCC closed this as completed on Nov 27, 2020
@eldrik

eldrik commented Dec 7, 2020

So I already disabled clamd in mailcow.conf but am still getting OOM messages that seem related to clamd.

I am running mailcow inside an ESXi VM with 2 CPUs & 4 GB RAM.

For me it feels like clamd is still running even though it is disabled via mailcow.conf (the last restart of mailcow and the server was about 14 days ago).

Example docker-compose log entries from the restart:

clamd-mailcow_1 | Mon Dec 7 09:38:54 2020 -> instream(172.22.1.10@43964): OK
clamd-mailcow_1 | Mon Dec 7 09:47:45 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 09:49:22 2020 -> instream(172.22.1.10@45540): OK
clamd-mailcow_1 | Mon Dec 7 09:49:38 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 09:58:34 2020 -> instream(172.22.1.10@46910): OK
clamd-mailcow_1 | Mon Dec 7 10:04:24 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 10:07:30 2020 -> instream(172.22.1.10@48194): OK
clamd-mailcow_1 | Mon Dec 7 10:12:44 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 10:14:23 2020 -> instream(172.22.1.10@49214): OK
clamd-mailcow_1 | Mon Dec 7 10:17:19 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 10:17:33 2020 -> instream(172.22.1.10@49704): OK
clamd-mailcow_1 | Worker 22 died, stopping container waiting for respawn...
clamd-mailcow_1 | /clamd.sh: line 97: 22 Killed nice -n10 clamd
clamd-mailcow_1 | /clamd.sh: line 98: kill: (22) - No such process
clamd-mailcow_1 | Cleaning up tmp files...
clamd-mailcow_1 | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1 | File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1 | Size: 142 Blocks: 8 IO Block: 4096 regular file
clamd-mailcow_1 | Device: 801h/2049d Inode: 1724287 Links: 1
clamd-mailcow_1 | Access: (0644/-rw-r--r--) Uid: ( 700/ clamav) Gid: ( 700/ clamav)
clamd-mailcow_1 | Access: 2020-12-07 08:42:07.698887776 +0100
clamd-mailcow_1 | Modify: 2020-12-07 10:44:34.718272781 +0100
clamd-mailcow_1 | Change: 2020-12-07 10:44:35.406301724 +0100
clamd-mailcow_1 | Birth: -
clamd-mailcow_1 | dos2unix: converting file /var/lib/clamav/whitelist.ign2 to Unix format...
clamd-mailcow_1 | Running freshclam...

Example /var/log/messages entries related to the OOM:

Dec 7 10:41:21 mailstation kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 7 10:41:21 mailstation kernel: 51489 total pagecache pages
Dec 7 10:41:21 mailstation kernel: 49539 pages in swap cache
Dec 7 10:41:21 mailstation kernel: Swap cache stats: add 7151097, delete 7101558, find 113127588/114180952
Dec 7 10:41:21 mailstation kernel: Free swap = 0kB
Dec 7 10:41:21 mailstation kernel: Total swap = 2095100kB
Dec 7 10:41:21 mailstation kernel: 1048446 pages RAM
Dec 7 10:41:21 mailstation kernel: 0 pages HighMem/MovableOnly
Dec 7 10:41:21 mailstation kernel: 35750 pages reserved
Dec 7 10:41:21 mailstation kernel: 0 pages hwpoisoned
Dec 7 10:41:21 mailstation kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 7 10:41:21 mailstation kernel: [ 6202] 401 6202 1050 63 6 3 24 0 anvil
Dec 7 10:41:21 mailstation kernel: [ 6203] 402 6203 1115 89 6 3 47 0 log
Dec 7 10:41:21 mailstation kernel: [ 6204] 402 6204 2060 0 8 3 174 0 managesieve-log
Dec 7 10:41:21 mailstation kernel: [ 6205] 401 6205 2813 256 8 3 308 0 stats
Dec 7 10:41:21 mailstation kernel: [ 6206] 0 6206 2367 454 10 3 397 0 config
Dec 7 10:41:21 mailstation kernel: [ 6208] 401 6208 6019 151 15 3 291 0 auth
Dec 7 10:41:21 mailstation kernel: [ 6226] 101 6226 10984 103 13 3 177 0 tlsmgr
Dec 7 10:41:21 mailstation kernel: [29783] 82 29783 59769 576 34 3 1513 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [12579] 0 12579 27180 159 10 5 73 1 containerd-shim
Dec 7 10:41:21 mailstation kernel: [12594] 0 12594 61127 654 94 3 23559 0 rspamd
Dec 7 10:41:21 mailstation kernel: [12735] 101 12735 61127 615 91 3 22719 0 rspamd
Dec 7 10:41:21 mailstation kernel: [12736] 101 12736 61127 817 93 3 22616 0 rspamd
Dec 7 10:41:21 mailstation kernel: [12739] 101 12739 61127 501 96 3 22767 0 rspamd
Dec 7 10:41:21 mailstation kernel: [19409] 101 19409 463594 62805 795 5 41791 0 rspamd
Dec 7 10:41:21 mailstation kernel: [ 8628] 82 8628 59768 594 34 3 1587 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [ 8636] 999 8636 88806 63793 169 3 3315 0 sogod
Dec 7 10:41:21 mailstation kernel: [ 8975] 999 8975 85152 525 162 3 63746 0 sogod
Dec 7 10:41:21 mailstation kernel: [22578] 82 22578 59770 592 34 3 1579 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [22579] 82 22579 59770 612 34 3 1559 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [19575] 0 19575 10767 18 24 3 109 0 systemd-journal
Dec 7 10:41:21 mailstation kernel: [19783] 0 19783 27180 101 11 4 69 1 containerd-shim
Dec 7 10:41:21 mailstation kernel: [19799] 0 19799 569 5 6 3 15 0 tini
Dec 7 10:41:21 mailstation kernel: [19874] 0 19874 933 38 7 3 32 0 clamd.sh
Dec 7 10:41:21 mailstation kernel: [19888] 0 19888 933 34 7 3 31 0 clamd.sh
Dec 7 10:41:21 mailstation kernel: [19889] 0 19889 933 24 7 3 48 0 clamd.sh
Dec 7 10:41:21 mailstation kernel: [19890] 700 19890 403098 233170 646 4 75501 0 clamd

Any help on this?
