clamd using all memory and getting oom killed since 567064e #3761
Comments
I'm also seeing this, also on a 4 GB server. It seems to happen periodically (no pattern that I can detect), making the server essentially unresponsive for periods of 8-14 minutes.
I mean... we can try to give it a memory limit. Or remove signatures. Any ideas, anybody? @mkuron?
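For reference, a memory cap can be applied without editing the tracked compose file by adding an override. A minimal sketch, assuming mailcow's compose file format (2.x, where `mem_limit` is valid) and the `clamd-mailcow` service name that appears in the logs further down; the cap value is purely illustrative:

```yaml
# docker-compose.override.yml — sits next to docker-compose.yml and survives git pulls
version: '2.1'
services:
  clamd-mailcow:
    mem_limit: 1536m  # illustrative cap: the container is OOM-killed in isolation instead of starving the host
```

With a cgroup limit in place, the kernel kills clamd inside its own container rather than picking victims host-wide, trading scan availability for overall server stability.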
If it helps, this is new - it only seems to be an issue with the latest version of the ClamAV container.
We just updated to 0.103.0; it is possible this version needs more memory.
Ah! This is from the release notes for 0.103: "clamd can now reload the signature database without blocking scanning. This multi-threaded database reload improvement was made possible thanks to a community effort."
Good catch. :)
I wonder if there is a way to expose that ConcurrentDatabaseReload setting as an option in mailcow.conf?
I set it to no by default for now and will add it to the docs. All options can be set via
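For anyone landing here later, the change boils down to a single directive. A minimal sketch of the relevant line, assuming the ClamAV config lives under the data/conf tree mentioned further down (the exact filename isn't quoted in this thread):

```
# Opt out of the 0.103 non-blocking reload; the concurrent reload briefly keeps
# two copies of the signature database in memory while an update is applied.
ConcurrentDatabaseReload no
```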
Perfect! Thank you very much indeed. Will drop another donation in the morning :)
I thank you. :)
Just seen this by coincidence. I would prefer managing any kind of config via mailcow.conf if possible. It's a good approach to bundle config settings in one file instead of having to edit several files lying in "unknown" locations. It would make managing things easier.
We cannot handle every single config there. We use git for this reason. It will not kill your changes as long as we didn't change it either. If we did, we need to overwrite it for compatibility. All config files share the same location, by the way: data/conf
My ClamAV is also running OOM when updating, even with ConcurrentDatabaseReload set to NO, and has been for the past week; it just happened 10 minutes ago.
How much RAM?
4 GB. I know it's not a lot, but it has been working fine for 2 years, until now =)
Silly question, but I noticed you say it's set to NO. Does your clamd.conf say "ConcurrentDatabaseReload no" or "ConcurrentDatabaseReload NO"? IIRC clamd.conf directives are case-sensitive.
Yes, it's in lower case. It was added by a recent commit.
"no" is fine. You can try to decrease SOGo worker count to 10. |
This has been fine since the ConcurrentDatabaseReload change, but it just bombed again this evening.
I can decrease the SOGo workers as you've recommended, but I don't actually have any users using it, which I guess would make a difference?
Could you run
They can still eat some RAM when you switch between these workers with each new request.
Same issue here, even with ConcurrentDatabaseReload set to no. Every few hours it'll freeze for a minute or two. Up until this point, mailcow had been running great for over a year on this server. Clamd log:
So? I cannot change that, I'm afraid. :/ If you want to keep using ClamAV, you need more RAM. 👍 Or reduce the SOGo workers. We will update the requirements.
:( I get it, of course... Just hoped you'd have a solution for me :) Thanks anyway, I'll upgrade the RAM of my server.
Or at least try with fewer workers in SOGo first. :)
I disabled the ClamAV container and have been watching the others since your post. On my single system over a working week, Solr grows slowly, but only by about 100 MiB from where it starts (350-450). Rspamd spikes quite severely at times from a resting ~250 MiB; I think about 650 is the highest I've seen it. Redis also seems capable of varying by a few hundred MiB, presumably depending on what's going on at the time, but nothing has an obvious memory leak.

I'll have to look at something to do this monitoring more scientifically over a longer period and produce some graphs, but it seems like the issue may indeed just be clamd plus other things using more memory at the same time, down to random usage. For now I'll reduce the SOGo workers, re-enable clamd, and keep an eye on it.
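For that more scientific monitoring, `docker stats` can be sampled without extra tooling. A minimal sketch (the template fields are standard Docker format variables):

```bash
# One-shot, scriptable snapshot of per-container memory; run it from cron and
# append the output to a file to build a rough time series for graphing later.
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
```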
Unfortunately, ClamAV is quite a memory hog because it loads all its virus definitions into memory, and those obviously get larger with every update. You'll need to reduce the set of virus definitions to reduce memory usage. Or reconsider whether you actually need a virus scanner: we block .exe attachments and MS Office documents with macros, which should already take care of most virus distribution vectors.
SOLR needs quite a lot of memory, depending on how many messages you have. It is recommended to be kept disabled unless you have a lot of memory and very few users.
Rspamd uses Lua, which is garbage-collected. You can reduce the garbage collection timeout (#3049 (comment)) to keep its memory usage more constant.
Redis is an in-memory database that periodically dumps its state to a file. Its memory footprint probably grows as it accumulates transactions between dumps, but I have not seen it consume an unreasonable amount of memory.
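To sanity-check that last point, Redis reports its own footprint. A sketch reusing the `docker exec` pattern from the issue template below; the `redis-mailcow` container name is an assumption:

```bash
# Ask Redis for its current and peak memory usage directly.
docker exec -it $(docker ps -qf name=redis-mailcow) redis-cli info memory | grep -E 'used_memory(_peak)?_human'
```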
I run my server in AWS with 2 GB of RAM, since I'm running this for personal sites and some friends' domains and it's not very high traffic. I did, however, create a 4 GB swap file and have had no issues. That's not to say it's suitable for everyone, but it may be an option for you if it's not a high-traffic server. My server does need to use the swap file, and I see no reason to pay for more memory.
@jjkondrat And you are suffering from the same issue, "clamd getting oom-killed", so the swap file does not help?
@Adorfer No, and I have never had any memory problems in any of the containers while using the large swap file. I've been using a large swap file since I built my server well over a year ago.
So what is your point in posting to this thread? "Adding swap may resolve the issue"?
Yes. Although more memory would be better, if the user defines or increases the swap file size, they may avoid the crash.
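For completeness, a classic Linux swap-file setup; a sketch using the 4 GB size from the comment above (adjust to taste, and note that swap only softens OOM pressure rather than removing it):

```bash
sudo fallocate -l 4G /swapfile   # reserve the file (use dd if the filesystem lacks fallocate support)
sudo chmod 600 /swapfile         # swap files must not be readable by other users
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots
```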
So I already disabled clamd in mailcow.conf but am still getting OOM messages that seem related to clamd. I am running mailcow inside an ESXi VM with 2 CPUs & 4 GB RAM. It feels like clamd is still running even though it is disabled via mailcow.conf (the last restart of mailcow and the server was about 14 days ago).

Example docker-compose log entry from after the restart:

clamd-mailcow_1 | Mon Dec 7 09:38:54 2020 -> instream(172.22.1.10@43964): OK

Example /var/log/messages entry related to the OOM:

Dec 7 10:41:21 mailstation kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Any help with this?
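Two quick checks for that situation; a sketch assuming the `SKIP_CLAMD` toggle in mailcow.conf is the switch in question (a changed toggle only takes effect once the containers are re-created):

```bash
grep SKIP_CLAMD mailcow.conf     # confirm the toggle is actually set
docker ps -f name=clamd-mailcow  # if the container is listed here, clamd is still running
docker-compose up -d             # re-create containers so the changed toggle takes effect
```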
Prior to placing the issue, please check the following (fill out each checkbox with an X once done):

Description of the bug:
Since the last update containing 567064e, clamd has been using much more memory, to the extent that the server OOM-kills it. The server has 4 GB of RAM and had been running without issue (regularly updated).
I'm unsure whether it is a particular message that triggers this or the clamd update process. After the instance logged below, it ran fine all day before causing problems again mid-evening. I've had to disable clamd for now in the config file.
Docker container logs of affected containers:
Reproduction of said bug:
Logged into the server, stopped the clamd container, and rebooted to make sure the server wasn't in an inconsistent state after the OOM. Observed the server for the working day with no problems; the issue then reoccurred mid-evening.
System information:

| Question | Answer |
| --- | --- |
| Virtualization technology (KVM, VMware, Xen, etc. - LXC and OpenVZ are not supported) | KVM |
| Server/VM specifications (Memory, CPU cores) | 4 GB, 1 core |
| Docker version (`docker version`) | 19.03.12 |
| Docker-Compose version (`docker-compose version`) | 1.27.2, build 18f557f9 (docker-py 4.3.1, CPython 3.7.7, OpenSSL 1.1.0l 10 Sep 2019) |
| Reverse proxy (custom solution) | None |
`git diff origin/master`, any other changes to the code? If so, please post them.

No, nothing.
Output of `iptables -L -vn`, `ip6tables -L -vn`, `iptables -L -vn -t nat` and `ip6tables -L -vn -t nat`:

Nope.
`docker exec -it $(docker ps -qf name=acme-mailcow) dig +short stackoverflow.com @172.22.1.254` (set the IP accordingly, if you changed the internal mailcow network) and post the output:

151.101.1.69
151.101.193.69
151.101.129.69
151.101.65.69