
HTTPS requests to invaluement.com are performed every 5-6 seconds on idle server #3929

Closed
4 tasks done
ValdikSS opened this issue Jan 10, 2021 · 14 comments

@ValdikSS
Contributor

Prior to placing the issue, please check the following: (fill out each checkbox with an X once done)

  • I understand that not following or deleting the below instructions will result in immediate closure and/or deletion of my issue.
  • I have understood that this bug report is dedicated for bugs, and not for support-related inquiries.
  • I have understood that answers are voluntary and community-driven, and not commercial support.
  • I have verified that my issue has not already been answered in the past. I also checked previous issues.

Summary

Mailcow performs HTTPS queries to www.invaluement.com domain (to https://www.invaluement.com/spdata/sendgrid-id-dnsbl.txt and https://www.invaluement.com/spdata/sendgrid-envelopefromdomain-dnsbl.txt URLs) every 5-6 seconds.
A full TCP connection is established and closed for each query (from SYN to FIN); this is not a keep-alive ping.
This creates unnecessary load on the invaluement.com server.
The queries are performed by ivm-sg.lua script.

Logs

I found nothing in logs regarding these requests.
Tcpdump log with timestamps is attached.

tcpdump-log-invaluement-com.txt

Reproduction

  1. Run tcpdump on www.invaluement.com IPv4 and IPv6 addresses. Command for current address set: tcpdump host 104.22.15.144 or host 172.67.14.207 or host 104.22.14.144 or host 2606:4700:10::6816:f90 or host 2606:4700:10::6816:e90 or host 2606:4700:10::ac43:ecf
  2. Observe new TCP connections every 5-6 seconds.

I've tried to intercept the data by replacing the URL with one pointing to my HTTP mocking server, which just returned HTTP 200 OK, but in that case no repetitive requests were performed.

System information

My operating system: Linux Ubuntu 20.04
Is AppArmor, SELinux or similar active?: Yes, AppArmor. No issues with it in the audit logs.
Virtualization technology (KVM, VMware, Xen, etc.; LXC and OpenVZ are not supported): Bare metal
Server/VM specifications (memory, CPU cores): 4 cores, 16 GB RAM
Docker version (docker version): 20.10.1
Docker-Compose version (docker-compose version): 1.27.4, build 40524192
Reverse proxy (custom solution): Custom configuration, did not touch the Mailcow configs; irrelevant here
  • Output of git diff origin/master, any other changes to the code? No.
  • All third-party firewalls and custom iptables rules are unsupported. Please check the Docker docs about how to use Docker with your own ruleset. Nevertheless, iptables output can help us to help you: iptables -L -vn, ip6tables -L -vn, iptables -L -vn -t nat and ip6tables -L -vn -t nat.
  • DNS problems? Please run docker exec -it $(docker ps -qf name=acme-mailcow) dig +short stackoverflow.com @172.22.1.254 (set the IP accordingly, if you changed the internal mailcow network) and post the output.
@ValdikSS ValdikSS added the bug label Jan 10, 2021
@ValdikSS
Contributor Author

This issue was mentioned in the now-locked bug #3877.
Only DNS queries are mentioned there; however, full HTTPS requests are performed as well.

@ValdikSS
Contributor Author

ivm-sg.lua is sourced from the https://github.com/fatalbanana/ivm-rspamd repository. It has been archived, and its author recommends using rspamd selectors instead: https://rspamd.com/doc/configuration/selectors.html
The rspamd page has an example with the invaluement.com Sendgrid lists that does the same without a Lua script.

@andryyy
Contributor

andryyy commented Jan 10, 2021

Same with selectors. I switched to fatalbanana's implementation because it has some slight advantages.

@andryyy andryyy closed this as completed Jan 10, 2021
@andryyy
Contributor

andryyy commented Jan 10, 2021

If you don't like it, remove it. ;)

@ValdikSS
Contributor Author

I don't want to remove it. The issue is that the lists are updated much more frequently than they should be, which creates unnecessary load on both Mailcow server and invaluement.com service. The interval should be increased to several minutes, not seconds.
Please reopen the issue.

@andryyy
Contributor

andryyy commented Jan 10, 2021 via email

@ValdikSS
Contributor Author

Even so, this is still a bug. It shouldn't check for updates more often than once per minute; this is undesired behavior for both the local server and the remote one.
Links to other lists are checked much less frequently.

@andryyy
Contributor

andryyy commented Jan 10, 2021 via email

@ValdikSS
Contributor Author

As far as I can see, map_watch_interval is set to 30 seconds, not 5 seconds.
The files are updated no more often than once every 10 minutes (I've been monitoring Last-Modified since I opened this issue), so it's not necessary to check for updates every 5 seconds.

If you don't want to fix this bug, don't know how to fix it, or it's much more complex to fix than it seems, please say so. Don't pretend it's totally normal for everybody to refresh almost-static files at such an interval, making a full HTTPS connection every time without keep-alive.
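For reference, rspamd's polling cadence for remote maps is governed by the global map_watch_interval option. A minimal sketch of relaxing it (the local.d/options.inc path is the stock rspamd override location; mailcow's actual file layout may differ):

```
# local.d/options.inc (path assumed; adjust to your deployment)
# Poll remote HTTP maps for updates at most once per minute
map_watch_interval = 1min;
```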

@kirkham

kirkham commented Jan 11, 2021

André,

This is Rob McEwen, CEO of invaluement.com - on our website where we provide this free service - we recommend:
(1) once-per-minute updates
AND
(2) caching the last downloaded data, checking whether the server version is newer than the stored version, and only downloading the data if the server version is newer than the last download. (The data update is VERY unpredictable: it can often go several hours without an update, then have many updates minutes apart. This has more to do with if/when new spammers start using Sendgrid than with anything I'm doing.)

It looks like you're doing (2) - and if so, thanks for that. This is critical.

But I need your help with (1).

Please change your update-checking interval to 60 seconds. I've recently seen about a $1K/year increase in my hosting costs just due to access to these files alone, and that is only going to increase as more servers use this and as more types of ESP data files are added in the near future.

By the way, I'm using Cloudflare for best performance, and I've greatly optimized things by configuring Cloudflare NOT to check for updates on these files, and to keep DDoS protection to a minimum for files in this folder. (Having these served by Cloudflare as 100% cached static files eliminates the need for DDoS protection anyway, for these particular files.) Then, whenever a file updates, I alert their API of the change, so that they only fetch the new copy THEN. This is amazingly efficient! Without this, they would check my server for updates OFTEN, in the middle of client updates, slowing many of them down. However, even so, I'm using their "Argo" feature, which improves network efficiency, and that is incurring extra charges with all this extra traffic. (I guess I could turn Argo off? Maybe it wouldn't make much of a difference either way?)

So when I saw that increase, I went to cloudflare support and got lists of IPs that were causing the most traffic/connections - and MANY of them had PTR records OR were mail servers with SMTP banners - that had the word "mailcow" in them. So mailcow is a large cause of this extra traffic. So - again - please do me a favor and change your update interval to 60 seconds.

I recognize that slightly less frequent checks will cause some amount of "false negatives" - but I think that amount will be extremely tiny compared to the amount of spam that this data blocks. Unlike spammers who burn through IPs and domains when they self-host, most of the Sendgrid spammers don't do "hit and run" burst sends, and then are never seen again. Most of the ESPs rate-limit their sending, especially for their less trusted customers. So getting the data up to 60 seconds later (probably averaging closer to 30 seconds later) - shouldn't cause a big difference, but will go a long way towards not overusing our free data.

Thanks again for your help with this!

Rob McEwen, CEO of invaluement.com
rob AT invaluement DOT com
+1 478-475-9032

@andryyy
Contributor

andryyy commented Jan 11, 2021 via email

@kirkham

kirkham commented Jan 11, 2021

Excellent. Thanks! And sorry I had overlooked your previous email. In the next couple of days, I'll look for it and respond. Thanks again!
--Rob McEwen

@andryyy
Contributor

andryyy commented Jan 11, 2021

No worries at all.

@ValdikSS
Contributor Author

@andryyy has it been fixed?
