
HTTPS requests to invaluement.com are performed every 5-6 seconds on idle server #3929

Closed
4 tasks done
ValdikSS opened this issue Jan 10, 2021 · 14 comments

@ValdikSS
Contributor

Prior to placing the issue, please check the following: (fill out each checkbox with an X once done)

  • I understand that not following or deleting the below instructions will result in immediate closure and/or deletion of my issue.
  • I have understood that this bug report is dedicated for bugs, and not for support-related inquiries.
  • I have understood that answers are voluntary and community-driven, and not commercial support.
  • I have verified that my issue has not already been answered in the past. I also checked previous issues.

Summary

Mailcow performs HTTPS queries to www.invaluement.com domain (to https://www.invaluement.com/spdata/sendgrid-id-dnsbl.txt and https://www.invaluement.com/spdata/sendgrid-envelopefromdomain-dnsbl.txt URLs) every 5-6 seconds.
A full TCP connection is established and closed for each query (from SYN to FIN); this is not a keep-alive ping.
This creates unnecessary load on the invaluement.com server.
The queries are performed by ivm-sg.lua script.

Logs

I found nothing in logs regarding these requests.
Tcpdump log with timestamps is attached.

tcpdump-log-invaluement-com.txt

Reproduction

  1. Run tcpdump on www.invaluement.com IPv4 and IPv6 addresses. Command for current address set: tcpdump host 104.22.15.144 or host 172.67.14.207 or host 104.22.14.144 or host 2606:4700:10::6816:f90 or host 2606:4700:10::6816:e90 or host 2606:4700:10::ac43:ecf
  2. Observe new TCP connections every 5-6 seconds.

I've tried to intercept the data by replacing the URL with one pointing to my HTTP mocking server, which just returned HTTP 200 OK, but in that case no repetitive requests were performed.

System information

My operating system: Linux Ubuntu 20.04
Is AppArmor, SELinux or similar active?: Yes, AppArmor. No issues with it in the audit logs.
Virtualization technology (KVM, VMware, Xen, etc.; LXC and OpenVZ are not supported): Bare metal
Server/VM specifications (memory, CPU cores): 4 cores, 16 GB RAM
Docker version (docker version): 20.10.1
Docker-Compose version (docker-compose version): 1.27.4, build 40524192
Reverse proxy (custom solution): Custom configuration, did not touch the Mailcow configs; irrelevant here
  • Output of git diff origin/master, any other changes to the code? No.
  • All third-party firewalls and custom iptables rules are unsupported. Please check the Docker docs about how to use Docker with your own ruleset. Nevertheless, iptables output can help us to help you: iptables -L -vn, ip6tables -L -vn, iptables -L -vn -t nat and ip6tables -L -vn -t nat.
  • DNS problems? Please run docker exec -it $(docker ps -qf name=acme-mailcow) dig +short stackoverflow.com @172.22.1.254 (set the IP accordingly, if you changed the internal mailcow network) and post the output.
@ValdikSS ValdikSS added the bug label Jan 10, 2021
@ValdikSS
Contributor Author

This issue was mentioned in the now-locked bug #3877.
Only DNS queries are mentioned there; however, full HTTPS requests are performed as well.

@ValdikSS
Contributor Author

ivm-sg.lua is sourced from the https://github.com/fatalbanana/ivm-rspamd repository. It has been archived, and its author recommends using rspamd selectors instead: https://rspamd.com/doc/configuration/selectors.html
The rspamd page has an example with the invaluement.com Sendgrid lists that does the same without a Lua script.

@andryyy
Contributor

andryyy commented Jan 10, 2021

Same with selectors. I switched to fatalbanana's implementation because it has some slight advantages.

@andryyy andryyy closed this as completed Jan 10, 2021
@andryyy
Contributor

andryyy commented Jan 10, 2021

If you don't like it, remove it. ;)

@ValdikSS
Contributor Author

I don't want to remove it. The issue is that the lists are updated much more frequently than they should be, which creates unnecessary load on both Mailcow server and invaluement.com service. The interval should be increased to several minutes, not seconds.
Please reopen the issue.

@andryyy
Contributor

andryyy commented Jan 10, 2021 via email

@ValdikSS
Contributor Author

Even so, this is still a bug. It shouldn't check for updates more often than once per minute; this is undesired behavior for both the local server and the remote one.
Links to other lists are checked much less frequently.

@andryyy
Contributor

andryyy commented Jan 10, 2021 via email

@ValdikSS
Contributor Author

As far as I can see, map_watch_interval is set to 30 seconds, not 5 seconds.
The files are updated no more often than once every 10 minutes (I've been monitoring Last-Modified since I opened this issue), so it's not necessary to check for updates every 5 seconds.

If you don't want to fix this bug, don't know how to fix it, or it's much more complex to fix than it seems, please say so. Don't pretend it's totally normal for everybody to refresh almost-static files at such an interval, making a full HTTPS connection every time without keep-alive.
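For reference, rspamd's polling cadence for remote maps is governed by the global map_watch_interval option. A minimal sketch of relaxing it (the local.d/options.inc path is the stock rspamd override location; mailcow's actual file layout may differ):

```
# local.d/options.inc (path assumed; adjust to your deployment)
# Poll remote HTTP maps for updates at most once per minute
map_watch_interval = 1min;
```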

@kirkham

kirkham commented Jan 11, 2021

André,

This is Rob McEwen, CEO of invaluement.com - on our website where we provide this free service - we recommend:
(1) once-per-minute updates
AND
(2) caching the last downloaded data, checking whether the server version is newer than the stored version, and only downloading the data if the server version is newer than the last download. (The data update is VERY unpredictable: it can often go several hours without an update, then have many updates minutes apart. This has more to do with if/when new spammers start using Sendgrid than with anything I'm doing.)

It looks like you're doing (2) - and if so, thanks for that. This is critical.

But I need your help with (1).

Please change your update-checking interval to 60 seconds. I've recently seen about a $1K/year increase in my hosting costs just due to access to these files alone, and that is only going to increase as more servers use this and as more types of ESP data files are added in the near future.

By the way, I'm using Cloudflare for best performance, and I've greatly optimized things by configuring Cloudflare NOT to check for updates on these files, and to keep DDoS protection to a minimum for files in this folder. (Having these served by Cloudflare as 100% cached static files eliminates the need for DDoS protection anyway, for these particular files.) Then, whenever a file updates, I alert their API of the change, so that they only fetch the new copy THEN. This is amazingly efficient! Without this, they would check my server for updates OFTEN, in the middle of client updates, slowing many of them down. However, even so, I'm using their "Argo" feature, which improves network efficiency, and that is incurring extra charges with all this extra traffic. (I guess I could turn Argo off? Maybe it wouldn't make much of a difference either way?)

So when I saw that increase, I went to cloudflare support and got lists of IPs that were causing the most traffic/connections - and MANY of them had PTR records OR were mail servers with SMTP banners - that had the word "mailcow" in them. So mailcow is a large cause of this extra traffic. So - again - please do me a favor and change your update interval to 60 seconds.

I recognize that slightly less frequent checks will cause some amount of "false negatives" - but I think that amount will be extremely tiny compared to the amount of spam that this data blocks. Unlike spammers who burn through IPs and domains when they self-host, most of the Sendgrid spammers don't do "hit and run" burst sends, and then are never seen again. Most of the ESPs rate-limit their sending, especially for their less trusted customers. So getting the data up to 60 seconds later (probably averaging closer to 30 seconds later) - shouldn't cause a big difference, but will go a long way towards not overusing our free data.

Thanks again for your help with this!

Rob McEwen, CEO of invaluement.com
rob AT invaluement DOT com
+1 478-475-9032

@andryyy
Contributor

andryyy commented Jan 11, 2021 via email

@kirkham

kirkham commented Jan 11, 2021

Excellent. Thanks! And sorry I had overlooked your previous email. In the next couple of days, I'll look for it and respond. Thanks again!
--Rob McEwen

@andryyy
Contributor

andryyy commented Jan 11, 2021

No worries at all.

@ValdikSS
Contributor Author

@andryyy has it been fixed?
