[BUG] RateLimit is not working as expected. #4467

Open
1 task done
ladisone opened this issue Apr 26, 2023 · 24 comments

@ladisone

Prerequisites

Describe the bug
When the "burst" limit configured in ratelimit.conf is reached, ratelimit does not switch to the configured rate; it simply blocks sending e-mails until the burst counter expires. The default "expiry" is 2 days, but when I set "expiry" to 1h, the setting is not accepted.

Steps to Reproduce

  1. ratelimit.conf:
rates {
  user = {
    selector = 'user.lower';
    bucket = [
    {
      burst = 10;
      rate  = "8 / 1min";
    },
    {
      burst = 20;
      rate  = "10 / 10min";
    },
    {
      burst = 120;
      rate  = "100 / 1h";
    }
    ]
  }
};
info_symbol = "R_RATELIMIT_INFO";
expiry = 1h;
  2. I send 119 messages. The Redis key then contains:
127.0.0.1:6379> HGETALL RLzni8u6qjhjak
 1) "l"
 2) "1682405057593"
 3) "b"
 4) "0"
 5) "dr"
 6) "10000"
 7) "db"
 8) "10000"
 9) "p"
10) "119"

The e-mail client then reports "Ratelimit user exceeded." Now I have to wait two days for the counter to expire, or delete the RL* keys from the Redis DB.
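The `HGETALL` replies quoted in this thread are flat lists of alternating field names and values. The Python sketch below turns such a reply into a dictionary and flags a pending counter that on its own already exceeds a bucket's capacity. The field meanings are assumptions inferred from the field names, the comments in this thread and the values shown, not documented facts: `l` as the last-update timestamp in milliseconds, `b` as the current bucket level, `p` as the pending counter, and `dr`/`db` as dynamic rate/burst multipliers stored as fixed-point integers where `10000` means 1.0. The helper names are hypothetical.

```python
from typing import Dict, List

FIXED_POINT = 10000  # assumption: dr/db are stored as multiplier * 10000


def parse_hgetall(reply: List[str]) -> Dict[str, str]:
    """Pair up a flat HGETALL reply: [field1, value1, field2, value2, ...]."""
    if len(reply) % 2:
        raise ValueError("HGETALL reply must have an even number of items")
    return dict(zip(reply[0::2], reply[1::2]))


def describe_bucket(reply: List[str], burst: int) -> Dict[str, object]:
    """Decode a ratelimit hash (field meanings assumed, see above)."""
    fields = parse_hgetall(reply)
    pending = int(fields.get("p", 0))
    return {
        "last_update_ms": int(fields.get("l", 0)),
        "level": float(fields.get("b", 0)),
        "pending": pending,
        "dyn_rate": int(fields.get("dr", FIXED_POINT)) / FIXED_POINT,
        "dyn_burst": int(fields.get("db", FIXED_POINT)) / FIXED_POINT,
        # Pending should hover around 0 between scans; a value at or above
        # the bucket capacity means messages were counted but never finished.
        "pending_exceeds_burst": pending >= burst,
    }


# The reply from this report, checked against the 1-minute bucket (burst = 10).
reply = ["l", "1682405057593", "b", "0", "dr", "10000", "db", "10000", "p", "119"]
print(describe_bucket(reply, burst=10))
```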

Expected behavior

  • After sending 10 messages, I can continue, but only at 8 messages per minute.
  • After sending 20 messages, I can continue, but only at most 10 messages per 10 minutes.
  • After sending 120 messages, I can continue, but only 100 messages per hour.
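The expected behavior listed above is that of a leaky bucket: `burst` is the bucket's capacity and `rate` is how fast it drains. The Python sketch below models that expectation with hypothetical names. It is a simplification, not Rspamd's implementation: it ignores the dynamic multipliers and the pending counter, and it assumes a message is accepted only when every bucket still has room for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeakyBucket:
    burst: float        # capacity, the "burst" setting
    count: float        # messages drained per period ("8" in "8 / 1min")
    period: float       # period length in seconds ("1min" -> 60)
    level: float = 0.0  # current fill level
    last: float = 0.0   # time of the previous update, in seconds

    def leak(self, now: float) -> None:
        # Drain proportionally to the time elapsed since the last update.
        drained = (now - self.last) * self.count / self.period
        self.level = max(0.0, self.level - drained)
        self.last = now


def try_send(buckets: List[LeakyBucket], now: float) -> bool:
    """Accept one message only if every bucket still has room for it."""
    for bucket in buckets:
        bucket.leak(now)
    if any(bucket.level + 1 > bucket.burst for bucket in buckets):
        return False
    for bucket in buckets:
        bucket.level += 1
    return True


# The three buckets from the ratelimit.conf in this report.
buckets = [
    LeakyBucket(burst=10, count=8, period=60),       # 8 / 1min
    LeakyBucket(burst=20, count=10, period=600),     # 10 / 10min
    LeakyBucket(burst=120, count=100, period=3600),  # 100 / 1h
]

sent_at_start = sum(try_send(buckets, now=0.0) for _ in range(15))
sent_after_1min = sum(try_send(buckets, now=60.0) for _ in range(15))
print(sent_at_start, sent_after_1min)  # 10 8
```

With draining buckets, a sender who exhausts the burst is throttled to the drain rate instead of being blocked until the Redis key expires, which is the difference the reporter describes.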

Versions

  • Rspamd daemon version 3.5
  • OS - AlmaLinux release 8.7 (Stone Smilodon)
@ladisone ladisone added the bug label Apr 26, 2023
@sriccio

sriccio commented May 13, 2023

I'm seeing the same behavior on the Rspamd instances we updated to 3.5. Going back to 3.4.x and flushing the rate limit Redis entries restored normal operation.

It also seems to me that this mailcow issue is related:
mailcow/mailcow-dockerized#5168

@remkolodder

Same issue here via Mailcow. After reverting to rspamd 3.4 it works fine again (so far, at least). @vstakhov, can you please have a look at this? If you need any info from me, please let me know (see the mailcow thread as well). Thank you for all your continued hard work, it's appreciated!

@vstakhov
Member

I just do not understand the issue tbh. Is it related to the whitelisted_ip option? From the original issue description I can conclude that the pending (or p) key is not drained like the ordinary bucket value.

@remkolodder

For what it's worth, this occurred on a mailcow account that does not have whitelist_ip selected. It does appear that a bucket is not drained: I saw a sudden increase around the 11th of May (when I upgraded the local install that included rspamd 3.5), and then it kept growing gradually. The ratelimit was never released; it just kept going.

rates {
#    # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
    to = "100 / 1s";
    to_ip = "100 / 1s";
    to_ip_from = "100 / 1s";
    bounce_to = "100 / 1h";
    bounce_to_ip = "7 / 1m";
}
whitelisted_rcpts = "postmaster,mailer-daemon";
max_rcpt = 25;
custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
info_symbol = "RATELIMITED";

Those are the settings that mailcow uses, nothing fancy.

The Lua file contains:

cat lua/ratelimit.lua 
local custom_keywords = {}

custom_keywords.mailcow = function(task)
  local rspamd_logger = require "rspamd_logger"
  local dyn_rl_symbol = task:get_symbol("DYN_RL")
  if dyn_rl_symbol then
    local rl_value = dyn_rl_symbol[1].options[1]
    local rl_object = dyn_rl_symbol[1].options[2]
    if rl_value and rl_object then
      rspamd_logger.infox(rspamd_config, "DYN_RL symbol has value %s for object %s, returning %s...", rl_value, rl_object, "rs_dynrl_" .. rl_object)
      return "rs_dynrl_" .. rl_object, rl_value
    end
  end
end

return custom_keywords

@frederikbosch
Contributor

I removed my comment regarding whitelisted_ip. I believe something did change when upgrading from 3.4 to 3.5 because I also see an increase in my logs.

@vstakhov
Member

Ok, I think the reason is that p is not cleared. The pending field was intended to count messages that are currently being processed. However, if you have short-circuit rules (and they are evil; I've said that many, many times), then p can be increased in the prefilter but never decreased in the postfilter, because postfilters are skipped.
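The failure mode described here can be reproduced with a toy Python model: the prefilter increments `p`, the postfilter moves the message from `p` into the bucket level, and a short-circuit skips the postfilter. The class and function names are hypothetical, and counting `p` against the capacity is an assumption consistent with the Redis dumps in this thread (`b` at 0, `p` large, messages still refused).

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    burst: int
    level: float = 0.0  # messages already accounted for (drains over time)
    pending: int = 0    # messages counted in the prefilter, not yet finished


def prefilter(bucket: Bucket) -> bool:
    """Admit a message; assumption: pending messages count against capacity."""
    if bucket.level + bucket.pending + 1 > bucket.burst:
        return False
    bucket.pending += 1
    return True


def postfilter(bucket: Bucket) -> None:
    """Move a finished message from pending into the bucket level."""
    bucket.pending -= 1
    bucket.level += 1


def scan(bucket: Bucket, short_circuit: bool) -> bool:
    if not prefilter(bucket):
        return False  # rejected: ratelimit exceeded
    if not short_circuit:
        postfilter(bucket)  # never reached when a rule short-circuits
    return True


bucket = Bucket(burst=10)
for _ in range(10):
    scan(bucket, short_circuit=True)

bucket.level = 0.0  # even once the level has fully drained...
print(bucket.pending, scan(bucket, short_circuit=False))  # 10 False
```

Because nothing in this model ever drains `p`, the key stays blocked until it expires or is deleted from Redis, matching the original report.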

@vstakhov
Member

Or no, this symbol has all guards against it: flags = 'explicit_disable,ignore_passthrough'

@frederikbosch
Copy link
Contributor

I am still having issues. E-mails are being sent from our webmail client, using an IP that I have included in the ip_whitelisted map.

The log:

2023-05-30 13:11:59 #9(normal) <d79c6d>; task; rspamd_task_write_log: id: <41bc017f249d2186cfdbbe934e5521ac@client.nl>, qid: <1q3z8p-0007ZJ-00>, ip: 172.16.20.123, user: training@client.nl, from: <training@client.nl>, (default: F (soft reject): [0.00/15.00] [RATELIMIT(0.00){incoming_ip_limit(RLg4m1d1d86msx3);},TAGGED_RCPT(0.00){}]), len: 2983, time: 6.322ms, dns req: 0, digest: <a50b2b63cf6d76a2433c197eda5a9a41>, rcpts: <...>, mime_rcpts: <>, forced: soft reject "Ratelimit "incoming_ip_limit" exceeded"; score=nan (set by ratelimit)

My ratelimit.conf:

rates {
  user = {
    bucket = [
      {
        burst = 20;
        rate = "100 / 10m";
      }
    ]
  }

  incoming_ip_limit {
    selector = "ip";
    whitelisted_ip = "/etc/rspamd/maps.d/ip-whitelist.map"
    bucket [
      {
        burst = 20;
        rate = "400 / 10m";
      }
    ]
  }
}

info_symbol = "RATELIMIT";

And my ip-whitelist.map:

app@rspamd:/$ cat /etc/rspamd/maps.d/ip-whitelist.map
46.21.123.12
2.58.123.12
172.16.20.0/24
250.0.0.0/8

And the entry from Redis.

127.0.0.1:6379> HGETALL RLg4m1d1d86msx3
 1) "l"
 2) "1685452319129"
 3) "b"
 4) "0"
 5) "dr"
 6) "50444"
 7) "db"
 8) "101221"
 9) "p"
10) "189"

@vstakhov
Member

You cannot define per-rule whitelist maps; they are defined globally for this module. The main question is why the p bucket is not being cleared.

It is increased here when a message starts being scanned: https://github.com/rspamd/rspamd/blob/master/lualib/redis_scripts/ratelimit_check.lua#L69

It is decreased here: https://github.com/rspamd/rspamd/blob/master/lualib/redis_scripts/ratelimit_update.lua#L78

So if this postfilter is not called, we are in real trouble. But this postfilter must be called in all cases: https://github.com/rspamd/rspamd/blob/master/src/plugins/lua/reputation.lua#L1332

@vstakhov
Copy link
Member

So normally p must always be around 0.

@ladisone
Author

ladisone commented Jun 5, 2023

@vstakhov I read the previous comments and am not sure whether my question is still relevant. As I understand it, the problem is that the p bucket is not being cleared. Is this issue still considered a bug?

@benschhold

I also reverted to the old rspamd version because I had the described issue, and my prefilter also didn't work anymore. I had a whitelist with prefilter to completely whitelist an IP; after the update ratelimit wasn't included in the prefilter whitelist anymore and I had to use the whitelist from the ratelimit module. It might have something to do with the non-working ratelimit, but I'm not sure.

@vstakhov
Member

> Is this issue still considered a bug?

I'm not sure it is an Rspamd bug, as all reports are likely from Mailcow users. I also see no way the p bucket could fail to be cleaned if the ratelimit callbacks are called properly. That's the problem.

@vstakhov
Member

> I had a whitelist with prefilter to completely whitelist an IP; after the update ratelimit wasn't included in the prefilter whitelist anymore and I had to use the whitelist from the ratelimit module.

I'm sorry, but I cannot parse this sentence.

@benschhold

What I tried to say is that the prefilter described here https://rspamd.com/doc/modules/multimap.html#pre-filter-maps no longer whitelists the ratelimit module.
I'm not using mailcow, but I also use a custom ratelimit Lua script; maybe that's what I have in common with mailcow.

@vstakhov
Copy link
Member

I'm sorry, but what do you mean by "whitelisting ratelimit by multimap"? If it is what I think, it has never worked the way you might expect. It may have worked merely by accident, and that is not an issue. For disabling symbols you can use many methods: settings, custom Lua code, conditions, etc. Multimap is not the proper tool for this task.

@benschhold

I understand your point and I don't disagree; I'd just point out that it worked perfectly for years until now.
I don't want to mix up issues if this has nothing to do with the ratelimit issue, as it might be related to mailcow and to my use of the custom_keywords feature with a custom Lua script. I also want to mention that I had never heard of issues with that before.

@ladisone
Author

> > I had a whitelist with prefilter to completely whitelist an IP; after the update ratelimit wasn't included in the prefilter whitelist anymore and I had to use the whitelist from the ratelimit module.

> I'm sorry, but I cannot parse this sentence.

@vstakhov I don't use Mailcow; I use only Rspamd with Postfix. My only problem is the one I described in #4467 (comment), with that simple configuration.

@frederikbosch
Contributor

For what it is worth, my whitelist problem was indeed resolved by defining the whitelisted_ip config globally. I also checked some of the p values by fetching them from Redis, after the change, and they were indeed zero or close to zero. So my problem was something different than the issue others have here.

@sriccio

sriccio commented Jun 13, 2023

> Is this issue still considered a bug?

> I'm not sure it is an Rspamd bug, as all reports are likely from Mailcow users. I also see no way the p bucket could fail to be cleaned if the ratelimit callbacks are called properly. That's the problem.

Hi,

I discovered this ratelimit issue after upgrading two standalone rspamd instances (not related to mailcow) from 3.4.x to 3.5.x.
Many users were being ratelimited unusually often with 3.5, as if the buckets were filling up but not leaking.
Clearing the buckets in Redis helped temporarily for a few hours, but downgrading to 3.4.1 fixed the problem for good.

It looks to me like a change somewhere in 3.5.x affected how ratelimit behaves, at least with our config.

While searching around, I found that mailcow users were having the same kind of issue when their rspamd container was updated to 3.5.x.

Nothing really special in our config, except a ratelimit whitelist based on authenticated user names.

# local.d/ratelimit.conf
#
  whitelisted_user  = "${LOCAL_CONFDIR}/custom/ratelimit_whitelisted_users.map";

  rates {
    # Selector based ratelimit
    some_limit = {
      selector = 'user.lower';
      # You can define more than one bucket, however, you need to use array syntax only
      bucket = [
      {
        burst = 60; # capacity of 60 messages in the bucket
        rate = "12 / 1min"; # leak 12 messages per minute (every 5s)
      }]
    }
    # Predefined ratelimit
    to = {
      bucket = {
        burst = 100;
        rate = 0.01666666666666666666; # leak 1 message per minute
      }
    }
    # or define it with selector
    other_limit_alt = {
      selector = 'rcpts:addr.take_n(5)';
      bucket = {
        burst = 100;
        rate = "1 / 1m"; # leak 1 message per minute
      }
    }
  }

Kind regards
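The `rate` values in this config mix the string form (`"12 / 1min"`) with a bare per-second number (`0.01666…`). Converting the string form to messages per second reproduces the config comments: 12 per minute is 0.2 per second, one message every 5 s, and `"1 / 1m"` matches the numeric 1/60. The same arithmetic gives the per-second rates printed in the ratelimit debug log quoted later in this thread (1000 / 24h ≈ 0.011574). The parser below is a hypothetical Python illustration that only handles the suffixes used in this thread (`s`, `m`, `min`, `h`); it is not Rspamd's own parsing code, and treating `m` as minutes is an assumption based on the `other_limit_alt` comment.

```python
import re

# Seconds per unit for the suffixes that appear in this thread (assumed meanings).
UNITS = {"s": 1, "m": 60, "min": 60, "h": 3600}
RATE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)?\s*([a-z]+)\s*$")


def rate_per_second(rate: str) -> float:
    """Convert a string such as "12 / 1min" into messages per second."""
    match = RATE_RE.match(rate)
    if not match:
        raise ValueError(f"cannot parse rate {rate!r}")
    count, amount, unit = match.groups()
    if unit not in UNITS:
        raise ValueError(f"unknown time unit {unit!r}")
    period = float(amount or 1) * UNITS[unit]  # period length in seconds
    return float(count) / period


print(rate_per_second("12 / 1min"))      # 0.2
print(1 / rate_per_second("12 / 1min"))  # 5.0 (seconds between messages)
```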

vstakhov added a commit that referenced this issue Jun 17, 2023
@vstakhov
Member

Ok, I think I know the reason now: it is indeed about short-circuit rules again. I have added one more workaround to really clean the pending bucket.
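Within a single function, the usual way to keep an increment and its matching decrement balanced is to put the release in a `finally` block (here via a context manager), so it runs however the work ends. The Python sketch below illustrates that general pattern with hypothetical names (`pending_slot`, `ShortCircuit`, the `RLexample` key); it is not the change made in the referenced commit, and Rspamd's split between separate prefilter and postfilter callbacks is precisely what makes this pairing harder to guarantee there.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class ShortCircuit(Exception):
    """Hypothetical signal that a rule ended processing early."""


@contextmanager
def pending_slot(counters: Dict[str, int], key: str) -> Iterator[None]:
    counters[key] = counters.get(key, 0) + 1  # message is now in flight
    try:
        yield
    finally:
        counters[key] -= 1  # released on normal exit and on early exit alike


def scan(counters: Dict[str, int], key: str, short_circuit: bool) -> str:
    try:
        with pending_slot(counters, key):
            if short_circuit:
                raise ShortCircuit()
            return "scanned"
    except ShortCircuit:
        return "short-circuited"


counters: Dict[str, int] = {}
results = [scan(counters, "RLexample", short_circuit=i % 2 == 0) for i in range(10)]
print(counters)  # {'RLexample': 0}
```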

@barianiluca

barianiluca commented Oct 4, 2023

In my opinion, there is still a problem with the ratelimit module. Basically, this is what happens to us.
Our local.d/ratelimit.conf:

rates {
	1000_smtp_mail_daily_limit_customerdomain_com = {
	  # 1000 mail /24h for user of @customerdomain.com domain
	  selector = 'user.lower.regexp("^[A-Za-z0-9._%+-]+@customerdomain\.com$")';
	  bucket = [
	  {
			burst = 1000;
			rate = "1000 / 24h";
	  }]
	}
	smtp_mail_daily_limit = {
	  # 300 mail /24h for others user authenticated users
	  selector = 'user.lower';
	  bucket = [
	  {
			burst = 300;
			rate = "300 / 24h";
	  }]
	}
	web_mail_daily_limit = {
	  # 20 mail /24h for not authenticated user
	  selector = 'digest(header("Subject");header("From"))';
	  bucket = [
	  {
			burst = 20;
			rate = "20 / 24h";
	  }]
	}
}

This works flawlessly for a while (something like 1 or 2 days), and then suddenly mail from @customerdomain.com users no longer falls under their selector 1000_smtp_mail_daily_limit_customerdomain_com and instead goes into smtp_mail_daily_limit.

Checking the log (with debug enabled for the ratelimit module), I see something like this:

2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:527: check limit 1000_smtp_mail_daily_limit_customerdomain_it:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (1000/0.011574074074074073)
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:466: got reply for limit xxx@customerdomain.com (1000 / 0.011574074074074073); 1 burst, 1.01:1.02 dyn, 1 leaked
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:606: updated limit 1000_smtp_mail_daily_limit_customerdomain_it:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (1000/0.011574074074074073), burst: 1, dyn_rate: 1.01, dyn_burst: 1.02
2023-10-03 09:52:09 #1477810(main) <dsrs53>; lua; ratelimit.lua:708: enabled ratelimit: 1000_smtp_mail_daily_limit_customerdomain_it [symbol: nil, 1000 msgs burst, 0.011574074074074 msgs/sec rate]
2023-10-03 10:29:47 #2825400(normal) <5c1a01>; ratelimit; ratelimit.lua:527: check limit smtp_mail_daily_limit:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (300/0.003472222222222222)
2023-10-03 10:29:47 #2825400(normal) <5c1a01>; ratelimit; ratelimit.lua:466: got reply for limit xxx@customerdomain.com (300 / 0.003472222222222222); 1 burst, 1.01:1.02 dyn, 1 leaked

It seems that the limit being checked suddenly changes from the correct one to the default one.

We use rspamd 3.6.2

@fatalbanana
Member

fatalbanana commented Oct 4, 2023

> This works flawlessly for a while (something like 1 or 2 days), and then suddenly mail from @customerdomain.com users no longer falls under their selector 1000_smtp_mail_daily_limit_customerdomain_com and instead goes into smtp_mail_daily_limit.

Independent ratelimits are not evaluated in any particular order (not reliably so, anyway); the selector for the catch-all limit should exclude things that are to be handled elsewhere.

That could be improved on but it's an unrelated concern to the matter reported in this issue.
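Following the suggestion above, one way to make the outcome independent of evaluation order is to give the catch-all limit a selector that yields nothing for the users the dedicated limit covers. The Python sketch below models that idea with hypothetical functions; in Rspamd the equivalent change would be made in the selector expressions, assuming, as this sketch does, that a limit is skipped when its selector returns no value. The debug log quoted earlier also shows both limits resolving to the same Redis key, RLqmi6erhopgobsnh8qh8ay94b, for the same user, so with overlapping selectors the two limits appear to read and update the same bucket.

```python
import re
from itertools import permutations
from typing import Callable, Dict, List, Optional

# Equivalent to the config's regexp when applied to an already lower-cased user.
CUSTOMER = re.compile(r"^[a-z0-9._%+-]+@customerdomain\.com$")

Selector = Callable[[str], Optional[str]]


def customer_selector(user: str) -> Optional[str]:
    """Key for the dedicated 1000/24h limit: customerdomain.com users only."""
    user = user.lower()
    return user if CUSTOMER.match(user) else None


def catch_all_selector(user: str) -> Optional[str]:
    """Key for the 300/24h catch-all limit: everyone except customerdomain.com."""
    user = user.lower()
    return None if CUSTOMER.match(user) else user


LIMITS: Dict[str, Selector] = {
    "customer_1000": customer_selector,
    "catch_all_300": catch_all_selector,
}


def applicable_limits(user: str, order: List[str]) -> List[str]:
    """Limits (in evaluation order) whose selector yields a key for this user."""
    return [name for name in order if LIMITS[name](user) is not None]


# Whatever order the limits are evaluated in, each user hits exactly one limit.
for order in permutations(LIMITS):
    assert applicable_limits("Alice@CustomerDomain.com", list(order)) == ["customer_1000"]
    assert applicable_limits("bob@example.org", list(order)) == ["catch_all_300"]
```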

@barianiluca

> This works flawlessly for a while (something like 1 or 2 days), and then suddenly mail from @customerdomain.com users no longer falls under their selector 1000_smtp_mail_daily_limit_customerdomain_com and instead goes into smtp_mail_daily_limit.

> Independent ratelimits are not evaluated in any particular order (not reliably so, anyway); the selector for the catch-all limit should exclude things that are to be handled elsewhere.

> That could be improved on but it's an unrelated concern to the matter reported in this issue.

Thank you. A coincidence (a wrong answer on another forum) and the random order in which our rules happened to be selected led me to assume a top-down evaluation order.


8 participants