High CPU when fdm handed a strange spam Email in combination with questionable match rules #129

Open
koitsu opened this issue Jul 6, 2023 · 2 comments

Comments


koitsu commented Jul 6, 2023

I came across a very interesting edge case that manifests itself quite badly with fdm. A very strange spam Email, combined with some questionably written fdm.conf match rules, caused fdm to stall while processing the mail and use a huge amount of CPU time (100%+). After a very long time (60+ seconds) it would continue, yet it still used huge amounts of CPU on subsequent (different) mails.

I can tell the whole story if asked, but I'm trying to keep this bug report as on-topic as possible.

I won't attach the Email here because it contains some private headers/info, but just let me know where I can Email it (I'll base64 it first and send it as an attachment). It's a syntactically valid mail (mutt parses it just fine) and is not particularly large.

I spent a couple hours working out a simple reproduction case.

$ ls -l bad.email
-rw-------    1 jdc       users     81718 Jul  6 00:53 bad.email
$ cat bad.fdm.conf
account "stdin" disabled stdin

action "drop" drop
action "keep" keep

match "^From:.*<([^>]+)>$" in headers {
    match all action tag "from" value "%1" continue

    match string "%[from]"
    to ".+@(amazon.com|.+\\.amazon.com)"
    or ".+@(xxxxxxxx.com|.+\\.xxxxxxx.com)"
    action "drop"
}

match all action "keep"
$ fdm -f bad.fdm.conf -n
$ cat bad.email | fdm -f bad.fdm.conf -vv -a stdin fetch
version is: fdm 2.0, started at: Thu Jul  6 02:09:43 2023
running on: FreeBSD 11.4-STABLE FreeBSD 11.4-STABLE #0 stable/11-n215831-21e88c53735d: Fri Oct  1 15:07:44 PDT 2021     root@icarus.home.lan:/usr/obj/usr/src/sys/X7SBA_RELENG_11_amd64 amd64
host is: icarus.home.lan icarus.home.lan 192.168.1.51
home is: /home/jdc
loading configuration from bad.fdm.conf
added account "stdin": fetch=stdin
added action "drop": deliver=0:drop
added action "keep": deliver=0:keep
added rule 0: matches=regexp "^From:.*<([^>]+)>$" in headers nested
added rule 1: matches=all lambda=0:tag "from" value "%1"
added rule 2: matches=string "%[from]" to ".+@(amazon.com|.+\.amazon.com)" or regexp ".+@(xxxxxxxx.com|.+\.xxxxxxx.com)" in any actions="drop"
added rule 3: matches=all actions="keep"
configuration loaded
locking using: flock
options are: maximum-size=33554432, timeout=900, default-user="jdc", command-user="jdc", file-umask=077, queue-high=2, queue-low=1, strip-characters="\<>$%^&*|{}[]"'`;"
using tmp directory: /tmp
parent: started, pid is 3772
parent: 0 children, 0 dead children
parent: child 3773 (stdin) started
stdin: fetch started, pid 3773
stdin: user is 1000
stdin: started processing
stdin: fetching
stdin: found 1108 lines, 928 in body
stdin: message-id is: <CA++7=XH0BYbUnueCJ6BVQgUGy82nmBdLscJb+gXhc9uTOPhOPg@mail.gmail.com>
stdin: found 156 wrapped lines
stdin: got message 1: size 81770, body 11755
stdin: matched to rule 0
stdin: entering nested rules
stdin: matched to rule 1
stdin: message 1, running action <rule 1>:0 (tag) as user jdc
stdin: tagging message: from (sirekquesinberry7p5013@gmail.com)
stdin: message 1 delivered (rule 1, tag) in 0.000 seconds
^Cparent: 1 children, 0 dead children
parent: caught SIGINT. stopping
parent: child 3773 killed
parent: finished, total time 29.175 seconds

top output while fdm is running:

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    WCPU COMMAND
 3773 jdc           1  99    0 10680K  6364K     0K CPU2    2   0:22  98.05% fdm: child: stdin (fdm)
 3774 jdc           1  20    0  7928K  3568K     0K CPU0    0   0:00   0.08% top

If I don't Ctrl-C it, the mail does eventually get processed, but it takes over 60 seconds:

stdin: tagging message: from (sirekquesinberry7p5013@gmail.com)
stdin: message 1 delivered (rule 1, tag) in 0.000 seconds
stdin: exiting nested rules
stdin: matched to rule 3
stdin: looking for actions matching: keep
stdin: found 1 actions
stdin: message 1, running action keep:0 (keep) as user jdc
stdin: message 1 delivered (rule 3, keep) in 0.000 seconds
stdin: keeping message 1
stdin: 1 messages processed (1 kept) in 66.143 seconds (average 66.143)
stdin: finished processing. exiting
parent: sending exit message to child 3810
parent: 1 children, 0 dead children
parent: 1 children, 0 dead children
parent: child 3810 socket error
parent: 1 children, 0 dead children
parent: child 3810 returned 0
parent: finished, total time 66.144 seconds

What I SHOULD have been using (which does not trigger the problem with the same strange spam Email):

    match string "%[from]" to ".+@(amazon.com|.+\\.amazon.com)"
       or string "%[from]" to ".+@(xxxxxxxx.com|.+\\.xxxxxxx.com)"
    action "drop"
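
To make the difference concrete, here is a minimal Python sketch of how the two forms evaluate. It is based on the "added rule 2" line in the -vv output above, where the second pattern was loaded as regexp ... in any rather than as a second string ... to test. The function names and the sample mail are made up for illustration, and Python's re module only stands in for whatever regex engine fdm was built with.

```python
import re

# Patterns copied from the rule in this report. The man page says regexp
# matching is case-insensitive by default; applying the same default to
# `string ... to` is an assumption.
AMAZON = re.compile(r".+@(amazon.com|.+\.amazon.com)", re.IGNORECASE)
OTHER = re.compile(r".+@(xxxxxxxx.com|.+\.xxxxxxx.com)", re.IGNORECASE)


def drop_as_written(from_tag, message):
    """string "%[from]" to AMAZON  or  OTHER

    The bare second pattern is a regexp condition of its own, so it is
    searched for in the whole message (headers and body).
    """
    return bool(AMAZON.search(from_tag) or OTHER.search(message))


def drop_as_intended(from_tag, message):
    """string "%[from]" to AMAZON  or  string "%[from]" to OTHER

    Both patterns are tested against the short tag value only.
    """
    return bool(AMAZON.search(from_tag) or OTHER.search(from_tag))


# Hypothetical mail: the sender is unrelated, but the body mentions an
# address in the second domain.
sample_from = "someone@gmail.com"
sample_mail = (
    "From: Someone <someone@gmail.com>\n"
    "\n"
    "Write to billing@" + "x" * 8 + ".com today\n"
)
```

With the sample above, drop_as_written(sample_from, sample_mail) is true while drop_as_intended(sample_from, sample_mail) is false, so the two forms differ both in what they scan and in which mails they drop.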

fdm -n -f test.fdm.conf does not complain about this questionable syntax, but the man page implies it should be considered invalid:

RULES
     Rules are specified using the match keyword.  It has the following basic form:

     match condition [and | or condition ...] [users] actions [continue]

     The condition argument may be one of:
...
     [case] regexp [in headers | in body]
             Specifies a regexp against which each mail should be matched.  The regexp
             matches may be restricted to either the headers or body of the message by
             specifying either in headers or in body.  The case keyword forces the regexp to
             be matched case-sensitively: the default is case-insensitive matching.
...
     Multiple conditions may be chained together using the and or or keywords.  The
     conditions are tested from left to right.  Any condition may be prefixed by the not
     keyword to invert it.
...

iamleot commented Jul 6, 2023

Hello Jeremy,

> fdm -n -f test.fdm.conf does not complain about this questionable syntax, but the man page implies it should be considered invalid:
> [...]

Mmm, why/what? (or, I couldn't understand what part of fdm.conf(5) seems to imply that)


As a side note: I've definitely hit something similar too. I forget the details (i.e. whether the same RE was being evaluated over and over for every single rule, and whether that was the CPU-intensive part), but to avoid it I have the following in fdm.conf for all the headers that I want to filter on:

# Populate headers tags needed by rules
match "^cc:(.*)" in headers action tag "cc" value "%1" continue
match "^from:(.*)" in headers action tag "from" value "%1" continue
match "^list-id:(.*)" in headers action tag "list-id" value "%1" continue
match "^subject:(.*)" in headers action tag "subject" value "%1" continue
match "^to:(.*)" in headers action tag "to" value "%1" continue

...and then I use the tags in rules, e.g.:

match string "%[list-id]" to "fdm-users[@.]lists.sourceforge.net" or string "%[to]" to "fdm-users[@.]lists.sourceforge.net" or string "%[cc]" to "fdm-users[@.]lists.sourceforge.net" actions { tag "ml" value "fdm-users" action "rcvstore" }
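
The idea behind this layout, sketched in Python: scan the headers once per header name, keep the captured values as tags, and let every later rule test only those short strings. The patterns are the ones from the config above; the function names and the use of re.MULTILINE (so that ^ matches at the start of each header line) are assumptions made for the sketch, not a description of fdm internals.

```python
import re

# One pattern per tag, copied from the "Populate headers tags" rules.
TAG_RULES = {
    "cc": r"^cc:(.*)",
    "from": r"^from:(.*)",
    "list-id": r"^list-id:(.*)",
    "subject": r"^subject:(.*)",
    "to": r"^to:(.*)",
}

LIST_RE = re.compile(r"fdm-users[@.]lists.sourceforge.net", re.IGNORECASE)


def populate_tags(headers):
    """Run each header pattern once and store capture group 1 as a tag."""
    tags = {}
    for name, pattern in TAG_RULES.items():
        m = re.search(pattern, headers, re.IGNORECASE | re.MULTILINE)
        if m:
            tags[name] = m.group(1)
    return tags


def is_fdm_users(tags):
    """Equivalent of the three `string "%[tag]" to ...` tests joined by `or`."""
    return any(LIST_RE.search(tags.get(name, "")) for name in ("list-id", "to", "cc"))


# Hypothetical headers for illustration.
example_headers = (
    "From: Alice <alice@example.org>\n"
    "To: fdm-users@lists.sourceforge.net\n"
    "Subject: hello\n"
)
```

Each later test then runs against a string of a few dozen characters instead of the full header block, whatever the size of the mail.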


koitsu commented Jul 7, 2023

> > fdm -n -f test.fdm.conf does not complain about this questionable syntax, but the man page implies it should be considered invalid:
> > [...]
>
> Mmm, why/what? (or, I couldn't understand what part of fdm.conf(5) seems to imply that)

This has to do with how the documentation is written, I think. The mistake with my fdm.conf rules is my own regardless. Now that I've gone over the documentation more closely, it seems this IS valid syntax.

The part of the rule syntax that led to my mistake was this:

[case] regexp [in headers | in body]
        Specifies a regexp against which each mail should be matched.  The regexp
        matches may be restricted to either the headers or body of the message by
        specifying either in headers or in body.  The case keyword forces the regexp to
        be matched case-sensitively: the default is case-insensitive matching.

This is the only condition that does NOT start with a keyword (compare tagged string, exec command ..., string string to regex, etc.). It really should have been something like content [case] regexp [in headers | in body]. (Additionally, there is no mention that BOTH are examined if you specify neither in headers nor in body.)

Thus, when I wrote my rules, I effectively expected these 2 rules to be identical, and they are very clearly not:

match string "%[from]" to ".+@(amazon.com|.+\\.amazon.com)" or ".+@(xxxxxxxx.com|.+\\.xxxxxxx.com)"

match string "%[from]" to ".+@(amazon.com|.+\\.amazon.com)" or string "%[from]" to ".+@(xxxxxxxx.com|.+\\.xxxxxxx.com)"

This is one of those situations where a BNF for fdm.conf would have made things more clear to me.
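
Lacking a BNF, a toy parser can at least show how the shorter rule is read. The Python sketch below handles only the two condition forms discussed here (string S to RE, and a bare RE with an optional in headers / in body), joined by and / or. It is an illustration written from the man page excerpt and the -vv output, not fdm's actual parser, and parse_conditions is a made-up name.

```python
import re

# A token is either a double-quoted string (quotes kept) or a bare word.
TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_conditions(rule):
    """Return the conditions of a `match ...` line as a list of tuples."""
    tokens = TOKEN.findall(rule)
    if not tokens or tokens[0] != "match":
        raise ValueError("rule must start with 'match'")
    i = 1
    conditions = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == "string":
            # Keyword form: string S to RE
            if len(tokens) < i + 4 or tokens[i + 2] != "to":
                raise ValueError("expected: string S to RE")
            conditions.append(("string", tokens[i + 1][1:-1], tokens[i + 3][1:-1]))
            i += 4
        elif tok.startswith('"'):
            # Bare quoted string: a regexp condition with no leading keyword.
            scope = "any"
            i += 1
            if tokens[i:i + 2] in (["in", "headers"], ["in", "body"]):
                scope = tokens[i + 1]
                i += 2
            conditions.append(("regexp", tok[1:-1], scope))
        else:
            raise ValueError("unsupported condition: " + tok)
        if i < len(tokens) and tokens[i] in ("and", "or"):
            i += 1  # another condition follows
        else:
            break  # users / actions / continue start here; ignored by this toy
    return conditions
```

Fed the first rule (with the patterns shortened to "A" and "B"), it returns a string condition followed by a regexp condition with scope "any", which matches the "added rule 2" line fdm printed. Fed the second rule, it returns two string conditions.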

That said: it still doesn't explain why fdm hits 100% CPU for 66+ seconds when fed this bizarre spam mail I received.
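
One way to narrow this down is to time the pattern against inputs of growing size and see how the cost scales. The Python sketch below shows the method only: Python's re is a backtracking engine and is not the regex library fdm uses, and the input shape (a single long line with no '@') is a guess, since the real mail is not attached. time_search is a made-up helper name.

```python
import re
import time

# Second pattern from the rule, i.e. the one fdm applied to the whole message.
PATTERN = re.compile(r".+@(xxxxxxxx.com|.+\.xxxxxxx.com)", re.IGNORECASE)


def time_search(pattern, sizes):
    """Time pattern.search() on a line of n 'a' characters for each n.

    Returns a list of (n, matched, seconds) tuples. The input contains
    no '@', so every search fails after trying each start position.
    """
    results = []
    for n in sizes:
        text = "a" * n
        start = time.perf_counter()
        matched = pattern.search(text) is not None
        elapsed = time.perf_counter() - start
        results.append((n, matched, elapsed))
    return results
```

Calling time_search(PATTERN, [1000, 2000, 4000, 8000]) and comparing the timings shows whether doubling the input roughly doubles the time or does noticeably worse. Running the same kind of test through fdm itself, with generated mails of different sizes, would say more about the engine that actually matters here.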
