Unicode strings in HTTP headers when using metadata_exporter [BUG] #4808

Closed
1 task done
larskraemer opened this issue Feb 2, 2024 · 11 comments

@larskraemer

Prerequisites

Describe the bug

When using the metadata_exporter module with meta_headers enabled, UTF-8 strings from the mail data (e.g. subject) are sent directly as HTTP header values.
HTTP headers, including their values, are required to be pure ASCII.
This behaviour leads to encoding errors when receiving POST requests from rspamd with common HTTP server implementations. (Tested: ExpressJS)

Steps to Reproduce

  1. Set up metadata_exporter to send data to a server via HTTP, with the meta_headers flag enabled
  2. Set up a simple HTTP server to dump the HTTP requests sent by rspamd
  3. Receive a mail with Unicode characters in the subject
  4. Observe incorrectly encoded characters on the server (plain UTF-8 in the header values)
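
For step 2, a receiver along these lines may be enough to see what actually arrives on the wire. This is a minimal sketch using only the Python standard library, not the ExpressJS setup used in the report; `describe_header`, `DumpHandler`, `serve`, and the address `127.0.0.1:8080` are hypothetical names and placeholders chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def describe_header(name: str, value: str) -> str:
    """Report the raw bytes of one header value and whether they are ASCII.

    http.server decodes header lines as ISO-8859-1, so encoding the parsed
    value back to latin-1 recovers the bytes that were on the wire.
    """
    raw = value.encode("latin-1")
    status = "ascii" if raw.isascii() else "NON-ASCII"
    return f"{name}: {raw!r} [{status}]"


class DumpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Print every request header with its raw bytes.
        for name, value in self.headers.items():
            print(describe_header(name, value))
        # Drain the body so the client gets a clean response.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Hypothetical entry point; host and port are placeholders."""
    HTTPServer((host, port), DumpHandler).serve_forever()
```

Calling `serve()` and pointing metadata_exporter at that address should print one line per header; a subject sent as plain UTF-8 would show up with `\xc3...` byte escapes and the `NON-ASCII` marker.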

Expected behavior
rspamd should send UTF-8 header values encoded according to RFC 2047.
I suggest either requiring all modules that use rspamd_http to encode header fields before passing them, or modifying rspamd_http to encode UTF-8 headers correctly before sending.
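
To illustrate what RFC 2047 encoding of a header value looks like, here is a rough sketch in Python. It is not rspamd code (rspamd's HTTP client is not written in Python), and `encode_header_value` is a hypothetical helper; it only shows the transformation the request above is asking for.

```python
from email.header import Header


def encode_header_value(value: str) -> str:
    """Return an ASCII-only form of value using RFC 2047 encoded-words."""
    if value.isascii():
        return value  # nothing to encode
    folded = Header(value, "utf-8").encode()
    # Header.encode() may fold long values across several lines. An HTTP
    # header value has to stay on one line, so collapse the folding
    # whitespace; encoded-words themselves never contain whitespace.
    return " ".join(folded.split())


print(encode_header_value("Asseyez-vous confortablement, n'importe où..."))
```

The result contains only ASCII characters, so it can be placed in a header value without tripping strict HTTP parsers; the receiving side then has to decode the encoded-words to get the original text back.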

Versions

Tested on rspamd 3.4 (Debian 12.2 package, kernel 6.1.0-13-amd64).

[Looking at the code in master, I don't expect the behaviour to be different in the current version]

@lspagnol

Hi, I have the same issue and it is not fixed:
Ubuntu 20.04.6 LTS
rspamd: 3.8.4-193fa4f6focal
Thanks in advance

@vstakhov
Member

I am not able to reproduce.

@lspagnol

I've just downgraded to the previously installed version (rspamd_3.7.5-28c86c16focal_amd64.deb) and it's OK.
I have a cluster of 4 rspamd servers -> "rspamd1", "rspamd2", "rspamd3".
These servers send data to the "rspamd" node with metadata_exporter.
The "rspamd" node is my quarantine server: metadata is imported into a MariaDB database, mails are stored on the filesystem.
I wrote a WebUI for quarantine management; everything has worked fine for 4 years.

@vstakhov
Member

I'm sorry, but I don't see how that is intended to support your claim. The issue with Unicode encoding has been there since the start of this plugin (so well before v3).

@lspagnol

My English is very poor, so here are 2 screenshots that illustrate the problem:
[screenshot: good encoding headers]
[screenshot: bad encoding headers]

@lspagnol

Rspamd 3.8.4: encoding of metadata is bad
Downgrade to 3.8.2: encoding of metadata is also bad
Downgrade to 3.7.5: encoding of metadata is good

@lspagnol

The first screenshot: good encoding; the second screenshot: bad encoding

@lspagnol

A file with good metadata:

[11:33:12]:[rspamd@~]
# cat /opt/rspamd-quarantine/data/meta/4/2/4288C634_rspamd2_1710801724
HTTP_X_RSPAMD_ACTION=reject
HTTP_X_RSPAMD_FROM=yclucll@rossav.or.mg
HTTP_X_RSPAMD_FUZZY='[]'
HTTP_X_RSPAMD_IP=77.87.212.166
HTTP_X_RSPAMD_QID=4288C634
HTTP_X_RSPAMD_RCPT='["XXXXXXXXXniv-reXXXX.fr","XXXXXXXXXXX@XXXXXXs.fr","XXXXXXXXXhi@uXXXXXXs.fr","olXXXXXXXXge@uXXXs.fr","pXXXXXXXXXXXs.fr","sXXXXXXXXXX.fr","XXXXXXXXXs.fr"]'
HTTP_X_RSPAMD_SCORE=20.196811723969
HTTP_X_RSPAMD_SIZE=78743
HTTP_X_RSPAMD_SUBJECT=$'Asseyez-vous confortablement, n\'importe o\303\271...'
HTTP_X_RSPAMD_SYMBOLS='[{"name":"DMARC_POLICY_ALLOW","groups":["policies","dmarc"],"group":"policies","score":0,"options":["rossav.or.mg","quarantine"]},{"name":"FROM_HAS_DN","groups":["headers"],"group":"headers","score":0},{"name":"FROM_EQ_ENVFROM","groups":["headers"],"group":"headers","score":0},{"name":"NEURAL_SPAM","groups":["neural"],"group":"neural","score":2.996819,"options":["0.999"]},{"name":"BAD_REP_POLICIES","groups":["composite"],"group":"composite","score":0.100000},{"name":"TO_DN_NONE","groups":["headers"],"group":"headers","score":0},{"name":"MID_RHS_NOT_FQDN","groups":["Message ID"],"group":"Message ID","score":0.500000},{"name":"MIME_GOOD","groups":["mime_types"],"group":"mime_types","score":-0.100000,"options":["multipart/related","multipart/alternative","text/plain"]},{"name":"FORGED_RECIPIENTS","groups":["headers"],"group":"headers","score":2,"options":["m:ia@univ-orleans.fr","s:aep.dir@univ-reims.fr","s:bart.lamiroy@univ-reims.fr","s:dominique.flenghi@univ-reims.fr","s:olivier.debarge@univ-reims.fr","s:philippe.gillery@univ-reims.fr","s:sesg.master-ape@univ-reims.fr","s:srh@univ-reims.fr"]},{"name":"URIBL_BLACK","groups":["surbl","uribl","rbl"],"group":"surbl","score":7.500000,"options":["revera.bieszczady.pl:url"]},{"name":"R_DKIM_NA","groups":["policies","dkim"],"group":"policies","score":0},{"name":"R_SPF_ALLOW","groups":["policies","spf"],"group":"policies","score":0,"options":["+mx:c"]},{"name":"RCVD_VIA_SMTP_AUTH","groups":["headers"],"group":"headers","score":0},{"name":"ARC_NA","groups":["policies","arc"],"group":"policies","score":0},{"name":"ASN","groups":[],"group":"ungrouped","score":0,"options":["asn:12616, ipnet:77.87.212.0/24, 
country:RU"]},{"name":"RCVD_COUNT_ONE","groups":["headers"],"group":"headers","score":0,"options":["1"]},{"name":"MIME_TRACE","groups":["mime_types"],"group":"mime_types","score":0,"options":["0:+","1:+","2:+","3:~","4:~","5:+"]},{"name":"RCVD_NO_TLS_LAST","groups":["headers"],"group":"headers","score":0.100000},{"name":"RCPT_COUNT_ONE","groups":["headers"],"group":"headers","score":0,"options":["1"]},{"name":"MISSING_XM_UA","groups":["headers"],"group":"headers","score":0},{"name":"BAYES_SPAM","groups":["statistics"],"group":"statistics","score":5.099993,"options":["99.99%"]},{"name":"DCC_REJECT","groups":["dcc"],"group":"dcc","score":2,"options":["bulk Body=7 Fuz1=7 Fuz2=many rep=99% "]}]'
HTTP_X_RSPAMD_USER=unknown
NODE=rspamd2

Encoding of "HTTP_X_RSPAMD_SUBJECT" is good
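
For readers unfamiliar with the `$'...'` form in this dump: it looks like shell ANSI-C quoting, where `\303\271` are octal escapes for two raw bytes. A quick check, assuming those bytes are meant as UTF-8, shows that they form a single "ù", so this file holds the subject as plain UTF-8:

```python
# Bytes copied from the HTTP_X_RSPAMD_SUBJECT line of this file. Python
# bytes literals accept the same octal escapes as the shell's $'...' form.
raw_subject = b"Asseyez-vous confortablement, n'importe o\303\271..."

# Decode the raw bytes as UTF-8 to get the readable subject.
subject = raw_subject.decode("utf-8")
print(subject)
```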

@lspagnol

A file with bad metadata:

# cat /opt/rspamd-quarantine/data/meta/3/D/3D554156_rspamd1_1711185005
HTTP_X_RSPAMD_ACTION='soft reject'
HTTP_X_RSPAMD_FROM=noreply@les-deals-du-web.fr
HTTP_X_RSPAMD_FUZZY='[]'
HTTP_X_RSPAMD_IP=37.60.56.62
HTTP_X_RSPAMD_QID=3D554156
HTTP_X_RSPAMD_RCPT='["XXXXXXXXs.fr"]'
HTTP_X_RSPAMD_SCORE=9.7485564304462
HTTP_X_RSPAMD_SIZE=55755
HTTP_X_RSPAMD_SUBJECT='=?UTF-8?Q?=5BCaroll=5D_Derni=C3=A8re_Cha?= =?UTF-8?Q?nce_=3A_=2D50=25_sur_nos_Pr?= =?UTF-8?Q?oduits_=21?='
HTTP_X_RSPAMD_SYMBOLS=$'[{"score":0,"group":"headers","name":"FROM_HAS_DN","groups":["headers"]},{"score":0,"group":"headers","name":"FROM_EQ_ENVFROM","groups":["headers"]},{"score":0,"group":"headers","name":"REPLYTO_ADDR_EQ_FROM","groups":["headers"]},{"score":1,"group":"headers","name":"HAS_INTERSPIRE_SIG","groups":["headers"]},{"score":0,"group":"headers","name":"TO_DN_NONE","groups":["headers"]},{"score":-0.010000,"group":"headers","name":"HAS_LIST_UNSUB","groups":["headers"]},{"options":["caroll.momot@univ-reims.fr"],"score":0,"group":"headers","name":"PREVIOUSLY_DELIVERED","groups":["headers"]},{"options":["77.4%"],"score":0.548556,"group":"body","name":"R_PARTS_DIFFER","groups":["body"]},{"options":["multipart/alternative","text/plain"],"score":-0.100000,"group":"mime_types","name":"MIME_GOOD","groups":["mime_types"]},{"score":1,"group":"composite","name":"AUTOGEN_PHP_SPAMMY","groups":["composite"]},{"options":["100.00%"],"score":5.100000,"group":"statistics","name":"BAYES_SPAM","groups":["statistics"]},{"options":["add header"],"score":0,"group":"force_actions","name":"FORCE_ACTION_UCM_03","groups":["force_actions"]},{"options":["+ip4:37.60.56.0/21:c"],"score":0,"group":"policies","name":"R_SPF_ALLOW","groups":["policies","spf"]},{"score":0.100000,"group":"headers","name":"ONCE_RECEIVED","groups":["headers"]},{"options":["asn:16276, ipnet:37.60.56.0/21, country:FR"],"score":0,"group":"ungrouped","name":"ASN","groups":[]},{"options":["1"],"score":0,"group":"headers","name":"RCVD_COUNT_ONE","groups":["headers"]},{"options":["bulk Body=1 Fuz1=493 Fuz2=many rep=98% 
"],"score":2,"group":"dcc","name":"DCC_REJECT","groups":["dcc"]},{"score":0.100000,"group":"composite","name":"BAD_REP_POLICIES","groups":["composite"]},{"options":["les-deals-du-web.fr:+"],"score":0,"group":"policies","name":"DKIM_TRACE","groups":["policies","dkim"]},{"score":0,"group":"compromised_hosts","name":"HAS_PHPMAILER_SIG","groups":["compromised_hosts"]},{"score":0,"group":"headers","name":"TO_MATCH_ENVRCPT_ALL","groups":["headers"]},{"score":0,"group":"headers","name":"RCVD_TLS_LAST","groups":["headers"]},{"score":0.010000,"group":"experimental","name":"XM_UA_NO_VERSION","groups":["experimental"]},{"options":["les-deals-du-web.fr:s=smtp"],"score":0,"group":"policies","name":"R_DKIM_ALLOW","groups":["policies","dkim"]},{"options":["les-deals-du-web.fr"],"score":0,"group":"policies","name":"DMARC_NA","groups":["policies","dmarc"]},{"options":["Caroll avec Happy Promos\342\234\214\357\270\217"],"score":0,"group":"multimap","name":"BLACKLIST_UCM_HEADER_NAME","groups":["multimap"]},{"options":["rspamd.com"],"score":0,"name":"FUZZY_BLOCKED","group":"ungrouped"},{"options":["noreply@les-deals-du-web.fr"],"score":0,"group":"multimap","name":"BLACKLIST_UCM_HEADER_FROM","groups":["multimap"]},{"options":["add header"],"score":0,"group":"force_actions","name":"FORCE_ACTION_UCM_02","groups":["force_actions"]},{"options":["greylisted","Sat, 23 Mar 2024 09:15:05 GMT","new record"],"score":0,"group":"ungrouped","name":"GREYLIST","groups":[]},{"options":["1"],"score":0,"group":"headers","name":"RCPT_COUNT_ONE","groups":["headers"]},{"score":0,"group":"policies","name":"ARC_NA","groups":["policies","arc"]},{"options":["0:+","1:+","2:~"],"score":0,"group":"mime_types","name":"MIME_TRACE","groups":["mime_types"]},{"score":0,"name":"ONE_SMTP_RCPT","group":"ungrouped"},{"options":["add 
header"],"score":0,"group":"force_actions","name":"FORCE_ACTION_UCM_05","groups":["force_actions"]},{"score":0,"group":"headers","name":"SUBJECT_ENDS_EXCLAIM","groups":["headers"]},{"options":["37.60.56.62:from"],"score":0,"group":"rbl","name":"RCVD_IN_DNSWL_NONE","groups":["rbl","dnswl"]},{"options":["noreply@les-deals-du-web.fr","les-deals-du-web.fr"],"score":0,"group":"multimap","name":"BLACKLIST_UCM_ENVELOPE_FROM","groups":["multimap"]},{"options":["noreply@les-deals-du-web.fr"],"score":0,"group":"headers","name":"HAS_REPLYTO","groups":["headers"]}]'
HTTP_X_RSPAMD_USER=unknown
NODE=rspamd1

Encoding of "HTTP_X_RSPAMD_SUBJECT" is bad
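
The subject in this second file is a sequence of RFC 2047 encoded-words rather than raw UTF-8. A receiver that wants the readable text back could decode it on its side. The following is only a sketch, assuming a Python consumer; the quarantine tooling used in this thread is not shown, and `decode_subject` is a hypothetical helper.

```python
from email.header import decode_header, make_header

# Value copied from the HTTP_X_RSPAMD_SUBJECT line of this file.
encoded = (
    "=?UTF-8?Q?=5BCaroll=5D_Derni=C3=A8re_Cha?= "
    "=?UTF-8?Q?nce_=3A_=2D50=25_sur_nos_Pr?= "
    "=?UTF-8?Q?oduits_=21?="
)


def decode_subject(value: str) -> str:
    """Decode RFC 2047 encoded-words; plain ASCII values pass through."""
    return str(make_header(decode_header(value)))


print(decode_subject(encoded))
# prints: [Caroll] Dernière Chance : -50% sur nos Produits !
```

Because plain ASCII values pass through unchanged, the same helper could be applied to every header value without checking first whether it was encoded.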

@lspagnol

So all was OK with version 3.7.5. I upgraded to 3.8.4 this week and the problem appeared: encoding of "HTTP_X_RSPAMD_SUBJECT" is bad. I downgraded to 3.8.2 and the problem is still there.
So I downgraded directly to my previous "good" version (3.7.5) and all is OK for me.
It's difficult for me to test other intermediate versions because these are "production" servers of my university.

@lspagnol

Personal comment: before Rspamd, I worked with SpamAssassin and Sophos PureMessage.
My opinion is that Rspamd is better than other software; it has killer features such as wrapping with OLEtools and fuzzy lists (and more). I also work with a French contributor of ClamAV for efficient 0-day signatures at a very low cost (securiteinfo.com).
Big up and thanks to the Rspamd team !! :)


3 participants