
ReturnPathFBL parsing missing rhosts #415

Closed
genericcx opened this issue Oct 19, 2020 · 14 comments

Comments

@genericcx

It seems the ARF/Feedback-Loop parser does not detect all of the Return Path (https://fbl.returnpath.net/) report variants.

Example (redacted):

This is a Rackspace Abuse Report for an email message received from domain =
example.com, IP 10.0.0.1, on Wed, 14 Oct 2020 14:00:29 +0000.

--061ac93f6aad9631ee4cf05d779c8f0f9b12bc5ce29b0755e76670ef7737
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: message/feedback-report

Version: 1
Arrival-Date: Wed, 14 Oct 2020 14:00:29 +0000
Feedback-Type: abuse
Original-Rcpt-To: e25a7fe465a61a78xxxxxx@example.net
Original-Rcpt-To: e25a7fe465a61a78xxxxxx@example.net
Original-Mail-From: casper@example.com
Reported-Domain: example.com
Source-Ip: 10.0.0.1
Source: Rackspace
Abuse-Type: complaint
Subscription-Link: https://fbl.returnpath.net/manage/subscriptions/xxxxxx
User-Agent: ReturnPathFBL/2.0
$ perl -MSisimai -e 'print Sisimai->dump("1.eml");' | jq .
[
  {
    "replycode": "",
    "recipient": "e25a7fe465a61a78xxxxxx@example.net",
    "subject": "REDACTED",
    "origin": "1.eml",
    "rhost": "",
    "addresser": "casper@example.com",
    "messageid": "3744f1f6REDACTED2@example.com",
    "feedbacktype": "",
    "diagnostictype": "",
    "deliverystatus": "",
    "timezoneoffset": "+0000",
    "listid": "",
    "action": "",
    "smtpcommand": "",
    "senderdomain": "example.com",
    "softbounce": -1,
    "lhost": "",
    "smtpagent": "Feedback-Loop",
    "catch": null,
    "token": "REDACTED",
    "destination": "example.net",
    "alias": "",
    "diagnosticcode": "",
    "timestamp": 1602697652,
    "reason": "feedback"
  }
]

I attempted to add some further options in ARF.pm (e.g. accepting the camel-cased Source-Ip header, which looks wrong on their side, and extracting the IP from the domain text); however, after a clean make it didn't take effect, so I probably did something wrong.

@azumakuniyuki
Member

@cucx Thanks for the report. We'll inspect and try to fix this issue within a few days :-)

@azumakuniyuki
Member

@cucx Sorry for the late response. I've read the issue and the sample email text, and I found a better solution: to get the value of the Source-Ip field, use the callback feature described at https://libsisimai.org/en/usage/#callback
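A minimal sketch of that callback approach, assuming the Sisimai v4.x interface shown on the linked usage page: the callback receives a hash reference whose `message` key holds the message body as a string, and whatever it returns ends up under `catch` in the parsed result. The variable `$callbackto` and the `source-ip` key are illustrative names, not part of Sisimai. The sub is exercised with a hand-made hash so the snippet runs without Sisimai installed.

```perl
use strict;
use warnings;

# Hypothetical callback for Sisimai v4.x's 'hook' option.
# Assumption: $emdata->{'message'} holds the message body, including the
# message/feedback-report part, as a plain string.
my $callbackto = sub {
    my $emdata = shift;
    my $caught = { 'source-ip' => '' };

    # Case-insensitive so that both "Source-IP:" and "Source-Ip:" are found
    if( $emdata->{'message'} =~ m/^Source-IP:[ ]*(\S+)/mi ) {
        $caught->{'source-ip'} = $1;
    }
    return $caught;
};

# Stand-alone check with a fake $emdata. With Sisimai installed, the usage page
# shows the sub being passed as: Sisimai->make($path, 'hook' => $callbackto)
my $fake = {
    'headers' => {},
    'message' => "Content-Type: message/feedback-report\n\n"
               . "Source-Ip: 10.0.0.1\nUser-Agent: ReturnPathFBL/2.0\n",
};
my $result = $callbackto->($fake);
print $result->{'source-ip'}, "\n";
```

Because the value lands in `catch` rather than `rhost`, this route leaves the parser untouched; the trade-off is that every caller has to pass the hook.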

By the way, the value of rhost is only used to select a module in the Sisimai::Rhost class.

Best regards,

@genericcx
Author

@azumakuniyuki Thanks! Would this mean I need to write a separate parser, though? I would have thought I could simply change this:

} elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {

to allow the camel-cased variant and then rebuild (so that it picks up both the correctly cased FBLs and these). However, if I edit that file and run make-clean and make-local, the changes do not seem to take effect. Should I be doing something else?
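A quick way to see what the one-character-class change buys is to run both patterns against the two spellings. The sample lines come from this issue and from the comment in Sisimai::ARF; the `%matched` bookkeeping is only for the demonstration. As far as I can tell, RFC 5965 defines these fields in ABNF, whose quoted strings are case-insensitive (RFC 5234), so "Source-Ip" is arguably a legal spelling and a `/i` match would be the broader alternative.

```perl
use strict;
use warnings;

# Header lines as written by RFC 5965 examples and by the Return Path sample
my @lines = ('Source-IP: 192.0.2.45', 'Source-Ip: 10.0.0.1');

my %matched;
for my $e (@lines) {
    # Original pattern in Sisimai::ARF: accepts "Source-IP:" only
    push @{ $matched{'original'} }, $1 if $e =~ /\ASource-IP:[ ]*(.+)\z/;
    # Patched pattern: accepts "Source-IP:" and "Source-Ip:"
    push @{ $matched{'patched'} }, $1 if $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/;
}

printf "%-8s %s\n", $_, join(', ', @{ $matched{$_} || [] }) for qw(original patched);
```

The regex only decides what happens once a line is examined; it cannot help if the parser never reaches the feedback-report fields, which turns out to be the real problem later in this thread.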

@azumakuniyuki
Member

@cucx I'm so sorry. Code for getting the value of the Source-IP: field is already implemented in Sisimai::ARF.
The following diff will probably resolve the issue :-)

diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..cf8a2600 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -231,7 +231,7 @@ sub make {
                 # Reporting-MTA: dns; mx.example.jp
                 $commondata->{'rhost'} = $1;

-            } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
                 # The header is optional and MUST NOT appear more than once.
                 # Source-IP: 192.0.2.45
                 $arfheaders->{'rhost'} = $1;

@genericcx
Author

@azumakuniyuki Thanks! However, when I do that it still doesn't seem to pick it up.

edit the file:

$ cat p5-sisimai/lib/Sisimai/ARF.pm | grep "\[Pp\]"
            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {

build:

Configuring Sisimai-v4.25.9 ... OK
Building and testing Sisimai-v4.25.9 ... OK
Successfully installed Sisimai-v4.25.9
1 distribution installed

dump:

$ perl -MSisimai -e 'print Sisimai->dump("/home/example/Maildir/new/1604169178.H599200P12611.example.com");' | jq 
[
  {
    "timezoneoffset": "+0000",
    "subject": "Message from website",
    "reason": "feedback",
    "diagnostictype": "",
    "senderdomain": "example.com",
    "softbounce": -1,
    "token": "3dcc4a8580d837297c93d3ce3b0045d575ecd81c",
    "catch": null,
    "listid": "",
    "alias": "",
    "deliverystatus": "",
    "smtpcommand": "",
    "destination": "example.com",
    "rhost": "",
    "lhost": "",
    "recipient": "xxx@example.com",
    "messageid": "xxx@example.com",
    "diagnosticcode": "",
    "feedbacktype": "",
    "origin": "/home/example/Maildir/new/1604169178.H599200P12611.example.com",
    "action": "",
    "replycode": "",
    "addresser": "no-reply@example.com",
    "timestamp": 1604172773,
    "smtpagent": "Feedback-Loop"
  }
]

example:

$ cat /home/example/Maildir/new/1604169178.H599200P12611.example.com | grep -A2 -B2 "Source-Ip:"
Content-Type: message/feedback-report

Source-Ip: 1.2.3.4
User-Agent: ReturnPathFBL/2.0
Original-Rcpt-To: xxxx@example.com

Am I missing a step here? I can also send you a copy of one of their FBLs if needed; just let me know.

Thanks!

@azumakuniyuki
Member

@cucx Would you post the entire ARF email (including all headers) as a sample to this issue? We'll try to parse the email with the fixed code.

Best regards,

@genericcx
Author

Added! I had to redact a lot, but it should be fine:
redactedfbl.txt

@azumakuniyuki
Copy link
Member

@cucx Thanks for the quick response :-) We will try to fix/implement code to resolve this issue.

@azumakuniyuki
Copy link
Member

@cucx The following diff will probably resolve the issue.

diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..55adb218 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -57,7 +57,7 @@ sub make {
     state $startingof = { 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'] };
     state $markingsof = {
         'message' => qr{\A(?>
-             [Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report
+             [Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport
             |[Tt]his[ ]is[ ]an[ ]email[ ]abuse[ ]report
             |[Tt]his[ ]is[ ](?:
                  a[ ][^ ]+[ ]authentication[ -]failure[ ]report
@@ -231,7 +231,7 @@ sub make {
                 # Reporting-MTA: dns; mx.example.jp
                 $commondata->{'rhost'} = $1;

-            } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
                 # The header is optional and MUST NOT appear more than once.
                 # Source-IP: 192.0.2.45
                 $arfheaders->{'rhost'} = $1;

The patch above returns the following result.

[
  {
    "alias": "",
    "reason": "feedback",
    "feedbacktype": "abuse",
    "destination": "example.com",
    "catch": {
      "sender": "",
      "parsedat": "2020-11-05 20:38:39",
      "queue-id": "",
      "x-mailer": "",
      "mailsize": 2471
    },
    "softbounce": -1,
    "messageid": "",
    "action": "",
    "addresser": "alice@example.com",
    "smtpcommand": "",
    "rhost": "10.0.0.1",
    "lhost": "",
    "smtpagent": "Feedback-Loop",
    "deliverystatus": "",
    "timestamp": 1604199777,
    "diagnostictype": "",
    "replycode": "",
    "listid": "",
    "diagnosticcode": "",
    "recipient": "hashed@example.com",
    "token": "6050a32a445e642594a0931751dc0822d5583597",
    "origin": "issue-415.eml",
    "subject": "",
    "timezoneoffset": "+0000",
    "senderdomain": "example.com"
  }
]
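Why the Source-Ip change alone had no effect: judging from the lines removed in the later diff, the pre-patch loop only switched into the report-reading state after a line matched `$markingsof->{'message'}`, and the Rackspace intro line did not match, so the feedback-report fields were never examined. The sketch below isolates the first alternative of that pattern, before and after this patch (written without the `/x` flag the original appears to use, hence the explicit `[ ]`), and checks it against the intro line from this issue and an existing sample line from the test set.

```perl
use strict;
use warnings;

# Opening line of the Rackspace/Return Path sample from this issue
my $intro = 'This is a Rackspace Abuse Report for an email message received from domain example.com';

# First alternative of $markingsof->{'message'} before and after the patch
my $before = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report/;
my $after  = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport/;

printf "before: %s\n", $intro =~ $before ? 'match' : 'no match';
printf "after:  %s\n", $intro =~ $after  ? 'match' : 'no match';
```

Making "email " optional and allowing capitalized "Abuse Report" keeps the existing "This is a Example email abuse report ..." samples matching while admitting the Rackspace wording.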

azumakuniyuki added a commit that referenced this issue Nov 5, 2020
@genericcx
Author

Yes, perfect, thanks! It works for all their reports.

Thanks again!

@azumakuniyuki
Copy link
Member

@cucx Thanks :-)
By the way, would you permit me to add redactedfbl.txt to the repository as arf-25.eml? We want to use the file for make test on the branch (which will be merged into master).

Best regards,

@genericcx
Author

Sure, please redact anything extra that you think needs redacting. A small note: it's not only "Rackspace". Return Path has many third-party providers using their "Universal Feedback Loop", so the Source: field will not always show Rackspace. See https://help.returnpath.com/hc/en-us/articles/220221448-List-of-all-available-complaint-feedback-loops-FBLs-

For example:

Source: BAE Systems
Source: Comcast
Source: Fastmail
Source: Italia Online (Libero and Virgilio)
Source: La Poste
Source: Liberty Global (Chello, UPC, Unity Media)
Source: Locaweb
Source: Mail.Ru
Source: OpenSRS

@azumakuniyuki
Member

@cucx Thank you for your consent :-)

Also, thanks for the information about other email service providers' Source: values. Tomorrow we'll look for other patterns like the "This is a Rackspace Abuse Report for an email message received from domain" line.

 % grep -h 'This is a' set-of-emails/maildir/bsd/arf-* | grep -v MIME
This is an email abuse report for an email message with the message-id of X-000000000000000000000000000000000@YZ received from IP address 192.0.2.89 on Thu, 29 Apr 2009 00:00:00 -0000 (GMT)
This is an email abuse report for an email message received from mx8.example.com  on Thu, 29 Apr 2013 23:45:00 PST
This is an email abuse report for an email message received from IP 192.0.2.2 on Thu, 9 Apr 2006 23:34:45 JST.
This is an opt-out report for an email message received from IP
This is an email abuse report for an email message from amazonses.com on Thu, 29 Apr 2017 23:34:45 +0000
This is a Example email abuse report for an email message received from IP 192.0.2.222 on Thu, 29 Apr 2015 23:34:45 +0000
This is a Example email abuse report for an email message received from IP 192.0.2.1 on Thu, 29 Apr 2015 23:34:45 +0000
This is an email abuse report for an email message received from IP 192.0.2.222 on Thu, 29 Apr 2015 23:34:45 +0000.
This is a spf/dkim authentication-failure report for an email message received from IP 192.0.2.127 on Thu, 29 Apr 2015 23:34:45 +0900.
This is an authentication failure report for an email message received from IP
This is a Example email abuse report for an email message received from IP 198.51.100.224 on Thu, 29 Apr 2015 23:34:45 +0000

% grep '^Source:' set-of-emails/maildir/bsd/arf-*
set-of-emails/maildir/bsd/arf-25.eml:Source: Rackspace
%

@azumakuniyuki
Member

@cucx The following diff should be able to parse ARF messages that use these other patterns:

diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..4b74b3c0 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -8,8 +8,7 @@ use Sisimai::RFC5322;
 sub description { return 'Abuse Feedback Reporting Format' }
 sub is_arf {
     # Email is a Feedback-Loop message or not
-    # @param    [Hash] heads    Email header including "Content-Type", "From",
-    #                           and "Subject" field
+    # @param    [Hash] heads    Email header including "Content-Type", "From" and "Subject" field
     # @return   [Integer]       1: Feedback Loop
     #                           0: is not Feedback loop
     my $class = shift;
@@ -53,11 +52,14 @@ sub make {
     #
     # Netease DMARC uses:    This is a spf/dkim authentication-failure report for an email message received from IP
     # OpenDMARC 1.3.0 uses:  This is an authentication failure report for an email message received from IP
-    # Abusix ARF uses        this is an autogenerated email abuse complaint regarding your network.
-    state $startingof = { 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'] };
+    # Abusix ARF uses:       this is an autogenerated email abuse complaint regarding your network.
+    state $startingof = {
+        'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'],
+        'report' => ['Content-Type: message/feedback-report'],
+    };
     state $markingsof = {
         'message' => qr{\A(?>
-             [Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report
+             [Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport
             |[Tt]his[ ]is[ ]an[ ]email[ ]abuse[ ]report
             |[Tt]his[ ]is[ ](?:
                  a[ ][^ ]+[ ]authentication[ -]failure[ ]report
@@ -114,12 +116,16 @@ sub make {
     #
     for my $e ( split("\n", $$mbody) ) {
         # Read each line between the start of the message and the start of rfc822 part.
+
+        # This is an email abuse report for an email message with the
+        #   message-id of 0000-000000000000000000000000000000000@mx
+        #   received from IP address 192.0.2.1 on
+        #   Thu, 29 Apr 2010 00:00:00 +0900 (JST)
+        $commondata->{'diagnosis'} ||= $e if $e =~ $markingsof->{'message'};
+
         unless( $readcursor ) {
             # Beginning of the bounce message or message/delivery-status part
-            if( $e =~ $markingsof->{'message'} ) {
-                $readcursor |= $indicators->{'deliverystatus'};
-                next;
-            }
+            $readcursor |= $indicators->{'deliverystatus'} if index($e, $startingof->{'report'}->[0]) == 0;
         }

         unless( $readcursor & $indicators->{'message-rfc822'} ) {
@@ -137,6 +143,7 @@ sub make {
                 # Microsoft ARF: original recipient.
                 $dscontents->[-1]->{'recipient'} = Sisimai::Address->s3s4($1);
                 $recipients++;
+
                 # The "X-HmXmrOriginalRecipient" header appears only once so
                 # we take this opportunity to hard-code ARF headers missing in
                 # Microsoft's implementation.
@@ -174,7 +181,7 @@ sub make {
                 $rcptintext  = $rhs if $lhs eq 'to';
             }
         } else {
-            # message/delivery-status part
+            # message/feedback-report part
             next unless $readcursor & $indicators->{'deliverystatus'};
             next unless length $e;

@@ -231,7 +238,7 @@ sub make {
                 # Reporting-MTA: dns; mx.example.jp
                 $commondata->{'rhost'} = $1;

-            } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
                 # The header is optional and MUST NOT appear more than once.
                 # Source-IP: 192.0.2.45
                 $arfheaders->{'rhost'} = $1;
@@ -240,13 +247,6 @@ sub make {
                 # the header is optional and MUST NOT appear more than once.
                 # Original-Mail-From: <somespammer@example.net>
                 $commondata->{'from'} ||= Sisimai::Address->s3s4($1);
-
-            } elsif( $e =~ $markingsof->{'message'} ) {
-                # This is an email abuse report for an email message with the
-                #   message-id of 0000-000000000000000000000000000000000@mx
-                #   received from IP address 192.0.2.1 on
-                #   Thu, 29 Apr 2010 00:00:00 +0900 (JST)
-                $commondata->{'diagnosis'} = $e;
             }
         } # End of if: rfc822
     }
@@ -292,6 +292,7 @@ sub make {

         $e->{'softbounce'}  = -1;
         $e->{'diagnosis'} ||= $commondata->{'diagnosis'};
+        $e->{'diagnosis'}   = Sisimai::String->sweep($e->{'diagnosis'});
         $e->{'date'}      ||= $mhead->{'date'};
         $e->{'reason'}  = 'feedback';
         $e->{'command'} = '';
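The key change in this last diff is what switches the parser into the report part: instead of the intro text, it is now the "Content-Type: message/feedback-report" line, while a matching intro line is merely kept as the diagnosis. This matters for the other Universal Feedback Loop sources because `[^ ]+` matches a single word, so an intro naming a multi-word provider would still miss the marker. Below is a simplified, hypothetical re-creation of that control flow; the sub `read_report`, its return hash and the sample intro text are made up for illustration, and the real `make` does considerably more (recipients, rfc822 part, Microsoft ARF, and so on).

```perl
use strict;
use warnings;

# Marker and report-part start, as in the patched Sisimai::ARF (first alternative only)
my $marker = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport/;
my $report = 'Content-Type: message/feedback-report';

# Hypothetical helper: returns the diagnosis line and rhost found in $body
sub read_report {
    my ($body) = @_;
    my $inreport = 0;    # stands in for the 'deliverystatus' bit of $readcursor
    my %found = ('diagnosis' => '', 'rhost' => '');

    for my $e ( split("\n", $body) ) {
        # Keep the first intro line as the diagnosis, wherever it appears
        $found{'diagnosis'} ||= $e if $e =~ $marker;

        # Enter the report part on its Content-Type line, not on the intro text
        $inreport = 1 if index($e, $report) == 0;
        next unless $inreport;

        $found{'rhost'} = $1 if $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/;
    }
    return \%found;
}

my $body = join("\n",
    'Hypothetical intro text that matches no marker pattern',
    '',
    'Content-Type: message/feedback-report',
    '',
    'Feedback-Type: abuse',
    'Source-Ip: 10.0.0.1',
    'User-Agent: ReturnPathFBL/2.0',
);
my $found = read_report($body);
print "rhost: $found->{'rhost'}\n";
```

Even with an unrecognized intro, the Source-Ip value is still read; the intro only affects the diagnosis text.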
