Feature attachments #48

Merged
merged 10 commits into from May 27, 2020

pemontto
Contributor

Fixes #34

This PR adds a couple of features:

  1. It recursively searches the message parts for attachment and inline attachment filenames
  2. If save_attachments is true it will dump the encoded (usually base64) contents as well

Example content:

{
    ...
    "attachments": [
        {
            "filename": "test-document1.txt",
            "data": "MTI3LjAuMC4x\r\n"
        },
        {
            "filename": "test-attachment2.txt",
            "data": "dGhpcyBpcyBoZWxsbyB3b3JsZAo=\r\n"
        }
    ],
    ...
}
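
For illustration, the two data values in the example decode as shown in this minimal Ruby sketch. It uses only the core String#unpack1 method (the "m" directive is Ruby's lenient base64 decoder, which ignores the trailing CRLF). The attachments array is copied from the example above; nothing here is part of the plugin itself.

```ruby
# Sample "data" values copied from the example content above.
attachments = [
  { "filename" => "test-document1.txt",   "data" => "MTI3LjAuMC4x\r\n" },
  { "filename" => "test-attachment2.txt", "data" => "dGhpcyBpcyBoZWxsbyB3b3JsZAo=\r\n" }
]

# unpack1("m") decodes base64 leniently, skipping the trailing "\r\n".
decoded = attachments.map { |a| [a["filename"], a["data"].unpack1("m")] }.to_h

# Print each filename with its decoded content.
decoded.each { |name, content| puts "#{name}: #{content.inspect}" }
```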

The attachment data can then be used by the Elasticsearch Ingest Attachment Processor Plugin.

Something like:

PUT _ingest/pipeline/test
{
  "description": "Extract attachment information from arrays",
  "processors": [{
      "foreach": {
        "field": "attachments",
        "processor": {
          "gsub": {
            "field": "_ingest._value.data",
            "pattern": "\\s",
            "replacement": ""
          }
        }
      }
    },
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "target_field": "_ingest._value.attachment"
          }
        }
      }
    },
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "remove": {
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}
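
To make the three foreach steps concrete, here is a rough Ruby simulation on a plain hash. It does not talk to Elasticsearch: the attachment step is a stand-in that only decodes the payload, whereas the real processor extracts text and metadata. The reason given for the gsub step is an assumption: the stored data ends in \r\n, which a strict base64 decoder rejects.

```ruby
# Hypothetical stand-in for an indexed document; plain Ruby, no Elasticsearch.
doc = {
  "attachments" => [
    { "filename" => "test-document1.txt", "data" => "MTI3LjAuMC4x\r\n" }
  ]
}

raw = doc["attachments"][0]["data"]

# Strict base64 decoding ("m0") raises ArgumentError on the trailing CRLF.
strict_failed =
  begin
    raw.unpack1("m0")
    false
  rescue ArgumentError
    true
  end

# Step 1 (gsub): strip all whitespace, like pattern "\\s" with an empty replacement.
doc["attachments"].each { |a| a["data"] = a["data"].gsub(/\s/, "") }

# Step 2 (attachment): stand-in only; decode strictly and store the result.
doc["attachments"].each do |a|
  a["attachment"] = { "content" => a["data"].unpack1("m0") }
end

# Step 3 (remove): drop the bulky encoded payload once it has been processed.
doc["attachments"].each { |a| a.delete("data") }

puts doc.inspect
```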

@pemontto
Contributor Author

Added tests and removed the recursive search, replacing it with the built-in all_parts method.
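
The difference between the two approaches can be sketched in plain Ruby. The hashes below are a hypothetical model of a nested MIME message, not the Mail gem's classes, and flatten_parts is an illustrative helper standing in for the role that all_parts plays in the comment above.

```ruby
# Hand-written recursive walk: descend into every nested part.
def attachment_names_recursive(part)
  names = part[:filename] ? [part[:filename]] : []
  names + part[:parts].flat_map { |child| attachment_names_recursive(child) }
end

# Illustrative helper: return every nested part as one flat list.
def flatten_parts(part)
  part[:parts].flat_map { |child| [child] + flatten_parts(child) }
end

# With a flat list available, finding attachments is a simple filter.
def attachment_names_flat(message)
  flatten_parts(message).select { |p| p[:filename] }.map { |p| p[:filename] }
end

# Hypothetical message: a nested body container plus one regular attachment.
message = {
  filename: nil,
  parts: [
    { filename: nil, parts: [
      { filename: nil, parts: [] },          # plain-text body
      { filename: "logo.png", parts: [] }    # inline attachment, one level down
    ] },
    { filename: "test-document1.txt", parts: [] }
  ]
}

puts attachment_names_recursive(message).inspect
puts attachment_names_flat(message).inspect
```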

@pemontto
Contributor Author

@robbavey @karenzone would you be able to review this, #45 and #49?

Contributor

@robbavey robbavey left a comment

Thanks for the contribution! Couple of small nitpicks and a concern on a potential docs/functionality mismatch - an attachment may not necessarily be base64 encoded, which is inconsistent with the doc.
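
To illustrate the concern, here is a minimal sketch of how a consumer might decode a body depending on its Content-Transfer-Encoding. The decode_body helper and its transfer_encoding argument are hypothetical; whether the plugin exposes the encoding alongside the data is not stated in this thread.

```ruby
# Hypothetical helper: decode a part body according to its transfer encoding.
def decode_body(encoded, transfer_encoding)
  case transfer_encoding.to_s.downcase
  when "base64"           then encoded.unpack1("m")  # lenient base64
  when "quoted-printable" then encoded.unpack1("M")  # quoted-printable
  else encoded                                       # 7bit, 8bit, binary: already literal
  end
end

puts decode_body("MTI3LjAuMC4x\r\n", "base64")
puts decode_body("a=3Db", "quoted-printable")
puts decode_body("plain text", "7bit")
```

Treating every payload as base64 would corrupt the second and third cases, which is why the documentation needs to match what the plugin actually emits.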

Contributor Author

@pemontto pemontto left a comment

All suggestions reviewed and fixed

Contributor

@karenzone karenzone left a comment

Thanks, @pemontto. One suggestion: Our documentation standard is to alphabetize config options. Please move save_attachments to above secure in the table and the detailed descriptions.

Otherwise, the docs build cleanly and render correctly, so LGTM. Thanks for adding quality docs along with your code changes.

@karenzone karenzone requested a review from robbavey April 24, 2020 17:27
Contributor Author

@pemontto pemontto left a comment

🤦‍♂️ Fixed the alphabetical ordering.

@sakthiskv

I tried the save_attachments option above, but it is not working for me.

My conf file:

input {
  beats {
    port => 5044
  }
  imap {
    host => "imap.gmail.com"
    user => "sakthiskv@gmail.com"
    password => "1234567976736224"
    secure => true
    check_interval => 3
    strip_attachments => "true"
    fetch_count => 10000
    save_attachments => true
    # folder => "MyAwesomeAppEmails" # This line will only work if you apply the above mentioned patch
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "sakthi"
  }
}

@pemontto
Contributor Author

strip_attachments => "true"

This is likely the cause: you're removing attachments.

@sakthiskv

sakthiskv commented May 19, 2020

Thanks for the quick reply.
It's working now, but I am only getting .txt files.
I am not getting .xlsx files or images.

@pemontto
Contributor Author

Are you only seeing the filename without the data field, or is the data field empty? Can you describe your setup in more detail? Are you outputting to ES?

@sakthiskv

Yes, I am using ES.
No data reaches ES; Logstash itself is failing. I am getting the error below.

[2020-05-19T18:43:49,509][ERROR][logstash.inputs.imap ][main] Encountered error NoMethodError {:message=>"Can not decode an entire message, try calling #decoded on the various fields and body or parts if it is a multipart message.", :backtrace=>["C:/logstash/vendor/bundle/jruby/2.5.0/gems/mail-2.6.6/lib/mail/message.rb:1903:in `decoded'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-imap-3.0.6/lib/logstash/inputs/imap.rb:176:in `parse_mail'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-imap-3.0.6/lib/logstash/inputs/imap.rb:119:in `block in check_mail'", "org/jruby/RubyArray.java:1800:in `each'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-imap-3.0.6/lib/logstash/inputs/imap.rb:113:in `block in check_mail'", "org/jruby/RubyArray.java:1842:in `each_slice'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-imap-3.0.6/lib/logstash/inputs/imap.rb:111:in `check_mail'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-imap-3.0.6/lib/logstash/inputs/imap.rb:92:in `block in run'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:20:in `interval'", "C:/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-imap-3.0.6/lib/logstash/inputs/imap.rb:91:in `run'", "C:/logstash/logstash-core/lib/logstash/java_pipeline.rb:314:in `inputworker'", "C:/logstash/logstash-core/lib/logstash/java_pipeline.rb:306:in `block in start_input'"]}

@pemontto
Contributor Author

@sakthiskv thanks for taking the time to run this. Would you be able to send me a copy of the email, or something similar to pemontto@gmail.com so I can take a look? Otherwise we can continue discussion at https://discuss.elastic.co/u/pemontto.

@pemontto
Contributor Author

@sakthiskv the code path causing this error isn't actually related to this PR. Would still be interested in seeing the email that causes this.

Contributor

@karenzone karenzone left a comment

Docs are looking good. One minor request in line.

@@ -45,6 +45,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
| <<plugins-{type}s-{plugin}-password>> |<<password,password>>|Yes
| <<plugins-{type}s-{plugin}-port>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-secure>> |<<boolean,boolean>>|No
| <<plugins-{type}s-{plugin}-save_attachments>> |<<boolean,boolean>>|No
Contributor

Docs are building cleanly and generally look good. Will you please move the option in the table to alpha order, too?

Contributor

@robbavey Will you please review and comment on recent code changes? Thanks!

@sakthiskv

@sakthiskv thanks for taking the time to run this. Would you be able to send me a copy of the email, or something similar to pemontto@gmail.com so I can take a look? Otherwise we can continue discussion at https://discuss.elastic.co/u/pemontto.

I sent an email to the above account.
Please let me know once the problem is fixed.

@sakthiskv

[ERROR][logstash.inputs.imap ][main] Encountered error NoMethodError {:message=>"Can not decode an entire message, try calling #decoded on the various fields and body or parts if it is a multipart message.", :backtrace=>["C:/logstash/vendor/bundle/jruby/2.5.0/gems/mail-2.6.6/lib/mail/message.rb:1903:in

I am getting this error. Somehow it worked yesterday, and I received attachments such as .txt files.
But now it fails every time. Please help me.

@thomhubers

This PR would be highly beneficial for a use case for us too.

Contributor

@robbavey robbavey left a comment

LGTM - @pemontto Thank you for the contribution! I'm happy to merge and publish this if you are satisfied that the issue presented by @sakthiskv is not related to this PR.

@pemontto
Contributor Author

Thanks @robbavey, we're debugging offline; the issue is not related to this PR.

@thomhubers

thomhubers commented May 27, 2020

If this is merged, when will it be in a release?

Edit: I don't mean to push, but being able to use attachments would really help us, so we're eager to test it :)

@robbavey
Contributor

@thomhubers Once the fix is merged, we will publish the plugin, at which point we will update this issue with the information. From there, the plugin can be manually updated following the instructions here.

@karenzone karenzone self-requested a review May 27, 2020 18:19
Contributor

@karenzone karenzone left a comment

@pemontto Thanks for your contribution. This is a nice addition!
Congrats on being a first-time contributor. I hope we'll be seeing you again.

@karenzone karenzone merged commit f35f6ba into logstash-plugins:master May 27, 2020
@karenzone karenzone mentioned this pull request May 27, 2020
karenzone added a commit that referenced this pull request May 27, 2020
Bumps version to publish changes from #48
Reorders options table in documentation
@Alsatea

Alsatea commented Aug 10, 2020

Thank you for the amazing addition. Is it possible to apply it to the .eml email file format? I usually receive emails that have another email attached to them.

@yuliansen

yuliansen commented Oct 8, 2020

Hi, good day to you and everyone. I would like to try this plugin, but my attachments seem to be encrypted. When I set secure to false, it does not show any output; when I set secure to true, it shows the right output, even though my attachment is encrypted. Here is my Logstash conf:

input {
        imap {
                host => "host"
                password => "password"
                user => "email.co.id"
                folder => "folder"
                port => "993"
                save_attachments => true
                secure => false
                uid_tracking => false
                check_interval => 10
                fetch_count => 1
        }
}
output {
        stdout { codec => rubydebug }
}


Successfully merging this pull request may close these issues.

no information about attachments
7 participants