Email import is broken #1140

Closed
wallace11 opened this issue Oct 26, 2021 · 14 comments · Fixed by #1160
Labels
bug (Something isn't working or in unexpected ways), joex (affects the joex component)

Comments

@wallace11
Contributor

Hey there.
I'm using the email import feature with "Only import e-mail attachments" and no additional file filtering.
When I run this on emails with PDF attachments, the item that gets imported is the .eml file itself rather than the attachments.
This used to work fine, but for some reason sometime around last month (September) it started acting like this.
I'm using an all-Docker stack, so maybe an update to one of the containers screwed something up?
Also on some items I'm getting the following log error, and their status is "stuck":

BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "=\r\n------="

Please let me know if there's any other information I need to provide.
Thanks.
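
A note on the error text above: the message says the decoder reached a padding character (`=`) before it had collected at least two valid base64 characters for the current 4-character group. The Python sketch below only illustrates that rule. It is not the decoder Docspell or its mail library uses; the function name `first_misplaced_padding` and the sample boundary text `------=_Part_0` are made up for the example, and skipping every character outside the base64 alphabet is an assumption about how a lenient decoder behaves.

```python
import string

# The standard base64 alphabet (RFC 4648), without the padding character.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def first_misplaced_padding(encoded):
    """Return (index, valid_chars_in_group) for the first '=' that shows up
    before two valid characters of its 4-character group, else None.

    Hypothetical helper for illustration only.
    """
    pos_in_group = 0  # position inside the current 4-character group
    for index, ch in enumerate(encoded):
        if ch in B64_ALPHABET:
            pos_in_group = (pos_in_group + 1) % 4
        elif ch == "=":
            if pos_in_group < 2:
                # Padding with 0 or 1 data characters cannot encode a byte.
                return index, pos_in_group
            # Legitimate padding occupies a slot in the group.
            pos_in_group = (pos_in_group + 1) % 4
        # Anything else (CR, LF, '-', ...) is skipped.
    return None


# Well-formed data: no misplaced padding.
print(first_misplaced_padding("QUJD\r\nRA==\r\n"))           # None
# Same data followed by text that contains another '='.
print(first_misplaced_padding("RA==\r\n------=_Part_0"))     # (12, 0)
```

In the second call the padding that ends the data is fine; the `=` that trips the check belongs to the text after the encoded data. That is one way a message like the one above can arise. Whether it is what happened here is not established in this report.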

@eikek
Owner

eikek commented Oct 26, 2021

Hi @wallace11, I will test the email import again - thanks for reporting. I can't remember changing any code in this area, but who knows what causes this. Is this happening to all your emails? Can you maybe post a bit more from the logs around the base64 error? Thanks!

@wallace11
Contributor Author

Yeah, I find it so strange because it just suddenly started acting weird and I didn't see any new Docspell release. That's why I thought maybe it was a dependency update that I didn't notice.
I don't have much time these days so sorry for not investigating this more thoroughly.

Here's an example of a full import log:

2021-10-26T4:11:12: ============ Start processing 019.eml ============
2021-10-26T4:11:12: Checking for duplicate files
2021-10-26T4:11:12: Creating new item with 1 attachment(s)
2021-10-26T4:11:12: Creating item finished in 10 ms
2021-10-26T4:11:12: Reading e-mail 019.eml
2021-10-26T4:11:12: Filtering email attachments with '*'
2021-10-26T4:11:12: Converting e-mail file...
2021-10-26T4:11:12: Job AVmZrbc4c.../wallace/process-item/Low execution failed. Retrying later.: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "=\r\n------="
2021-10-26T4:12:49: ============ Start processing 019.eml ============
2021-10-26T4:12:49: Found 1 existing item with these files.
2021-10-26T4:12:49: Found 1 attachments. Use only those from task args: Set(Ident(7M2dWFDZMCo96D8TLmtejDfqt28A47bawdEE91VyuWW5))
2021-10-26T4:12:49: Reading e-mail 019.eml
2021-10-26T4:12:49: Filtering email attachments with '*'
2021-10-26T4:12:49: Converting e-mail file...
2021-10-26T4:12:49: Job AVmZrbc4c.../wallace/process-item/Low execution failed. Retrying later.: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "=\r\n------="
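
The log steps "Reading e-mail" and "Filtering email attachments with '*'" can be pictured with a small Python sketch that reads a raw message, walks its attachments and keeps those whose filename matches a glob. This is not Docspell's code; `attachments_matching` and `demo_mail` are hypothetical names, and the sample addresses, filenames and file contents are placeholders.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from fnmatch import fnmatch


def attachments_matching(raw_mail, glob="*"):
    """Return (filename, decoded_bytes) for each attachment matching glob."""
    msg = message_from_bytes(raw_mail, policy=policy.default)
    result = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if fnmatch(name, glob):
            # decode=True undoes the Content-Transfer-Encoding (e.g. base64).
            result.append((name, part.get_payload(decode=True)))
    return result


def demo_mail():
    """Build a small placeholder mail with two attachments."""
    msg = EmailMessage()
    msg["From"] = "me@example.com"
    msg["To"] = "me@example.com"
    msg["Subject"] = "demo"
    msg.set_content("Body text.")
    msg.add_attachment(b"%PDF-1.4\n%%EOF\n", maintype="application",
                       subtype="pdf", filename="scan.pdf")
    msg.add_attachment("plain notes", filename="notes.txt")
    return msg.as_bytes()


for name, data in attachments_matching(demo_mail(), "*.pdf"):
    print(name, len(data))
```

With the glob `*` every attachment is kept, which corresponds to the "no additional file filtering" setup described in this report.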

@eikek
Owner

eikek commented Oct 29, 2021

Don't worry about time, I'll see what I can find. Just to make sure I understand: the mails are now imported completely (with the body) and not just the attachments?

@wallace11
Contributor Author

Thanks!
The imported item is a .eml file.
Firefox, for instance, just prompts to download a file when I open the item:
[image: screenshot of the download prompt]

@eikek
Owner

eikek commented Nov 1, 2021

Oh strange, so it could not be processed, maybe not even read. Do you maybe have a non-sensitive example file to share?

@wallace11
Contributor Author

So I went to a government website and downloaded the first PDF that I could find and sent it to myself via email.
I tagged the email and initiated an import and... voila! An error!

The PDF is from the following URL: https://m.knesset.gov.il/Activity/Legislation/Documents/yesod6.pdf (please rename the file to "חוק-יסוד: הממשלה.pdf" before you send it to yourself, to make it extra "problematic").

This is the log of the import:

2021-11-03T19:29:00: ============ Start processing hello.eml ============
2021-11-03T19:29:00: Checking for duplicate files
2021-11-03T19:29:00: Creating new item with 1 attachment(s)
2021-11-03T19:29:00: Creating item finished in 17 ms
2021-11-03T19:29:00: Reading e-mail hello.eml
2021-11-03T19:29:00: Filtering email attachments with '*'
2021-11-03T19:29:00: Converting e-mail file...
2021-11-03T19:29:00: Job 7TLSm94hf.../wallace/process-item/Low execution failed. Retrying later.: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "0KJSVFT0Y="

Thanks!
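
For anyone who wants to build a similar test message without a mail client, the reproduction described above (one PDF attachment with a non-ASCII filename) can be sketched with Python's standard `email` package. This is only a sketch under assumptions: `build_test_mail` is a hypothetical helper, the addresses are placeholders, and the PDF bytes are a stand-in for the file from the URL above. It is not confirmed that a message built this way triggers the error.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_test_mail(pdf_bytes, filename):
    """Return the raw bytes of a mail carrying one PDF attachment."""
    msg = EmailMessage()
    msg["From"] = "me@example.com"   # placeholder address
    msg["To"] = "me@example.com"     # placeholder address
    msg["Subject"] = "hello"
    msg.set_content("Test mail with one PDF attachment.")
    # Binary attachments are base64-encoded by the default policy.
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename=filename)
    return msg.as_bytes()


# Placeholder content; use the bytes of the real PDF for an actual test.
raw = build_test_mail(b"%PDF-1.4\n%%EOF\n", "חוק-יסוד: הממשלה.pdf")

# Parse it back to check what a reader of the .eml would see.
parsed = message_from_bytes(raw, policy=policy.default)
attachment = next(parsed.iter_attachments())
print(attachment["Content-Transfer-Encoding"])
print(attachment.get_filename())
```

Writing `raw` to a file such as `hello.eml` gives a message that can be delivered to the mailbox being scanned.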

@tobru

tobru commented Nov 7, 2021

I'm seeing the same error with 0.28.0. My workflow is to scan from a Brother ADS-2700W to e-mail and let Docspell import it. This used to work quite well, but now I also get this BASE64Decoder error.

It happens with all scans. I tried several documents, and all of them fail.

@eikek
Owner

eikek commented Nov 7, 2021

Oh noes. I haven't had time to look at it, sadly :/. Are you also using the scan/import mailbox task? Or is it sent "directly" (like using an internal SMTP) to Docspell?

@tobru

tobru commented Nov 7, 2021

It's the "Scan Mailbox" task.

@eikek
Owner

eikek commented Nov 8, 2021

Ok, I can reproduce it with the pdf from knesset.gov.il (tried some other test files where I couldn't reproduce it). I'll need to investigate, as I currently have no clue what's going on.

eikek added the bug and joex labels (restserver was added and then removed) on Nov 8, 2021
eikek added this to the Docspell 0.29.0 milestone on Nov 8, 2021
eikek added a commit that referenced this issue Nov 8, 2021
This pulls in a fix to address #1140
eikek added a commit that referenced this issue Nov 8, 2021
This pulls in a fix to address #1140
@mergify mergify bot closed this as completed in #1160 Nov 8, 2021
@eikek
Owner

eikek commented Nov 8, 2021

Hm… I think I fixed it. The only thing I can't explain is that it worked in 0.27.0, because that version also contains the bug. If you could test the nightly, which will be available shortly, that would be great!

@wallace11
Contributor Author

@eikek Thanks for addressing this issue.
Unfortunately, I can't seem to get email scanning to run on nightly. Maybe it's because my nightly instance doesn't have SSL.
I guess we'll have to wait for the stable release, then :)

@eikek
Owner

eikek commented Nov 14, 2021

@wallace11 thanks for trying! No worries, I'll make a release soon and we'll see if it is fixed or if I need to have another go at it.

@wallace11
Contributor Author

@eikek
I can confirm this issue is now resolved.
I managed to import emails I couldn't import before.
Thank you :)
