filenames of attachments are not properly encoded #1

Closed
brychcy opened this issue Dec 12, 2021 · 4 comments

brychcy commented Dec 12, 2021

When reading the .eml files generated by readpst, the filenames of some attachments cannot be retrieved with Python's email library, e.g. when they contain parentheses, as in:
filename*=utf-8''Status_Announced_Invoice(s).pdf;

It turned out that rfc2231_string in readpst.c doesn't percent-encode all the characters that RFC 2231 requires to be escaped; parentheses are among them.

The correct output would be:
filename*=utf-8''Status_Announced_Invoice%28s%29.pdf;
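
For illustration, here is a minimal Python sketch of the escaping rule described above. It is not the readpst.c code: `rfc2231_encode` and `ATTR_CHAR_SAFE` are hypothetical names, and the safe set is my reading of RFC 2231's `attribute-char` (everything except space, control characters, `*`, `'`, `%` and the tspecials).

```python
from urllib.parse import quote

# Punctuation that RFC 2231 allows unescaped in an extended parameter value
# (assumed reading of "attribute-char"). quote() always leaves letters,
# digits and "_.-~" alone, so only the remaining characters are listed here.
ATTR_CHAR_SAFE = "!#$&+^`{|}"


def rfc2231_encode(name: str, charset: str = "utf-8") -> str:
    """Return the value part of a `filename*=` parameter (hypothetical helper)."""
    # Empty language tag between the two single quotes, as in the issue text.
    return f"{charset}''{quote(name, safe=ATTR_CHAR_SAFE, encoding=charset)}"


print(rfc2231_encode("Status_Announced_Invoice(s).pdf"))
# prints: utf-8''Status_Announced_Invoice%28s%29.pdf
```

Parentheses are tspecials, so they come out as %28 and %29, matching the corrected header shown above.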

pabs3 commented Dec 20, 2021 via email

pabs3 commented Dec 20, 2021 via email

brychcy added a commit to brychcy/libpst that referenced this issue Jan 10, 2022
Parentheses and other characters were not being encoded.

Fixes: pst-format#1

brychcy commented Jan 10, 2022

libpst1.pst.gz

As requested, here is an example PST file (compressed with gzip, as GitHub requires).

The attachment in the contained mail is named "Hello-(123)-World.pdf".

brychcy commented Jan 10, 2022

A simple Python script for printing the file names of PDF attachments:

#!/usr/bin/env python3
# use python 3.6 or later

import email
import email.policy

# point this to the file generated with "readpst -e ..."
eml_path = "/Users/till/opensource/libpst/test/patched/Outlook-Datendatei/libpst1/1.eml"

with open(eml_path, "rb") as f:
    msg = email.message_from_binary_file(f, policy=email.policy.default)

# uncomment the following lines to print the structure
# from email.iterators import _structure
# _structure(msg)

found = False
for part in msg.walk():
    if part.is_attachment() and part.get_content_type() == 'application/pdf':
        attachment_name = part.get_filename(failobj="")
        found = True
        print("found: " + attachment_name)

if not found:
    print("no pdf found!")
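
To try the same check without a PST file, the sketch below parses a small hand-written message in memory. The `RAW` message is an assumption made up for illustration (it is not readpst output); it carries the correctly escaped form of the example filename.

```python
import email
import email.policy

# Hypothetical single-part message with a correctly percent-encoded
# RFC 2231 filename; the body is a placeholder, not a real PDF.
RAW = (
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment; filename*=utf-8''Hello-%28123%29-World.pdf\n"
    "\n"
    "dummy\n"
)

msg = email.message_from_string(RAW, policy=email.policy.default)

# Same selection logic as the script above: PDF attachments only.
names = [
    part.get_filename(failobj="")
    for part in msg.walk()
    if part.is_attachment() and part.get_content_type() == "application/pdf"
]
print(names)
```

With the escaped header the decoded name should come back as "Hello-(123)-World.pdf". Swapping in the unescaped form of the header is a quick way to reproduce the problem reported in this issue.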
