Synchronize outlookmsgfile fork #217

Merged
merged 7 commits into from Mar 7, 2024

nazywam
Contributor

@nazywam nazywam commented Mar 7, 2024

I came across an error while processing a msg file that contained non-ASCII characters in the recipients' address node (__substg1.0_0E04001E).

  File "/usr/src/app/backend/services/outlookmsgfile.py", line 40, in to_email
    return load_message_stream(doc.root, True, doc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/backend/services/outlookmsgfile.py", line 47, in load_message_stream
    props = parse_properties(entry["__properties_version1.0"], is_top_level, entry, doc)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/backend/services/outlookmsgfile.py", line 243, in parse_properties
    value = tag_type.load(value)
            ^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/backend/services/outlookmsgfile.py", line 345, in load
    return value.decode("utf8").rstrip("\x00")
           ^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 16: invalid start byte

Unfortunately, I'm unable to share the original msg file.

I did some reading of [MS-OXMSG].pdf and came to the conclusion that when a node's contents are not Unicode-encoded, the specification only mentions ANSI, so we cannot really be sure which encoding was originally used.
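To make the Unicode versus non-Unicode distinction concrete: in [MS-OXMSG] the eight hex digits after the `__substg1.0_` prefix are the property ID followed by the property type, where type 0x001F is a UTF-16LE string and 0x001E is an 8-bit string with no stated code page. Below is a minimal sketch of that split; the helper names `split_substg_name` and `is_unicode_string` are made up for illustration and are not part of outlookmsgfile.py.

```python
# Property types for string values (from the MS-OXCDATA / MS-OXMSG specs).
PTYP_STRING8 = 0x001E  # 8-bit string, code page not stated in the stream
PTYP_STRING = 0x001F   # UTF-16LE string

PREFIX = "__substg1.0_"


def split_substg_name(name: str) -> tuple[int, int]:
    """Split a stream name into (property ID, property type).

    Hypothetical helper: takes the first 8 hex digits after the prefix,
    so a trailing index suffix on multi-valued properties is ignored.
    """
    if not name.startswith(PREFIX) or len(name) < len(PREFIX) + 8:
        raise ValueError(f"not a property stream name: {name!r}")
    tag = name[len(PREFIX):len(PREFIX) + 8]
    return int(tag[:4], 16), int(tag[4:], 16)


def is_unicode_string(name: str) -> bool:
    """True when the stream holds a UTF-16LE string (type 0x001F)."""
    _, prop_type = split_substg_name(name)
    return prop_type == PTYP_STRING


prop_id, prop_type = split_substg_name("__substg1.0_0E04001E")
print(hex(prop_id), hex(prop_type), is_unicode_string("__substg1.0_0E04001E"))
```

For the node in the traceback above, the type is 0x001E, which is why the bytes cannot simply be assumed to be UTF-8.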

The current version of the upstream library (the one this file was forked from) takes care of that by using some heuristics and falling back to cp1252: https://github.com/JoshData/convert-outlook-msg-file/blob/primary/outlookmsgfile.py#L391
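As a rough illustration of the fallback idea (not the upstream code, whose heuristics are more involved), a decoder could try UTF-8 first and only then fall back to a single-byte code page. The function name `decode_string8` and its `fallback` parameter are assumptions made for this sketch.

```python
def decode_string8(value: bytes, fallback: str = "cp1252") -> str:
    """Decode a non-Unicode string property without raising.

    Hypothetical sketch: try UTF-8 first, then fall back to a
    single-byte code page (cp1252 by default).
    """
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined (for example 0x81),
        # so replace those instead of raising a second error.
        text = value.decode(fallback, errors="replace")
    # Values are NUL-terminated; drop the trailing terminator(s).
    return text.rstrip("\x00")


raw = b"Micha\xb3\x00"  # 0xb3 is not a valid UTF-8 start byte
print(decode_string8(raw))                     # decoded via cp1252
print(decode_string8(raw, fallback="cp1250"))  # decoded via cp1250
```

The two calls give different characters for byte 0xb3 ("³" under cp1252, "ł" under cp1250), which is the limitation described above: the fallback avoids the crash, but it cannot recover the right character unless the original code page happens to match.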

That seems sensible, so I synchronized the portions of the library that parse the email properties.
It might make more sense to update the whole script to the current version, but since it's listed as a "fork", I wasn't sure whether that would overwrite any changes you've made.

@ninoseki
Owner

ninoseki commented Mar 7, 2024

Thanks! Could you please add # noqa: C901 on line 191 so that the lint issue is ignored?

def parse_properties(  # noqa: C901
    properties: CompoundFileEntity,
    is_top_level: bool,
    container: CompoundFileEntity,
    doc: CompoundFileReader,
):

@ninoseki ninoseki self-requested a review March 7, 2024 11:11
@ninoseki
Owner

ninoseki commented Mar 7, 2024

Lastly, please run pyupgrade --py311-plus **/*.py. Thanks in advance.

@ninoseki ninoseki merged commit 4148d50 into ninoseki:master Mar 7, 2024
5 of 7 checks passed
@nazywam nazywam deleted the bugfix/update-outlookmsgfile branch March 7, 2024 11:31