Update to try to use windows-950 when extract-msg is imported #20
Changed `RTFDE.text_extraction.get_python_codec` to look up the `windows-950` encoding to see if it is available. This is only done when the codepage number is 950; if `windows-950` is not found, the function behaves as it did before. If the encoding is found, the function returns `'windows-950'` as the codec to use.

The upcoming version of `extract-msg`, version 0.42.0, adds the implementation that Microsoft uses for cp950 to ensure that documents using it will be parsed correctly.

This is a fix for #19 and passes with the test file specified in that issue.
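
For reviewers who want the lookup logic at a glance, here is a minimal sketch of the behavior described above. It is not the actual diff: `pick_codec` and its `fallback` parameter are hypothetical names used only for illustration, and the sketch assumes that `windows-950` becomes resolvable through the standard `codecs` registry once `extract-msg` has been imported.

```python
import codecs


def pick_codec(codepage_num: int, fallback: str) -> str:
    """Hypothetical helper illustrating the lookup described in this PR.

    `fallback` stands in for whatever codec name the function would
    otherwise have returned for this codepage (an assumption here).
    """
    if codepage_num == 950:
        try:
            # Raises LookupError when no registered codec answers to this name.
            codecs.lookup('windows-950')
        except LookupError:
            # Encoding not available: behave exactly as before.
            return fallback
        return 'windows-950'
    # Every other codepage is untouched by this change.
    return fallback
```

With this shape, any codepage other than 950 always gets the fallback codec, and 950 only switches to `'windows-950'` when that encoding is actually registered.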