Extend unquotePrintable function to support 4-byte Unicode characters and concatenated sequences #33

Open
joaoaugustogrobe opened this issue Apr 5, 2023 · 0 comments


The current unquotePrintable function does not decode 4-byte UTF-8 characters correctly, and it mis-parses runs of concatenated multi-byte characters such as =C9=91=E2=8D=BA (ɑ⍺: a 2-byte character followed by a 3-byte character). For this input the function returns ɑ��� instead of ɑ⍺.
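
To make the expected result concrete, here is a minimal sketch that decodes the sample input by collecting all escaped bytes first and decoding them together as UTF-8. The helper name decodeEscapedRun is hypothetical and not part of eml-format; the sketch assumes a runtime where TextDecoder is available as a global (modern browsers and recent Node.js versions).

```javascript
// Hypothetical reference helper, not part of eml-format.
// Collects the byte values of a run of =XX escapes and decodes them
// together as UTF-8, so multi-byte characters are never split.
function decodeEscapedRun(run) {
  const bytes = Uint8Array.from(
    run.match(/=[0-9A-Fa-f]{2}/g) || [],
    (escape) => parseInt(escape.slice(1), 16)
  );
  return new TextDecoder("utf-8").decode(bytes);
}

// 0xC9 0x91 is U+0251 (ɑ); 0xE2 0x8D 0xBA is U+237A (⍺).
const decoded = decodeEscapedRun("=C9=91=E2=8D=BA");
console.log(decoded === "\u0251\u237A");
```

The same helper should also cover 4-byte characters, since the decoder works on the whole byte run rather than on a fixed group size.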

To resolve this issue, we need to:

  1. Extend the function to support 4-byte Unicode characters.
  2. Enable the function to correctly handle multiple concatenated Unicode characters.

A possible solution is to use the first (lead) byte of each UTF-8 sequence to determine how many bytes the character occupies, as described in the IBM documentation. A recursive helper function could take the entire escaped sequence, read the length of the next character from its lead byte, decode that character, and then call itself on the remaining bytes.
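
The approach described above could be sketched roughly as follows. This is not the library's implementation: the names utf8SequenceLength, decodeUtf8Bytes, and decodeEscapedSequence are hypothetical, and replacing malformed bytes with U+FFFD is an assumption about the desired error handling. The sketch does not reject overlong encodings.

```javascript
// Hypothetical helpers, not part of eml-format.

// Number of bytes in a UTF-8 sequence, derived from its lead byte.
function utf8SequenceLength(leadByte) {
  if ((leadByte & 0x80) === 0x00) return 1; // 0xxxxxxx
  if ((leadByte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((leadByte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((leadByte & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0; // continuation byte or invalid lead byte
}

// Recursively decodes an array of byte values, one character per call.
function decodeUtf8Bytes(bytes, offset = 0) {
  if (offset >= bytes.length) return "";
  const length = utf8SequenceLength(bytes[offset]);
  // Payload bits of the lead byte (the whole byte for 1-byte sequences).
  let codePoint =
    length === 1 ? bytes[offset] : bytes[offset] & (0xff >> (length + 1));
  let valid = length > 0 && offset + length <= bytes.length;
  for (let i = 1; valid && i < length; i++) {
    const next = bytes[offset + i];
    valid = (next & 0xc0) === 0x80; // continuation bytes are 10xxxxxx
    codePoint = (codePoint << 6) | (next & 0x3f);
  }
  if (!valid || codePoint > 0x10ffff) {
    // Assumed error handling: emit U+FFFD and resume at the next byte.
    return "\uFFFD" + decodeUtf8Bytes(bytes, offset + 1);
  }
  return String.fromCodePoint(codePoint) + decodeUtf8Bytes(bytes, offset + length);
}

// Turns a run of =XX escapes into bytes and decodes them.
function decodeEscapedSequence(sequence) {
  const bytes = (sequence.match(/=[0-9A-Fa-f]{2}/g) || []).map((escape) =>
    parseInt(escape.slice(1), 16)
  );
  return decodeUtf8Bytes(bytes);
}
```

Because the helper recurses once per character, a very long run of escapes could exhaust the call stack; an iterative loop over the same length lookup would avoid that.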

With this change, unquotePrintable should correctly decode sequences that mix characters of 1 to 4 bytes, including several multi-byte characters in a row.

joaoaugustogrobe changed the title from "4 bytes unicode characters" to "Extend unquotePrintable function to support 4-byte Unicode characters and concatenated sequences" on Apr 5, 2023
joaoaugustogrobe added a commit to joaoaugustogrobe/eml-format that referenced this issue Apr 5, 2023