Write is not handling latin1 characters correctly #9

Open
ghost opened this issue Jan 19, 2018 · 1 comment

ghost commented Jan 19, 2018

Issue: When converting RTF text that contains accented characters (é, ä, ú, etc.), a space that directly follows an accented character is dropped.

Example:
Take the following RTF text:
{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil Bookman Old Style;}{\f1\fnil\fcharset0 Bookman Old Style;}} \viewkind4\uc1\pard\lang1043\f0\fs20 Histoire naturelle g\f1\'e9n\'e9rale et particuli\'e8re des crustac\'e9s et des insectes. Ouvrage faisant suite aux oeuvres de Buffon, et partie du cours complet d'histoire naturelle r\'e9dig\'e9 par C. S. Sonnini, membre de plusieurs soci\'e9t\'e9s savantes. Familles naturelles des genres. Tomes 1-14. [Complete for the Arthropoda].\f0\par }

There should be a space after rédigé, but the output is currently rédigépar.
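
For reference, here is a minimal Python sketch of the expected result for that fragment. It is not the library's code: `decode_hex_escapes` is a hypothetical helper that only replaces each `\'hh` escape with the character it encodes in code page 1252 (the `\ansicpg1252` declared in the example) and leaves everything else, including spaces, untouched.

```python
import re

def decode_hex_escapes(rtf_text: str, codepage: str = "cp1252") -> str:
    """Replace each \\'hh escape with the character it encodes in the code page.

    Hypothetical helper for illustration only; it does not parse groups or
    control words, so it is meant for plain fragments like the one below.
    """
    return re.sub(
        r"\\'([0-9a-fA-F]{2})",
        # Undefined bytes in the code page become U+FFFD instead of raising.
        lambda m: bytes([int(m.group(1), 16)]).decode(codepage, errors="replace"),
        rtf_text,
    )

fragment = r"r\'e9dig\'e9 par C. S. Sonnini"
print(decode_hex_escapes(fragment))  # rédigé par C. S. Sonnini
```

The space after the second `\'e9` survives because the helper consumes exactly the backslash, the apostrophe, and two hex digits, and nothing after them.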

Owner

jstewmc commented Nov 6, 2021

The library appears to be confused by the space in r\'e9dig\'e9 par.

I believe it treats the space as the delimiter for the \'e9 control symbol (in which case the space is consumed as part of the control word) rather than as a space in the text (which should be output as text).

I'm not certain what the fix would be. I could update the apostrophe control symbol so that it is never space-delimited, but this may have unintended consequences in the other direction (i.e., inserting spaces where they don't belong).
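
To make the two behaviors concrete, here is a small Python sketch, not the library's implementation. As far as I understand the RTF specification, control words may be followed by one delimiting space, while control symbols such as `\'hh` take no delimiter, so a space after the two hex digits should be plain text. The function `fragment_to_text` and its `hex_eats_space` flag are hypothetical names; the flag only simulates the behavior suspected above so the two outputs can be compared.

```python
import re

# \word with an optional numeric parameter, then at most one delimiting space.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")
# \'hh: exactly two hex digits and no delimiter.
HEX_SYMBOL = re.compile(r"\\'([0-9a-fA-F]{2})")

def fragment_to_text(rtf: str, hex_eats_space: bool = False,
                     codepage: str = "cp1252") -> str:
    """Convert a brace-free RTF fragment to text (illustrative sketch only)."""
    out = []
    i = 0
    while i < len(rtf):
        m = HEX_SYMBOL.match(rtf, i)
        if m:
            out.append(bytes([int(m.group(1), 16)]).decode(codepage))
            i = m.end()
            # Simulates the suspected bug: the next space is treated as a
            # delimiter and swallowed instead of being emitted as text.
            if hex_eats_space and rtf.startswith(" ", i):
                i += 1
            continue
        m = CONTROL_WORD.match(rtf, i)
        if m:
            # The control word and its delimiter produce no text in this sketch.
            i = m.end()
            continue
        out.append(rtf[i])
        i += 1
    return "".join(out)

fragment = r"\f0 g\f1\'e9n\'e9rale r\'e9dig\'e9 par C."
print(fragment_to_text(fragment, hex_eats_space=True))   # générale rédigépar C.
print(fragment_to_text(fragment, hex_eats_space=False))  # générale rédigé par C.
```

In this sketch, never treating the space after `\'hh` as a delimiter does not add stray spaces elsewhere: the space after `\f0` is still consumed as that control word's delimiter, and only spaces that follow the two hex digits are kept.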
