polib mutilates escape sequences #31
Comments
Original comment by qx0monster (Bitbucket: qx0monster, GitHub: Unknown): To add a bit more rationale to this issue:
Original comment by qx0monster (Bitbucket: qx0monster, GitHub: Unknown): Fine with me. Where would this option be set? On the POFile/_BaseFile object? Will you add it to the params of the
Original comment by qx0monster (Bitbucket: qx0monster, GitHub: Unknown): Would you suggest I take a look at it? Or is this something you want to do yourself?
Original comment by Jakub Wilk (Bitbucket: jwilk, GitHub: jwilk): The correct solution is to fix the parser to decode all escape sequences (and possibly signal an error on unknown ones). AFAICS it is true that with qx0monster's patch polib round-trips properly. It's just that the Python objects that are supposed to represent the PO file contents don't make sense. For example, for this file:
I get:
which is wrong: the msgid didn't contain any whitespace.
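The parser-side fix suggested in this comment could look roughly like the following sketch. It is not polib code: `decode_escapes` and the `SIMPLE` table are hypothetical names, and the set of accepted sequences (C-style single-character escapes plus octal and hex) is an assumption about what a PO reader should accept. The string is scanned once, left to right, so an escaped backslash followed by `n` is never mistaken for a newline, and anything unrecognized raises an error.

```python
import re

# Hypothetical table of single-character escapes (an assumption, not polib's).
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
          "t": "\t", "v": "\v", "\\": "\\", '"': '"'}

# One backslash followed by: hex digits, octal digits, any single char, or end of string.
_ESC = re.compile(r'\\(x[0-9A-Fa-f]{1,2}|[0-7]{1,3}|.|$)', re.S)

def decode_escapes(s):
    """Decode every escape sequence in s; raise ValueError on unknown ones."""
    def repl(m):
        body = m.group(1)
        if body in SIMPLE:
            return SIMPLE[body]
        if body.startswith("x") and len(body) > 1:
            # Simplification: the hex value is treated as a code point, not a byte.
            return chr(int(body[1:], 16))
        if body and body[0] in "01234567":
            return chr(int(body, 8))
        raise ValueError("unknown escape sequence: \\" + body)
    return _ESC.sub(repl, s)
```

With this approach the in-memory string holds the decoded text, and the writer has to re-encode it; whether unknown sequences should be an error or be passed through is the design question discussed in this thread.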
Original comment by qx0monster (Bitbucket: qx0monster, GitHub: Unknown): I'm terribly lagging behind ...
Original comment by qx0monster (Bitbucket: qx0monster, GitHub: Unknown): Sorry, I'm out of this. I will not be able to provide a patch as discussed, so please don't wait for me. Close or proceed as you see fit.
Originally reported by: qx0monster (Bitbucket: qx0monster, GitHub: Unknown)
I've just updated to polib 1.0.1, after sticking with a 0.5.x version for a long time. Great job, D-J! Sorry to re-raise this old issue, today with a slightly different (and hopefully more convincing) phrasing.
polib mutilates valid escape sequences.
To wit, here is a simple test case:
All escape sequences unknown to polib (i.e. outside of \t, \r and \n) get an additional '\' in front of them. This is particularly problematic for us in the case of unicode escapes, as they are frequently used to enter hard-to-type characters into msgids and msgstrs (like the "Registered" character in the sample).
The problem arises because polib unescapes strings on read (which removes some '\', but leaves them in place for unknown sequences like '\u...') and escapes on write/stringify (which unconditionally prefixes unknown escape sequences with another '\').
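The read/write asymmetry described above can be reproduced with a small sketch. The two functions below are hypothetical stand-ins, not polib's actual implementation, and the set of known escapes (including `\"` and `\\`) is an assumption made for illustration.

```python
import re

# Hypothetical set of escapes the reader understands (an assumption).
KNOWN = {"\\t": "\t", "\\r": "\r", "\\n": "\n", '\\"': '"', "\\\\": "\\"}

def naive_unescape(s):
    # On read: decode known two-character escapes, leave unknown ones as-is.
    return re.sub(r'\\(.)', lambda m: KNOWN.get(m.group(0), m.group(0)), s, flags=re.S)

def naive_escape(s):
    # On write: escape every backslash first, then the special characters.
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\r", "\\r")
             .replace("\n", "\\n")
             .replace('"', '\\"'))

raw = r"Acme\u00ae\n"               # text as it appears between the quotes in the PO file
in_memory = naive_unescape(raw)     # the backslash before 'u' survives the read
written = naive_escape(in_memory)   # ...and is doubled on write
```

Here `written` differs from `raw`: the surviving backslash in front of `u00ae` gets escaped again, while the known `\n` round-trips unchanged.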
I thought a lot about it, but to keep a long story short, my resolution is to have polib leave unknown escape sequences untouched. We've run with this patch for a long time in several projects, with good results. I will probably add more of my considerations as a separate comment.
Here is the pull request:
https://bitbucket.org/izi/polib/pull-request/8/removed-escaping-unescaping-of-unknown