-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode escape #354
Unicode escape #354
Conversation
Hi @dariodsa! Thanks for working on this 🤗 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dariodsa Awesome job! Thanks for working on it, really appreciate that 👍🏻
And sorry again for taking too long to review. I left some comments and suggestions, but it already looks great 🙂
src/Toml/Type/Printer.hs
Outdated
quotedText = show finalText | ||
finalText = foldl (\acc (ch, asciiCh) -> acc ++ getCh ch asciiCh) "" asciiArr | ||
xss = Text.unpack text | ||
asciiArr = zip xss $ asciiStatus xss | ||
getCh ch True = [ch] | ||
getCh ch False = printf "\\U%08x" ordChr :: String | ||
where | ||
ordChr = ord ch | ||
asciiStatus = map isAscii |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add types to all functions and values inside the where
block? Otherwise, it's a bit hard to understand the whole context and what is going on here.
@@ -307,8 +314,10 @@ genText = genNotEscape $ fmap Text.concat $ Gen.list (Range.constant 0 256) $ Ge | |||
, genPunctuation | |||
, genUniHex4Color | |||
, genUniHex8Color | |||
--, genUnicodeChar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be uncommented back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genUnicodeChar
is the function that I wrote myself thinking that I might be helpful to add it into the testing phase. But if I enable it, the test will fail.
The reason why it fails is mentioned above but I can repeat it. Character č
is transformed into \U0000010d
but decoding it back will give \U0000010d
again, not č
so the encode . decode
tests will fail.
Currently, tests are only include escaped unicode characters in encode . decode
phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dariodsa Great work! I'm going to merge this PR 👍🏻
And I propose to create a separate issue regarding encode . decode
property-based tests for unicode characters to track this problem and figure out a fix for it eventually 🙂
Issue #334
File changed
src/Toml/Type/Printer.hs
test/Test/Toml/Gen.hs
Function in
Printer.hs
was changed in order to allow escaping unicode characters. Firstly escaping unicode characters are performed before calling functionshow
which will escape others like\n
.I implemented tests but they are not functional. The reason is that because currently char
č
is parsed into\U0000010d
and it can only be unparsed into\U0000010d
. That is the reason why those tests will fail (starting char is not equal to end). I am still wondering should I add that tests with different printing without \u or \U or this is good enough.