Unicode escape #354

dariodsa · 2020-12-09T20:40:42Z

Issue #334

File changed

src/Toml/Type/Printer.hs
test/Test/Toml/Gen.hs

The function in Printer.hs was changed to allow escaping Unicode characters. Unicode characters are escaped first, before calling show, which then escapes the others, such as \n.
I implemented tests, but they do not pass yet. The reason is that the character č is currently encoded as \U0000010d, and decoding it back yields \U0000010d again rather than č. That is why those tests fail (the starting character is not equal to the final one). I am still wondering whether I should add those tests with a different way of printing, without \u or \U, or whether this is good enough.

*Toml> Toml.decode (Toml.text "a") $ Toml.encode (Toml.text "a") "č"
Right "\\U0000010d"

*Toml> Toml.decode (Toml.text "a") $ Toml.encode (Toml.text "a") "\\u010d"
Right "\\u010d"

chshersh · 2020-12-12T14:30:09Z

Hi @dariodsa! Thanks for working on this 🤗
It's a bit busy time, but we'll try to find time to review the PR.

chshersh

@dariodsa Awesome job! Thanks for working on it, really appreciate that 👍🏻
And sorry again for taking too long to review. I left some comments and suggestions, but it already looks great 🙂

src/Toml/Type/Printer.hs

chshersh · 2020-12-19T18:50:11Z

src/Toml/Type/Printer.hs

+        quotedText = show finalText
+        finalText = foldl (\acc (ch, asciiCh) -> acc ++ getCh ch asciiCh) "" asciiArr 
+        xss = Text.unpack text
+        asciiArr = zip xss $ asciiStatus xss
+        getCh ch True  = [ch]
+        getCh ch False = printf "\\U%08x" ordChr :: String
+          where
+            ordChr = ord ch
+        asciiStatus = map isAscii


Could you please add types to all functions and values inside the where block? Otherwise, it's a bit hard to understand the whole context and what is going on here.
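For illustration only, here is one way the quoted where block might look with type annotations added. This is a sketch, not the code that was eventually merged: the top-level name `showUnicodeText` and its signature are assumptions made so the block compiles on its own, and the bindings are otherwise copied from the diff above.

```haskell
import Data.Char (isAscii, ord)
import Data.Text (Text)
import qualified Data.Text as Text
import Text.Printf (printf)

-- Hypothetical standalone wrapper; in the PR these bindings sit in a
-- where block inside Printer.hs.
showUnicodeText :: Text -> String
showUnicodeText text = quotedText
  where
    -- Quote the text and escape control characters such as \n via show.
    quotedText :: String
    quotedText = show finalText

    -- The text with every non-ASCII character replaced by \UXXXXXXXX.
    finalText :: String
    finalText = foldl (\acc (ch, asciiCh) -> acc ++ getCh ch asciiCh) "" asciiArr

    xss :: String
    xss = Text.unpack text

    -- Each character paired with a flag telling whether it is ASCII.
    asciiArr :: [(Char, Bool)]
    asciiArr = zip xss (asciiStatus xss)

    -- Keep ASCII characters as-is; print the others as an 8-digit escape.
    getCh :: Char -> Bool -> String
    getCh ch True  = [ch]
    getCh ch False = printf "\\U%08x" ordChr
      where
        ordChr :: Int
        ordChr = ord ch

    asciiStatus :: String -> [Bool]
    asciiStatus = map isAscii
```

Note that in this sketch `show` runs after the escape has been inserted, so it also escapes the backslash of the escape itself: `showUnicodeText (Text.pack "č")` evaluates to the characters `"\\U0000010d"`, with two backslashes.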

chshersh · 2020-12-19T18:52:50Z

test/Test/Toml/Gen.hs

@@ -307,8 +314,10 @@ genText = genNotEscape $ fmap Text.concat $ Gen.list (Range.constant 0 256) $ Ge
    , genPunctuation
    , genUniHex4Color
    , genUniHex8Color
+    --, genUnicodeChar


Should this be uncommented back?

dariodsa · 2020-12-19

genUnicodeChar is a function that I wrote myself, thinking it might be helpful to add it to the tests. But if I enable it, the tests fail.
The reason is the one mentioned above, but I can repeat it. The character č is transformed into \U0000010d, but decoding it back gives \U0000010d again, not č, so the encode . decode tests fail.
Currently, the encode . decode tests only include escaped Unicode characters.
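To make the round-trip expectation concrete, here is a minimal sketch on plain `String` values. It does not use tomland's codecs; `escapeNonAscii` and `unescapeUnicode` are hypothetical helpers written for this illustration, and they only model the `\UXXXXXXXX` form.

```haskell
import Data.Char (chr, isAscii, isHexDigit, ord)
import Numeric (readHex, showHex)

-- Hypothetical helper: replace every non-ASCII character with \UXXXXXXXX.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap escapeChar
  where
    escapeChar :: Char -> String
    escapeChar ch
      | isAscii ch = [ch]
      | otherwise  = "\\U" ++ pad (showHex (ord ch) "")

    -- Left-pad the hex digits with zeros to a width of 8.
    pad :: String -> String
    pad hex = replicate (8 - length hex) '0' ++ hex

-- Hypothetical helper: turn each \UXXXXXXXX escape back into its character.
unescapeUnicode :: String -> String
unescapeUnicode ('\\' : 'U' : rest)
  | length digits == 8
  , all isHexDigit digits
  , [(code, "")] <- readHex digits
  , code <= 0x10FFFF
  = chr code : unescapeUnicode after
  where
    (digits, after) = splitAt 8 rest
unescapeUnicode (c : cs) = c : unescapeUnicode cs
unescapeUnicode []       = []
```

With these helpers, `unescapeUnicode (escapeNonAscii "č")` gives back `"č"`. An input that already contains a literal backslash followed by `U0000010d` would not survive the same round trip, since the unescape step cannot tell it apart from an escape produced by `escapeNonAscii`; a property test over arbitrary text would have to account for that.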

chshersh

@dariodsa Great work! I'm going to merge this PR 👍🏻
And I propose to create a separate issue regarding encode . decode property-based tests for unicode characters to track this problem and figure out a fix for it eventually 🙂

dariodsa added 7 commits November 14, 2020 19:32

[#334] parse and unparse tests

f922d9a

removed parsing and unparing tests

d1b48b7

[#334] showUnicodeText

0e81ac8

[#334] escaping unicode character as well as regular characters

7587a21

[#334] resolved issue with escaping regular unescaped chars

0349ff4

added tests, but they are not in use

2aace95

examples.hs revert to original content

52d68c5

dariodsa requested review from chshersh and vrom911 as code owners December 9, 2020 20:40

chshersh assigned dariodsa Dec 12, 2020

chshersh added the pretty-printer (Everything related to `Toml -> Text`) and test (Testing (unit, properties)) labels Dec 12, 2020

chshersh approved these changes Dec 19, 2020

[#334] changes requested by chshersh

fb68968

chshersh approved these changes Dec 27, 2020

chshersh merged commit ee5b8fa into main Dec 27, 2020

chshersh deleted the unicode_escape branch December 27, 2020 11:59

srid mentioned this pull request Aug 18, 2022

Encodes unicode characters with double backslash #408

Open
