Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode escape #354

Merged
merged 8 commits into from
Dec 27, 2020
Merged

Unicode escape #354

merged 8 commits into from
Dec 27, 2020

Conversation

dariodsa
Copy link
Collaborator

@dariodsa dariodsa commented Dec 9, 2020

Issue #334

File changed

  • src/Toml/Type/Printer.hs
  • test/Test/Toml/Gen.hs

Function in Printer.hs was changed in order to allow escaping unicode characters. Firstly escaping unicode characters are performed before calling function show which will escape others like \n.
I implemented tests but they are not functional. The reason is that because currently char č is parsed into \U0000010d and it can only be unparsed into \U0000010d. That is the reason why those tests will fail (starting char is not equal to end). I am still wondering should I add that tests with different printing without \u or \U or this is good enough.

*Toml> Toml.decode (Toml.text "a") $ Toml.encode (Toml.text "a") "č"
Right "\\U0000010d"
*Toml> Toml.decode (Toml.text "a") $ Toml.encode (Toml.text "a") "\\u010d"
Right "\\u010d"

@chshersh chshersh added pretty-printer Everything related to `Toml -> Text` test Testing (unit, properties) labels Dec 12, 2020
@chshersh
Copy link
Contributor

Hi @dariodsa! Thanks for working on this 🤗
It's a bit busy time, but we'll try to find time to review the PR.

Copy link
Contributor

@chshersh chshersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dariodsa Awesome job! Thanks for working on it, really appreciate that 👍🏻
And sorry again for taking too long to review. I left some comments and suggestions, but it already looks great 🙂

src/Toml/Type/Printer.hs Outdated Show resolved Hide resolved
Comment on lines 181 to 189
quotedText = show finalText
finalText = foldl (\acc (ch, asciiCh) -> acc ++ getCh ch asciiCh) "" asciiArr
xss = Text.unpack text
asciiArr = zip xss $ asciiStatus xss
getCh ch True = [ch]
getCh ch False = printf "\\U%08x" ordChr :: String
where
ordChr = ord ch
asciiStatus = map isAscii
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add types to all functions and values inside the where block? Otherwise, it's a bit hard to understand the whole context and what is going on here.

src/Toml/Type/Printer.hs Outdated Show resolved Hide resolved
src/Toml/Type/Printer.hs Show resolved Hide resolved
src/Toml/Type/Printer.hs Outdated Show resolved Hide resolved
@@ -307,8 +314,10 @@ genText = genNotEscape $ fmap Text.concat $ Gen.list (Range.constant 0 256) $ Ge
, genPunctuation
, genUniHex4Color
, genUniHex8Color
--, genUnicodeChar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be uncommented back?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genUnicodeChar is the function that I wrote myself thinking that I might be helpful to add it into the testing phase. But if I enable it, the test will fail.
The reason why it fails is mentioned above but I can repeat it. Character č is transformed into \U0000010d but decoding it back will give \U0000010d again, not č so the encode . decode tests will fail.
Currently, tests are only include escaped unicode characters in encode . decode phase.

Copy link
Contributor

@chshersh chshersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dariodsa Great work! I'm going to merge this PR 👍🏻
And I propose to create a separate issue regarding encode . decode property-based tests for unicode characters to track this problem and figure out a fix for it eventually 🙂

@chshersh chshersh merged commit ee5b8fa into main Dec 27, 2020
@chshersh chshersh deleted the unicode_escape branch December 27, 2020 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pretty-printer Everything related to `Toml -> Text` test Testing (unit, properties)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants