pQuotedString misinterprets backslashes due to reads trying to parse Haskell escape characters #2

Open
bwbaugh opened this issue Oct 23, 2017 · 1 comment

bwbaugh commented Oct 23, 2017

The issue is that pQuotedString delegates to reads, which parses the quoted string as a Haskell string literal, instead of using a Parsec parser that follows the plist format's own quoting rules.

pQuotedString :: Parsec String u String
pQuotedString = do
  input <- getInput
  case reads input of
       ((str, rest):_) -> const str <$> setInput rest
       _               -> empty

For example, take a quoted string that contains a single backslash:

> putStrLn "\"\\\""
"\"

Reading it with reads fails:

> (reads :: ReadS String) "\"\\\""
[]

This is because reads interprets Haskell-specific escape sequences, including character escapes and numeric escapes. For instance, \123 is read as the character with code 123:

> (reads :: ReadS String) "\"\\123\""
[("{","")]

See §2.6 (Character and String Literals) of the Haskell 98 Report for more info: https://www.haskell.org/onlinereport/lexemes.html

One solution might be to forgo any unescaping and just return the raw string. Another solution might be to make both the parsing and the printing understand NeXTStep-style escaping.
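The first option could be sketched roughly as follows. This is only an illustration, not the library's code: the name pQuotedStringRaw is hypothetical, and it assumes that a backslash always escapes the character after it (so \" does not end the string) and that the body is returned verbatim, backslashes included.

```haskell
import Text.Parsec

-- | Hypothetical stand-in for pQuotedString that does no unescaping.
-- It returns the text between the double quotes exactly as written.
-- Assumption: a backslash always protects the next character, so an
-- escaped double quote does not terminate the string.
pQuotedStringRaw :: Parsec String u String
pQuotedStringRaw = between (char '"') (char '"') (concat <$> many chunk)
  where
    chunk = escaped <|> plain
    -- Keep the backslash and the character it protects, untouched.
    escaped = do
      _ <- char '\\'
      c <- anyChar
      return ['\\', c]
    -- Any character other than a double quote or a backslash.
    plain = (: []) <$> noneOf "\"\\"
```

In GHCi, parse pQuotedStringRaw "" "\"\\U007F\"" should give Right "\\U007F", i.e. the six characters \U007F are passed through unchanged. Note that under this assumption the input "\" from the example above is still rejected, because \" is taken as an escaped quote and the string never closes; a lone backslash would have to be written as "\\".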

Background

I’m looking into building a tool (gnarf/osx-compose-key#17) that parses DefaultKeyBinding.dict files, which use backslash escapes heavily, as a learning exercise.

bwbaugh commented Oct 23, 2017

I’ve come up with a very hacky workaround just to unblock myself, so I can see whether this library works on the file I’m interested in. A proper solution would probably use a parser, perhaps similar to https://stackoverflow.com/q/24106314/1988505.
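For comparison, a parser-based version might look something like the sketch below. The name pQuotedStringUnescaped is hypothetical, and the set of escapes it accepts (\\, \", \n, \t and \U followed by exactly four hex digits) is an assumption chosen for illustration, not a complete or verified description of NeXTStep plist escaping.

```haskell
import Data.Char (chr, digitToInt)
import Text.Parsec

-- | Hypothetical Parsec-only quoted-string parser that decodes escapes itself
-- instead of going through reads.
-- Assumption: only the escapes listed below are supported; anything else
-- after a backslash is a parse error.
pQuotedStringUnescaped :: Parsec String u String
pQuotedStringUnescaped =
  between (char '"') (char '"') (many (escaped <|> noneOf "\"\\"))
  where
    escaped = char '\\' *> (unicode <|> simple)
    -- Single-character escapes.
    simple = choice
      [ '\\' <$ char '\\'
      , '"'  <$ char '"'
      , '\n' <$ char 'n'
      , '\t' <$ char 't'
      ]
    -- \Uxxxx: four hexadecimal digits naming one code point.
    unicode = do
      _ <- char 'U'
      digits <- count 4 hexDigit
      return (chr (foldl (\acc d -> acc * 16 + digitToInt d) 0 digits))
```

With this sketch, parse pQuotedStringUnescaped "" "\"a\\\\b\"" should give Right "a\\b" (the three characters a, backslash, b). A pretty-printer would need the inverse mapping so that values round-trip.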

Patch file

Apply with:

$ patch -p1 < «patchfile»
--- a/Text/NSPlist/Parsec.hs	2012-09-30 11:15:43.000000000 -0400
+++ b/Text/NSPlist/Parsec.hs	2017-10-22 21:52:31.000000000 -0400
@@ -3,6 +3,8 @@
   ) where
 
 
+import Data.List (intercalate)
+import Data.List.Split (splitOn)
 import Data.Word (Word8)
 import Numeric (readHex)
 import Control.Applicative ((<$>), (<*), (*>), (<*>), pure, (<|>), empty)
@@ -69,10 +71,18 @@
 
 pQuotedString :: Parsec String u String
 pQuotedString = do
-  input <- getInput
+  input <- hackyReplace <$> getInput
   case reads input of
-       ((str, rest):_) -> const str <$> setInput rest
+       ((str, rest):_) -> const (hackyUnreplace str) <$> setInput (hackyUnreplace rest)
        _               -> empty
+  where
+    replacements    =
+        [ ("\\", "__backslash__")
+        , ("\\\"", "__escaped_double_quote__")
+        ]
+    hackyReplace   = flip (foldr (uncurry       replace)) replacements
+    hackyUnreplace = flip (foldr (uncurry (flip replace))) replacements
+    replace old new = intercalate new . splitOn old
 
 -- | Parses data that is represented by hexadecimal codes
 pBinary :: Parsec String u NSPlistValue
--- a/nextstep-plist.cabal	2017-10-22 21:53:18.000000000 -0400
+++ b/nextstep-plist.cabal	2017-10-22 21:56:05.000000000 -0400
@@ -21,7 +21,7 @@
 
 
 library
-  build-depends:	  base >= 4 && < 5, QuickCheck >= 2, pretty, parsec >= 3
+  build-depends:	  base >= 4 && < 5, QuickCheck >= 2, pretty, parsec >= 3, split
   ghc-options:            -Wall
   exposed-modules:        Text.NSPlist,
                           Text.NSPlist.Pretty,
