Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3] Implement basic TOML parser #12

Merged
merged 4 commits into from
Apr 4, 2018
Merged

[#3] Implement basic TOML parser #12

merged 4 commits into from
Apr 4, 2018

Conversation

chshersh
Copy link
Contributor

@chshersh chshersh commented Apr 3, 2018

This PR introduces basic implementation of TOML parser. Also, some types in Toml.Type module were refactored. This resulted in a lot of changes... Though, it works! You can checkout parseable example in test.toml file.

@chshersh chshersh added the parser Everything related to `Text -> Toml` label Apr 3, 2018
@chshersh chshersh self-assigned this Apr 3, 2018
@chshersh chshersh requested a review from vrom911 April 3, 2018 21:27
@chshersh
Copy link
Contributor Author

chshersh commented Apr 3, 2018

Argh, -XDerivingStrategies doesn't work on GHC-8.0.2... I guess I need to drop them in that case...

Copy link
Member

@vrom911 vrom911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool! That's hell a huge work 👏
I've added a couple of comments and some questions..

) where

-- I hate default Prelude... Do I really need to import all this stuff manually?..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 give this man an universum

@@ -1,13 +1,184 @@
module Toml.Parser
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably write some module description for haddock

bareKeyP = lexeme $ Text.pack <$> bareStrP
where
bareStrP :: Parser String
bareStrP = some $ alphaNumChar <|> char '_' <|> char '-'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool that it could start with any of these symbols (at least I didn't see any restriction on it in the docs).
Also instead of char '_' <|> char '-' we can use oneOf ['_', '-'] 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, restricting to something like - can't be the first character will complicate things a lot... Though, -_-_-_- is a valid key in TOML, I hope people won't use such keys...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot to use oneOf ['_', '-'] there... I guess I can silently fix this under one of the future PR :pepe:

bareStrP = some $ alphaNumChar <|> char '_' <|> char '-'

stringP :: Parser Text
stringP = lexeme $ Text.pack <$> (char '"' *> anyChar `manyTill` char '"')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it also could be between '? Should we try it also here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure how to write such parser properly... Because ' is for literal strings. With ' I can write:

quoted   = 'Tom "Dubs" Preston-Werner'

But with " this needed to be write like this:

quoted   = "Tom \"Dubs\" Preston-Werner"

Actually, I'm not sure which behavior is currently implemented and what be the result of this parser... In "" strings I also need to support this:

\b         - backspace       (U+0008)
\t         - tab             (U+0009)
\n         - linefeed        (U+000A)
\f         - form feed       (U+000C)
\r         - carriage return (U+000D)
\"         - quote           (U+0022)
\\         - backslash       (U+005C)
\uXXXX     - unicode         (U+XXXX)
\UXXXXXXXX - unicode         (U+XXXXXXXX)

Not sure how to do this easilty and what comes out of the box...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, "Tom \"Dubs\" Preston-Werner" is not even parseable because current parser interprets \ as a separate character which makes sense. So I just need to replace " with ' to have parser of literal strings.

<|> True <$ text "true"

-- dateTimeP :: Parser DateTime
-- dateTimeP = error "Not implemented!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be tough one..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Will create separate issue for this...

k <- keyP
text_ "="
uval <- valueP
case typeCheck uval of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool decision to make it on this level! 👍

isThPref = isPrefixOf `on` unKey . thName

isPrefixOf :: Eq a => NonEmpty a -> NonEmpty a -> Bool
(x :| xs) `isPrefixOf` (y :| ys) = x == y && List.isPrefixOf xs ys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the function from Data.List.NonEmpty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see it's not, it works with simple list as prefix 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vrom911 I was also surprised by this behavior. There's:

isPrefixOf :: [a] -> [a] -> Bool
isPrefixOf :: [a] -> NonEmpty a -> Bool

But somehow there's no

isPrefixOf :: NonEmpty a -> NonEmpty a -> Bool

🤔 🤷‍♂️

2Inner = +42

[table.name.1]
listInner."google.com" = [true, false]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've seen this red highlight in toml repo also, why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. universum also had it. I guess it's just a github bug. Though I didn't google it...

@@ -26,13 +26,16 @@ library

build-depends: base >= 4.9 && < 5
, hashable
, megaparsec
, parser-combinators
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parser-combinators is a dependency for megaparsec. Some parser combinators are not exported by megaparsec. Instead more general versions of them are in the parser-combinators packages. Specifically, I need these ones:

sepBy1   :: MonadPlus m => m a -> m sep -> m (NonEmpty a)
between  :: Applicative m => m open -> m close -> m a -> m a
sepBy    :: Alternative m => m a -> m sep -> m [a]
manyTill :: Alternative m => m a -> m end -> m [a]

Copy link
Member

@vrom911 vrom911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool! I only wanted to ask about will we continue to update changelog if it's only the initial version.. If yes then we can add a few words there, but I' not sure..

src/Toml/Type.hs Outdated
@@ -122,7 +120,9 @@ arr6 = [ 1, 2.0 ] # INVALID
-}
Array :: [Value t] -> Value 'TArray

-- | Untyped 'Value'.
-- TODO: move into Toml.Type.Internal module then?..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good point, because it's exported at the moment..

@chshersh
Copy link
Contributor Author

chshersh commented Apr 4, 2018

@vrom911 Thanks for your review! I guess we should update changelog and reflect every issue we did in there. Will write it. Also I can move UValue into separate module.

Copy link
Member

@vrom911 vrom911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great! 👍

@@ -120,7 +120,9 @@ arr6 = [ 1, 2.0 ] # INVALID
-}
Array :: [Value t] -> Value 'TArray

-- | Untyped 'Value'.
-- TODO: move into Toml.Type.Internal module then?.. But it uses 'DateTime' which is not internal...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😞

@chshersh chshersh merged commit f087a93 into master Apr 4, 2018
@chshersh chshersh deleted the chshersh/3-parser branch April 4, 2018 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Everything related to `Text -> Toml`
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants