Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should Grammar objects be composed? #23

Closed
KushalP opened this issue Apr 20, 2020 · 2 comments
Closed

How should Grammar objects be composed? #23

KushalP opened this issue Apr 20, 2020 · 2 comments

Comments

@KushalP
Copy link

KushalP commented Apr 20, 2020

The goal

I have a file that looks like this:

Header
Line
Line
Line
Footer

And I would like to parse it into the following container:

data class File(
  val header: Header,
  val lines: List<Line>,
  val footer: Footer
)

The problem

I have written a grammar for each of the different variants, i.e Grammar<Header, Grammar<Line>, and Grammar<Footer>. There is some similarity in how these different lines look. How can I compose these grammars together? The variants are split on a newline \n character.

Any help would be much appreciated.

@KushalP KushalP changed the title How should Grammar objects be composed How should Grammar objects be composed? Apr 20, 2020
@h0tk3y
Copy link
Owner

h0tk3y commented Apr 22, 2020

It is possible to compose grammars by composing their rootParsers and all of their tokens in another pair of a tokenizer and parser or in a grammar.

The main trick here is that you can override val tokens in a grammar to make it recognize any tokens you specify.

Note that if the grammars for header, line, and footer have any tokens that would be ambiguous together, you need to move those tokens out of the grammars and reuse them in the grammars.

Here's a simplified example of how I achieved this for two grammars. Composing three would be similar. :)

data class Header(val text: String)
data class Line(val words: List<String>)
data class Document(val header: Header, val lines: List<Line>)

val wordToken = token("\\w+\\b")
val ws = token("\\s+", ignore = true)
val sharedTokens = listOf(wordToken, ws)

object HeaderGrammar : Grammar<Header>() {
    private val tripleDash by literalToken("###")

    override val tokens: List<Token>
        get() = super.tokens + sharedTokens

    override val rootParser: Parser<Header> by (-tripleDash * wordToken).use { Header(text) }
}

object LineGrammar : Grammar<Line>() {
    override val tokens: List<Token>
        get() = super.tokens + sharedTokens

    override val rootParser: Parser<Line> by separatedTerms(wordToken, ws).use { Line(map { it.text }) }
}

object DocumentGrammar : Grammar<Document>() {
    private val NEWLINE by token("\n")

    override val tokens: List<Token>
        get() = (super.tokens + sharedTokens + HeaderGrammar.tokens + LineGrammar.tokens).distinct()

    override val rootParser: Parser<Document> by
    (HeaderGrammar.rootParser * -NEWLINE * (separatedTerms(LineGrammar.rootParser, NEWLINE)))
            .map { Document(it.t1, it.t2) }
}

class MyTestClass {
    @Test
    fun main() {
        val text = """
            ### Header
            x y
            y z
        """.trimIndent()

        println(DocumentGrammar.parseToEnd(text))
        // prints: Document(header=Header(text=Header), lines=[Line(words=[x, y]), Line(words=[y, z])])
    }
}

@KushalP
Copy link
Author

KushalP commented Apr 23, 2020

Thanks for the explanation. This did the trick.

@KushalP KushalP closed this as completed Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants