Using Grammars as Parsers seems to fail. #44

Open
dan-lugg opened this issue Jul 30, 2021 · 7 comments

dan-lugg commented Jul 30, 2021

Seeing that Grammar<T> extends Parser<T>, I figured I should be able to delegate to a Grammar<T>, such as:

val exp: Parser<Exp> by ExpGrammar() // where ExpGrammar is a Grammar<Exp>

However, it doesn't seem to behave as expected. The following is a small SSCCE to demonstrate:

package com.example

import com.github.h0tk3y.betterParse.combinators.and
import com.github.h0tk3y.betterParse.combinators.map
import com.github.h0tk3y.betterParse.combinators.separatedTerms
import com.github.h0tk3y.betterParse.combinators.skip
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.lexer.TokenMatch
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken
import com.github.h0tk3y.betterParse.parser.Parser

data class Inner(
    val names: List<String>,
)

data class Outer(
    val name: String,
    val inner: Inner,
)

abstract class TestGrammarBase<T> : Grammar<T>()
{
    val idToken by regexToken("\\w+")

    val spaceToken by regexToken("\\s*", true)

    val commaToken by literalToken(",")

    val lBraceToken by literalToken("{")

    val rBraceToken by literalToken("}")
}

class InnerTestGrammar : TestGrammarBase<Inner>()
{
    override val rootParser: Parser<Inner> by separatedTerms(idToken, commaToken, true) map inner@{ tokenMatches ->
        return@inner Inner(
            names = tokenMatches.map(TokenMatch::text),
        )
    }
}

class OuterTestGrammar : TestGrammarBase<Outer>()
{
    val innerTestParser by InnerTestGrammar()

    override val rootParser: Parser<Outer> by idToken and skip(lBraceToken) and innerTestParser and skip(rBraceToken) map outer@{ (tokenMatch, inner) ->
        return@outer Outer(
            name = tokenMatch.text,
            inner = inner,
        )
    }
}

fun main()
{
    val innerTest1 = "X, Y, Z"
    val outerTest1 = "A { }"
    val outerTest2 = "A { X, Y, Z }"

    val innerTestGrammar = InnerTestGrammar()
    val outerTestGrammar = OuterTestGrammar()

    innerTestGrammar.parseToEnd(innerTest1).also(::println)
    outerTestGrammar.parseToEnd(outerTest1).also(::println)
    outerTestGrammar.parseToEnd(outerTest2).also(::println)
}

And the output:

Inner(names=[X, Y, Z])
Outer(name=A, inner=Inner(names=[]))
Exception in thread "main" com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: MismatchedToken(expected=rBraceToken (}), found=idToken@5 for "X" at 4 (1:5))
	at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:92)
	at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:29)
	at com.github.h0tk3y.betterParse.grammar.GrammarKt.parseToEnd(Grammar.kt:70)
	at com.example.__langKt.main(__lang.kt:68)
	at com.example.__langKt.main(__lang.kt)

As you can see, the third attempt, which parses the grammar-combined input A { X, Y, Z }, errors out. InnerTestGrammar and OuterTestGrammar both extend TestGrammarBase, so they can see the shared member tokens/parsers, but they seem to get confused (or perhaps I'm confused).

Is this not an intended use of Grammar?

@zacharygrafton

@dan-lugg

I'm not sure if you ever figured this out, but I was able to make this work. It does seem that the tokens from the referenced grammar aren't being added to the current grammar. I'm not sure if this is a proper workaround, but here is an example:

object IPv4Grammar : Grammar<IPv4Address>() {
  // Don't actually do this, I haven't figured out how to control look ahead, so this is a hack
  val quad by regexToken("25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]")
  val dot by literalToken(".")
  val qnum by quad map { it.text.toUByte() }

  override val rootParser by (quad and skip(dot) and quad and skip(dot) and quad and skip(dot) and quad)
}

object ExampleGrammar : Grammar<IPv4Address>() {
  val header by literalToken("ip4:")
  // Reference the other grammar here
  val ip by IPv4Grammar
  
  // This is the magic. I think you might have to reference all your declared tokens here
  override val tokens = listOf(header) + IPv4Grammar.tokens
  override val rootParser by skip(header) and ip
}

I would assume that, since Grammar inherits from Parser, this should work without manually merging tokens, so I'm not sure whether this is intended behavior or a bug.
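To illustrate why merging the token lists matters, here is a minimal sketch in plain Kotlin. It does not use better-parse: `Tok` and `tokenNames` are hypothetical stand-ins, and the `quad` pattern is simplified to `[0-9]+`. It only assumes that a tokenizer is built from one list of tokens and cannot recognize input that none of them match.

```kotlin
// Hypothetical, simplified stand-ins for illustration; not the better-parse API.
class Tok(val name: String, pattern: String) {
    private val regex = Regex(pattern)

    // Returns the text matched exactly at `pos`, or null if this token does not match there.
    fun matchAt(input: String, pos: Int): String? =
        regex.find(input, pos)?.takeIf { it.range.first == pos && it.value.isNotEmpty() }?.value
}

// Returns the names of the matched tokens in order,
// or null if some position is not matched by any token in the list.
fun tokenNames(tokens: List<Tok>, input: String): List<String>? {
    val names = mutableListOf<String>()
    var pos = 0
    while (pos < input.length) {
        val hit = tokens.firstOrNull { it.matchAt(input, pos) != null } ?: return null
        names.add(hit.name)
        pos += hit.matchAt(input, pos)!!.length
    }
    return names
}

fun main() {
    val header = Tok("header", "ip4:")
    val quad = Tok("quad", "[0-9]+") // simplified pattern, an assumption for this sketch
    val dot = Tok("dot", "\\.")

    val outerOnly = listOf(header)                  // tokens declared in the outer grammar only
    val merged = listOf(header) + listOf(quad, dot) // outer tokens plus the referenced grammar's tokens

    println(tokenNames(outerOnly, "ip4:10.0")) // null: nothing in the list matches "10"
    println(tokenNames(merged, "ip4:10.0"))    // [header, quad, dot, quad]
}
```

In this model the outer list alone stops right after the header, which mirrors the symptom described above; whether better-parse fails in exactly this way is not confirmed here.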

@dan-lugg

Good find @zacharygrafton! I think the "fix" to enable this behavior automatically would then be around here:

protected operator fun <T> Parser<T>.provideDelegate(thisRef: Grammar<*>, property: KProperty<*>): Parser<T> =
    also { _parsers.add(it) }

protected operator fun <T> Parser<T>.getValue(thisRef: Grammar<*>, property: KProperty<*>): Parser<T> = this

protected operator fun Token.provideDelegate(thisRef: Grammar<*>, property: KProperty<*>): Token =
    also {
        if (it.name == null) {
            it.name = property.name
        }
        _tokens.add(it)
    }

protected operator fun Token.getValue(thisRef: Grammar<*>, property: KProperty<*>): Token = this

We would want to add:

    protected operator fun <T> Grammar<T>.provideDelegate(thisRef: Grammar<*>, property: KProperty<*>): Grammar<T> = this
        .also { _tokens.addAll(it.tokens) } // addAll, because it.tokens is a list of tokens, not a single Token
        .also { _parsers.add(it) }

    protected operator fun <T> Grammar<T>.getValue(thisRef: Grammar<*>, property: KProperty<*>): Grammar<T> = this

This should support using Grammar<T> as a Parser<T> in another grammar, with the referenced grammar's tokens added as expected. I will have to fork and test; hopefully this is the way.

@zacharygrafton

The suggested code seems to fix both use cases. I had the first extension function in the fork I was working off of to correct the issue, but I was missing the second extension function. Adding the second function makes my expanded test suite pass. Nice work.

dan-lugg commented Sep 20, 2021

@zacharygrafton I'm not actually having success with my proposed solution 🤔 Do you have an updated fork with the working tests?

@zacharygrafton

@dan-lugg You are correct. I ran ./gradlew test instead of ./gradlew allTests yesterday. It turns out my tests are coming back broken as well.

This is the fork I have that fixes the way I combine parsers. However, it is still broken in the case of inheritance. I think it has something to do with token matching priority: tokens are matched in the order in which they are added to a grammar. In the inheritance case, all the tokens end up in the same list, but in your original example, the InnerTestGrammar tokens never actually get matched because the OuterTestGrammar tokens are placed in the list before the InnerTestGrammar tokens.
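The ordering point can be sketched in plain Kotlin. This is a simplified model of a first-match tokenizer, not the better-parse implementation: `Rule` and `firstMatchNames` are hypothetical names, and the two patterns are only illustrative. It shows that when two rules can both match the same text, the one that comes first in the list wins.

```kotlin
// Hypothetical, simplified model of a first-match tokenizer; not the better-parse API.
class Rule(val name: String, pattern: String) {
    private val regex = Regex(pattern)

    // Length of the match that starts exactly at `pos`, or null if there is none.
    fun lengthAt(input: String, pos: Int): Int? =
        regex.find(input, pos)?.takeIf { it.range.first == pos && it.value.isNotEmpty() }?.value?.length
}

// At each position, the first rule in list order that matches is chosen.
fun firstMatchNames(rules: List<Rule>, input: String): List<String> {
    val names = mutableListOf<String>()
    var pos = 0
    while (pos < input.length) {
        val rule = rules.firstOrNull { it.lengthAt(input, pos) != null }
            ?: error("no rule matches at position $pos")
        names.add(rule.name)
        pos += rule.lengthAt(input, pos)!!
    }
    return names
}

fun main() {
    val keyword = Rule("keyword", "logic")
    val name = Rule("name", "[a-z_]+")

    println(firstMatchNames(listOf(keyword, name), "logic")) // [keyword]
    println(firstMatchNames(listOf(name, keyword), "logic")) // [name]: the earlier rule shadows the keyword
}
```

Under this model, appending a second grammar's tokens after the first grammar's tokens means the appended ones are never chosen wherever an earlier token already matches the same text.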

After looking at it further, I'm not sure if this even fixes the example I provided. It passes my test case, but I think that is because the tokens don't really overlap. In the case you provided, every token overlaps since they are shared. We may need to look closer at DefaultTokenizer and provide a different implementation of Tokenizer when combining grammars. Another approach might be to replace tokens on parsers during the call to Grammar<T>.provideDelegate with single instances. I'm not entirely sure this is possible... I'm definitely open to ideas and trying to make this work.
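The "single instances" idea can be illustrated with a small plain-Kotlin model. Everything here is hypothetical (`Tok`, `Match`, `MiniGrammar`, `tokenize`, `accepts`), and the sketch assumes that a token parser accepts a match only when the match was produced by the very same token object. That assumption is not confirmed in this thread. Under it, two grammar instances that declare identical tokens still hold different token objects, so matches produced by one grammar's tokenizer are rejected by the other grammar's parsers.

```kotlin
// Hypothetical, simplified stand-ins for illustration; not the better-parse API.
class Tok(val name: String, pattern: String, val ignored: Boolean = false) {
    private val regex = Regex(pattern)

    // Returns the text matched exactly at `pos`, or null if this token does not match there.
    fun matchAt(input: String, pos: Int): String? =
        regex.find(input, pos)?.takeIf { it.range.first == pos && it.value.isNotEmpty() }?.value
}

class Match(val type: Tok, val text: String)

// Each instance creates its own token objects, like instance properties in a grammar class.
class MiniGrammar {
    val id = Tok("id", "\\w+")
    val space = Tok("space", "\\s+", ignored = true)
    val tokens = listOf(id, space)
}

// First-match tokenizer that drops ignored tokens.
fun tokenize(tokens: List<Tok>, input: String): List<Match> {
    val result = mutableListOf<Match>()
    var pos = 0
    while (pos < input.length) {
        val tok = tokens.firstOrNull { it.matchAt(input, pos) != null }
            ?: error("no token matches at position $pos")
        val text = tok.matchAt(input, pos)!!
        if (!tok.ignored) result.add(Match(tok, text))
        pos += text.length
    }
    return result
}

// Assumption of this sketch: a parser accepts a match only from the same token instance.
fun accepts(expected: Tok, match: Match): Boolean = match.type === expected

fun main() {
    val outer = MiniGrammar()
    val inner = MiniGrammar() // same declarations, different token objects

    val matches = tokenize(outer.tokens, "X Y")
    println(accepts(outer.id, matches[0])) // true
    println(accepts(inner.id, matches[0])) // false: same name and pattern, different instance
}
```

If the assumption holds, sharing one set of token objects between the combined grammars would avoid the mismatch in this model.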

xetra11 commented Jul 27, 2022

Any news on this?

I have a script grammar which I separated into two Grammars.

Here is the one that uses the other, ScriptStatementGrammar(). I used it this way because it felt like the intended usage. However, the matcher fails on the first token of ScriptStatementGrammar, which tells me it does not work this way :/

class LogicScriptFileGrammar : Grammar<ScriptedLogic<Civilisation>>() {
    private val space by regexToken("\\s+", ignore = true)
    private val newLine by literalToken("\n", ignore = true)

    private val scriptKeyword by literalToken("logic")
    private val scriptName by regexToken("^[a-z_]+")

    private val statementParser by ScriptStatementGrammar()
    private val scriptHeadParser by -scriptKeyword * scriptName use { ScriptHead(text) }

    override val rootParser by (scriptHeadParser * statementParser) map { (scriptHead, statements) ->
        ScriptedLogic<Civilisation>(scriptHead.name) { context ->
            statements.forEach { statement ->
                if (statement.context == "actors") {
                    if (statement.mutationType == "urge") {
                        if (statement.mutationTarget == "eat") {
                            if (statement.mutationOperation == "plus") {
                                context.actors.forEach {
                                    it.urges.increaseUrge(
                                        statement.mutationTarget,
                                        statement.mutationOperationArgument.toDouble()
                                    )
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    internal class ScriptHead(
        val name: String
    )
}

dan-lugg commented Apr 6, 2023

I started working on a parser combinator library (implementation heavily influenced by this one) to build on / extend the capabilities of this awesomeness, but it's sitting abandoned because life.

Really hope @h0tk3y can come back to this again.

I'll fork and make some PRs when I have some time.
