Using Grammars as Parsers seems to fail. #44

dan-lugg · 2021-07-30T15:39:00Z

Seeing that Grammar<T> extends Parser<T>, I figured I should be able to delegate to a Grammar<T>, such as:

val exp: Parser<Exp> by ExpGrammar() // where ExpGrammar is a Grammar<Exp>

However, it doesn't seem to behave as expected. The following is a small SSCCE to demonstrate:

package com.example

import com.github.h0tk3y.betterParse.combinators.and
import com.github.h0tk3y.betterParse.combinators.map
import com.github.h0tk3y.betterParse.combinators.separatedTerms
import com.github.h0tk3y.betterParse.combinators.skip
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.lexer.TokenMatch
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken
import com.github.h0tk3y.betterParse.parser.Parser

data class Inner(
    val names: List<String>,
)

data class Outer(
    val name: String,
    val inner: Inner,
)

abstract class TestGrammarBase<T> : Grammar<T>()
{
    val idToken by regexToken("\\w+")

    val spaceToken by regexToken("\\s*", true)

    val commaToken by literalToken(",")

    val lBraceToken by literalToken("{")

    val rBraceToken by literalToken("}")
}

class InnerTestGrammar : TestGrammarBase<Inner>()
{
    override val rootParser: Parser<Inner> by separatedTerms(idToken, commaToken, true) map inner@{ tokenMatches ->
        return@inner Inner(
            names = tokenMatches.map(TokenMatch::text),
        )
    }
}

class OuterTestGrammar : TestGrammarBase<Outer>()
{
    val innerTestParser by InnerTestGrammar()

    override val rootParser: Parser<Outer> by idToken and skip(lBraceToken) and innerTestParser and skip(rBraceToken) map outer@{ (tokenMatch, inner) ->
        return@outer Outer(
            name = tokenMatch.text,
            inner = inner,
        )
    }
}

fun main()
{
    val innerTest1 = "X, Y, Z"
    val outerTest1 = "A { }"
    val outerTest2 = "A { X, Y, Z }"

    val innerTestGrammar = InnerTestGrammar()
    val outerTestGrammar = OuterTestGrammar()

    innerTestGrammar.parseToEnd(innerTest1).also(::println)
    outerTestGrammar.parseToEnd(outerTest1).also(::println)
    outerTestGrammar.parseToEnd(outerTest2).also(::println)
}

And the output:

Inner(names=[X, Y, Z])
Outer(name=A, inner=Inner(names=[]))
Exception in thread "main" com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: MismatchedToken(expected=rBraceToken (}), found=idToken@5 for "X" at 4 (1:5))
	at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:92)
	at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:29)
	at com.github.h0tk3y.betterParse.grammar.GrammarKt.parseToEnd(Grammar.kt:70)
	at com.example.__langKt.main(__lang.kt:68)
	at com.example.__langKt.main(__lang.kt)

As you can see, the third attempt, parsing the grammar-combined input A { X, Y, Z }, errors out. InnerTestGrammar and OuterTestGrammar both extend TestGrammarBase and so can see the shared member tokens/parsers, but they seem to get confused (or perhaps I'm confused).

Is this not an intended use of Grammar?


zacharygrafton · 2021-09-16T03:31:58Z

@dan-lugg

I'm not sure if you ever figured this out, but I was able to make this work. It does seem that the tokens from the referenced grammar aren't being added to the current grammar. I'm not sure if this is a proper workaround, but here is an example:

object IPv4Grammar : Grammar<IPv4Address>() {
  // Don't actually do this, I haven't figured out how to control look ahead, so this is a hack
  val quad by regexToken("25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]")
  val dot by literalToken(".")
  val qnum by quad map { it.text.toUByte() }

  override val rootParser by (quad and skip(dot) and quad and skip(dot) and quad and skip(dot) and quad)
}

object ExampleGrammar : Grammar<IPv4Address>() {
  val header by literalToken("ip4:")
  // Reference the other grammar here
  val ip by IPv4Grammar
  
  // This is the magic. I think you might have to reference all your declared tokens here
  override val tokens = listOf(header) + IPv4Grammar.tokens
  override val rootParser by skip(header) and ip
}

I would assume that since Grammar inherits from Parser that this should work without manually merging tokens, so I'm not sure if this is intended behavior or just a bug.

dan-lugg · 2021-09-18T13:41:59Z

Good find @zacharygrafton! I think the "fix" to enable this behavior automatically would then be around here:

better-parse/src/commonMain/kotlin/com/github/h0tk3y/betterParse/grammar/Grammar.kt

Lines 40 to 53 in 29ed5f2

    
    protected operator fun <T> Parser<T>.provideDelegate(thisRef: Grammar<*>, property: KProperty<*>): Parser<T> =
        also { _parsers.add(it) }

    protected operator fun <T> Parser<T>.getValue(thisRef: Grammar<*>, property: KProperty<*>): Parser<T> = this

    protected operator fun Token.provideDelegate(thisRef: Grammar<*>, property: KProperty<*>): Token =
        also {
            if (it.name == null) {
                it.name = property.name
            }
            _tokens.add(it)
        }

    protected operator fun Token.getValue(thisRef: Grammar<*>, property: KProperty<*>): Token = this

We would want to add:

    protected operator fun <T> Grammar<T>.provideDelegate(thisRef: Grammar<*>, property: KProperty<*>): Grammar<T> = this
        .also { _tokens.addAll(it.tokens) } // tokens is a list, so addAll rather than add
        .also { _parsers.add(it) }

    protected operator fun <T> Grammar<T>.getValue(thisRef: Grammar<*>, property: KProperty<*>): Grammar<T> = this

This should support using Grammar<T> as a Parser<T> in another grammar, with the referenced grammar's tokens added as expected. Will have to fork and test, hopefully this is the way.

zacharygrafton · 2021-09-20T11:49:59Z

The suggested code seems to fix both use cases. I had the first extension function in the fork I was working off of to correct the issue, but I was missing the second extension function. Adding the second function makes my expanded test suite pass. Nice work.

dan-lugg · 2021-09-20T15:27:32Z

@zacharygrafton I'm not actually having success with my proposed solution 🤔 Do you have an updated fork with the working tests?

zacharygrafton · 2021-09-21T17:17:40Z

@dan-lugg Apparently you are correct. I ran ./gradlew test instead of ./gradlew allTests yesterday, and with allTests my tests are coming back broken as well.

This is the fork I have that fixes the way I combine parsers; however, it is still broken in the case of inheritance. I think it has something to do with the token matching priority. Tokens are matched based on the order in which they are added to a grammar. In the inheritance case, all the tokens end up in the same list, but in your original example, the InnerTestGrammar tokens never actually get matched because the OuterTestGrammar tokens are placed in the list before the InnerTestGrammar tokens.
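To make the ordering argument above concrete, here is a minimal, stdlib-only sketch of a first-match-wins tokenizer. Tok, Match and tokenize are hypothetical stand-ins written for this comment, not better-parse's actual classes, and the assumption that a parser compares token instances by identity is mine. It only shows that when two grammars each create their own idToken, the instance listed first claims every identifier.

```kotlin
// Simplified stand-ins for illustration only; these are NOT better-parse's classes.
class Tok(val name: String, pattern: String, val ignored: Boolean = false) {
    private val regex = Regex(pattern)

    // Returns the matched text only if the match starts exactly at `pos`.
    fun matchAt(input: String, pos: Int): String? =
        regex.find(input, pos)?.takeIf { it.range.first == pos }?.value
}

class Match(val type: Tok, val text: String)

// First-match-wins tokenizer: tokens earlier in the list take priority.
fun tokenize(tokens: List<Tok>, input: String): List<Match> {
    val result = mutableListOf<Match>()
    var pos = 0
    while (pos < input.length) {
        var hit: Match? = null
        for (token in tokens) {
            val text = token.matchAt(input, pos)
            if (text != null && text.isNotEmpty()) {
                hit = Match(token, text)
                break
            }
        }
        val found = hit ?: error("No token matches at position $pos")
        if (!found.type.ignored) result += found
        pos += found.text.length
    }
    return result
}

fun main() {
    val outerId = Tok("idToken", "\\w+")
    val space = Tok("spaceToken", "\\s+", ignored = true)
    val comma = Tok("commaToken", ",")
    val innerId = Tok("idToken", "\\w+") // same definition, different instance

    // Outer tokens come first in the merged list, as in the delegated case.
    val matches = tokenize(listOf(outerId, space, comma, innerId), "X, Y, Z")
    val ids = matches.filter { it.type.name == "idToken" }

    println(ids.all { it.type === outerId })  // true
    println(ids.none { it.type === innerId }) // true
}
```

If the inner parser only accepts matches whose type is its own innerId instance, it sees nothing it recognises, which would be consistent with the empty Inner(names=[]) and the MismatchedToken in the original report.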

After looking at it further, I'm not sure if this even fixes the example I provided. It passes my test case, but I think that is because the tokens don't really overlap. In the case you provided, every token overlaps since they are shared. We may need to look closer at DefaultTokenizer and provide a different implementation of Tokenizer when combining grammars. Another approach might be to replace tokens on parsers during the call to Grammar<T>.provideDelegate with single instances. I'm not entirely sure this is possible... I'm definitely open to ideas and trying to make this work.
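Until grammars can be combined directly, one way to guarantee single token instances is to declare the inner parser inside the same grammar, so both parsers refer to the same idToken and commaToken. This is only a sketch of a workaround, assuming the two grammars can live in one class; CombinedTestGrammar and innerParser are hypothetical names, and everything else is taken from the original example.

```kotlin
import com.github.h0tk3y.betterParse.combinators.and
import com.github.h0tk3y.betterParse.combinators.map
import com.github.h0tk3y.betterParse.combinators.separatedTerms
import com.github.h0tk3y.betterParse.combinators.skip
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.lexer.TokenMatch
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken
import com.github.h0tk3y.betterParse.parser.Parser

data class Inner(val names: List<String>)

data class Outer(val name: String, val inner: Inner)

// Hypothetical combined grammar: every token exists exactly once.
class CombinedTestGrammar : Grammar<Outer>() {
    val idToken by regexToken("\\w+")
    val spaceToken by regexToken("\\s*", true)
    val commaToken by literalToken(",")
    val lBraceToken by literalToken("{")
    val rBraceToken by literalToken("}")

    // What used to be InnerTestGrammar's rootParser, now an ordinary member parser.
    val innerParser: Parser<Inner> by separatedTerms(idToken, commaToken, true) map { tokenMatches ->
        Inner(names = tokenMatches.map(TokenMatch::text))
    }

    override val rootParser: Parser<Outer> by idToken and skip(lBraceToken) and innerParser and skip(rBraceToken) map { (tokenMatch, inner) ->
        Outer(name = tokenMatch.text, inner = inner)
    }
}

fun main() {
    println(CombinedTestGrammar().parseToEnd("A { X, Y, Z }"))
}
```

This gives up the separation into reusable grammars, so it does not answer the original question; it only avoids the duplicate-token problem.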

xetra11 · 2022-07-27T14:43:05Z

Any news on this?

I have a script grammar which I separated into two Grammars.

Here is the one that uses the other, called ScriptStatementGrammar(), by delegation, which felt intuitively right. However, the matcher fails on the first token of ScriptStatementGrammar, which tells me it does not work this way :/

class LogicScriptFileGrammar : Grammar<ScriptedLogic<Civilisation>>() {
    private val space by regexToken("\\s+", ignore = true)
    private val newLine by literalToken("\n", ignore = true)

    private val scriptKeyword by literalToken("logic")
    private val scriptName by regexToken("^[a-z_]+")

    private val statementParser by ScriptStatementGrammar()
    private val scriptHeadParser by -scriptKeyword * scriptName use { ScriptHead(text) }

    override val rootParser by (scriptHeadParser * statementParser) map { (scriptHead, statements) ->
        ScriptedLogic<Civilisation>(scriptHead.name) { context ->
            statements.forEach { statement ->
                if (statement.context == "actors") {
                    if (statement.mutationType == "urge") {
                        if (statement.mutationTarget == "eat") {
                            if (statement.mutationOperation == "plus") {
                                context.actors.forEach {
                                    it.urges.increaseUrge(
                                        statement.mutationTarget,
                                        statement.mutationOperationArgument.toDouble()
                                    )
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    internal class ScriptHead(
        val name: String
    )
}

dan-lugg · 2023-04-06T05:25:04Z

I started working on a parser combinator library (implementation heavily influenced by this one) to build on / extend the capabilities of this awesomeness, but it's sitting abandoned because life.

Really hope @h0tk3y can come back to this again.

I'll fork and make some PRs when I have some time.
