This repository has been archived by the owner on May 16, 2022. It is now read-only.

Loco Question, MatchParser #4

Closed
GDmac opened this issue Sep 25, 2012 · 5 comments

Comments

@GDmac

GDmac commented Sep 25, 2012

I am playing with a MatchParser, but as something of a newbie to parsing and lexing, I am wondering whether there is a place for such a parser in the library, or whether it is totally beyond the scope of Loco.

The idea is that, instead of using a ConcParser for every possible known XML or HTML tag (<p, <h, etc.), a single parser just fetches matching pairs.

new MatchParser( array(
   '<', 'name', '>','content','</', 'name', '>',
   1, 5 // 0-based positions of the two 'name' internals: they should match and not be nullable
))
//...

/*
<channel>
   <item></item>
   <xxx></xxx>
</channel>
*/

I made a copy of the ConcParser in which two internals have to match. During construction it checks that both internals are identical, and during parsing the two corresponding args[] must be identical; otherwise an exception is thrown.

I haven't found an easy way to check that the two matching internals are not nullable.

The MatchParser can be found in this gist. I didn't want to open a pull request before knowing whether this type of parser might be out of scope.

https://gist.github.com/3783791
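
As a rough illustration of the "matching pairs" idea described above, here is a minimal standalone sketch. It does not use Loco and is not the code from the gist: the function name `matchPair` and both character classes are assumptions chosen for the example. It relies on a PCRE backreference (`\1`) to require that the closing name equals the opening name, and the name pattern needs at least one character, so the matched name cannot be empty.

```php
<?php
// Standalone sketch, independent of Loco and of the gist.
// Matches one <name>content</name> pair at the start of the input.
// The backreference \1 forces the closing name to equal the opening name.
function matchPair(string $input): ?array {
    // Assumed patterns: a simple identifier for the name, and content
    // that contains no "<", ">" or "&".
    $pattern = '#^<([a-zA-Z_][a-zA-Z0-9_]*)>([^<>&]*)</\1>#';
    if (preg_match($pattern, $input, $m) !== 1) {
        return null; // no matching pair at the start of the input
    }
    return ['name' => $m[1], 'content' => $m[2]];
}

var_dump(matchPair("<item></item>")); // name "item", empty content
var_dump(matchPair("<h1>what</h2>")); // NULL, the names differ
```

Because the content pattern excludes "<", this sketch does not handle nested tags such as the <channel> example; that would need a recursive grammar rather than a single regular expression.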

@GDmac
Author

GDmac commented Sep 25, 2012

Also, I bump into "parsing completed prematurely" exceptions on stray less-than signs "<", and I have no clue how to allow them (a browser just displays them). Once they're matched in 'content', they can't be part of the closing tag anymore.
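
One possible way to tolerate stray "<" characters, sketched here with plain `preg_match` rather than any Loco class, is a content pattern that accepts "<" only when it is not followed by something that looks like the start of a tag. The pattern, the variable name `$contentPattern` and the helper `matchContent` are assumptions made for this example, and the notion of "looks like a tag" (a letter, underscore or slash right after "<") is deliberately simplistic.

```php
<?php
// Hypothetical content pattern: consume any character except "<", or a
// "<" that is NOT followed by a name character or "/" (negative lookahead).
$contentPattern = '#^(?:[^<]|<(?![a-zA-Z_/]))*#';

// Returns the leading run of "content" in $input. The pattern can match
// the empty string, so there is always a match at offset 0.
function matchContent(string $pattern, string $input): string {
    preg_match($pattern, $input, $m);
    return $m[0];
}

// The stray "<" in "1 < 2" is kept; matching stops before "</p>".
var_dump(matchContent($contentPattern, "1 < 2 and more</p>"));
```

With this kind of pattern the "<" of a real closing tag is never swallowed by the content, so it stays available for the parser that follows.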

@qntm
Owner

qntm commented Sep 27, 2012

My preferred solution to this problem is just this:

$name = new RegexParser("#^[a-zA-Z_][a-zA-Z0-9_]*#");
$content = new RegexParser("#^[^<>&]*#"); # or whatever

$tag = new ConcParser(
    array(
        new StringParser("<"),
        $name,
        new StringParser(">"),
        $content,
        new StringParser("</"),
        $name,
        new StringParser(">")
    ),
    function($lt1, $name1, $gt1, $content, $lt2, $name2, $gt2) {
        if($name1 !== $name2) {
            throw new ParseFailureException("Close tag name ".$name2." doesn't match open tag name ".$name1, 0, "");
        }
        return new Tag($name1, $content);
    }
);

print_r($tag->parse("<p>what</p>")); # ok
$tag->parse("<h1>what</h2>");        # exception

(Obviously you'd need to implement a Tag class yourself, and come up with a better $content Parser.)

However, I do see that this is undesirable, because I'm having to throw a ParseFailureException, which demands both an index and a string to be supplied as arguments when instantiated. We don't have access to this information in this case, which means I'm passing 0 and "" respectively instead just as stopgaps. It's almost certainly possible to provide the index and string to the ConcParser callback as arguments somehow, but I'll have to do some thinking before I settle on a good solution to this problem. (It may be that just making those two attributes optional when creating a ParseFailureException is a possibility.)
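
To make the last idea concrete, here is a minimal sketch of an exception whose position and input are optional. It is a hypothetical stand-in written for this illustration: the class name `SketchParseFailure`, its properties and the helper `checkNames` are all assumptions, and none of this is Loco's actual ParseFailureException.

```php
<?php
// Hypothetical stand-in, NOT Loco's ParseFailureException.
// The index and the input string are optional, so a callback that does
// not know them can still raise a parse failure.
class SketchParseFailure extends Exception {
    public $index;
    public $input;

    public function __construct(string $message, ?int $index = null, ?string $input = null) {
        $this->index = $index;
        $this->input = $input;
        if ($index !== null) {
            // Only mention a position when the caller supplied one.
            $message .= " at position " . $index;
        }
        parent::__construct($message);
    }
}

// Mirrors the name check from the callback above, without the stopgap
// arguments 0 and "".
function checkNames(string $open, string $close): string {
    if ($open !== $close) {
        throw new SketchParseFailure("Close tag name $close doesn't match open tag name $open");
    }
    return $open;
}
```

A caller that does know the position can still pass it, for example `new SketchParseFailure("Unexpected character", 12, $string)`.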

@qntm
Owner

qntm commented Sep 27, 2012

I don't see what use a MatchParser provides anyway. Surely you still need to create one MatchParser for every possible tag, which means it doesn't save any effort over doing the same with ConcParsers.

@qntm qntm closed this as completed Sep 27, 2012
@GDmac
Author

GDmac commented Oct 1, 2012

@ferno that's almost exactly what i did, but inside the parser. Have you seen the Gist?

For instance, the simpleComment example will fail on an h4 or h6 tag. A MatchParser can fetch any syntactically valid (XML) tag, and your callback can then discard it, ignore it, or whatever.

@qntm
Owner

qntm commented Oct 1, 2012

Yes, I did see the Gist. The point I was making with my example was that the functionality you're asking for is already available in Loco, in a much simpler way than your MatchParser. Therefore, I don't see a good reason to add this new class to the library.

