cannot mix strings and tokens #22

Closed
GoogleCodeExporter opened this issue Apr 17, 2015 · 5 comments
Comments

@GoogleCodeExporter

The code below

-----
from lepl import *

v = Token('[a-z]+') & Token(' +') & String()
v.parse('aaa "aaa"')
-----

gives the error

------
lepl.lexer.support.LexerError: The grammar contains a mix of Tokens and 
non-Token matchers at the top level.  If Tokens are used then non-token 
matchers that consume input must only appear "inside" Tokens.  The non-Token 
matchers include: Any(None); Literal('"'); Lookahead(Literal, True); 
Literal('"'); Literal('"'); Literal('\\').
------

Trying to tokenize the string fails as well,

-------
from lepl import *

v = Token('[a-z]+') & Token(' +') & Token(String())
v.parse('aaa "aaa"')
-------

as the code above gives

-------
lepl.lexer.support.LexerError: A Token was specified with a matcher, but the 
matcher could not be converted to a regular expression: And(NfaRegexp, 
Transform, NfaRegexp)
-------


Original issue reported on code.google.com by wrob...@gmail.com on 26 Dec 2011 at 6:10

@GoogleCodeExporter

hi.  the main issue here isn't a bug - it's how lepl works.  you can either 
work in tokens, or not, but not both.  that's what the error message says.

trying to tokenize String fails because String is too complex for lepl to 
tokenize automatically.  it might be possible to write String so that it can be 
converted automatically, and i will add that to the list of things to do, but 
meantime you can simply define your own regular expression:

  myString = Regexp("'[^']*'")

or similar.
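Since lepl's regexp syntax follows Python's, the suggested pattern can be sanity-checked with the standard `re` module (a minimal sketch that does not use lepl itself):

```python
import re

# The suggested pattern for a simple single-quoted string.
# It deliberately has no escape handling.
my_string = re.compile(r"'[^']*'")

assert my_string.fullmatch("'hello'") is not None
# A backslash-escaped quote is not supported by this simple form:
assert my_string.fullmatch("'it\\'s'") is None
```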

andrew

Original comment by acooke....@gmail.com on 1 Jan 2012 at 10:55

  • Changed state: WontFix

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

Well, I have indeed defined my own string

    string = Token('"[^"]+"') | Token("'[^']+'")

However, I have no idea how to allow escaping of apostrophe or quotation-mark
characters with a backslash, i.e. "test\"string" or 'test\'string'.

Using Python regular expressions that would be

     r""""(([^"]|\")+)"|'(([^']|\')+)'"""

w

Original comment by wrob...@gmail.com on 1 Jan 2012 at 11:48

@GoogleCodeExporter

the syntax for regexps should be the same as python, except that capturing 
groups are not supported.  so you need to replace each (...) with (?:...)
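That group rewrite can be illustrated with plain `re` (a hypothetical example pattern; lepl itself would reject the capturing form, while `re` accepts both):

```python
import re

# The same escaped-string pattern written with capturing groups (...)
# and with non-capturing groups (?:...) -- the only form lepl accepts.
capturing = r'"(([^"]|\\")+)"'
non_capturing = r'"(?:(?:[^"]|\\")+)"'

text = '"test\\"string"'  # i.e. "test\"string"
assert re.fullmatch(capturing, text)
assert re.fullmatch(non_capturing, text)
```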

andrew

Original comment by acooke....@gmail.com on 2 Jan 2012 at 12:11

@GoogleCodeExporter

Thanks for the tip.

The definition is as follows

    string = Token(r'"(?:[^"]|\\")+"') | Token(r"'(?:[^']|\\')+'")

But I wonder how the above differs from SingleLineString?

If it is not different, then I would like to report that the following code 
fails

"""
from lepl import *

v = Token('[a-z]+') & Token(' +') & Token(SingleLineString())
v.parse('aaa "aaa"')
"""

w

Original comment by wrob...@gmail.com on 5 Jan 2012 at 6:53
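The two token regexes from the comment above can be sanity-checked with Python's `re` module (a sketch independent of lepl, whose regexp syntax follows Python's):

```python
import re

# The escaped-string patterns from the final comment.
dquoted = re.compile(r'"(?:[^"]|\\")+"')
squoted = re.compile(r"'(?:[^']|\\')+'")

assert dquoted.fullmatch('"test\\"string"')   # "test\"string"
assert squoted.fullmatch("'test\\'string'")   # 'test\'string'
# The + requires at least one character between the quotes:
assert dquoted.fullmatch('""') is None
```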
