Decide whether Parsimonious is for Unicode, bytestrings, or both #31

Open
erikrose opened this Issue · 0 comments

@erikrose
Owner

First, we should probably stop supporting the re.L flag; it's unreliable and worse than re.U, as http://docs.python.org/3/library/re.html observes.

In order to simplify things and make the API work uniformly across Python 2 and 3, I propose we adopt the convention from Python 3's re library: grammars defined in Unicode can match only Unicode strings, and those defined by bytestrings can match only bytestrings. We drop support for the re.U flag and instead let Unicode-ness be determined at Grammar construction time by what sort of string is passed in. Support re.A if you want, but I'd be content to make people spell out what they mean by \s, \w, and \d explicitly. (What about \b?)
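For reference, here is a small sketch of the Python 3 re behavior the proposal wants to mirror. It uses only the standard re module; the try_match helper is a made-up name for illustration, not part of Parsimonious.

```python
import re

text_pattern = re.compile(r"\w+")    # compiled from a str (Unicode) pattern
bytes_pattern = re.compile(rb"\w+")  # compiled from a bytes pattern


def try_match(pattern, subject):
    """Hypothetical helper: return the matched prefix, None if there is
    no match, or the string "TypeError" if pattern and subject types are mixed."""
    try:
        m = pattern.match(subject)
        return m.group(0) if m else None
    except TypeError:
        return "TypeError"


results = {
    "str pattern, str subject": try_match(text_pattern, "héllo"),
    "bytes pattern, bytes subject": try_match(bytes_pattern, b"hello"),
    "str pattern, bytes subject": try_match(text_pattern, b"hello"),
    "bytes pattern, str subject": try_match(bytes_pattern, "hello"),
}

# With re.A (re.ASCII), \w on a str pattern stops at the first non-ASCII letter.
ascii_only = re.match(r"\w+", "héllo", re.ASCII).group(0)

for label, outcome in results.items():
    print(label, "->", outcome)
print("re.ASCII match ->", ascii_only)
```

Mixing types fails loudly in Python 3 rather than silently coercing, which is the property a Grammar would inherit if it picked its mode from the type of the string it was built from.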

To support the naive use of grammars, we can try to promote bytestrings to Unicode if an attempt is made to parse them with a Unicode grammar. But people defining grammars should know better.
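A rough sketch of what that promotion could look like, written for Python 3. The function name promote_to_text and the UTF-8 default are assumptions for illustration only; nothing here is existing Parsimonious API, and the right default encoding is an open question.

```python
def promote_to_text(source, encoding="utf-8"):
    """Hypothetical helper: decode a bytestring so a Unicode grammar can parse it.

    Text that is already Unicode passes through unchanged. Undecodable
    input raises UnicodeDecodeError instead of being silently mangled.
    """
    if isinstance(source, bytes):
        return source.decode(encoding)
    return source


promoted = promote_to_text(b"caf\xc3\xa9")  # UTF-8 bytes for "café"
unchanged = promote_to_text("café")
print(promoted == unchanged)
```

Letting the decode error propagate keeps the convenience for naive callers without hiding encoding mistakes from people who should be passing Unicode in the first place.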

Remember to address ParseError.line() and column(), which currently assume '\n' is a bytestring in Python 2 and a Unicode string in Python 3.
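One possible way to make the line and column computation independent of the string type is to pick the newline literal from the type of the text being parsed. This is only a sketch for Python 3: line_of, column_of, and _newline_for are hypothetical standalone functions, not the actual ParseError methods.

```python
def _newline_for(text):
    """Return a newline of the same type as text (bytes or str)."""
    return b"\n" if isinstance(text, bytes) else "\n"


def line_of(text, pos):
    """1-based line number of the character at index pos."""
    return text.count(_newline_for(text), 0, pos) + 1


def column_of(text, pos):
    """1-based column number of the character at index pos."""
    last_newline = text.rfind(_newline_for(text), 0, pos)
    # rfind returns -1 when there is no earlier newline, which gives pos + 1.
    return pos - last_newline


for sample in ("ab\ncd", b"ab\ncd"):
    print(type(sample).__name__, line_of(sample, 4), column_of(sample, 4))
```

The same two functions then serve both Unicode and bytestring grammars, since count and rfind exist with the same signature on str and bytes.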

@erikrose erikrose referenced this issue from a commit
@erikrose WIP. Try to figure out our Unicode/bytestring story. Go a bit too nuts with the unicode=True bits. Ref #31.
2335f23
@erikrose erikrose added this to the 1.0 milestone