Are there any plans to address what I'd call an inconsistency: [^] can match a two-char UTF-16 sequence (an emoji, for example), but yypushback(1) pushes back by only one char, leaving the position in the middle of the surrogate pair.
Or is this intentional behavior? If so, it would be nice to mention it in the documentation.
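To make the problem concrete, here is a minimal sketch in plain Java, with no lexer involved. The class and method names are hypothetical, and dropping the last char of the string only imitates the effect of pushing back one char of a match.

```java
// Hypothetical demo class; it only imitates what a one-char pushback
// leaves behind when the match was a surrogate pair.
class SurrogateDemo {
    // The matched text with its last Java char removed.
    static String dropLastChar(String matched) {
        return matched.substring(0, matched.length() - 1);
    }

    public static void main(String[] args) {
        String matched = "\uD83D\uDE00"; // U+1F600, one code point, two chars

        System.out.println(matched.length());                            // 2
        System.out.println(matched.codePointCount(0, matched.length())); // 1

        // What remains matched after dropping one char: a lone high surrogate.
        String remaining = dropLastChar(matched);
        System.out.println(Character.isHighSurrogate(remaining.charAt(0))); // true
    }
}
```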
Sorry, somehow the notification mechanism seems to have failed, and I only saw this just now.
Interesting question. I'm not sure what the semantics should be, i.e. whether yypushback should refer to Java chars or full Unicode code points. Java char is certainly easier in the implementation.
Does yypushback(yylength()) do the correct thing for [^]?
I don't think it's unreasonable that Java programmers have to know about the char/code point duality, so I think it's reasonable for yypushback(int) to deal strictly with chars. Maybe we could add a yypushback_codepoints(int) (or something like that, but better named)?
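Until something like that exists, one possible workaround is to convert a code point count into a char count in user code and pass the result to the existing yypushback(int). The helper below is only a sketch: the class and method names are made up for illustration, and it relies on nothing beyond the standard Character.offsetByCodePoints and Character.codePointCount methods.

```java
// Hypothetical helper, not part of the generated scanner API.
final class CodePointPushback {
    private CodePointPushback() {}

    /**
     * Returns how many Java chars are covered by the last
     * {@code codePoints} code points of {@code matched}.
     */
    static int charsForTrailingCodePoints(CharSequence matched, int codePoints) {
        int total = Character.codePointCount(matched, 0, matched.length());
        if (codePoints < 0 || codePoints > total) {
            throw new IllegalArgumentException(
                "codePoints must be between 0 and " + total + ", got " + codePoints);
        }
        int end = matched.length();
        // Walk backwards from the end by whole code points.
        int start = Character.offsetByCodePoints(matched, end, -codePoints);
        return end - start;
    }
}
```

Inside a lexer action this could be used as yypushback(CodePointPushback.charsForTrailingCodePoints(yytext(), 1)), assuming yytext() still returns the full match at that point. An unpaired surrogate counts as one code point here, which is how the standard library treats it.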