yypushback() behaviour on surrogate characters #215

Closed
hurricup opened this Issue Apr 13, 2017 · 4 comments

3 participants

hurricup commented Apr 13, 2017

Are there any plans to address what I'd call an inconsistency: [^] matches a UTF-16 emoji as 2 chars (a surrogate pair), but yypushback(1) pushes back by only 1 char, leaving us in the middle of the pair?

Or is this intentional behavior? In that case, it would be nice to mention it in the documentation.
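
To make the mismatch concrete, here is a small plain-Java sketch. It only models the arithmetic of pushing back n chars from a matched string; it is not JFlex's generated scanner code, and the class and method names are illustrative. An emoji such as U+1F600 is one code point but two UTF-16 chars, so pushing back one char from a match that covers it leaves only the high surrogate consumed.

```java
public class SurrogateDemo {
    // Illustrative model only: the text that stays consumed after pushing back
    // n chars from the end of a match. Not JFlex's generated scanner code.
    static String remainingAfterPushback(String matched, int n) {
        return matched.substring(0, matched.length() - n);
    }

    public static void main(String[] args) {
        // U+1F600 (GRINNING FACE) lies outside the BMP, so UTF-16 stores it as two chars
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 (Java chars)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (code point)

        // Pushing back one char leaves only the high surrogate consumed
        String kept = remainingAfterPushback(emoji, 1);
        System.out.println(Character.isHighSurrogate(kept.charAt(0))); // true
    }
}
```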

lsf37 commented Nov 3, 2017 · Member

Sorry, somehow the notification mechanism seems to have failed, and I only saw this just now.

Interesting question. I'm not sure what the semantics should be, i.e. whether yypushback should refer to Java chars or full Unicode code points. Java char is certainly easier in the implementation.

Does yypushback(yylength()) do the correct thing for [^]?

@sarowe do you have an opinion on this?

sarowe commented Nov 3, 2017 · Contributor

I don't think it's unreasonable to expect Java programmers to know about the char/code point duality, so I think it's fine for yypushback(int) to deal strictly with chars. Maybe we could add a yypushback_codepoints(int) (or something like that, but better named)?
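
A code-point variant could plausibly be layered on top of the existing char-based yypushback(int) by converting a code-point count into a char count first. The sketch below is hypothetical: charsForLastCodePoints is a made-up name, not part of JFlex. It uses the standard Character.offsetByCodePoints to step backwards over whole surrogate pairs from the end of the matched text.

```java
public class CodePointPushback {
    /**
     * Hypothetical helper (not a JFlex API): returns how many Java chars make up
     * the last n code points of the matched text. A surrogate pair counts as one
     * code point but two chars. Throws IndexOutOfBoundsException if the text
     * contains fewer than n code points.
     */
    static int charsForLastCodePoints(CharSequence matched, int n) {
        int end = matched.length();
        // Walk back n code points from the end; surrogate pairs are skipped as a unit
        int start = Character.offsetByCodePoints(matched, end, -n);
        return end - start;
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600: 3 chars, 2 code points
        String text = "a" + new String(Character.toChars(0x1F600));
        System.out.println(charsForLastCodePoints(text, 1)); // 2 (the whole surrogate pair)
        System.out.println(charsForLastCodePoints(text, 2)); // 3
    }
}
```

Inside a scanner action this might be used as yypushback(charsForLastCodePoints(yytext(), n)), assuming the helper is copied into the scanner class, e.g. via a %{ ... %} code block.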

lsf37 commented Nov 3, 2017 · Member

OK, I agree. I guess we should mainly document the current behaviour more clearly.

lsf37 self-assigned this Nov 3, 2017

lsf37 added the documentation label Nov 3, 2017

lsf37 added this to the release 1.7.0 milestone Nov 3, 2017

lsf37 added a commit that referenced this issue Nov 3, 2017

lsf37 commented Nov 3, 2017 · Member

I have now documented this behaviour, as requested, in commit 846d20c.

lsf37 closed this Nov 5, 2017
