Lexer: match escape sequences in strings #290
Conversation
Most languages specify in their spec what a string is and what an escape sequence is (for instance, for JS: https://tc39.es/ecma262/multipage/text-processing.html#prod-CharacterEscape). So far we have always been "use case driven": we implement what we need for a specific use case rather than designing and implementing a standard.
So we would need to decide on a grammar for strings and change the lexer accordingly? For example
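One illustrative grammar sketch, loosely following the ECMAScript productions linked above (the nonterminal names and the exact escape set are hypothetical, not taken from the Effekt codebase):

```
stringLiteral   ::= '"' stringPart* '"'
stringPart      ::= stringCharacter | escapeSequence
stringCharacter ::= any source character except '"', '\', or a control character
escapeSequence  ::= '\' ('n' | 'r' | 't' | '\' | '"' | unicodeEscape)
unicodeEscape   ::= 'u{' hexDigit+ '}'
```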
Currently, a string literal is tokenized as a whole. If we want better errors for invalid control characters/escape sequences, each of these will probably need to be its own token. However, the lexer currently skips all whitespace, making it impossible to correctly tokenize a string containing such whitespace. What are your thoughts on this?
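As a sketch of what per-escape tokens might look like, here is a small standalone function (names and error messages are hypothetical, not from the Effekt lexer) that splits the body of a string literal into plain-text and escape tokens while preserving whitespace, so that an invalid escape can be reported at a precise offset:

```python
# Illustrative sketch: tokenize the *inside* of a string literal into
# separate tokens so invalid escapes get precise error positions.
# Whitespace inside the string is kept, not skipped.

SIMPLE_ESCAPES = set('nrt\\"\'')  # hypothetical escape set

def tokenize_string_body(body: str):
    """Yield ('text', s) and ('escape', s) tokens; raise on invalid escapes."""
    tokens, i, start = [], 0, 0
    while i < len(body):
        if body[i] == "\\":
            if start < i:
                tokens.append(("text", body[start:i]))
            if i + 1 >= len(body):
                raise SyntaxError(f"unterminated escape at offset {i}")
            c = body[i + 1]
            if c not in SIMPLE_ESCAPES:
                raise SyntaxError(f"invalid escape '\\{c}' at offset {i}")
            tokens.append(("escape", body[i:i + 2]))
            i += 2
            start = i
        else:
            i += 1
    if start < len(body):
        tokens.append(("text", body[start:]))
    return tokens
```

For example, the body `a\nb` would become three tokens: a text token `a`, an escape token `\n`, and a text token `b`, while `a  b` stays a single text token with its spaces intact.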
I know too little about string escapes, their use cases, and how they would be transported in the compiler to the different backends, so I don't have an informed opinion. The grammar you suggest makes some sense, but I don't know what the
They are just taken from the JS spec you posted. You can read more about them here.
In the meeting, we talked about possible designs of escape sequences in strings and settled on a design similar to Zig's:
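For reference, Zig's escape set is roughly `\n`, `\r`, `\t`, `\\`, `\'`, `\"`, `\xNN`, and `\u{NNNN}`. A sketch of a decoder for such a set (the exact set chosen for Effekt is an assumption here, and the function is illustrative, not compiler code):

```python
def decode_escapes(s: str) -> str:
    """Decode Zig-style escapes: \\n \\r \\t \\\\ \\' \\" \\xNN \\u{NNNN}."""
    simple = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", "'": "'", '"': '"'}
    out, i = [], 0
    while i < len(s):
        if s[i] != "\\":
            out.append(s[i])
            i += 1
            continue
        c = s[i + 1]  # simplification: assumes no trailing lone backslash
        if c in simple:
            out.append(simple[c])
            i += 2
        elif c == "x":
            # exactly two hex digits, e.g. \x41
            out.append(chr(int(s[i + 2:i + 4], 16)))
            i += 4
        elif c == "u":
            # braced hex code point, e.g. \u{1F600}
            end = s.index("}", i + 3)
            out.append(chr(int(s[i + 3:end], 16)))
            i = end + 1
        else:
            raise ValueError(f"invalid escape \\{c} at offset {i}")
    return "".join(out)
```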
@jiribenes's proposal sounds good. However, we need to check that the encoding in source, Scala, and the backends somehow aligns (we need enough tests! :) ) Proposal approved by the Effekt committee (in person meeting). |
What do you mean by align? The given grammar is not compatible across all backends. Either we somehow need to deploy a backend-specific check for strings that always converts them into valid strings for the respective backend, or find the smallest subset of the given grammar supported by all backends. Or am I perhaps misinterpreting what you are suggesting?
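The "smallest common subset" option could be made concrete roughly like this (backend names and per-backend escape sets below are purely illustrative placeholders, not measured from the actual backends):

```python
# Hypothetical per-backend escape support; the real sets would have to be
# determined from each backend's actual string semantics.
BACKEND_ESCAPES = {
    "js":   {"n", "r", "t", "\\", '"', "u"},
    "chez": {"n", "r", "t", "\\", '"'},
    "llvm": {"n", "t", "\\", '"'},
}

# Escapes safe on every backend: the intersection of all sets.
COMMON_ESCAPES = set.intersection(*BACKEND_ESCAPES.values())

def unsupported_escapes(used: set, backend: str) -> set:
    """Escapes used in a program but not supported by the given backend."""
    return used - BACKEND_ESCAPES[backend]
```

With such a table, the compiler could either reject escapes outside `COMMON_ESCAPES` uniformly, or run `unsupported_escapes` per backend and rewrite the offending strings.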
No, I just meant that ideally strings would work the same on all backends. But it is fine for now (TM)
Eventually we can write a custom parser like
This PR addresses and fixes issue #202.
Of course, I am open to suggestions on how to properly integrate escape sequences into the lexing process, as proposed by @b-studios (#202 (comment)):
What changes do you think this entails?