Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringLiteral #4

Merged
merged 23 commits into from May 2, 2023
Merged

StringLiteral #4

merged 23 commits into from May 2, 2023

Conversation

imteekay
Copy link
Owner

@imteekay imteekay commented Apr 23, 2023

@imteekay imteekay self-assigned this Apr 24, 2023
src/lex.ts Outdated
@@ -40,6 +40,10 @@ export function lex(s: string): Lexer {
text in keywords
? keywords[text as keyof typeof keywords]
: Token.Identifier;
} else if (['"', "'"].includes(s.charAt(pos))) {
scanForward((c) => /[_a-zA-Z0-9'"]/.test(c));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strings should be able to contain any character except the matching " or '. And you'll need to allow for escaping like with \' as well.

src/lex.ts Outdated
case CharCodes.doubleQuote:
return '"';
default:
return '';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember what invalid escapes are supposed to do, but I think it's returning char itself, or maybe slash+char.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, it returns the char itself using the String.fromCharCode() fn (ts sourcecode)
Just made the changes here imteekay/mini-typescript@bc76e30

@@ -66,6 +65,62 @@ export function lex(s: string): Lexer {
function scanForward(pred: (x: string) => boolean) {
while (pos < s.length && pred(s.charAt(pos))) pos++;
}

function scanString() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like this code needs a lot more tests

Copy link
Owner Author

@imteekay imteekay Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made some adjustments to how I scan, parse, and emit escape characters. And separated the StringLiteral node from the Literal (I will rename the Literal to NumericLiteral and refactor the code in the future) (imteekay/mini-typescript@c5e8220).

Also added more tests as you recommended (imteekay/mini-typescript@ead71e2 and imteekay/mini-typescript@b04fc8a).

baselines/reference/stringLIteral.tree.baseline Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
"var singleQuote = 'singleQuote';\nvar doubleQuote = \"doubleQuote\";\nvar escapedSingleQuote = 'escapedSingle\\'Quote';\nvar escapedDoubleQuote = \"escapedDouble\\\"Quote\";\n(missing)"
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove the (missing) after merging the EmptyStatement PR.

Comment on lines 149 to 155
{
"kind": "ExpressionStatement",
"expr": {
"kind": "Identifier",
"text": "(missing)"
}
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should replace this node with the EmptyStatement.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason that you have empty statements is that ; is parsed as a separator, not a terminator, so a complete program looks like X;Y;Z not X;Y;Z;.

(separators have become less and less popular over the history of programming languages)

Comment on lines +80 to +82
if (pos >= s.length) {
// report unterminated string literal error
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure how to report this kind of error in the lexer scope 🤔

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll probably have to make the lexer able to report errors the same way the other phases do

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to add a new exercise to add the support for the lexer to report errors (imteekay/mini-typescript@65a3270). I also saw an interesting tweet by Maria about reporting errors for string literals.

@imteekay imteekay requested a review from sandersn April 28, 2023 18:10
Copy link

@sandersn sandersn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. As you noted, you just need a way to report errors, and some refactoring of the literal types.

export type StringLiteral = Location & {
kind: Node.StringLiteral;
value: string;
isSingleQuote: boolean;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a decent choice; you could also have the emitter always produce double quotes or single quotes (and, in theory, make that configurable).

Comment on lines 149 to 155
{
"kind": "ExpressionStatement",
"expr": {
"kind": "Identifier",
"text": "(missing)"
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason that you have empty statements is that ; is parsed as a separator, not a terminator, so a complete program looks like X;Y;Z not X;Y;Z;.

(separators have become less and less popular over the history of programming languages)

Comment on lines +80 to +82
if (pos >= s.length) {
// report unterminated string literal error
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll probably have to make the lexer able to report errors the same way the other phases do

@imteekay imteekay changed the title String literal StringLiteral May 1, 2023
@imteekay
Copy link
Owner Author

imteekay commented May 2, 2023

and some refactoring of the literal types.

@sandersn just decided to add a new exercise to refactor/rename from Literal to NumericLiteral later on

edit: just opened a new PR to handle that imteekay/mini-typescript#6

@imteekay imteekay requested a review from sandersn May 2, 2023 01:48
@imteekay imteekay merged commit f547ee3 into main May 2, 2023
1 check passed
@imteekay imteekay deleted the string-literals branch May 2, 2023 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants