StringLiteral #4

Merged
merged 23 commits into from
May 2, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -37,7 +37,7 @@ npm run mtsc ./tests/singleVar.ts
- [ ] Make semicolon a statement ender, not statement separator.
- Hint: You'll need a predicate to peek at the next token and decide if it's the start of an element.
- Bonus: Switch from semicolon to newline as statement ender.
- [ ] Add string literals.
- [x] Add string literals.
- [ ] Add let.
- Then add use-before-declaration errors in the checker.
- Finally, add an ES2015 -> ES5 transform that transforms `let` to `var`.
4 changes: 4 additions & 0 deletions baselines/reference/singleTypedVar.errors.baseline
@@ -2,5 +2,9 @@
{
"pos": 17,
"message": "Cannot assign initialiser of type 'number' to variable with declared type 'string'."
},
{
"pos": 41,
"message": "Cannot assign initialiser of type 'string' to variable with declared type 'number'."
}
]
2 changes: 1 addition & 1 deletion baselines/reference/singleTypedVar.js.baseline
@@ -1 +1 @@
-"var s = 1"
+"var s = 1;\nvar n = 'test';\n"
15 changes: 15 additions & 0 deletions baselines/reference/singleTypedVar.tree.baseline
@@ -22,6 +22,21 @@
"kind": "Literal",
"value": 1
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "n"
},
"typename": {
"kind": "Identifier",
"text": "number"
},
"init": {
"kind": "Literal",
"value": "'test'"
}
}
]
}
1 change: 1 addition & 0 deletions baselines/reference/stringLIteral.errors.baseline
@@ -0,0 +1 @@
[]
1 change: 1 addition & 0 deletions baselines/reference/stringLIteral.js.baseline
@@ -0,0 +1 @@
"var singleQuote = 'singleQuote';\nvar doubleQuote = \"doubleQuote\""
47 changes: 47 additions & 0 deletions baselines/reference/stringLIteral.tree.baseline
@@ -0,0 +1,47 @@
{
"locals": {
"singleQuote": [
{
"kind": "Var",
"pos": 3
}
],
"doubleQuote": [
{
"kind": "Var",
"pos": 36
}
]
},
"statements": [
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "singleQuote"
},
"init": {
"kind": "Literal",
"value": "singleQuote"
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "doubleQuote"
},
"init": {
"kind": "Literal",
"value": "double'Quote"
}
},
{
"kind": "ExpressionStatement",
"expr": {
"kind": "Identifier",
"text": "(missing)"
}
}
Comment from the repository owner and PR author:

This node should be replaced with an EmptyStatement.

Reply:

The reason you have empty statements is that `;` is parsed as a separator, not a terminator, so a complete program looks like `X;Y;Z`, not `X;Y;Z;`.

(Separators have become less and less popular over the history of programming languages.)

]
}
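The separator-versus-terminator point raised in the review thread above can be illustrated with a minimal sketch. It is not the project's parser: the token stream is a hypothetical array of strings, and `parseSeparated` and `parseTerminated` are names invented for this example. The terminator version uses the kind of "is this the start of a statement?" predicate that the README hint describes.

```typescript
// Hypothetical token stream: every entry is either a statement name or ";".

// Separator grammar: program = stmt (";" stmt)*
// A trailing ";" demands one more statement, which shows up as "(missing)".
function parseSeparated(tokens: string[]): string[] {
  const statements: string[] = [];
  let i = 0;
  const parseStatement = (): string => {
    const tok = tokens[i];
    if (tok === undefined || tok === ";") return "(missing)";
    i++;
    return tok;
  };
  statements.push(parseStatement());
  while (tokens[i] === ";") {
    i++; // consume the separator
    statements.push(parseStatement());
  }
  return statements;
}

// Terminator grammar: program = (stmt ";")*
// The loop continues only while the next token can start a statement.
function parseTerminated(tokens: string[]): string[] {
  const statements: string[] = [];
  let i = 0;
  for (;;) {
    const tok = tokens[i];
    if (tok === undefined || tok === ";") break; // not the start of a statement
    statements.push(tok);
    i++;
    // A real parser would report an error if the terminator were missing.
    if (tokens[i] === ";") i++;
  }
  return statements;
}

const program = ["X", ";", "Y", ";", "Z", ";"];
console.log(parseSeparated(program).join(", "));  // X, Y, Z, (missing)
console.log(parseTerminated(program).join(", ")); // X, Y, Z
```

With the separator grammar, the final `;` in `X;Y;Z;` produces the same kind of `(missing)` node that appears at the end of the tree baseline above.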
2 changes: 1 addition & 1 deletion src/check.ts
@@ -54,7 +54,7 @@ export function check(module: Module) {
error(expression.pos, 'Could not resolve ' + expression.text);
return errorType;
case Node.Literal:
-return numberType;
+return typeof expression.value === 'string' ? stringType : numberType;
case Node.Assignment:
const v = checkExpression(expression.value);
const t = checkExpression(expression.name);
61 changes: 60 additions & 1 deletion src/lex.ts
@@ -1,4 +1,4 @@
-import { Token, Lexer } from './types';
+import { Token, Lexer, CharCodes } from './types';

const keywords = {
function: Token.Function,
@@ -40,6 +40,9 @@ export function lex(s: string): Lexer {
text in keywords
? keywords[text as keyof typeof keywords]
: Token.Identifier;
} else if (['"', "'"].includes(s.charAt(pos))) {
text = scanString();
token = Token.String;
} else {
pos++;
switch (s.charAt(pos - 1)) {
@@ -62,6 +65,62 @@ export function lex(s: string): Lexer {
function scanForward(pred: (x: string) => boolean) {
while (pos < s.length && pred(s.charAt(pos))) pos++;
}

function scanString() {
const quote = s.charCodeAt(pos);

Review comment:

Feels like this code needs a lot more tests.

Reply from @imteekay (repository owner and PR author), Apr 28, 2023:

I just adjusted how I scan, parse, and emit escape characters, and separated the StringLiteral node from Literal (imteekay/mini-typescript@c5e8220). I will rename Literal to NumericLiteral and refactor the code in the future.

I also added more tests, as you recommended (imteekay/mini-typescript@ead71e2 and imteekay/mini-typescript@b04fc8a).

pos++;

let stringValue = '';
let start = pos;

while (true) {
if (pos >= s.length) {
// report unterminated string literal error
}

Comment on lines +80 to +82, from the repository owner and PR author:

I'm still not sure how to report this kind of error from within the lexer 🤔

Reply:

You'll probably have to make the lexer able to report errors the same way the other phases do.

Reply from the repository owner and PR author:

I decided to add a new exercise for adding lexer error reporting (imteekay/mini-typescript@65a3270). I also saw an interesting tweet by Maria about reporting errors for string literals.

const char = s.charCodeAt(pos);

if (char === quote) {
stringValue += s.slice(start, pos);
pos++;
break;
}

if (char === CharCodes.backslash) {
stringValue += s.slice(start, pos);
stringValue += scanEscapeSequence();
start = pos;
continue;
}

pos++;
}

return stringValue;
}

function scanEscapeSequence() {
pos++;
const char = s.charCodeAt(pos);
pos++;

switch (char) {
case CharCodes.b:
return '\b';
case CharCodes.t:
return '\t';
case CharCodes.n:
return '\n';
case CharCodes.r:
return '\r';
case CharCodes.singleQuote:
return "'";
case CharCodes.doubleQuote:
return '"';
default:
return '';
}

Review comment:

I can't remember what invalid escapes are supposed to do, but I think it's returning the char itself, or maybe backslash + char.

Reply from the repository owner and PR author:

Exactly, it returns the char itself using String.fromCharCode() (see the TypeScript source code).
I just made the change in imteekay/mini-typescript@bc76e30.

}
}

export function lexAll(s: string) {
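The two open questions in the threads above (how the lexer could report an unterminated string, and what an invalid escape should produce) can be combined into one small sketch. It is an illustration under stated assumptions, not the code from the follow-up commits: `LexError`, `scanStringLiteral`, `unescapeChar`, and the error message text are all hypothetical, and the `{ pos, message }` shape is only borrowed from the error baselines. Note that in the hunk as shown, the `pos >= s.length` branch is empty, so the loop would not exit on unterminated input; the sketch records an error and breaks there.

```typescript
// Hypothetical error shape, mirroring the { pos, message } entries in the baselines.
type LexError = { pos: number; message: string };

const backslash = 92; // same value as CharCodes.backslash in the diff

// Map the character after a backslash to the character it stands for.
function unescapeChar(char: number): string {
  switch (String.fromCharCode(char)) {
    case "b": return "\b";
    case "t": return "\t";
    case "n": return "\n";
    case "r": return "\r";
    default:
      // Escaped quotes and unknown escapes both yield the character itself.
      return String.fromCharCode(char);
  }
}

// Scan a string literal whose opening quote is at `start`.
// Assumption: errors are pushed onto a shared list instead of being thrown.
function scanStringLiteral(s: string, start: number, errors: LexError[]) {
  const quote = s.charCodeAt(start);
  let pos = start + 1;
  let value = "";
  let chunkStart = pos;
  while (true) {
    if (pos >= s.length) {
      errors.push({ pos: start, message: "Unterminated string literal." });
      value += s.slice(chunkStart, pos);
      break; // stop instead of scanning past the end of the input
    }
    const char = s.charCodeAt(pos);
    if (char === quote) {
      value += s.slice(chunkStart, pos);
      pos++; // consume the closing quote
      break;
    }
    if (char === backslash) {
      value += s.slice(chunkStart, pos);
      pos++; // skip the backslash
      if (pos < s.length) {
        value += unescapeChar(s.charCodeAt(pos));
        pos++;
      }
      chunkStart = pos;
      continue;
    }
    pos++;
  }
  return { value, end: pos };
}

const lexErrors: LexError[] = [];
console.log(scanStringLiteral("'double\\'Quote'", 0, lexErrors).value); // double'Quote
console.log(scanStringLiteral("'oops", 0, lexErrors).value, lexErrors.length); // oops 1
```

Collecting errors in a list keeps the lexer consistent with a checker that accumulates `{ pos, message }` entries, at the cost of every caller having to pass the list along.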
2 changes: 2 additions & 0 deletions src/parse.ts
@@ -36,6 +36,8 @@ export function parse(lexer: Lexer): Module {
return { kind: Node.Identifier, text: lexer.text(), pos };
} else if (tryParseToken(Token.Literal)) {
return { kind: Node.Literal, value: +lexer.text(), pos };
} else if (tryParseToken(Token.String)) {
return { kind: Node.Literal, value: lexer.text(), pos };
}
error(
pos,
13 changes: 12 additions & 1 deletion src/types.ts
@@ -10,6 +10,7 @@ export enum Token {
Semicolon = 'Semicolon',
Colon = 'Colon',
Whitespace = 'Whitespace',
String = 'String',
Unknown = 'Unknown',
BOF = 'BOF',
EOF = 'EOF',
@@ -49,7 +50,7 @@ export type Identifier = Location & {

export type Literal = Location & {
kind: Node.Literal;
-value: number;
+value: number | string;
};

export type Assignment = Location & {
@@ -93,3 +94,13 @@ export type Module = {
};

export type Type = { id: string };

export enum CharCodes {
b = 98,
t = 116,
n = 110,
r = 114,
singleQuote = 39,
doubleQuote = 34,
backslash = 92,
}
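The review thread mentions splitting StringLiteral out of Literal and later renaming Literal to NumericLiteral. A minimal sketch of what that split could look like follows. The node shapes and the `checkInitialiser` helper are assumptions made for illustration; only the idea of two literal kinds, the `Type = { id: string }` shape, and the error message wording come from the diff and baselines above.

```typescript
// Hypothetical node shapes: one kind per literal instead of `value: number | string`.
type NumericLiteral = { kind: "NumericLiteral"; value: number; pos: number };
type StringLiteral = { kind: "StringLiteral"; value: string; pos: number };
type LiteralExpression = NumericLiteral | StringLiteral;

type Type = { id: string };
const numberType: Type = { id: "number" };
const stringType: Type = { id: "string" };

// The checker dispatches on the node kind, so no `typeof value` test is needed.
function checkLiteral(expression: LiteralExpression): Type {
  switch (expression.kind) {
    case "NumericLiteral":
      return numberType;
    case "StringLiteral":
      return stringType;
  }
}

// Returns an error message in the style of the baselines, or undefined if the types match.
function checkInitialiser(declared: Type, init: LiteralExpression): string | undefined {
  const actual = checkLiteral(init);
  if (actual === declared) return undefined;
  return `Cannot assign initialiser of type '${actual.id}' to variable with declared type '${declared.id}'.`;
}

// Corresponds to `var n: number = 'test';` in tests/singleTypedVar.ts.
console.log(checkInitialiser(numberType, { kind: "StringLiteral", value: "test", pos: 41 }));
```

Dispatching on `kind` lets the compiler verify that every literal kind is handled, whereas a `typeof` check on a shared `value` field silently treats anything that is not a string as a number.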
1 change: 1 addition & 0 deletions tests/singleTypedVar.ts
@@ -1 +1,2 @@
var s: string = 1;
var n: number = 'test';
2 changes: 2 additions & 0 deletions tests/stringLiteral.ts
@@ -0,0 +1,2 @@
var singleQuote = 'singleQuote';
var doubleQuote = 'double\'Quote';