StringLiteral #4

Merged: 23 commits, May 2, 2023
Changes from 17 commits
2 changes: 1 addition & 1 deletion README.md
@@ -37,7 +37,7 @@ npm run mtsc ./tests/singleVar.ts
 - [ ] Make semicolon a statement ender, not statement separator.
   - Hint: You'll need a predicate to peek at the next token and decide if it's the start of an element.
   - Bonus: Switch from semicolon to newline as statement ender.
-- [ ] Add string literals.
+- [x] Add string literals.
 - [ ] Add let.
   - Then add use-before-declaration errors in the checker.
   - Finally, add an ES2015 -> ES5 transform that transforms `let` to `var`.
8 changes: 8 additions & 0 deletions baselines/reference/singleTypedVar.errors.baseline
@@ -1,6 +1,14 @@
[
{
"pos": 43,
"message": "Expected identifier or literal but got EOF"
},
{
"pos": 17,
"message": "Cannot assign initialiser of type 'number' to variable with declared type 'string'."
},
{
"pos": 41,
"message": "Cannot assign initialiser of type 'string' to variable with declared type 'number'."
}
]
2 changes: 1 addition & 1 deletion baselines/reference/singleTypedVar.js.baseline
Original file line number Diff line number Diff line change
@@ -1 +1 @@
"var s = 1"
"var s = 1;\nvar n = 'test';\n(missing)"
imteekay marked this conversation as resolved.
Show resolved Hide resolved
29 changes: 29 additions & 0 deletions baselines/reference/singleTypedVar.tree.baseline
@@ -5,6 +5,12 @@
"kind": "Var",
"pos": 3
}
],
"n": [
{
"kind": "Var",
"pos": 22
}
]
},
"statements": [
@@ -22,6 +28,29 @@
"kind": "Literal",
"value": 1
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "n"
},
"typename": {
"kind": "Identifier",
"text": "number"
},
"init": {
"kind": "StringLiteral",
"value": "test",
"isSingleQuote": true
}
},
{
"kind": "ExpressionStatement",
"expr": {
"kind": "Identifier",
"text": "(missing)"
}
}
]
}
1 change: 1 addition & 0 deletions baselines/reference/stringLIteral.errors.baseline
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[]
1 change: 1 addition & 0 deletions baselines/reference/stringLIteral.js.baseline
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
"var singleQuote = 'singleQuote';\nvar doubleQuote = \"doubleQuote\";\nvar escapedSingleQuote = 'escapedSingle\\'Quote';\nvar escapedDoubleQuote = \"escapedDouble\\\"Quote\";\n(missing)"
Author (owner):

The `(missing)` should be removed after merging the EmptyStatement PR.

157 changes: 157 additions & 0 deletions baselines/reference/stringLIteral.tree.baseline
@@ -0,0 +1,157 @@
{
"locals": {
"singleQuote": [
{
"kind": "Var",
"pos": 3
}
],
"doubleQuote": [
{
"kind": "Var",
"pos": 36
}
],
"escapedSingleQuote": [
{
"kind": "Var",
"pos": 69
}
],
"escapedDoubleQuote": [
{
"kind": "Var",
"pos": 118
}
],
"escapedB": [
{
"kind": "Var",
"pos": 167
}
],
"escapedT": [
{
"kind": "Var",
"pos": 196
}
],
"escapedN": [
{
"kind": "Var",
"pos": 225
}
],
"escapedR": [
{
"kind": "Var",
"pos": 254
}
]
},
"statements": [
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "singleQuote"
},
"init": {
"kind": "StringLiteral",
"value": "singleQuote",
"isSingleQuote": true
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "doubleQuote"
},
"init": {
"kind": "StringLiteral",
"value": "doubleQuote",
"isSingleQuote": false
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "escapedSingleQuote"
},
"init": {
"kind": "StringLiteral",
"value": "escapedSingle'Quote",
"isSingleQuote": true
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "escapedDoubleQuote"
},
"init": {
"kind": "StringLiteral",
"value": "escapedDouble\"Quote",
"isSingleQuote": false
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "escapedB"
},
"init": {
"kind": "StringLiteral",
"value": "escaped\bB",
"isSingleQuote": true
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "escapedT"
},
"init": {
"kind": "StringLiteral",
"value": "escaped\tT",
"isSingleQuote": true
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "escapedN"
},
"init": {
"kind": "StringLiteral",
"value": "escaped\nN",
"isSingleQuote": true
}
},
{
"kind": "Var",
"name": {
"kind": "Identifier",
"text": "escapedR"
},
"init": {
"kind": "StringLiteral",
"value": "escaped\rR",
"isSingleQuote": true
}
},
{
"kind": "ExpressionStatement",
"expr": {
"kind": "Identifier",
"text": "(missing)"
}
}
Author (owner):

This node should be replaced with an EmptyStatement.

Reviewer:

The reason you have empty statements is that `;` is parsed as a separator, not a terminator, so a complete program looks like `X;Y;Z`, not `X;Y;Z;`.

(Separators have become less and less popular over the history of programming languages.)

]
}
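To make the separator/terminator point from the review thread concrete, here is a minimal sketch. It does not use the project's parser: `Tok`, `parseSeparated`, and `parseTerminated` are hypothetical names, and each statement is assumed to be a single string token. It shows why a trailing `;` produces a `(missing)` node under a separator grammar, and how peeking at the next token (the README hint) avoids it.

```typescript
// Hypothetical token stream: each statement is a single string token and ';' is the only punctuation.
type Tok = string;

// Separator grammar: statement (';' statement)*
// Every ';' promises another statement, so a trailing ';' yields a placeholder.
function parseSeparated(tokens: Tok[]): string[] {
  let i = 0;
  const parseStatement = (): string => {
    const t = tokens[i];
    if (t === undefined || t === ';') return '(missing)';
    i++;
    return t;
  };
  const statements = [parseStatement()];
  while (tokens[i] === ';') {
    i++; // consume the separator
    statements.push(parseStatement());
  }
  return statements;
}

// Terminator grammar: (statement ';')*
// Peek first: only parse a statement when the next token can start one.
function parseTerminated(tokens: Tok[]): string[] {
  let i = 0;
  const statements: string[] = [];
  while (true) {
    const t = tokens[i];
    if (t === undefined || t === ';') break; // nothing here can start a statement
    statements.push(t);
    i++;
    if (tokens[i] === ';') i++; // consume the terminator (a stricter parser would report a missing ';')
  }
  return statements;
}

console.log(parseSeparated(['X', ';', 'Y', ';']));  // [ 'X', 'Y', '(missing)' ]
console.log(parseTerminated(['X', ';', 'Y', ';'])); // [ 'X', 'Y' ]
```

The only difference between the two loops is where the decision is made: the separator version commits to another statement as soon as it sees `;`, while the terminator version checks whether a statement can start before parsing one.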
2 changes: 2 additions & 0 deletions src/check.ts
@@ -55,6 +55,8 @@ export function check(module: Module) {
return errorType;
case Node.Literal:
return numberType;
case Node.StringLiteral:
return stringType;
case Node.Assignment:
const v = checkExpression(expression.value);
const t = checkExpression(expression.name);
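As a rough illustration of what the two added lines enable, the sketch below shows how a `StringLiteral` initialiser could produce the "Cannot assign initialiser" message seen in the `singleTypedVar` errors baseline. The `Expr` and `Type` shapes and the `checkInitialiser` helper are assumptions made for this example; only `numberType`, `stringType`, and the message text come from the diff.

```typescript
// Hypothetical, simplified node and type shapes (not the project's real definitions).
type Expr =
  | { kind: 'Literal'; value: number }
  | { kind: 'StringLiteral'; value: string; isSingleQuote: boolean };

type Type = { id: string };
const numberType: Type = { id: 'number' };
const stringType: Type = { id: 'string' };

// Mirrors the switch in the diff: numeric literals are numbers, string literals are strings.
function checkExpression(expression: Expr): Type {
  switch (expression.kind) {
    case 'Literal':
      return numberType;
    case 'StringLiteral':
      return stringType;
  }
}

// Hypothetical helper: compares the declared type with the initialiser's type.
function checkInitialiser(declared: Type, init: Expr): string | undefined {
  const actual = checkExpression(init);
  return actual === declared
    ? undefined
    : `Cannot assign initialiser of type '${actual.id}' to variable with declared type '${declared.id}'.`;
}

// var n: number = 'test'
console.log(checkInitialiser(numberType, { kind: 'StringLiteral', value: 'test', isSingleQuote: true }));
// Cannot assign initialiser of type 'string' to variable with declared type 'number'.
```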
32 changes: 32 additions & 0 deletions src/emit.ts
@@ -1,5 +1,22 @@
import { Statement, Node, Expression } from './types';

const singleQuoteRegex = /[\\\'\t\v\f\b\r\n]/g;
const doubleQuoteRegex = /[\\\"\t\v\f\b\r\n]/g;

const escapedCharsMap = new Map(
  Object.entries({
    '\t': '\\t',
    '\v': '\\v',
    '\f': '\\f',
    '\b': '\\b',
    '\r': '\\r',
    '\n': '\\n',
    '\\': '\\\\',
    '"': '\\"',
    "'": "\\'",
  }),
);

export function emit(statements: Statement[]) {
  return statements.map(emitStatement).join(';\n');
}
@@ -24,7 +41,22 @@ function emitExpression(expression: Expression): string {
      return expression.text;
    case Node.Literal:
      return '' + expression.value;
    case Node.StringLiteral:
      return expression.isSingleQuote
        ? `'${escapeString(expression.value, true)}'`
        : `"${escapeString(expression.value, false)}"`;
    case Node.Assignment:
      return `${expression.name.text} = ${emitExpression(expression.value)}`;
  }
}

function escapeString(string: string, isSingleQuote: boolean) {
  return string.replace(
    isSingleQuote ? singleQuoteRegex : doubleQuoteRegex,
    replacement,
  );
}

function replacement(char: string) {
  return escapedCharsMap.get(char) || char;
}
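The following standalone sketch reproduces the escaping logic above so it can be run outside the emitter. `emitStringLiteral` and `escapedChars` are hypothetical names introduced for the example, and a plain object stands in for the `Map`. It shows that only the quote character matching the literal's delimiter is escaped, while control characters and backslashes are always escaped.

```typescript
// Same escape table as the diff, as a plain record (an assumption for brevity).
const escapedChars: Record<string, string> = {
  '\t': '\\t',
  '\v': '\\v',
  '\f': '\\f',
  '\b': '\\b',
  '\r': '\\r',
  '\n': '\\n',
  '\\': '\\\\',
  '"': '\\"',
  "'": "\\'",
};

function escapeString(value: string, isSingleQuote: boolean): string {
  // Inside a character class, \b matches a backspace, not a word boundary.
  const pattern = isSingleQuote ? /[\\'\t\v\f\b\r\n]/g : /[\\"\t\v\f\b\r\n]/g;
  return value.replace(pattern, (char) => escapedChars[char] ?? char);
}

// Hypothetical wrapper that picks the delimiter the same way the StringLiteral case does.
function emitStringLiteral(value: string, isSingleQuote: boolean): string {
  const quote = isSingleQuote ? "'" : '"';
  return quote + escapeString(value, isSingleQuote) + quote;
}

console.log(emitStringLiteral("escapedSingle'Quote", true)); // 'escapedSingle\'Quote'
console.log(emitStringLiteral("it's", false));               // "it's"
console.log(emitStringLiteral('line1\nline2', true));        // 'line1\nline2'
```

Using one pattern per delimiter keeps the output minimal: an apostrophe inside a double-quoted literal is emitted unchanged.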
66 changes: 65 additions & 1 deletion src/lex.ts
@@ -1,4 +1,4 @@
-import { Token, Lexer } from './types';
+import { Token, Lexer, CharCodes } from './types';

const keywords = {
function: Token.Function,
@@ -11,12 +11,14 @@ export function lex(s: string): Lexer {
  let pos = 0;
  let text = '';
  let token = Token.BOF;
  let firstChar: string;

  return {
    scan,
    token: () => token,
    pos: () => pos,
    text: () => text,
    isSingleQuote: () => firstChar === "'",
  };

  function scan() {
@@ -40,6 +42,10 @@ export function lex(s: string): Lexer {
        text in keywords
          ? keywords[text as keyof typeof keywords]
          : Token.Identifier;
    } else if (['"', "'"].includes(s.charAt(pos))) {
      firstChar = s.charAt(pos);
      text = scanString();
      token = Token.String;
    } else {
      pos++;
      switch (s.charAt(pos - 1)) {
@@ -62,6 +68,64 @@ export function lex(s: string): Lexer {
  function scanForward(pred: (x: string) => boolean) {
    while (pos < s.length && pred(s.charAt(pos))) pos++;
  }

  function scanString() {
    const quote = s.charCodeAt(pos);

Reviewer:

Feels like this code needs a lot more tests.

Author (owner), @imteekay, Apr 28, 2023:

Just made some adjustments to how I scan, parse, and emit escape characters, and separated the StringLiteral node from Literal (I will rename Literal to NumericLiteral and refactor the code in the future) (imteekay/mini-typescript@c5e8220).

Also added more tests as you recommended (imteekay/mini-typescript@ead71e2 and imteekay/mini-typescript@b04fc8a).

    pos++;

    let stringValue = '';
    let start = pos;

    while (true) {
      if (pos >= s.length) {
        // report unterminated string literal error
      }

Comment on lines +80 to +82
Author (owner):

Still not sure how to report this kind of error in the lexer scope 🤔

Reviewer:

You'll probably have to make the lexer able to report errors the same way the other phases do.

Author (owner):

I decided to add a new exercise for supporting error reporting in the lexer (imteekay/mini-typescript@65a3270). I also saw an interesting tweet by Maria about reporting errors for string literals.

      const char = s.charCodeAt(pos);

      if (char === quote) {
        stringValue += s.slice(start, pos);
        pos++;
        break;
      }

      if (char === CharCodes.backslash) {
        stringValue += s.slice(start, pos);
        stringValue += scanEscapeSequence();
        start = pos;
        continue;
      }

      pos++;
    }

    return stringValue;
  }

  function scanEscapeSequence() {
    pos++;
    const char = s.charCodeAt(pos);
    pos++;

    switch (char) {
      case CharCodes.b:
        return '\b';
      case CharCodes.t:
        return '\t';
      case CharCodes.n:
        return '\n';
      case CharCodes.r:
        return '\r';
      case CharCodes.singleQuote:
        // prettier-ignore
        return "\'";
      case CharCodes.doubleQuote:
        // prettier-ignore
        return '\"';
      default:
        return String.fromCharCode(char);
    }
  }
}

export function lexAll(s: string) {
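On the unterminated-string question from the review thread: in `scanString` as shown in the diff, the `pos >= s.length` branch only holds a comment, so the loop does not appear to exit once input runs out. One possible shape for the fix is sketched below. It is not the project's implementation: `LexError`, `ScannedString`, `scanStringLiteral`, and the message text are all hypothetical, and the error record simply copies the `{ pos, message }` shape used in the errors baselines.

```typescript
// Hypothetical error record, shaped like the entries in the *.errors.baseline files.
interface LexError {
  pos: number;
  message: string;
}

interface ScannedString {
  value: string;
  end: number; // index just past the literal, or the end of input if unterminated
  errors: LexError[];
}

const simpleEscapes: Record<string, string> = { b: '\b', t: '\t', n: '\n', r: '\r' };

// Scans a quoted literal starting at `start` (which must point at the opening quote).
function scanStringLiteral(s: string, start: number): ScannedString {
  const errors: LexError[] = [];
  const quote = s.charAt(start);
  let pos = start + 1;
  let value = '';

  while (true) {
    if (pos >= s.length) {
      // Record the problem and stop; without the break the loop would never end.
      errors.push({ pos: start, message: 'Unterminated string literal' });
      break;
    }
    const char = s.charAt(pos);
    if (char === quote) {
      pos++; // consume the closing quote
      break;
    }
    if (char === '\\' && pos + 1 < s.length) {
      const escaped = s.charAt(pos + 1);
      value += simpleEscapes[escaped] ?? escaped; // unknown escapes keep the character itself
      pos += 2;
      continue;
    }
    value += char;
    pos++;
  }

  return { value, end: pos, errors };
}

console.log(scanStringLiteral("'abc'", 0)); // { value: 'abc', end: 5, errors: [] }
console.log(scanStringLiteral('"abc', 0).errors);
// [ { pos: 0, message: 'Unterminated string literal' } ]
```

Returning the errors alongside the value keeps the sketch self-contained; in the real lexer they would more likely be pushed onto a shared list, the way the reviewer suggests the other phases do.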