hand/lexer: Add long-char.c test case.
Not quite sure what the best approach is for error detection here. We could
try to be clever to potentially give better error messages, or be simple but
consistent and fail early on invalid tokens.

Approach 1)

If we try to be clever, where is the cut-off point? 1 byte of look-ahead to
locate 'cc' as invalid, 2 bytes of look-ahead to also identify 'ccc', etc.
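
To make the cut-off question concrete, here is a minimal, self-contained Go
sketch of approach 1; the classifyCharLit helper and its maxLookahead
parameter are made up for illustration and are not part of the uc lexer:

package main

import "fmt"

// classifyCharLit assumes input[0] is the opening apostrophe of a character
// literal and scans at most maxLookahead bytes past the first character for
// the terminating apostrophe.
func classifyCharLit(input string, maxLookahead int) string {
	for i := 2; i < 2+maxLookahead && i < len(input); i++ {
		switch input[i] {
		case '\'':
			if i == 2 {
				return "valid character literal"
			}
			return "multi-character character literal"
		case '\n':
			return "unterminated character literal" // never scan past the end of the line
		}
	}
	// The cut-off problem: anything longer than the look-ahead allows is
	// reported as unterminated, even if a terminating apostrophe follows.
	return "unterminated character literal"
}

func main() {
	fmt.Println(classifyCharLit(`'c';`, 2))   // valid character literal
	fmt.Println(classifyCharLit(`'cc';`, 2))  // multi-character character literal
	fmt.Println(classifyCharLit(`'ccc';`, 2)) // unterminated character literal
}

With maxLookahead = 2 the sketch classifies 'cc' as a multi-character literal
but still reports 'ccc' as unterminated, which is exactly the cut-off question
above.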

Approach 2)

Another approach is to backtrack to the first possible starting position
of a valid token within the input byte stream. In the case of `'cc';`,
the lexer would emit an "unterminated character literal" error for the
first apostrophe and then backtrack to the position directly succeeding
the first apostrophe (i.e. `cc';`) to continue lexing. The lexer would
then emit the identifier `cc`, an error for the second unterminated
apostrophe and a semicolon token.
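
As a minimal, self-contained sketch of this recovery strategy (a toy scanner
with made-up names, not the real uc lexer), the scanner below reports an error
for the apostrophe and restarts lexing at the byte directly succeeding it:

package main

import "fmt"

func isLetter(b byte) bool {
	return 'a' <= b && b <= 'z' || 'A' <= b && b <= 'Z'
}

func lexWithBacktrack(input string) {
	pos := 0
	for pos < len(input) {
		switch {
		case input[pos] == '\'':
			// Character literal: accept exactly one non-apostrophe byte
			// between two apostrophes.
			if pos+2 < len(input) && input[pos+1] != '\'' && input[pos+2] == '\'' {
				fmt.Printf("char literal %q\n", input[pos:pos+3])
				pos += 3
				continue
			}
			// Invalid: report an error for the apostrophe and restart lexing
			// at the byte directly succeeding it.
			fmt.Printf("error: unterminated character literal at offset %d\n", pos)
			pos++
		case isLetter(input[pos]):
			start := pos
			for pos < len(input) && isLetter(input[pos]) {
				pos++
			}
			fmt.Printf("identifier %q\n", input[start:pos])
		default:
			fmt.Printf("token %q\n", string(input[pos]))
			pos++
		}
	}
}

func main() {
	// For `'cc';` this prints: an error for the first apostrophe, the
	// identifier "cc", an error for the second apostrophe, and ";".
	lexWithBacktrack("'cc';")
}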

Approach 3) (current approach)

Lex as many bytes as are valid for the token currently being lexed. For the
input `'cc'; // Not OK`, lex `'c` before emitting an "unterminated character
literal" error: the first apostrophe indicates that a character literal is to
be lexed, and the first `c` is a valid character within the character literal,
but when the lexer tries to locate the terminating apostrophe and fails to do
so, it emits an error and continues lexing from the next byte in the byte
stream, i.e. from `c'; // Not OK`. That remainder is then lexed as the
identifier `c`, an "unterminated character literal" error for `';`, and a
comment for `// Not OK`.
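
A minimal, self-contained sketch of this behaviour (a toy scanner with made-up
names, not the real uc lexer): the character-literal branch consumes `'c`,
reports the error, and then continues from the byte it stopped at:

package main

import "fmt"

func isLetter(b byte) bool {
	return 'a' <= b && b <= 'z' || 'A' <= b && b <= 'Z'
}

func lexGreedy(input string) {
	pos := 0
	for pos < len(input) {
		switch b := input[pos]; {
		case b == '\'':
			pos++ // opening apostrophe
			if pos < len(input) && input[pos] != '\'' && input[pos] != '\n' {
				pos++ // the character inside the literal
			}
			if pos < len(input) && input[pos] == '\'' {
				pos++ // terminating apostrophe
				fmt.Println("char literal")
				continue
			}
			// `'c` has been consumed but no terminating apostrophe follows:
			// report the error and continue from the current byte.
			fmt.Println("error: unterminated character literal")
		case isLetter(b):
			start := pos
			for pos < len(input) && isLetter(input[pos]) {
				pos++
			}
			fmt.Printf("identifier %q\n", input[start:pos])
		case b == ';':
			fmt.Println("semicolon")
			pos++
		case b == '/' && pos+1 < len(input) && input[pos+1] == '/':
			fmt.Printf("comment %q\n", input[pos:])
			pos = len(input)
		default:
			pos++ // skip whitespace and anything else in this toy
		}
	}
}

func main() {
	// Prints: an error for `'c`, the identifier "c", an error for `';`, and
	// the comment "// Not OK".
	lexGreedy("'cc'; // Not OK")
}

Compared with approach 2, the identifier emitted after the first error is `c`
rather than `cc`, since the leading `c` has already been consumed as part of
the invalid literal.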

@sangisos any idea which approach is preferable? What are the advantages and
disadvantages of the three different approaches? Are there any other
approaches we could try?
mewmew committed Feb 22, 2016
1 parent 3ecddc7 commit 5849769
Showing 1 changed file with 118 additions and 2 deletions.
120 changes: 118 additions & 2 deletions uc/hand/lexer/lexer_test.go
@@ -537,8 +537,124 @@ func TestLexer(t *testing.T) {
},
},
},
{
path: "../../testdata/incorrect/lexer/long-char.c",
toks: []token.Token{
{
Kind: token.Ident,
Val: "int",
Pos: 0,
},
{
Kind: token.Ident,
Val: "main",
Pos: 4,
},
{
Kind: token.Lparen,
Val: "(",
Pos: 8,
},
{
Kind: token.Ident,
Val: "void",
Pos: 9,
},
{
Kind: token.Rparen,
Val: ")",
Pos: 13,
},
{
Kind: token.Lbrace,
Val: "{",
Pos: 15,
},
{
Kind: token.Ident,
Val: "char",
Pos: 19,
},
{
Kind: token.Ident,
Val: "c",
Pos: 24,
},
{
Kind: token.Semicolon,
Val: ";",
Pos: 25,
},
{
Kind: token.Ident,
Val: "c",
Pos: 29,
},
{
Kind: token.Assign,
Val: "=",
Pos: 31,
},
{
Kind: token.CharLit,
Val: "'c'",
Pos: 33,
},
{
Kind: token.Semicolon,
Val: ";",
Pos: 36,
},
{
Kind: token.Comment,
Val: " OK",
Pos: 38,
},
{
Kind: token.Ident,
Val: "c",
Pos: 46,
},
{
Kind: token.Assign,
Val: "=",
Pos: 48,
},
{
Kind: token.Error,
Val: "unterminated character literal",
// TODO: Figure out how to handle position of errors.
Pos: 51,
},
{
Kind: token.Ident,
Val: "c",
Pos: 52,
},
{
Kind: token.Error,
Val: "unterminated character literal",
// TODO: Figure out how to handle position of errors.
Pos: 54,
},
{
Kind: token.Comment,
Val: " Not OK",
Pos: 56,
},
{
Kind: token.Rbrace,
Val: "}",
Pos: 66,
},
{
Kind: token.EOF,
Val: "",
Pos: 68,
},
},
},
// TODO: Add tokens for the following test cases.
{path: "../../testdata/incorrect/lexer/long-char.c"},
{path: "../../testdata/incorrect/lexer/ugly.c"},
{path: "../../testdata/quiet/lexer/l01.c"},
{path: "../../testdata/quiet/lexer/l02.c"},
@@ -575,7 +691,7 @@ func TestLexer(t *testing.T) {
break
}
}
-if looprun >= 1 {
+if looprun >= 2 {
break // TODO: Remove this break to test all test cases and not just the first.
}
}

5 comments on commit 5849769

@mewmew (Owner, Author) commented on 5849769 on Feb 23, 2016

We could discuss the various approaches in this commit, or in a dedicated issue. For now, I'll put my thoughts here.

First and foremost, I'm not particularly attached to any of the three approaches. My main concern is that whichever solution we arrive at, it should feel intuitive and consistent.

Approach 1

Pros

  • Facilitates intuitive error messages for the user; e.g. "invalid character literal 'cc'; contains multiple characters"

Cons

  • Unintuitive implementation. Which error cases should we guard the user against and present good error messages for? E.g. 'aa' (multichar), 'a\' (unterminated), 'aaa' (multichar, with potentially unbounded look-ahead), or a newline in a character literal? How far ahead should we scan before deciding which error message to produce? And in which state should we leave the lexer once the error has been recorded: after the initial apostrophe, after the final apostrophe of a multichar character literal, or after the first invalid byte in an otherwise valid token?

Approach 2

Pros

  • Intuitive and consistent handling of invalid tokens (although not necessarily the easiest to implement). Always skip the first character of an invalid token, and try to restart the lexer from the character directly succeeding it in the input byte stream.

Cons

  • Will produce a lot of additional error messages, which may confuse the user and hide the original error.

Approach 3

Pros

  • Simple and intuitive implementation of the lexer. At any point while lexing, if an unexpected character is located, output an error and continue from the unexpected character.

Example:

// Consume the closing apostrophe of the character literal.
if !l.accept("'") {
    l.emitErrorf("unterminated character literal")
    return lexToken
}

Try to consume the closing apostrophe of the character literal. If it is not present, output an error and continue lexing a new token from the current position.

Cons

  • Results in additional, perhaps unintuitive error messages. Perhaps not as bad as with approach 2, but may still confuse the user.

@mewmew (Owner, Author) commented on 5849769 on Feb 23, 2016

ping @sangisos :) Even if we discuss this topic in person, let's collect our thoughts here to make them publicly visible.

@sangisos (Collaborator) commented on 5849769

I feel that I still have a lot to learn before making an informed decision, and I really would like to avoid cleverly shooting ourselves in the foot. I would opt for approach 3, to avoid complexity and tedious work, but maybe try to help the user REALLY look at the first error.

Another approach might be to also lex the file backwards from the end, to maybe determine the start/end of the error more accurately and help with recovery, but that also sounds tedious, as I guess a whole new lexer would need to be written, one looking at what to expect BEFORE some construct.

As stated, I think approach 3 is good enough, but better with some guidance to the user.

@mewmew (Owner, Author) commented on 5849769 on Feb 23, 2016

Another approach might be to also lex the file backwards from the end, to maybe determine the start/end of the error more accurately and help with recovery, but that also sounds tedious, as I guess a whole new lexer would need to be written, one looking at what to expect BEFORE some construct.

Wouldn't this create another instance of the same problem? An unterminated character literal is unterminated no matter which direction we are lexing. E.g. which tokens and errors would the lexer produce for the following input:

// Multichar character literals pose the same problem whether lexed forwards or backwards.
int c = 'cc';
// An unterminated character literal may be missing its first (opening) apostrophe ...
int c = c';
// ... or its second (closing) apostrophe.
int c = 'c;

It is possible that lexing it backwards would facilitate better error messages. From a quick glance, I fail to see how the problem is made easier.

As stated, I think approach 3 is good enough, but better with some guidance to the user.

OK, let's go with approach 3 for now, and try to help the user by outlining the first error, and including any additional errors below.

@sangisos (Collaborator) commented on 5849769

It is possible that lexing it backwards would facilitate better error messages. From a quick glance, I fail to see how the problem is made easier.

No, I didn't mean it would be easier, quite the opposite: a more advanced "clever" approach could possibly give better error messages, but it would be much more difficult and less straightforward to implement. That is why I didn't really recommend it as a starting point. It would need more than just lexing to solve.

OK, let's go with approach 3 for now, and try to help the user by outlining the first error, and including any additional errors below.

Maybe have a helpful block right at the end that repeats the first error with extra help/solutions, and explains that some of the problems above could follow from that first, repeated error. Also add a note that a -v flag will give extra help for all errors, or something along those lines.
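
Something along these lines, as a rough, self-contained Go sketch; the error
list, the message texts, and the -v flag handling are all made up here and not
part of uc:

package main

import (
	"flag"
	"fmt"
)

var verbose = flag.Bool("v", false, "print extra help for every error")

// report prints every recorded error, then repeats the first one at the end
// with a hint that the later errors may simply follow from it.
func report(errs []string) {
	for _, e := range errs {
		fmt.Println("error:", e)
	}
	if len(errs) == 0 {
		return
	}
	fmt.Println()
	fmt.Println("note: the first error was:", errs[0])
	if len(errs) > 1 {
		fmt.Println("note: the errors following it may be caused by it; try fixing it first")
	}
	if !*verbose {
		fmt.Println("note: rerun with -v for extra help on each error")
	}
}

func main() {
	flag.Parse()
	report([]string{
		"unterminated character literal",
		"unterminated character literal",
	})
}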
