04 Statements #9

ngjunsiang · 2022-04-14T23:28:19Z

Statements

state (v.)

1590s, "to set in a position," from state (n.1); the sense of "declare in words" is first attested
1640s, from the notion of "placing" something on the record.

A statement is a declaration: a way of "placing" something on the record with words. In programming, statements are how we change the state of the computer's internal data. Expressions are evaluated for their value and then forgotten; they come into being and produce a result, but never change any data.

Statements, on the other hand, almost always change the state of the computer's data in some way, without necessarily returning any value. They are therefore not evaluated, but executed.

In 9608 pseudocode, statements look like the following examples:

DECLARE Pointer : INTEGER
OUTPUT "Customer name is ", "John Goh."
Index <- Index + 1

FUNCTION PolygonArea(Sides: INTEGER, Length: REAL) RETURNS REAL
    ...
    RETURN Area
ENDFUNCTION

The general pattern in 9608 pseudocode, as far as I can tell, is that statements never start with a value or symbol. They start with one or more keywords, or a name, and end with a line break (\n).

We have expression() for parsing expressions. Here we implement statement() for parsing statements. We represented expressions with expr, and we'll use stmt for representing statements. We will continue using dicts to represent them, and decide on the necessary keys when we figure out what we need.

ngjunsiang · 2022-04-14T23:29:25Z

Entry point

Our entry point to parsing statements is statement(). It checks the first token for a matching keyword or name. Then it passes control to the appropriate statement-parsing function, and gets a stmt dict in return.

[eb280b4]
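
As a rough sketch of what such an entry point might look like (not the code from the commit): the version below assumes tokens are dicts with 'type', 'word' and 'value' keys, like the parsed output shown later in this thread. ParseError is defined locally and outputStmt() is only a placeholder, both for illustration.

```python
class ParseError(Exception):
    """Raised when the tokens do not form a valid statement."""


def outputStmt(tokens):
    # Placeholder for the real OUTPUT parser (assumption): it only
    # consumes the OUTPUT keyword and returns an empty stmt dict.
    tokens.pop(0)
    return {'rule': 'output', 'exprs': []}


def statement(tokens):
    """Look at the first token and hand over to the matching parser."""
    first = tokens[0]
    if first['word'] == 'OUTPUT':
        return outputStmt(tokens)
    # More keywords (DECLARE, FUNCTION, ...) and names would be
    # checked here as more statement types are supported.
    raise ParseError(
        f"Unrecognised token {first['word']!r} at start of statement."
    )
```

Dispatching on the first token works because, as noted above, statements always begin with a keyword or a name.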

ngjunsiang · 2022-04-14T23:29:46Z

OUTPUT statement parsing

We are setting up the skeleton here for parsing other kinds of statements too. Each kind of statement will have its own function for consuming the tokens it is looking for. And every statement ends with a line break (\n) that must be consumed.

[83ce629]
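
A minimal sketch of the shape an OUTPUT parser could take, written without any helper functions. It assumes tokens are dicts with 'type', 'word' and 'value' keys, and uses a stand-in expression() that consumes a single literal token; the real expression() parses full expressions. ParseError is defined locally for illustration.

```python
class ParseError(Exception):
    """Raised when the tokens do not form a valid statement."""


def expression(tokens):
    # Stand-in (assumption): the real expression() parses a whole
    # expression; this one consumes exactly one literal token.
    return tokens.pop(0)


def outputStmt(tokens):
    """Parse OUTPUT, one or more comma-separated exprs, then a line break."""
    tokens.pop(0)  # consume the OUTPUT keyword
    exprs = [expression(tokens)]
    # Keep collecting expressions for as long as a comma follows
    while tokens and tokens[0]['word'] == ',':
        tokens.pop(0)  # consume the comma
        exprs.append(expression(tokens))
    # Every statement ends with a line break that must be consumed
    if not tokens or tokens[0]['word'] != '\n':
        raise ParseError("Expected line break at end of statement.")
    tokens.pop(0)
    return {'rule': 'output', 'exprs': exprs}
```

The keyword, commas and line break are consumed but not stored; only the expressions end up in the stmt dict.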

ngjunsiang · 2022-04-14T23:33:03Z

More helper functions

With some foresight, we can also see that some patterns here are going to be repeated a lot. In particular, I notice:

Pattern 1: check() for a specific token, consume it if found, and raise an error if not.
In other words, it expects a specific token.

Pattern 2: check() for a specific token, consume it if found, and proceed with parsing either way.
In other words, it matches a specific token (and later, maybe multiple token values).

These two patterns are used to check for keyword tokens, which don't form part of the parsed stmt but are used to describe its structure.

Let's add two more helper functions to our menagerie of parser helpers: [0d0864c]
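
The two patterns could be sketched roughly as below. The names expectElseError() and match(), and all the signatures, are assumptions made for illustration and may differ from the commit; check() and consume() are simplified stand-ins for the existing helpers, and tokens are assumed to be dicts with a 'word' key.

```python
class ParseError(Exception):
    """Raised when the parser finds a token it did not expect."""


def check(tokens):
    # Simplified stand-in: peek at the next token without consuming it
    return tokens[0]


def consume(tokens):
    # Simplified stand-in: remove and return the next token
    return tokens.pop(0)


def expectElseError(tokens, word):
    """Pattern 1: consume the expected token, or raise an error."""
    if check(tokens)['word'] == word:
        return consume(tokens)
    raise ParseError(
        f"Expected {word!r}, found {check(tokens)['word']!r}."
    )


def match(tokens, *words):
    """Pattern 2: consume the next token only if it is one of words."""
    if check(tokens)['word'] in words:
        consume(tokens)
        return True
    return False
```

With helpers like these, a statement parser can read as a sequence of expectations: expect the keyword, parse expressions while a comma matches, then expect the line break.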

ngjunsiang · 2022-04-14T23:35:22Z

Bugfix

Noticed a bug: https://github.com/nyjc-computing/pseudo/blob/0d0864cf7d756ab271ad4d20b991e8c48a656b6c/parser.py#L106-L108

expression() should have been called with a list of tokens instead of a single token. Fix: [1c4feb7]

Testing:

import scanner
import parser
import interpreter

src = 'OUTPUT "Hello", ", ", "everyone", "!"'

tokens = scanner.scan(src)

statements = parser.parse(tokens)

for statement in statements:
    print(statement)

Result:

File "scanner.py", line 91, in scan
    raise ParseError(f"Unrecognised character {repr(char)}.")
builtin.ParseError: Unrecognised character ','.

Oops, scan() doesn't pick up that comma. Fix: [a92e5d8]

ngjunsiang · 2022-04-14T23:35:59Z

Refactor

Our statement parsers now use the new helper functions to simplify their code: [1fc0ec9]

ngjunsiang · 2022-04-14T23:38:55Z

The parsing loop

We have expression() as our entry function to parsing expressions, and statement() as our entry function to parsing statements. They match one expression or one statement, respectively.

We need another parsing loop to keep parsing statements, until we run out of them.

The EOF token

One problem: our statement-parsing functions also consume() \n keyword tokens, which we use to mark the end of statements. This means that halfway through parsing incomplete code, we might run into an IndexError when the parser runs out of tokens.

We shall mitigate this by introducing a new type of token: EOF. We add it to the end of our scanned token list before sending it to the parser: https://github.com/nyjc-computing/pseudo/blob/1e4ea8282a6094efb68e32d60f4e5e8bac7f704a/parser.py#L124-L125

And we define an atEnd() helper function, analogous to our scanner's atEnd(), to look out for this token: https://github.com/nyjc-computing/pseudo/blob/d4d4bcccd3d18f63dd2f2d40c63451b74148eae7/parser.py#L8-L11

Introducing the parsing loop: [1e4ea82]
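
A hedged sketch of how the EOF token and the loop might fit together. The EOF token's exact fields, and the choice to append it inside parse(), are assumptions; statement() here is a stand-in that gathers words up to the next line break instead of doing real parsing.

```python
# Assumed shape of the EOF token; the real fields may differ
EOF = {'type': 'EOF', 'word': '', 'value': None}


def atEnd(tokens):
    # The parser is done once the next token is the EOF marker
    return tokens[0]['type'] == 'EOF'


def statement(tokens):
    # Stand-in (assumption): collect the words of one statement,
    # stopping at a line break or at EOF.
    words = []
    while not atEnd(tokens) and tokens[0]['word'] != '\n':
        words.append(tokens.pop(0)['word'])
    if not atEnd(tokens):
        tokens.pop(0)  # consume the line break
    return {'rule': 'raw', 'words': words}


def parse(tokens):
    """Parse statements one by one until only EOF remains."""
    tokens = list(tokens) + [EOF]  # copy, then mark the end
    statements = []
    while not atEnd(tokens):
        statements.append(statement(tokens))
    return statements
```

Because the EOF token is never consumed, there is always a token left to check(), so a statement that is missing its final line break no longer causes an IndexError.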

ngjunsiang · 2022-04-14T23:42:56Z

Testing

import scanner
import parser
import interpreter

src = 'OUTPUT "Hello", ", ", "everyone", "!"'

tokens = scanner.scan(src)

statements = parser.parse(tokens)

for statement in statements:
    print(statement)

Result:

{
  'rule': 'output',
  'exprs': [
    {'type': 'string', 'word': '"Hello"', 'value': 'Hello'},
    {'type': 'string', 'word': '", "', 'value': ', '},
    {'type': 'string', 'word': '"everyone"', 'value': 'everyone'},
    {'type': 'string', 'word': '"!"', 'value': '!'}
  ]
}

ngjunsiang · 2022-04-14T23:43:39Z

Next steps

We're still missing the last step. This is where we execute() the statements, carrying out the stmts and evaluating the exprs we have painstakingly built up so far. This is the interpretation phase, and we have already written evaluate().

mrjsng added 9 commits April 14, 2022 23:12

Add statement parsing function interfaces (eb280b4)
Implement OUTPUT statement parser (83ce629)
Add statement parsing helper functions (0d0864c)
fix: expression() takes in tokens instead of a single token (1c4feb7)
refactor: statement parsers use helper functions (1fc0ec9)
add main parsing loop (1e4ea82)
refactor: atEnd() checks for EOF token presence instead of tokens length (d4d4bcc)
fix: scan() recognises ',' as a valid character (a92e5d8)
merge main (e81615b)

ngjunsiang merged commit b59ffb0 into main Apr 14, 2022

ngjunsiang deleted the parser branch April 15, 2022 07:10
