04 Statements #9

Merged
merged 9 commits into from
Apr 14, 2022


ngjunsiang
Contributor

Statements

state (v.)

1590s, "to set in a position," from state (n.1); the sense of "declare in words" is first attested
1640s, from the notion of "placing" something on the record.

A statement is a declaration, a way of "placing" something on a record ... with words? In programming, we say that statements are a way of changing the state of the computer's internal data. Expressions are evaluated, their values extracted, and then they are forgotten; they come into being and produce a result, but never change any data.

Statements, on the other hand, almost always change the state of the computer's data in some way, without necessarily returning any value. They are therefore not evaluated, but executed.

In 9608 pseudocode, statements look like the following examples:

  1. DECLARE Pointer : INTEGER
  2. OUTPUT "Customer name is ", "John Goh."
  3. Index <- Index + 1
  4. FUNCTION PolygonArea(Sides: INTEGER, Length: REAL) RETURNS REAL
        ...
        RETURN Area
    ENDFUNCTION
    

The general pattern in 9608 pseudocode, as far as I can tell, is that statements never start with a value or symbol. They start with one or more keywords, or a name, and end with a line break (\n).

We have expression() for parsing expressions. Here we implement statement() for parsing statements. We represented expressions with expr, and we'll use stmt for representing statements. We will continue using dicts to represent them, and decide on the necessary keys when we figure out what we need.
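
As a rough illustration of the dict-based representation, here is what a stmt for an OUTPUT statement might look like. The 'rule' and 'exprs' keys match the parser output shown later in this PR; the make_stmt() helper is hypothetical and exists only for this sketch.

```python
# Hypothetical helper: builds a stmt dict. Only the 'rule' and 'exprs'
# keys are taken from the parser output shown in this PR.
def make_stmt(rule, **fields):
    stmt = {'rule': rule}
    stmt.update(fields)
    return stmt

# An expr for a string literal, in the shape shown in the parser output.
greeting = {'type': 'string', 'word': '"Hello"', 'value': 'Hello'}

stmt = make_stmt('output', exprs=[greeting])
print(stmt)
```

Other kinds of statements would presumably carry different keys alongside 'rule', which is why the keys are decided per statement type.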

@ngjunsiang
Contributor Author

ngjunsiang commented Apr 14, 2022

Entry point

Our entry point to parsing statements is statement(). It checks the first token for a matching keyword or name. Then it passes control to the appropriate statement-parsing function, and gets a stmt dict in return.

[eb280b4]
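
A minimal sketch of this kind of dispatch, not the code in the commit: it assumes tokens are dicts with 'type', 'word' and 'value' keys (the shape visible in the parser output later in this PR). make_token(), STATEMENT_PARSERS and the placeholder output_stmt() are hypothetical names used only for illustration.

```python
class ParseError(Exception):
    """Raised when the tokens do not form a valid statement."""

def make_token(type_, word, value=None):
    # Assumed token shape: a dict with 'type', 'word' and 'value' keys.
    return {'type': type_, 'word': word, 'value': value}

def output_stmt(tokens):
    # Placeholder: a real parser would consume the expressions
    # and the closing line break here.
    return {'rule': 'output', 'exprs': []}

# Maps the first token's word to the function that parses that statement.
STATEMENT_PARSERS = {'OUTPUT': output_stmt}

def statement(tokens):
    if not tokens:
        raise ParseError("No tokens left to parse.")
    first = tokens[0]
    parser_fn = STATEMENT_PARSERS.get(first['word'])
    if parser_fn is None:
        raise ParseError(
            f"Unrecognised statement starting with {first['word']!r}."
        )
    # Hand control to the matching statement parser; it returns a stmt dict.
    return parser_fn(tokens)

print(statement([make_token('keyword', 'OUTPUT')]))
```

A lookup table keeps statement() short as more statement types are added; a chain of if/elif checks would work equally well.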

@ngjunsiang
Contributor Author

ngjunsiang commented Apr 14, 2022

OUTPUT statement parsing

We are setting up the skeleton here for parsing other kinds of statements too. Each kind of statement will have its own function for consuming the tokens it is looking for. And every statement ends with a line break (\n) that must be consumed.

[83ce629]
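
The skeleton described above might look roughly like the following. This is a sketch under assumptions, not the committed code: check() and consume() are assumed to peek at and remove the next token respectively, expression() is replaced by a stand-in that accepts a single literal, and the 'symbol' type for the comma token is a guess.

```python
class ParseError(Exception):
    """Raised when a statement is malformed."""

def check(tokens):
    """Peek at the next token without consuming it."""
    return tokens[0]

def consume(tokens):
    """Remove and return the next token."""
    return tokens.pop(0)

def expression(tokens):
    """Stand-in for the real expression parser: accepts one literal token."""
    token = consume(tokens)
    if token['type'] not in ('string', 'integer'):
        raise ParseError(f"Expected a value, got {token['word']!r}.")
    return token

def output_stmt(tokens):
    """Parse OUTPUT, then comma-separated expressions, then a line break."""
    consume(tokens)  # the OUTPUT keyword, already matched by statement()
    exprs = [expression(tokens)]
    while check(tokens)['word'] == ',':
        consume(tokens)  # the comma separating two expressions
        exprs.append(expression(tokens))
    if check(tokens)['word'] != '\n':
        raise ParseError("Expected a line break at the end of the statement.")
    consume(tokens)  # every statement ends by consuming its line break
    return {'rule': 'output', 'exprs': exprs}

tokens = [
    {'type': 'keyword', 'word': 'OUTPUT', 'value': None},
    {'type': 'string', 'word': '"Hello"', 'value': 'Hello'},
    {'type': 'symbol', 'word': ',', 'value': None},
    {'type': 'string', 'word': '"!"', 'value': '!'},
    {'type': 'keyword', 'word': '\n', 'value': None},
]
stmt = output_stmt(tokens)
print(stmt)
```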

@ngjunsiang
Contributor Author

More helper functions

With some foresight, we can also see that some patterns here are going to be repeated a lot. In particular, I notice:

  1. https://github.com/nyjc-computing/pseudo/blob/83ce62945316923323b13dc08853911040846012/parser.py#L101-L105
  2. https://github.com/nyjc-computing/pseudo/blob/83ce62945316923323b13dc08853911040846012/parser.py#L87-L88

Pattern 1 check()s for a specific token, consumes it if found, and raises an error otherwise.
It expects a specific token.

Pattern 2 check()s for a specific token, consumes it if found, and proceeds with parsing either way.
It matches a specific token (and later, maybe multiple token values).

These two patterns are used to check for keyword tokens, which don't form part of the parsed stmt but are used to describe its structure.

Let's add two more helper functions to our menagerie of parser helpers: [0d0864c]
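
For readers not following the commit, the two patterns could be sketched as below. The names expect() and match() are hypothetical (the commit may name them differently), and check()/consume() are assumed to peek at and remove the next token from a list of token dicts.

```python
class ParseError(Exception):
    """Raised when an expected token is missing."""

def check(tokens):
    """Peek at the next token without consuming it."""
    return tokens[0]

def consume(tokens):
    """Remove and return the next token."""
    return tokens.pop(0)

def expect(tokens, word):
    """Pattern 1: consume the token if it matches, otherwise raise."""
    found = check(tokens)['word']
    if found != word:
        raise ParseError(f"Expected {word!r}, got {found!r}.")
    return consume(tokens)

def match(tokens, *words):
    """Pattern 2: consume the token only if it matches one of the words."""
    if check(tokens)['word'] in words:
        consume(tokens)
        return True
    return False

tokens = [
    {'type': 'symbol', 'word': ',', 'value': None},
    {'type': 'keyword', 'word': '\n', 'value': None},
]
print(match(tokens, ','))   # the comma is present, so it is consumed
print(match(tokens, ','))   # the next token is a line break, so nothing is consumed
expect(tokens, '\n')        # consumes the line break, or raises ParseError
```

Accepting several words in match() leaves room for matching multiple token values later.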

@ngjunsiang
Contributor Author

ngjunsiang commented Apr 14, 2022

Bugfix

Noticed a bug: https://github.com/nyjc-computing/pseudo/blob/0d0864cf7d756ab271ad4d20b991e8c48a656b6c/parser.py#L106-L108

expression() should have been called with a list of tokens instead of a single token. Fix: [1c4feb7]

Testing:

import scanner
import parser
import interpreter

src = 'OUTPUT "Hello", ", ", "everyone", "!"'

tokens = scanner.scan(src)

statements = parser.parse(tokens)

for statement in statements:
    print(statement)

Result:

File "scanner.py", line 91, in scan
    raise ParseError(f"Unrecognised character {repr(char)}.")
builtin.ParseError: Unrecognised character ','.

Oops, scan() doesn't pick up that comma. Fix: [a92e5d8]

@ngjunsiang
Contributor Author

ngjunsiang commented Apr 14, 2022

Refactor

Our statement parsers now use the new helper functions to simplify their code: [1fc0ec9]

@ngjunsiang
Contributor Author

The parsing loop

We have expression() as our entry function to parsing expressions, and statement() as our entry function to parsing statements. They match one expression or one statement, respectively.

We need another parsing loop to keep parsing statements, until we run out of them.

The EOF token

One problem: our statement-parsing functions also consume() \n keyword tokens, which we use to mark the end of statements. This means that halfway through parsing incomplete code, we might run into an IndexError when the parser runs out of tokens.

We shall mitigate this by introducing a new type of token: EOF. We add it to the end of our scanned token list before sending it to the parser: https://github.com/nyjc-computing/pseudo/blob/1e4ea8282a6094efb68e32d60f4e5e8bac7f704a/parser.py#L124-L125

And we define an atEnd() helper function, analogous to our scanner's atEnd(), to look out for this token: https://github.com/nyjc-computing/pseudo/blob/d4d4bcccd3d18f63dd2f2d40c63451b74148eae7/parser.py#L8-L11

Introducing the parsing loop: [1e4ea82]
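
A hedged sketch of how the loop and the EOF token fit together. The EOF token's shape, make_token(), and the simplified statement() stand-in (which just gathers words up to a line break under a made-up 'raw' rule) are assumptions for illustration; only the names parse() and atEnd() come from this PR.

```python
class ParseError(Exception):
    """Raised when the token stream ends or is malformed."""

def make_token(type_, word, value=None):
    # Assumed token shape: a dict with 'type', 'word' and 'value' keys.
    return {'type': type_, 'word': word, 'value': value}

def atEnd(tokens):
    """True when the next token is the EOF marker."""
    return tokens[0]['type'] == 'EOF'

def statement(tokens):
    """Stand-in for the real statement(): gathers words up to a line break."""
    words = []
    while not atEnd(tokens) and tokens[0]['word'] != '\n':
        words.append(tokens.pop(0)['word'])
    if atEnd(tokens):
        # Thanks to EOF we can raise a clear error instead of an IndexError.
        raise ParseError("Unexpected end of input; expected a line break.")
    tokens.pop(0)  # consume the line break
    return {'rule': 'raw', 'words': words}

def parse(tokens):
    # Work on a copy and append the EOF marker before parsing.
    tokens = list(tokens) + [make_token('EOF', '')]
    statements = []
    while not atEnd(tokens):
        statements.append(statement(tokens))
    return statements

tokens = [
    make_token('keyword', 'OUTPUT'),
    make_token('string', '"Hi"', 'Hi'),
    make_token('keyword', '\n'),
    make_token('keyword', 'OUTPUT'),
    make_token('string', '"Bye"', 'Bye'),
    make_token('keyword', '\n'),
]
statements = parse(tokens)
print(len(statements))  # 2
```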

@ngjunsiang
Contributor Author

ngjunsiang commented Apr 14, 2022

Testing

import scanner
import parser
import interpreter

src = 'OUTPUT "Hello", ", ", "everyone", "!"'

tokens = scanner.scan(src)

statements = parser.parse(tokens)

for statement in statements:
    print(statement)

Result:

{
  'rule': 'output',
  'exprs': [
    {'type': 'string', 'word': '"Hello"', 'value': 'Hello'},
    {'type': 'string', 'word': '", "', 'value': ', '},
    {'type': 'string', 'word': '"everyone"', 'value': 'everyone'},
    {'type': 'string', 'word': '"!"', 'value': '!'}
  ]
}

@ngjunsiang
Contributor Author

Next steps

We're still missing the last step. This is where we execute() the statements, carrying out the stmts and evaluating the exprs we have painstakingly built up so far. This is the interpretation phase, and we have already written evaluate().
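
To give a feel for where this is heading, here is a speculative sketch of execute() for the 'output' rule. It is not part of this PR: evaluate() is replaced by a stand-in that only handles literals, render_output() is a hypothetical helper, and joining the values with no separator is an assumption about how OUTPUT should behave.

```python
def evaluate(expr):
    """Stand-in for the existing evaluate(): literals carry their own value."""
    return expr['value']

def render_output(stmt):
    """Hypothetical helper: evaluate each expr and join the results."""
    return ''.join(str(evaluate(expr)) for expr in stmt['exprs'])

def execute(stmt):
    """Carry out one stmt; only the 'output' rule is handled in this sketch."""
    if stmt['rule'] == 'output':
        print(render_output(stmt))
    else:
        raise ValueError(f"Unknown statement rule {stmt['rule']!r}.")

stmt = {
    'rule': 'output',
    'exprs': [
        {'type': 'string', 'word': '"Hello"', 'value': 'Hello'},
        {'type': 'string', 'word': '", "', 'value': ', '},
        {'type': 'string', 'word': '"everyone"', 'value': 'everyone'},
        {'type': 'string', 'word': '"!"', 'value': '!'},
    ],
}
execute(stmt)  # prints: Hello, everyone!
```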

@ngjunsiang ngjunsiang merged commit b59ffb0 into main Apr 14, 2022
@ngjunsiang ngjunsiang deleted the parser branch April 15, 2022 07:10