01a Scanning #1

ngjunsiang · 2022-04-14T07:46:01Z

Pseudocode 9608

9608 pseudocode is a somewhat inconsistent syntax meant for describing algorithms in GCE A Level Computing. Let's try to turn it into a real-ish programming language, implementing as many of its features as we can using basic concepts.

A sample of 9608 pseudocode is shown below.

DECLARE i : INTEGER
DECLARE String : STRING
i <- 0
WHILE i < 10 DO
    CASE OF i
        0: String <- "0"
        1: String <- "1"
        2: String <- "2"
        3: String <- "3"
        OTHERWISE String <- "large number can't count"
    ENDCASE
    OUTPUT String
    i <- i + 1
ENDWHILE

That's a lot for a program to interpret in one go. To Python, this is just a string of letters, digits, symbols, and whitespace (spaces and line breaks). We've got to break it up into groups of characters that represent meaningful things in 9608 pseudocode.

Lexical analysis

From a quick analysis, we can identify the following:

Keywords: These are words which seem to have special meaning in 9608 pseudocode: WHILE, CASE, ENDCASE, etc.
Integers: 0, 1, 2, 3, etc.
Strings: Unlike keywords, these have no special meaning to the language; they are just regular text. Strings are delimited by double quotes, like "large number can't count".
Symbols: +, <-, :, <, etc., which relate numbers and other chunks to each other.
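As a rough illustration of these categories, here is a minimal sketch that labels chunks which have already been separated. The KEYWORDS and SYMBOLS sets are assumptions collected by hand from the sample program (the real 9608 lists are longer), categorise is a hypothetical name, and the extra 'name' category covers words like i that are neither keywords nor strings.

```python
# Hypothetical sets, collected by hand from the sample program above.
KEYWORDS = {
    'DECLARE', 'INTEGER', 'STRING', 'WHILE', 'DO', 'ENDWHILE',
    'CASE', 'OF', 'OTHERWISE', 'ENDCASE', 'OUTPUT',
}
SYMBOLS = {'+', '<-', ':', '<'}


def categorise(chunk):
    """Return a category name for one chunk of pseudocode text."""
    if chunk in KEYWORDS:
        return 'keyword'
    if chunk.isdigit():
        return 'integer'
    if len(chunk) >= 2 and chunk.startswith('"') and chunk.endswith('"'):
        return 'string'
    if chunk in SYMBOLS:
        return 'symbol'
    # Anything else is treated as a name, e.g. a variable such as i.
    return 'name'


for chunk in ['WHILE', 'i', '<', '10', 'DO']:
    print(chunk, '->', categorise(chunk))
```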

Let's put this all in Python code. We need a process that turns the source text above (call it src) into chunks, which we shall call tokens. Since we are writing a program, that means we need explicit rules for doing so.

Scanning code into tokens

This is how I would describe that process to a computer:

Check the first character.

If it is a space, ignore it. It typically has no special meaning.
If it is a line break (detected as the special character \n), it forms its own token. Line breaks are used to mark the end of a statement, or to demarcate parts of statements.
If it is a letter, keep checking the next character until a non-letter is encountered. This gives us a word. The word is a keyword if it is found in KEYWORDS; otherwise it is a variable that may be used to refer to values.
If it is a digit (0-9), keep checking the next character until a non-digit is encountered. This gives us an integer.
If it is a double quote ("), keep checking the next character until another double quote is encountered. This gives us a string.
If it appears to be a symbol ... we'll figure that out in a bit.
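The decision made on the first character can be sketched as a small function. This is only an illustration of the rules above: rule_for and the rule names it returns are hypothetical, and anything that is not a space, line break, letter, digit, or double quote is assumed to start a symbol.

```python
def rule_for(ch):
    """Name the scanning rule that applies when `ch` is the first character."""
    if ch == ' ':
        return 'skip'
    if ch == '\n':
        return 'line break'
    if ch.isalpha():
        return 'word'
    if ch.isdigit():
        return 'integer'
    if ch == '"':
        return 'string'
    # Assumption: everything else starts a symbol such as + or <-
    return 'symbol'


for ch in [' ', '\n', 'W', '3', '"', '<']:
    print(repr(ch), '->', rule_for(ch))
```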

ngjunsiang · 2022-04-14T07:48:27Z

Helper functions [`a48c005`]

Throwing all the code we write into a single massive ball of script is going to make it difficult to see the abstract picture of what's going on. Let's make some helper functions for the following tasks:

check: returns the current character we are looking at.
consume: removes the current character from src, returning the rest of src.
atEnd: tells us whether we are at the end of src. True means there are no more characters to scan.
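A minimal sketch of what such helpers might look like follows; it is not the code from the commit. The names check, consume, and atEnd come from the list above, while make_code and the 'src' key are hypothetical. The helpers work on a small dict wrapper (the thread explains why further on), and in this sketch consume returns the removed character rather than the rest of src, since the shortened text stays available in the wrapper.

```python
def make_code(src):
    """Wrap the source text in a dict so the helpers can update it in place."""
    return {'src': src}


def atEnd(code):
    """True when there are no more characters to scan."""
    return len(code['src']) == 0


def check(code):
    """Return the current (first remaining) character without removing it."""
    return code['src'][0]


def consume(code):
    """Remove the current character from src and return that character."""
    char = check(code)
    code['src'] = code['src'][1:]
    return char


code = make_code('i <- 0')
while not atEnd(code):
    print(repr(consume(code)))
```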

ngjunsiang · 2022-04-14T07:50:54Z

Wrapping code

https://github.com/nyjc-computing/pseudo/blob/ccadc8dd31441f9e654debdca1f40f3314b580b0/scanner.py#L62

Python strings are immutable, so the helper functions cannot change the caller's source string directly. Instead, we wrap the string in an object (here, I use a dict) that lets them access and update the source code string.

In future, we will modify the scanner so that it no longer needs this wrapper. For now, let's keep it simple.
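The difference can be seen in a short experiment. The two function names below are hypothetical and exist only for this comparison: one receives a plain string, the other receives a dict holding the string under an assumed 'src' key.

```python
def consume_str(src):
    """Try to drop the first character of a plain string argument."""
    src = src[1:]  # rebinds the local name only; the caller's string is untouched
    return src


def consume_wrapped(code):
    """Drop the first character of the string held inside a dict."""
    code['src'] = code['src'][1:]  # updates the dict that the caller also holds


text = 'WHILE'
consume_str(text)
print(text)         # prints WHILE: the caller's string is unchanged

code = {'src': 'WHILE'}
consume_wrapped(code)
print(code['src'])  # prints HILE: the caller sees the shorter string
```

Without the wrapper, every helper would have to return the shortened string and every caller would have to remember to reassign it.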

ngjunsiang · 2022-04-14T07:52:48Z

https://github.com/nyjc-computing/pseudo/blob/ccadc8dd31441f9e654debdca1f40f3314b580b0/scanner.py#L61-L91

We keep the scanner looping as long as there are more characters, checking the current character each time. Based on that character, we pick a scanning function to pass code to, which tokenises a word, symbol, integer, or string. Then we add the resulting token to a list of tokens.

ngjunsiang · 2022-04-14T07:55:19Z

Scanning functions [`396ba6d`]

These scanning functions are responsible for recognising words, integers, strings, and symbols. They are invoked by scan(), and run until they detect a terminating condition.

ngjunsiang · 2022-04-14T07:56:58Z

Detect line breaks [`ccadc8d`]

The scanner cannot recognise line breaks yet. This change enables it to do so, returning line breaks as a '\n' token.
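Pulling the last few comments together, here is a minimal sketch of a main loop, scanning functions, and line-break tokens. It is not the code in the linked commits. Only scan, check, consume, and atEnd are names from the thread; word, integer, string, symbol, and the 'src' key are hypothetical, and '<-' is assumed to be the only two-character symbol.

```python
def atEnd(code):
    return code['src'] == ''


def check(code):
    return code['src'][0]


def consume(code):
    char = check(code)
    code['src'] = code['src'][1:]
    return char


def word(code):
    """Consume letters until a non-letter (or the end of src) is reached."""
    token = consume(code)
    while not atEnd(code) and check(code).isalpha():
        token += consume(code)
    return token


def integer(code):
    """Consume digits until a non-digit (or the end of src) is reached."""
    token = consume(code)
    while not atEnd(code) and check(code).isdigit():
        token += consume(code)
    return token


def string(code):
    """Consume from the opening double quote up to and including the closing one."""
    token = consume(code)  # opening quote
    while not atEnd(code) and check(code) != '"':
        token += consume(code)
    if atEnd(code):
        raise ValueError('unterminated string: ' + token)
    token += consume(code)  # closing quote
    return token


def symbol(code):
    """Consume one symbol, joining '<' and '-' into the assignment arrow."""
    token = consume(code)
    if token == '<' and not atEnd(code) and check(code) == '-':
        token += consume(code)
    return token


def scan(src):
    code = {'src': src}
    tokens = []
    while not atEnd(code):
        char = check(code)
        if char == ' ':
            consume(code)                 # spaces carry no meaning
        elif char == '\n':
            tokens.append(consume(code))  # line break is its own token
        elif char.isalpha():
            tokens.append(word(code))
        elif char.isdigit():
            tokens.append(integer(code))
        elif char == '"':
            tokens.append(string(code))
        else:
            tokens.append(symbol(code))
    return tokens


print(scan('i <- i + 1\nOUTPUT "ok"\n'))
```

Each scanning function consumes its first character unconditionally, because scan has already checked it, and then loops until its own terminating condition is met.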

ngjunsiang · 2022-04-14T07:58:08Z

Testing

Testing code:

from scanner import scan

src = """
DECLARE i : INTEGER
DECLARE String : STRING
i <- 0
WHILE i < 10 DO
    CASE OF i
        0: String <- "0"
        1: String <- "1"
        2: String <- "2"
        3: String <- "3"
        OTHERWISE String <- "large number can't count"
    ENDCASE
    OUTPUT String
    i <- i + 1
ENDWHILE
"""

tokens = scan(src)

print('Tokens:', tokens)

Sample output:

Tokens: ['\n', 'DECLARE', 'i', ':', 'INTEGER', '\n', 'DECLARE', 'String', ':', 'STRING', '\n', 'i', '<-', '0', '\n', 'WHILE', 'i', '<', '10', 'DO', '\n', 'CASE', 'OF', 'i', '\n', '0', ':', 'String', '<-', '"0"', '\n', '1', ':', 'String', '<-', '"1"', '\n', '2', ':', 'String', '<-', '"2"', '\n', '3', ':', 'String', '<-', '"3"', '\n', 'OTHERWISE', 'String', '<-', '"large number can\'t count"', '\n', 'ENDCASE', '\n', 'OUTPUT', 'String', '\n', 'i', '<-', 'i', '+', '1', '\n', 'ENDWHILE', '\n']
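Since line breaks mark the end of a statement, one way a later stage might use the '\n' tokens in this output is to group the flat list into lines. The sketch below is only an illustration under that assumption: split_statements is a hypothetical name, and multi-line constructs such as CASE ... ENDCASE would need more than this.

```python
def split_statements(tokens):
    """Group a flat token list into lines, using line-break tokens as separators."""
    statements = []
    current = []
    for token in tokens:
        if token == '\n':
            if current:  # skip blank lines
                statements.append(current)
            current = []
        else:
            current.append(token)
    if current:  # last line without a trailing line break
        statements.append(current)
    return statements


tokens = ['\n', 'DECLARE', 'i', ':', 'INTEGER', '\n', 'i', '<-', '0', '\n']
for statement in split_statements(tokens):
    print(statement)
```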

ngjunsiang and others added 3 commits April 14, 2022 07:34

Create main scanning loop and helper functions

a48c005

Add scanning functions

396ba6d

detect line break tokens

ccadc8d

ngjunsiang merged commit b3fdfc1 into main Apr 14, 2022

ngjunsiang changed the title ~~01 Scanning~~ 01a Scanning Apr 14, 2022
