01a Scanning #1
Helper functions [a48c005]
Throwing all the code we write into a single massive ball of script will make it difficult to see the big picture of what's going on. Let's make some helper functions for the following tasks:
Wrapping code
Because the functions cannot directly modify the source code string (Python strings are immutable, and reassigning a parameter inside a function does not change the caller's variable), we wrap the code in an object (here, I use a dict) that lets them access and modify it. In future we will change the scanner so it does not need this, but for now let's keep it simple.
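Here is a minimal sketch of the wrapping idea. It uses hypothetical helpers `peek` and `consume`, which are illustrative names and not necessarily the ones in commit a48c005. The dict is mutable, so a function that shortens `code['src']` changes what the caller sees.

```python
def peek(code):
    """Return the next character without removing it ('' if nothing is left)."""
    return code['src'][:1]


def consume(code, n=1):
    """Remove and return the next n characters of the wrapped source."""
    taken = code['src'][:n]
    code['src'] = code['src'][n:]  # mutate the dict so the caller sees the change
    return taken


code = {'src': 'WHILE x'}
first_word = consume(code, 5)
print(first_word, repr(code['src']))  # WHILE ' x'
```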
We loop the scanner as long as there are characters left, checking the first character each time. We pick a scanning function to pass …
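The loop can be sketched roughly as follows, assuming the code is wrapped in a dict as described above. The scanning functions here are hypothetical and deliberately bare. Symbols are taken one character at a time, and strings are not handled, so the dispatch on the first character stays easy to see.

```python
def scan_word(code):
    """Consume a word: a letter followed by letters, digits or underscores."""
    src = code['src']
    i = 1
    while i < len(src) and (src[i].isalnum() or src[i] == '_'):
        i += 1
    code['src'] = src[i:]
    return src[:i]


def scan_int(code):
    """Consume a run of digits."""
    src = code['src']
    i = 1
    while i < len(src) and src[i].isdigit():
        i += 1
    code['src'] = src[i:]
    return src[:i]


def scan_symbol(code):
    """Consume one character as a symbol (multi-character symbols not handled)."""
    token = code['src'][0]
    code['src'] = code['src'][1:]
    return token


def scan(src):
    """Loop while characters remain, choosing a scanning function by the first one."""
    code = {'src': src}
    tokens = []
    while code['src']:
        first = code['src'][0]
        if first in ' \t':
            code['src'] = code['src'][1:]  # blanks are skipped, not tokens
        elif first.isalpha():
            tokens.append(scan_word(code))
        elif first.isdigit():
            tokens.append(scan_int(code))
        else:
            tokens.append(scan_symbol(code))
    return tokens


print(scan('count + 12'))  # ['count', '+', '12']
```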
Scanning functions [396ba6d]
These scanning functions are responsible for recognising words, integers, strings, and symbols. They are invoked by the scanner loop.
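Below is a hedged sketch of two of these functions; the commit's actual code may differ. `scan_string` keeps the quotes so that later stages can tell a string from a word. `scan_symbol` tries longer symbols first so that `<-` is not split into `<` and `-`. The `SYMBOLS` list is an assumption about which 9608 operators to support.

```python
# Hypothetical symbol table; longer entries are tried first in scan_symbol.
SYMBOLS = ['<-', '<=', '>=', '<>', '+', '-', '*', '/', '=', '<', '>',
           ':', '(', ')', ',', '&']


def scan_string(code):
    """Consume a double-quoted string, returning it with both quotes."""
    src = code['src']
    end = src.find('"', 1)  # search for the closing quote after the opening one
    if end == -1:
        raise SyntaxError('unterminated string')
    code['src'] = src[end + 1:]
    return src[:end + 1]


def scan_symbol(code):
    """Consume the longest symbol in SYMBOLS that the source starts with."""
    src = code['src']
    for sym in sorted(SYMBOLS, key=len, reverse=True):  # longest match first
        if src.startswith(sym):
            code['src'] = src[len(sym):]
            return sym
    raise SyntaxError(f'unknown symbol {src[0]!r}')


code = {'src': '<-3'}
print(scan_symbol(code), code['src'])  # <- 3
```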
Detect line breaks [ccadc8d]
The scanner cannot recognise line breaks yet. This change enables it to do so, returning each line break as a '\n' token.
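One way to let line breaks through is sketched below, using hypothetical helpers `skip_blanks` and `scan_linebreak` on the dict-wrapped source. Spaces and tabs are discarded, while a line break is turned into a single `'\n'` token. Treating a Windows-style `\r\n` as one break is an assumption about the input files.

```python
def skip_blanks(code):
    """Drop leading spaces and tabs, stopping at a line break or any other character."""
    code['src'] = code['src'].lstrip(' \t')


def scan_linebreak(code):
    """Consume one line break (LF or CRLF) and return it as a newline token.

    Assumes the next character is a line break.
    """
    src = code['src']
    code['src'] = src[2:] if src.startswith('\r\n') else src[1:]
    return '\n'


code = {'src': '  \t\r\nx <- 1'}
skip_blanks(code)
token = scan_linebreak(code)
print(repr(token), repr(code['src']))  # '\n' 'x <- 1'
```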
Testing
Testing code:
Sample output:
Pseudocode 9608
9608 is a somewhat-inconsistent syntax that is meant to describe algorithms for GCE A Level Computing. Let's try to turn it into a real-ish programming language, implementing as many of its features as we can with basic concepts.
A sample of 9608 pseudocode is shown below.
It's a lot for a program to interpret in one go. To Python, this is just a string of letters, numbers, symbols, and whitespace (spaces and line breaks). We've got to break it up into groups of characters that represent things in 9608 pseudocode.
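Why not just split on spaces? This quick Python check on made-up one-line snippets shows why we need proper rules. `str.split()` only works when every chunk happens to be surrounded by whitespace. It also breaks a string that contains spaces into pieces.

```python
# Splitting on whitespace only works when every token is surrounded by spaces.
spaced = 'count <- count + 1'.split()
tight = 'count<-count+1'.split()
quoted = 'x <- "large number can\'t count"'.split()

print(spaced)  # ['count', '<-', 'count', '+', '1']
print(tight)   # ['count<-count+1']
print(quoted)  # ['x', '<-', '"large', 'number', "can't", 'count"']
```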
Lexical analysis
From a quick analysis, we can identify the following:
- keywords such as `WHILE`, `CASE`, `ENDCASE`, etc.
- numbers such as `0`, `1`, `2`, `3`, etc.
- strings such as `"large number can't count"`
- symbols such as `+`, `<-`, `:`, etc., which appear to relate numbers and other chunks to each other.

Let's put this all in Python code. We need a process by which we turn `src` above into chunks, which we shall call tokens. Since we are writing a program, that means we need rules for doing so.

Scanning code into tokens
This is how I would describe that process to a computer:
Check the first character:

- If it is a line break (`\n`), that forms its own word. (This is used to mark the end of a statement, or demarcate parts of statements.)
- If it is a digit (`0-9`), keep checking the next character until a non-digit is encountered. This gives us an integer.
- If it is a double-quote (`"`), keep checking the next character until another double-quote is encountered. This gives us a string.

A word may be a keyword if it is found in `KEYWORDS`; otherwise it is a variable that may be used to refer to values.
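The rules above can be sketched as a single function. This is a hedged sketch, not the code from the commits, and it makes several assumptions:

- It walks the string with an index instead of the dict wrapper.
- The `KEYWORDS` set is a hypothetical subset.
- A word is assumed to start with a letter and continue with letters, digits or underscores.
- Only `<-` is treated as a two-character symbol.

```python
# Hypothetical subset of 9608 keywords; a real KEYWORDS list would be longer.
KEYWORDS = {'WHILE', 'DO', 'ENDWHILE', 'CASE', 'OF', 'OTHERWISE', 'ENDCASE',
            'IF', 'THEN', 'ELSE', 'ENDIF', 'DECLARE', 'INPUT', 'OUTPUT'}


def scan(src):
    """Split src into (kind, text) tuples using the first-character rules."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in ' \t\r':
            # Blanks separate tokens but are not tokens themselves.
            i += 1
        elif ch == '\n':
            # A line break forms its own token.
            tokens.append(('linebreak', '\n'))
            i += 1
        elif ch.isdigit():
            # 0-9: keep reading until a non-digit is found.
            start = i
            while i < len(src) and src[i].isdigit():
                i += 1
            tokens.append(('integer', src[start:i]))
        elif ch == '"':
            # Keep reading until the closing double-quote (kept in the token).
            end = src.find('"', i + 1)
            if end == -1:
                raise SyntaxError('unterminated string')
            tokens.append(('string', src[i:end + 1]))
            i = end + 1
        elif ch.isalpha():
            # Assumed word rule: a letter followed by letters, digits or '_'.
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == '_'):
                i += 1
            word = src[start:i]
            kind = 'keyword' if word in KEYWORDS else 'variable'
            tokens.append((kind, word))
        elif src.startswith('<-', i):
            # The assignment arrow is the one two-character symbol handled here.
            tokens.append(('symbol', '<-'))
            i += 2
        else:
            # Anything else is taken as a one-character symbol.
            tokens.append(('symbol', ch))
            i += 1
    return tokens


print(scan('WHILE count < 10\n  count <- count + 1'))
```

Tagging each token with its kind (`'keyword'`, `'variable'`, `'integer'`, and so on) is one design choice among several. It saves later stages from re-inspecting the token text to work out what it is.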