
How Programming Language Internals Work

Daniel Grace edited this page Jan 8, 2016 · 1 revision

Tokenizing

The first thing the bot does when it receives programming input is to "tokenize" it. This means turning your one long string into a list of tokens, each representing a distinct element of the program. For instance, echo "Yes"; would be turned into three tokens: ECHO, QUOTED_STRING, and SEMICOLON. An error during tokenizing will often produce an error message that looks similar to the following:

If you use !< instead of !!, the system will tokenize your input, do a small amount of cleanup, and then send you the token list. If your program is short enough that the token list fits within the 2,000-character limit, this is a good way to debug.
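The tokenizing step can be sketched with Python's re module. This is a minimal sketch under stated assumptions: the token kinds ECHO, QUOTED_STRING, PLUS, SEMICOLON and LEFT_ANGLE come from this page, but the regular expressions, the Token type, TokenizeError and tokenize are hypothetical names for illustration, not the bot's actual code. It also assumes a QUOTED_STRING token stores its text without the surrounding quotes.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token table: (kind, regex). The real bot's token set is larger.
TOKEN_SPEC = [
    ("ECHO", r"echo\b"),
    ("QUOTED_STRING", r'"[^"]*"'),
    ("PLUS", r"\+"),
    ("SEMICOLON", r";"),
    ("LEFT_ANGLE", r"<"),
    ("WHITESPACE", r"\s+"),
]
# One combined regex; each alternative is a named group, so match.lastgroup
# tells us which token kind matched.
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


class TokenizeError(Exception):
    pass


def tokenize(source: str) -> List[Token]:
    """Turn one long input string into a flat list of tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No token kind starts here: this is a tokenizing error.
            raise TokenizeError(f"Unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        text = match.group()
        if kind == "QUOTED_STRING":
            text = text[1:-1]  # assumption: keep only the unquoted value
        if kind != "WHITESPACE":  # whitespace separates tokens but is dropped
            tokens.append(Token(kind, text))
        pos = match.end()
    return tokens
```

For example, print([t.kind for t in tokenize('echo "Yes";')]) prints ['ECHO', 'QUOTED_STRING', 'SEMICOLON'], matching the three tokens described above.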

Split into lines and blocks

Next, the bot splits this list of tokens into lines and blocks. This process is mostly boring and rarely goes wrong; it is included here only for completeness.
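The line-splitting part of this step might look roughly like the sketch below. It assumes, since this page does not say, that a SEMICOLON token ends a line and stays attached to it (the terminating pattern ECHO, QUOTED_STRING, SEMICOLON needs it). How the real bot detects blocks is not described, so blocks are left out; Token and split_into_lines are hypothetical names.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


def split_into_lines(tokens: List[Token]) -> List[List[Token]]:
    """Group a flat token list into lines, each ending with its SEMICOLON.

    Assumption: SEMICOLON ends a line. Block handling is not modeled here.
    """
    lines: List[List[Token]] = []
    current: List[Token] = []
    for token in tokens:
        current.append(token)
        if token.kind == "SEMICOLON":
            lines.append(current)
            current = []
    if current:
        # Trailing tokens with no SEMICOLON still form a (probably invalid) line,
        # so pattern matching can report them instead of silently dropping them.
        lines.append(current)
    return lines
```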

Pattern Matching

This is the real core of how the programming language works. Each line is processed in turn by repeatedly finding patterns until the line is empty.

Terminating Patterns

Some patterns are "terminating patterns", which signal that the line is complete. For instance, ECHO, QUOTED_STRING, SEMICOLON is a terminating pattern: if it is found, we echo the value of the QUOTED_STRING and end the command. If no pattern applies and some tokens are still left, you will receive an error message that looks similar to this:

Got to end of command without being able to process it completely. Here's what's left: LEFT_ANGLE: <

Sometimes it is easy to see what is wrong when you get this sort of message. Sometimes it's not. I'm sorry and am looking at ways to make debugging easier.
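A minimal sketch of checking a line against the one terminating pattern named above, and of producing the leftover-tokens error when nothing matches. Token, CommandError and run_terminating are hypothetical names. The error wording is copied from the example message, but how the real bot formats several leftover tokens is an assumption, as is storing QUOTED_STRING values without their quotes.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str  # assumption: QUOTED_STRING text is the unquoted value


class CommandError(Exception):
    pass


def run_terminating(line: List[Token], output: List[str]) -> None:
    """Echo the string if the line is exactly ECHO, QUOTED_STRING, SEMICOLON."""
    kinds = [t.kind for t in line]
    if kinds == ["ECHO", "QUOTED_STRING", "SEMICOLON"]:
        output.append(line[1].text)  # echo the value and end the command
        return
    # No pattern applied and tokens remain: report what is left.
    leftover = ", ".join(f"{t.kind}: {t.text}" for t in line)
    raise CommandError(
        "Got to end of command without being able to process it completely. "
        f"Here's what's left: {leftover}"
    )
```

A line consisting only of a LEFT_ANGLE token raises CommandError whose message ends with "Here's what's left: LEFT_ANGLE: <", like the example error above.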

Non-terminating Patterns

Many patterns simply replace themselves with other tokens. For instance, QUOTED_STRING, PLUS, QUOTED_STRING replaces itself with a single QUOTED_STRING containing the two input strings joined together.
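The string-joining pattern can be sketched as a rewrite that finds the first QUOTED_STRING, PLUS, QUOTED_STRING run in a token list and replaces those three tokens with one. Token and apply_concat are hypothetical names, and QUOTED_STRING tokens are assumed to carry their unquoted text.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    kind: str
    text: str  # assumption: QUOTED_STRING text is the unquoted value


def apply_concat(tokens: List[Token]) -> Optional[List[Token]]:
    """Replace the first QUOTED_STRING, PLUS, QUOTED_STRING with one QUOTED_STRING.

    Returns the rewritten token list, or None if the pattern is not present.
    """
    for i in range(len(tokens) - 2):
        a, op, b = tokens[i:i + 3]
        if (a.kind, op.kind, b.kind) == ("QUOTED_STRING", "PLUS", "QUOTED_STRING"):
            merged = Token("QUOTED_STRING", a.text + b.text)
            return tokens[:i] + [merged] + tokens[i + 3:]
    return None
```

Returning None rather than the unchanged list lets a caller tell "pattern applied" apart from "nothing to do", which is what decides whether to keep looking for patterns.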

Putting together what we now know about terminating and non-terminating patterns, we can trace how ECHO, QUOTED_STRING, PLUS, QUOTED_STRING, SEMICOLON goes through the system. First, we check whether it matches ECHO, QUOTED_STRING, SEMICOLON and find that it doesn't. Then we look for QUOTED_STRING, PLUS, QUOTED_STRING and find that this pattern matches partway into the input, starting at the second token. The input therefore becomes ECHO, QUOTED_STRING, SEMICOLON, which now matches the terminating pattern, so the combined string is echoed.
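The walkthrough above can be sketched as a small rewrite loop, assuming the pattern set contains only the two patterns discussed on this page. Token, CommandError and run_line are hypothetical names, and QUOTED_STRING tokens are assumed to carry their unquoted text.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str  # assumption: QUOTED_STRING text is the unquoted value


class CommandError(Exception):
    pass


def run_line(tokens: List[Token]) -> str:
    """Apply patterns to one line until the terminating pattern fires.

    Hypothetical pattern set: terminating ECHO, QUOTED_STRING, SEMICOLON and
    non-terminating QUOTED_STRING, PLUS, QUOTED_STRING.
    """
    tokens = list(tokens)
    while True:
        kinds = [t.kind for t in tokens]
        # 1. Try the terminating pattern against the whole line.
        if kinds == ["ECHO", "QUOTED_STRING", "SEMICOLON"]:
            return tokens[1].text
        # 2. Otherwise look for the non-terminating pattern anywhere in the line.
        for i in range(len(tokens) - 2):
            if kinds[i:i + 3] == ["QUOTED_STRING", "PLUS", "QUOTED_STRING"]:
                merged = Token("QUOTED_STRING", tokens[i].text + tokens[i + 2].text)
                tokens[i:i + 3] = [merged]
                break  # rewrite done; start matching again from step 1
        else:
            # 3. No pattern matched: report the leftover tokens.
            leftover = ", ".join(f"{t.kind}: {t.text}" for t in tokens)
            raise CommandError(
                "Got to end of command without being able to process it completely. "
                f"Here's what's left: {leftover}"
            )
```

For the tokens ECHO, QUOTED_STRING ("Yes"), PLUS, QUOTED_STRING (" please"), SEMICOLON, run_line first fails the terminating check, merges the two strings at the second token, then matches the terminating pattern and returns "Yes please". Each rewrite shortens the line by two tokens, so the loop always ends in either a return or an error.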
