Write yourself a Brainfuck in an hour
This is a basic tutorial on how to create a fully featured Brainfuck interpreter in Haskell. Brainfuck is a (Turing complete!) programming language that aims at having very little syntax, and because of its simple instruction set it is very easy to implement.
This tutorial is very basic: having read LYAH is good enough. In fact, we'll see that we don't even need any Functors/Applicatives/Monads - everything will be pattern-matching, custom data types, and of course function application. To be more specific, we'll be using LYAH chapters 1-6, 8, very little of 9, and the beginning of 14.
The tutorial is divided into three parts. First, we'll create a data type to represent Brainfuck source code, and how to convert Brainfuck source code into this format. In the second part, we'll model the tape and convert the Brainfuck source code into a more suitable representation. The final section will be evaluation, in which we walk around on the tape doing the operations the source dictates.
Full language specification
Brainfuck source consists of a sequence of control characters, an infinitely long tape filled with zeros, and the so-called data pointer which points to the initial position on the tape. The current symbol in the code is read and executed, then the next symbol is looked at. This is repeated until the last instruction has been handled.
>moves the data pointer one field to the right.
<moves the data pointer one field to the left.
+adds 1 to the current field.
-subtracts 1 to the current field.
.prints the character that corresponds to the value of the current field (ASCII).
,reads a single character from the keyboard and stores its value in the current cell.
[if the current value is zero, then jump forward to the command after the matching ] command.
]if the current value is nonzero, then jump back to the command after the matching [ command.
- Any other character is treated as a comment.
That's it. Note how the only possible syntax error is mismatching brackets.
A few examples of Brainfuck code
To give you a glimpse at the abomination we're about to create:
[-]clears a cell containing a positive value: When the
[is reached, the cell is either zero already (in which case the program jumps after the matching
], i.e. the program terminates), or nonzero. In the nonzero case, the
[is simply ignored,
-is evaluated and decrements the current cell by one, and
]is reached. If the cell is zero now then ignore the
]and terminate, otherwise jump back to after the
[. From this you can see how
[ ]acts as a loop.
>[-]<[->+<]moves the data from one cell one cell to the right (overwriting its value). First, the data pointer moves one to the right with
>, then there's the "clear cell" command
[-]you know from above, and the pointer moves back. What we have now is a cell with some content, which has an empty neighbour to its right. Now the second loop starts: decrement the current cell, move one to the right, increment neighbour cell, move left again. This loop's body takes the content of the left cell and puts it into the right cell, one by one, until the left cell is empty. In the end, the pointer will be on the empty left cell, while the neighbour cell is filled with the data.
To print the letter "a", which has ASCII value 97, use
++++++++++[>++++++++++<-]>---.. This initializes cell 0 with 10, and then increments cell 1 ten times by another 10, giving us 100 in cell 1. Finally we subtract 3 and print the result. The standard "Hello World" program is built based on this principle.
As you can see, the programs are not very readable. How lucky we are that we're not trying to write but to implement Brainfuck, which is much easier.
Part 1: Brainfuck type, and how to parse (to) it
For the data type, we'll simply create one that has a constructor for each syntactic element:
data BrainfuckCommand = GoRight -- > | GoLeft -- < | Increment -- + | Decrement -- - | Print -- . | Read -- , | LoopL -- [ | LoopR -- ] | Comment Char -- anything else
Let's also create a type synonym to represent an entire Brainfuck program,
type BrainfuckSource = [BrainfuckCommand]
Our goal is now taking a string like
[-] and convert it into the Haskell
[LoopL, Decrement, LoopR]. Since we're traversing the input string
character by character (because each instruction is just one character long),
we can just use
map to convert between String and
parseBrainfuck :: String -> BrainfuckSource parseBrainfuck = map charToBF where charToBF x = case x of '>' -> GoRight '<' -> GoLeft '+' -> Increment '-' -> Decrement '.' -> Print ',' -> Read '[' -> LoopL ']' -> LoopR c -> Comment c
That's pretty much it for this part. Note how anything that does not match a valid command is interpreted as a comment in the last line.
Here are some improvements you could make in your code. (Don't worry if you don't do them, the rest of the tutorial will not take them into account.)
- (easy) Comments are not needed for evaluation, so we could just drop them
from the source when we parse it. Look up what
mapMaybedoes (for example using Hoogle) and use it to replace the
parseBrainfuckso that comments are ignored.
(easy) You cannot write a
Showinstance for a type synonym like
BrainfuckSource(why?). Rewrite it using a
datadeclaration to a type that holds a
[BrainfuckCommand], and define a
Showinstance for that new type that prints the contained source. For example,
show $ BFSource [LoopL, Decrement, LoopR]should yield "[-]" in GHCi.
(medium) The only possible syntax error in Brainfuck code is mismatching parentheses. For example
[-has an opening but no matching closing parenthesis, and
-]has a closing parenthesis but no corresponding opening one. Write a function
BrainfuckSource -> Maybe BrainfuckSourcethat returns
Nothingif the code is invalid, and otherwise
Just <valid code>. Use this function to modify
parseBrainfuckto reject invalid code, which then has the new type
String -> Maybe BrainfuckSource.
(hard) Modify the
Maybe BrainfuckSourcetype to
Either String BrainfuckSource. Correct code should yield
Right <valid code>, while incorrect one should result in a
Leftvalue telling the user about the problem, for example "mismatched opening parenthesis".
(hard) Make the error messages from above better: make the error messages tell the user about the position of the offending parenthesis, for example "the n-th source character is a closing parenthesis without an opening one".
Part 2: The tape
The tape is a long line of cells, each holding a number. The easiest type in
Haskell to represent a number is a list, but remember what we want to do with
the tape: traverse it in both directions, and update elements (potentially deep
down) frequently, both things lists are particularly bad at: traversal in both
directions is simply not possible (lists only go one way), and updating one
element requires traversing the entire list up to that element, ditching the
traversed spine (all the
: we encountered on the way there), do the change,
and then recreate all the
: we just got rid of. That's O(n) for random
access, which is also very awful.
But maybe we can use lists in some way, just not straightforward as
want our tape to have a "middle", i.e. it has an element we're currently
looking at, and then we also have to put the elements we're not looking at
somewhere. There's going to be a "left of the middle" and "right of the middle"
part to that, and taken together that's our new type:
data Tape a = Tape [a] -- Left of the pivot element a -- Pivot element [a] -- Right of the pivot element
Now we've got our tape, but there's nothing we can do with it beyond making one. But wait, we'll need to do that anyway - the program starts with an empty tape, which has a pivot of zero, and infinite zeros in both directions:
emptyTape :: Tape Int emptyTape = Tape zeros 0 zeros where zeros = repeat 0
As a nice benefit for Haskell's laziness, you get infinite tape both to the left and right for free, and the compiler will worry about dealing with the details.
Alright, what else do we need? We want to move left and right on the tape
<> do). The
moveRight function does just that: it takes one
element from the right list and puts it in focus, and puts the previous pivot
on the left list.
moveLeft is the same but the other way round:
moveRight :: Tape a -> Tape a moveRight (Tape ls p (r:rs)) = Tape (p:ls) r rs moveLeft :: Tape a -> Tape a moveLeft (Tape (l:ls) p rs) = Tape ls l (p:rs)
So that's the tape all encoded in Haskell. You probably guessed already that
moveRight is related what
> does, but that's part of the next section.
In the finished interpreter, we'll have to
Tape objects: one for the data
tape (the one with the numbers on it), and one for the source code (because
when we encounter a
] we have to walk backwards in the source code). While
the data tape is infinite, the source tape is finite and starts out with an
empty "left" side.
(easy) I mentioned you don't need
Functorfor this tutorial, but in case you want to have a go at it: write a
Tape(don't forget to check the laws).
moveRightfunctions have a problem: for some valid (i.e. well-typed) tapes they misbehave. In particular consider that, as mentioned above, the instruction tape is finite. What happens when we reach the end and focus right again? What would be ways of fixing the problem?
(medium) Since the data tape is always infinite, a list is not absolutely the right type for it - it allows an empty list. A better representation for this would be a
Streamtype, which is identical to lists except that it has no "empty" element, i.e. all values are of infinite length. Implement that type, and a
repeatfunction for it analogous to
Data.List.repeat, that takes an element and computes that element infinitely repeated in a
(hard) Modify the
Tapetype so it takes the type of the container to use as a type argument, allowing you to create a tape with
Streamin them. For each one, write
moveRight. The list-based type will suffer from the issues raised in the exercise above, but what about the
Stream-based tape rotations?
Part 3: Evaluation
Alright, the tools are finished, time to get the actual evaluation to work!
Let's think about the type of the
runBrainfuck function we'd like to write.
It takes Brainfuck source which we've conveniently parsed to
in part 1, and all it will do is read single characters (when encountering a
, in the source) or print them (
.), which are IO operations. Therefore, the
type we're looking at is
BrainfuckSource -> IO ().
BrainfuckSource is a list, which we've declared unsuitable for
representing our data! What to do? Well, write the list onto a
runBrainfuck :: BrainfuckSource -> IO () runBrainfuck = run emptyTape . bfSource2Tape where bfSource2Tape (b:bs) = Tape  b bs -- (`run` is defined below)
We've already added the
run function, which evaluates one instruction, and
starts off with an empty tape. We'll now construct this function piece by
piece. First of all, the type of
run should be so that it takes the data
Tape and the instruction
Tape as arguments so it can work with them:
-- Interpret the command currently focussed on the instruction tape run :: Tape Int -- Data tape -> Tape BrainfuckCommand -- Instruction tape -> IO ()
Now let's evaluate our first instruction,
GoRight! What should it do to the
data tape? Well, nothing besides moving the pivot:
run dataTape source@(Tape _ GoRight _) = advance (moveRight dataTape) source run dataTape source@(Tape _ GoLeft _) = advance (moveLeft dataTape) source
< already handled: when encountered, the focus on the data
tape will move one cell to the right or left, respectively. Up next, we need to
move on on the source tape, because otherwise we'd be interpreting the same
instruction over and over. That's what
advance is for.
advance :: Tape Int -- Data tape -> Tape BrainfuckCommand -- Instruction tape -> IO () advance dataTape (Tape _ _ ) = return () advance dataTape source = run dataTape (moveRight source)
Note the first case, which is invoked when we run out of source code, i.e. reach the end of the program, in which case we just terminate instead of recursing on.
Now that that's covered, let's move on to the next two instructions, addition and subtraction:
run (Tape l p r) source@(Tape _ Increment _) = advance (Tape l (p+1) r) source run (Tape l p r) source@(Tape _ Decrement _) = advance (Tape l (p-1) r) source
Those were the two dead simple ones, now for the two IO operations.
read the value of the pivot and print its corresponding character; the latter
is done with
Data.Char.chr, which you will have to import manually. For
technical reasons you should also
import System.IO (hFlush, stdout), which we
will use to get around buffering issues (if you don't understand why this is
necessary: it's an IO thing to print characters as soon as we say it should,
just ignore the associated lines and you'll be fine).
run dataTape@(Tape _ p _) source@(Tape _ Print _) = do putChar (chr p) hFlush stdout advance dataTape source
And similarly we'll implement
chr which is inverse to
gives us an Int associated with a
run dataTape@(Tape l _ r) source@(Tape _ Read _) = do p <- getChar advance (Tape l (ord p) r) source
Now for the last part: the looping constructs. Those are slightly trickier
because we have to keep track of how many sub-loops we've encountered so we
find the right matching braces. With
seekLoopX still undefined, we can at
least write down how to react to
run dataTape@(Tape _ p _) source@(Tape _ LoopL _) -- If the pivot is zero, jump to the -- corresponding LoopR instruction | p == 0 = seekLoopR 0 dataTape source -- Otherwise just ignore the `[` and continue | otherwise = advance dataTape source run dataTape@(Tape _ p _) source@(Tape _ LoopR _) | p /= 0 = seekLoopL 0 dataTape source | otherwise = advance dataTape source
What's left now is how to encode the
seekLoopX functions. Conceptually, they
should move along the source tape until a matching brace is found, and then
just continue normal evaluation. The first parameter encodes the braces nesting
level we're at to find matching braces - if we find another two opening
after the first one we encounter, we'll have to move over another two
-- Move the instruction pointer left until a "[" is found. -- The first parameter ("b" for balance) retains the current -- bracket balance to find the matching partner. When b is 1, -- then the found LoopR would reduce the counter to zero, -- hence we break even and the search is successful. seekLoopR :: Int -- Parenthesis balance -> Tape Int -- Data tape -> Tape BrainfuckCommand -- Instruction tape -> IO () seekLoopR 1 dataTape source@(Tape _ LoopR _) = advance dataTape source seekLoopR b dataTape source@(Tape _ LoopR _) = seekLoopR (b-1) dataTape (moveRight source) seekLoopR b dataTape source@(Tape _ LoopL _) = seekLoopR (b+1) dataTape (moveRight source) seekLoopR b dataTape source = seekLoopR b dataTape (moveRight source) seekLoopL :: Int -- Parenthesis balance -> Tape Int -- Data tape -> Tape BrainfuckCommand -- Instruction tape -> IO () seekLoopL 1 dataTape source@(Tape _ LoopL _) = advance dataTape source seekLoopL b dataTape source@(Tape _ LoopL _) = seekLoopL (b-1) dataTape (moveLeft source) seekLoopL b dataTape source@(Tape _ LoopR _) = seekLoopL (b+1) dataTape (moveLeft source) seekLoopL b dataTape source = seekLoopL b dataTape (moveLeft source)
And finally we must not forget evaluating comments of course, but that one's trivial, because a comment just doesn't do anything:
run dataTape source@(Tape _ (Comment _) _) = advance dataTape source
And there you have it, a fully featured Brainfuck interpreter! To use it,
simply provide the source to
runBrainfuck . parseBrainfuck, e.g. by
specifying a source file with
readFile "filename.bf" >>= runBrainfuck . parseBrainfuck. Try
Hello World from Wikipedia!
bfSource2Tapefunction won't work when you give it the valid program represented by an empty string. What would be an easy way of fixing this? Hint:
Comments don't do anything and may save you from adding a
(easy) The call to
chrproduces an error when applied to a negative element, and you probably won't want to print character number 9001 in Brainfuck ever anyway. How could you modify the command to constrain the output to ASCII?
(medium) Refactor the code! To not drown the text in nifty details, the functions implemented in part 3 are very verbose. You can eliminate a lot of common cases among those functions. For example consider how a
[if the pivot is 0 is the same as a comment, or how similar all but the first case for
There are many ways in which this interpreter could be improved.
(medium) If you've done the exercises of parts 1 and 2, you can incorporate their results into the final code.
(medium) Instead of having
LoopRas primitives, you could replace them by a type
Loop BrainfuckSource, representing the whole loop and the body. Evaluating such a
Loopcould jump to the beginning of the contained code much easier: there's no need to walk around the source to find the matching brace anymore. This eliminates the need for the
seekLoopXfunctions and allows the source code to be stored in a normal list instead of using
Tape. Note that this also makes execution faster: now jumping back
ninstructions is O(1) instead of O(n)!
(open-ended) Various optimizations: You could combine multiple uses of
Incrementso that instread of adding 1 five times, you could just add a single 5; successive uses of
-cancel out, as do
<. And then there are higher-level optimizations as well of course, such as rewriting
[-]to "set to zero or error if cell content is negative".