updated README

1 parent 7aa6fa7 commit 0298c9bbf3405153628443d1cc23439ad6a7584c @samsonjs committed Jan 20, 2010
Showing with 71 additions and 182 deletions.
  1. +71 −182 README.md
253 README.md
@@ -2,7 +2,7 @@ sjs<br>
[sami.samhuri@gmail.com](mailto:sami.samhuri@gmail.com)<br>
published : 2009-09-22<br>
-updated : 2009-09-24
+updated : 2010-01-19
Overview
@@ -13,91 +13,81 @@ on the tutorial "Let's Build a Compiler" by Jack Crenshaw[1].
[1]: http://compilers.iecc.com/crenshaw/
-The semantics are simple and familiar to all programmers. Eager
+The semantics are simple and familiar to most programmers. Eager
evaluation, assignment with the equals sign, arithmetic using + - *
-and /, loops, if/else statement, etc. Integers are the only data type.
+and /, loops, if/else statements, etc. Integers are the only data
+type.
+While the parser still closely resembles Crenshaw's recursive descent
+parser, the back-end generates x86 machine code using a homegrown
+assembler in ~1000 lines of Ruby (just 650 lines of real code).
NOTE: OS X is the only platform that compiles working binaries right
-now. ELF support for Linux coming soon.
+now. ELF support for Linux coming ... eventually.
-Compiling
-========
+Pre-requisites
+==============
OS X
----
-You need gcc, so install Xcode or use MacPorts to build gcc.
-
+You need Ruby and gcc. Ruby is standard on Macs but you'll need to
+install Xcode for gcc. You can also compile it yourself or use
+MacPorts, or [homebrew](http://github.com/mxcl/homebrew).
Linux
-----
-You need ruby and ld which lives in the binutils package.
+You need Ruby and ld - which lives in the binutils package.
% sudo aptitude install ruby binutils
-That's it! The assembler is included in ~900 lines of Ruby (including
-comments).
+That's it!
-You should be fine letting the build script detect your platform. If
-not append 'elf' or 'macho' to the command.
+Compiling
+=========
- % ./build.rb filename.code [elf | macho]
+The build script should detect your platform. If not, append 'elf' or
+'macho' to the command.
-The resulting native executable will be called 'filename' and you
-should be able to run it directly.
+ % ./build.rb filename.code [elf | macho]
- % ./filename <return>
+The resulting native executable is called 'filename' and you should be
+able to run it directly.
+ % ./filename
Syntax in 2 minutes
===================
-The recursive descent parser starts by parsing a block of code. A
-block consists of zero or more statements. Whitespace is largely
-ignored beyond delimiting tokens so statements can be grouped on one
-line or spread out over multiple lines. With no explicit terminator
-this can look strange so we will see how it works out when the syntax
-evolves into something more complicated.
+The parser starts by parsing a block of code. A block consists of one
+or more statements. Whitespace is largely ignored beyond delimiting
+tokens, so statements can be grouped on one line or spread out over
+multiple lines. With no explicit terminator this can look strange so
+we will see how it works out when the syntax evolves into something
+more complicated.
-There are no functions or function calls, no closures, arrays, hashes,
-or anything else you can think of.
+There are variables and integers. That's honestly about it. There
+are no functions or function calls, no closures, arrays, hashes, or
+anything else.
Supported statements are:
- * assignment
- e.g. foo = 4096
-
- * if/else
- e.g. if x < 0 a=0 else a=1 end
-
- * while
- e.g. while x > 0 x=x-1 end
-
- * until
- e.g. until x == 0 x=x-1 end
-
- * break
- e.g. break
-
- * repeat
- e.g. repeat x=x-1 if x == 0 break end end
-
- * for
- e.g. for i=1 to 5 x=x+1 end
-
- * do
- e.g. do 5 x=x+1 end
-
- * print
- e.g. a=1 print
+ * assignment<br> e.g. foo = 4096
+ * if/else<br> e.g. if x < 0 a=0 else a=1 end
+ * while<br> e.g. while x > 0 x=x-1 end
+ * until<br> e.g. until x == 0 x=x-1 end
+ * break<br> e.g. break
+ * repeat<br> e.g. repeat x=x-1 if x == 0 break end end
+ * for<br> e.g. for i=1 to 5 x=x+1 end
+ * do<br> e.g. do 5 x=x+1 end
+ * print<br> e.g. a=1 print
Print is strange, it prints the last value calculated in hex and that
-is all. Please don't look at the implementation. ;-)
+is all.
Supported operations are the following, in increasing order of
precedence:
@@ -114,140 +104,39 @@ precedence:
Parentheses are used to force a specific order of evaluation.
-As far as booleans go 0 is false and everything else is true. Right
+As far as booleans go, 0 is false and everything else is true. Right
now there are only integers so this makes sense.
Internals
=========
-I wasn't satisfied using an external assembler and outputing assembly
-text so I wrote an x86 assembler in Ruby. It assembles just the
-instructions I need for this compiler, so it is by no means complete.
-32-bit only and no prefixes are supported. It's basically just a
-handful of instructions and mod-rm encoding. I use the system's
-linker and have no intention of writing my own, don't worry!
-
-The code currently consists of a recursive descent parser that outputs
-x86 code in ELF binaries on Linux and Mach-O binaries on Darwin.
-Most of the code for outputing executables is Ruby, but ELF support is
-still in C and not published in the repository. Classes to output
-Mach-O and ELF binaries are found in asm/(elf|macho)writer.rb, but ELF
-support is not implemented yet so binaries only compile and run on OS
-X right now. ELF should come soon as I now have lights in my den. :)
-
-Some major refactoring is needed as the project grew organically and
-in order to keep up with the tutorials I have not yet made radical
-changes. The asm/ directory holds the assembler but also the MachO
-and ELF code, for now. The assembler is a from-scratch implementation
-in Ruby. This is my first assembler and first time working with the
-x86 ISA, so it probably isn't great. It outputs horribly inefficient
-code and there are no optimizations.
-
-Hopefully I can reduce the number of lines by factoring more, but it's
-pretty slim at ~3000 LOC. About 2100 of those are actual code. I did
-not write this compiler with the intention of anyone else reading it
-but there are a reasonable amount of comments.
-
-<table>
- <tr>
- <th>Real Lines</th>
- <th>Total Lines</th>
- <th>Filename</th>
- </tr>
-
- <tr>
- <td>87</td>
- <td>112</td>
- <td>build.rb</td>
- </tr>
- <tr>
- <td>617</td>
- <td>891</td>
- <td>compiler.rb</td>
- </tr>
- <tr>
- <td>12</td>
- <td>29</td>
- <td>asm/asm.rb</td>
- </tr>
- <tr>
- <td>569</td>
- <td>843</td>
- <td>asm/binary.rb</td>
- </tr>
- <tr>
- <td>197</td>
- <td>319</td>
- <td>asm/cstruct.rb</td>
- </tr>
- <tr>
- <td>4</td>
- <td>6</td>
- <td>asm/elfsymtab.rb</td>
- </tr>
- <tr>
- <td>4</td>
- <td>8</td>
- <td>asm/elfwriter.rb</td>
- </tr>
- <tr>
- <td>170</td>
- <td>374</td>
- <td>asm/machofile.rb</td>
- </tr>
- <tr>
- <td>95</td>
- <td>163</td>
- <td>asm/macho.rb</td>
- </tr>
- <tr>
- <td>19</td>
- <td>28</td>
- <td>asm/machosym.rb</td>
- </tr>
- <tr>
- <td>48</td>
- <td>77</td>
- <td>asm/machosymtab.rb</td>
- </tr>
- <tr>
- <td>19</td>
- <td>25</td>
- <td>asm/machowriter.rb</td>
- </tr>
- <tr>
- <td>16</td>
- <td>25</td>
- <td>asm/objwriter.rb</td>
- </tr>
- <tr>
- <td>20</td>
- <td>31</td>
- <td>asm/registers.rb</td>
- </tr>
- <tr>
- <td>42</td>
- <td>66</td>
- <td>asm/regproxy.rb</td>
- </tr>
- <tr>
- <td>56</td>
- <td>89</td>
- <td>asm/symtab.rb</td>
- </tr>
- <tr>
- <td>131</td>
- <td>183</td>
- <td>asm/text.rb</td>
- </tr>
- <tr>
- <td>2097</td>
- <td>3269</td>
- <td><b>total</b></td>
- </tr>
-</table>
-
-
-Happy hacking!
+It wasn't much fun generating assembly text, so I wrote an x86
+assembler library in Ruby. It implements just the instructions needed
+for this compiler and is by no means complete. It only does 32-bit
+and no prefixes are supported. It's basically just a handful of
+instructions and mod-rm encoding. I use the system's linker and have
+no intention of writing my own, don't worry!
+
+ELF support is still in C and not published in the repository. The
+class to output Mach-O binaries is found in asm/machofile.rb.
+
+The asm/ directory holds the assembler but also the Mach-O code, for
+now. This is my first assembler and first time working with the x86
+ISA, so it probably isn't great. It outputs horribly inefficient code
+and there are no optimizations.
+
+I did not write this compiler with the intention of anyone else
+reading it, but there is a reasonable number of comments.
+
+
+What next?
+==========
+
+Whatever interests me, really; I don't know yet. Right now I need to
+clean up some of the code, now that object files of any size can be
+generated and tests pass again.
+
+
+Happy hacking!<br>
-sjs
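
The Internals section of the new README describes the assembler as "a handful of instructions and mod-rm encoding". As a rough illustration of what that involves, here is a minimal Ruby sketch of packing a ModR/M byte and emitting a register-to-register `mov`. It is not taken from the repository: `REGS`, `modrm`, and `mov_reg_reg` are hypothetical names chosen for this example, and only the register numbers and the `0x89` opcode come from the x86 ISA itself.

```ruby
# Hypothetical helper names; not taken from the repository's asm/ directory.
# 32-bit general-purpose register numbers as defined by the x86 ISA.
REGS = { eax: 0, ecx: 1, edx: 2, ebx: 3, esp: 4, ebp: 5, esi: 6, edi: 7 }.freeze

# Pack a ModR/M byte: 2 bits of mod, 3 bits of reg, 3 bits of r/m.
def modrm(mod, reg, rm)
  ((mod & 0b11) << 6) | ((reg & 0b111) << 3) | (rm & 0b111)
end

# Encode `mov dst, src` between two 32-bit registers.
# Opcode 0x89 is MOV r/m32, r32; mod = 0b11 selects register-direct addressing,
# so the source goes in the reg field and the destination in the r/m field.
def mov_reg_reg(dst, src)
  [0x89, modrm(0b11, REGS.fetch(src), REGS.fetch(dst))].pack('C*')
end

bytes = mov_reg_reg(:eax, :ebx)
puts bytes.unpack('C*').map { |b| format('%02x', b) }.join(' ') # prints "89 d8"
```

Returning a packed binary string keeps the sketch close to how an assembler might append bytes to a text section, but the real library may well structure this differently.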
