# Assembler

Open the .asm file to be read and save each line as an element in a list.

In [1]:
# The with-block closes the file automatically when it ends.
with open("Desktop/nand2tetris/projects/06/pong/Pong.asm") as fin:
    asm = fin.readlines()

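Strip the surrounding white space from each line, and if what remains is neither empty nor a full-line comment, add it to a separate list. Stripping before testing means indented comments and lines containing only spaces are dropped as well.

In [2]:
stripped = []
for line in asm:
    line = line.strip()
    # Skip blank lines and full-line comments (including indented ones).
    if line and not line.startswith("//"):
        stripped.append(line)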

Create the symbols table as a dictionary with the pre-defined symbols.

In [3]:
symbols = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4, "R0": 0, "R1": 1, "R2": 2, "R3": 3, "R4": 4, "R5": 5, "R6": 6, "R7": 7, "R8": 8, "R9": 9, "R10": 10, "R11": 11, "R12": 12, "R13": 13, "R14": 14, "R15": 15, "SCREEN": 16384, "KBD": 24576}
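Because R0 through R15 simply map to RAM addresses 0 through 15, the same table can be built with a dictionary comprehension rather than typed out by hand. This is a sketch; the name predefined is illustrative and kept separate from the notebook's symbols dictionary.

```python
# Virtual registers R0..R15 map to RAM addresses 0..15.
predefined = {"R" + str(n): n for n in range(16)}
# Named pointers plus the base addresses of the memory-mapped screen and keyboard.
predefined.update({"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
                   "SCREEN": 16384, "KBD": 24576})
print(len(predefined))  # 23
```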

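If a comment ("//") appears later in a line, we want to drop everything from the comment onward. We also want to find all of the labels (first character "(", last character ")") and add them to the symbol table. A label does not become a machine instruction; it names the address of the instruction that follows it, which is the number of instructions that come before the label once earlier labels are removed. After recording a label, we drop its line from our list.

An earlier version of this loop removed each label with stripped.pop(i) while iterating over range(len(stripped)). It produced an IndexError ("list index out of range" on its second line) at first, yet seemed to work after being re-run a few times. The cause is that range(len(stripped)) is evaluated once, before anything is removed. Each pop shortens the list, so i eventually runs past its end; the pop also shifts every later line down by one, so the line that slides into position i is never examined on that run (a label directly after another label is missed, and so is a trailing comment on the line after a label). Each re-run removed more labels until none were left, which is why the error eventually went away, but labels recorded while a skipped label was still in the list received addresses that were too large. Building a new list instead avoids all of this.

In [7]:
kept = []
for line in stripped:
    # Cut off an end-of-line comment and the spaces in front of it.
    commentindex = line.find("//")
    if commentindex != -1:
        line = line[:commentindex].strip()
    if line.startswith("(") and line.endswith(")"):
        # The label's address is the number of instructions kept so far.
        symbols[line[1:-1]] = len(kept)
    elif line:
        kept.append(line)
# Replace the list rather than popping from it while indexing into it.
stripped = kept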
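Packaged as a function, a first pass that records label addresses might look like the sketch below. The name first_pass and the sample program are illustrative assumptions, not part of the original notebook. It builds a new list rather than removing lines from the one it is reading, so no line shifts under the loop, and the worked example shows that each label gets the index of the next real instruction.

```python
def first_pass(lines):
    """Drop label lines and record the ROM address each label names.

    Assumes `lines` are already stripped, non-empty, and free of
    full-line comments.
    """
    instructions = []
    labels = {}
    for line in lines:
        # Cut off an end-of-line comment, then the spaces it leaves behind.
        commentindex = line.find("//")
        if commentindex != -1:
            line = line[:commentindex].strip()
        if line.startswith("(") and line.endswith(")"):
            # Labels occupy no ROM; they name the next instruction's address.
            labels[line[1:-1]] = len(instructions)
        elif line:
            instructions.append(line)
    return instructions, labels


program = ["@i", "M=1", "(LOOP)", "@LOOP", "0;JMP  // spin forever", "(END)"]
instructions, labels = first_pass(program)
print(labels)        # {'LOOP': 2, 'END': 4}
print(instructions)  # ['@i', 'M=1', '@LOOP', '0;JMP']
```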

We classify each line as either an A-instruction (if its first character is "@") or a C-instruction (all other lines), and put it in the corresponding dictionary, keyed by its position in the list. Stripping white space again removes any trailing spaces left behind where an end-of-line comment was cut off.

In [8]:
ainstr = {}
cinstr = {}
for i in range(len(stripped)):
    stripped[i] = stripped[i].strip()
    if stripped[i][0] == "@":
        ainstr[i] = stripped[i]
    else:
        cinstr[i] = stripped[i]

Create dictionaries for comp, dest, and jump based on the translation tables in the Hack machine language specification.

In [9]:
comp = {"0": "0101010", "1": "0111111", "-1": "0111010", "D": "0001100", "A": "0110000", "!D": "0001101", "!A": "0110001", "-D": "0001111", "-A": "0110011", "D+1":"0011111", "A+1": "0110111", "D-1": "0001110", "A-1": "0110010", "D+A": "0000010", "D-A": "0010011", "A-D": "0000111", "D&A": "0000000", "D|A": "0010101", "M": "1110000", "!M": "1110001", "-M": "1110011", "M+1": "1110111", "M-1": "1110010", "D+M": "1000010", "D-M": "1010011", "M-D": "1000111", "D&M": "1000000", "D|M": "1010101"}
dest = {"null": "000", "M": "001", "D": "010", "MD": "011", "A": "100", "AM": "101", "AD": "110", "AMD": "111"}
jump = {"null": "000", "JGT": "001", "JEQ": "010", "JGE": "011", "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}
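To see how the three tables combine into one 16-bit word, here is a sketch of a C-instruction encoder. It copies only a few entries from the tables above, and the upper-case names COMP, DEST, and JUMP and the function encode_c are illustrative. It splits the general form dest=comp;jump, where "dest=" and ";jump" are both optional.

```python
# A small subset of the comp/dest/jump tables, enough for the examples below.
COMP = {"0": "0101010", "A": "0110000", "D": "0001100", "D+1": "0011111"}
DEST = {"null": "000", "M": "001", "D": "010"}
JUMP = {"null": "000", "JGT": "001", "JMP": "111"}


def encode_c(line):
    """Translate dest=comp;jump (dest= and ;jump optional) into 16 bits."""
    dest_part, comp_part, jump_part = "null", line, "null"
    if "=" in comp_part:
        dest_part, comp_part = comp_part.split("=", 1)
    if ";" in comp_part:
        comp_part, jump_part = comp_part.split(";", 1)
    # Bit layout: 111 | a c1..c6 | d1 d2 d3 | j1 j2 j3
    return "111" + COMP[comp_part] + DEST[dest_part] + JUMP[jump_part]


print(encode_c("D=A"))        # 1110110000010000
print(encode_c("M=D"))        # 1110001100001000
print(encode_c("0;JMP"))      # 1110101010000111
print(encode_c("D=D+1;JGT"))  # 1110011111010001
```

Splitting on "=" first and then on ";" follows the left-to-right order of the fields, and it handles instructions that carry both a dest and a jump.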

First, we create a list of tuples of the format (*line number*, *line of machine language*) so that we can maintain a specific order of lines.

If a line is an A-instruction whose value is a non-negative integer, we convert that integer to 16-bit binary. Otherwise the value is a symbol: if it is already in the symbols table (a predefined symbol or a label), we convert the stored address; if not, it is a new variable, which, following the Hack convention, is assigned the next free RAM address starting at 16 and added to the table. Either way, the binary line is appended to our list of tuples along with its line number.

If a line is a C-instruction, it has the general form dest=comp;jump, where both "dest=" and ";jump" are optional. A missing dest or jump field is treated as "null", and the comp field is whatever lies between the "=" (if present) and the ";" (if present). This also covers instructions that have both fields, such as "D=D+1;JGT". We then string together "111", the comp bits, the dest bits, and the jump bits to produce the line of machine language, which is appended to the list of tuples.

The method for converting an integer into binary with leading zeros ("{0:016b}") comes from this Stack Overflow answer: https://stackoverflow.com/questions/10411085/converting-integer-to-binary-in-python

In [10]:
hack = []
nextvariable = 16  # Hack convention: variables are allocated from RAM[16] upward
for i in range(len(stripped)):
    line = stripped[i]
    if i in ainstr:
        value = line[1:]
        if value.isdigit():
            address = int(value)
        else:
            if value not in symbols:
                # First sighting of an unknown symbol: it is a new variable.
                symbols[value] = nextvariable
                nextvariable += 1
            address = symbols[value]
        hack.append((i, "{0:016b}".format(address)))
    elif i in cinstr:
        # General form dest=comp;jump, where "dest=" and ";jump" are optional.
        eqind = line.find("=")
        scind = line.find(";")
        destfield = line[:eqind] if eqind != -1 else "null"
        jumpfield = line[scind + 1:] if scind != -1 else "null"
        end = scind if scind != -1 else len(line)
        compfield = line[eqind + 1:end]  # eqind + 1 is 0 when there is no "="
        hack.append((i, "111" + comp[compfield] + dest[destfield] + jump[jumpfield]))
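A-instruction handling can be isolated in a small function as well. This is a sketch: encode_a, its arguments, and the demo table are hypothetical names, and allocating unknown symbols from RAM address 16 follows the Hack assembler convention. The "{0:016b}" format spec writes the address in binary, zero-padded to 16 digits.

```python
def encode_a(value, table, next_variable):
    """Return (16-bit word, next free variable address) for "@value".

    `table` is updated in place the first time a new variable is seen.
    """
    if value.isdigit():
        address = int(value)
    else:
        if value not in table:
            table[value] = next_variable  # new variable: next free RAM slot
            next_variable += 1
        address = table[value]
    return "{0:016b}".format(address), next_variable


demo_symbols = {"SP": 0, "LOOP": 4}
nxt = 16
for value in ["21", "SP", "i", "sum", "i"]:
    word, nxt = encode_a(value, demo_symbols, nxt)
    print(value, word)
# 21 0000000000010101
# SP 0000000000000000
# i 0000000000010000
# sum 0000000000010001
# i 0000000000010000
```

The second "@i" reuses address 16 because the first one already added i to the table.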

Now that we have every line of machine language in order, we write a file that contains only the machine code, without the line numbers: we iterate through the list of tuples and write the second element of each tuple, followed by a newline character.

A refresher on writing files can be found here: https://learnpythonthehardway.org/book/ex16.html

In [13]:
# Mode "w" already truncates the file, and the with-block closes it.
with open("test.hack", "w") as target:
    for line in hack:
        # Only the machine code goes in the file; the line number is dropped.
        target.write(line[1] + "\n")
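A slightly more defensive writer could check each word before writing it, which catches mistakes such as a line number ending up in the output. In this sketch, write_hack and the demo words are illustrative assumptions; the demo writes to a temporary directory so it cleans up after itself.

```python
import tempfile
from pathlib import Path


def write_hack(path, words):
    """Write one 16-character binary word per line and nothing else."""
    for word in words:
        # Reject anything that is not exactly sixteen 0/1 characters.
        if len(word) != 16 or set(word) - {"0", "1"}:
            raise ValueError("not a 16-bit binary word: " + repr(word))
    Path(path).write_text("".join(word + "\n" for word in words))


# In the notebook the call would be: write_hack("test.hack", [w for _, w in hack])
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "demo.hack"
    write_hack(out, ["0000000000000010", "1110110000010000"])
    written = out.read_text()
print(written, end="")
```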