# System Programming - CS

# SIC/XE Assembler – Python Implementation

This project is a **Python-based assembler** for the **SIC/XE (Simplified Instructional Computer – Extended)** architecture. It processes assembly source code and generates object code in HTE (Header, Text, End) format, making it suitable for educational use, compiler labs, or system-level tool development.


## 📚 Table of Contents

- [🔁 pass1 – Build Symbol Table](#pass1)
- [🔄 pass2 – Generate Object Code](#pass2)
- [🧾 HTE_Record – Output HTE Format](#hte)

---
## 🔑 Key Functions

### 🔁 `pass1()`
Builds the **symbol table** and assigns **memory locations** to each line in the source program.

### 🔄 `pass2()`
Generates the **object code** for each instruction using the proper **format** and **addressing mode** (immediate, indirect, PC-relative, base-relative, etc.).

### 🧾 `HTE_Record()`
Produces the **HTE format output**, including:
- `H` (Header) record with program name, starting address, and length.
- `T` (Text) records split into 30-byte segments.
- `M` (Modification) records for format 4 instructions.
- `E` (End) record with program starting address.
---

## Features

- **Two-pass assembler** architecture.
- Supports:
  - Instruction formats: `Format 1`, `Format 2`, `Format 3`, and `Format 4`.
  - Addressing modes: **Immediate (`#`)**, **Indirect (`@`)**, **Indexed (`,X`)**, **PC-relative**, and **Base-relative**.
- Uses a `BASE` directive for base-relative addressing.
- Supports both numeric literals and symbolic labels.


---

## Output Example

H^COPYXX^000000^001077
T^000000^1E^141033^681033^...
M^001006^05
M^001014^05
E^000000


---

## Purpose

This assembler demonstrates the internal steps of converting assembly language into object code, mimicking real-world assembler behavior for SIC/XE systems. It's built for **learning**, **experimentation**, and **modular enhancement**.

---

## Educational Use

You can use this project to:

- Understand how assemblers resolve symbols and addresses.
- Practice implementing instruction decoding and encoding.
- Generate and verify machine-level object code.
- Explore base-relative and PC-relative addressing computations.

---

## Quick Start

Ensure you have:

- A file `Input.xlsx` containing the source program.
- A file `OP_Code_ref.xlsx` containing opcode reference data.

Run the notebook or script:

```bash
python SICXE.py
```

In [1]:
import pandas as pd
import numpy as np
import math

# ---------------------------------------------
# Load input files
# ---------------------------------------------
opCode_ref = pd.read_excel("OP_Code_ref.xlsx", index_col=False).replace(np.nan, "", regex=False)
file = pd.read_excel("Input.xlsx", index_col=False).replace(np.nan, "", regex=False)
file.rename(columns={'Opcode': 'Instruction', 'Operand': 'Reference'}, inplace=True)


In [2]:
print(opCode_ref.to_string(index=False))

   Mnemonic Opcode Format
      ADD m     18  "3/4"
     ADDF m     58  "3/4"
 ADDR r1,r2     90    "2"
      AND m     40  "3/4"
   CLEAR r1     B4    "2"
     COMP m     28  "3/4"
    COMPF m     88  "3/4"
COMPR r1,r2     A0    "2"
      DIV m     24  "3/4"
     DIVF m     64  "3/4"
 DIVR r1,r2     9C    "2"
        FIX     C4    "1"
      FLOAT     C0    "1"
        HIO     F4    "1"
         Jm     3C  "3/4"
      JEQ m     30  "3/4"
      JGT m     34  "3/4"
      JLT m     38  "3/4"
     JSUB m     48  "3/4"
      LDA m      0  "3/4"
      LDB m     68  "3/4"
     LDCH m     50  "3/4"
      LDF m     70  "3/4"
      LDL m      8  "3/4"
      LDS m     6C  "3/4"
      LDT m     74  "3/4"
      LDX m      4  "3/4"
      LPS m     D0  "3/4"
      MUL m     20  "3/4"
      SSK m     EC  "3/4"
      STA m     0C  "3/4"
      STB m     78  "3/4"
     STCH m     54  "3/4"
      STF m     80  "3/4"
      STI m     D4  "3/4"
      STL m     14  "3/4"
      STS m     7C  "3/4"
     STSW m 

In [3]:
print(file.to_string(index=False))


 Label Instruction Reference
  COPY       START         0
 FIRST         STL    RETADR
               LDB   #LENGTH
              BASE    LENGTH
 CLOOP       +JSUB     RDREC
               LDA    LENGTH
              COMP        #0
               JEQ    ENDFIL
             +JSUB     WRREC
                 J     CLOOP
ENDFIL         LDA       EOF
               STA    BUFFER
               LDA        #3
               STA    LENGTH
             +JSUB     WRREC
                 J   @RETADR
   EOF        BYTE    C'EOF'
RETADR        RESW         1
LENGTH        RESW         1
BUFFER        RESB      4096
 RDREC       CLEAR         X
             CLEAR         A
             CLEAR         S
              +LDT     #4096
 RLOOP          TD     INPUT
               JEQ     RLOOP
                RD     INPUT
             COMPR       A,S
               JEQ      EXIT
              STCH  BUFFER,X
              TIXR         T
               JLT     RLOOP
  EXIT         STX    LENGTH
              

`lookup_format()` takes an instruction (e.g. `STL`) and looks it up in the `opCode_ref` table. It returns the `Format` of the first row whose `Mnemonic` column starts with that instruction.


In [4]:
# This function takes an instruction (e.g. 'STL') and looks it up in the opCode_ref table.
# It returns the 'Format' of the first row where the 'Mnemonic' column starts with the instruction.
# For example, if the instruction is 'STL' and the Mnemonic is 'STL m', it will match and return the corresponding Format.

# def lookup_format(inst):
#     return opCode_ref[opCode_ref['Mnemonic'].str.startswith(inst)]['Format'].iloc[0] if \
#            any(opCode_ref['Mnemonic'].str.startswith(inst)) else None

# file['Format'] = file['Instruction'].apply(lookup_format)

def lookup_format(inst):
    mask = opCode_ref['Mnemonic'].str.startswith(inst)
    if mask.any():
        return opCode_ref.loc[mask, 'Format'].iloc[0]
    return None

file['Format'] = file['Instruction'].apply(lookup_format) #calling function 
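
Prefix matching depends on the order of rows in the opcode table: if `TIXR r1` were listed before `TIX m`, looking up `TIX` would return the wrong format. The sketch below contrasts prefix matching with matching on the whole mnemonic. The names `OPCODE_ROWS`, `prefix_lookup`, and `exact_lookup` are hypothetical and the rows are a tiny illustrative subset, not the project's `OP_Code_ref.xlsx`.

```python
# Illustrative (mnemonic, format) rows, deliberately ordered so that
# prefix matching picks the wrong entry.
OPCODE_ROWS = [
    ("TIXR r1", "2"),
    ("TIX m", "3/4"),
    ("COMPR r1,r2", "2"),
    ("COMP m", "3/4"),
]

def prefix_lookup(inst, rows):
    """Return the format of the first row whose mnemonic starts with inst."""
    for mnemonic, fmt in rows:
        if mnemonic.startswith(inst):
            return fmt
    return None

def exact_lookup(inst, rows):
    """Return the format of the row whose first word equals inst exactly."""
    for mnemonic, fmt in rows:
        if mnemonic.split()[0] == inst:
            return fmt
    return None

print(prefix_lookup("TIX", OPCODE_ROWS))  # matches "TIXR r1" first
print(exact_lookup("TIX", OPCODE_ROWS))   # matches "TIX m"
```

Exact matching assumes every table entry separates the mnemonic from its operand pattern with a space; an entry stored as `Jm` would need cleaning before this approach works.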




In [5]:
#file['Instruction'].str.startswith('+') returns a boolean Series (True/False for each row)

#file.loc[condition, 'Format'] = '4' sets the 'Format' column to '4' only where the condition is True



file.loc[file['Instruction'].str.startswith('+'), 'Format'] = '"4"'
file.loc[file['Instruction'].isin(["START", "END", "RESW", "RESB", "WORD", "BYTE", "BASE"]), 'Format'] = file['Instruction']
file['Format'] = file['Format'].replace('"', '', regex=True)

### 🔁 pass1 – Build Symbol Table <a id='pass1'></a>



In [6]:
# The pass1() function simulates the first pass of an assembler by calculating memory locations 
# for each instruction or directive in the assembly code. It initializes a location counter, 
# iterates over each row of the input DataFrame 'file', and updates the counter based on 
# instruction format or memory reservation directives such as RESW, RESB, WORD, and BYTE. 
# The resulting hexadecimal addresses are stored in a new 'Location' column to be used in 
# the second pass of the assembler.


def pass1() :
    
    location_counter = 0
    locations = []

    for index, row in file.iterrows():
        locations.append(f"{location_counter:04X}")

        fmt = str(row['Format']).strip()

        if fmt == '1':
            location_counter += 1
        elif fmt == '2':
            location_counter += 2
        elif fmt == '3/4':
            location_counter += 3
        elif fmt == '4':
            location_counter += 4
        elif row['Instruction'] == 'RESW':
            location_counter += 3 * int(row['Reference'])
        elif row['Instruction'] == 'RESB':
            location_counter += int(row['Reference'])
        elif row['Instruction'] == 'WORD':
            location_counter += 3
        elif row['Instruction'] == 'BYTE':
            ref = row['Reference']
            if ref.startswith(('C', 'c')):  
                location_counter += len(ref) - 3  
            else:  
                location_counter += math.ceil((len(ref) - 3) / 2)

    file['Location'] = locations
    return file
pass1()

Unnamed: 0,Label,Instruction,Reference,Format,Location
0,COPY,START,0,START,0000
1,FIRST,STL,RETADR,3/4,0000
2,,LDB,#LENGTH,3/4,0003
3,,BASE,LENGTH,BASE,0006
4,CLOOP,+JSUB,RDREC,4,0006
5,,LDA,LENGTH,3/4,000A
6,,COMP,#0,3/4,000D
7,,JEQ,ENDFIL,3/4,0010
8,,+JSUB,WRREC,4,0013
9,,J,CLOOP,3/4,0017
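
`pass1()` always starts its location counter at 0, which matches the sample program (`START 0`). For a program that starts elsewhere, the counter would need to be seeded from the `START` operand, which SIC/XE treats as hexadecimal. The following is a minimal sketch of that idea; `assign_locations` and its tuple-based input are assumptions for illustration and do not use the project's DataFrame.

```python
def assign_locations(lines):
    """Assign a 4-digit hex location to each line.

    lines: list of (instruction, operand, size_in_bytes) tuples, where the
    size has already been worked out from the format or directive.
    """
    counter = 0
    locations = []
    for instruction, operand, size in lines:
        if instruction == "START":
            # The START operand is a hexadecimal address.
            counter = int(str(operand), 16)
        locations.append(f"{counter:04X}")
        counter += size
    return locations

program = [
    ("START", "1000", 0),
    ("STL", "RETADR", 3),
    ("+JSUB", "RDREC", 4),
    ("RESW", "1", 3),
]
print(assign_locations(program))
```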


In [7]:
# The Symbol_Table() function builds a symbol table (SymTable) by scanning the 'Label' column 
# of each row in the assembly source code stored in the DataFrame 'file'. If a valid label 
# is found (i.e., not empty and not "NONE"), it is added to the SymTable dictionary with 
# its corresponding address from the 'Location' column. After building the table, the function 
# prints out all label-location pairs, which will be used later for symbol resolution 
# during the second pass of the assembler.



SymTable = {}

def Symbol_Table():
    for index, row in file.iterrows():
        label = str(row['Label']).strip()
        if label and label.upper() != "NONE": 
            SymTable[label] = row['Location']

    for label, location in SymTable.items():
        print(label, "\t", location)

    
    
    
Symbol_Table()

COPY 	 0000
FIRST 	 0000
CLOOP 	 0006
ENDFIL 	 001A
EOF 	 002D
RETADR 	 0030
LENGTH 	 0033
BUFFER 	 0036
RDREC 	 1036
RLOOP 	 1040
EXIT 	 1056
INPUT 	 105C
WRREC 	 105D
WLOOP 	 1062
OUTPUT 	 1076
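
`Symbol_Table()` silently overwrites an entry when the same label appears twice. A real assembler would normally report that as an error. The sketch below shows one way to detect it; `build_symbol_table` and its list-of-pairs input are hypothetical names chosen for this example.

```python
def build_symbol_table(rows):
    """Build {label: location} from (label, location) pairs.

    Empty labels are skipped. A label defined twice raises ValueError
    instead of silently replacing the earlier address.
    """
    table = {}
    for label, location in rows:
        label = str(label).strip()
        if not label:
            continue
        if label in table:
            raise ValueError(f"duplicate symbol: {label}")
        table[label] = location
    return table

rows = [("COPY", "0000"), ("FIRST", "0000"), ("", "0003"), ("CLOOP", "0006")]
print(build_symbol_table(rows))
```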


In [8]:

# generate_op_code() returns the machine opcode for one row of the assembly source in the
# DataFrame 'file'; lookup_format() applies it to every row. Note that this definition of
# lookup_format() replaces the one-argument helper of the same name defined earlier.
# Directives such as WORD, RESW, and BYTE are skipped because they have no opcode.
# For format 4 instructions (marked with '+'), the '+' is removed before lookup. The function
# then searches the opCode_ref table by mnemonic prefix and retrieves the corresponding opcode.
# The opcode is converted to a two-digit uppercase hexadecimal string and stored in a new
# 'op_Code' column, ready for object code generation.


# def lookup_format():
#     def generate_op_code(row):
#         inst = row['Instruction']
#         fmt = str(row['Format']).upper()

#         # handle no object code cases
#         special_formats = ['WORD', 'RESW', 'BYTE', 'RESB', 'END', 'START', 'BASE']
#         if fmt in special_formats:
#             return None

#         # remove "+"
#         if inst.startswith('+'):
#             inst = inst[1:]

#         # lookup opcode
#         mask = opCode_ref['Mnemonic'].str.startswith(inst)
#         if mask.any():
#             opcode = opCode_ref.loc[mask, 'Opcode'].iloc[0]
#             return f"{int(str(opcode), 16):02X}"

#         return None

#     file['op_Code'] = file.apply(generate_op_code, axis=1)

    
def generate_op_code(row):
    inst = row['Instruction']
    fmt = str(row['Format']).upper()

    # handle no object code cases
    special_formats = ['WORD', 'RESW', 'BYTE', 'RESB', 'END', 'START', 'BASE']
    if fmt in special_formats:
        return None

    # remove "+"
    if inst.startswith('+'):
        inst = inst[1:]

    # lookup opcode
    mask = opCode_ref['Mnemonic'].str.startswith(inst)
    if mask.any():
        opcode = opCode_ref.loc[mask, 'Opcode'].iloc[0]
        return f"{int(str(opcode), 16):02X}"

    return None


def lookup_format():
    file['op_Code'] = file.apply(generate_op_code, axis=1)



lookup_format()

In [9]:
file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code
0,COPY,START,0,START,0000,
1,FIRST,STL,RETADR,3/4,0000,14
2,,LDB,#LENGTH,3/4,0003,68
3,,BASE,LENGTH,BASE,0006,
4,CLOOP,+JSUB,RDREC,4,0006,48
5,,LDA,LENGTH,3/4,000A,00
6,,COMP,#0,3/4,000D,28
7,,JEQ,ENDFIL,3/4,0010,30
8,,+JSUB,WRREC,4,0013,48
9,,J,CLOOP,3/4,0017,3C


In [10]:

# The format1() function generates object code for Format 1 instructions in the assembly source. 
# It iterates through each row in the DataFrame 'file', and for rows where the instruction format 
# is '1', it assigns the corresponding opcode (from the 'op_Code' column) directly to the 
# 'Object_Code' column. Since Format 1 instructions are 1 byte long and contain only the opcode, 
# no further processing is needed.



file['Object_Code']=''

def format1():
    for index, row in file.iterrows():
        if row['Format'] == '1':
            # Write through file.at: assigning to `row` only changes a temporary copy
            file.at[index, 'Object_Code'] = row['op_Code']


In [11]:

# The get_register_code() function returns the numeric code for a given register name 
# based on a predefined register table. It supports common SIC/XE registers (A, X, L, B, S, T, F), 
# returning -1 for unrecognized names.

# The format2() function generates object code for Format 2 instructions, which include one or two 
# register operands. It loops through the DataFrame 'file' and, for each Format 2 instruction, 
# parses the 'Reference' field to extract register codes using get_register_code(). The resulting 
# object code is formed by concatenating the opcode with the register codes and written to the 
# 'Object_Code' column. Rows of any other format are left unchanged, so values set by format1() survive.


def get_register_code(register_name):
    registers = {
        'A': 0,
        'X': 1,
        'L': 2,
        'B': 3,
        'S': 4,
        'T': 5,
        'F': 6
    }
    return registers.get(register_name.upper(), -1)


def format2():
    for index, row in file.iterrows():
        if row['Format'] != '2':
            continue

        ref = row['Reference']
        op = row['op_Code']

        if len(ref) == 1:
            # Single register operand: the second register nibble is 0
            result = op + str(get_register_code(ref[0])) + "0"
        else:
            r1 = get_register_code(ref[0])   # first letter
            r2 = get_register_code(ref[-1])  # last letter
            result = op + str(r1) + str(r2)

        file.at[index, 'Object_Code'] = result


format1()
format2()

In [12]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code
0,COPY,START,0,START,0000,,
1,FIRST,STL,RETADR,3/4,0000,14,
2,,LDB,#LENGTH,3/4,0003,68,
3,,BASE,LENGTH,BASE,0006,,
4,CLOOP,+JSUB,RDREC,4,0006,48,
5,,LDA,LENGTH,3/4,000A,00,
6,,COMP,#0,3/4,000D,28,
7,,JEQ,ENDFIL,3/4,0010,30,
8,,+JSUB,WRREC,4,0013,48,
9,,J,CLOOP,3/4,0017,3C,


In [13]:
# This function assigns "No_Object_Code" to directives that do not produce object code,
# such as RESB, RESW, START, END, and BASE.


def directiv():
    for index ,row in file.iterrows():
        # If the row contains a directive that doesn't generate object code
        if row['Format'] in ['RESB' , 'RESW' ,'START' ,'END' ,'BASE']:
#             row['Object_Code'] = "No_Object_Code"
            file.at[index, 'Object_Code'] = "No_Object_Code"
directiv()

In [14]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code
0,COPY,START,0,START,0000,,No_Object_Code
1,FIRST,STL,RETADR,3/4,0000,14,
2,,LDB,#LENGTH,3/4,0003,68,
3,,BASE,LENGTH,BASE,0006,,No_Object_Code
4,CLOOP,+JSUB,RDREC,4,0006,48,
5,,LDA,LENGTH,3/4,000A,00,
6,,COMP,#0,3/4,000D,28,
7,,JEQ,ENDFIL,3/4,0010,30,
8,,+JSUB,WRREC,4,0013,48,
9,,J,CLOOP,3/4,0017,3C,


In [15]:
# This function processes BYTE directives in the DataFrame.
# It checks if the 'Format' is 'BYTE' and extracts the hexadecimal value 
# from the 'Reference' field (e.g., X'F1' → F1), then stores it as the Object_Code.



def byte():
    for index, row in file.iterrows():
        if row['Format'] == 'BYTE':
            x = row['Reference'] 
            if x[0] == 'x' or x[0] == 'X':  
                y = x[2:-1]  
                file.at[index, 'Object_Code'] = y  # update the value in the DataFrame

byte()        

In [16]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code
0,COPY,START,0,START,0000,,No_Object_Code
1,FIRST,STL,RETADR,3/4,0000,14,
2,,LDB,#LENGTH,3/4,0003,68,
3,,BASE,LENGTH,BASE,0006,,No_Object_Code
4,CLOOP,+JSUB,RDREC,4,0006,48,
5,,LDA,LENGTH,3/4,000A,00,
6,,COMP,#0,3/4,000D,28,
7,,JEQ,ENDFIL,3/4,0010,30,
8,,+JSUB,WRREC,4,0013,48,
9,,J,CLOOP,3/4,0017,3C,


In [17]:
# This function takes a single character as input and returns its corresponding
# ASCII hexadecimal code in string format using a predefined dictionary `ascii_hex`.
# If the character is not found in the dictionary, it returns '00' as a default value.




def ascii_code (code):
    
    ascii_hex = {
        'A': '41', 'B': '42', 'C': '43', 'D': '44', 'E': '45',
        'F': '46', 'G': '47', 'H': '48', 'I': '49', 'J': '4A',
        'K': '4B', 'L': '4C', 'M': '4D', 'N': '4E', 'O': '4F',
        'P': '50', 'Q': '51', 'R': '52', 'S': '53', 'T': '54',
        'U': '55', 'V': '56', 'W': '57', 'X': '58', 'Y': '59',
        'Z': '5A',

        'a': '61', 'b': '62', 'c': '63', 'd': '64', 'e': '65',
        'f': '66', 'g': '67', 'h': '68', 'i': '69', 'j': '6A',
        'k': '6B', 'l': '6C', 'm': '6D', 'n': '6E', 'o': '6F',
        'p': '70', 'q': '71', 'r': '72', 's': '73', 't': '74',
        'u': '75', 'v': '76', 'w': '77', 'x': '78', 'y': '79',
        'z': '7A'
    }
     
    return ascii_hex.get(code , '00')
   


In [18]:
ascii_code('A')

'41'

In [19]:



# This function processes rows in the DataFrame where the instruction format is 'BYTE' 
# and the operand (Reference) begins with 'C' or 'c', indicating a character constant. 
# It extracts the characters between the quotation marks, converts each character 
# to its ASCII hexadecimal representation using the ascii_code() function, 
# then concatenates the hex values into a single string. 
# The resulting object code is stored in the 'Object_Code' column of the DataFrame.

def byteC():
    for index, row in file.iterrows():
        if row['Format'] == 'BYTE':
            x = row['Reference'] 
            if x[0] == 'C' or x[0] == 'c':  
                chars = x[2:-1]
                hex_string = ''.join((ascii_code(c)) for c in chars)
                file.at[index, 'Object_Code'] = hex_string  # update the value in the DataFrame

byteC()
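
The `ascii_hex` dictionary covers letters only, so digits, spaces, and punctuation inside a character constant fall back to `'00'`. As a hedged alternative, Python's built-in `ord()` gives the code for any ASCII character. The helper name `char_constant_to_hex` is an assumption for this sketch, and it expects an operand of the form `C'...'`.

```python
def char_constant_to_hex(operand):
    """Convert a BYTE operand such as C'EOF' to its hex object code."""
    if not (operand[:2].upper() == "C'" and operand.endswith("'")):
        raise ValueError(f"not a character constant: {operand}")
    chars = operand[2:-1]
    # ord() returns the character code; format it as two uppercase hex digits.
    return "".join(format(ord(ch), "02X") for ch in chars)

print(char_constant_to_hex("C'EOF'"))
print(char_constant_to_hex("C'A 1'"))
```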

In [20]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code
0,COPY,START,0,START,0000,,No_Object_Code
1,FIRST,STL,RETADR,3/4,0000,14,
2,,LDB,#LENGTH,3/4,0003,68,
3,,BASE,LENGTH,BASE,0006,,No_Object_Code
4,CLOOP,+JSUB,RDREC,4,0006,48,
5,,LDA,LENGTH,3/4,000A,00,
6,,COMP,#0,3/4,000D,28,
7,,JEQ,ENDFIL,3/4,0010,30,
8,,+JSUB,WRREC,4,0013,48,
9,,J,CLOOP,3/4,0017,3C,


In [21]:
# This function handles instructions with the 'WORD' format by converting the value 
# in the 'Reference' column into a 6-digit uppercase hexadecimal string. 
# It ensures the resulting string is zero-padded to maintain a consistent length 
# and stores the result in the 'Object_Code' column of the DataFrame. 

def word () :
    for index, row in file.iterrows():
        if row['Format'] == 'WORD':
            # Mask to 24 bits so negative values are encoded in two's complement
            file.at[index, 'Object_Code'] = format(int(row['Reference']) & 0xFFFFFF, '06X')
word()

In [22]:
file['N']=""
file['I']=""

In [23]:
def ni_flags() :
    for index , row in file.iterrows():
        if row['Format'] == '3/4' or  row['Format'] == '4' :
            if row['Reference'].startswith('#') :
                file.at[index, 'N'] = 0
                file.at[index, 'I'] = 1
            elif row['Reference'].startswith('@'):
                file.at[index, 'N'] = 1
                file.at[index, 'I'] = 0
                
            else :
                file.at[index, 'I'] = 1
                file.at[index, 'N'] = 1
ni_flags()
               
    

In [24]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I
0,COPY,START,0,START,0000,,No_Object_Code,,
1,FIRST,STL,RETADR,3/4,0000,14,,1.0,1.0
2,,LDB,#LENGTH,3/4,0003,68,,0.0,1.0
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,
4,CLOOP,+JSUB,RDREC,4,0006,48,,1.0,1.0
5,,LDA,LENGTH,3/4,000A,00,,1.0,1.0
6,,COMP,#0,3/4,000D,28,,0.0,1.0
7,,JEQ,ENDFIL,3/4,0010,30,,1.0,1.0
8,,+JSUB,WRREC,4,0013,48,,1.0,1.0
9,,J,CLOOP,3/4,0017,3C,,1.0,1.0


In [25]:
# Add the indexing (X) flag column

file['X']=""
file
    

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I,X
0,COPY,START,0,START,0000,,No_Object_Code,,,
1,FIRST,STL,RETADR,3/4,0000,14,,1.0,1.0,
2,,LDB,#LENGTH,3/4,0003,68,,0.0,1.0,
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,,
4,CLOOP,+JSUB,RDREC,4,0006,48,,1.0,1.0,
5,,LDA,LENGTH,3/4,000A,00,,1.0,1.0,
6,,COMP,#0,3/4,000D,28,,0.0,1.0,
7,,JEQ,ENDFIL,3/4,0010,30,,1.0,1.0,
8,,+JSUB,WRREC,4,0013,48,,1.0,1.0,
9,,J,CLOOP,3/4,0017,3C,,1.0,1.0,


In [26]:
def adding_x() :
    for index , row in file.iterrows():
        if row['Format'] == '3/4' or  row['Format'] == '4' :
            if row['Reference'].endswith(',x') or row['Reference'].endswith(',X'):
                file.at[index, 'X'] = 1
            else :
                file.at[index, 'X'] = 0
    
                
adding_x()                

In [27]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I,X
0,COPY,START,0,START,0000,,No_Object_Code,,,
1,FIRST,STL,RETADR,3/4,0000,14,,1.0,1.0,0.0
2,,LDB,#LENGTH,3/4,0003,68,,0.0,1.0,0.0
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,,
4,CLOOP,+JSUB,RDREC,4,0006,48,,1.0,1.0,0.0
5,,LDA,LENGTH,3/4,000A,00,,1.0,1.0,0.0
6,,COMP,#0,3/4,000D,28,,0.0,1.0,0.0
7,,JEQ,ENDFIL,3/4,0010,30,,1.0,1.0,0.0
8,,+JSUB,WRREC,4,0013,48,,1.0,1.0,0.0
9,,J,CLOOP,3/4,0017,3C,,1.0,1.0,0.0
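
The `N`, `I`, and `X` columns are all derived from the operand text alone: a `#` prefix means immediate, `@` means indirect, and a `,X` suffix means indexed. The sketch below collapses the two notebook functions into one pure helper; `addressing_flags` is a hypothetical name and is not part of the project.

```python
def addressing_flags(operand):
    """Return the (n, i, x) flags implied by a format 3/4 operand string."""
    operand = operand.strip()
    x = 1 if operand.upper().endswith(",X") else 0
    if operand.startswith("#"):
        n, i = 0, 1   # immediate
    elif operand.startswith("@"):
        n, i = 1, 0   # indirect
    else:
        n, i = 1, 1   # simple addressing
    return n, i, x

for operand in ["#LENGTH", "@RETADR", "BUFFER,X", "RETADR"]:
    print(operand, addressing_flags(operand))
```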


In [28]:
file['B']=""
file['P']=""
file['E']=""

In [29]:
def bp4 () :
        for index , row in file.iterrows():
            
            if row['Format'] == '4' :
                
                file.at[index, 'B'] = 0
                file.at[index, 'P'] = 0
                file.at[index, 'E'] = 1
            if row['Format'] == '3/4' :
                file.at[index, 'E'] = 0
                
            
             
bp4 ()   

In [30]:

file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I,X,B,P,E
0,COPY,START,0,START,0000,,No_Object_Code,,,,,,
1,FIRST,STL,RETADR,3/4,0000,14,,1.0,1.0,0.0,,,0.0
2,,LDB,#LENGTH,3/4,0003,68,,0.0,1.0,0.0,,,0.0
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,,,,,
4,CLOOP,+JSUB,RDREC,4,0006,48,,1.0,1.0,0.0,0.0,0.0,1.0
5,,LDA,LENGTH,3/4,000A,00,,1.0,1.0,0.0,,,0.0
6,,COMP,#0,3/4,000D,28,,0.0,1.0,0.0,,,0.0
7,,JEQ,ENDFIL,3/4,0010,30,,1.0,1.0,0.0,,,0.0
8,,+JSUB,WRREC,4,0013,48,,1.0,1.0,0.0,0.0,0.0,1.0
9,,J,CLOOP,3/4,0017,3C,,1.0,1.0,0.0,,,0.0


In [31]:

# Helper for address4(): look up the address of `ref` in the symbol table

def get_address_from_symbol_table(ref):
    addr = SymTable.get(ref)
    if addr is None:
        return None
    return addr.upper().zfill(5)



In [32]:
file['address/disp']= ""

In [33]:

def address4():
    for index, row in file.iterrows():
        if str(row['Format']) == '4':
            ref_value = str(row['Reference']).strip()

            # remove a leading # or @
            if ref_value.startswith("#") or ref_value.startswith("@"):
                clean_ref = ref_value[1:]
            else:
                clean_ref = ref_value

            address = get_address_from_symbol_table(clean_ref)

            if address is None:
                if clean_ref.isdigit():
                    address = clean_ref.zfill(5)
                else:
                    address = "?????"

            file.at[index, 'address/disp'] = address

                        
    
address4 () 

In [34]:


file


Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I,X,B,P,E,address/disp
0,COPY,START,0,START,0000,,No_Object_Code,,,,,,,
1,FIRST,STL,RETADR,3/4,0000,14,,1.0,1.0,0.0,,,0.0,
2,,LDB,#LENGTH,3/4,0003,68,,0.0,1.0,0.0,,,0.0,
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,,,,,,
4,CLOOP,+JSUB,RDREC,4,0006,48,,1.0,1.0,0.0,0.0,0.0,1.0,01036
5,,LDA,LENGTH,3/4,000A,00,,1.0,1.0,0.0,,,0.0,
6,,COMP,#0,3/4,000D,28,,0.0,1.0,0.0,,,0.0,
7,,JEQ,ENDFIL,3/4,0010,30,,1.0,1.0,0.0,,,0.0,
8,,+JSUB,WRREC,4,0013,48,,1.0,1.0,0.0,0.0,0.0,1.0,0105D
9,,J,CLOOP,3/4,0017,3C,,1.0,1.0,0.0,,,0.0,


In [35]:
base_address = None  # Global variable to store BASE address as hex string

def find_base_address():
    global base_address
    for i in range(len(file)):
        row = file.loc[i]
        if str(row['Instruction']).strip().upper() == 'BASE':
            reference = str(row['Reference']).strip()
            if reference in SymTable:
                base = int(SymTable[reference], 16)
                base_address = format(base, '04X')  # store as uppercase hex string
            break  # only first BASE encountered
find_base_address()

In [36]:
base_address

'0033'

In [37]:

# This function calculates the 12-bit address or displacement field (address/disp)
# for all Format 3/4 instructions in the program, including special cases like RSUB,
# direct, immediate, indirect, and indexed addressing.
# It determines whether to use PC-relative or Base-relative addressing, and updates
# the fields B, P, and address/disp accordingly based on symbol lookup and displacement range.


def format3():
    global base_address

    for i in range(len(file)):
        row = file.loc[i]

        # Handle RSUB explicitly (no operand)
        if row['Instruction'] == 'RSUB':
            file.at[i, 'B'] = 0
            file.at[i, 'P'] = 0
            file.at[i, 'address/disp'] = format(0, '03X')
            continue

        # Work only on Format 3/4 with actual Object Code
        if row['Format'] == '3/4' and row['Object_Code'] != 'No_Object_Code':
            reference = row['Reference']
            if not isinstance(reference, str):
                continue
            reference = reference.strip()

            # Indexed Addressing: check for ,X or ,x and X == 1
            is_indexed = (',X' in reference.upper()) and row['X'] == 1
            if is_indexed:
                reference = reference.replace(',X', '').replace(',x', '').strip()

            # Remove addressing symbols if present
            if reference.startswith('#') or reference.startswith('@'):
                reference = reference[1:]

                # Immediate addressing with constant value (e.g., #5)
                if reference.isnumeric():
                    file.at[i, 'B'] = 0
                    file.at[i, 'P'] = 0
                    file.at[i, 'address/disp'] = format(int(reference) & 0xFFF, '03X')
                    continue

            # Look up target address in symbol table
            if reference not in SymTable:
                continue
            TA = int(SymTable[reference], 16)

            # PC = address of next instruction
            if i + 1 >= len(file):
                continue
            PC = int(file.loc[i + 1]['Location'], 16)

            displacement = TA - PC

            # Try PC-relative
            if -2048 <= displacement <= 2047:
                file.at[i, 'B'] = 0
                file.at[i, 'P'] = 1
                file.at[i, 'address/disp'] = format(displacement & 0xFFF, '03X')
            else:
                # Fallback to BASE-relative
                if base_address is not None:
                    base = int(base_address, 16)
                    displacement = TA - base
                    file.at[i, 'B'] = 1
                    file.at[i, 'P'] = 0
                    file.at[i, 'address/disp'] = format(displacement & 0xFFF, '03X')

    # Replace any NaN values in 'address/disp' with empty string
    file['address/disp'] = file['address/disp'].fillna('')

# Call the merged Format 3/4 handler

format3()

In [38]:
file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I,X,B,P,E,address/disp
0,COPY,START,0,START,0000,,No_Object_Code,,,,,,,
1,FIRST,STL,RETADR,3/4,0000,14,,1.0,1.0,0.0,0.0,1.0,0.0,02D
2,,LDB,#LENGTH,3/4,0003,68,,0.0,1.0,0.0,0.0,1.0,0.0,02D
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,,,,,,
4,CLOOP,+JSUB,RDREC,4,0006,48,,1.0,1.0,0.0,0.0,0.0,1.0,01036
5,,LDA,LENGTH,3/4,000A,00,,1.0,1.0,0.0,0.0,1.0,0.0,026
6,,COMP,#0,3/4,000D,28,,0.0,1.0,0.0,0.0,0.0,0.0,000
7,,JEQ,ENDFIL,3/4,0010,30,,1.0,1.0,0.0,0.0,1.0,0.0,007
8,,+JSUB,WRREC,4,0013,48,,1.0,1.0,0.0,0.0,0.0,1.0,0105D
9,,J,CLOOP,3/4,0017,3C,,1.0,1.0,0.0,0.0,1.0,0.0,FEC
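
The displacement values in the table can be reproduced by hand. `format3()` tries PC-relative first (displacement between -2048 and 2047) and otherwise falls back to base-relative, but it does not check that the base-relative displacement fits in 0 to 4095. The sketch below adds that check; `choose_displacement` and its return shape are assumptions for illustration only.

```python
def choose_displacement(target, pc, base=None):
    """Return (b, p, disp_hex) for a format 3 operand.

    target: target address, pc: address of the next instruction,
    base: value set by the BASE directive, or None if there is none.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        # Negative displacements are stored as 12-bit two's complement.
        return 0, 1, format(disp & 0xFFF, "03X")
    if base is not None and 0 <= target - base <= 4095:
        return 1, 0, format(target - base, "03X")
    raise ValueError("operand out of range for format 3; format 4 is needed")

print(choose_displacement(0x0030, 0x0003))          # STL RETADR
print(choose_displacement(0x0006, 0x001A))          # J CLOOP
print(choose_displacement(0x0036, 0x1051, 0x0033))  # base-relative fallback
```

The first two calls use addresses from the symbol table above and yield the `02D` and `FEC` displacements shown for `STL RETADR` and `J CLOOP`.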


In [39]:
def assemble_opcode():
    for index, row in file.iterrows():
        format_value = str(row['Format']).strip()

        # Skip if no opcode or missing address/disp
        if format_value not in ['3/4', '4']:
            continue
        if pd.isna(row['op_Code']) or str(row['address/disp']).strip() == 'No_Object_Code':
            continue

        # 1. Convert op_Code to binary (first 6 bits)
        opcode_bin = bin(int(str(row['op_Code']).strip(), 16))[2:].zfill(8)
        opcode_6bit = opcode_bin[:6]

        # 2. Extract flags (0 if missing or empty)
        def safe_flag(value):
            return str(int(value)) if not pd.isna(value) and str(value).strip() != '' else '0'

        n = safe_flag(row['N'])
        i = safe_flag(row['I'])
        x = safe_flag(row['X'])
        b = safe_flag(row['B'])
        p = safe_flag(row['P'])
        e = '1' if format_value == '4' else '0'

        # 3. Combine opcode and flags
        flags_bits = n + i + x + b + p + e
        first_12_bits = opcode_6bit + flags_bits
        first_3_hex = hex(int(first_12_bits, 2))[2:].zfill(3).upper()

        # 4. Handle displacement size
        disp = str(row['address/disp']).strip()
        disp_size = 5 if format_value == '4' else 3
        disp_hex = disp.zfill(disp_size).upper()

        # 5. Final object code
        object_code = first_3_hex + disp_hex
        file.at[index, 'Object_Code'] = object_code
assemble_opcode()
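
`assemble_opcode()` builds the object code by concatenating binary strings. The same layout can be expressed with integer shifts, which makes the bit positions explicit: the top 6 bits of the opcode, then `n i x b p e`, then a 12-bit displacement (format 3) or a 20-bit address (format 4). This is a sketch under that layout; `pack_format3` and `pack_format4` are hypothetical helper names.

```python
def pack_format3(opcode, n, i, x, b, p, disp):
    """Pack a format 3 instruction (e = 0) into a 6-digit hex string."""
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1)  # e bit is 0
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return format(word, "06X")

def pack_format4(opcode, n, i, x, address):
    """Pack a format 4 instruction (b = p = 0, e = 1) into 8 hex digits."""
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | 1  # only x and e can be set
    word = (first_byte << 24) | (flags << 20) | (address & 0xFFFFF)
    return format(word, "08X")

print(pack_format3(0x14, 1, 1, 0, 0, 1, 0x02D))  # STL RETADR
print(pack_format4(0x48, 1, 1, 0, 0x01036))      # +JSUB RDREC
```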

### 🔄 pass2 – Generate Object Code <a id='pass2'></a>

In [40]:
file

Unnamed: 0,Label,Instruction,Reference,Format,Location,op_Code,Object_Code,N,I,X,B,P,E,address/disp
0,COPY,START,0,START,0000,,No_Object_Code,,,,,,,
1,FIRST,STL,RETADR,3/4,0000,14,17202D,1.0,1.0,0.0,0.0,1.0,0.0,02D
2,,LDB,#LENGTH,3/4,0003,68,69202D,0.0,1.0,0.0,0.0,1.0,0.0,02D
3,,BASE,LENGTH,BASE,0006,,No_Object_Code,,,,,,,
4,CLOOP,+JSUB,RDREC,4,0006,48,4B101036,1.0,1.0,0.0,0.0,0.0,1.0,01036
5,,LDA,LENGTH,3/4,000A,00,032026,1.0,1.0,0.0,0.0,1.0,0.0,026
6,,COMP,#0,3/4,000D,28,290000,0.0,1.0,0.0,0.0,0.0,0.0,000
7,,JEQ,ENDFIL,3/4,0010,30,332007,1.0,1.0,0.0,0.0,1.0,0.0,007
8,,+JSUB,WRREC,4,0013,48,4B10105D,1.0,1.0,0.0,0.0,0.0,1.0,0105D
9,,J,CLOOP,3/4,0017,3C,3F2FEC,1.0,1.0,0.0,0.0,1.0,0.0,FEC



### 🧾 HTE_Record – Output HTE Format <a id='hte'></a>


In [41]:
def HTE_Record():
    # Extract columns from file
    labels = file['Label'].tolist()
    instructions = file['Instruction'].tolist()
    locations = file['Location'].tolist()
    object_codes = file['Object_Code'].tolist()

    # Header Record (H)
    prog_name = str(labels[0]).strip().upper().ljust(6, 'X')
    start = int(locations[0], 16)

    # Find the location of END; fall back to the last row if END is missing
    last = int(locations[-1], 16)
    for i, inst in enumerate(instructions):
        if inst.upper() == "END":
            last = int(locations[i], 16)
            break

    length = last - start
    H = "H^" + prog_name + "^" + format(start, '06X') + "^" + format(length, '06X')
    print(H)

    # Prepare Modification Records list once
    modification_records = []

    # Text Records (T)
    i = 0
    while i < len(instructions):
        if object_codes[i] == "No_Object_Code" or instructions[i].upper() == "END":
            i += 1
            continue

        # Start new T record
        record_start = format(int(locations[i], 16), '06X')
        text = ""
        length = 0

        while i < len(instructions):
            code = object_codes[i]
            inst = instructions[i].upper()

            if code == "No_Object_Code" or inst == "END":
                break

            if length + len(code) // 2 > 30:
                break

            text += "^" + code if text else code
            length += len(code) // 2

            # Add to modification records if format 4
            if inst.startswith("+"):
                mod_location = format(int(locations[i], 16) + 1, '06X')
                modification_records.append("M^" + mod_location + "^05")

            i += 1

        print("T^" + record_start + "^" + format(length, '02X') + "^" + text)

    # Print all Modification Records
    for mod in modification_records:
        print(mod)

    # End Record (E)
    E = "E^" + format(start, '06X')
    print(E)

# Call the function
HTE_Record()


H^COPYXX^000000^001077
T^000000^06^17202D^69202D
T^000006^1D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010^0F2016^010003
T^000023^0D^0F200D^4B10105D^3E2003^454F46
T^001036^1D^B410^B400^B440^75104096^E32019^332FFA^DB2013^A004^332008^57C003^B850
T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850
T^001070^07^3B2FEF^4F0000^05
M^000007^05
M^000014^05
M^000027^05
M^00103D^05
E^000000
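
`HTE_Record()` emits a modification record for every format 4 instruction, which is why `M^00103D^05` appears for `+LDT #4096`. In the standard SIC/XE object format, an instruction whose operand is an absolute constant needs no relocation, so that record is normally omitted. The sketch below filters on the operand; `modification_records` and its tuple input are assumptions for this example rather than the project's code.

```python
def modification_records(lines):
    """Return M records for format 4 instructions with symbolic operands.

    lines: iterable of (location_hex, instruction, operand) tuples.
    """
    records = []
    for location, instruction, operand in lines:
        if not instruction.startswith("+"):
            continue
        # Strip addressing prefixes and any ,X suffix to get the bare operand.
        target = operand.lstrip("#@").split(",")[0]
        if target.isdigit():
            continue  # absolute constant: nothing to relocate
        # The 20-bit address field starts one byte after the instruction.
        address = format(int(location, 16) + 1, "06X")
        records.append("M^" + address + "^05")
    return records

sample = [
    ("0006", "+JSUB", "RDREC"),
    ("000A", "LDA", "LENGTH"),
    ("103C", "+LDT", "#4096"),
]
print(modification_records(sample))
```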
