# Laboratory Exercise 7

# A Simple Processor

Figure 1 shows a digital system that contains a number of 16-bit registers, a multiplexer, an adder/subtracter, and a control unit (finite state machine). Information is input to this system via the 16-bit DIN input, which is loaded into the IR register. Data can be transferred through the 16-bit wide multiplexer from one register in the system to another, such as from register IR into one of the *general purpose* registers $r0, \ldots, r7$. The multiplexer's output is called *Buswires* in the figure because the term *bus* is often used for wiring that allows data to be transferred from one location in a system to another. The FSM controls the *Select* lines of the multiplexer, which allows any of its inputs to be transferred to any register that is connected to the bus wires.

The system can perform different operations in each clock cycle, as governed by the FSM. It determines when particular data is placed onto the bus wires and controls which of the registers is to be loaded with this data. For example, if the FSM selects r0 as the output of the bus multiplexer and also asserts $A_{in}$, then the contents of register r0 will be loaded on the next active clock edge into register A.

Addition or subtraction of signed numbers is performed by using the multiplexer to first place one 16-bit number onto the bus wires, and then loading this number into register A. Once this is done, a second 16-bit number is placed onto the bus, the adder/subtracter performs the required operation, and the result is loaded into register G. The data in G can then be transferred via the multiplexer to one of the other registers, as required.
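The add/subtract path described above can be sketched behaviorally as follows. This is only an illustration under stated assumptions, not the code in proc.v: the module name `addsub_path` and the internal name `Sum` are hypothetical, while `A_in`, `G_in`, and `AddSub` borrow the control-signal names that appear in Table 2.

```verilog
// Hypothetical sketch of the A register, adder/subtracter, and G register.
// Assumption: AddSub = 1 selects subtraction, AddSub = 0 selects addition.
module addsub_path (
    input             Clock,
    input             A_in,      // load register A from the bus
    input             G_in,      // load register G with the adder/subtracter result
    input             AddSub,    // 0: A + bus, 1: A - bus
    input      [15:0] BusWires,
    output reg [15:0] G
);
    reg  [15:0] A;
    wire [15:0] Sum = AddSub ? (A - BusWires) : (A + BusWires);

    always @(posedge Clock) begin
        if (A_in) A <= BusWires;   // first operand is captured from the bus
        if (G_in) G <= Sum;        // result uses the second operand now on the bus
    end
endmodule
```

Because A and G are loaded on different clock edges, an add or sub needs one cycle to capture the first operand and a second cycle to capture the result, which is why these instructions take more time steps than mv.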



Figure 1: A digital system.

A system like the one in Figure 1 is often called a *processor*. It executes operations specified in the form of *instructions*. Table 1 lists the instructions that this processor supports. The left column shows the name of an instruction and its operands. The meaning of the syntax $rX \leftarrow Op2$ is that the second operand, Op2, is loaded into register rX. The operand Op2 can be either a register, rY, or *immediate data*, #D.

| Instruction | Operands | Function performed             |
|-------------|----------|--------------------------------|
| mv          | rX, Op2  | $rX \leftarrow Op2$            |
| mvt         | rX, #D   | $rX_{15-8} \leftarrow D_{7-0}$ |
| add         | rX, Op2  | $rX \leftarrow rX + Op2$       |
| sub         | rX, Op2  | $rX \leftarrow rX - Op2$       |

Table 1: Instructions performed in the processor.

Instructions are loaded from the external input DIN, and stored into the IR register, using the connection indicated in Figure 1. Each instruction is encoded using a 16-bit format. If Op2 specifies a register, then the instruction encoding is III0XXX000000YYY, where III specifies the instruction, XXX gives the rX register, and YYY gives the rY register. If Op2 specifies immediate data #D, then the encoding is III1XXXDDDDDDDDD, where the field DDDDDDDDD represents a nine-bit signed (2's complement) value. Although only two bits are needed to encode our four instructions, we are using three bits because other instructions will be added to the processor later. Assume that III = 000 for the mv instruction, 001 for mvt, 010 for add, and 011 for sub.
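The two encodings can be checked with a small simulation-only sketch. The module name `encode_demo` and the function names `enc_reg` and `enc_imm` are hypothetical helpers introduced here for illustration; only the bit layout comes from the text above.

```verilog
// Simulation-only sketch: assemble instruction words from their fields.
module encode_demo;
    localparam MV = 3'b000, MVT = 3'b001, ADD = 3'b010, SUB = 3'b011;

    // Register form: III 0 XXX 000000 YYY
    function [15:0] enc_reg;
        input [2:0] iii, rx, ry;
        enc_reg = {iii, 1'b0, rx, 6'b000000, ry};
    endfunction

    // Immediate form: III 1 XXX DDDDDDDDD (nine-bit immediate)
    function [15:0] enc_imm;
        input [2:0] iii, rx;
        input [8:0] d;
        enc_imm = {iii, 1'b1, rx, d};
    endfunction

    initial begin
        $display("mv  r0, #28 -> %h", enc_imm(MV, 3'd0, 9'd28));  // prints 101c
        $display("sub r1, r0  -> %h", enc_reg(SUB, 3'd1, 3'd0));  // prints 6200
    end
endmodule
```

Bit 12 is the only difference between the two forms for a given opcode, so a single bit of IR is enough for the control FSM to decide whether Op2 comes from a register or from the instruction itself.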

The processor in Figure 1 can perform various tasks by using a sequence of instructions. For example, the sequence below loads the number 28 into register r0 and then calculates, in register r1, the 2's complement value -28.

```
mv   r0, #28     // original number = 28
mvt  r1, #0xFF
add  r1, #0xFF   // r1 = 0xFFFF
sub  r1, r0      // r1 = 1's-complement of r0
add  r1, #1      // r1 = 2's-complement of r0 = -28
```

This sequence of instructions produces the same result as the single instruction mv r1, #-28.
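The arithmetic in this sequence can be traced with a short simulation-only sketch. It models only the register values, not the processor itself; the module name `negate_demo` and the helper `sext9` are hypothetical. It assumes that mvt writes the immediate into the upper byte and zeros into the lower byte, as the bus expression `{IR[7:0], 8'b00000000}` in Figure 2 suggests.

```verilog
// Simulation-only trace of the five-instruction sequence on 16-bit values.
module negate_demo;
    reg [15:0] r0, r1;

    // Sign-extend a nine-bit immediate to 16 bits.
    function [15:0] sext9;
        input [8:0] d;
        sext9 = {{7{d[8]}}, d};
    endfunction

    initial begin
        r0 = sext9(9'd28);            // mv  r0, #28    -> 0x001C
        r1 = {8'hFF, 8'h00};          // mvt r1, #0xFF  -> 0xFF00
        r1 = r1 + sext9(9'h0FF);      // add r1, #0xFF  -> 0xFFFF
        r1 = r1 - r0;                 // sub r1, r0     -> 0xFFE3 (1's complement of 28)
        r1 = r1 + sext9(9'd1);        // add r1, #1     -> 0xFFE4 (2's complement of 28)
        $display("r1 = %h (%0d)", r1, $signed(r1));   // prints: r1 = ffe4 (-28)
    end
endmodule
```

Note that the immediate 0xFF has bit 8 equal to 0, so it sign-extends to 0x00FF rather than to a negative value.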

| Instruction | $T_0$     | $T_1$                            | $T_2$                               | $T_3$                       |
|-------------|-----------|----------------------------------|-------------------------------------|-----------------------------|
| mv          | $IR_{in}$ | Select rY or IR, $rX_{in}$, Done |                                     |                             |
| mvt         | $IR_{in}$ | Select IR, $rX_{in}$, Done       |                                     |                             |
| add         | $IR_{in}$ | Select rX, $A_{in}$              | Select rY or IR, $G_{in}$           | Select $G$, $rX_{in}$, Done |
| sub         | $IR_{in}$ | Select rX, $A_{in}$              | Select rY or IR, $AddSub$, $G_{in}$ | Select $G$, $rX_{in}$, Done |

Table 2: Control signals asserted in each instruction/time step.
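Table 2 also implies how the time-step counter should advance: mv and mvt finish in $T_1$, while add and sub continue through $T_3$. One possible next-state function is sketched below. It is an assumption drawn from the table, not the provided proc.v, and the module name `tstep_next` is hypothetical; the state names and encodings match the parameters in Figure 2.

```verilog
// Sketch of next-state logic consistent with Table 2 (illustrative only).
module tstep_next (
    input      [2:0] Tstep_Q,   // current time step
    input            Run,       // start executing the instruction in IR
    input            Done,      // asserted in the last time step of an instruction
    output reg [2:0] Tstep_D    // next time step
);
    parameter T0 = 3'b000, T1 = 3'b001, T2 = 3'b010, T3 = 3'b011;

    always @(*)
        case (Tstep_Q)
            T0: Tstep_D = Run  ? T1 : T0;   // wait in T0 until Run is asserted
            T1: Tstep_D = Done ? T0 : T2;   // mv and mvt end here
            T2: Tstep_D = T3;               // add and sub always need T3
            T3: Tstep_D = T0;               // result written back, start over
            default: Tstep_D = T0;
        endcase
endmodule
```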

### Part I

Implement the processor shown in Figure 1 using Verilog code, as follows:

1. Make a new folder for this part of the exercise. Part of the Verilog code for the processor is shown in parts a to c of Figure 2, and a more complete version of the code is provided with this exercise, in a file named proc.v. You can modify this code to suit your own coding style if desired—the provided code is just a suggested solution. Fill in the missing parts of the Verilog code to complete the design of the processor.

```
module proc(DIN, Resetn, Clock, Run, Done);
    input [15:0] DIN;
    input Resetn, Clock, Run;
    output Done;
    ... declare variables
    assign III = IR[15:13];
    assign IMM = IR[12];
    assign rX = IR[11:9];
    assign rY = IR[2:0];
    dec3to8 decX (rX_in, rX, R_in); // produce r0 - r7 register enables
    parameter T0 = 3'b000, T1 = 3'b001, T2 = 3'b010, T3 = 3'b011;
    // Control FSM state table
    always @(Tstep_Q, Run, Done)
        case (Tstep_Q)
            T0: // data is loaded into IR in this time step
            if (~Run) Tstep_D = T0;
            else Tstep_D = T1;
            T1: ...
      endcase
```

Figure 2: Skeleton Verilog code for the processor. (Part a)

```
parameter mv = 3'b000, mvt = 3'b001, add = 3'b010, sub = 3'b011;
// selectors for the BusWires multiplexer
parameter _R0 = 4'b0000, _R1 = 4'b0001, _R2 = 4'b0010,
   _R3 = 4'b0011, ..., _R7 = 4'b0111, _G = 4'b1000,
   _IR8_IR8_0 /* sign-extended immediate data */ = 4'b1001,
   _IR7_0_0 /* immediate data << 8 */ = 4'b1010;
   // control FSM outputs
always @(*) begin
   rX_in = 1'b0; Done = 1'b0; ... // default values for variables
   case (Tstep_Q)
       T0: // store DIN into IR
           IR_in = 1'b1;
       T1: // define signals in time step T1
           case (III)
               mv: begin
                   if (!IMM) Select = rY;        // mv rX, rY
                   else Select = _IR8_IR8_0;     // mv rX, #D
                   rX_in = 1'b1;                 // enable rX
                   Done = 1'b1;
               end
               mvt: // mvt rX, #D
           endcase
       T2: // define signals in time step T2
           case (III)
              . . .
           endcase
       T3: // define signals in time step T3
           case (III)
              . . .
           endcase
       default: ;
   endcase
end
// Control FSM flip-flops
always @ (posedge Clock, negedge Resetn)
   if (!Resetn)
      ...
regn reg_0 (BusWires, Resetn, R_in[0], Clock, r0);
regn reg_1 (BusWires, Resetn, R_in[1], Clock, r1);
regn reg_7 (BusWires, Resetn, R_in[7], Clock, r7);
... instantiate other registers and the adder/subtracter unit
```

Figure 2: Skeleton Verilog code for the processor. (Part *b*)

```
// define the internal processor bus
    always @(*)
        case (Select)
            _R0: BusWires = r0;
            _R1: BusWires = r1;
            _R7: BusWires = r7;
            _G: BusWires = G;
            _IR8_IR8_0: BusWires = {{7{IR[8]}}, IR[8:0]};
            _IR7_0_0: BusWires = {IR[7:0], 8'b00000000};
            default: BusWires = 16'bxxxxxxxxxxxxxxxx;
        endcase
endmodule
module dec3to8(E, W, Y);
    input E; // enable
    input [2:0] W;
    output [0:7] Y;
    reg [0:7] Y;
    always @(*)
        if (E == 0)
            Y = 8'b00000000;
        else
            case (W)
                3'b000: Y = 8'b10000000;
                3'b001: Y = 8'b01000000;
                3'b010: Y = 8'b00100000;
                3'b011: Y = 8'b00010000;
                3'b100: Y = 8'b00001000;
                3'b101: Y = 8'b00000100;
                3'b110: Y = 8'b00000010;
                3'b111: Y = 8'b00000001;
            endcase
endmodule
```

Figure 2: Skeleton Verilog code for the processor. (Part c)

2. Set up the required subfolder and files so that your Verilog code can be compiled and simulated using the ModelSim simulator to verify that your processor works properly. An example result produced by using *ModelSim* for a correctly-designed circuit is given in Figure 3. It shows the value 0x101C being loaded into *IR* from *DIN* at time 30 ns. This pattern represents the instruction mv r0, #28, where the immediate value D = 28 (0x1C) is loaded into r0 on the clock edge at 50 ns. The simulation results then show the instruction mvt r1, #0xFF at 70 ns, add r1, #0xFF starting at 110 ns, and sub r1, r0 starting at 190 ns.

You should perform a thorough simulation of your processor with the ModelSim simulator. A sample Verilog testbench file, *testbench.v*, execution script, *testbench.tcl*, and waveform file, *wave.do* are provided along with this exercise.
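Figure 2 instantiates a register module named regn without showing its definition. A minimal sketch is given below, together with a small self-checking testbench in the style that could be run in ModelSim. The port order is inferred from the instantiations in Figure 2; the port names, the synchronous active-low reset, and the testbench module `regn_tb` are assumptions, and the provided proc.v and testbench.v may differ.

```verilog
// Sketch of an n-bit register with enable (assumed synchronous, active-low reset).
module regn (R, Resetn, E, Clock, Q);
    parameter n = 16;
    input  [n-1:0] R;
    input  Resetn, E, Clock;
    output reg [n-1:0] Q;

    always @(posedge Clock)
        if (!Resetn)
            Q <= 0;
        else if (E)
            Q <= R;
endmodule

// Hypothetical self-checking testbench for the sketch above.
module regn_tb;
    reg  [15:0] D;
    reg  Resetn, E, Clock;
    wire [15:0] Q;

    regn dut (D, Resetn, E, Clock, Q);

    always #10 Clock = ~Clock;           // 20 ns clock period

    initial begin
        Clock = 0; Resetn = 0; E = 0; D = 16'h101C;
        @(posedge Clock); #1;            // reset is applied on this edge
        if (Q !== 16'h0000) $display("FAIL: reset");
        Resetn = 1;
        @(posedge Clock); #1;            // E = 0, so Q must hold its value
        if (Q !== 16'h0000) $display("FAIL: hold");
        E = 1;
        @(posedge Clock); #1;            // E = 1, so Q loads D
        if (Q !== 16'h101C) $display("FAIL: load");
        $display("finished, Q = %h", Q);
        $finish;
    end
endmodule
```

The same pattern (drive inputs, wait for a clock edge, compare against an expected value) extends to testing the full processor by driving DIN and Run and checking the register contents after each instruction completes.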



Figure 3: Simulation results for the processor.

### Part II

In this part we will design the circuit illustrated in Figure 4, in which a memory module and counter are connected to the processor. The counter is used to read the contents of successive locations in the memory, and this data is provided to the processor as a stream of instructions. To simplify the design and testing of this circuit we have used separate clock signals, *PClock* and *MClock*, for the processor and memory.

You are to implement the circuit in Figure 4 in the FPGA device on a DE1-SoC board. As part of the process for designing and debugging the circuit, you must use ModelSim to simulate your Verilog code's functionality. You can also use the DESim tool to observe how your circuit will behave on a DE1-SoC board, even if you do not have access to a physical board (for example, at home). You are strongly encouraged to use DESim to help design and debug your Verilog code, but its use in this exercise is optional.



Figure 4: Connecting the processor to a memory module and counter.

#### Do the following:

1. A sample top-level Verilog file that instantiates the processor, memory module, and counter is shown in Figure 5. This code is provided, along with this exercise, in a file named *part2.v*. It is the top-level Verilog file for this part of the exercise. The code instantiates a memory module called *inst\_mem*. A diagram of this memory module is depicted in Figure 6. Since this module has only a read port, and no write port, it is called a *synchronous read-only memory (synchronous ROM)*. The memory module includes a register for synchronously loading addresses. This register is required due to the design of the memory resources in the Intel FPGA chip.

The synchronous ROM is defined in a Verilog source-code file named *inst\_mem.v*, which is provided along with this exercise. You can use the provided file, or you can create one yourself (if you want to see how this is done) by using the Quartus software. The instructions for creating the *inst\_mem.v* file using the Quartus software are given below. If you do not wish to perform these steps, and just want to make use of the provided file, then skip to item 3, below.

```
module part2 (KEY, SW, LEDR);
    input [1:0] KEY;
    input [9:0] SW;
    output [9:0] LEDR;
    wire Done, Resetn, PClock, MClock, Run;
    wire [15:0] DIN;
    wire [4:0] pc;
    assign Resetn = SW[0];
    assign MClock = KEY[0];
    assign PClock = KEY[1];
    assign Run = SW[9];
    proc U1 (DIN, Resetn, PClock, Run, Done);
    assign LEDR[0] = Done;
    assign LEDR[9] = Run;
    inst_mem U2 (pc, MClock, DIN);
    count5 U3 (Resetn, MClock, pc);
endmodule
module count5 (Resetn, Clock, Q);
    input Resetn, Clock;
    output reg [4:0] Q;
    always @ (posedge Clock, negedge Resetn)
        if (Resetn == 0)
            Q <= 5'b00000;
        else
            Q <= Q + 1'b1;
endmodule
```

Figure 5: Verilog code for the top-level module.

2. A Quartus project file is provided along with this part of the exercise. Use the Quartus software to open this project, which is called *part2.qpf*. The top-level file in this Quartus project is *part2.v*. Use the Quartus IP Catalog tool to create the memory module, by clicking on Tools > IP Catalog in the Quartus software. In the IP Catalog window choose the *ROM: 1-PORT* module, which is found under the Basic Functions > On Chip Memory category. Select Verilog HDL as the type of output file to create, and give the file the name *inst\_mem.v*.

Follow through the provided dialogue to create a memory that has one 16-bit wide read data port and is 32 words deep. Figures 7 and 8 show the relevant pages and how to properly configure the memory. To place processor instructions into the memory, you need to specify *initial values* that should be stored in the memory when your circuit is programmed into the FPGA chip. This can be done by initializing the memory using the contents of a *memory initialization file (MIF)*. The appropriate screen is illustrated in Figure 9. We have specified a file named *inst\_mem.mif*, which then has to be created in the folder that contains the Quartus project. Clicking Next two more times will advance to the Summary screen, which lists the names of files that will be created for the memory IP. You should select *only* the Verilog file *inst\_mem.v*. Make sure that none of the other types of files are selected, and then click Finish.

Figure 6: The 32 x 16 ROM with address register.



Figure 7: Specifying memory size.

3. As described above, you need to provide a memory initialization file (MIF) called *inst\_mem.mif* to specify the contents of the ROM. An example of a memory initialization file is given in Figure 10. Note that comments (% ... %) are included in this file as a way of documenting the meaning of the provided instructions. Set the contents of your MIF file such that it provides enough processor instructions to test your circuit.
4. The Verilog code in Figure 5 includes the appropriate port names for implementation of the design on an FPGA board, like the DE1-SoC. The switch $SW_9$ drives the processor's Run input, $SW_0$ is connected to Resetn, $KEY_0$ to MClock, and $KEY_1$ to PClock. The Run signal is displayed on $LEDR_9$ and Done is connected to $LEDR_0$.



Figure 8: Specifying which memory ports are registered.



Figure 9: Specifying a memory initialization file (MIF).

5. Use the ModelSim simulator to test your Verilog code. Ensure that instructions are read properly out of the ROM and executed by the processor. An example of simulation results produced using ModelSim with the MIF file from Figure 10 is shown in Figure 11. The corresponding ModelSim setup files are provided along with this exercise.
6. Once your simulations show a properly-working circuit, you are required to implement your design on a DE1-SoC board. Use the Quartus project files that are provided along with this exercise to compile your Verilog code with the Quartus Prime software, and then download the resulting circuit to the board.

Demonstrate the functionality of your circuit to a TA during your lab period by toggling the switches on the board and observing the LEDs. Since the circuit's clock inputs are controlled by pushbutton switches, it is possible to step through the execution of instructions and observe the behavior of the circuit.

```
DEPTH = 32;
WIDTH = 16;
ADDRESS_RADIX = HEX;
DATA_RADIX = BIN;
CONTENT
BEGIN
00: 0001000000011100;  % mv  r0, #28   %
01: 0011001011111111;  % mvt r1, #0xFF %
02: 0101001011111111;  % add r1, #0xFF %
03: 0110001000000000;  % sub r1, r0    %
04: 0101001000000001;  % add r1, #1    %
05: 0000000000000000;
... (some lines not shown)
1F: 0000000000000000;
END;
```

Figure 10: An example memory initialization file (MIF).



Figure 11: An example simulation output using the MIF in Figure 10.
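For simulation it can help to have a plain behavioral model of the 32 x 16 synchronous ROM described in this part. The sketch below is not the Quartus-generated *inst\_mem.v*. The module name `rom32x16_sketch` is hypothetical, the port order mirrors the positional connection `(pc, MClock, DIN)` in Figure 5, and it assumes that only the address is registered, as the text states. The initial contents are the instructions listed in Figure 10.

```verilog
// Behavioral stand-in for a 32-word x 16-bit ROM with a registered address.
module rom32x16_sketch (
    input  [4:0]  address,
    input         clock,
    output [15:0] q
);
    reg [15:0] mem [0:31];
    reg [4:0]  addr_reg;
    integer i;

    initial begin
        for (i = 0; i < 32; i = i + 1)
            mem[i] = 16'h0000;                  // unused words hold mv r0, r0
        mem[0] = 16'b0001000000011100;          // mv  r0, #28
        mem[1] = 16'b0011001011111111;          // mvt r1, #0xFF
        mem[2] = 16'b0101001011111111;          // add r1, #0xFF
        mem[3] = 16'b0110001000000000;          // sub r1, r0
        mem[4] = 16'b0101001000000001;          // add r1, #1
    end

    always @(posedge clock)
        addr_reg <= address;                    // address is captured on the clock edge

    assign q = mem[addr_reg];                   // data follows the registered address
endmodule
```

Because the address is registered, the word at a given counter value appears on the output only after the following MClock edge. This one-cycle delay is worth keeping in mind when stepping MClock and PClock by hand on the board.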

### **Enhanced Processor**

It is possible to enhance the capability of the processor so that the counter in Figure 4 is no longer needed, and so that the processor has the ability to perform read and write operations using memory or other devices. These enhancements involve adding new instructions to the processor, as well as other capabilities—they are discussed in the next lab exercise.