

# Computer Hardware Engineering (IS1200) and Computer Organization and Components (IS1500)

Spring 2023

Lecture 9: ALU and Single-Cycle Processors

Artur Podobas<sup>1</sup> (IS1500), Marco Chiesa<sup>2</sup> (IS1200)

<sup>1</sup>Assistant Professor, KTH Royal Institute of Technology

<sup>2</sup>Associate Professor, KTH Royal Institute of Technology

Slides by David Broman (extensions by Artur Podobas), KTH





#### **Course Structure**



Part I: Arithmetic Logic Unit

Part II: Data Path in a Single-Cycle Processor



#### **Abstractions in Computer Systems**






#### **Agenda**



- Part I: Arithmetic Logic Unit
- Part II: Data Path in a Single-Cycle Processor
- Part III: Control Unit in a Single-Cycle Processor



# Part I Arithmetic Logic Unit



Acknowledgement: The structure and several of the good examples are derived from the book "Digital Design and Computer Architecture" (2013) by D. M. Harris and S. L. Harris.





#### **Arithmetic Logic Unit (ALU)**

An **ALU** (sv. Aritmetisk Logisk Enhet) saves hardware by combining different arithmetic and logic operations in one single unit/element.



ALU symbol: both figures have the same function.

Input **F** specifies the function that the ALU should perform.

ALUs can have different functions and be designed differently.

An ALU can also include **output flags**, for instance:

- Overflow flag (the adder overflowed)
- **Zero flag** (the output is zero)
- **Negative flag** (the output is negative)
- Carry flag (the adder produced a carry out)
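The behavior of such an ALU can be sketched in Python. This is a minimal model under stated assumptions, not the design from the slide figure: the 32-bit width, the 3-bit encoding of **F** (borrowed from the Harris and Harris textbook design), and the names `alu`, `MASK32`, and `SIGN_BIT` are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def alu(a, b, f):
    """Model a 32-bit ALU and return (result, flags).

    Assumed encoding of the function input F:
    0b000 AND, 0b001 OR, 0b010 ADD, 0b110 SUB, 0b111 SLT.
    """
    a &= MASK32
    b &= MASK32
    carry = overflow = False
    if f == 0b000:
        y = a & b
    elif f == 0b001:
        y = a | b
    elif f in (0b010, 0b110, 0b111):
        # Subtraction reuses the adder: a - b = a + ~b + 1 (two's complement).
        subtract = f != 0b010
        operand = (~b & MASK32) if subtract else b
        total = a + operand + int(subtract)
        y = total & MASK32
        carry = total > MASK32
        # Signed overflow: both adder inputs have a sign that the sum lacks.
        overflow = bool((a ^ y) & (operand ^ y) & SIGN_BIT)
        if f == 0b111:
            # SLT: sign of a - b, corrected when the subtraction overflowed.
            y = ((y >> 31) & 1) ^ int(overflow)
    else:
        raise ValueError("function code not supported in this sketch")
    flags = {
        "zero": y == 0,
        "negative": bool(y & SIGN_BIT),
        "carry": carry,      # carry out of the adder (False for AND/OR)
        "overflow": overflow,
    }
    return y, flags
```

For example, `alu(7, 7, 0b110)` returns a zero result with the zero flag set, which is the comparison a branch-if-equal instruction needs.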



#### **Arithmetic Logic Unit (ALU)**










# Part II Data Path in a Single-Cycle Processor








#### **Data Path and Control Unit**

#### A processor is typically divided into two parts



#### **Data Path**

- Operates on a word of data.
- Consists of elements such as registers, memory, ALUs etc.



#### **Control Unit**

- Gets the current instruction from the data path and tells the data path how to execute the instruction.



#### Instructions

In this lecture, we construct a microarchitecture for a subset of a MIPS processor with the following instructions: add, sub, and, or, slt, addi, lw, sw, beq, and j.







#### State Elements (1/3): Program Counter and Register File

The **architectural states** for this MIPS processor are the program counter (PC) and the 32 registers (\$0, \$t0, ... \$s0, \$s1, ... etc.)

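Figure: the program counter (PC) is a 32-bit register clocked by CLK; its input is the PC at the next clock cycle and its output is the PC at the current clock cycle. The register file holds 32 registers of size 32 bits, so each 5-bit address (**A1**, **A2**, **A3**) selects one of $2^5 = 32$ registers. **RD1** and **RD2** read 32-bit data, **WD3** carries the 32-bit data to write, and **WE3** is the write enable.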

Two read ports (**RD1** and **RD2**) and one write port (**WD3**).






#### State Elements (2/3): Instruction and Data Memories



**Non-architectural states** are used to simplify logic or improve performance (introduced in the next lecture).






#### State Elements (3/3): Reading Combinationally, Writing at Clock Edge

All the blocks below **read** combinationally: when the address changes, the data on the read port changes after some propagation time. There is <u>no clock involved</u> in reading.

The register file and the data memory **write** at the rising clock edge.
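The difference between reading and writing can be sketched with a small Python model of the register file. The class and method names (`RegisterFile`, `read`, `clock_edge`) are hypothetical, and the model assumes that register \$0 is hard-wired to zero, as in MIPS.

```python
class RegisterFile:
    """32 registers of 32 bits: two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, a1, a2):
        # Combinational read: RD1 and RD2 simply follow the 5-bit
        # addresses A1 and A2. No clock is involved.
        return self.regs[a1 & 0x1F], self.regs[a2 & 0x1F]

    def clock_edge(self, we3, a3, wd3):
        # Write port: the state changes only when this method is called,
        # which models the rising clock edge, and only if WE3 is asserted.
        # Assumption: writes to register 0 are dropped ($0 is always zero).
        if we3 and (a3 & 0x1F) != 0:
            self.regs[a3 & 0x1F] = wd3 & 0xFFFFFFFF
```

Calling `read` any number of times never changes the state; only `clock_edge` with `we3` set does.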









#### Read Instruction from the Current PC









#### lw instruction – Read Base Address








#### lw instruction – Read Offset








#### lw instruction – Read Data Word








#### lw instruction – Write Back







#### lw instruction – Increment PC



Increment the PC by 4 (the next instruction is at address PC + 4).

This is the complete data path for the load word (lw) instruction.
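The steps of the load word data path can be traced in a short Python sketch. It is an illustration under assumptions: the memories are modeled as dictionaries keyed by byte address, the registers as a list of 32 integers, and the function names `sign_extend16` and `execute_lw` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_lw(pc, instr_mem, regs, data_mem):
    """Run one lw instruction and return the next PC."""
    # Combinational part of the cycle.
    instr = instr_mem[pc]                    # read instruction from the current PC
    rs = (instr >> 21) & 0x1F                # base register field
    rt = (instr >> 16) & 0x1F                # destination register field
    base = regs[rs]                          # read base address
    offset = sign_extend16(instr)            # read and sign-extend the offset
    address = (base + offset) & 0xFFFFFFFF   # ALU adds base and offset
    data = data_mem[address]                 # read data word

    # Rising clock edge: write back and update the PC.
    if rt != 0:                              # $0 stays zero
        regs[rt] = data
    return (pc + 4) & 0xFFFFFFFF             # increment PC
```

As a usage example, the word `0x8E280004` encodes `lw $t0, 4($s1)` (op 100011, rs 17, rt 8, immediate 4), so executing it with `regs[17] = 0x1000` loads the word stored at address `0x1004` into register 8.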




#### lw instruction – Timing



Combinational logic during the clock cycle: read the instruction, sign-extend the offset, read from the register file, perform the ALU operation, and read from the data memory.

At the rising clock edge: write to the register file and update the PC.







#### sw instruction – Increment PC

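We need to read the base address, read the offset, and compute an address. Good news: **We have already done that!** What remains is to read the value to store from the register file and write it to the data memory.

Example: sw \$s0, 4(\$s1)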
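The store word instruction can be sketched in Python in the same style. The function name `execute_sw` and the dictionary model of the data memory are assumptions made for illustration; the address computation is the same as for lw.

```python
def execute_sw(instr, regs, data_mem):
    """Run one sw instruction and return the address that was written."""
    rs = (instr >> 21) & 0x1F      # base register, $s1 in the example
    rt = (instr >> 16) & 0x1F      # register holding the value to store, $s0
    imm = instr & 0xFFFF
    offset = imm - 0x10000 if imm & 0x8000 else imm   # sign-extend 16 bits
    address = (regs[rs] + offset) & 0xFFFFFFFF        # same ALU addition as lw
    # The data memory is written at the rising clock edge (MemWrite = 1).
    data_mem[address] = regs[rt]
    return address
```

The word `0xAE300004` encodes `sw $s0, 4($s1)` (op 101011, rs 17, rt 16, immediate 4).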






#### R-type instructions – Machine Encoding

We are now going to handle all R-type instructions in the same uniform way. That is, we should handle add, sub, and, or, and slt.
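All R-type instructions share one field layout, which is what makes the uniform handling possible. A minimal Python sketch of the field extraction follows; the names `RType` and `decode_rtype` are hypothetical, while the field positions are those of the standard MIPS R-type format.

```python
from collections import namedtuple

RType = namedtuple("RType", "op rs rt rd shamt funct")


def decode_rtype(instr):
    """Split a 32-bit machine word into the six R-type fields."""
    return RType(
        op=(instr >> 26) & 0x3F,     # bits 31:26, zero for R-type
        rs=(instr >> 21) & 0x1F,     # bits 25:21, first source register
        rt=(instr >> 16) & 0x1F,     # bits 20:16, second source register
        rd=(instr >> 11) & 0x1F,     # bits 15:11, destination register
        shamt=(instr >> 6) & 0x1F,   # bits 10:6, shift amount
        funct=instr & 0x3F,          # bits 5:0, selects the operation
    )
```

For instance, `decode_rtype(0x02328020)` yields rs 17, rt 18, rd 16, and funct 0x20, which is `add $s0, $s1, $s2`.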





#### R-type instructions – ALU Usage







#### R-type instructions – Write to Register







#### R-type instructions – Machine Encoding





#### R-type instructions – Use the rd field





#### beq instruction – Machine Encoding

Recall that the beq instruction is a branch instruction, encoded using the I-type format.



Recall how to compute the branch target address (BTA):

$$\text{BTA} = \text{PC} + 4 + \text{imm} \times 4$$

Example: beq \$s0, \$s1, loop
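The BTA formula can be checked with a few lines of Python. The function name `branch_target_address` is hypothetical, and the sketch assumes that imm is the 16-bit immediate field interpreted as a signed number, so that backward branches work.

```python
def branch_target_address(pc, imm16):
    """Compute BTA = PC + 4 + imm * 4 with a sign-extended 16-bit immediate."""
    imm16 &= 0xFFFF
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + 4 + offset * 4) & 0xFFFFFFFF
```

With the branch at address `0x00400010` and imm = 3, the target is `0x00400020`; with imm = -3 (encoded as `0xFFFD`), the target is `0x00400008`.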



#### beq instruction

Compare if equal






#### **Pseudo-Direct Addressing (Revisited)**

The **J** and **JAL** instructions are encoded using the **J-type** format. However, the address field is only 26 bits, not 32 bits.



#### A **32-bit Pseudo-Direct Address** is computed as follows:

- Bits 1 to 0 (least significant) are always zero because code is word aligned.
- Bits 27 to 2 are taken directly from the addr field of the machine code instruction.
- Bits 31 to 28 are obtained from the four most significant bits of PC + 4.
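The three rules above translate directly into bit operations. The following Python sketch uses the hypothetical function name `jump_target_address`; `addr26` stands for the 26-bit addr field of the instruction.

```python
def jump_target_address(pc, addr26):
    """Build the 32-bit pseudo-direct address for j and jal."""
    upper = (pc + 4) & 0xF0000000          # bits 31 to 28 come from PC + 4
    middle = (addr26 & 0x03FFFFFF) << 2    # bits 27 to 2 come from the addr field
    return upper | middle                  # bits 1 to 0 are left as zero
```

For a jump located at `0x00400000` with addr field `0x0100028`, the target is `0x004000A0`.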



#### j instruction







## Data Path for Instructions add, sub, and, or, slt, addi, lw, sw, beq, j








# Part III Control Unit in a Single-Cycle Processor








#### What to Control?








#### **Control Unit Input: Machine Code**








#### **Control Unit Structure**








#### **ALU Decoder**
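A possible ALU decoder is sketched below in Python. The ALUOp meanings follow the main decoder table in this lecture (00 for lw, sw, and addi, 01 for beq, 10 for R-type), and the funct codes are the standard MIPS ones. The 3-bit ALU control encoding and the names `alu_decoder` and `FUNCT_TO_ALU_CONTROL` are assumptions borrowed from the Harris and Harris textbook design and may differ from the slide figure.

```python
# funct field -> assumed 3-bit ALU control value
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op, funct):
    """Combine the 2-bit ALUOp with the funct field into an ALU control value."""
    if alu_op == 0b00:
        return 0b010              # add: address computation for lw, sw, addi
    if alu_op == 0b01:
        return 0b110              # subtract: comparison for beq
    # ALUOp 10: R-type, the funct field decides the operation.
    return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
```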








#### **Main Decoder**



| Instr  | op     | RegWrite | RegDst | ALUSrc | Branch | MemWrite | MemToReg | Jump | ALUOp |
|--------|--------|----------|--------|--------|--------|----------|----------|------|-------|
| R-Type | 000000 | 1        | 1      | 0      | 0      | 0        | 0        | 0    | 10    |
| lw     | 100011 | 1        | 0      | 1      | 0      | 0        | 1        | 0    | 00    |
| sw     | 101011 | 0        | ?      | 1      | 0      | 1        | ?        | 0    | 00    |
| beq    | 000100 | 0        | ?      | 0      | 1      | 0        | ?        | 0    | 01    |
| addi   | 001000 | 1        | 0      | 1      | 0      | 0        | 0        | 0    | 00    |
| j      | 000010 | 0        | ?      | ?      | ?      | 0        | ?        | 1    | ??    |
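Because the main decoder is purely combinational, it can be modeled as a lookup table. The Python sketch below transcribes the table above; the names `MAIN_DECODER` and `main_decoder` are hypothetical, and the don't-care entries (`?`) are represented as `None`.

```python
X = None  # don't care

# op -> (RegWrite, RegDst, ALUSrc, Branch, MemWrite, MemToReg, Jump, ALUOp)
SIGNALS = ("RegWrite", "RegDst", "ALUSrc", "Branch",
           "MemWrite", "MemToReg", "Jump", "ALUOp")

MAIN_DECODER = {
    0b000000: (1, 1, 0, 0, 0, 0, 0, 0b10),  # R-type
    0b100011: (1, 0, 1, 0, 0, 1, 0, 0b00),  # lw
    0b101011: (0, X, 1, 0, 1, X, 0, 0b00),  # sw
    0b000100: (0, X, 0, 1, 0, X, 0, 0b01),  # beq
    0b001000: (1, 0, 1, 0, 0, 0, 0, 0b00),  # addi
    0b000010: (0, X, X, X, 0, X, 1, X),     # j
}


def main_decoder(op):
    """Return the control signals for a 6-bit opcode as a dictionary."""
    return dict(zip(SIGNALS, MAIN_DECODER[op & 0x3F]))
```

Note that RegWrite and MemWrite are never don't-care values: they control state changes, so they must be 0 for every instruction that should not write.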



#### Performance Analysis (1/2) General View

How should we analyze the performance of a computer?

- By clock frequency?
- By instructions per program?

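$$\text{Execution time} = \text{\# instructions} \times \frac{\text{clock cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{clock cycle}}$$

- **# instructions**: number of instructions in a program (# = number of). Determined by the programmer or the compiler or both.
- **Clock cycles per instruction**: average cycles per instruction (CPI). Determined by the microarchitecture implementation.
- **Seconds per clock cycle**: the clock period $T_c$. Determined by the critical path in the logic.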

#### **Problem:**

- Your program may have many inputs.
- Not only one specific program might be interesting.

#### Solution:

Use a **benchmark** (a set of programs). Example: SPEC CPU Benchmark






#### Performance Analysis (2/2) Single-Cycle Processor

- **Number of instructions in a program** (# = number of): determined by the programmer or the compiler or both.
- **Average cycles per instruction (CPI)**: determined by the microarchitecture implementation. Each instruction takes one clock cycle. That is, CPI = 1.
- **Seconds per cycle** = clock period $T_c$: determined by the critical path in the logic. The **lw** instruction has a longer path than the R-type instructions. However, because of synchronous logic, the clock period is determined by the slowest instruction.

The main problem with this design is the long critical path.



#### Critical Path Example: Load Word (lw) Instruction
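The clock period of the single-cycle processor can be estimated by adding up the delays along the lw path. The Python sketch below is an illustration only: the delay values are hypothetical example numbers in picoseconds, the names `DELAYS_PS`, `lw_cycle_time_ps`, and `execution_time_s` are made up for this sketch, and it assumes that the register file read is slower than sign extension, so the register file lies on the critical path.

```python
# Hypothetical element delays in picoseconds (example values only).
DELAYS_PS = {
    "pc_clk_to_q": 30,     # PC register clock-to-output delay
    "memory_read": 250,    # instruction or data memory read
    "regfile_read": 150,   # register file read
    "alu": 200,            # ALU operation
    "mux": 25,             # multiplexer selecting the write-back value
    "regfile_setup": 20,   # register file setup time before the clock edge
}


def lw_cycle_time_ps(d):
    """Sum the delays along the assumed critical path of lw."""
    return (d["pc_clk_to_q"]      # PC output becomes valid
            + d["memory_read"]    # read instruction
            + d["regfile_read"]   # read base address
            + d["alu"]            # compute base + offset
            + d["memory_read"]    # read data word
            + d["mux"]            # select memory data for write back
            + d["regfile_setup"]) # data stable before the rising edge


def execution_time_s(n_instructions, cpi, cycle_time_ps):
    """Execution time = # instructions x CPI x clock period."""
    return n_instructions * cpi * cycle_time_ps * 1e-12
```

With these example delays the clock period is 925 ps, and since CPI = 1 for the single-cycle processor, a program of 100 billion instructions would take about 92.5 seconds.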








#### You can soon stretch your legs...

...but wait just a second more






#### Reading Guidelines



#### **Module 4: Processor Design**

Lecture 9: ALU and Single-Cycle Processors

H&H Chapters 5.2.4, 7.1-7.3.

Lecture 10: Pipelined processors

H&H Chapters 7.5, 7.8.1-7.8.2, 7.9

See the course webpage for more information.



#### Summary

#### Some key takeaway points:

- The ALU performs most of the arithmetic and logic computations in the processor.
- The data path consists of sequential logic that performs processing of words in the processor.
- The control unit decodes instructions and tells the data path what to do.
- The single-cycle processor has a long critical path. We will solve this in the next lecture by introducing a pipelined processor.



Thanks for listening!