# **CH4-Processor Architecture**

# 4.1 Introduction to Y86-64

## Y86-64 Processor State

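Program registers

Each register has a 4-bit ID:

| ID | Register | ID | Register |
|----|----------|----|----------|
| 0 | %rax | 8 | %r8 |
| 1 | %rcx | 9 | %r9 |
| 2 | %rdx | A | %r10 |
| 3 | %rbx | B | %r11 |
| 4 | %rsp | C | %r12 |
| 5 | %rbp | D | %r13 |
| 6 | %rsi | E | %r14 |
| 7 | %rdi | F | (no register) |

- Same encoding as in x86-64, except that %r15 is omitted.

Dropping %r15 leaves ID 0xF free to mean "no register", so 4 bits are enough to name any register (or none).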

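- Program counter (PC)
- Condition codes (CC: OF, ZF, SF)
- Status code: indicates whether the program is executing normally or some event has occurred. Possible values:
    - 1 (AOK): normal operation
    - 2 (HLT): a halt instruction was executed
    - 3 (ADR): invalid address (on instruction fetch or data access)
    - 4 (INS): invalid instruction
- Memory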

## Y86-64 Instructions

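There are 12 instruction types (distinguished by icode), and the encodings are variable-length (1, 2, 9 or 10 bytes): an instruction's length can be determined from its first byte (icode) alone.

Format: icode (4 bits) + ifun (4 bits) + rA (4 bits) + rB (4 bits) + valC (8 bytes); the register byte and valC are present only when the instruction needs them.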
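To make the byte layout concrete, here is a minimal Python sketch of an encoder. It assumes the CS:APP Y86-64 icode numbering (halt = 0x0 through popq = 0xB) and an 8-byte little-endian valC; the names `REG_ID`, `FORMAT`, `length` and `encode` are hypothetical helpers for illustration, not part of any official tool.

```python
import struct

# Register IDs (Y86-64); 0xF is reserved for "no register".
REG_ID = {
    "%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
    "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
    "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
    "%r12": 0xC, "%r13": 0xD, "%r14": 0xE,
}
RNONE = 0xF

# icode -> (has rA:rB register byte, has 8-byte valC)
FORMAT = {
    0x0: (False, False),  # halt
    0x1: (False, False),  # nop
    0x2: (True, False),   # rrmovq / cmovXX
    0x3: (True, True),    # irmovq
    0x4: (True, True),    # rmmovq
    0x5: (True, True),    # mrmovq
    0x6: (True, False),   # OPq
    0x7: (False, True),   # jXX
    0x8: (False, True),   # call
    0x9: (False, False),  # ret
    0xA: (True, False),   # pushq
    0xB: (True, False),   # popq
}


def length(icode: int) -> int:
    """Instruction length in bytes, determined by icode alone."""
    has_regids, has_valc = FORMAT[icode]
    return 1 + (1 if has_regids else 0) + (8 if has_valc else 0)


def encode(icode: int, ifun: int = 0, rA: int = RNONE, rB: int = RNONE,
           valC: int = 0) -> bytes:
    """Encode one instruction: icode:ifun, [rA:rB], [valC little-endian]."""
    has_regids, has_valc = FORMAT[icode]
    out = bytes([(icode << 4) | ifun])
    if has_regids:
        out += bytes([(rA << 4) | rB])
    if has_valc:
        out += struct.pack("<q", valC)  # signed 64-bit, little-endian
    return out


# irmovq $8, %r8  ->  30 f8 08 00 00 00 00 00 00 00
print(encode(0x3, rB=REG_ID["%r8"], valC=8).hex())
```

Because `FORMAT` is keyed by icode only, the decoder side can compute the length before reading anything past the first byte, which is exactly why the encoding stays decodable despite being variable-length.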



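Note: cmovXX and rrmovq share the same icode (2); rrmovq is simply the unconditional member (ifun = 0) of the conditional-move family.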

### Arithmetic and Logical Operations

These instructions set the condition codes (CC).

```
addq rA,rB #ifun = 0
subq rA,rB #ifun = 1
andq rA,rB #ifun = 2
xorq rA,rB #ifun = 3
```
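A minimal Python sketch of what the ALU does for OPq, assuming 64-bit two's-complement arithmetic and that OF is only meaningful for addq/subq (it is cleared for andq/xorq here). Note the operand order: the result is valB OP valA, so `subq rA,rB` computes rB - rA. The function names are hypothetical.

```python
MASK = (1 << 64) - 1


def to_signed(x: int) -> int:
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def alu(ifun: int, valA: int, valB: int):
    """Return (valE, CC) for OPq with the given ifun."""
    a, b = to_signed(valA), to_signed(valB)
    if ifun == 0:    # addq: overflow if operands share a sign the result lacks
        r = to_signed(b + a)
        of = (a < 0) == (b < 0) and (r < 0) != (a < 0)
    elif ifun == 1:  # subq: valB - valA
        r = to_signed(b - a)
        of = (a < 0) != (b < 0) and (r < 0) != (b < 0)
    elif ifun == 2:  # andq
        r, of = to_signed(b & a), False
    elif ifun == 3:  # xorq
        r, of = to_signed(b ^ a), False
    else:
        raise ValueError("invalid ifun for OPq")
    return r, {"ZF": r == 0, "SF": r < 0, "OF": of}
```

For example, `alu(1, 5, 3)` models `subq` with valA = 5 and valB = 3, giving valE = -2 with SF set.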

## Move Operations

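Note that these instructions (rrmovq, irmovq, rmmovq, mrmovq) always go through a register: there is no memory-to-memory move and no immediate-to-memory move, so some x86-64 movq forms are missing.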

## Conditional Move Operations

Their ifun codes follow the same scheme as jXX; rrmovq corresponds to ifun = 0 (move unconditionally).

## Jump Instructions

```
jmp Dest #ifun = 0
jle Dest #ifun = 1
jl Dest #ifun = 2
je Dest #ifun = 3
jne Dest #ifun = 4
jge Dest #ifun = 5
jg Dest #ifun = 6
```
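Both jXX and cmovXX use ifun to choose a condition on the CC. A minimal Python sketch of that Cond(CC, ifun) evaluation, assuming the ifun numbering listed above (0 = unconditional through 6 = greater); `cond` is a hypothetical name.

```python
def cond(zf: bool, sf: bool, of: bool, ifun: int) -> bool:
    """Cond(CC, ifun) shared by jXX and cmovXX."""
    lt = sf != of                 # signed "less than" after a compare (SF ^ OF)
    table = {
        0: True,                  # jmp / rrmovq: unconditional
        1: lt or zf,              # jle / cmovle
        2: lt,                    # jl  / cmovl
        3: zf,                    # je  / cmove
        4: not zf,                # jne / cmovne
        5: not lt,                # jge / cmovge
        6: not lt and not zf,     # jg  / cmovg
    }
    return table[ifun]
```

For instance, after `subq` computes 3 - 5, SF = 1 and OF = 0, so jl and jle are taken while jg is not.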

### Stack Operations

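The Y86-64 stack works essentially the same way as in x86-64: pushq and popq move 8-byte values, the stack grows toward lower addresses, and %rsp points to the top element.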

Two corner cases:

```
pushq %rsp    # pushes the OLD value of %rsp (the value before the decrement)
popq %rsp     # equivalent to mrmovq (%rsp), %rsp
```
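The two corner cases follow from the order of operations inside pushq and popq. A minimal Python sketch on a toy machine, assuming registers in a dict and memory as a dict from address to 8-byte value (both hypothetical modeling choices):

```python
def pushq(regs, mem, reg):
    val = regs[reg]              # read the operand BEFORE decrementing %rsp
    regs["%rsp"] -= 8
    mem[regs["%rsp"]] = val


def popq(regs, mem, reg):
    val = mem[regs["%rsp"]]      # read the top of the stack
    regs["%rsp"] += 8            # increment %rsp first ...
    regs[reg] = val              # ... then write, so popq %rsp leaves M[old %rsp]
```

With `regs = {"%rsp": 0x200}`, `pushq(regs, mem, "%rsp")` stores 0x200 at address 0x1F8, and a following `popq(regs, mem, "%rsp")` simply loads whatever is stored at 0x1F8 into %rsp.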

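- Subroutine Call and Return: `call Dest` pushes the return address onto the stack and jumps to Dest; `ret` pops the return address and jumps to it.
- Miscellaneous Instructions: `nop` does nothing; `halt` stops instruction execution.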

# Y86-64 Programs

```
# Execution begins at address 0
        .pos 0                # Assembler directive: generate code starting at address 0
        irmovq stack, %rsp    # Set up stack pointer (stack is a label; the assembler substitutes its address)
        call main             # Execute main program
        halt                  # Terminate program

# Array of 4 elements
        .align 8              # Also a directive: align to an 8-byte boundary
array:                        # Declare an array
        .quad 0x000d000d000d
        .quad 0x00c000c000c0
        .quad 0x0b000b000b00
        .quad 0xa000a000a000

main:
        irmovq array, %rdi
        irmovq $4, %rsi
        call sum              # sum(array, 4)
        ret
```

```
# long sum(long *start, long count)
# start in %rdi, count in %rsi
sum:
        irmovq $8, %r8        # Constant 8
        irmovq $1, %r9        # Constant 1
        xorq %rax, %rax       # sum = 0 (xor of a register with itself clears it)
        andq %rsi, %rsi       # Set CC: tests whether count is zero without changing it
        jmp test              # Goto test
loop:
        mrmovq (%rdi), %r10   # Get *start
        addq %r10, %rax       # Add to sum
        addq %r8, %rdi        # start++
        subq %r9, %rsi        # count--. Set CC
test:
        jne loop              # Stop when 0
        ret                   # Return

# Stack starts here and grows to lower addresses
        .pos 0x200            # The stack begins at address 0x200
stack:
```
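To know what result to expect in %rax when tracing the program, here is a Python mirror of the sum loop. It is hand-written to follow the assembly's control flow (test first, then loop while count is nonzero), not generated from it; `y86_sum` is a hypothetical name.

```python
def y86_sum(array, count):
    total = 0                    # xorq %rax,%rax
    i = 0                        # %rdi walks the array (start)
    while count != 0:            # andq/subq set ZF; jne loop
        total += array[i]        # mrmovq (%rdi),%r10; addq %r10,%rax
        i += 1                   # addq %r8,%rdi  (start++ is +8 bytes)
        count -= 1               # subq %r9,%rsi  (count--)
    return total & ((1 << 64) - 1)


array = [0x000d000d000d, 0x00c000c000c0, 0x0b000b000b00, 0xa000a000a000]
print(hex(y86_sum(array, 4)))    # 0xabcdabcdabcd
```

The four constants never produce a carry between hex digits, so after the program halts %rax should hold 0xabcdabcdabcd.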

Debug by checking the status code after each instruction executes.

# **ISA (Instruction Set Architecture)**

The ISA provides a conceptual layer of abstraction between software and hardware.

### CISC vs. RISC

Examples: ARM is RISC; x86 is CISC.

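| CISC | Early RISC |
|------|------------|
| A large number of instructions. The Intel document describing the complete instruction set [51] is over 1,200 pages long. | Far fewer instructions, typically fewer than 100. |
| Some instructions with long execution times. These include instructions that copy an entire block from one part of memory to another, and others that copy multiple registers to and from memory. | No instructions with long execution times. Some early RISC machines did not even have an integer multiply instruction, requiring compilers to implement multiplication as a sequence of additions. |
| Variable-length encodings. x86-64 instructions can range from 1 to 15 bytes. | Fixed-length encodings. Typically all instructions are encoded as 4 bytes. |
| Multiple formats for specifying operands. In x86-64, a memory operand specifier can have many different combinations of displacement, base and index registers, and scale factors. | Simple addressing formats. Typically just base and displacement addressing. |
| Arithmetic and logical operations can be applied to both memory and register operands. | Arithmetic and logical operations use only register operands. Memory references are allowed only in load instructions, which read from memory into a register, and store instructions, which write from a register to memory. This is called a load/store architecture. |
| Implementation details are hidden from machine-level programs. The ISA provides a clean abstraction between programs and how they get executed. | Implementation details are exposed to machine-level programs. Some RISC machines prohibit particular instruction sequences, and have jumps that do not take effect until the following instruction has executed. The compiler must optimize performance within these constraints. |
| Condition codes. Special flags are set as a side effect of instructions and are used for conditional branch testing. | No condition codes. Instead, explicit test instructions store the test result in an ordinary register. |
| Stack-intensive procedure linkage. The stack is used for procedure arguments and return addresses. | Register-intensive procedure linkage. Registers are used for procedure arguments and return addresses, so some procedures can avoid memory references entirely. Typically the processor has many more registers (up to 32). |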

Modern ISAs combine the strengths of both CISC and RISC.

# 4.2 Logic Design & HCL (Hardware Control Language)

## **Combinational Circuits**

- Bit Equal
- Bit-level Multiplexor (Bit MUX)
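Both circuits are small Boolean expressions. A Python sketch that evaluates them for single bits (0/1); the HCL in the comments follows the textbook's formulation, and the function names are hypothetical:

```python
def bit_eq(a: int, b: int) -> int:
    # HCL: bool eq = (a && b) || (!a && !b);
    return int((a and b) or (not a and not b))


def bit_mux(s: int, a: int, b: int) -> int:
    # HCL: bool out = (s && a) || (!s && b);   s = 1 selects a, s = 0 selects b
    return int((s and a) or (not s and b))
```

A word-level equality test or multiplexor is just one of these per bit position, with all bits sharing the same select signal.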

## **HCL Representation**

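Multiplexors are described with a case expression made of several `select : expr` cases; the output is the `expr` of the first case whose `select` evaluates to 1.

```
word Out = [
    s : A;
    1 : B;
];
```

Here the final `1` acts as the default case.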
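The first-match rule can be sketched in Python as a loop over (select, value) pairs. Raising an error when nothing matches is a choice of this sketch (a trailing `1 : ...` case makes it unreachable); `hcl_case` is a hypothetical name.

```python
def hcl_case(cases):
    """Return the value of the first (select, value) pair whose select is true."""
    for select, value in cases:
        if select:
            return value
    # Sketch choice: no matching case is an error; a final default avoids it.
    raise ValueError("no case selected")


# Out = [ s : A; 1 : B; ];
s = 0
out = hcl_case([(s == 1, "A"), (True, "B")])   # s is 0, so the default case gives "B"
```

Order matters: if several selects are true, the earliest one wins, which is how HCL expresses priority.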

- Arithmetic Logic Unit (ALU)

## Storage (Sequential Circuits)

**Clocked Registers**

The output changes to the input value only at the rising edge of the clock; at all other times it holds its stored value.

### Register File

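It can read the values of two program registers and update the state of a third at the same time:

- Read: given a register ID on srcA/srcB, the corresponding value appears on valA/valB after some delay (reads behave like combinational logic).
- Write: given valW and dstW, the value is written at the rising clock edge.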
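A minimal Python model of this timing behavior: reads reflect the current contents immediately, while a write only becomes visible after `clock()`. Reading "no register" (0xF) as 0 is an assumption of this sketch, and `RegisterFile` and its methods are hypothetical names.

```python
RNONE = 0xF


class RegisterFile:
    """Two read ports (A, B) and one write port (W); writes happen on clock()."""

    def __init__(self):
        self.regs = [0] * 15            # IDs 0x0..0xE (%rax..%r14)
        self.dstW, self.valW = RNONE, 0

    def _read(self, src):
        # Assumption for this sketch: reading "no register" yields 0.
        return 0 if src == RNONE else self.regs[src]

    def read(self, srcA, srcB):
        # Combinational: reflects the current register contents.
        return self._read(srcA), self._read(srcB)

    def set_write(self, dstW, valW):
        # Drive the write-port inputs; nothing changes until the clock edge.
        self.dstW, self.valW = dstW, valW

    def clock(self):
        # Rising clock edge: perform the pending write, if any.
        if self.dstW != RNONE:
            self.regs[self.dstW] = self.valW
```

Separating `set_write` from `clock` mirrors the hardware: an instruction can read a register in the same cycle another value is being prepared for it, and sees the old value until the edge.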

### Memory

Data memory

Reads and writes work similarly: set write to 0 to read or 1 to write. If the address is invalid, the error signal is set to 1.
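A Python sketch of the read/write/error interface, assuming 8-byte little-endian words and an arbitrary memory size of 0x1000 bytes. Unlike the hardware, which commits writes at the clock edge, this sketch applies a write immediately; `DataMemory` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


class DataMemory:
    """Byte-addressed memory with 8-byte little-endian word access."""

    def __init__(self, size=0x1000):           # size is an arbitrary assumption
        self.data = bytearray(size)

    def access(self, addr, write, value=0):
        """Return (valM, error). write=0: read a word; write=1: store value."""
        if addr < 0 or addr + 8 > len(self.data):
            return 0, 1                         # invalid address: error = 1
        if write:
            self.data[addr:addr + 8] = (value & MASK64).to_bytes(8, "little")
            return 0, 0
        return int.from_bytes(self.data[addr:addr + 8], "little"), 0
```

The error output is what later becomes the ADR status (dmem_error in the PIPE HCL).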





# 4.3 Sequential CPU Implementation

# **Instruction Execution Stages**

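### Fetch

Read the instruction bytes from memory at PC and compute valP, the address of the next sequential instruction. Because PC is a clocked register, its new value is only loaded at the rising clock edge.

### Decode

Read up to two operands from the register file.

### Execute

Execute the instruction: the ALU performs the arithmetic/logic operation (or computes an effective address, or adjusts the stack pointer); the instruction may set or test the CC.

### Memory

Read from or write to data memory.

### Write Back

Write up to two results to the register file.

### PC

Update the PC.

When an exception occurs (halt / invalid instruction / invalid address), the processor loop stops.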



## **Computation Steps**

| Stage | Signals | OPq rA, rB | call Dest |
|-------|---------|------------|-----------|
| Fetch | icode, ifun | icode:ifun ← M₁[PC] | icode:ifun ← M₁[PC] |
| Fetch | rA, rB | rA:rB ← M₁[PC+1] | |
| Fetch | valC | | valC ← M₈[PC+1] |
| Fetch | valP | valP ← PC+2 | valP ← PC+9 |
| Decode | valA, srcA | valA ← R[rA] | |
| Decode | valB, srcB | valB ← R[rB] | valB ← R[%rsp] |
| Execute | valE, aluA, aluB | valE ← valB OP valA | valE ← valB + (-8) |
| Execute | Cond code | Set CC | |
| Memory | valM, addr, data | | M₈[valE] ← valP |
| Write Back | valE, dstE | R[rB] ← valE | R[%rsp] ← valE |
| Write Back | valM, dstM | | |
| PC update | PC, newPC | PC ← valP | PC ← valC |

| Stage | cmovXX rA, rB | Description |
|-------|---------------|-------------|
| Fetch | icode:ifun ← M₁[PC] | Read instruction byte |
| Fetch | rA:rB ← M₁[PC+1] | Read register byte |
| Fetch | valP ← PC+2 | Compute next PC |
| Decode | valA ← R[rA] | Read operand A |
| Execute | valE ← 0 + valA | Perform ALU operation |
| Execute | Cnd ← Cond(CC, ifun) | Take move? |
| Memory | | |
| Write Back | Cnd ? R[rB] ← valE : - | Write back result if move taken |
| PC update | PC ← valP | Update PC |

See the textbook for the computation steps of the remaining instructions.
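To tie the tables together, here is a compact Python sketch of SEQ executing OPq, cmovXX and call stage by stage, with each comment naming the table row it implements. It models only these three icodes and assumes memory is a dict from byte address to byte and registers are a list of 15 signed 64-bit values; all names are hypothetical.

```python
MASK = (1 << 64) - 1
RSP = 0x4


def signed(x):
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def cond(cc, ifun):
    """Cond(CC, ifun) for jXX / cmovXX (ifun 0..6)."""
    lt = cc["SF"] != cc["OF"]
    zf = cc["ZF"]
    return [True, lt or zf, lt, zf, not zf, not lt, not lt and not zf][ifun]


def read_word(mem, addr):
    raw = bytes(mem.get(addr + i, 0) for i in range(8))
    return signed(int.from_bytes(raw, "little"))


def write_word(mem, addr, val):
    for i, byte in enumerate((val & MASK).to_bytes(8, "little")):
        mem[addr + i] = byte


def seq_step(s):
    """Execute one instruction. s = {"PC", "R" (15 regs), "CC", "mem"}."""
    mem, R, pc = s["mem"], s["R"], s["PC"]
    # Fetch: icode:ifun <- M1[PC]
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    if icode in (0x2, 0x6):                  # cmovXX / OPq: rA:rB <- M1[PC+1]
        rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
        valP = pc + 2
    elif icode == 0x8:                       # call: valC <- M8[PC+1]
        valC = read_word(mem, pc + 1)
        valP = pc + 9
    else:
        raise NotImplementedError(f"icode {icode:#x} is not modeled in this sketch")

    if icode == 0x6:                         # OPq rA, rB
        a, b = R[rA], R[rB]                  # Decode: valA <- R[rA], valB <- R[rB]
        valE = signed([b + a, b - a, b & a, b ^ a][ifun])   # Execute: valB OP valA
        if ifun == 0:                        # addq overflow
            of = (a < 0) == (b < 0) and (valE < 0) != (a < 0)
        elif ifun == 1:                      # subq overflow (valB - valA)
            of = (a < 0) != (b < 0) and (valE < 0) != (b < 0)
        else:
            of = False
        s["CC"] = {"ZF": valE == 0, "SF": valE < 0, "OF": of}  # Set CC
        R[rB] = valE                         # Write back: R[rB] <- valE
        s["PC"] = valP                       # PC update: PC <- valP
    elif icode == 0x2:                       # cmovXX rA, rB
        valE = 0 + R[rA]                     # Decode valA; Execute: valE <- 0 + valA
        if cond(s["CC"], ifun):              # Cnd <- Cond(CC, ifun)
            R[rB] = valE                     # Write back only if the move is taken
        s["PC"] = valP                       # PC update: PC <- valP
    else:                                    # call Dest
        valE = R[RSP] - 8                    # Decode valB <- R[%rsp]; Execute: valB + (-8)
        write_word(mem, valE, valP)          # Memory: M8[valE] <- valP
        R[RSP] = valE                        # Write back: R[%rsp] <- valE
        s["PC"] = valC                       # PC update: PC <- valC
```

Note that every state change (CC, R, memory, PC) is computed from values read at the start of the step, which matches SEQ's rule that all updates happen together at the clock edge.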

### **Values**

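#### Determinate Values:

Fetch: PC, icode, ifun, rA (register ID), rB, valC (constant word), valP (incremented PC)

Decode: valA (value read from the register), valB

Execute: valE (ALU result), CC, Cnd (condition computed from the existing CC)

Memory: valM (value read from memory)

Write Back: valE, valM

#### Indeterminate Values (the source depends on the instruction; candidates in parentheses):

Decode: srcA (where valA comes from: rA or %rsp), srcB (rB or %rsp)

Execute: aluA (valA, valC, +8, -8), aluB (valB, 0)

Memory: addr (valA, valE), data (valA, valP)

Write Back: dstE (rB, %rsp, or F = no register when Cnd = 0 for cmovXX), dstM (rA)

PC: newPC (valP, valC, valM)

All of the state updates above happen together at the rising clock edge; the status is updated at the same time.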
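The newPC selection among valP, valC and valM can be sketched in Python as a priority list that mirrors SEQ's `new_pc` HCL case expression. The icode constants are the Y86-64 encodings; `new_pc` as a Python function is a hypothetical rendering.

```python
IJXX, ICALL, IRET = 0x7, 0x8, 0x9


def new_pc(icode, cnd, valC, valM, valP):
    if icode == ICALL:
        return valC          # call: jump to the target
    if icode == IJXX and cnd:
        return valC          # taken (or unconditional) jump
    if icode == IRET:
        return valM          # return address read from the stack
    return valP              # default: next sequential instruction
```

For a not-taken conditional jump, `cnd` is false, so the default valP case applies.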

# **SEQ CPU Implementation**

# 4.4 Pipeline

# **Principles of Pipeline**

### Limitations

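1. Nonuniform partitioning: the clock period must equal the longest stage delay plus the register delay.
2. Diminishing returns of deep pipelining: a pipeline register (and its delay) is inserted between every pair of stages, so register delay becomes a large fraction of the clock period, and the latency of each individual instruction increases.
3. Dependencies between instructions: data dependencies and control dependencies give rise to hazards that must be handled.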
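Limitations 1 and 2 can be checked with a little arithmetic. A Python sketch, assuming every stage is followed by a pipeline register with an example delay of 20 ps and that one instruction completes per cycle; `pipeline_timing` is a hypothetical name.

```python
def pipeline_timing(stage_delays_ps, reg_delay_ps=20):
    """Return (clock period ps, throughput GIPS, latency ps) for a pipeline.

    stage_delays_ps: combinational delay of each stage, in picoseconds.
    reg_delay_ps: pipeline-register delay (20 ps is an assumed example value).
    """
    cycle = max(stage_delays_ps) + reg_delay_ps     # slowest stage sets the clock
    throughput_gips = 1000 / cycle                  # 1 instr/cycle; 1000 ps = 1 ns
    latency = cycle * len(stage_delays_ps)          # every stage takes a full cycle
    return cycle, throughput_gips, latency


print(pipeline_timing([300]))            # unpipelined: 320 ps clock
print(pipeline_timing([100, 100, 100]))  # uniform 3 stages: 120 ps clock, 360 ps latency
print(pipeline_timing([50, 150, 100]))   # nonuniform: 170 ps clock, 510 ps latency
print(pipeline_timing([50] * 6))         # deeper: 70 ps clock, but register is 20/70 of it
```

The nonuniform split wastes time in the fast stages, and the 6-stage split shows the register delay growing to a large share of each cycle while latency rises.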

# **Pipeline Implementation**

## **Pipeline Stages**

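Five-stage pipeline: the PC update stage is merged into Fetch. In effect, the PC logic moves to the beginning of the cycle, where it computes the address of the current instruction rather than that of the next one.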



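The overall hardware structure is, for the most part, very similar or identical to SEQ+. The changes are:

1. Signals are reorganized and renamed. Signals read from a pipeline register are prefixed with that register's name (uppercase) so they can be told apart. For example, icode exists in the Decode, Execute, Memory, and Write-back stages, and its contents differ in each (they belong to different instructions), so the pipeline register name distinguishes them: D\_icode, E\_icode, M\_icode, and W\_icode.
    - Signals generated within a stage are prefixed with the stage's lowercase letter. For example, valE is produced by the Execute stage, so within that stage it is called e\_valE.
2. A Predict PC block is added to the Fetch stage to predict the address of the next instruction.
3. valP and valA are merged into a single signal in the Decode stage, which is why there is a new Select A block (textbook p. 321). The main purpose is to reduce the number of control signals and pipeline-register fields: only call uses valP in the Memory stage, and only jump instructions use valP in the Execute stage (in case the jump is not taken), and neither needs the value of register A, so the two signals can share one field. As a result, the Data selection block from SEQ is no longer needed, and since the Fetch stage already has the Predict PC block, valP does not otherwise need to be passed beyond the Fetch stage.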

## **Predicting the PC**

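As soon as the current instruction has been fetched, guess the address of the next one so that it can be fetched in the very next cycle.

- Most instructions (no control transfer): predict valP; always correct.
- call and unconditional jumps: predict valC; always correct.
- Conditional jumps: predict valC (taken); may be wrong. Select PC detects the misprediction later, and the wrongly fetched instructions must be cancelled.
- ret: the return address cannot be predicted; Fetch waits until Select PC can obtain the correct value.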
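The prediction rule fits in a few lines. A Python sketch mirroring `f_predPC` from the PIPE HCL, plus a helper that states when a conditional jump's guess was wrong; the constants are Y86-64 icodes and the function names are hypothetical.

```python
IJXX, ICALL = 0x7, 0x8


def predict_pc(icode, valC, valP):
    """PIPE-style next-PC guess made right after fetching an instruction."""
    if icode in (IJXX, ICALL):
        return valC     # call/jmp: always right; conditional jXX: guess "taken"
    return valP         # everything else; for ret this guess is unused (Fetch stalls)


def mispredicted(icode, cnd):
    """A jump was mispredicted iff it turns out not to be taken."""
    return icode == IJXX and not cnd
```

An unconditional `jmp` always has Cnd = 1, so `mispredicted` can only be true for conditional jumps.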

## **Hazards**

Dependencies between instructions cause hazards that must be handled. Four main cases are worth remembering: ret, mispredicted jump, load/use, and exceptions.

### **Data Hazard**

### **Stalling**

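Stalling holds an instruction in place and inserts bubbles, one per cycle. At the rising clock edge, the stalled instruction and the instructions behind it stay in their current stages instead of advancing, while a bubble (a dynamically generated nop: its icode is set to nop) is injected into the next stage. As a side effect, the held instructions repeat the same stage.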
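Each pipeline register supports three behaviors at the clock edge: load normally, stall (keep its contents), or bubble (reset to a nop). A Python sketch, assuming the bubble state is modeled as a dict with icode = INOP (0x1) and status "SBUB"; `PipelineRegister` is a hypothetical name.

```python
INOP = 0x1
BUBBLE = {"icode": INOP, "stat": "SBUB"}


class PipelineRegister:
    def __init__(self):
        self.state = dict(BUBBLE)        # starts out holding a bubble

    def clock(self, new_input, stall=False, bubble=False):
        """Rising clock edge: normal load, stall (hold), or bubble (reset to nop)."""
        if stall and bubble:
            raise ValueError("at most one of stall/bubble may be set")
        if stall:
            return                       # keep the current instruction in place
        self.state = dict(BUBBLE) if bubble else dict(new_input)
```

Stalling register D while bubbling register E is exactly the load/use pattern: the using instruction stays in Decode and a nop flows into Execute behind the load.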

## **Bypass Paths**

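In the Decode stage, valA and valB come either from the register file or are forwarded (bypassed) early from a later stage.

Forwarding sources:

- Execute: e\_valE
- Memory: M\_valE, m\_valM
- Write Back: W\_valE, W\_valM

The nearest stage takes priority, since it holds the most recent value; the register file has the lowest priority.

Forwarding usually removes the need to stall and insert bubbles, but sometimes the value is not ready in time (load/use: the loaded value only becomes available at the end of the Memory stage). In that case the pipeline must stall for one cycle, injecting a bubble into the Execute stage.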
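A Python sketch of the forwarding priority for one source operand (PIPE's d\_valA logic without its call/jXX case) and of load/use detection. The caller is assumed to pass the (destination, value) pairs already ordered e\_valE, m\_valM, M\_valE, W\_valM, W\_valE; skipping the search when the source is RNONE is a choice of this sketch. All names are hypothetical.

```python
RNONE = 0xF
IMRMOVQ, IPOPQ = 0x5, 0xB


def forward(src, sources, rval):
    """sources: (dst, val) pairs in priority order, nearest stage first."""
    if src == RNONE:
        return rval
    for dst, val in sources:
        if dst == src:
            return val          # nearest stage wins: it holds the newest value
    return rval                 # register file has the lowest priority


def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True when a load in Execute writes a register Decode needs now."""
    return E_icode in (IMRMOVQ, IPOPQ) and E_dstM in (d_srcA, d_srcB)
```

Using cycle 8 of the demo-luh trace: with a bubble in Execute, the load in Memory (M\_dstM = %rax, m\_valM = 3) and `irmovq $10,%rbx` in Write Back, the addq in Decode gets %rbx = 10 from W\_valE and %rax = 3 from m\_valM.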

```
# demo-luh.ys                       1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovq $128,%rdx             F  D  E  M  W
0x00a: irmovq $3,%rcx                  F  D  E  M  W
0x014: rmmovq %rcx, 0(%rdx)               F  D  E  M  W
0x01e: irmovq $10,%rbx                       F  D  E  M  W
0x028: mrmovq 0(%rdx),%rax # Load               F  D  E  M  W
       bubble                                            E  M  W
0x032: addq %rbx,%rax # Use                        F  D  D  E  M  W
0x034: halt                                           F  F  D  E  M  W

Cycle 8:
  Write back:  W_dstE = %rbx, W_valE = 10
  Memory:      M_dstM = %rax, m_valM <- M[128] = 3
  Decode:      valA <- W_valE = 10
               valB <- m_valM = 3

Stall the using instruction for one cycle; it can then pick up
the loaded value by forwarding from the Memory stage.
```

### **Control Hazard**

```
word f_pc = [
    # Mispredicted branch. Fetch at incremented PC
    M_icode == IJXX && !M_Cnd : M_valA;
    # Completion of RET instruction
    W_icode == IRET : W_valM;
    # Default: Use predicted value of PC
    1 : F_predPC;
];
```

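The predictor assumes conditional branches are TAKEN, i.e., it predicts valC. When that guess is wrong, the correct fall-through address valP has been carried down the pipeline as valA and is fetched via M\_valA.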
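The `f_pc` case expression above translates directly into a Python priority chain. The icode constants are Y86-64 encodings; `select_pc` is a hypothetical name for this sketch.

```python
IJXX, IRET = 0x7, 0x9


def select_pc(M_icode, M_Cnd, M_valA, W_icode, W_valM, F_predPC):
    if M_icode == IJXX and not M_Cnd:
        return M_valA     # mispredicted branch: fall-through address (valP)
    if W_icode == IRET:
        return W_valM     # ret finished: return address read from memory
    return F_predPC       # otherwise trust the prediction
```

Only pipeline-register values (uppercase prefixes) are used, so Select PC never has to wait for logic inside another stage to finish in the same cycle.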

### **Branch Misprediction**

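When the misprediction is detected (the jump is in the Execute stage with e\_Cnd = 0), bubbles are injected into the Decode and Execute stages in the next cycle, cancelling the two wrongly fetched instructions in time. Because those instructions never reached the Execute stage, they have not changed the condition codes or any other programmer-visible state. Fetching resumes at the fall-through address, which is why there is a path carrying M\_Cnd (and M\_valA) back to Fetch.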

#### Return

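The instruction after ret is stalled in the Fetch stage while three bubbles are inserted, until ret reaches the Write-back stage and the correct return address can be read from W\_valM.

Recovery does not already happen in the Memory stage because PC selection happens at the very beginning of Fetch, while m\_valM only becomes available at the end of the Memory stage. Using m\_valM would chain the memory read and the instruction fetch within one cycle, roughly doubling the Fetch stage's delay and hurting pipeline performance. For the same reason, a mispredicted jump uses M\_valA (the uppercase pipeline-register value) rather than a value computed inside a stage.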



## **Exceptions**

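Each pipeline register carries a status code along with the instruction; exceptions are handled in program order once the faulting instruction reaches the Write-back stage, while later instructions are kept from changing programmer-visible state.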

# **PIPE CPU Implementation**

## **PIPE Control Logic**

### **Control Cases**

#### Detection:

| Condition           | Trigger                                                       |
|---------------------|---------------------------------------------------------------|
| Processing ret      | IRET in { D_icode, E_icode, M_icode }                         |
| Load/Use Hazard     | E_icode in { IMRMOVQ, IPOPQ } && E_dstM in { d_srcA, d_srcB } |
| Mispredicted Branch | E_icode == IJXX && !e_Cnd                                     |

#### Action:

| Condition           | F      | D      | E      | M      | W      |
|---------------------|--------|--------|--------|--------|--------|
| Processing ret      | stall  | bubble | normal | normal | normal |
| Load/Use Hazard     | stall  | stall  | bubble | normal | normal |
| Mispredicted Branch | normal | bubble | bubble | normal | normal |
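The detection and action tables can be combined into one function. A Python sketch that mirrors the F\_stall, D\_stall, D\_bubble and E\_bubble logic of the PIPE HCL (ignoring exception handling), returning the action for registers F, D and E; names and constants other than the Y86-64 icodes are hypothetical.

```python
IMRMOVQ, IPOPQ, IJXX, IRET = 0x5, 0xB, 0x7, 0x9


def pipe_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Cnd):
    load_use = E_icode in (IMRMOVQ, IPOPQ) and E_dstM in (d_srcA, d_srcB)
    ret = IRET in (D_icode, E_icode, M_icode)
    mispredict = E_icode == IJXX and not e_Cnd
    F = "stall" if (load_use or ret) else "normal"
    if load_use:
        D = "stall"              # load/use wins: ret's bubble is suppressed
    elif mispredict or ret:
        D = "bubble"
    else:
        D = "normal"
    E = "bubble" if (mispredict or load_use) else "normal"
    return F, D, E
```

Evaluating it on the two combinations reproduces the expected rows: a ret in Decode behind a mispredicted jump gives (stall, bubble, bubble), and a ret in Decode behind a load into %rsp gives (stall, stall, bubble), never "stall + bubble".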

#### **Control Combinations**



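### **Combination A**

A mispredicted jump whose wrongly fetched target is a ret: the ret is in Decode while the jump is in Execute. It should be handled as a misprediction. Fetch is also stalled, but on the next cycle the PC is selected from M\_valA, so the stall does no harm.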

| Condition           | F      | D      | E      | M      | W      |
|---------------------|--------|--------|--------|--------|--------|
| Processing ret      | stall  | bubble | normal | normal | normal |
| Mispredicted Branch | normal | bubble | bubble | normal | normal |
| Combination         | stall  | bubble | bubble | normal | normal |

### **Combination B**

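An instruction loads %rsp (e.g., mrmovq or popq into %rsp) and is followed by a ret, which reads %rsp. The load/use hazard takes priority, and the ret is held in the Decode stage. Simply combining both actions would request both a stall and a bubble for D, which is not allowed; the desired behavior is the load/use action alone.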

| Condition       | F     | D              | E      | M      | W      |
|-----------------|-------|----------------|--------|--------|--------|
| Processing ret  | stall | bubble         | normal | normal | normal |
| Load/Use Hazard | stall | stall          | bubble | normal | normal |
| Combination     | stall | stall + bubble | bubble | normal | normal |
| Desired         | stall | stall          | bubble | normal | normal |

# **Performance Analysis**

Performance is measured in CPI (cycles per instruction); ideally CPI = 1.0.
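Each bubble adds to the ideal CPI of 1.0, so CPI = 1.0 + lp + mp + rp, where the penalties come from load/use (1 bubble), mispredicted branches (2 bubbles) and ret (3 bubbles). A Python sketch of that sum; the instruction-mix numbers in the example are assumed, illustrative values, not measurements.

```python
def cpi(load_freq, load_use_frac, cond_jump_freq, mispredict_frac, ret_freq):
    """CPI = 1 + lp + mp + rp, with 1/2/3-bubble penalties."""
    lp = load_freq * load_use_frac * 1        # load/use: 1 bubble
    mp = cond_jump_freq * mispredict_frac * 2  # mispredicted branch: 2 bubbles
    rp = ret_freq * 3                         # ret: 3 bubbles
    return 1.0 + lp + mp + rp


# Assumed mix: 25% loads (20% cause load/use), 20% conditional
# branches (40% mispredicted), 2% returns.
print(cpi(0.25, 0.20, 0.20, 0.40, 0.02))      # about 1.27
```

With this mix, mispredicted branches dominate the penalty (0.16 of the 0.27), which is why branch prediction quality matters most.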

### **HCL of PIPE CPU**

```
########## Fetch Stage
## What address should instruction be fetched at
word f_pc = [
       # Mispredicted branch. Fetch at incremented PC
       M_icode == IJXX && !M_Cnd : M_valA;
       # Completion of RET instruction
       W_icode == IRET : W_valM;
       # Default: Use predicted value of PC
       1 : F_predPC;
];
## Determine icode of fetched instruction
word f_icode = [
       imem_error : INOP;
       1: imem_icode;
];
# Determine ifun
word f_ifun = [
       imem_error : FNONE;
       1: imem_ifun;
];
# Is instruction valid?
bool instr_valid = f_icode in
       { INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
         IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ, IIADDQ };
# Determine status code for fetched instruction
word f_stat = [
       imem_error: SADR;
       !instr_valid : SINS;
       f_icode == IHALT : SHLT;
       1 : SAOK;
];
# Does fetched instruction require a regid byte?
bool need_regids =
       f_icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
                    IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ };
```

```
# Does fetched instruction require a constant word?
bool need_valC =
       f_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL, IIADDQ };
# Predict next value of PC
word f_predPC = [
       f_icode in { IJXX, ICALL } : f_valC;
       1 : f_valP;
];
## What register should be used as the A source?
word d_srcA = [
       D_icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : D_rA;
       D_icode in { IPOPQ, IRET } : RRSP;
       1 : RNONE; # Don't need register
];
## What register should be used as the B source?
word d_srcB = [
       D_icode in { IOPQ, IRMMOVQ, IMRMOVQ, IIADDQ } : D_rB;
       D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
       1 : RNONE; # Don't need register
];
## What register should be used as the E destination?
word d_dstE = [
       D_icode in { IRRMOVQ, IIRMOVQ, IOPQ, IIADDQ } : D_rB;
       D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
       1 : RNONE; # Don't write any register
];
## What register should be used as the M destination?
word d_dstM = [
       D_icode in { IMRMOVQ, IPOPQ } : D_rA;
       1 : RNONE; # Don't write any register
];
## What should be the A value?
## Forward into decode stage for valA
```

```
word d_valA = [
       D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
       d_srcA == e_dstE : e_valE; # Forward valE from execute
       d_srcA == M_dstM : m_valM; # Forward valM from memory
       d_srcA == M_dstE : M_valE;  # Forward valE from memory
       d_srcA == W_dstM : W_valM; # Forward valM from write back
       d_srcA == W_dstE : W_valE; # Forward valE from write back
       1 : d_rvalA; # Use value read from register file
];
word d_valB = [
       d_srcB == e_dstE : e_valE; # Forward valE from execute
       d_srcB == M_dstM : m_valM; # Forward valM from memory
       d_srcB == M_dstE : M_valE; # Forward valE from memory
       d_srcB == W_dstM : W_valM; # Forward valM from write back
       d_srcB == W_dstE : W_valE; # Forward valE from write back
       1 : d_rvalB; # Use value read from register file
];
## Select input A to ALU
word aluA = [
       E_icode in { IRRMOVQ, IOPQ } : E_valA;
       E_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ } : E_valC;
       E_icode in { ICALL, IPUSHQ } : -8;
       E_icode in { IRET, IPOPQ } : 8;
       # Other instructions don't need ALU
];
## Select input B to ALU
word aluB = [
       E_icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
                   IPUSHQ, IRET, IPOPQ, IIADDQ } : E_valB;
       E_icode in { IRRMOVQ, IIRMOVQ } : 0;
       # Other instructions don't need ALU
];
## Set the ALU function
word alufun = [
       E_icode == IOPQ : E_ifun;
       1 : ALUADD;
];
```

```
## Should the condition codes be updated?
bool set_cc = (E_icode == IOPQ || E_icode == IIADDQ) &&
       # State changes only during normal operation
       !m_stat in { SADR, SINS, SHLT } && !W_stat in { SADR, SINS, SHLT };
## Generate valA in execute stage
word e_valA = E_valA; # Pass valA through stage
## Set dstE to RNONE in event of not-taken conditional move
word e_dstE = [
       E_icode == IRRMOVQ && !e_Cnd : RNONE;
       1 : E_dstE;
];
## Select memory address
word mem_addr = [
       M_icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : M_valE;
       M_icode in { IPOPQ, IRET } : M_valA;
       # Other instructions don't need address
];
## Set read control signal
bool mem_read = M_icode in { IMRMOVQ, IPOPQ, IRET };
## Set write control signal
bool mem_write = M_icode in { IRMMOVQ, IPUSHQ, ICALL };
#/* $begin pipe-m stat-hcl */
## Update the status
word m_stat = [
       dmem_error : SADR;
       1 : M_stat;
];
#/* $end pipe-m stat-hcl */
## Set E port register ID
word w_dstE = W_dstE;
## Set E port value
word w_valE = W_valE;
```

```
## Set M port register ID
word w_dstM = W_dstM;
## Set M port value
word w_valM = W_valM;
## Update processor status
word Stat = [
       W_stat == SBUB : SAOK;
       1 : W_stat;
];
# Should I stall or inject a bubble into Pipeline Register F?
# At most one of these can be true.
bool F_bubble = 0;
bool F_stall =
       # Conditions for a load/use hazard
       E_icode in { IMRMOVQ, IPOPQ } &&
        E_dstM in { d_srcA, d_srcB } ||
       # Stalling at fetch while ret passes through pipeline
       IRET in { D_icode, E_icode, M_icode };
# Should I stall or inject a bubble into Pipeline Register D?
# At most one of these can be true.
bool D_stall =
       # Conditions for a load/use hazard
       E_icode in { IMRMOVQ, IPOPQ } &&
        E_dstM in { d_srcA, d_srcB };
bool D_bubble =
       # Mispredicted branch
       (E_icode == IJXX && !e_Cnd) ||
       # Stalling at fetch while ret passes through pipeline
       # but not condition for a load/use hazard
       !(E_icode in { IMRMOVQ, IPOPQ } && E_dstM in { d_srcA, d_srcB }) &&
         IRET in { D_icode, E_icode, M_icode };
# Should I stall or inject a bubble into Pipeline Register E?
# At most one of these can be true.
bool E_stall = 0;
```

```
bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
        E_dstM in { d_srcA, d_srcB};

# Should I stall or inject a bubble into Pipeline Register M?
# At most one of these can be true.
bool M_stall = 0;
# Start injecting bubbles as soon as exception passes through memory stage
bool M_bubble = m_stat in { SADR, SINS, SHLT } || W_stat in { SADR, SINS, SHLT };
# Should I stall or inject a bubble into Pipeline Register W?
bool W_stall = W_stat in { SADR, SINS, SHLT };
bool W_bubble = 0;
#/* $end pipe-all-hcl */
```