HGA101 Graphics Accelerator Technical Manual

V0.3

RV/AT Project Team

2020-10-25

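# HGA101 Technical Specification

## HGA101 Hardware Specifications

HGA101 is a graphics accelerator component built on the RISC-V EMFC instruction set and a custom floating-point SIMD unit. The design is one of the companion peripherals for the PVS464D project. It is planned to provide limited support for the OpenGL v2.0/miniGL graphics acceleration library: single-precision and double-precision functions always return half-precision results.

The core specifications of the graphics accelerator are as follows: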

| Item | Specification | Notes |
| --- | --- | --- |
| Floating-point support | Half precision (SIMD unit) |  |
| Polygon fill rate | >10KT@1MHz |  |
| Framebuffer size | 800x600@32b RGBA |  |
| Layer support | 1 Text + 1 Sprite + 2 FB |  |
| Addressable display memory | 8MiB | For EG4D20 |

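## HGA101 Architecture Overview

HGA101 is a single-issue, in-order processor with a 5-stage pipeline.

The SIMD unit has 32 128-bit GPRs (abbreviated as SPR in this document; the instruction mnemonics are SS/SD) and can execute 4 half-precision operations simultaneously.

The FPU follows the RISC-V F extension and has 32 32-bit floating-point registers.

### GF24: 24-bit graphics floating-point format

GF24 is a medium-precision floating-point format designed specifically for FPGA applications that use narrow (low bit-width) multipliers. It is not compatible with the IEEE 754 standard.

Its data format is: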

| 23 | [22:16] | [15:0] |
| --- | --- | --- |
| Sign | Exponent | Significand |

### Half: half-precision floating-point format

| 15 | [14:10] | [9:0] |
| --- | --- | --- |
| Sign | Exponent | Significand |
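The two layouts above can be illustrated with a small field-extraction sketch in Python. The function names are hypothetical. For GF24 the sketch only splits a word into the three fields from the table, because this document does not specify the exponent bias or whether a hidden leading bit is used. The last helper assumes that the Half format is IEEE 754 binary16, which matches the 1/5/10 field split shown above but is not confirmed here.

```python
import struct


def split_gf24(word: int) -> tuple[int, int, int]:
    """Split a 24-bit GF24 word into (bit 23, bits [22:16], bits [15:0])."""
    if not 0 <= word < (1 << 24):
        raise ValueError("GF24 word must fit in 24 bits")
    return (word >> 23) & 0x1, (word >> 16) & 0x7F, word & 0xFFFF


def split_half(word: int) -> tuple[int, int, int]:
    """Split a 16-bit half word into (bit 15, bits [14:10], bits [9:0])."""
    if not 0 <= word < (1 << 16):
        raise ValueError("half word must fit in 16 bits")
    return (word >> 15) & 0x1, (word >> 10) & 0x1F, word & 0x3FF


def half_to_float(word: int) -> float:
    """Decode a half word, ASSUMING IEEE 754 binary16 semantics."""
    return struct.unpack("<e", struct.pack("<H", word))[0]
```

For example, `split_half(0x3C00)` separates the word into sign 0, exponent field 15 and significand field 0, which `half_to_float` decodes as 1.0 under the binary16 assumption.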

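## HGA101 Instruction Set

The flow-control part of HGA101 uses the RV32I instruction set with a simplified CSR set and is not described in detail in this document. The focus here is on the GS16 SIMD math instruction set and the CAT coordinate instruction set.

### GS16 SIMD extension instruction set

GS16 is the half-precision math acceleration instruction set of this processor.

**SIMD register operation instruction encoding**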

| [31:27] | [26:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:5] | [4:0] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OP5 | Mask[4:3] | SS2/RS | SS1 | Mask[2] | MASKEN | SVSEL | SD | Mask[1:0] | 5’b11111 |

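This instruction type occupies the RISC-V long-instruction encoding space (low bits 5’b11111); this processor has no instructions longer than 32 bits.

SVSEL: S/V instruction select (1 selects the V-class vector-scalar form)

MASKEN: lane (execution slot) mask enable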
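The field layout above can be sketched as a small Python encoder. This is an illustration only: the function and parameter names are hypothetical, and the 5-bit Mask field is interpreted as the index of the integer register that holds the lane mask, scattered over bits [26:25], [14] and [6:5] as in the encoding table.

```python
def _field(name: str, value: int, width: int) -> int:
    """Check that a field value fits in the given number of bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name} does not fit in {width} bits")
    return value


def encode_simd_reg_op(op5: int, sd: int, ss1: int, ss2: int,
                       mask_reg: int = 0, masken: int = 0,
                       svsel: int = 0) -> int:
    """Assemble one 32-bit SIMD register-operation instruction word."""
    if not masken:
        mask_reg = 0  # with MASKEN=0 the Mask field is encoded as zero
    mask_reg = _field("mask_reg", mask_reg, 5)
    word = _field("op5", op5, 5) << 27           # [31:27] OP5
    word |= ((mask_reg >> 3) & 0x3) << 25        # [26:25] Mask[4:3]
    word |= _field("ss2", ss2, 5) << 20          # [24:20] SS2/RS
    word |= _field("ss1", ss1, 5) << 15          # [19:15] SS1
    word |= ((mask_reg >> 2) & 0x1) << 14        # [14]    Mask[2]
    word |= _field("masken", masken, 1) << 13    # [13]    MASKEN
    word |= _field("svsel", svsel, 1) << 12      # [12]    SVSEL
    word |= _field("sd", sd, 5) << 7             # [11:7]  SD
    word |= (mask_reg & 0x3) << 5                # [6:5]   Mask[1:0]
    word |= 0b11111                              # [4:0]   fixed opcode bits
    return word
```

With OP5 = 5’b00010 (SFMUL16), `encode_simd_reg_op(0b00010, sd=3, ss1=1, ss2=2)` yields `0x1020819F`.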

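| Mnemonic | Description | Instruction format | OP5 |
| --- | --- | --- | --- |
| SFADD16 | Packed half-precision floating-point add | SFADD16 SD,SS1,SS2,(MASK) | 5’b00000 |
| SFSUB16 | Packed half-precision floating-point subtract | SFSUB16 SD,SS1,SS2 | 5’b00001 |
| SFMUL16 | Packed half-precision floating-point multiply | SFMUL16 SD,SS1,SS2 | 5’b00010 |
| SFDIV16 | Packed half-precision floating-point divide | SFDIV16 SD,SS1,SS2 | 5’b00011 |
| SADD16 | Packed short-integer add |  | 5’b10000 |
| SSUB16 | Packed short-integer subtract |  | 5’b10001 |
| SAND16 | Packed short-integer AND |  | 5’b10010 |
| SOR16 | Packed short-integer OR |  | 5’b10011 |
| SXOR16 | Packed short-integer XOR |  | 5’b10100 |
| SLSH16 | Packed short-integer left shift |  | 5’b10101 |
| SRSA16 | Packed short-integer arithmetic right shift |  | 5’b10110 |
| SRSL16 | Packed short-integer logical right shift |  | 5’b10111 |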

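Note: every instruction has a corresponding V version in which SS2 is replaced by RS (integer instructions) or FS (floating-point instructions), so that one scalar is applied uniformly to all lanes, for example for a uniform multiply or divide. Every instruction also has a lane-masked version. Integer addition and subtraction are always signed.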

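**Arithmetic pseudo-instructions**

To make assembly programming easier, HGA101 defines a set of pseudo-instructions.

| Mnemonic | Purpose | Expansion |
| --- | --- | --- |
| ZERO SD | Clear SD | VAND16 SD,SD,R0 |
| ZEROM SD,MASK | Clear selected lanes | VANDM16 SD,SD,R0,MASK |
| NOT SD | Invert SD | VXOR16 SD,SD,R0 |
| NOTM SD,MASK | Invert selected lanes | VXORM16 SD,SD,R0,MASK |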

**"V-class" vector-scalar instructions**

Register-register operation instructions

| [31:27] | [26:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:5] | [4:0] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OP5 | Mask[4:3] | RS | SS1 | Mask[2] | MASKEN | 1 | SD | Mask[1:0] | 5’b11111 |

For example:

| Mnemonic | Description | Instruction format | OP5 |
| --- | --- | --- | --- |
| VFADD16 | Packed half-precision floating-point add (vector-scalar) | VFADD16 SD,SS1,FS,(MASK) | 5’b00000 |

V-class compare instructions

| [31:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Funct7 | RS2 | RS1 | IFSEL | MASKEN | SVSEL | RD | 7’b0101011 |
| 7’b0001101 | RS2 | SS1 | 0 | X(0) | 1 | SD | VMINI |
| 7’b0010000 | RS2 | SS1 | 0 | X(0) | 1 | RD | VGQI |

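**Operation mask (MASK register)**

If bit 13 (MASKEN) is 0, the MASK field is always 0 and the operation mask is treated as all ones, so every SIMD lane takes part in the operation. If MASKEN=1, the MASK is taken from the integer register file: a lane whose MASK bit is 1 takes part in the operation and writes back its result, while any other lane writes back the content of the corresponding SS1 lane.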
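The mask rule above can be modelled in a few lines of Python. This is a behavioural sketch, not the hardware implementation: the function name is hypothetical, lanes are plain Python numbers rather than half-precision values, and the mapping of mask bit i to lane i is an assumption that this document does not state explicitly.

```python
from typing import Callable, Sequence


def apply_masked(op: Callable[[float, float], float],
                 ss1: Sequence[float], ss2: Sequence[float],
                 masken: int, mask: int = 0) -> list[float]:
    """Apply op lane by lane, honouring MASKEN and the lane mask."""
    result = []
    for lane, (a, b) in enumerate(zip(ss1, ss2)):
        # MASKEN=0: every lane is active. MASKEN=1: bit `lane` of mask decides
        # (assumed bit-to-lane mapping).
        active = masken == 0 or ((mask >> lane) & 1) == 1
        # Inactive lanes write back the SS1 lane unchanged.
        result.append(op(a, b) if active else a)
    return result
```

With `masken=1` and `mask=0b0101`, only lanes 0 and 2 receive the computed result; lanes 1 and 3 keep the SS1 values.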

**SIMD compare instructions**

Compare instruction encoding

| [31:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Funct7 | RS2 | RS1 | IFSEL | MASKEN | SVSEL | RD | 7’b0101011 |
| 7’b0001000 | X(5’b0) | SS | 0 | X(0) | X(0) | SD | SFTI |
| 7’b0001000 | X(5’b0) | SS | 1 | X(0) | X(0) | SD | SITF |
| 7’b0001100 | SS2 | SS1 | 0 | X(0) | 0 | SD | SMAXI |
| 7’b0001100 | SS2 | SS1 | 1 | X(0) | 0 | SD | SMAXF |
| 7’b0001101 | SS2 | SS1 | 0 | X(0) | 0 | SD | SMINI |
| 7’b0001101 | SS2 | SS1 | 1 | X(0) | 0 | SD | SMINF |
| 7’b0010000 | SS2 | SS1 | 1 | X(0) | 0 | RD | SGQF |
| 7’b0010000 | SS2 | SS1 | 0 | X(0) | 0 | RD | SGQI |
| 7’b0010001 | SS2 | SS1 | 1 | X(0) | 0 | RD | SLTF |
| 7’b0010010 | SS2 | SS1 | 1 | X(0) | 0 | RD | SEQF |
| 7’b0010011 | SS2 | SS1 | 1 | X(0) | 0 | RD | SNQF |

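This instruction class occupies the RISC-V CUSTOM1 extension opcode space.

MASKEN: mask enable (for the maximum/minimum instructions)

IFSEL: integer/floating-point function select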

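| Mnemonic | Description | Format | Funct7 |
| --- | --- | --- | --- |
| SFTI | Packed float-to-integer conversion | SFTI SD,SS1 | 7’b0001000 |
| SITF | Packed integer-to-float conversion | SITF SD,SS1 | 7’b0001000 |
| SMAXI | Integer maximum | SMAXI SD,SS1,SS2 | 7’b0001100 |
| SMAXF | Floating-point maximum |  | 7’b0001100 |
| SMINI | Integer minimum |  | 7’b0001101 |
| SMINF | Floating-point minimum |  | 7’b0001101 |
| SCMPF | Floating-point compare against a vector operand | (S/V)(GQ/LT/EQ/NQ)(I/F) | GQ: 7’b0010000, LT: 7’b0010001, EQ: 7’b0010010, NQ: 7’b0010011 |
| SCMPI | Integer compare against a vector operand | (S/V)(GQ/LT/EQ/NQ)(I/F) | GQ: 7’b0010000, LT: 7’b0010001, EQ: 7’b0010010, NQ: 7’b0010011 |
| VCMPF | Floating-point compare against a scalar operand | (S/V)(GQ/LT/EQ/NQ)(I/F) | GQ: 7’b0010000, LT: 7’b0010001, EQ: 7’b0010010, NQ: 7’b0010011 |
| VCMPI | Integer compare against a scalar operand | (S/V)(GQ/LT/EQ/NQ)(I/F) | GQ: 7’b0010000, LT: 7’b0010001, EQ: 7’b0010010, NQ: 7’b0010011 |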

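The MAX/MIN instructions compare the corresponding lanes of the two source registers and write the larger or smaller value of each lane back to the SD register.

The CMP instructions are mainly used to generate a MASK; their results are always written back to the integer register file.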
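As an illustration of how a CMP-class instruction could produce a MASK, the following Python sketch compares two vectors lane by lane and packs the outcomes into an integer. Several points are assumptions rather than facts from this document: the function name is hypothetical, GQ is read as "greater than or equal", and the result of lane i is placed in bit i of the destination register.

```python
import operator
from typing import Sequence

# Assumed meaning of the four comparison suffixes.
_COMPARE = {
    "GQ": operator.ge,  # assumption: greater than or equal
    "LT": operator.lt,
    "EQ": operator.eq,
    "NQ": operator.ne,
}


def compare_to_mask(kind: str, ss1: Sequence[float],
                    ss2: Sequence[float]) -> int:
    """Return an integer whose bit i is 1 when lane i satisfies the compare."""
    compare = _COMPARE[kind]
    mask = 0
    for lane, (a, b) in enumerate(zip(ss1, ss2)):
        if compare(a, b):
            mask |= 1 << lane
    return mask
```

The returned value has the same shape as the MASK operand consumed by the lane-masked instructions, so a compare followed by a masked operation acts as a per-lane conditional.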

Note: the integer/floating-point conversion instructions do not follow the IEEE 754 rounding rules.

**Lane read/write instruction encoding**

| [31:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Funct7 | RS2 | RS1 | IFSEL | DIR | IMMEN | RD | 7’b0101011 |
| 7’b1111111 | LANE | SS1 | 1 | 0 | 0 | FD | SLTF |
| 7’b1111111 | LANE | SS1 | 0 | 0 | 0 | RD | SLTI |
| 7’b1111111 | LANE | RS1 | 0 | 1 | 0 | SD | SITL |
| 7’b1111111 | LANE | FS1 | 1 | 1 | 0 | SD | SFTL |
| 7’b1111111 | IMM | SS1 | 1 | 0 | 1 | FD | SLTFI |
| 7’b1111111 | IMM | SS1 | 0 | 0 | 1 | RD | SLTII |
| 7’b1111111 | IMM | RS1 | 0 | 1 | 1 | SD | SITLI |
| 7’b1111111 | IMM | FS1 | 1 | 1 | 1 | SD | SFTLI |

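This instruction type also occupies the RISC-V CUSTOM1 opcode space and is distinguished by a dedicated Funct7 value (0x7F).

This instruction class uses the FUNCT3 field (IFSEL, DIR, IMMEN) to identify the operation.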

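| Mnemonic | Description | Format | Funct7 |
| --- | --- | --- | --- |
| SLTF | Vector lane to floating-point register | SLTF FD,SS,LANE | 7’b1111111 |
| SLTI | Vector lane to integer register | SLTI RD,SS,LANE | 7’b1111111 |
| SITL | Integer register to vector lane | SITL SD,RS,LANE | 7’b1111111 |
| SFTL | Floating-point register to vector lane | SFTL SD,FS,LANE | 7’b1111111 |
| SLTFI | Vector lane to floating-point register (immediate lane index) | SLTFI FD,SS,IMM | 7’b1111111 |
| SLTII | Vector lane to integer register (immediate lane index) | SLTII RD,SS,IMM | 7’b1111111 |
| SITLI | Integer register to vector lane (immediate lane index) | SITLI SD,RS,IMM | 7’b1111111 |
| SFTLI | Floating-point register to vector lane (immediate lane index) | SFTLI SD,FS,IMM | 7’b1111111 |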

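Integer instructions use the low 16 bits; floating-point instructions automatically convert between half precision and single precision.

The lane index is taken from integer register RS2, or from the 5-bit IMM field when IMMEN=1.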
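The lane transfers described above can be sketched in Python by modelling an SPR as a 128-bit integer. The following is a behavioural illustration under stated assumptions: the register is treated as eight 16-bit lanes with lane 0 in the least-significant bits, the Half format is taken to be IEEE 754 binary16, and the half/single conversion uses the rounding of Python's `struct` module, which may differ from the hardware. The function names simply reuse the mnemonics for readability.

```python
import struct

LANE_BITS = 16
LANES = 8  # assumption: 128-bit SPR divided into 16-bit lanes


def _shift(lane: int) -> int:
    """Bit offset of a lane, assuming lane 0 is at the least-significant end."""
    if not 0 <= lane < LANES:
        raise ValueError("lane index out of range")
    return lane * LANE_BITS


def sitl(spr: int, rs: int, lane: int) -> int:
    """Integer register to lane: only the low 16 bits of rs are used."""
    shift = _shift(lane)
    return (spr & ~(0xFFFF << shift)) | ((rs & 0xFFFF) << shift)


def slti(spr: int, lane: int) -> int:
    """Lane to integer register: return the raw 16-bit lane content."""
    return (spr >> _shift(lane)) & 0xFFFF


def sftl(spr: int, fs: float, lane: int) -> int:
    """Floating-point register to lane: convert the value to half precision."""
    half_bits = struct.unpack("<H", struct.pack("<e", fs))[0]
    return sitl(spr, half_bits, lane)


def sltf(spr: int, lane: int) -> float:
    """Lane to floating-point register: widen the half-precision lane."""
    return struct.unpack("<e", struct.pack("<H", slti(spr, lane)))[0]
```

Writing `0x12345678` with `sitl` stores only `0x5678` in the selected lane, and a value such as 1.5 survives an `sftl`/`sltf` round trip because it is exactly representable in half precision.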

**Special-function instructions**

| Instruction | [31:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FORCESYNC: force cache write-back | 7’b1111000 | X | X | X | X | X | X | 7’b0101011 |
| LOOP: hardware loop, using registers | 7’b1110111 | RCOUNT | RINDEX | IMM[] |  |  |  |  |
| LBRK: break out of a hardware loop | 7’b11100XX |  |  |  |  |  |  |  |

**SIMD load/store instruction encoding**

SIMD Load

| [31:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IMM[6:0] | RMASK/IMM[12:8] | RINDEX | IMM[7] | MASKEN | 0 | RD | 7’b0001011 |
| IMM[6:0] | IMM[12:8] | RS | IMM[7] | 0 | 0 | SD | SLOAD |
| IMM[6:0] | RMASK | RS | IMM[7] | 1 | 0 | SD | SLOADM |

SIMD Store

| [31:25] | [24:20] | [19:15] | [14] | [13] | [12] | [11:7] | [6:0] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IMM[6:0] | SS | RINDEX | IMM[7] | MASKEN | 1 | RMASK/IMM[12:8] | 7’b0001011 |
| IMM[6:0] | SS | RINDEX | IMM[7] | 0 | 1 | IMM[12:8] | SSTOR |
| IMM[6:0] | SS | RINDEX | IMM[7] | 1 | 1 | RMASK | SSTORM |

This instruction class occupies the RISC-V CUSTOM0 extension opcode space and provides fixed-width 128-bit register loads and stores.

| Mnemonic | Description | Instruction format |
| --- | --- | --- |
| SLOAD | Bulk load: read an SPR from memory | SLOAD SD,RA,OFFSET |
| SSTOR | Bulk store: write an SPR to memory | SSTOR SS,RA,OFFSET |
| SLOADM | Mask-enabled bulk load | SLOADM SD,RA,MASK,OFFSET |
| SSTORM | Mask-enabled bulk store | SSTORM SS,RA,MASK,OFFSET |
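The way the offset is scattered across the unmasked load and store encodings can be sketched as follows. This Python example is an illustration with hypothetical function names. It treats the offset as a raw 13-bit pattern, since this document does not say whether it is signed, and it covers only the unmasked SLOAD and SSTOR forms.

```python
CUSTOM0 = 0b0001011  # opcode field [6:0] from the encoding tables


def _reg(name: str, value: int) -> int:
    """Check a 5-bit register index."""
    if not 0 <= value < 32:
        raise ValueError(f"{name} must be a 5-bit register index")
    return value


def _split_imm13(imm: int) -> tuple[int, int, int]:
    """Split a 13-bit offset into (IMM[6:0], IMM[7], IMM[12:8])."""
    if not 0 <= imm < (1 << 13):
        raise ValueError("offset must fit in 13 bits")
    return imm & 0x7F, (imm >> 7) & 0x1, (imm >> 8) & 0x1F


def encode_sload(sd: int, rindex: int, imm: int) -> int:
    """Unmasked load: IMM[12:8] sits in [24:20], SD in [11:7], bit 12 is 0."""
    imm_lo, imm7, imm_hi = _split_imm13(imm)
    return ((imm_lo << 25) | (imm_hi << 20) | (_reg("rindex", rindex) << 15)
            | (imm7 << 14) | (0 << 13) | (0 << 12)
            | (_reg("sd", sd) << 7) | CUSTOM0)


def encode_sstor(ss: int, rindex: int, imm: int) -> int:
    """Unmasked store: SS sits in [24:20], IMM[12:8] in [11:7], bit 12 is 1."""
    imm_lo, imm7, imm_hi = _split_imm13(imm)
    return ((imm_lo << 25) | (_reg("ss", ss) << 20)
            | (_reg("rindex", rindex) << 15)
            | (imm7 << 14) | (0 << 13) | (1 << 12)
            | (imm_hi << 7) | CUSTOM0)
```

The two encoders differ only in bit 12 and in which of the fields [24:20] and [11:7] carries the SPR index, which mirrors the split between the load and store formats in standard RISC-V.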

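### CAT coordinate instruction set (Design In Progress)

CAT is an extension instruction set for trigonometric functions, coordinate operations, and coordinate-system conversion, built on a CORDIC arithmetic unit.

Coordinate operations take and return a 48-bit three-coordinate format (three 16-bit coordinates, carried together with a coordinate type field). The data format is as follows: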

| [63:50] | [49:48] | [47:32] | [31:16] | [15:0] |
| --- | --- | --- | --- | --- |
| Reserved | Cor Type | Cor3 | Cor2 | Cor1 |

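The Cor Type (coordinate type) field is defined as follows:

| Value | Coordinate system |
| --- | --- |
| 2’b00 | Cartesian coordinates (x,y,z) |
| 2’b01 | Cylindrical coordinates (r,φ,z) |
| 2’b10 | Spherical coordinates (r,θ,φ) |
| 2’b11 | Illegal |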
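The coordinate word layout and type field above can be sketched as a pair of pack/unpack helpers in Python. The function names and the type labels are hypothetical, the reserved bits are written as zero by assumption, and the coordinates are handled as raw 16-bit patterns because their numeric interpretation is not defined in this document.

```python
COR_TYPES = {0b00: "cartesian", 0b01: "cylindrical", 0b10: "spherical"}


def pack_coord(cor_type: int, cor1: int, cor2: int, cor3: int) -> int:
    """Pack a coordinate type and three raw 16-bit coordinates into one word."""
    if cor_type not in COR_TYPES:
        raise ValueError("illegal coordinate type")
    for value in (cor1, cor2, cor3):
        if not 0 <= value < (1 << 16):
            raise ValueError("each coordinate must fit in 16 bits")
    # Reserved bits [63:50] are left as zero (assumption).
    return (cor_type << 48) | (cor3 << 32) | (cor2 << 16) | cor1


def unpack_coord(word: int) -> tuple[str, int, int, int]:
    """Return (type label, Cor1, Cor2, Cor3) from a packed coordinate word."""
    cor_type = (word >> 48) & 0x3
    if cor_type not in COR_TYPES:
        raise ValueError("illegal coordinate type")
    return (COR_TYPES[cor_type], word & 0xFFFF,
            (word >> 16) & 0xFFFF, (word >> 32) & 0xFFFF)
```

For example, `pack_coord(0b01, 0x1111, 0x2222, 0x3333)` produces `0x0001333322221111`, and a word whose type field is 2’b11 is rejected.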

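Note: angles are represented as

Note: the coordinates finally output to the raster texture-mapping unit must be in the Cartesian coordinate system.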

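## Raster Texture-Mapping Unit

The raster texture-mapping unit is a fixed-pipeline DSP that is based on a finite state machine and is independent of the math units. It processes the vertex and texture-mapping commands issued by the flow-control part and writes the resulting textured output into the framebuffer.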