• Weitek Business: Solutions To Floating Point Intensive Problems

• 500 Man-years Experience In Floating Point Chip Design & Marketing

• Weitek Makes Chips (Apples)

• IEEE 754 Is A System Level Spec (Oranges)



# All Generalizations Are False



#### **ALL WEITEK CHIPS**

Conform To IEEE Format (Except 2364/2365)

Allow Control Of Overflow Trap

Do Not Support 80-bit Arithmetic

Do Not Support De-Norms In Hardware

Have Multiplier Arrays
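The slides do not say what a chip without hardware denorm support returns when a result falls in the denormal range. One common choice on such hardware is to flush denormals to zero. The sketch below models that choice for single precision; `to_single` and `flush_to_zero` are hypothetical helper names, and flush-to-zero itself is an assumption, not a documented Weitek behavior.

```python
import math
import struct

MIN_NORMAL_SINGLE = 2.0 ** -126  # smallest normal IEEE single-precision magnitude


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x: float) -> float:
    """Assumed behavior: replace a single-precision denormal with a signed zero."""
    s = to_single(x)
    if s != 0.0 and abs(s) < MIN_NORMAL_SINGLE:
        return math.copysign(0.0, s)
    return s
```

With full IEEE support `to_single(2.0 ** -130)` keeps its value as a denormal, while `flush_to_zero(2.0 ** -130)` returns zero.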

## WEITEK SINGLE CHIP SOLUTIONS

- Attack System Cost
- Reduce Time To Market
- ANSI C
- FORTRAN 77
- CMOS
- 32 Registers
- 3132, 1167/3167, 3164/3364

### 3132

- 32-bits Only
- Graphics/DSP
- No Denorm, NaN or Infinity Support

- MAC Operations Do Not Obey IEEE Rounding
- Ignores Underflow, Invalid, Inexact. Divide By Zero Gives Overflow
- Divide And SQRT In Software (Not IEEE)
- No Remainder Operation
- All Four Rounding Modes
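For reference, the four IEEE 754 rounding modes are round to nearest (ties to even), toward zero, toward positive infinity, and toward negative infinity. The sketch below rounds an exact rational to a `p`-bit significand in each mode. It ignores exponent range (no overflow or underflow), and the function and mode names are illustrative only.

```python
from fractions import Fraction
import math


def round_to_precision(x, p: int, mode: str) -> Fraction:
    """Round the exact value x to p significant bits (unbounded exponent)."""
    x = Fraction(x)
    if x == 0:
        return x
    # Find e with 2**e <= |x| < 2**(e + 1).
    e = 0
    while abs(x) >= Fraction(2) ** (e + 1):
        e += 1
    while abs(x) < Fraction(2) ** e:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)   # spacing of p-bit values near x
    q = x / ulp                        # exact, not yet an integer in general
    lo, hi = math.floor(q), math.ceil(q)
    if mode == "toward_zero":
        n = lo if x > 0 else hi
    elif mode == "toward_positive":
        n = hi
    elif mode == "toward_negative":
        n = lo
    elif mode == "nearest_even":
        if q - lo < hi - q:
            n = lo
        elif q - lo > hi - q:
            n = hi
        else:                          # exact tie (or q already an integer)
            n = lo if lo % 2 == 0 else hi
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return n * ulp
```

For example, rounding 1/10 to 24 bits gives 13421773 × 2⁻²⁷ in nearest and toward-positive modes and 13421772 × 2⁻²⁷ in the other two.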

## 1167/3167

- 32 + 64-bit IEEE Formats
- 386 Coprocessor
- User Controllable Traps On All Five Exceptions
- Cannot Restart After An Exception
- No Denorm Support (Hardware Or Software)
- All Four Rounding Modes
- ADD, SUB, MUL, DIV On Chip
- **SQRT On 3167**
- Remainder In Software
- **Transcendental Library**
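As an illustration of what a software remainder has to compute, the sketch below follows the IEEE 754 definition: r = x − n·y, where n is the integer nearest to x/y, with ties going to the even integer. It uses exact rational arithmetic for clarity and is not the Weitek library routine; `ieee_remainder` is a hypothetical name.

```python
from fractions import Fraction
import math


def ieee_remainder(x: float, y: float) -> float:
    """IEEE 754 remainder of x with respect to y, computed exactly."""
    if math.isnan(x) or math.isnan(y) or math.isinf(x) or y == 0.0:
        return math.nan                 # invalid operation
    if math.isinf(y):
        return x                        # finite x, infinite y: result is x
    q = Fraction(x) / Fraction(y)
    n = round(q)                        # Fraction rounds ties to even
    r = Fraction(x) - n * Fraction(y)
    if r == 0:
        return math.copysign(0.0, x)    # zero result takes the sign of x
    return float(r)                     # the exact remainder is representable
```

Note that the result can be negative for positive operands: `ieee_remainder(5.0, 3.0)` is `-1.0`, because 5/3 rounds to 2.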



## **WEITEK 3164/3364**

First Single Chip To Make Possible Complete Compliance To The IEEE 754 Spec In A Pipelined Environment.

Trap Handler Supplied To OEMs.

WTL 3164 SINGLE-PORT AND WTL 3364 THREE-PORT 64-BIT FLOATING-POINT DATA PATH UNITS



## **OUTLINE**

ARCHITECTURE AND FEATURES

IEEE COMPLIANCE

PERFORMANCE

STATUS AND SCHEDULE

FUTURES

WEITEK



**3X64 PRODUCT UPDATE** 



#### WTL 3164 / 3364 BLOCK DIAGRAM



#### **ARCHITECTURE**

- 64-BIT IEEE FLOATING-POINT DATA PATH
- THREE INDEPENDENT ARITHMETIC UNITS CAN OPERATE IN PARALLEL
  - 64-BIT ALU
  - 64-BIT MULTIPLIER
  - 64-BIT DIVIDE/SQRT UNIT
- REGISTER FILE: SIX-PORT, 32-DEEP BY 64-BIT-WIDE
  - THREE READ PORTS
  - THREE WRITE PORTS
  - CAN BE BYPASSED ON LOADS, STORES, AND REGISTER-TO-REGISTER OPERATIONS
- INDEPENDENT LOAD/STORE
- SIX MAJOR INTERNAL 64-BIT BUSES
- EXTENSIVE STATUS AND CONTROL LOGIC



## **ARCHITECTURE CONT'D**

- FLEXIBLE I/O STRUCTURE
  - X PORT: I/O
  - Y PORT: INPUT ONLY





- THREE 32-BIT PORTS
- SINGLE 64-BIT I/O PORT
- SINGLE 32-BIT I/O PORT


### **ARCHITECTURE CONT'D**

- TWO-CYCLE REGISTER-TO-REGISTER LATENCY
- SINGLE-CYCLE THROUGHPUT FOR
  - $$\sum (X_i + Y_i)$$ (REQUIRES 3364)

- DIVIDE/SQRT LATENCY

|        | DIVIDE | SQRT |
|--------|--------|------|
| 64-BIT | 17     | 30   |
| 32-BIT | 10     | 16   |

- DIV/SQRT CAN OVERLAP WITH MULTIPLIER AND/OR ALU OPERATION
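A rough way to see what the overlap buys is to compare cycle counts with and without it. The sketch below assumes the latencies in the table are in clock cycles (the slide gives no unit), that the other ALU/multiplier operations are independent of the divide or square root, and that each issues in one cycle. Pipeline fill and issue constraints are ignored.

```python
# Latencies copied from the slide's table; the unit is assumed to be clock cycles.
LATENCY = {
    ("divide", 64): 17,
    ("sqrt", 64): 30,
    ("divide", 32): 10,
    ("sqrt", 32): 16,
}


def cycles_without_overlap(op: str, width: int, other_ops: int) -> int:
    """Divide/sqrt runs alone, then the other single-cycle operations."""
    return LATENCY[(op, width)] + other_ops


def cycles_with_overlap(op: str, width: int, other_ops: int) -> int:
    """Other operations issue while the divide/sqrt unit is busy."""
    return max(LATENCY[(op, width)], other_ops)
```

Under these assumptions a 64-bit divide plus ten independent operations takes 27 cycles serially and 17 cycles overlapped.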



## **FULL FUNCTION**

- 64- AND 32-BIT FLOATING-POINT AND 32-BIT INTEGER INSTRUCTIONS
  - MULTIPLY, ADD, MULTIPLY-ADD, DIV, SQRT, ETC.
- 64-BIT LOGICAL
- ABSOLUTE VALUE
- COMPARE
- MIN/MAX
- FORMAT CONVERSION

### **FULLY INTERRUPTIBLE**

NEUT, STALL, ABORT



#### IEEE COMPLIANCE

- FULL IEEE SUPPORT IN PIPELINED ENVIRONMENT, INCLUDING DIVIDE AND SQUARE ROOT
- ALL FOUR ROUNDING MODES
- DENORMALIZED NUMBER SUPPORT
  - FAST MODE IF DNRM SUPPORT NOT REQUIRED
- FULL STATUS AND CONDITION SUPPORT
  - FPEX PIN SIGNALS OCCURRENCE OF ENABLED EXCEPTION
  - EXCEPTIONS
    - SOURCE: NAN, DNRM, DVZ, INV
    - RESULT: OVF, IOVF, UNF, INX
  - FLOATING-POINT CONDITION PIN REPORTS RESULTS OF COMPARE OPERATIONS: EQ, LT, GT, UNORDERED
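The four compare outcomes are mutually exclusive, and UNORDERED arises exactly when at least one operand is a NaN. A minimal sketch of that four-way result (the function name is illustrative, and this models the IEEE relation only, not the pin encoding):

```python
import math


def fp_compare(a: float, b: float) -> str:
    """Return the IEEE 754 relation between a and b."""
    if math.isnan(a) or math.isnan(b):
        return "UNORDERED"
    if a == b:
        return "EQ"          # note: +0.0 and -0.0 compare equal
    return "LT" if a < b else "GT"
```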



## SIMPLE PROGRAMMING MODEL

- REGISTER-BASED PROGRAMMING MODEL
- MATCHED LATENCY FOR ALL OPERATIONS
  - EXCEPT DIVIDE AND SQRT
- BOTH SOURCES AND DESTINATIONS SPECIFIED ONLY ONCE -- WHEN INSTRUCTION IS ISSUED
  - DESTINATION ADDRESSES ARE DELAYED AUTOMATICALLY
- ALL INFORMATION NECESSARY FOR EXCEPTION HANDLING INCLUDED ON-CHIP
  - DEDICATED STATUS REGISTERS STORE THE STATUS AND REG FILE DESTINATION ADDRESSES
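The idea of specifying the destination once at issue and having the chip delay it can be pictured with a small queue model. The sketch below is a toy, assuming a fixed two-cycle latency and a 32-entry register file; the class and method names are hypothetical and it does not model divide/sqrt, bypassing, or status capture.

```python
from collections import deque


class DelayedWriteback:
    """Toy pipeline: the destination given at issue is applied `latency` cycles later."""

    def __init__(self, latency: int = 2, nregs: int = 32):
        self.regs = [0.0] * nregs
        self.pipe = deque([None] * latency)   # in-flight (dest, value) pairs

    def clock(self, instr=None):
        """Advance one cycle; instr is (op, src_a, src_b, dest) or None."""
        done = self.pipe.popleft()
        if done is not None:
            dest, value = done
            self.regs[dest] = value           # delayed destination write
        if instr is None:
            self.pipe.append(None)
        else:
            op, a, b, dest = instr
            self.pipe.append((dest, op(self.regs[a], self.regs[b])))
```

The caller never re-supplies the destination: it travels down the queue with the result and is used when the result emerges.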



## HIGH-LEVEL LANGUAGE SUPPORT

- AVAILABLE FOR BOTH 3164 AND 3364 WHEN USED IN XL SYSTEM
  - HLL COMPILERS (C, FORTRAN)
  - DEVELOPMENT SYSTEM
  - OTHER SUPPORT: DEBUGGERS, SIMULATORS
- PERFORMANCE OF XL-3164 IN XL8164 SYSTEM

| BENCHMARK                                                     | (illegible) | -080   |
|---------------------------------------------------------------|-------------|--------|
| LINPACK (MFLOPS), HAND-CODED BLAS, SINGLE OR DOUBLE PRECISION | 4.7         | 5.8    |
| WHETSTONES (MWHETS), SINGLE OR DOUBLE PRECISION               | 8.7         | 10.9   |
| DHRYSTONES                                                    | 11,834      | 14,793 |

## WEITEK 3164/3364

- 32 + 64-Bit IEEE Formats
- Denorms Handled In Software
- User Controlled Traps On All Five IEEE Exceptions
- Status Regs Allow Recovery From Exceptions
- ADD, SUB, MUL, DIV, SQRT On Chip
- DREM In Software

## 3164/3364 EXCEPTION HANDLING

Source Exceptions (Invalid, Divide-By-Zero)

- Stop Current Operation
- Denorms Have To Be Handled In Software

Destination Exceptions (Underflow, Overflow, Inexact)

- Operation Completes Before Trapping
- Correctly Rounded Result Supplied To User Trap Handler

Kahan says that denorms could be done in a more humane way.
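The split between source and destination exceptions can be sketched as a control flow for a single divide. This is a software model under stated assumptions, not the 3164/3364 trap handler: operands are assumed finite, `enabled` is a set of flag names, `handler` is a hypothetical user callback, and the untrapped default results follow IEEE 754.

```python
from fractions import Fraction
import math

MIN_NORMAL = Fraction(2) ** -1022   # smallest normal double


def divide(a: float, b: float, enabled: set, handler):
    """Divide finite doubles, trapping as the slide describes."""
    # Source exceptions are detected from the operands: the operation stops.
    if b == 0.0:
        flag = "INV" if a == 0.0 else "DVZ"
        if flag in enabled:
            return handler(flag, None)          # no result was produced
        if flag == "INV":
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)

    # Destination exceptions: the operation completes, then traps.
    result = a / b                              # correctly rounded by the host
    exact = Fraction(a) / Fraction(b)
    flags = []
    if math.isinf(result):
        flags += ["OVF", "INX"]
    elif Fraction(result) != exact:
        if abs(exact) < MIN_NORMAL:
            flags.append("UNF")
        flags.append("INX")
    for flag in flags:
        if flag in enabled:
            return handler(flag, result)        # rounded result handed to handler
    return result
```

The key difference shows up in the second handler argument: `None` for a source exception, the rounded result for a destination exception.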





Operations' destination/status information and their storage in the status register

Status register structure

| SR#      | FIELDS                                                                                               | COMMENTS     |
|----------|------------------------------------------------------------------------------------------------------|--------------|
| SR0, SR1 | Rounding Mode, Multiplier Latency, NEUT- on FPEX-, Fast Mode, I/O Mode, Internal Bypass, FPEX- Delay | Modes        |
| SR2      | NaN EN, INV EN, DVZ EN, DNRM EN, OVF EN, UNF EN, INX EN, IOVF EN                                     | Trap Enables |
| SR3      | NaN, INV, DVZ, DNRM, OVF, UNF, INX, IOVF                                                             | Sticky Bits  |
| SR4      | TDEST0, MDEST0                                                                                       | Destination  |
| SR5      | ADEST0                                                                                               | Destination  |
| SR6      | ASTAT0, MSTAT0                                                                                       | Status       |
| SR7      | DIVDEST                                                                                              | Destination  |
| SR8      | TDEST1, MDEST1                                                                                       | Destination  |
| SR9      | ADEST1                                                                                               | Destination  |
| SR10     | ASTAT1, MSTAT1                                                                                       | Status       |
| SR11     | DIVSTAT                                                                                              | Status       |

Other labels in the figure: Sticky, FPCN, Carry, DSR, in progress, FPEX- Taken. Each register is shown with bits 7 through 0.
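A trap handler would typically read the sticky-bit and trap-enable registers and work out which enabled exceptions are pending. The sketch below shows that decoding for one byte-wide register pair. The bit order (NaN in bit 7 down to IOVF in bit 0) is an assumption read off the figure's left-to-right label order, and the rule "pending = sticky AND enable" is likewise an assumption; neither is confirmed by the slides.

```python
# Assumed bit order, bit 7 down to bit 0 (not confirmed by the source).
FLAG_NAMES = ["NaN", "INV", "DVZ", "DNRM", "OVF", "UNF", "INX", "IOVF"]


def decode_flags(byte: int) -> list:
    """Return the names of the flags set in an 8-bit register value."""
    return [name for i, name in enumerate(FLAG_NAMES) if byte & (0x80 >> i)]


def enabled_exceptions(sticky: int, enables: int) -> list:
    """Flags that are both recorded (sticky) and trap-enabled (assumed rule)."""
    return decode_flags(sticky & enables)
```

For instance, with sticky bits `0b00001010` (OVF and INX under the assumed order) and enables `0b00001000`, only OVF is reported.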



System types



Source exception on C1



Result exception on C1



Source exception on C1, undelayed FPEX



Result exception on C1, undelayed FPEX