## Learning Algorithm



- Hidden layer transfer function: SELU
- Softmax output layer
- One-to-one connection Hidden→Softmax
- Cross-entropy as cost function
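The components listed above can be sketched as follows. This is a minimal illustration, not the project's training code: the SELU constants are the standard published values, the function names are hypothetical, and "one-to-one" is read here as hidden unit `i` feeding softmax input `i` directly without further weights.

```python
import math

# Standard SELU constants.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def selu(x):
    """Scaled exponential linear unit for a single value."""
    return SELU_LAMBDA * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy cost for one sample with integer class label `target`."""
    return -math.log(probs[target])


def forward(pre_activations):
    """Hidden SELU layer followed by softmax (assumed one-to-one wiring)."""
    hidden = [selu(a) for a in pre_activations]
    return softmax(hidden)


# Hypothetical pre-activations of a three-unit hidden layer.
probs = forward([0.5, -1.2, 2.0])
loss = cross_entropy(probs, target=2)
```

The cross-entropy shrinks toward zero as the probability assigned to the correct class approaches one.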

## Hardware Improvements

With the new learning algorithm, classification needs no transfer function other than argmax, because SELU is strictly increasing and softmax preserves the ordering of its inputs:

$$\operatorname{argmax}(x) = \operatorname{argmax}(\operatorname{softmax}(\operatorname{SELU}(x)))$$
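The identity can be checked numerically. The sketch below draws random input vectors and compares the two argmax results; the vector length of 10, the value range, and the helper names are assumptions chosen for illustration only.

```python
import math
import random

SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def selu(x):
    return SELU_LAMBDA * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


random.seed(0)
mismatches = 0
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(10)]
    full = argmax(softmax([selu(v) for v in x]))
    if argmax(x) != full:
        mismatches += 1
```

Both SELU and softmax keep the largest input largest, so the two results agree whenever the maximum is unique.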

- Replaced the one-hot encoding of the output with the actual output value
- $\Rightarrow$ FPGA $\rightarrow$ CPU transmission reduced by 90% to 1 B per sample.
  - Increased the tiling value to 128 samples
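The size reduction can be illustrated with a small sketch. It assumes 10 output classes and one byte per one-hot entry, which is inferred from the stated 90% reduction to 1 B rather than given explicitly; the function names are hypothetical.

```python
NUM_CLASSES = 10  # assumption: inferred from "reduced by 90% to 1 B"
TILE_SIZE = 128   # tiling value from the text


def one_hot_bytes(label, num_classes=NUM_CLASSES):
    """Old format: one byte per class, 1 at the predicted class."""
    return bytes(1 if i == label else 0 for i in range(num_classes))


def index_byte(label):
    """New format: the predicted class index as a single byte."""
    return bytes([label])


labels = [3, 7, 0]  # hypothetical predictions
one_hot_payload = b"".join(one_hot_bytes(label) for label in labels)
index_payload = b"".join(index_byte(label) for label in labels)

reduction = 1 - len(index_payload) / len(one_hot_payload)
tile_bytes = TILE_SIZE * len(index_byte(0))  # bytes sent per tile
```

Under these assumptions one tile of 128 samples occupies 128 B instead of 1280 B.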



## Speedup and Accuracy

### On the hardware:

- FPGA accuracy: 8.15% validation error
- CPU accuracy: 8.06% validation error
- FPGA time: 0.004762924000033308
- CPU time: 0.2810866919999171

The FPGA achieves a 59.02x speedup over the CPU.
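The speedup follows directly from the two measured times. The short calculation below reproduces it and also reports the gap in validation error; the variable names are illustrative.

```python
fpga_time = 0.004762924000033308
cpu_time = 0.2810866919999171

speedup = cpu_time / fpga_time
print(f"{speedup:.2f}x")  # prints "59.02x"

# Difference in validation error, in percentage points.
error_gap = round(8.15 - 8.06, 2)
```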

### In the simulation:

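| Latency min (cycles) | Latency max (cycles) | Interval min (cycles) | Interval max (cycles) | Pipeline type |
|----------------------|----------------------|-----------------------|-----------------------|---------------|
| 374311               | 374311               | 374312                | 374312                | none          |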
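To compare the simulated latency with a measured runtime, the cycle count has to be converted to time. The sketch below shows the conversion; the clock frequency of 100 MHz is purely a placeholder assumption, since the actual clock is not stated here, and it is also not stated whether the simulated latency covers the same workload as the measured FPGA time.

```python
def cycles_to_seconds(cycles, clock_hz):
    """Convert a latency in clock cycles to seconds."""
    return cycles / clock_hz


SIMULATED_LATENCY_CYCLES = 374311  # from the simulation report
ASSUMED_CLOCK_HZ = 100e6           # hypothetical; substitute the real clock

simulated_seconds = cycles_to_seconds(SIMULATED_LATENCY_CYCLES, ASSUMED_CLOCK_HZ)
```

Any time measured on the hardware also includes data transfer and driver overhead that the simulated latency does not cover.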

### Problems

- Runtime on the hardware appears to be longer than in simulation. Why?
- Vivado HLS 2017.1 does not reveal how a loop is actually pipelined, which makes it hard to find further improvements.
- A mistake that appears in both the HLS code and the testbench is almost impossible to find.