# [xhls] Squared\_difference\_accumulate

R08943007 黄聖竣

### HLS C-sim/Synthesis/Cosim (Screenshot + brief intro):

This lab uses Vivado HLS to implement the "Squared Difference Accumulate" function. The code is as follows:

```
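#include "../src/diff_sq_acc.h"

// Accumulates the squared element-wise difference of a and b into *dout.
void diff_sq_acc(din_t a[N], din_t b[N], dout_t *dout)
{
    int i;
    int acc = 0;
    int a_reg, b_reg, sub, sub2;

    for (i = 0; i < N; i++)
    {
        //#pragma HLS PIPELINE II=1

        a_reg = a[i];
        b_reg = b[i];
        sub = a_reg - b_reg;   // element-wise difference
        sub2 = sub * sub;      // squared difference
        acc += sub2;           // running sum
    }

    *dout = acc;
}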
```

#### C-sim:

```
INFO: [SIM 4] CSIM will launch GCC as the compiler.
Compiling ../../../src/diff_sq_acc.cpp in debug mode
Generating csim.exe
got 442798871 expected 442798871
got 262947932 expected 262947932
got 314183194 expected 314183194
got 465177013 expected 465177013
got 704061072 expected 704061072
got 575273685 expected 575273685
got 544620712 expected 544620712
got 521730637 expected 521730637
got 220538590 expected 220538590
got 229031173 expected 229031173
TEST SUCCESS!
INFO: [SIM 1] CSim done with 0 errors.
```
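The log above has the shape of a self-checking testbench: each case prints the value returned by the function next to an independently computed expected value. The actual testbench is not shown in this report, so the following is only a minimal sketch of how such a check could look. `N = 10`, the `din_t`/`dout_t` typedefs, the input range, and the helper name `run_testbench` are all assumptions made for illustration; the real definitions live in `diff_sq_acc.h`.

```cpp
#include <cstdio>
#include <cstdlib>

// Assumed definitions (hypothetical): the real ones are in diff_sq_acc.h.
const int N = 10;
typedef short din_t;
typedef int dout_t;

// Same computation as the function under test in the listing above.
void diff_sq_acc(din_t a[N], din_t b[N], dout_t *dout) {
    int acc = 0;
    for (int i = 0; i < N; i++) {
        int sub = a[i] - b[i];
        acc += sub * sub;
    }
    *dout = acc;
}

// Hypothetical helper: runs `trials` random cases and returns the
// number of mismatches between the function and a reference sum.
int run_testbench(int trials) {
    int errors = 0;
    for (int t = 0; t < trials; t++) {
        din_t a[N], b[N];
        int expected = 0;
        for (int i = 0; i < N; i++) {
            // Inputs are kept below 8192 so the accumulated sum fits in an int.
            a[i] = (din_t)(rand() % 8192);
            b[i] = (din_t)(rand() % 8192);
            int d = a[i] - b[i];
            expected += d * d;  // reference result computed in the testbench
        }
        dout_t got;
        diff_sq_acc(a, b, &got);
        printf("got %d expected %d\n", got, expected);
        if (got != expected) errors++;
    }
    puts(errors == 0 ? "TEST SUCCESS!" : "TEST FAILED");
    return errors;
}
```

In a Vivado HLS project the testbench `main()` would typically return this error count, since a nonzero return value is what marks C simulation as failed.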

### Synthesis:

#### Performance Estimates

#### Timing

| Clock  | Target  | Estimated | Uncertainty |
|--------|---------|-----------|-------------|
| ap_clk | 4.00 ns | 3.187 ns  | 0.50 ns     |

#### Latency

| Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min | Interval max | Type |
|----------------------|----------------------|------------------------|------------------------|--------------|--------------|------|
| 31                   | 31                   | 0.124 us               | 0.124 us               | 31           | 31           | none |


#### Utilization Estimates

#### Summary

| Name            | BRAM_18K | DSP48E | FF     | LUT    | URAM |
|-----------------|----------|--------|--------|--------|------|
| DSP             | -        | 1      | -      | -      | -    |
| Expression      | -        | -      | 0      |        | -    |
| FIFO            | -        | -      | -      | -      | -    |
| Instance        | -        | -      | -      | -      | -    |
| Memory          | -        | -      | -      | -      | -    |
| Multiplexer     | -        | -      | -      | 45     | -    |
| Register        | -        | -      | 44     | -      | -    |
| Total           | 0        | 1      | 44     | 66     | 0    |
| Available       | 1080     | 1700   | 406256 | 203128 | 0    |
| Utilization (%) | 0        | ~0     | ~0     | ~0     | 0    |

### Cosim:

#### Cosimulation Report for 'diff\_sq\_acc'

#### Result

| RTL     | Status | Latency min | Latency avg | Latency max | Interval min | Interval avg | Interval max |
|---------|--------|-------------|-------------|-------------|--------------|--------------|--------------|
| VHDL    | NA     | NA          | NA          | NA          | NA           | NA           | NA           |
| Verilog | Pass   | 31          | 31          | 31          | 32           | 32           | 32           |


### System level bring-up (Pynq or U50)



### Improvement - throughput, area

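Because the code in this lab is relatively short, the only optimization applied is pipelining the loop with II = 1. As the comparison below shows, latency drops to about half of the original (from 31 cycles to 15 cycles in co-simulation).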
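For reference, the optimized variant is presumably the same loop with the `PIPELINE` pragma enabled instead of commented out. The sketch below shows that form. `N = 10` and the `din_t`/`dout_t` typedefs are assumptions so that the snippet compiles on its own; the real definitions come from `diff_sq_acc.h`.

```cpp
// Assumed definitions (hypothetical): the real ones are in diff_sq_acc.h.
const int N = 10;
typedef short din_t;
typedef int dout_t;

void diff_sq_acc(din_t a[N], din_t b[N], dout_t *dout) {
    int acc = 0;

    for (int i = 0; i < N; i++) {
// Ask HLS to start a new loop iteration every clock cycle (II = 1).
// A regular C++ compiler ignores this pragma, so C simulation is unchanged.
#pragma HLS PIPELINE II=1
        int sub = a[i] - b[i];
        acc += sub * sub;
    }

    *dout = acc;
}
```

With pipelining, the read, subtract, multiply, and accumulate steps of successive iterations overlap instead of running back to back, which is consistent with the drop in latency reported here.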

### Synthesis comparison



#### Cosim:

#### Cosimulation Report for 'diff\_sq\_acc'

#### Result

| RTL     | Status | Latency min | Latency avg | Latency max | Interval min | Interval avg | Interval max |
|---------|--------|-------------|-------------|-------------|--------------|--------------|--------------|
| VHDL    | NA     | NA          | NA          | NA          | NA           | NA           | NA           |
| Verilog | Pass   | 15          | 15          | 15          | 16           | 16           | 16           |


**Github**: https://github.com/schuang23/MSOC.git