# ML CHIP FP

312591037

葉舜良





# Operations flow and PE mapping

| Dataflow step | Layer | Operation 1 | Operation 2    | Operation 3 |
|---------------|-------|-------------|----------------|-------------|
| 0             | CTR   | Biases      | Weights        |             |
| 1             | L1    | CONV        | RELU           | MP          |
| 2             | L2    | CONV        | RELU           | MP          |
| 3             | L3    | CONV        | RELU           |             |
| 4             | L4    | CONV        | RELU           |             |
| 5             | L5    | CONV        | RELU           | MP          |
| 6             | L6    | FC          | RELU           |             |
| 7             | L7    | FC          | RELU           |             |
| 8             | L8    | FC          |                |             |
| 9             | CTR   | SOFTMAX     | Classification |             |
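
The table can also be read as an ordered schedule. The following is a minimal Python sketch of that reading, assuming the rows execute in dataflow order and the operations within a row run left to right. The names `SCHEDULE` and `layers_with` are illustrative and do not come from the project code.

```python
# Each entry mirrors one row of the table: (dataflow step, layer, operations).
SCHEDULE = [
    (0, "CTR", ["Biases", "Weights"]),
    (1, "L1", ["CONV", "RELU", "MP"]),
    (2, "L2", ["CONV", "RELU", "MP"]),
    (3, "L3", ["CONV", "RELU"]),
    (4, "L4", ["CONV", "RELU"]),
    (5, "L5", ["CONV", "RELU", "MP"]),
    (6, "L6", ["FC", "RELU"]),
    (7, "L7", ["FC", "RELU"]),
    (8, "L8", ["FC"]),
    (9, "CTR", ["SOFTMAX", "Classification"]),
]


def layers_with(op):
    """Return the layers whose row in the table contains the given operation."""
    return [layer for _, layer, ops in SCHEDULE if op in ops]


if __name__ == "__main__":
    # Print the schedule in execution order (assumed: top to bottom).
    for step, layer, ops in SCHEDULE:
        print(f"step {step}: {layer:<3} -> {' -> '.join(ops)}")
```

Querying the structure, for example `layers_with("MP")`, gives a quick check of which layers include the `MP` stage.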

#### Simulation Result of Cat



### Challenges and insights

1. Loading the weights and biases into every PE takes a very long time; compiling with the `-O3` optimization flag speeds up the simulation.
2. Auto-connection can be used to speed up the port connection process.
3. The high-level synthesis feature of Verilog can help in creating the complex design of the Router block.
4. Setting debug breakpoints is important in a design project of this size.
5. Beware of signed-value conversion when extracting a value from a vector field in Verilog; a careless extraction can produce a value with the wrong sign.
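
The signed-value pitfall can be illustrated outside of Verilog. In Verilog, a part-select such as `data[7:0]` yields an unsigned result even when `data` is declared signed, so the field must be reinterpreted explicitly (for example with `$signed(...)`). The Python sketch below mimics that behaviour; the 16-bit word, the 8-bit field width, and the helper names `extract_field` and `to_signed` are assumptions made for illustration, not the project's actual packet format.

```python
def extract_field(word: int, lsb: int, width: int) -> int:
    """Return the raw bits word[lsb + width - 1 : lsb] as an unsigned value."""
    return (word >> lsb) & ((1 << width) - 1)


def to_signed(raw: int, width: int) -> int:
    """Reinterpret a width-bit raw value as a two's-complement signed integer."""
    sign_bit = 1 << (width - 1)
    return (raw ^ sign_bit) - sign_bit


# Hypothetical 16-bit word whose low byte holds the 8-bit value -128 (0x80).
packed = 0xFF80

raw = extract_field(packed, lsb=0, width=8)  # 128: the sign is lost
value = to_signed(raw, width=8)              # -128: the sign is restored

if __name__ == "__main__":
    print(raw, value)
```

Skipping the second step is what produces the wrong sign: the bit pattern is correct, but it is interpreted as a large positive number.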

## 3x3 NoC Configuration:

- Routing Algorithm: XY-Routing
- NoC topology: Torus topology
- Virtual Channel: Not implemented
- Flow Control: Ack-Nack flow control
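
The routing decision in this configuration can be sketched as follows. This is a minimal Python model, not the project's router: it assumes row-major router IDs (`id = y * 3 + x`), takes the shorter way around each ring of the torus, and uses hypothetical port names (`EAST`, `WEST`, `NORTH`, `SOUTH`, `LOCAL`) with `y` increasing toward `SOUTH`. XY routing resolves the X offset completely before moving in Y.

```python
SIZE = 3  # 3x3 torus: every row and column is a ring of 3 routers


def coords(router_id):
    """Map a row-major router ID to (x, y). The ID layout is an assumption."""
    return router_id % SIZE, router_id // SIZE


def ring_step(delta):
    """Shortest direction along one ring: -1, 0, or +1."""
    d = delta % SIZE
    if d == 0:
        return 0
    return 1 if d <= SIZE // 2 else -1


def next_port(cur, dst):
    """XY routing: correct the X offset first, then Y, then deliver locally."""
    cx, cy = coords(cur)
    dx, dy = coords(dst)
    sx = ring_step(dx - cx)
    if sx != 0:
        return "EAST" if sx > 0 else "WEST"
    sy = ring_step(dy - cy)
    if sy != 0:
        return "SOUTH" if sy > 0 else "NORTH"
    return "LOCAL"


def route(src, dst):
    """Return the list of router IDs visited from src to dst, inclusive."""
    moves = {"EAST": (1, 0), "WEST": (-1, 0), "SOUTH": (0, 1), "NORTH": (0, -1)}
    path = [src]
    cur = src
    while cur != dst:
        mx, my = moves[next_port(cur, dst)]
        x, y = coords(cur)
        # Modulo arithmetic models the wraparound links of the torus.
        cur = ((y + my) % SIZE) * SIZE + (x + mx) % SIZE
        path.append(cur)
    return path


if __name__ == "__main__":
    print(route(0, 8))  # corner to corner uses both wraparound links
```

With wraparound links, every destination in a 3x3 torus is at most one hop away per dimension, so no route in this model exceeds two hops.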