**Choice of Matrices: 2 matrices of 1024\*1024 each**

Total Number of Bit Operations in a Cycle = 2\*n^2 (including XOR and AND operations)

Total number of operations for a Normal Matrix Multiplication = 2\*n^3 (n^3 multiplications and n^3 additions)
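The 2\*n^3 count can be checked on a small case. The sketch below assumes the matrices hold single bits, so that a multiplication is an AND and an addition is an XOR (arithmetic over GF(2)); that reading is inferred from the mention of XOR and AND operations. The function name `gf2_matmul_counting` and the 4\*4 test matrices are hypothetical and only serve the illustration.

```python
def gf2_matmul_counting(a, b):
    """Multiply two n*n 0/1 matrices over GF(2) and count the bit operations."""
    n = len(a)
    and_ops = 0
    xor_ops = 0
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                # One AND (the multiplication) and one XOR (the addition).
                acc ^= a[i][k] & b[k][j]
                and_ops += 1
                xor_ops += 1
            c[i][j] = acc
    return c, and_ops, xor_ops


n = 4
identity = [[int(i == j) for j in range(n)] for i in range(n)]
m = [[(i * j + i) % 2 for j in range(n)] for i in range(n)]

product, and_ops, xor_ops = gf2_matmul_counting(identity, m)
print(product == m)                      # identity * m gives m back
print(and_ops + xor_ops == 2 * n ** 3)   # n^3 ANDs plus n^3 XORs
```

With 2\*n^2 bit operations available per cycle, a fully utilized array would need on the order of n cycles for an n\*n product, which is why the per-cycle figure matters for the peak estimate.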

**64\*64 Matrix Multiplication**

Speed of Ethernet Communication: 500Mbps

Total Transfer in bits into the Hardware through Ethernet: ([64\*64] bits \* 17) for one block \* 16\*16 blocks = 17\*2^20 bits

Delay Incurred = 17\*2^20 bits / (500\*2^20 bits/sec) = 34 msec

Total Transfer in bits from the Hardware through Ethernet: ([64\*64] bits \* 16) for one block \* 16\*16 blocks = 16\*2^20 bits, giving a delay of 16\*2^20 bits / (500\*2^20 bits/sec) = 32 msec

**Total Delay for Input and Output Blocks of Data through Ethernet = 66 msec**
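The transfer figures can be reproduced with a few lines of arithmetic. This is a minimal sketch that takes 1 Mb as 2^20 bits, as the calculation in the text does; the helper name `transfer_delay_ms` is hypothetical.

```python
BLOCK_BITS = 64 * 64        # bits in one 64*64 block
NUM_BLOCKS = 16 * 16        # output blocks covering a 1024*1024 matrix
LINK_BPS = 500 * 2 ** 20    # 500 Mbps, assuming 1 Mb = 2^20 bits


def transfer_delay_ms(blocks_per_output_block):
    """Delay in msec to move the given number of 64*64 blocks per output block."""
    total_bits = BLOCK_BITS * blocks_per_output_block * NUM_BLOCKS
    return total_bits / LINK_BPS * 1000


input_ms = transfer_delay_ms(17)    # into the hardware
output_ms = transfer_delay_ms(16)   # out of the hardware

print(f"input:  {input_ms:.1f} msec")
print(f"output: {output_ms:.1f} msec")
print(f"total:  {input_ms + output_ms:.1f} msec")
```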

Overhead Delay Incurred in Ethernet Transmission = 64 usec for every block of transfer

**Total Overhead Delay (both to and from) for the Multiplier = 64 \* 10^-6 sec \* 256 \* 2 ≈ 32.8 msec**
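As a quick check of the overhead figure, the sketch below multiplies the per-block overhead by the number of blocks and by the two transfer directions. The variable names are illustrative only.

```python
OVERHEAD_PER_BLOCK_S = 64e-6   # 64 usec per block transfer
NUM_BLOCKS = 16 * 16           # 256 blocks
DIRECTIONS = 2                 # to the hardware and back

overhead_ms = OVERHEAD_PER_BLOCK_S * NUM_BLOCKS * DIRECTIONS * 1000
print(f"overhead: {overhead_ms:.3f} msec")
```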

**Computation Time**

Total number of clock cycles per computation of 32\*32 resultant output = 3270

Total Clock Cycles taken for the entire 1024\*1024 matrices = 16\*16\*3270 = 837120 clocks

**Total Time taken for the computation** = 837120 / 167 MHz ≈ **5 msec**
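The computation time follows directly from the cycle count and the clock rate. A minimal sketch using the figures from the text, with illustrative variable names:

```python
CYCLES_PER_BLOCK = 3270   # clock cycles per output block, as stated in the text
NUM_BLOCKS = 16 * 16      # 256 output blocks
CLOCK_HZ = 167e6          # 167 MHz

total_cycles = CYCLES_PER_BLOCK * NUM_BLOCKS
compute_ms = total_cycles / CLOCK_HZ * 1000

print(total_cycles)                       # 837120
print(f"compute: {compute_ms:.2f} msec")  # roughly 5 msec
```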

**Total Time Taken from the Application = 388 msec**

**Computation Time Share = 5/(97.6+5) = 4.8%**

**Peak Computational power of the system = BOP/cycle \* clock frequency**

**= 2 \* 64^2 \* 167\*10^6 = 1.368 TOPS**
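The peak figure is the per-cycle bit-operation count multiplied by the clock frequency. The sketch below evaluates it for n = 64 at 167 MHz; the function name `peak_ops_per_second` is hypothetical.

```python
def peak_ops_per_second(n, clock_hz):
    """Peak rate, assuming 2*n^2 bit operations complete on every clock cycle."""
    return 2 * n ** 2 * clock_hz


peak = peak_ops_per_second(64, 167e6)
print(f"peak: {peak / 1e12:.3f} TOPS")
```

This is an upper bound that assumes every cycle does useful work; it does not account for the Ethernet transfer and overhead delays estimated earlier.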