Research & Reading 1

a) (1) RISC ISAs have fewer microcode bugs than CISC ISAs. (2) Because RISC instructions are typically as simple as microinstructions and can be executed directly by the hardware, there is no need for a microcode interpreter, so the fast memory that a CISC ISA used for its microcode interpreter could be repurposed as a cache of RISC instructions. (3) Register allocators based on Gregory Chaitin’s graph-coloring scheme made it much easier for compilers to use registers efficiently, which benefited register-register ISAs, including RISC ISAs. (4) There were now enough transistors to include a full 32-bit datapath. (5) Experiments showed that RISC ISAs performed much better than CISC ISAs: programs contained slightly more instructions but had a dramatically lower CPI.

b) VLIW failed to achieve high performance for integer programs that had less predictable cache misses or less predictable branches. However, VLIW remains viable in applications with small programs, simpler branches, and no caches, such as digital signal processing.

c) The main challenge is how to keep improving performance through more advanced semiconductor devices (or other fundamental technologies) rather than by simply adding more transistors, because manufacturing process sizes and transistor scaling are now approaching physical limits.

d) There are 2 approaches mentioned in this paper:

(1) Domain-specific architectures: Domain-specific architectures are designed for a specific problem domain and offer significant performance (and efficiency) gains for that domain.

(2) Domain-specific languages: Domain-specific languages are programming languages of limited expressiveness focused on particular domains. Compared with DSAs, which need to extract information from general-purpose languages, DSLs greatly reduce implementation difficulty by providing a programming language tailored to the domain.

Research & Reading 2

Increasing rate of transistor counts (Approximately):

70’s: 29.1% per year

80’s: 44.9% per year

90’s: 42.9% per year

00’s: 39.5% per year

10’s: 18.3% per year
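The per-decade rates above are compound annual growth rates. As a minimal sketch of that calculation, assuming each rate is derived from a transistor count at the start and end of a period, the function below computes the rate. The function name and the sample counts are hypothetical and are not the data used for the figures above.

```python
def annual_growth_rate(start_count: float, end_count: float, years: float) -> float:
    """Compound annual growth rate between two transistor counts."""
    if start_count <= 0 or end_count <= 0 or years <= 0:
        raise ValueError("counts and years must be positive")
    return (end_count / start_count) ** (1.0 / years) - 1.0


# Hypothetical example: a count that grows 32x in 10 years,
# i.e. doubles every two years.
rate = annual_growth_rate(1_000, 32_000, 10)
print(f"{rate:.1%} per year")  # prints "41.4% per year"
```

Doubling every two years corresponds to about 41.4% per year, which gives a reference point for reading the decade figures.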

Exercise 1

a) Die yield:

b) This is because processor A has a larger manufacturing size. A smaller manufacturing size places higher requirements on production equipment and is more prone to defects.
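The effect of defect density on yield can be sketched with the yield model commonly used in architecture textbooks: die yield = wafer yield / (1 + defects per unit area × die area)^N. It is an assumption that this exercise uses this model. The defect densities, die area, and N below are hypothetical values chosen only to show the trend.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Fraction of good dies under the textbook yield model (assumed here)."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


# Hypothetical comparison: same die area and process-complexity factor N,
# but the second process has twice the defect density.
low_defect = die_yield(defects_per_cm2=0.02, die_area_cm2=1.0, n=14)
high_defect = die_yield(defects_per_cm2=0.04, die_area_cm2=1.0, n=14)
print(f"low defect density:  {low_defect:.3f}")
print(f"high defect density: {high_defect:.3f}")
```

With the other parameters held fixed, the process with the higher defect density always yields fewer good dies.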

Exercise 2

It’s approximately 30 million times faster than the VAX-11/780.

Exercise 3

The performance will improve by 24.4% if A is parallelized while everything else still runs serially.
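This kind of result follows from Amdahl's Law: overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of time that is enhanced and s is the speedup of that fraction. A minimal sketch follows. The inputs are hypothetical and are not the exercise's numbers, so it does not reproduce the 24.4% figure.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical example: 40% of the run time is parallelized across 2 cores.
overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=2.0)
print(f"overall speedup: {overall:.2f}x ({overall - 1:.0%} improvement)")
# prints "overall speedup: 1.25x (25% improvement)"
```

The serial remainder bounds the gain: even with an unlimited speedup of the 40% portion, the overall speedup in this example could not exceed 1 / 0.6, about 1.67x.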

Exercise 4

183 servers could be cooled with one cooling door.

Case Study 1

a) CPI & MIPS:

b) CPI & MIPS:

c) Old & New die yields / cost per processor:

Comment: While the cost per processor is reduced by 27.3%, performance is reduced by only 6.3%. This removal is very helpful for cost control and has little impact on performance.

Case Study 2

After the processor clock is reduced by 25%, the total energy stored in the battery does not change, but the peak power consumption is reduced, so battery life improves. However, reducing the clock frequency also lowers performance, so we need to balance performance against battery life. A better idea is to allow users to trade performance for battery life by changing the clock frequency dynamically. Today, almost all mobile devices provide a power-saving mode, which reduces the clock frequency but extends battery life.
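The trade-off can be sketched with the usual dynamic power relation, power ≈ 1/2 × capacitive load × voltage² × frequency, together with battery life = stored energy / power. This is a simplified model. It assumes the voltage stays fixed, the processor dominates power draw, and the device runs continuously rather than finishing a fixed amount of work. All numeric values are hypothetical.

```python
def dynamic_power_watts(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Dynamic power of switching logic: 1/2 * C * V^2 * f."""
    return 0.5 * capacitance_f * voltage_v ** 2 * frequency_hz


def battery_life_hours(battery_wh: float, power_w: float) -> float:
    """Hours of operation for a given stored energy and constant power draw."""
    return battery_wh / power_w


# Hypothetical device: 1 nF effective load, 1.0 V supply, 10 Wh battery.
full_power = dynamic_power_watts(1e-9, 1.0, 2.0e9)         # full clock
slow_power = dynamic_power_watts(1e-9, 1.0, 2.0e9 * 0.75)  # clock reduced by 25%

life_full = battery_life_hours(10.0, full_power)
life_slow = battery_life_hours(10.0, slow_power)
print(f"full clock:    {full_power:.2f} W, {life_full:.1f} h")
print(f"reduced clock: {slow_power:.2f} W, {life_slow:.1f} h")
```

Under these assumptions, a 25% lower clock draws 25% less power and extends battery life by a factor of 1 / 0.75, about 1.33, at the cost of proportionally lower peak performance.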