

## APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

## SECOND SEMESTER M.TECH DEGREE EXAMINATION, APRIL/MAY 2018

## Branch: COMPUTER SCIENCE & ENGINEERING

Stream(s): Computer Science & Engineering

## 01CS6102: PARALLEL COMPUTER ARCHITECTURE

Answer any two full questions from each part. Limit answers to the required points.

Max. Marks: 60

Duration: 3 hours

## PART A

| 1. | a. | A 2 GHz processor was used to execute a benchmark program with the following instruction mix and clock cycle count: | | | [5] | |
|-----------------------------------------------------------------------------------------------------------------------------------------------|----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|---------------------------------------|------|--|
|                                                                                                                                               |    | Instruction Type                                                                                                                                                                                                                                                                                                                                                                                                         | Instruction Count       | Clock cycle count                     |      |  |
|                                                                                                                                               |    | Integer arithmetic                                                                                                                                                                                                                                                                                                                                                                                                       | 450 K                   | 1                                     |      |  |
|                                                                                                                                               |    | Data Transfer                                                                                                                                                                                                                                                                                                                                                                                                            | 320 K                   | 2                                     |      |  |
|                                                                                                                                               |    | Floating Point                                                                                                                                                                                                                                                                                                                                                                                                           | 150 K                   | 2                                     |      |  |
|                                                                                                                                               |    | Control Transfer                                                                                                                                                                                                                                                                                                                                                                                                         | 80 K                    | 2                                     |      |  |
| | | Determine the effective CPI, MIPS rate, and execution time for this program. | | | | |
| | b. | What are the different kinds of parallelism in applications? What are the major ways to exploit the different kinds of application parallelism? | | | [4] | |
| 2. | a. | Explain the different types of data dependences with examples. | | | [4] | |
| | b. | Explain the concept of basic pipeline scheduling for exposing instruction level parallelism. | | | [3] | |
| | c. | What is the limitation of a 1-bit prediction scheme? How is it overcome in a 2-bit prediction scheme? | | | | |
| 3. | a. | Determine the number of clock cycles required to process a program with 300 instructions in a six-stage pipeline. | | | [2] | |
| | b. | To improve performance, we decide to replace the processor used for web processing. Assume that the original processor is busy with computation 30% of the time and is waiting for I/O 70% of the time. If the overall speedup gained by incorporating the enhancement is 1.3986, how much faster is the new processor on computation in the web servicing application than the original processor? | | | | |

|    | c. | Write the dependencies existing between the various instructions in the following code. | [3] |  |  |  |
|----|----|----|-----|--|--|--|
|    |    | ADD.D F2,F1,F4                                                                                                                                                                                                |     |  |  |  |
|    |    | MUL.D F4,F1,F8 |     |  |  |  |
|    |    | S.D F2,0(R1) |     |  |  |  |
|    |    | SUB.D F8,F4,F14                                                                                                                                                                                               |     |  |  |  |
|    |    | MUL.D F14,F1,F10                                                                                                                                                                                              |     |  |  |  |

## PART B

| 4. | a. | What is the purpose of reservation stations in Tomasulo's approach? Also describe their structure. | [4] |  |  |  |
|----|----|----|-----|--|--|--|
|    | b. | What are the different steps involved in instruction execution in a system that supports hardware-based speculation? | [5] |  |  |  |
| 5. | a. | With a figure, explain the different vector-access memory schemes. |     |  |  |  |
|    | b. | Suppose we have 8 memory banks with a bank busy time of 6 clocks and a total memory latency of 12 cycles. How long will it take to complete a 64-element vector load with a stride of 1? With a stride of 32? | [3] |  |  |  |
| 6. | a. | Explain the NVIDIA GPU computational structures. |     |  |  |  |
|    | b. | Describe the Intel Core i7 pipeline structure. |     |  |  |  |
|    | c. | What is a superscalar processor? | [1] |  |  |  |

## PART C

| 7. | a. | What do you understand by blocking and non-blocking networks? Give examples for each. | [4] |  |  |  |
|----|----|----|-----|--|--|--|
|    | b. | What are the advantages of using multiport memory?                                                                                                                                                            | [2] |  |  |  |
|    | c. | Draw a 16x16 Omega network built with 4x4 switches. Explain the routing of a message from 1011 to 0110. | [6] |  |  |  |
| 8. | a. | Explain the snoopy protocols in detail. |     |  |  |  |
|    | b. | With a suitable figure, explain the centralized shared memory architecture. Point out how it differs from the distributed shared memory architecture. | [6] |  |  |  |
| 9. | a. | Explain the basic concept of a directory-based cache coherence scheme. Also explain the three classes of cache directories. | [8] |  |  |  |
|    | b. | What are the causes of cache inconsistencies in a multiprocessor system? | [4] |  |  |  |