| Author | Jason Lowe-Power |
|---|---|
| Editor | Maryam Babaie, Mahyar Samani |
| Title | ECS 201A Assignment 1 |
Originally from University of Wisconsin-Madison CS/ECE 752.
Modified for ECS 201A, Winter 2023.
Due on 01/20/2023 11:59 pm (PST): See Submission for details
- Administrivia
- Introduction
- Workload
- Experimental setup
- Analysis and simulation
- Submission
- Grading
- Academic misconduct reminder
- Hints
You should submit your report in pairs and in PDF format. Make sure to start early and post any questions you might have on Piazza. The standard late assignment policy applies.
In this assignment you are going to:
- measure the performance differences of a single-cycle-like processor vs an in-order pipelined processor,
- see how the measured performance scales as CPU clock frequency changes,
- and see the effect of memory bandwidth and latency on measured performance.
You are going to use a matrix multiplication program as the workload for your experiments. Matrix multiplication is a commonly used kernel in many domains such as linear algebra, machine learning, and fluid dynamics.
For this assignment we are going to use a matrix multiplication program as our workload. The program takes an integer as input that determines the size of the square matrices A, B, and C.

    void multiply(double **A, double **B, double **C, int size)
    {
        for (int i = 0; i < size; i++) {
            for (int k = 0; k < size; k++) {
                for (int j = 0; j < size; j++) {
                    C[i][j] += A[i][k] * B[k][j];
                }
            }
        }
    }

You can find the definitions for the workload objects in gem5 under workloads/workloads.py.
In this assignment, we will only be using MatMulWorkload.
To create an object of MatMulWorkload, you only need to pass the matrix size (an integer) as mat_size to its constructor (__init__).
In your configuration choose an appropriate value for mat_size. It should be large enough that it makes your workload interesting.
Since changing mat_size will influence simulation time, as a guideline, choose a value that results in simulation times less than 10 minutes (hostSeconds < 600).
We found that setting mat_size to 224 will result in a simulation time of around 5 minutes which is a reasonable compromise.
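To get a feel for how mat_size drives the amount of work, note that the triple loop performs one multiply-add per innermost iteration, over three square matrices of 8-byte doubles. The sketch below computes both quantities. The function name `matmul_cost` is made up for illustration and is not part of the assignment's code base, and the footprint ignores the row-pointer arrays implied by the `double **` arguments.

```python
def matmul_cost(mat_size: int) -> tuple:
    """Return (multiply-adds, bytes of matrix data) for square matrices."""
    multiply_adds = mat_size ** 3             # one per innermost-loop iteration
    footprint_bytes = 3 * mat_size ** 2 * 8   # A, B, and C hold 8-byte doubles
    return multiply_adds, footprint_bytes


ops, footprint = matmul_cost(224)
print(ops, footprint)  # 11239424 1204224
```

Because the work grows with the cube of mat_size, doubling mat_size increases the number of multiply-adds by a factor of eight, which is worth keeping in mind when budgeting simulation time.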
CAVEAT [PLEASE READ CAREFULLY]: When using this workload with gem5, your simulation will output two sets of statistics in the same stats.txt file. Each set of statistics starts with a line like the one below.
---------- Begin Simulation Statistics ----------
Please make sure to ignore the second set of generated statistics in your analysis.
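One possible way to follow this caveat in an analysis script is to stop reading as soon as the first statistics block ends. The sketch below is a minimal example under that assumption: the helper name `first_stats_block` is hypothetical, the numbers in `sample` are made up and not simulation results, and values are kept as strings so the caller decides how to convert them.

```python
BEGIN_MARKER = "---------- Begin Simulation Statistics ----------"
END_MARKER_PREFIX = "---------- End Simulation Statistics"


def first_stats_block(text: str) -> dict:
    """Parse only the first statistics block of a stats.txt dump."""
    stats = {}
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN_MARKER:
            if in_block:
                break  # a second block is starting; ignore it
            in_block = True
            continue
        if in_block and stripped.startswith(END_MARKER_PREFIX):
            break  # first block is complete; ignore everything after it
        if in_block and stripped:
            fields = stripped.split()
            if len(fields) >= 2:
                stats[fields[0]] = fields[1]  # stat name -> raw value string
    return stats


# Made-up sample with two blocks; only the first one should be returned.
sample = """
---------- Begin Simulation Statistics ----------
simSeconds    0.5    # placeholder value
---------- End Simulation Statistics   ----------
---------- Begin Simulation Statistics ----------
simSeconds    9.9    # placeholder value
---------- End Simulation Statistics   ----------
"""
print(first_stats_block(sample)["simSeconds"])  # 0.5
```

In practice you would pass the contents of your own stats.txt file instead of `sample`.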
For this assignment, we will set up an experiment to see the effect of changing a system's components on its performance. You will need to write configuration scripts using the gem5 stdlib that allow you to change the CPU model, CPU and cache frequency, and memory model.
Under the components directory, you will find modules that define the different models that you should use in your configuration scripts.
- Board models: You can find all the models you need to use for your board under components/boards.py. You will only be using HW1RISCVBoard in this assignment.
- CPU models: You can find all the models you need to use for your CPU (processor) under components/processors.py.
- Cache models: You can find all the models you need to use for your cache hierarchy under components/cache_hierarchies.py. You will only use HW1MESITwoLevelCache in this assignment.
- Memory models: You can find all the models you need to use for your memory under components/memories.py.
Complete the following steps and answer the questions for your report. Collect data from your simulation runs and use simulator statistics to answer the questions. Use clear reasoning and visualization to drive your conclusions. You are allowed to submit your reports in pairs and in PDF format.
Before starting with simulations, answer the following questions in your report.
- What metrics should you use to measure the performance of a computer system? Why?
- Why is it not always possible to use the same metrics for performance to evaluate computer systems?
Before running any simulations try to answer these questions.
- At the same clock frequency, between a single-cycle CPU (HW1TimingSimpleCPU) and an in-order pipelined CPU (HW1MinorCPU), which CPU will exhibit better performance? Why?
- Between a single-cycle CPU (HW1TimingSimpleCPU) and an in-order pipelined CPU (HW1MinorCPU), which one is going to be more sensitive to changes in the clock frequency? Why?
In your configuration script allow for:
- changing the CPU model between HW1TimingSimpleCPU and HW1MinorCPU
- and changing the clock frequency between 1GHz, 2GHz, and 4GHz
Use HW1DDR3_1600_8x8 as the memory model.
In your report, answer the same questions after simulation, supported with data. A complete set of simulation data for this step should include 6 configurations (2 options for CPU model * 3 options for clock frequency).
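The six configurations for this step can be enumerated programmatically so that none is missed. The following is a minimal sketch: the model names and frequencies come from this assignment, while the function name `step1_configurations` and the tuple layout are assumptions made for illustration.

```python
from itertools import product

CPU_MODELS = ["HW1TimingSimpleCPU", "HW1MinorCPU"]
CLOCK_FREQUENCIES = ["1GHz", "2GHz", "4GHz"]
MEMORY_MODEL = "HW1DDR3_1600_8x8"  # fixed for this step


def step1_configurations() -> list:
    """Return one (cpu, clock, memory) tuple per run of this step."""
    return [
        (cpu, clock, MEMORY_MODEL)
        for cpu, clock in product(CPU_MODELS, CLOCK_FREQUENCIES)
    ]


for cpu, clock, memory in step1_configurations():
    print(f"{cpu}-{clock}-{memory}")  # one label per run, six in total
```

The printed labels can double as output directory names, which keeps the statistics of each run separate.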
Before running any simulations try to answer these questions:
- If you double the performance of memory (double the bandwidth and halve the latency) in a computer system, will the overall performance double as well? Why?
- Which CPU model (between HW1TimingSimpleCPU and HW1MinorCPU) will benefit more from improving memory performance? Why?
In your configuration allow for:
- changing the CPU model between HW1TimingSimpleCPU and HW1MinorCPU
- and changing the memory model between HW1DDR3_1600_8x8, HW1DDR3_2133_8x8, and HW1LPDDR3_1600_1x32.
Use 4GHz as the clock frequency.
NOTE: To become familiar with the different memory models you will use in this assignment, please read through the documentation for the different memory models in components/memories.py.
In your report, answer the same questions after simulation, supported with data. A complete set of simulation data for this step should include 6 configurations (2 options for CPU model * 3 options for memory model).
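When comparing runs, one common approach is to normalize every execution time to a chosen baseline and report the resulting speedup. The sketch below shows that calculation only. The function name `speedups` is hypothetical, and the two values in the usage example are placeholders rather than measured results.

```python
def speedups(exec_times: dict, baseline: str) -> dict:
    """Map each configuration to baseline_time / its_time (higher is faster)."""
    base = exec_times[baseline]
    return {name: base / time for name, time in exec_times.items()}


# Placeholder execution times in seconds; substitute your own measurements.
example_times = {"baseline": 2.0, "candidate": 1.0}
print(speedups(example_times, "baseline"))  # {'baseline': 1.0, 'candidate': 2.0}
```

A speedup above 1.0 means the configuration finished sooner than the baseline, which makes the six runs easy to plot on a single axis.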
Now that you have completed your simulation runs and analyses, answer this last question in your report.
- If you were to use a different application, do you think your conclusions would change? Why?
Your submission is split into two parts. Read the following sections for details on each part.
As part of your submission, you should include any script/code/file that might be needed to rerun your gem5 experiments. This may include configuration scripts that set up the simulation, python/shell/etc. scripts that drive your simulations using your configuration scripts, and any document that includes instructions on how to run your simulations. You should do this through your assignment's repository. Make sure to commit your changes to your local repository and push them to your remote. Add clear and relevant commit messages to your commits. NOTE: Any commits/pushes past the assignment deadline will be ignored.
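A driver script for the automation part could look roughly like the sketch below. Everything specific in it is an assumption: the binary name `gem5`, the configuration script name `run.py`, its three positional arguments, and the `results/` directory layout are hypothetical and must be adapted to your own repository. Only the `--outdir` option is a standard gem5 command-line option. The driver defaults to a dry run so that you can inspect the commands before launching any simulation.

```python
import subprocess

# Hypothetical names; adapt them to your own repository layout.
GEM5_BINARY = "gem5"
CONFIG_SCRIPT = "run.py"


def build_command(cpu: str, clock: str, memory: str) -> list:
    """Build a command line that writes each run to its own directory."""
    outdir = f"results/{cpu}-{clock}-{memory}"
    return [GEM5_BINARY, f"--outdir={outdir}", CONFIG_SCRIPT, cpu, clock, memory]


def run_all(configurations, dry_run: bool = True) -> list:
    """Print (dry run) or execute one gem5 command per configuration."""
    commands = [build_command(*config) for config in configurations]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # stop on the first failure
    return commands


run_all([("HW1MinorCPU", "4GHz", "HW1DDR3_1600_8x8")])
```

Keeping command construction separate from execution makes it easy to document the exact commands in your instructions.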
As mentioned before, you are allowed to submit your assignments in pairs and in PDF format. You should submit your report on Gradescope. In your report, answer the questions presented in Analysis and simulation, Analysis and simulation: Step I, Analysis and simulation: Step II, and Analysis and simulation: Step III. Use clear reasoning and visualization to drive your conclusions.
Like your submission, your grade is split into two parts.
- Reproducibility Package (50 points):
  - 1.1 Instructions and automation to run simulations for different sections and dump statistics (20 points)
    - 1.1.a Instructions (10 points)
    - 1.1.b Automation (10 points)
  - 1.2 Configuration scripts and correct simulation setup (30 points): 2.5 points for each configuration as described in Analysis and simulation: Step I and Analysis and simulation: Step II
- Report (50 points): 7 points for each question presented in Analysis and simulation, Analysis and simulation: Step I, Analysis and simulation: Step II, and Analysis and simulation: Step III.
You are required to work on this assignment in teams. You are only allowed to share your scripts and code with your teammate(s). You may discuss high-level concepts with others in the class, but all the work must be completed by your team and your team only.
Remember, DO NOT POST YOUR CODE PUBLICLY ON GITHUB! Any code found on GitHub that is not the base template you are given will be reported to SJA. If you want to sidestep this problem entirely, don’t create a public fork and instead create a private repository to store your work.
- Start early and ask questions on Piazza and in discussion.
- If you need help, come to office hours for the TA, or post your questions on Piazza.