10 changes: 9 additions & 1 deletion .gitignore
@@ -24,4 +24,12 @@ Makefile
 Makefile.in
 *.dirstamp
 sst-unit-test/*.out
-.vscode/
+.vscode/
+*.diff
+*.pyc
+macsim_traces/
+macsim
+params.in
+trace_file_list
+test_run/
+AUDIT.md
2 changes: 1 addition & 1 deletion .gitmodules
@@ -6,4 +6,4 @@
 	url = https://github.com/cameron314/readerwriterqueue
 [submodule "tools/CUDA_trace_generator"]
 	path = tools/CUDA_trace_generator
-	url = git@github.com:ejchung0406/CUDA_trace_generator.git
+	url = https://github.com/gthparch/Macsim_tracer.git
192 changes: 171 additions & 21 deletions README.md
@@ -1,11 +1,12 @@
# Macsim
# MacSim

## Introduction

* MacSim is a heterogeneous architecture timing model simulator that is
developed from Georgia Institute of Technology.
MacSim is a trace-based cycle-level GPGPU simulator developed by [HPArch](https://sites.gatech.edu/hparch/) at Georgia Institute of Technology.

* It simulates x86, ARM64, NVIDIA PTX and Intel GEN GPU instructions and can be configured as
either a trace driven or execution-drive cycle level simulator. It models
detailed mico-architectural behaviors, including pipeline stages,
either a trace driven or execution-driven cycle level simulator. It models
detailed micro-architectural behaviors, including pipeline stages,
multi-threading, and memory systems.
* MacSim is capable of simulating a variety of architectures, such as Intel's
Sandy Bridge, Skylake (both CPUs and GPUs) and NVIDIA's Fermi. It can simulate homogeneous ISA multicore
@@ -14,10 +15,24 @@
cores) and SMT or MT architectures as well.
* Currently interconnection network model (based on IRIS) and power model (based
on McPAT) are connected.
* MacSim is also one of the components of SST, so multiple MacSim simulatore
* MacSim is also one of the components of SST, so multiple MacSim simulators
can run concurrently.
* The project has been supported by Intel, NSF, and Sandia National Laboratories.

## Table of Contents
- [Note](#note)
- [Intel GEN GPU Architecture](#intel-gen-gpu-architecture)
- [Documentation](#documentation)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Downloading Traces](#downloading-traces)
- [Generating Your Own Traces](#generating-your-own-traces)
- [Known Bugs](#known-bugs)
- [People](#people)
- [Q & A](#q--a)
- [Tutorial](#tutorial)
- [SST+MacSim](#sstmacsim)

## Note

* If you're interested in Intel's integrated GPU model in MacSim, please refer to the [intel_gpu](https://github.com/gthparch/macsim/tree/intel_gpu) branch.
@@ -38,40 +53,175 @@

Please see [MacSim documentation file](https://github.com/gthparch/macsim/blob/master/doc/macsim.pdf) for more detailed descriptions.

## Installation

### Prerequisites

- **zlib** (development library)
```bash
# Ubuntu/Debian
sudo apt install zlib1g-dev
# RHEL/CentOS/Fedora
sudo dnf install zlib-devel
```

- **Python >= 3.11** and **SCons** (build tool)
```bash
uv venv
uv pip install scons
```

Optionally, activate the virtual environment so you can omit `uv run`:
```bash
source .venv/bin/activate
```

## Download
### Clone and Build

* You can download the latest copy from our git repository.
```bash
git clone https://github.com/gthparch/macsim.git --recursive
cd macsim
./build.py --ramulator -j 32

# Or without activating the virtual environment:
uv run ./build.py --ramulator -j 32
```
git clone -b intel_gpu https://github.com/gthparch/macsim.git

download traces
/macsim/tools/download_trace_files.py
For more build options, see `./build.py --help`.

## Quick Start

This section walks you through downloading a trace, setting up the simulation, and running it.

### 1. Download a Sample Trace

```bash
uv pip install gdown
gdown -O macsim_traces.tar.gz 1rpAgIMGJnrnXwDSiaM3S7hBysFoVhyO1
tar -xzf macsim_traces.tar.gz
rm macsim_traces.tar.gz
```
## build
./build.py --ramulator
(please see /macsim/INSTALL)

## People
This will extract sample traces from the [Rodinia benchmark suite](https://github.com/yuhc/gpu-rodinia) into a `macsim_traces/` directory.

### 2. Set Up a Run Directory

* Prof. Hyesoon Kim (Project Leader) at Georgia Tech
Hparch research group
(http://hparch.gatech.edu/people.hparch)
You need three files in the same directory to run a simulation:
- `macsim` — the binary executable
- `params.in` — GPU configuration
- `trace_file_list` — list of paths to GPU traces

Copy them from the build output:

```bash
mkdir run
cp bin/macsim bin/params.in bin/trace_file_list run/
cd run
```
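
A missing or misnamed file in the run directory is easy to overlook. As a minimal sketch, the helper below (hypothetical, not part of MacSim) reports which of the three files named above are absent before you launch the simulator:

```python
from pathlib import Path

# The three files this README says must sit in the run directory.
REQUIRED_FILES = ("macsim", "params.in", "trace_file_list")


def missing_run_files(run_dir="."):
    """Return the required file names that are not present in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in REQUIRED_FILES if not (run_dir / name).is_file()]
```

Calling `missing_run_files("run")` returns an empty list when the directory is ready to use.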

### 3. Set Up the Trace Path

Edit `trace_file_list`. The first line is the number of traces, and the second line is the path to the trace:

```
1
/absolute/path/to/macsim_traces/hotspot/r512h2i2/kernel_config.txt
```
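
If you prefer to generate this file rather than edit it by hand, the following sketch writes the count line followed by absolute paths. The function name is hypothetical, and listing several traces one per line after the count is an assumption extrapolated from the single-trace layout shown above:

```python
from pathlib import Path


def write_trace_file_list(trace_paths, out_file="trace_file_list"):
    """Write the number of traces on line 1, then one absolute path per line.

    Assumption: multiple traces follow the same one-path-per-line layout
    as the single-trace example in this README.
    """
    resolved = [str(Path(p).resolve()) for p in trace_paths]
    text = "\n".join([str(len(resolved)), *resolved]) + "\n"
    Path(out_file).write_text(text)
    return text
```

For the example above, this would be called as `write_trace_file_list(["macsim_traces/hotspot/r512h2i2/kernel_config.txt"])` from the directory that contains `macsim_traces/`, with `out_file` pointing into the run directory.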

### 4. Run

```bash
./macsim
```

Simulation results will appear in the current directory. For example, check `general.stat.out` for the total cycle count:

```bash
grep CYC_COUNT_TOT general.stat.out
```

## Q & A
> **Note:** The parameter file must be named `params.in`. The macsim binary looks for this exact filename in the current directory.
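
To pull the cycle count into a script instead of using `grep`, something like the sketch below may help. It assumes each line of `general.stat.out` starts with the statistic name followed by its integer value, separated by whitespace. That layout is an assumption, not something this README specifies, so check it against your own output file first:

```python
def read_stat(stat_text, name):
    """Return the integer value of the statistic `name`, or None if absent.

    Assumption: the name is the first whitespace-separated field on its
    line and the value is the second.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            try:
                return int(fields[1])
            except ValueError:
                return None  # value is not a plain integer
    return None
```

A typical call would be `read_stat(open("general.stat.out").read(), "CYC_COUNT_TOT")`.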

If you have a question, please use github issue ticket.
### 5. Run All Benchmarks

To run all downloaded traces and verify the build:

```bash
mkdir -p test_run && cp bin/macsim bin/params.in test_run/
cd test_run
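for trace in ../macsim_traces/*/; do
  name=$(basename "$trace")
  # Prefer a config one directory down; fall back to the benchmark directory itself.
  config=$( { ls -d "$trace"*/kernel_config.txt 2>/dev/null || ls "$trace"kernel_config.txt 2>/dev/null; } | head -n 1)
  [ -z "$config" ] && continue
  # Pass the path as an argument so it is never interpreted as a printf format string.
  printf '1\n%s\n' "$(realpath "$config")" > trace_file_list
  result=$(timeout 120 ./macsim 2>&1 | grep "finalize" | head -n 1)
  echo "$name: $result"
done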
```
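
The discovery step of the loop above can also be sketched in Python, which sidesteps shell quoting. The function name is hypothetical. It mirrors the loop's logic: look for `kernel_config.txt` one directory below each benchmark, fall back to the benchmark directory itself, and keep the first match:

```python
from pathlib import Path


def find_kernel_configs(traces_root):
    """Map each benchmark directory name to one kernel_config.txt path."""
    configs = {}
    for bench in sorted(Path(traces_root).iterdir()):
        if not bench.is_dir():
            continue
        # Prefer <bench>/<input>/kernel_config.txt, then <bench>/kernel_config.txt.
        hits = sorted(bench.glob("*/kernel_config.txt")) or sorted(
            bench.glob("kernel_config.txt")
        )
        if hits:
            configs[bench.name] = hits[0].resolve()
    return configs
```

Each returned path can then be written to `trace_file_list` before invoking `./macsim`, as the shell loop does.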

## Downloading Traces

### Publicly Available Traces

| Dataset | Download |
|---------|----------|
| Rodinia | [Download](https://www.dropbox.com/scl/fi/otaiy3gnmkcrexy66hkez/rodinia_nvbit.tar.gz?rlkey=w2pa56a0ik42zydl0incogc99&st=y3ki6xyy&dl=0) |
| PyTorch | [Download](https://www.dropbox.com/scl/fi/qyqk9yuxaut0f9490k5n3/pytorch_nvbit.tar.gz?rlkey=dgq53t37k38izawacgxdkqxsw&st=fbvchdmw&dl=0) |
| YOLOPv2 | [Download](https://www.dropbox.com/scl/fi/srmp7cp2uw6lup34j4keg/yolopv2.tar.gz?rlkey=s5pg7dhdub7jofit3omy446n3&st=d6dfq6uy&dl=0) |
| GPT2 | [Download](https://www.dropbox.com/scl/fi/qn72hfwyeo5qq120kyade/gpt2_nvbit.tar.gz?rlkey=pal8q77bwf4iarypfts2osus3&st=cmjslv8o&dl=0) |
| GEMMA | [Download](https://www.dropbox.com/scl/fi/ewcyrogwv7odc6soi9v6n/gemma_nvbit.tar.gz?rlkey=arifvlad3kj9tcw6ogze7n04m&st=66fbac0t&dl=0) |

## Generating Your Own Traces

> **Warning:** The trace generation tool is experimental — use at your own risk.

To generate traces for your own CUDA workloads, use the [MacSim Tracer](https://github.com/gthparch/Macsim_tracer).

Prefix your original command with `CUDA_INJECTION64_PATH` set to the tracer's shared library. For example:

```bash
CUDA_INJECTION64_PATH=/path/to/main.so python3 your_cuda_program.py
```

Available environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `TRACE_PATH` | Path to save trace files | `./` |
| `KERNEL_BEGIN` | First kernel to trace | `0` |
| `KERNEL_END` | Last kernel to trace | `UINT32_MAX` |
| `INSTR_BEGIN` | First instruction to trace per kernel | `0` |
| `INSTR_END` | Last instruction to trace per kernel | `UINT32_MAX` |
| `COMPRESSOR_PATH` | Path to the compressor binary | (built with tracer) |
| `DEBUG_TRACE` | Generate human-readable debug traces | `0` |
| `OVERWRITE` | Overwrite existing traces | `0` |
| `TOOL_VERBOSE` | Enable verbose output | `0` |

See the [MacSim Tracer README](https://github.com/gthparch/Macsim_tracer) for full installation and usage instructions.
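
When launching a traced workload from a script, the variables in the table can be collected into an environment dictionary. The sketch below is one way to do that. The helper name and its keyword arguments are hypothetical, and only the environment variable names come from the table above:

```python
import os


def tracer_env(injection_lib, trace_path="./", kernel_begin=0,
               kernel_end=None, base_env=None):
    """Build an environment for running a CUDA program under the tracer."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_INJECTION64_PATH"] = injection_lib
    env["TRACE_PATH"] = trace_path
    env["KERNEL_BEGIN"] = str(kernel_begin)
    if kernel_end is not None:
        # Left unset otherwise, so the tracer falls back to its own default.
        env["KERNEL_END"] = str(kernel_end)
    return env


# Usage (not run here):
#   import subprocess
#   subprocess.run(["python3", "your_cuda_program.py"],
#                  env=tracer_env("/path/to/main.so", trace_path="traces/"))
```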

## Known Bugs

1. **`src/memory.cc:1043: ASSERT FAILED`** — Happens with FasterTransformer traces + too many cores (40+). **Solution:** Reduce the number of cores.

2. **`src/factory_class.cc:77: ASSERT FAILED`** — Happens when `params.in` file is missing or has a wrong name. **Solution:** Use `params.in` as the config file name.

3. **`src/process_manager.cc:826: ASSERT FAILED ... error opening trace file`** — Too many trace files open simultaneously. **Solution:** Add `ulimit -n 16384` to your `~/.bashrc`.
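
For the third issue, a quick way to see whether the current process would hit the open-file limit is to query it with Python's standard `resource` module (Unix only). This is a sketch, and the helper name is hypothetical. The threshold of 16384 is the value suggested above:

```python
import resource


def open_file_limit_ok(needed=16384):
    """Return True if the soft open-file limit is at least `needed`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= needed
```

If this returns `False` in the shell where you plan to run `./macsim`, raise the limit with `ulimit -n` as described above.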

## People

* Prof. Hyesoon Kim (Project Leader) at Georgia Tech
Hparch research group
(http://hparch.gatech.edu/people.hparch)

## Q & A

If you have a question, please open a GitHub issue.

## Tutorial

* We had a tutorial in HPCA-2012. Please visit [here](http://comparch.gatech.edu/hparch/OcelotMacsim_tutorial.html) for the slides.
* We had a tutorial in ISCA-2012. Please visit [here](http://comparch.gatech.edu/hparch/isca12_gt.html) for the slides.


## SST+MacSim

* Here are two example configurations of SST+MacSim.
4 changes: 1 addition & 3 deletions SConscript
@@ -47,9 +47,7 @@ env['CPPPATH'] = ['#src']
 env['CPPDEFINES'] = ['LONG_COUNTERS', 'NO_MPI']
 env['LIBPATH'] = ['/usr/lib', '/usr/local/lib']

-## MAC OS X does not support static linking
-if sys.platform != "darwin" and flags.get('qsim') != '1':
-    env['LINKFLAGS'] = ['--static']
+env['LINKFLAGS'] = []
 # env['CXX'] = ['icpc']

1 change: 0 additions & 1 deletion scripts/knobgen.pl
@@ -2,7 +2,6 @@
 #knobgen.pl for use with componentMACSIM

 use File::stat;
-use Time::localtime;

 ##### VARIABLES #####
 #all parameter definitions
1 change: 0 additions & 1 deletion scripts/statgen.pl
@@ -2,7 +2,6 @@
 # (tri pho 07-11-2011 rewrite for component macsim)

 use File::stat;
-use Time::localtime;

 #search for *.param.def files
 @files = <../def/*.stat.def>;
6 changes: 6 additions & 0 deletions src/core.cc
@@ -269,6 +269,8 @@ core_c::core_c(int c_id, macsim_c* simBase, Unit_Type type) {
   // hardware prefetcher
   if (*m_simBase->m_knobs->KNOB_PREF_FRAMEWORK_ON && m_knob_enable_pref)
     m_hw_pref = new hwp_common_c(c_id, type, m_simBase);
+  else
+    m_hw_pref = NULL;

   // const / texture cache
   if ((m_core_type == "ptx" || m_core_type == "nvbit") &&
@@ -333,6 +335,10 @@ core_c::~core_c() {
   delete m_schedule;
   delete m_retire;
   delete m_icache;
+  delete m_hw_pref;
+  delete m_const_cache;
+  delete m_texture_cache;
+  delete m_shared_memory;
 }

 // start core simulation
6 changes: 1 addition & 5 deletions src/frontend.cc
@@ -173,11 +173,7 @@ frontend_c::~frontend_c() {
 // fetch stage - fetch instruction from the icache every cycle
 void frontend_c::run_a_cycle(void) {
   // bind core id
-  static bool map_core = false;
-  if (!map_core) {
-    m_core = m_simBase->m_core_pointers[m_core_id];
-    map_core = false;
-  }
+  m_core = m_simBase->m_core_pointers[m_core_id];

   m_cur_core_cycle = m_simBase->m_core_cycle[m_core_id];
