Memory policy of AllCache causes SMAUG to crash in simulation #102

Closed
xyzsam opened this issue Oct 16, 2021 · 1 comment
xyzsam commented Oct 16, 2021

Reported by user daecheol.you@samsung.com:

While examining the SMAUG simulator, I noticed that three memory interfaces are possible: DMA, ACP, and cache.

It seems that ACP supports I/O coherency, while the cache interface supports full coherency.

I ran the minerva sample model with the ACP interface successfully, but there is a problem when the memory interface is set to cache.

The following is the procedure I took:

  • Generate the minerva pbtxt and pb files with the memory policy of 'AllCache' by modifying the Python model script like below:

    with sg.Graph(name="minerva_smv_cache", backend="SMV", mem_policy=sg.AllCache) as graph:

  • Modify model_files so that 'topo_file' and 'params_file' point to the generated pbtxt and pb files.

  • Generate the dynamic_trace_acc0.gz file with the trace.sh script.

  • Modify 'memory_type' in gem5.cfg to cache (a sketch of this edit follows the list).
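
The gem5.cfg change in that last step amounts to something like this (a sketch from my setup; the section name below is just what the accelerator's section in my gem5.cfg happens to be called, and memory_type is the only key I changed):

    [acc0]
    memory_type = cache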

When the simulation starts, page mapping occurs as shown below:

40086105600: system.acc0_datapath: Setting host_a to memory type cache.
40086199200: system.acc0_datapath: Setting host_b to memory type cache.
40086235200: system.acc0_datapath: Setting host_results to memory type cache.
40090416000: system.acc0_datapath: Inserting array label mapping host_results -> vpn 0x3739a0, size 512.
40090416000: system.acc0_datapath: Mapping vaddr 0x3739a0 -> paddr 0x1a839a0.
40090416000: system.acc0_datapath: Inserting TLB entry vpn 0x373000 -> ppn 0x1a83000.
40092144000: system.acc0_datapath: Inserting array label mapping host_a -> vpn 0x3aafa0, size 1568.
40092144000: system.acc0_datapath: Mapping vaddr 0x3aafa0 -> paddr 0x1abafa0.
40092144000: system.acc0_datapath: Inserting TLB entry vpn 0x3aa000 -> ppn 0x1aba000.
40092144000: system.acc0_datapath: Mapping vaddr 0x3abfa0 -> paddr 0x1abbfa0.
40092144000: system.acc0_datapath: Inserting TLB entry vpn 0x3ab000 -> ppn 0x1abb000.
40093368000: system.acc0_datapath: Inserting array label mapping host_b -> vpn 0x3e4e80, size 25088.
40093368000: system.acc0_datapath: Mapping vaddr 0x3e4e80 -> paddr 0x1bd7e80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3e4000 -> ppn 0x1bd7000.
40093368000: system.acc0_datapath: Mapping vaddr 0x3e5e80 -> paddr 0x1bd8e80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3e5000 -> ppn 0x1bd8000.
...
40093368000: system.acc0_datapath: Mapping vaddr 0x3ebe80 -> paddr 0x1bdee80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3eb000 -> ppn 0x1bde000.

However, at the start of execution, a memory access to an unexpected address occurs:

40094148000: system.acc0_datapath: issueTLBRequestTiming for trace addr: 0xd3a8c0

Thus, the simulation fails with the error message below:

fatal: An error occurred during cache access to trace virtual address 0xd3a8c0 at node 70: Could not find a virtual address mapping for array "". Please ensure that you have called mapArrayToAccelerator() with the correct array name parameter.

Did I configure something wrong, or have I misunderstood how SMAUG operates?

I would really appreciate any advice.

I am able to reproduce this issue. It looks like it is mostly due to not correctly looking up the array name for a host memory access when the array is accessed directly via virtual memory (i.e., caching). I suspect that since this memory policy has not been used very heavily in the past, the code regressed relative to DMA and ACP, which have seen heavier use. Still investigating.

xyzsam commented Oct 21, 2021

Apart from harvard-acc/ALADDIN#43, the other question is why this only happens with MemoryPolicy = AllCache, rather than DMA or ACP. The reason here is that in gem5-aladdin, when we say "map this array to a cache", we really mean "replace the scratchpad for this array with an L1 cache entirely". It applies only to the memory on the accelerator side. But in the SMAUG context, the memory policy of "AllCache" just means "when you copy data from the host to the accelerator, get the data from the host through normal virtual memory, not by sending DMA or ACP requests". In other words, "caching" in a MemoryPolicy refers to the mechanism of how you get the data, not the physical place where data is stored.
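
To make that distinction concrete, here is a rough sketch (illustrative only; the sg.AllDma and sg.AllAcp constants are written by analogy with the sg.AllCache line in your model script, and the model bodies are elided):

    import smaug as sg

    # mem_policy selects how host data reaches the accelerator
    # (DMA vs. coherent ACP requests). It does not decide whether the
    # accelerator-side memory is a scratchpad or an L1 cache -- that is
    # configured in the Aladdin config files.
    with sg.Graph(name="minerva_smv_dma", backend="SMV", mem_policy=sg.AllDma) as graph:
        ...  # build the model; tensors are moved to the accelerator via DMA

    with sg.Graph(name="minerva_smv_acp", backend="SMV", mem_policy=sg.AllAcp) as graph:
        ...  # same model; tensors are fetched over ACP instead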

This mode of fetching data is not supported in gem5-aladdin because it's really no different from using ACP as the transport mechanism. I will send a patch to remove AllCache as a valid MemoryPolicy, since it isn't one.

If you were interested in replacing the scratchpads on the accelerators with a cache, that's done differently. Go into the smv-accel.cfg file (which configures the accelerator) and replace all the "partition,cyclic" lines with "cache". Then configure the cache itself in gem5.cfg by changing memory_type=cache and updating cache_size=xxkB. This will require #104 to fix a small bug with missing files. Once that's submitted, update SMAUG's submodules (git submodule update), and it should work for you (I just tested it).
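
Roughly, the two config changes look like this (a sketch only; the array name, sizes, and the exact trailing fields are placeholders from a local config, and the gem5.cfg keys go in whatever section your accelerator already uses):

    # smv-accel.cfg: replace the scratchpad partitioning directives with cache directives
    #   before: partition,cyclic,host_a,1568,4,8
    #   after:  cache,host_a,1568,4

    # gem5.cfg (in the accelerator's section): switch the memory type and size the L1 cache
    memory_type = cache
    cache_size = 32kB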

xyzsam added a commit to xyzsam/smaug that referenced this issue Oct 21, 2021
AllCache as a memory policy doesn't make sense. A MemoryPolicy specifies
*how* the data is moved, not where it gets stored after it is moved.
The latter is specified in the Aladdin config files (scratchpads or
caches). This option, when used, causes simulations to fail because
the logic to handle this case was not added for
HybridDatapath::handleCacheMemoryOp; however, in practice it is almost
the same as using ACP, which *does* handle it. So, we just drop this
policy as an option altogether.

Fixes issue harvard-acc#102.
xyzsam added a commit that referenced this issue Oct 22, 2021
xyzsam closed this as completed Oct 23, 2021