Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BaseCache::recvTimingResp can trigger an assertion error from getTarget() due to MSHR in senderState having no targets #100

Closed
BobbyRBruce opened this issue Jul 19, 2023 · 4 comments
Labels
bug classic caches Classic Caches and Coherence

Comments

@BobbyRBruce
Copy link
Member

BobbyRBruce commented Jul 19, 2023

This was identified by Eliot moss on our gem5-users mailing list (https://www.mail-archive.com/gem5-users@gem5.org/msg21771.html). Copy and pasted verbatim:

Dear gem5-ers -

I've run across something, which unfortunately I cannot reliably repeat, that
suggests an oversight in the code of src/mem/cache/base.cc.  In particular, it
appears that a HardPFResp can arrive where the MSHR remembered in the packet's
senderState no longer has any targets.  This causes this line to fail with an
assertion error in getTarget():

const QueueEntry::Target *initial_tgt = mshr->getTarget();

Presumably some suitable conditionalization on mshr->hasTargets() could be
used to fix things, unless the problem is that somehow the target(s)
disappeared when they should not have.  My suspicion is that something to do
with snooping by another cache caused the target to go away, and the situation
should just be ignored, but I did not want to attempt a fix along the lines of
testing hasTargets() without further confirmation.  There is also the question
of what to do about the stats in this case, there being no obvious basis for
determining a latency.  [We could change the senderState to include the time
the prefetch began, though, and use HardPFReq as the command.]

Regards - Eliot Moss

The const QueueEntry::Target *initial_tgt = mshr->getTarget(); line mentioned in the email refers to:

const QueueEntry::Target *initial_tgt = mshr->getTarget();

@BobbyRBruce BobbyRBruce added bug classic caches Classic Caches and Coherence labels Jul 19, 2023
@eliotmoss
Copy link
Contributor

The configuration where this seems to have happened (it's not an easily repeated thing) is ARM DerivO3 with two processors. The setup is not entirely symmetrical - one "host" processor runs Linux has has at least an L2 cache. The other "embedded" processor is more "bare metal" and has at least an L1 cache. Their point of mutual connection is the system's membus.

If desired I can propose a patch that avoids doing getTarget when there are no targets, and supplies something reasonable for incrementing the statistics (namely the command of the packet received).

@BobbyRBruce BobbyRBruce changed the title BaseCache::recvTimingResp can trigger in getTarget() assertion error due to MSHR in senderState having no targets BaseCache::recvTimingResp can trigger an assertion error from getTarget() due to MSHR in senderState having no targets Jul 19, 2023
@BobbyRBruce
Copy link
Member Author

My worry is this could symptomatic of a bigger bug. If we work around the assertion we may struggle to ever detect it. I'm tempted to wait until we can reproduce it, explain why it happens, or someone has a good explanation as to why it's ok for the MSHR in senderState to have no targets at this point.

@eliotmoss
Copy link
Contributor

That's reasonable, Bobby. While I've dug into - and extended - various parts of cache behavior, prefetching is one part I have not. I'll see if I can make some time to dig out how targets might disappear while a prefetch is pending. My guess is that some kind of snoop may have caused contents, including some outstanding MSHRs / targets, to be dropped, but that just a starting point for exploration. Another angle would be an eviction. Happy if anybody else chooses to look into it as well, of course!

@eliotmoss
Copy link
Contributor

I wonder about this line:

//@todo remove hw_pf here

@ivanaamit ivanaamit closed this as not planned Won't fix, can't repro, duplicate, stale Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug classic caches Classic Caches and Coherence
Projects
None yet
Development

No branches or pull requests

3 participants