-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indirect memory fix #1098
Open
anooprac
wants to merge
2
commits into
gem5:stable
Choose a base branch
from
anooprac:indirect_memory_fix
base: stable
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+5
−5
Open
Indirect memory fix #1098
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the code is wrong (here we were sending the same address to be prefetched multiple times), but I don´t think the fix is correct.
From the text:
-> "A[B[i]] ) = (B[i] << shift) + BaseAddr", where "BaseAddr is the address of A[0]", and shift is the number of bits of each element in A
-> "When software accesses B[i], IMP will prefetch and read the value of B[i + ∆]. It then uses predicted values of Coeff and BaseAddr to calculate the memory address of A[B[i + ∆]] ... and prefetch that line".
What I don´t understand from the text:
-> "IMP issues one or more prefetches for that pattern" - How can one issue one prefetch if the premise is that we always prefetch at least 2 (B[i + ∆] and A[B[i + ∆]])?
-> "The prefetch distance, the distance in the access stream between the current access and the one we prefetch, is initially small and linearly increased as more hits happen to the same pattern." - I could not find any sentence explaining how the distance is calculated. "initially small" is not a value. "linearly increased" is not a delta value. I will assume that the way we currently do it suffices.
Therefore, my understanding is (using variables from the code):
BTW, there are likely many more bugs in this code, as I don´t think it was tested. For example, checkAddressMatch seems to be the place that checks MissAddr1, but it does not check MissAddr2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the clarification!
I still am also confused by the first part that you don't understand from the text. From the introduction of the paper. I think the distance might start at 0 based off the calcSaturation() function, but I also haven't looked into the implementation of it.
I see that there are other bugs in the code, but for this specific issue on elminating adding repetitive addresses to the queue, would it be fine to remove the for loop and add
pt_entry->baseAddr + (pt_entry->index << pt_entry->shift)
? I still am a little confused by that equation though because it never takes into account distance, but the paper does mention that as used for indexing into the B array. Is that just an unnecessary calcualtion then, and we should be fine just using pt_entry->index?Also, just so my understadning is correct, the distance is used to find the correct entry in the indirect table, but from there, we prefetch only
pt_entry->baseAddr + (pt_entry->index << pt_entry->shift)
? The paper does state multiple prefetches can be made, but it doesn't explain how.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my understanding from skimming through the paper:
This is how I would fix this block given the conclusions in my previous comment:
Replace
By
Alternatively, if we want to take the "increase linearly" part into account, this is how I think it could play out. However, I don´t think this is correct, because there is no point in prefetching multiple Bs if they don´t have a corresponding A prediction. i.e., we would need to use multiple pt_entry here.
The ideal scenario would be to contact the original authors and have them help review and fix this code, because I am sure many changes would be needed, and we don´t fully understand the paper.
(Make sure to not exceed the line char limit: 79 characters - if you haven´t already, make sure to read CONTRIBUTING.md)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also re-reading the paper and am trying to understand what different addresses are added to the queue.
A part of 3.1 Key Idea says "That is, for some contiguous values of i, they access B[i] then A[B[i]]." I think this can mean that what are enqueued is addresses A[B[i]] to A[B[i + distance]] linearly. So it seems like we would increment index by one until distance.
For a specific index i, the address to be enqueued can be calcualted by ADDR( A[B[i]] ) = Coeff × B[i] + BaseAddr. Coeff is determined by shift in the gem5 code.
In terms of there being two different types of prefetching, one that is stream and one that is indirect, the code already accounts for this starting at line 92:
if (pt_entry->streamCounter >= streamCounterThreshold) { int64_t delta = addr - pt_entry->address; for (unsigned int i = 1; i <= streamingDistance; i += 1) { addresses.push_back(AddrPriority(addr + delta * i, 0)); } } pt_entry->ad
I don't think the fix you provided would be correct, but also I'm not sure what the correct solution is. PC is used to get the index, which is correct according to the paper because it finds the correct entry in the prefetch table acording to the diagrams. But that only gets you the index, which you use to index into the B array going from that index to (index + distance). Distance is also calculated in this section of code, but the only part that confuses me is how to get access to the B array. Once we have that, we use index to index into B and shift it by shift and add it to baseAddr. Then increment index up until it is equal to index + distance.
Does this make sense in context of the paper? From there, how do we get B?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the stream prefetcher is already there, so there is no need to add it again.
"which you use to index into the B array". The "index" is not and index into B. "index" is the a prediction of what Bs value is. i.e., Bs are predicted using pt_entry->index. Therefore, the getting a B value is trivial: just use the current PC to get the corresponding pt_entry: B==pt_entry->index.
However, I couldn´t understand how to generate multiple B values, since each PT entry contains a single index. My best guess is that instead of indexing the table using only the PC, we also need to use both the access address and generated stream addresses: