
cpu-o3: Support non-block prefetch inst. #667

Closed
wants to merge 4 commits

Conversation

seanzw
Contributor

@seanzw seanzw commented Dec 8, 2023

By default, the prefetch instruction in x86 is implemented as a normal load and blocks the O3 CPU while it waits for the response. This causes a performance problem when we validate gem5 against the MKL library. This commit adds a flag that allows the O3 CPU to commit a prefetch instruction directly after it is issued to memory.

@andysan andysan requested a review from giactra December 8, 2023 08:05
Contributor

@powerjg powerjg left a comment


One small style thing. Otherwise, it looks good to me.

Could you explain the reasoning behind making this a parameter and not changing the code so that it is always unblocked on prefetches?

void
LSQ::LSQRequest::setWBToRegister()
{
bool needWritebackToReg = false;

should be snake_case

@giactra
Contributor

giactra commented Dec 8, 2023

This change makes sense to me, but I am wondering whether, at this point, we shouldn't re-evaluate what we are doing in the classic caches when receiving a software ("soft") prefetch:

https://github.com/gem5/gem5/blob/stable/src/mem/cache/cache.cc#L368

The cache assumes the CPU is expecting a response, and it even clones a new packet in order to reply as soon as possible... IMHO we should be careful here.

@Harshil2107
Contributor

Hi @seanzw ,

All pull requests to gem5 require Change-Ids to be present in all commit messages. The commits in this PR do not have Change-Ids. Please follow these steps to add a Change-Id to your commits:

  1. Run the following commands (note that $f is the hook file itself, so we create its parent directory, download the hook to it, and make it executable):
f=.git/hooks/commit-msg
mkdir -p "$(dirname "$f")"
curl -Lo "$f" https://gerrit-review.googlesource.com/tools/hooks/commit-msg
chmod +x "$f"
  2. Then amend the commit with git commit --amend --no-edit, and update your pull request.

Change-Id: I5883354176ba2e29680b87412915dbd970092a8a
@seanzw seanzw force-pushed the o3cpu-non-block-prefetch-inst branch from 7c727e1 to 1b68446 Compare December 8, 2023 20:29
Change-Id: Ic8522b357f3b9d2e43c6449a0ded7c4d22775327
@seanzw
Contributor Author

seanzw commented Dec 8, 2023

Thanks for the help with correcting the Change-Id, @Harshil2107!

@powerjg I fixed the local variable name and changed the patch to enable non-blocking prefetch instructions by default. Originally I was trying to be conservative, but I think it's better to enable it by default and see if it breaks any tests.

@giactra Oh, I didn't know about this behavior in the classic cache. I mainly work on Ruby these days, and it does not reply early -- all the latency is exposed to the core. Maybe we can do something similar in the RubySequencer (which handles the communication between the core and Ruby). But I hope this change at least won't break the classic cache -- when the reply comes back, the self-owned LSQRequest should free itself without causing problems.

@seanzw
Contributor Author

seanzw commented Dec 11, 2023

Oops, testing failed. I will try to reproduce it on my local machine and see if I can fix it.

@powerjg
Contributor

powerjg commented Dec 12, 2023

I'm re-running the jobs. We had some problems with our runners last week and over the weekend. It may not be your code that caused the failures :)

@ivanaamit ivanaamit added the cpu-o3 gem5's Out-Of-Order CPU label Jan 3, 2024
@powerjg
Contributor

powerjg commented Jan 12, 2024

Oh, I didn't know about this behavior in the classic cache. I mainly work on Ruby these days, and it does not reply early -- all the latency is exposed to the core. Maybe we can do something similar in the RubySequencer (which handles the communication between the core and Ruby). But I hope this change at least won't break the classic cache -- when the reply comes back, the self-owned LSQRequest should free itself without causing problems.

@seanzw Do we need to take any action on this right now? Is there any chance that your PR will affect the performance of prefetchers in the classic caches? I doubt this is tested rigorously, so we will instead need to investigate the code paths to check whether anything breaks.

@seanzw
Contributor Author

seanzw commented Jan 12, 2024

After reconsidering the situation, I think this patch can be dropped. A better solution is to implement the early-reply behavior in the RubySequencer, similar to the classic cache. That would also work for all CPU models and Ruby protocols.

@seanzw seanzw closed this Jan 12, 2024
@powerjg
Contributor

powerjg commented Jan 12, 2024

Thanks for letting us know. Is this something you're working on actively? Or, should we make an issue to track it for the future?

@seanzw
Contributor Author

seanzw commented Jan 12, 2024

That's a good point. I am not currently working on this. I have created an issue to track it: #768
