[Test] Big atomic cleanup and futex_waitv support for Linux #14403
@@ -77,8 +77,8 @@ namespace utils
std::vector<u8> data;
atomic_t<u64> m_size = 0;
atomic_t<u64> duration_ms = 0;
atomic_t<bool> track_fully_decoded{false};
atomic_t<bool> track_fully_consumed{false};
I don't like this. Can't you make it opaque to the user (developer) with some template magic?
I don't think it's possible to change atomic bool to use 32-bit storage without breaking some PS3 struct (now or possibly in the future).
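For illustration only, here is a minimal sketch of what an opt-in wrapper could look like: a boolean interface over 32-bit storage, which is the word size a Linux futex operates on. The name `flag32` and its members are hypothetical and not taken from the PR or from RPCS3's `atomic_t`. It would be a separate type, so it does not address the layout concern above for structs that must keep a 1-byte bool.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not part of the PR: looks like an atomic bool to
// callers, but stores a 32-bit word so that it could be waited on directly.
class flag32
{
    std::atomic<std::uint32_t> m_value;

public:
    constexpr flag32(bool init = false) noexcept
        : m_value(init ? 1u : 0u)
    {
    }

    bool load() const noexcept
    {
        return m_value.load(std::memory_order_acquire) != 0;
    }

    void store(bool value) noexcept
    {
        m_value.store(value ? 1u : 0u, std::memory_order_release);
    }

    // Returns the previous state.
    bool exchange(bool value) noexcept
    {
        return m_value.exchange(value ? 1u : 0u, std::memory_order_acq_rel) != 0;
    }

    explicit operator bool() const noexcept
    {
        return load();
    }

    // Access to the underlying 32-bit word, e.g. for a wait/notify layer.
    std::atomic<std::uint32_t>& raw() noexcept
    {
        return m_value;
    }
};

// The whole point of the wrapper: the storage is exactly one 32-bit word.
static_assert(sizeof(flag32) == sizeof(std::uint32_t), "flag32 must be 32 bits wide");
```

A type like this changes `sizeof` relative to a plain atomic bool (typically 1 byte), which is why it could not simply replace the bool specialization inside guest-visible structures.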
rpcs3/headless_application.cpp
@@ -60,8 +60,7 @@ void headless_application::InitializeCallbacks()

 return false;
 };
-callbacks.call_from_main_thread = [this](std::function<void()> func, atomic_t<bool>* wake_up)
-{
+callbacks.call_from_main_thread = [this](std::function<void()> func, atomic_t<u32>* wake_up) {
this is exactly why the clang-format is the way it is right now.
I tweaked it for the lesser evil, because otherwise the whole lambda gets shifted right.
But that's fine. GalCiv and I weighed the pros and cons last time we updated the clang-format.
Indenting a lambda body or not are both allowed and neither is nitpicked at the moment.
There are some bugs in clang-format with the inline version.
There are worse formatting issues that happen in different cases.
I don't remember the details, but I remember that the current settings were easier to use, all things considered.
Ok, I'll remove AllowShortLambdasOnASingleLine. The other tweak should be fine though?
Idk. I'd have to test with existing code
1095e61 to b0b9146
{
m_head.template wait<Flags>(nullptr);
utils::bless<atomic_t<u32>>(&m_head)[1].wait(0);
Just like lf_queue has a specialization for waiting on nullptr, so should pointers IMO.
Simply booting RPCS3 with this test PR build produces some irregular log entries that are not typically seen on the master build. On master, the enumeration of PC specs appears as a single block of information without any of those entries. Besides that observation, I tested Persona 5 and found no significant difference in performance between master and the PR.
Master: 105/80/68 FPS (average/1%/0.1%)
Test build: 106/80/69 FPS (average/1%/0.1%)
Updated, please retest.
This PR includes a potential fix for the arm64 architecture related to incorrect pointer size assumptions.
Prevents implementing thread priority on Linux.
3ad8195 to 51b8e64
In order to make this possible, some unnecessary features were removed.
@@ -14,8 +14,7 @@ if(WITH_LLVM)
 option(LLVM_INCLUDE_TESTS OFF)
 option(LLVM_INCLUDE_TOOLS OFF)
 option(LLVM_INCLUDE_UTILS OFF)
-# we globally enable ccache
-set(LLVM_CCACHE_BUILD OFF CACHE BOOL "Set to ON for a ccache enabled build")
+option(LLVM_CCACHE_BUILD ON)
Why did you set LLVM_CCACHE_BUILD to ON? On Linux the command line for LLVM files looks like this:
ccache ccache c++ ...
ccache is called twice.
Somehow it worked fine until the changes in LLVM_CCACHE_BUILD. Afterwards it started to rebuild LLVM after unrelated changes.
@cipherxof what is your distro/kernel? Also, you can try disabling futex_waitv in atomic.cpp and compare.
OS: Arch Linux. This PR is definitely the reason for the performance gains. However, even with it disabled I am still getting slightly better performance than I was previously. For some reason my frame times are spiking more than before, although that may just be something with my system, so I'll need to re-compile an old custom build to test that. Another thing worth noting (bear with me here) is that before this PR, Metal Gear Online required some specific changes in order to have playable framerates. Now, with futex_waitv enabled, I no longer need these changes and the game performs even better. The problem with this, of course, is that this change only affects Linux.
@cipherxof do you use mitigations=off on Linux by any chance?
I do, yes. I also re-tested with a different custom kernel (Liquorix) and my frametimes are back to normal 👍
Some time ago, while profiling on Linux, I noticed that the atomic wait implementation may be inefficient. A large portion of the CPU time was spent outside the underlying futex syscall but inside the atomic wait routines, even though what atomic wait does is very similar to a futex. Hence the idea to use futex directly, although this requires removing some superfluous features from atomic wait support. I'm not sure how it will work out. Please test for regressions.
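To make the idea concrete, here is a rough sketch of waiting and waking directly through the Linux futex syscall on a 32-bit word. It is not the code from this PR: the function names `wait_while_equal` and `wake` are made up for the example, it uses plain `FUTEX_WAIT_PRIVATE`/`FUTEX_WAKE_PRIVATE` rather than futex_waitv, and it assumes that a `std::atomic<std::uint32_t>` can be addressed as a plain 32-bit word (true on common Linux toolchains, but not guaranteed by the standard). On other platforms the sketch falls back to yielding in a loop.

```cpp
#include <atomic>
#include <climits>
#include <cstdint>

#if defined(__linux__)
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#else
#include <thread>
#endif

// A futex operates on a naturally aligned 32-bit word.
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "atomic word must have the same size as the futex word");

// Block for as long as `word` still holds `old_value`.
// The loop re-checks the value because futex waits can wake spuriously.
inline void wait_while_equal(std::atomic<std::uint32_t>& word, std::uint32_t old_value)
{
    while (word.load(std::memory_order_acquire) == old_value)
    {
#if defined(__linux__)
        // The kernel compares *word with old_value before sleeping and
        // returns immediately (EAGAIN) if they already differ.
        syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&word),
                FUTEX_WAIT_PRIVATE, old_value, nullptr, nullptr, 0);
#else
        std::this_thread::yield(); // portable fallback: busy-wait politely
#endif
    }
}

// Wake up to `count` threads blocked on `word`; returns how many were woken
// (always 0 in the non-Linux fallback, where waiters poll instead).
inline long wake(std::atomic<std::uint32_t>& word, int count = INT_MAX)
{
#if defined(__linux__)
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&word),
                   FUTEX_WAKE_PRIVATE, count, nullptr, nullptr, 0);
#else
    (void)word;
    (void)count;
    return 0;
#endif
}
```

A typical use would be for one thread to call `wait_while_equal(word, 0)` while another stores a non-zero value and then calls `wake(word)`. All bookkeeping beyond the compare-and-sleep is left to the kernel, which is the kind of overhead reduction described above.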