Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Test] Big atomic cleanup and futex_waitv support for Linux #14403

Merged
merged 4 commits into from Aug 2, 2023

Conversation

Nekotekina
Copy link
Member

Some time ago I noticed when profiling on Linux that atomic waiting implementation may be possibly inefficient. A huge portion of the CPU time was outside of the underlying futex syscall but inside the atomic wait routines. But what atomic wait does is very similar to futex. Hence the idea to use futex directly but this would need to remove some superfluous features from atomic wait support. I'm not sure how it will work out. Please test for regressions.

@@ -77,8 +77,8 @@ namespace utils
std::vector<u8> data;
atomic_t<u64> m_size = 0;
atomic_t<u64> duration_ms = 0;
atomic_t<bool> track_fully_decoded{false};
atomic_t<bool> track_fully_consumed{false};
Copy link
Contributor

@Megamouse Megamouse Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. Can't you make it opaque to the user (developer) with some template magic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible to change atomic bool to use 32-bit storage without breaking some PS3 struct (possibly in future as well)

@@ -60,8 +60,7 @@ void headless_application::InitializeCallbacks()

return false;
};
callbacks.call_from_main_thread = [this](std::function<void()> func, atomic_t<bool>* wake_up)
{
callbacks.call_from_main_thread = [this](std::function<void()> func, atomic_t<u32>* wake_up) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is exactly why the clang-format is the way it is right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked it for the lesser evil, because otherwise the whole lambda gets shifted right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's fine. GalCiv and I weighed the pro's and cons last time we updated the clang-format.
Having the indentation in a lambda or not is both allowed and not nitpicked at the moment.

There are some bugs in clangformat with the inline version.
There are worse format issues that happen in different cases.
I don't remember the details, but I remember that it was easier to use the current settings all things considered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll remove AllowShortLambdasOnASingleLine. The other tweak should be fine though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk. I'd have to test with existing code

{
m_head.template wait<Flags>(nullptr);
utils::bless<atomic_t<u32>>(&m_head)[1].wait(0);
Copy link
Contributor

@elad335 elad335 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like lf_queue has a specalization for waiting on nullptr, so do pointers IMO.

@solarmystic
Copy link

solarmystic commented Aug 1, 2023

Just simply booting up rpcs3 with this test pr build causes some irregular log entries that is not typically seen on the master build. Enumeration of PC Specs would occur without any of those entries on master in a single block of information.

Test build
test

Master for comparison
master

Besides that observation, tested Persona 5 and found no significant difference in performance between Master and PR.

Master - 105/80/68 FPS (Average/1%/0.1% FPS)

Screenshot

p5 master

Test build - 106/80/69 FPS (Average/1%/0.1% FPS)

Screenshot

p5 neko pr

@Nekotekina
Copy link
Member Author

Updated, please retest

@solarmystic
Copy link

Updated, please retest

It just produces this Fatal Error on boot now in Windows.

image

@solarmystic
Copy link

Latest commit boots up fine. System Specs enumeration is back to normal now.

image

Persona 5 performance remains unaffected compared to master at 106 FPS on average.

Screenshot

image

@Nekotekina
Copy link
Member Author

This PR includes potential fix for arm64 architecture related to incorrect pointer size assumptions.

In order to make this possible, some unnecessary features were removed.
@Nekotekina Nekotekina merged commit d34287b into RPCS3:master Aug 2, 2023
5 checks passed
@@ -14,8 +14,7 @@ if(WITH_LLVM)
option(LLVM_INCLUDE_TESTS OFF)
option(LLVM_INCLUDE_TOOLS OFF)
option(LLVM_INCLUDE_UTILS OFF)
# we globally enable ccache
set(LLVM_CCACHE_BUILD OFF CACHE BOOL "Set to ON for a ccache enabled build")
option(LLVM_CCACHE_BUILD ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you set LLVM_CCACHE_BUILD to ON? On Linux the command line for LLVM files looks like this:

ccache ccache c++ ...

ccache is called twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow it worked fine until the changes in LLVM_CCACHE_BUILD, afterwards it started to rebuild llvm after unrelated changes.

@cipherxof
Copy link
Contributor

I'm just posting this here for future reference. This PR seems to have massively boosted performance for MGS4 under Linux. In some cases I'm gaining +40fps.

(custom build here, same results on master though)
image

image

@cipherxof
Copy link
Contributor

cipherxof commented Aug 20, 2023

Seeing improvements in RDR as well. Granted, I am assuming this PR is what caused these gains because I'm not seeing the improvements in Windows and I was unable to test linux until a recent crash regression was fixed.

image

image

@Nekotekina
Copy link
Member Author

@cipherxof what is your distro/kernel? Also you can try to disable futex_waitv in atomic.cpp and compare

@cipherxof
Copy link
Contributor

@cipherxof what is your distro/kernel? Also you can try to disable futex_waitv in atomic.cpp and compare

OS: Arch Linux
Kernel: x86_64 Linux 6.2.6-273-tkg-cfs

This PR is definitely the reason for the performance gains. However, even with it disabled I am still getting slightly better performance than I was previously. For some reason my frame times are spiking more than before, although that may just be something with my system so I'll need to re-compile an old custom build to test that.

futex_waitv disabled

Screenshot from 2023-08-20 15-29-53

futex_waitv enabled

Screenshot from 2023-08-20 15-34-10

Another thing worth noting (bare with me here) is that before this PR, Metal Gear Online required some specific changes in order have playable framerates. Now, with futex_waitv enabled I no longer need these changes and the game performs even better now. The problem with this of course is that this change only affects Linux.

Metal Gear Online

image

without the changes above:

image

with the changes:

image

futex_waitv enabled:

image

@Nekotekina
Copy link
Member Author

@cipherxof do you use mitigations=off on Linux by any chance?

@cipherxof
Copy link
Contributor

@cipherxof do you use mitigations=off on Linux by any chance?

I do, yes.

I also re-tested with a different custom kernel (Liquorix) and my frametimes are back to normal 👍

Screenshot from 2023-08-20 16-37-37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants