SPU: Inline and batch MFC list transfers #12763

Merged
merged 3 commits into from Oct 9, 2022

Contributor

@elad335 elad335 commented Oct 6, 2022

Reduces this function's share of the overall CPU profiling load from 9.6% to 6.8% and gains me about 2 fps in Sly 4. The effect is slightly larger when Atomic RSX FIFO is disabled, although the optimization is partially active even without disabling it.

@elad335 elad335 force-pushed the mfc-list branch 2 times, most recently from 51dc9ad to 8c85bb3 Compare October 7, 2022 05:01
Contributor

@kd-11 kd-11 left a comment

My suggestion is to move the method to its own file to keep SPUThread.cpp navigable (not that it's in good shape even now). The inlining makes the function far too large. Create a SPUListTransfer.cpp or something similar and move it there. The whole file needs refactoring for its large meta-functions.

@kd-11 kd-11 requested a review from Nekotekina October 7, 2022 12:56
constexpr usz _128 = 128;

// Force constexpr std::max
#define mov_t(type, index, _ea) { const usz ea = _ea; *reinterpret_cast<type*>(dst + index * std::integral_constant<u64, std::max<u64>(sizeof(type), sizeof(v128))>::value + (ea & 0xf)) = *reinterpret_cast<const type*>(src + ea); } void()
Member

Use ALL_CAPS style for macro

Contributor Author

Done.
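For readers following the macro under review, here is a minimal sketch of the same address arithmetic written as a function template. It assumes `v128` is a 16-byte vector type and that `dst` and `src` are byte pointers, neither of which is shown in the snippet. The names `mov_sketch`, `slot_stride`, and `v128_size` are hypothetical and not part of the PR. It also swaps the macro's `reinterpret_cast` load/store for `std::memcpy` to sidestep alignment and aliasing concerns.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: v128 is 16 bytes wide (not confirmed by the snippet above).
constexpr std::size_t v128_size = 16;

// Hypothetical helper: each destination slot is max(sizeof(T), 16) bytes,
// computed at compile time, mirroring the forced-constexpr std::max.
template <typename T>
constexpr std::size_t slot_stride = std::max<std::size_t>(sizeof(T), v128_size);

// Copies one T from src + ea into slot `index` of dst, preserving the
// low four bits of ea as the offset inside the slot.
template <typename T>
void mov_sketch(unsigned char* dst, const unsigned char* src, std::size_t index, std::size_t ea)
{
    T value;
    std::memcpy(&value, src + ea, sizeof(T));
    std::memcpy(dst + index * slot_stride<T> + (ea & 0xf), &value, sizeof(T));
}
```

A function template like this sidesteps the macro naming question entirely, though whether it inlines as well as the macro in the real hot path would need profiling.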


u64 addr = begin;

// Optimization: if range_locked is not used, the addr check will always pass
Member

Where did this code go?

Contributor Author

The complete check now lives in the function, so the register limit won't be reached. Most of the time it's unlocked, so the initial check suffices. I've optimized the underlying function for that case.
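The pattern described in this reply, a cheap inline check that usually passes plus a complete check kept in a separate function, can be sketched roughly as follows. This is an illustration only, not RPCS3's range-lock code: `g_range_lock`, its encoding (begin address in the high 32 bits, size in the low 32 bits, 0 meaning unlocked), `full_range_check`, and `can_access` are all assumptions made for the example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lock word: 0 means "unlocked"; otherwise the high 32 bits
// hold the locked range's begin address and the low 32 bits its size.
std::atomic<std::uint64_t> g_range_lock{0};

// Complete check, kept out of the hot loop: true if [addr, addr + size)
// does not overlap the locked range encoded in `lock`.
bool full_range_check(std::uint64_t lock, std::uint32_t addr, std::uint32_t size)
{
    const std::uint64_t lock_begin = lock >> 32;
    const std::uint64_t lock_end = lock_begin + (lock & 0xffffffffu);
    const std::uint64_t end = std::uint64_t{addr} + size;
    return end <= lock_begin || addr >= lock_end;
}

// Initial check: in the common unlocked case a single load and compare
// is enough, and the caller never enters the complete check.
inline bool can_access(std::uint32_t addr, std::uint32_t size)
{
    const std::uint64_t lock = g_range_lock.load(std::memory_order_acquire);
    if (lock == 0)
        return true;
    return full_range_check(lock, addr, size);
}
```

Keeping the overlap arithmetic out of the caller means the inlined transfer loop carries fewer live values, which is one plausible reading of the register-limit remark above.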

Contributor Author

elad335 commented Oct 9, 2022

Fully enabled the optimization for Atomic RSX FIFO, which makes the performance boost more significant without sacrificing stability.

@elad335 elad335 force-pushed the mfc-list branch 6 times, most recently from 166c905 to b1ae277 Compare October 9, 2022 09:40
@Nekotekina Nekotekina merged commit a6dfc3b into RPCS3:master Oct 9, 2022
@elad335 elad335 deleted the mfc-list branch October 9, 2022 16:59