Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve readyqueue perf, rapid fire, small improvements #651

Merged
merged 7 commits into from
Feb 11, 2013

Conversation

unknownbrackets
Copy link
Collaborator

This does the following:

  • Adds very basic rapid fire when holding shift (e.g. hold shift+tab+z.) Windows keyboard only.
  • Adds an atomic check variable around MoveEvents(), since they're rare. I had trouble verifying this improved perf, but @KentuckyCompass's fps counter shows a reproducible improvement of maybe 2%.
  • Moves threadReadyQueue back to a std::vector. It did show up in profile, after all. I opted to create a simple wrapper class.
  • Avoids some slow runtime type casts / other minor stuff.

This significantly improves (mostly because of rapid fire, heh) the time it takes to get through the intro of Hexyz Force. Overall, in game fps improvement by ~5% on Windows release once past the intro.

sceKernelThread.cpp is kinda big, I'm wondering if breaking callbacks and maybe some things out into another file is realistic... I guess it's not hurting anyone, though...

-[Unknown]

@raven02
Copy link
Contributor

raven02 commented Feb 11, 2013

@unknownbrackets , should we expect Sol trigger work again in this commit ? :)

@unknownbrackets
Copy link
Collaborator Author

No. I think I know why though. I think sceUtilityLoadModule() needs to eat cycles AND reschedule while doing so (e.g. because it's reading the module from umd/etc.) It's happening to some Tales games and a few others too.

I managed to get Tales of Destiny 2 farther but it still crashes, so I'm not sure I did it right. sceKernelCreateThread() will need to do it too, among a few others and probably IO funcs.

-[Unknown]

@hrydgard
Copy link
Owner

Agree that sceKernelThread is growing a bit too big, but it contains a lot of heavily interconnected functionality.

So need to think carefully about which parts to break out and how their external interfaces should look.

@@ -66,6 +67,8 @@ struct BaseEvent
Event *eventPool = 0;
Event *eventTsPool = 0;
int allocatedTsEvents = 0;
// Optimization to skip MoveEvents when possible.
volatile u32 hasTsEvents = false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need to use volatile and atomic stores/loads as all the stuff touching this will always run on the main emulation thread. If it doesn't, we have other problems..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the whole point was to make ts (threadsafe) events faster. Those are ones scheduled from off thread, like savestates. Actually, savestates are the only consumer currently.

So before it kept calling MoveEvents(), creating a mutex (which isn't slow but even so), and checking tsFirst and allocatedTsEvents. I wanted to cut that down to just a single check without a mutex and saving a function call. It's a small gain but when fast forwarding it adds up, and Advance is called not inoften.

-[Unknown]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, sorry, it's me who's confused here. Never mind my comment.

@unknownbrackets
Copy link
Collaborator Author

Right... the only way really would be for sceKernelThread to expose more one way or another, unfortunately.

-[Unknown]

@@ -26,8 +26,12 @@
};

int KeyboardDevice::UpdateState() {
bool alternate = GetAsyncKeyState(VK_SHIFT) != 0;
static int alternator = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this should be unsigned.

-[Unknown]

Pretty sure this is needed, but apparently it breaks Sol Trigger.
This saves ~1% during fast forward on a release build.
It showed up in a profile after all.  Cut down more than 1%.
Just like .1% but was hoping Mr. Optimizer would do this for me.
hrydgard added a commit that referenced this pull request Feb 11, 2013
Improve readyqueue perf, rapid fire, small improvements
@hrydgard hrydgard merged commit 480592c into hrydgard:master Feb 11, 2013
@xsacha
Copy link
Collaborator

xsacha commented Feb 11, 2013

So those atomic functions aren't working on many ARM devices.
They only work #ifdef HAVE_GCC_INT_ATOMICS afaik.

@hrydgard
Copy link
Owner

Are there alternatives?

@unknownbrackets
Copy link
Collaborator Author

I think we could just use __sync_synchronize() potentially on those devices if that still exists?

Otherwise, we could just revert the change, although it did improve perf and would be nice to have.

-[Unknown]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants