Gods Eater Burst (ULUS10563): Severe performance regression since v0.9.6-334-gce8f98e on Windows. (Updated, possibly affects other games too.) #5070

Closed
solarmystic opened this issue Jan 10, 2014 · 32 comments

Comments

@solarmystic
Contributor

UPDATE:- It seems like multiple games may be affected too. Documented cases include God of War: Ghosts of Sparta (#5070 (comment)) and The Force Unleashed (#5070 (comment)).

UPDATE2:- Ridge Racer 2 added to the list. #5070 (comment)

Issue as stated in the title. This is just for the record since the game remains functionally unaffected and completely playable; the regression is only evident when the unthrottling functions are engaged. This is still quite shocking since it is a performance loss of over 50% as compared to the previous build.

Ingame performance comparisons of the same scene:-

First build affected (v0.9.6-334-gce8f98e ce8f98e )

334

Last build unaffected (v0.9.6-331-g4d477f0 4d477f0)

331

System used for testing:-
sysspec

Same settings were used between builds, both executables were dropped into the same working PPSSPP folder with the same subfolders and ppsspp.ini file.

@unknownbrackets
Collaborator

Right. I knew it would cost something in places. For me, it goes from 796 -> 760 in that area (which is still a drop.)

It may be possible to determine if the depth buffer was even written to (at some point) and only copy if it was, or something, but that would also be inaccurate. Blitting isn't entirely cheap though, apparently worse on ATI cards...

-[Unknown]
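
The "only copy if depth was written" idea above can be sketched with a simple dirty flag. This is a minimal illustration, not PPSSPP's actual code: `FramebufferSketch`, `noteDraw` and `shouldCopyDepth` are hypothetical names, and as noted above the approach may be inaccurate if depth is written in ways the flag does not see.

```cpp
// Hypothetical stand-in for a tracked framebuffer; not PPSSPP's real struct.
struct FramebufferSketch {
    bool depthWritten = false;  // set once any draw has written depth
};

// Call from the draw path. `depthWriteEnabled` is assumed to come from
// the emulated GPU state for the draw being submitted.
inline void noteDraw(FramebufferSketch &fb, bool depthWriteEnabled) {
    if (depthWriteEnabled)
        fb.depthWritten = true;
}

// Decide whether the depth blit is worth doing when switching away
// from `prev`. If depth was never written, there is nothing to carry over.
inline bool shouldCopyDepth(const FramebufferSketch &prev) {
    return prev.depthWritten;
}
```

A caller would check `shouldCopyDepth(previous)` before issuing the blit and skip it when the flag was never set.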

@dbz400
Contributor

dbz400 commented Jan 10, 2014

It dropped a bit on my Nvidia system as well.

-With depthcopy
screen00254

-Without depthcopy
screen00255

@solarmystic
Contributor Author

Detailed GPU usage % numbers from GPU-Z indicate that, for the same scene, GPU usage is much higher in 334 than in 331, which is still CPU limited.

GPUz readings from v0.9.6-334-gce8f98e (completely turns it into a GPU limited situation)

334g

GPUz readings from v0.9.6-331-g4d477f0 (nearly half as much as 334, and twice the FPS)

331g

@unknownbrackets
Collaborator

Oh, it affects me less for sure since I'm using 1x render resolution.

It's simply a fact that copying a large amount of bytes takes time, and especially at higher resolutions the perf impact will be greater. At best we can turn it off with a speed hack or get smarter about cases we don't need to do it in.

-[Unknown]
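
As a rough illustration of why render resolution matters so much here, the sketch below estimates how many bytes one full depth-buffer copy touches at a given resolution multiplier. The 480x272 base is the PSP's native display size; the 4 bytes per pixel figure is an assumption for illustration (for example a packed depth/stencil format), and `depthCopyBytes` is a hypothetical helper, not something from PPSSPP.

```cpp
#include <cstddef>

// PSP native display resolution.
constexpr std::size_t kPspWidth = 480;
constexpr std::size_t kPspHeight = 272;

// Assumption for illustration only: 4 bytes per depth/stencil pixel.
constexpr std::size_t kAssumedBytesPerPixel = 4;

// Bytes touched by one full-buffer depth copy at the given
// render-resolution multiplier (1 = 1x, 3 = 3x, and so on).
constexpr std::size_t depthCopyBytes(std::size_t scale) {
    return (kPspWidth * scale) * (kPspHeight * scale) * kAssumedBytesPerPixel;
}
```

Under these assumptions a copy at 3x moves nine times as many bytes as at 1x, and at 6x thirty-six times as many, because the cost grows with the square of the multiplier. Actual cost depends on the GPU and driver.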

@dbz400
Contributor

dbz400 commented Jan 10, 2014

I tested whether the other cases are still okay. This one works for Ys 7, Saint Seiya, Jeanne D'arc and Gundam AGE Universe:

        // Let's only do this if not clearing.
        if (!gstate.isClearModeDepthMask() && gstate.isModeThrough()) {
            fbo_bind_for_read(currentRenderVfb_->fbo);
            glBlitFramebuffer(0, 0, currentRenderVfb_->renderWidth, currentRenderVfb_->renderHeight, 0, 0, vfb->renderWidth, vfb->renderHeight, GL_DEPTH_BUFFER_BIT, GL_NEAREST);
        }

@solarmystic
Contributor Author

@unknownbrackets

Hmm.. you're totally right on that one. Retested with 1x render resolution and the hit is only 30 odd FPS this time around. (350 vs 320 FPS)

The results in the issue report were obtained at 3x render resolution which is what I normally use for my test suites when running in windowed mode.

@dbz400
Contributor

dbz400 commented Jan 10, 2014

This one loses only a few FPS for me in Gods Eater Burst. Not too sure if it is good enough. (From 392 -> 386)

screen00257

@dbz400
Contributor

dbz400 commented Jan 10, 2014

@solarmystic, just wondering whether the above code helps GBE on your AMD system?

@solarmystic
Contributor Author

@raven02

Yes, it most definitely does! Performance is almost back to where it used to be when your commit is added to the current master, the loss is only ~10 FPS now. Nice work!

391r

I also tested Jeanne D'arc again with your change and it still seems to be working correctly.

@psennermann

Well look at this, more than 100 fps lost in a really not too demanding scene (there are only letters displayed) :-/

not affected:

screen00000

affected:

screen00001

@dbz400
Contributor

dbz400 commented Jan 11, 2014

The above commit should help win back some performance.

@psennermann

But it hasn't yet been merged because it could cause other problems or what?

@psennermann

In my opinion this issue should be renamed "Severe performance regression since v0.9.6-334-gce8f98e on Windows" because it affects every PSP game (not only God Eater Burst)

@solarmystic
Contributor Author

@psennermann

Based on my own findings, not every game is affected, or at least not to the same degree as this game. You are welcome to test out others to see whether or not they have the same root cause for their regression. (It seems like you've shown that The Force Unleashed is affected too, but 2 games don't make a list.)

Basically only games which deal with the "blitting" that is introduced in that commit will be affected. Fortunately, they are not that numerous.

I chose this specific title because the issue is well documented for this particular game and easy to demonstrate.

@solarmystic
Contributor Author

Also, it goes without saying that only people with hardware capable of GLES3/OpenGL 3.3 and above would experience this regression, since the feature is only relevant to hardware that supports blitting as implemented by that commit.

@psennermann

Until now every game that I've tried is affected. For example, in Syphon Filter: Dark Mirror I lost more than 50 fps, and in GOW, well, see for yourself ;-)

9.6.317

screen00000

9.6.429

screen00000

@solarmystic
Contributor Author

@psennermann

EDIT:- Scratch that, you're right after all @psennermann. It seems like v0.9.6-334-gce8f98e ce8f98e is the first responsible commit for the performance regression in GOW too.

v0.9.6-331-g4d477f0 (faster)

331gow

v0.9.6-334-gce8f98e (slower, by almost half)

334gow

Unlike GBE however, the commit that @raven02 proposed earlier (#5070 (comment)) does not help to restore the performance with GOW even when applied and recompiled into a new build.

@hrydgard
Owner

Yes yes there's not really any need for more testing, the cause is very well known and so is the effect. Copying the depth buffer like this is needed to fix Jeanne D'arc.

Soon (maybe this week, I'm pretty busy with all kinds of stuff right now) I plan to try tracking depth buffers separately from color buffers, on platforms where we can have stencil separately from depth, so that stencils can be kept together with color to at least passably imitate what the PSP does. This should let us have the same fix but with no or very low performance impact, and on non-GLES3 platforms.
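
One possible reading of "tracking depth buffers separately" is a registry of depth buffers keyed by their emulated memory address, so that two color framebuffers pointing at the same depth memory share one object instead of copying between two. The sketch below only illustrates that reading. Keying by address is an assumption, all names (`DepthBufferSketch`, `DepthTrackerSketch`) are hypothetical, and it does not describe how PPSSPP ended up implementing this.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical record for one tracked depth buffer.
struct DepthBufferSketch {
    uint32_t address;  // emulated address of the depth memory
    int id;            // stand-in for a real GPU resource handle
};

class DepthTrackerSketch {
public:
    // Returns the depth buffer for `zAddress`, creating it on first use.
    // Color framebuffers that name the same address get the same object,
    // so no copy is needed when switching between them.
    DepthBufferSketch &get(uint32_t zAddress) {
        auto it = buffers_.find(zAddress);
        if (it == buffers_.end())
            it = buffers_.emplace(zAddress, DepthBufferSketch{zAddress, nextId_++}).first;
        return it->second;
    }

    std::size_t count() const { return buffers_.size(); }

private:
    std::unordered_map<uint32_t, DepthBufferSketch> buffers_;
    int nextId_ = 0;
};
```

With this layout, binding a color buffer would look up its depth buffer with `get(address)` rather than owning a private copy.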

@psennermann

Ok, and that would eventually fix #5068 as well, or that's another "story"?

@psennermann

Well, at least in the worst case there would always be the possibility of reintroducing the old behaviour as a "speed hack"... losing 35% of general performance for just one game definitely wouldn't be a great deal ;-)

@hrydgard
Owner

Not sure if it would fix #5068, maybe. As you say, though, a worst case is just to add an option but I hope we won't need to.

@solarmystic
Contributor Author

We can add Ridge Racer 2 to the list, just for the record. Similar performance regression (nearly 50%) between v0.9.6-334-gce8f98e and v0.9.6-331-g4d477f0 when tested at higher rendering resolutions (6x in this case).

v0.9.6-331-g4d477f0

331

v0.9.6-334-gce8f98e

334

@solarmystic
Contributor Author

#5197 helps to restore performance in Gods Eater Burst and the God of War games.

Ridge Racer 2 still has the performance regression though.

@psennermann

Star Wars Force Unleashed has improved a bit, but is still far away from previous performance

@mckimiaklopa

Does this affect Windows only? Starting with the build you have discussed, games such as Dissidia 012, KH BBS and Tekken DR have been slower or have experienced frequent slowdowns which were not present on old builds.

@papel

papel commented Feb 12, 2014

Does it happen with Tekken DR? Is #5210 the same problem?

@solarmystic
Contributor Author

@papel

Last I tested, Tekken DR is not one of the games affected by this issue. GPU usage is still roughly the same across both revisions, with around the same performance.

v0.9.6-331-g4d477f0
tdr2

v0.9.6-334-gce8f98e
tdr1

@unknownbrackets
Collaborator

I think a bunch of changes have been made to try to minimize the copying here. What games are still slower than without the depth copying?

-[Unknown]

@solarmystic
Contributor Author

@unknownbrackets Ridge Racer 2 is still affected by this issue at higher rendering resolutions.

@unknownbrackets
Collaborator

What does it log as far as depth buffer reuse?

If you step in the GE debugger, starting with "Step Frame", does it do its first draw using clearmode?

-[Unknown]

@solarmystic
Contributor Author

@unknownbrackets

My bad. On closer comparison with how it used to perform, it's actually not as terrible as before and it's well within the parameters:-

Initially (the fastest prior revision, v0.9.6-331-g4d477f0)

331

After (the performance regression occurred, v0.9.6-334-gce8f98e)

334

And now (latest master, v0.9.8-1255-g359f720)

098

This was long overdue for a close.

Closed.

@solarmystic
Contributor Author

@unknownbrackets As penance, I'll do you the courtesy of tracking down the build that improved this game's performance again, heh.

EDIT:- It seems like #6211 is the pull request responsible for the performance improvement.
