
Super lag in HLE mode, at least with r600/radeonsi Mesa drivers #1561

Closed
Jj0YzL5nvJ opened this issue Sep 1, 2017 · 67 comments · Fixed by #1735

@Jj0YzL5nvJ commented Sep 1, 2017

Since the introduction of commit a625225, something generates some kind of delay with the radeon Mesa driver (r600g).

Xubuntu 16.04.3 LTS
glxinfo | grep OpenGL

OpenGL vendor string: X.Org
OpenGL renderer string: AMD JUNIPER (DRM 2.43.0 / 4.4.0-93-generic, LLVM 6.0.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 17.3.0-devel - padoka PPA
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.3.0-devel - padoka PPA
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 17.3.0-devel - padoka PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

CPU consumption is no higher than in previous versions, and compiling with -DCRC_OPT=On changes almost nothing in terms of FPS. Examples:

Running SM64 until the moment the white star appears:

dddb3ae
cmake -DMUPENPLUSAPI=On ../../src/
LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420

libGL: FPS = 0.5
libGL: FPS = 40.0
libGL: FPS = 40.0
libGL: FPS = 51.0
libGL: FPS = 60.0
libGL: FPS = 4.6
libGL: FPS = 54.0
libGL: FPS = 59.0
libGL: FPS = 59.0
libGL: FPS = 60.0
libGL: FPS = 60.0
libGL: FPS = 60.0
libGL: FPS = 60.0

a625225
cmake -DMUPENPLUSAPI=On ../../src/
LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420

libGL: FPS = 0.5
libGL: FPS = 20.6
libGL: FPS = 15.0
libGL: FPS = 15.2
libGL: FPS = 15.0
libGL: FPS = 15.0
libGL: FPS = 15.2
libGL: FPS = 16.4
libGL: FPS = 16.4
libGL: FPS = 16.6
libGL: FPS = 16.4
libGL: FPS = 16.4
libGL: FPS = 4.1
libGL: FPS = 18.1
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.5
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.2
libGL: FPS = 10.7
libGL: FPS = 8.5
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.1
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.0
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.0
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.2
libGL: FPS = 11.0
libGL: FPS = 8.9

bbc7131
cmake -DCRC_OPT=On -DMUPENPLUSAPI=On ../../src/
LIBGL_SHOW_FPS=1

libGL: FPS = 0.3
libGL: FPS = 21.8
libGL: FPS = 16.3
libGL: FPS = 16.4
libGL: FPS = 16.5
libGL: FPS = 16.6
libGL: FPS = 16.3
libGL: FPS = 16.6
libGL: FPS = 16.5
libGL: FPS = 16.4
libGL: FPS = 16.4
libGL: FPS = 6.9
libGL: FPS = 18.4
libGL: FPS = 8.3
libGL: FPS = 8.4
libGL: FPS = 8.4
libGL: FPS = 8.3
libGL: FPS = 8.4
libGL: FPS = 8.2
libGL: FPS = 8.4
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.4
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 10.7
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.1
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.1
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.3
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.2
libGL: FPS = 10.6

If anyone knows how to put these logs in spoilers, let me know how.

@loganmc10 commented:

Can you post the full output of glxinfo? Maybe put it on pastebin or upload a txt file or something so it's not a huge post.

@Jj0YzL5nvJ commented:

http://sprunge.us/DehV

@Jj0YzL5nvJ commented:

I've been doing more testing, and apparently VBO just makes the problem more evident, so the problem resides somewhere else.

Just as a note, GPU usage is almost nil during the lag and CPU usage does not exceed 33%; any of the mesa-utils demos uses more GPU (tested with radeontop). But if LIBGL_ALWAYS_SOFTWARE=1 is used, CPU usage is extremely high (89% on average).

Jj0YzL5nvJ changed the title from "Something is wrong with VBO, at least with r600g" to "Super lag, at least with r600g" on Sep 3, 2017
@loganmc10 commented:

Do you have anything non-default set in the config? If so, can you post your mupen64plus.cfg?

@Jj0YzL5nvJ commented Sep 3, 2017

http://sprunge.us/cAII

I change VideoPlugin and RspPlugin constantly and delete everything related to Video-GLideN64 every time I change versions.
In theory I'm always using defaults; the only custom configuration I remember making is in Input-SDL-Control1.
I have been finding multiple bugs in mupen64plus itself, so I'm going to have to test with old versions too.

Edit:
I even tested with the modesetting driver, and with r600g using EXA acceleration (DRI2): all the same, nothing changed.

@Jj0YzL5nvJ commented Sep 4, 2017

I found the origin! 313741d

In my tests, that commit causes heavy lag in:
Harvest Moon 64
Perfect Dark
Paper Mario

And partial lag (only in certain scenes) in:
Space Station Silicon Valley
Super Smash Bros.
Bomberman 64
Donkey Kong 64
Kirby 64: The Crystal Shards

Later commits increase the lag in other games, but a625225 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.

@loganmc10 commented Sep 4, 2017

It seems like your machine has an issue with GL_ARB_buffer_storage.

Using the latest master, can you try forcing this variable to false:

https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L58-L59

Just get rid of that whole statement and replace it with bufferStorage = false.
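
For clarity, the suggested test amounts to something like this (a sketch; the exact surrounding code in opengl_GLInfo.cpp varies by revision):

// Sketch of the test edit in opengl_GLInfo.cpp -- illustrative, not the exact upstream lines.
// The original statement probes the GL version / extension list to set the flag:
// bufferStorage = <version-or-extension probe>;
bufferStorage = false; // force the fallback path that avoids persistent mapping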

@Jj0YzL5nvJ commented:

Nope, that's not it. With that change the lag is slightly worse (between 0.1 and 1.2 FPS less in SM64).

@Jj0YzL5nvJ commented:

I think I found something interesting. See this first: https://imgur.com/a/rEMEu

In some parts of Carrington Institute, the lag disappears completely (No. 1, No. 6 and No. 7): usually places with poor lighting, or when very close to any wall, lit or not.
But there are some exceptions (No. 2, No. 4 and No. 5) where the FPS drops relatively little compared to most of the other lights (No. 3).

Images No. 7 and No. 8 show exactly the same spot; the only difference is that No. 8 was taken after activating Hi-Res.
With Hi-Res enabled, the FPS stays close to the No. 8 value even in places where it was previously perfect.

Save state: https://0x0.st/RpV.zip
Current cfg: http://sprunge.us/Pdia
gliden64.log: http://sprunge.us/MNCH

P.S.: gliden64.log is only generated when GLideN64 is compiled with Clang...

@Jj0YzL5nvJ commented Sep 18, 2017

I've been "playing a lot", updating and recompiling many dependencies in my PC...
After doing test after test, in different games and messing with the configuration file. I found some mitigators for the symptoms, unfortunately none such universal solution.

The first one is use cxd4-sse2 in "HLE mode" instead of the HLE plugin.
The second one is use DisableFBInfo = False or/and EnableCopyColorToRDRAM = 0. #1559
The last one is useing LIBGL_ALWAYS_SOFTWARE=1. But in many games this will become an backfire.
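
For reference, a minimal sketch of the relevant mupen64plus.cfg entries; the [Video-GLideN64] section name is my assumption for where the plugin's settings live, so check your own file:

[Video-GLideN64]
# keep frame buffer info reporting enabled (the mitigation above)
DisableFBInfo = False
# 0 = never copy the color buffer to RDRAM (the thread confirms 2 is the default)
EnableCopyColorToRDRAM = 0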

Examples: https://imgur.com/a/HAqJZ

Edit:
Added more images differentiating LLE and HLE in cxd4-sse2.
It's as if GLideN64 forces an LLE mode onto "mupen64plus-rsp-hle.so".

Jj0YzL5nvJ changed the title from "Super lag, at least with r600g" to "Super lag in mupen64plus-rsp-hle plugin, possible LLE obfuscation." on Sep 18, 2017
@Jj0YzL5nvJ commented:

I've been testing angrylion-plus... and I noticed that in my previous post I completely confused the HLE and LLE configurations in cxd4-sse2 (thanks to this). So the image descriptions are inverted... cxd4-sse2 (HLE) is really cxd4-sse2 (LLE) and vice versa.

My lag problem in HLE persists... only now I know that cxd4 is also affected. Even angrylion-plus works faster for me than GLideN64 in HLE mode (DisplayListToGraphicsPlugin = True).
I'm going to stick to bc00985, the last commit that works well for me. u.u

Jj0YzL5nvJ changed the title from "Super lag in mupen64plus-rsp-hle plugin, possible LLE obfuscation." to "Super lag in HLE mode" on Oct 2, 2017
@fzurita commented Oct 2, 2017

So the one that made things slow for you is this?

313741d

Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like AMD Mesa drivers don't like buffer storage either.

Edit: While you are at it, can you test this branch?
https://github.com/fzurita/GLideN64/tree/threaded_GLideN64

I'm curious how threaded GLideN64 performs with AMD hardware.

@loganmc10 commented:

Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like AMD Mesa drivers don't like buffer storage either.

I had him try disabling buffer storage (#1561 (comment)); it didn't seem to make a difference.

@fzurita commented Oct 2, 2017

Let's double check by turning off "Copy color buffer to RDRAM".

@Jj0YzL5nvJ commented Oct 2, 2017

The second is to set DisableFBInfo = False and/or EnableCopyColorToRDRAM = 0.

Disabling those two things is the only thing that has helped me a bit. But I don't remember trying that together with the modification suggested by loganmc10 (#1561 (comment)); I'll try again.

Edit: While you are at it, can you test this branch?
https://github.com/fzurita/GLideN64/tree/threaded_GLideN64

Okay, I don't have much to do anyway. I'll try it when I'm at home.

@Jj0YzL5nvJ commented:

No. I don't see much difference between disabling bufferStorage and using EnableCopyColorToRDRAM = 0 in the cfg file, or using both; at most there is an average difference of 1 FPS, 2 VI/S and 3% in the counters (in Banjo-Kazooie). At least with HLE.

@fzurita, it's the same story with the threaded_GLideN64 branch. I can test in LLE tomorrow.

@fzurita commented Oct 3, 2017

So this is very odd. The commit that made things slow for you is 313741d, but that code is only invoked when EnableCopyColorToRDRAM is not zero.

I'm not sure what is going on.

Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?

@Jj0YzL5nvJ commented Oct 8, 2017

I've been testing in my free time (which is very little) and I made some discoveries. Unfortunately, nothing directly related to my problem... apparently.

Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?

I really doubt it. I don't use any GLideN64*.ini files; or more precisely, I don't know where to put them to make them work with mupen64plus. And I can see the effects after editing my configuration file.

Regarding 313741d, see this first: #1561 (comment)
In addition to the ROMs already mentioned, I also tested these others without detecting lag problems: Banjo-Kazooie, GoldenEye 007, The Legend of Zelda (OoT and MM), Mario Kart 64, Resident Evil 2, Super Mario 64, both Castlevanias and Conker's Bad Fur Day.
Worth mentioning: at that time I only tested the intros, without interaction.
I recently tried again, and in Perfect Dark the lag only appears on "the first boot": it shows up only during the logo animations (RARE, Nintendo, N64 and Perfect Dark); on the second run the same logos move smoothly, and so does the gameplay. Interestingly, if I hold START during the game's boot, the lag never occurs after leaving the Controller Pak menu.

I know very little programming (Turbo C, Turbo Pascal, Visual FoxPro, Delphi, etc.) and I'm more used to writing maintenance scripts (Batch, VBScript, AHK, etc.) for Windows.
This seems very AMD specific, so I don't know whether it's worth referring to the following; sorry if not.

I split the VRAM into 8×8 blocks. All hazards and dependencies are tracked at this level. I chose 8×8 simply because it fits neatly into 64 threads on blits (wavefront size on AMD), and the smallest texture window is 8 pixels large, nice little coincidence 🙂

https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/

In my last test I saw lag and freezes in bc00985 when GLideN64 tries to fill more than 542M of VRAM. After that, VRAM usage drops a few MB and starts to fill again.
In 313741d this never occurs, but it is very difficult to fill more than 120M of VRAM; it's as if the code spends more time cleaning VRAM than using it.
I had to destroy many things in Perfect Dark to manage to fill the VRAM and outpace the VRAM-erasing code. But again, it never uses more than 542M, or 60% of VRAM (max 1024M).

Edit:
I forgot to mention that I found specific places in Banjo-Kazooie where the lag is more intense in LLE than in HLE (with deaf612 and 3cf7377 of threaded_GLideN64). So this is not HLE specific, but it is much more noticeable there.

@Jj0YzL5nvJ commented:

@fzurita, I get this using your 'further_reduce_shader_logic' branch: https://0x0.st/sX2R.log
In my case #1665 (comment) doesn’t make any difference.

@Jj0YzL5nvJ commented Dec 24, 2017

I have found the true origins (I believe): 313741d and 3aa365d

As I mentioned before, 313741d causes lag (and glitches) for me during boot and the first scenes of Perfect Dark and other games (#1561 (comment)), but not during gameplay.

On the other hand, 3aa365d fixes the glitches caused by 313741d, but the lag becomes permanent in Perfect Dark in HLE with Hi-Res enabled. With Hi-Res disabled, the FPS only becomes unstable.
That made the problem very difficult to find... because enabling Hi-Res does not offer any benefit in emulation.

And I quote myself:

Later commits increase the lag in other games, but a625225 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.

Jj0YzL5nvJ changed the title from "Super lag in HLE mode" to "Super lag in HLE mode, at least with r600g" on Dec 24, 2017
@fzurita commented Dec 24, 2017

Do you have Copy color buffer to RDRAM enabled? If you don't, the first commit shouldn't make a difference.

@Jj0YzL5nvJ
Copy link
Contributor Author

I tested with the default value, EnableCopyColorToRDRAM = 2.
Yeah, with EnableCopyColorToRDRAM = 0, neither 313741d nor 3aa365d manifests lag.

Now I have to find the commit that makes the lag manifest even with EnableCopyColorToRDRAM = 0, I suppose. u.u

@fzurita commented Dec 24, 2017

Well, before those commits, EnableCopyColorToRDRAM was always disabled for GLES 2.0 devices.

Edit: whoops, you have an AMD device. Too many issues and I got things confused. It seems like buffer storage causes slowdowns with AMD, at least with that specific driver.

@AaronBPaden commented:

Neither patch appears to affect my GCN ~1.0 card.

I mentioned earlier that I was interested in running this in a profiler. Caveat: I have no idea what I'm doing.

Using apitrace, it looks like GL calls are using no more than ~2 ms on the GPU.

On the CPU, however, the graph is skewed because some calls to glTexSubImage2D occasionally take around 100 ms or more (!!). However, this isn't happening often enough to be the problem on a frame-by-frame basis. I am seeing calls to glDrawElementsBaseVertex taking ~15 ms. When I zoom in, the graph looks like this.

[screenshot from 2018-02-25 23-47-18]

The pattern here looks like several calls to glDrawElementsBaseVertex taking around 10-15 ms each, followed by a call to glFlushMappedBufferRange also taking about 10 ms.

Is there anything anyone would be interested in me looking at?

@Jj0YzL5nvJ commented:

@loganmc10, my tests were run without disabling buffer storage and VBOs, so take them all with a grain of salt.
GL_ARB_buffer_storage is certainly broken in my drivers.

Test with EnableCopyColorToRDRAM = 0

Personally I did not notice any significant changes, much less by disabling buffer storage and VBOs.
But comparing the results with the previous test, the differences are very significant, especially in CPU activity, GPU activity and buffer wait time.

@BPaden, try running with MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=410 MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage"

Can you post the results of the following commands?

glxinfo | grep OpenGL
cat /var/log/Xorg.0.log | grep -i enabled
cat /var/log/Xorg.0.log | grep -i load
cat /var/log/Xorg.0.log | grep -i swap

@loganmc10 commented:

@BPaden that trace is very helpful. My next test will be replacing glDrawElementsBaseVertex with glDrawArrays. I suspected this might be an issue since @Jj0YzL5nvJ mentioned that LLE works better, and I believe LLE always uses glDrawArrays (I could be remembering wrong though).

Disabling VBOs is good for testing, but it can't be a long-term solution. Core OpenGL requires the use of VBOs (that's why you need the environment variable to get it to work). In a future version of Mesa they could remove support for non-VBO rendering altogether if they wanted, so we can't count on it.

@BPaden the long glTexSubImage2D is unfortunate but not unexpected. That is when the emulator uploads texture data to the GPU. In a normal game you would do that at load time, not during rendering, but the emulator doesn't know about the texture data until right before it's needed, so we have to upload it like this.
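
To illustrate, the mid-frame upload being profiled has roughly this shape (a sketch with illustrative names such as tex, width, height and pixels, not GLideN64's actual code):

glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D,
                0,                 // mip level
                0, 0,              // x/y offset into the texture
                width, height,     // size of the region being replaced
                GL_RGBA,           // pixel format
                GL_UNSIGNED_BYTE,  // component type
                pixels);           // texel data just decoded from N64 memory
// The draw call that samples this texture follows immediately, which is
// why the upload cannot be moved to load time as a normal game would do.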

@loganmc10 commented Feb 26, 2018

Ok @BPaden @Jj0YzL5nvJ, can you try this commit:

loganmc10@9bcfa67

I tested this on my Nvidia laptop and saw no difference in performance, but it may make a difference for you. I'm also curious whether this makes any difference on Adreno devices with buffer storage, @fzurita.

@AaronBPaden commented:

libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 56.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9

😁

@fzurita commented Feb 26, 2018

Sure, I'll try that. It will probably help performance on slower Android devices; I do know that VBOs and EBOs are slower on them. I remember that in the past they were about 10% slower.

@loganmc10 commented Feb 26, 2018

Yeah, it definitely looks like we found the bottleneck in the Mesa driver. I'm going to hop on their IRC channel and ask about this.

It's a little counterintuitive: the whole point of element arrays (glDrawElements) is to reduce the bandwidth used when uploading vertex data. But maybe when used in conjunction with VBO streaming the benefits are negated. I'll be curious to hear if there is any difference on a mobile device.

@Jj0YzL5nvJ commented:

This time the improvement is huge:
Test with new arrays

This time I don't consider a test with buffer storage and VBOs disabled necessary. (The truth is that I'm short of time =P)

@fzurita commented Feb 26, 2018

Copy color to RDRAM is still slow even with that change, it seems.

@loganmc10 commented:

Yeah, I suspect that disabling buffer storage for the copies might fix that; we can test that in a bit.

Sorry to do this to you, but @Jj0YzL5nvJ and @BPaden, can you do one more test for me?

loganmc10@6527537

I've been looking at Dolphin's code, and it looks like they don't use buffer storage or any buffers at all for the element arrays. I suspect that may be the actual problem.

@loganmc10 commented:

Also, can one or both of you add:

printf("%s\n", strRenderer);

Here:

https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L29

And post the output.

We need to see how your card reports its name so we can disable the buffer storage extension for Copy color to RDRAM.
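
If a per-vendor workaround were added, it would presumably look something like the following (a sketch; the substring check and reuse of the bufferStorage flag are my assumptions, not the final code):

const char * strRenderer = reinterpret_cast<const char *>(glGetString(GL_RENDERER));
printf("%s\n", strRenderer);
// Gate the buffer-storage readback path on the reported renderer name:
if (strRenderer != nullptr &&
    (strstr(strRenderer, "AMD") != nullptr || strstr(strRenderer, "Radeon") != nullptr))
    bufferStorage = false; // skip the extension for Copy color to RDRAM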

@loganmc10 commented Feb 26, 2018

Another update: someone from Mesa responded and indicated that it is our use of GL_UNSIGNED_BYTE that is the issue. Please also test this:

loganmc10@fa0ab9d

We are looking for whatever solution offers the best FPS, really (although glDrawArrays is probably out).

So I'm curious which of these two solutions is faster:

loganmc10@6527537

loganmc10@fa0ab9d
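
For context, the fa0ab9d-style candidate boils down to the index type passed to the draw call (a sketch; buffer setup is omitted and the actual commits may differ):

// Per the Mesa developer quoted later in this thread, AMD hardware up to Sea
// Islands cannot consume 8-bit indices, so the driver rewrites the index
// buffer on the CPU for every draw:
glDrawElementsBaseVertex(GL_TRIANGLES, count, GL_UNSIGNED_BYTE,  // slow on r600/radeonsi
                         indicesOffset, baseVertex);
// Candidate fix: the identical call with a natively supported index type.
glDrawElementsBaseVertex(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, // 16-bit indices
                         indicesOffset, baseVertex);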

@loganmc10 commented:

@fzurita I wonder if that is also the reason Copy Color to RDRAM is slow; I believe glReadPixels is using GL_UNSIGNED_BYTE, right?

@AaronBPaden commented:

can you do one more test for me?

No problem. :)

printf("%s\n", strRenderer);

AMD Radeon HD 8500 series (OLAND / DRM 3.23.0 / 4.15.6-1-ARCH, LLVM 5.0.1)

loganmc10@6527537

This is about 5-10 FPS faster than master, but still slow.

loganmc10@fa0ab9d

This is running at full speed for me.

@AaronBPaden commented:

Oh, also: I have EnableCopyColorToRDRAM=2 for this test.

@fzurita commented Feb 26, 2018

@loganmc10 That could be the reason for your specific device.

All the devices in my possession are pretty fast when using glReadPixels in an async way, except for the Adreno 530, for which I had to use floats for the pixel buffer. GLES 2.0 devices can't do glReadPixels in an async way due to the lack of PBOs, but a lot of them can use the EGL image extension, which is REALLY fast.

Anyway, the devices with fast glReadPixels in my possession have PowerVR, Adreno, and Mali GPUs. Which GPU do you have that has slow glReadPixels?
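
For reference, the async readback pattern being described looks roughly like this (a sketch with illustrative names; GLideN64's actual color-buffer-to-RDRAM code is more involved):

// Start an asynchronous readback into a pixel buffer object (PBO).
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr); // returns without stalling
// ...one or more frames later, map the PBO and copy the pixels out:
void * pixels = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height * 4, GL_MAP_READ_BIT);
if (pixels != nullptr) {
    // copy the frame to RDRAM here
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);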

@loganmc10 commented:

Anyway, the devices with fast glReadPixels in my possession have PowerVR, Adreno, and Mali GPUs. Which GPU do you have that has slow glReadPixels?

Sorry, I meant why @Jj0YzL5nvJ might be experiencing slow glReadPixels; you've looked into the format stuff a lot, so I thought you might have some insight on that. Maybe I'll try using the FLOAT type for the Radeon cards, like the Adreno 530 needs, to see if it works faster.

@LegendOfDragoon commented:

Any idea if Intel IGPs could benefit from any of these changes?

@loganmc10 commented:

Any idea if Intel IGPs could benefit from any of these changes?

According to the person on the Mesa issue tracker (https://bugs.freedesktop.org/show_bug.cgi?id=105256#c4):

This should affect all amd/ati hw up to Sea Islands....
At a very quick glance at the driver code, it looks like newer nvidia chips can handle it (everything from g80). nv30/nv40 cannot, but there it looks like nv30 will emulate index buffers anyway, so the only chips which this might make a difference is nv40 family.
Looks like all intel chips can handle it fine.

But this is talking about the Mesa (Linux) drivers; the bug didn't affect the AMD Windows drivers. Basically, there is no way to know what it might affect. The PR I submitted (#1735) isn't GPU specific, so the fix will be applied to all GPUs, since it doesn't harm anything. This was a pretty crippling bug on the AMD/Linux driver, so I suspect that if Intel were affected, you would have noticed it already.

@Jj0YzL5nvJ commented Feb 27, 2018

printf("%s\n", strRenderer);

AMD JUNIPER (DRM 2.50.0 / 4.13.0-36-generic, LLVM 5.0.1)

Crippled GL_ARB_buffer_storage? (EnableCopyColorToRDRAM = 2)

Is there any nouveau user out there?...

@Jj0YzL5nvJ commented Feb 28, 2018

Benchmark 06746ac vs 06746ac+1735.patch (SHA256: ba4f4db46dd861a08cd8dec222e20714c391c3d682739d9bb04326c60e358e79)

EnableCopyColorToRDRAM = 2

EnableCopyColorToRDRAM = 0

P.S.: Just in case this isn't clear to someone: revision P06746ac is 06746ac+1735.patch.

@Jj0YzL5nvJ commented:

Adding the GL_CLIENT_STORAGE_BIT flag here fixes the issue completely.

Info: https://bugs.freedesktop.org/show_bug.cgi?id=102204#c7
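
In other words, something along these lines where the readback buffer is allocated (a sketch; the target and companion flags are my assumptions based on this use case):

// GL_CLIENT_STORAGE_BIT hints that the buffer should live in client-visible
// memory, so CPU reads of the persistently mapped buffer avoid slow
// uncached VRAM access on Mesa.
glBufferStorage(GL_PIXEL_PACK_BUFFER, bufferSize, nullptr,
                GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT |
                GL_CLIENT_STORAGE_BIT);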

@fzurita commented Nov 3, 2018

You should make a pull request, but based on the Wine link, it looks like we only want to use that flag on Mesa.

@Lithium64 commented:
@loganmc10 @gonetz
