Still strange rendering #3336

Closed
impaktor opened this issue Jan 26, 2015 · 42 comments

Comments

@impaktor
Member

So a continuation of #3308, the BBS is fixed as I mentioned there, but world view is still strange.
(On latest master, 77bd4bb, GNU/Linux)

ps2

@fluffyfreak
Contributor

It looks to me as though the zbuffer clearing is just not happening on Linux. Perhaps the calls to glDepthRange used to clear it but for some reason they don't anymore?

Do a quick test and find the call in the renderer where glDepthRange lives and after it put a call to ClearDepthBuffer. If that works then it's not really the optimal solution but it might suffice.

@impaktor
Member Author

Not sure I follow. As far as I can tell, glDepthRange is used in graphics/opengl/RendererGL.cpp wrapped in a SetDepthRange call, which in turn is used in some places in SectorView.cpp, which doesn't feel like the correct place for this to go wrong, since it happens without going into sector view.

Or should I muck about in SectorView.cpp?

@fluffyfreak
Contributor

Hmm, damn that was my best guess. Basically I don't know why but it looks like on Linux the depth buffer is not being cleared. However I can't think of anywhere that has changed. The glDepthRange was just my best shot in the dark.

@impaktor
Member Author

Perhaps something to do with #3306, as that does introduce changes to some of the code I looked at when following your advice.

@fluffyfreak
Contributor

I've checked that before, but I can't see any reason it would fail on Linux and not on Windows. Also, there are no changes to the depth clearing.

I need to try this out on a Linux machine and just debug it.

@impaktor
Member Author

Happy hunting!

@fluffyfreak
Contributor

I am now having to compile older versions to find when the problem started. I'm totally baffled by this one :/

If you can find which version it actually started with that would help me greatly.

@impaktor
Member Author

Will do.

@fluffyfreak
Contributor

Ok, I think you're right that it might come in with #3306 but I am baffled as to why. I can't see what would cause it.

@fluffyfreak
Contributor

Just trying to find the specific commit from that range to see exactly what happened, maybe that will help me figure this out.

@fluffyfreak
Contributor

@impaktor ok I can't find an intermediate commit between robn@84ef0d7 and robn@8b5c721 that will compile so it's going to be a case of figuring out why it doesn't work.

The only other likely candidate, from before the #3306 commits, was 81e728f by me, but I tested that one and it was OK.

Could you verify my findings?

@fluffyfreak
Contributor

Looking at it, it's almost like the z-ordering is reversed.

It's too late tonight for me to do any more on this but I'll try to find some time asap.

@fluffyfreak
Contributor

Hey @robn just wanted you to know about this bug we're having, I can't fathom why it's happening so wanted to loop you in for brighter ideas :)

I had a couple of thoughts before bed last night like:

  • possibly calling methods on the wrong renderer?
  • compilation/linking issue when dummy renderer not in project?
  • not setting depth order / range / clearing correctly?

That last one though I would expect to affect all systems but it seems to only be certain Linux machines/builds.
I'm testing the second idea right now but don't know how to test the first yet.

@impaktor
Member Author

I can confirm that 81e728f works for me.

Also, I should mention that not all Linux systems experience this bug, as @jmf confirmed last night that master works for him.

@fluffyfreak
Contributor

Do you have an AMD/Ati GPU?
What GPU does @jmf have?

I just tried it (I have AMD GPU) and it's like everything is ok for the first frame, and trashed after that.

@impaktor
Member Author

% glxinfo | grep OpenGL
42:OpenGL vendor string: NVIDIA Corporation
43:OpenGL renderer string: GeForce 210/PCIe/SSE2
44:OpenGL version string: 3.3.0 NVIDIA 340.32
45:OpenGL shading language version string: 3.30 NVIDIA via Cg compiler

But I can't swear it isn't using some integrated thing on the board instead of the graphics card. I've done zero setup on this machine.

@fluffyfreak
Contributor

Damn, I was starting to hope it was something we could track that way.

@fluffyfreak
Contributor

Ok that's interesting, if I disable planet rendering then everything works ok. I wonder if this is state being persisted when it shouldn't.

@robn
Member

robn commented Jan 27, 2015

Is the repro to start the game, open the BB, request launch, and note bits of terrain shape hanging around?

If so, it's working fine here on:

  • OpenGL version 3.1.0 NVIDIA 340.65, running on NVIDIA Corporation GeForce GTX 770M/PCIe/SSE2
  • OpenGL version 3.3 (Core Profile) Mesa 10.4.2, running on Intel Open Source Technology Center Mesa DRI Intel(R) Haswell Mobile

@fluffyfreak testing the first point means just shoving calls to abort() in every method on the dummy renderer. But I'm beyond 100% sure that it's not calling it; it wouldn't make any sense at all, since there's nothing that can actually instantiate it.

@fluffyfreak
Contributor

Yeah I've eliminated everything else now.
It's definitely the GeoSphere rendering. I just cannot fathom out why now :/

Nothing has changed in that section that I can see. Especially not with the dummy renderer changes. The biggest change I can see is an int to Uint32 in GeoSphereMaterial.cpp... that's about it.

@robn
Member

robn commented Jan 27, 2015

Yeah, that looks rather benign.

Also couldn't reproduce it on the Mesa software renderer: OpenGL version 3.3 (Core Profile) Mesa 10.4.2, running on VMware, Inc. Gallium 0.4 on llvmpipe (LLVM 3.5, 256 bits)

@impaktor do you know how to drive git bisect? Since you can reproduce it you're probably best placed to try it.

@fluffyfreak
Contributor

We know it worked in 81e728f and was broken at 8b5c721; it just doesn't make sense why it would be.

How can I force it to run with the Mesa software renderer? Might be able to eliminate it as a local build issue.

@robn
Member

robn commented Jan 27, 2015

LIBGL_ALWAYS_SOFTWARE=1 ./pioneer

@fluffyfreak
Contributor

hmm, I just get "Failed to set video mode". Guess I need the driver.

@robn
Member

robn commented Jan 27, 2015

Probably libgl1-mesa-glx.
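
For anyone trying the software-renderer route, here is a rough sketch of checking whether the override actually took effect, by looking at the renderer string that `glxinfo` reports. The helper name `is_software_renderer` is made up for illustration, and the list of names it matches (llvmpipe, softpipe, swrast) is an assumption about how Mesa's software rasterizers usually identify themselves, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `glxinfo` output on stdin and succeeds if the
# "OpenGL renderer string" line names a Mesa software rasterizer.
# The matched names are an assumption, not an exhaustive list.
is_software_renderer() {
    grep -q -i -E 'OpenGL renderer string:.*(llvmpipe|softpipe|swrast)'
}

# Possible usage (needs glxinfo installed and a running X session):
#   if LIBGL_ALWAYS_SOFTWARE=1 glxinfo | is_software_renderer; then
#       echo "software rendering is active"
#   else
#       echo "still on the hardware driver (or no Mesa GL library found)"
#   fi
```

If the check fails even with `LIBGL_ALWAYS_SOFTWARE=1` set, that would be consistent with the Mesa GL library not being installed or not being the one picked up at runtime.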

@robn
Member

robn commented Jan 27, 2015

Yeah, I can't see anything interesting between those two commits either. The only vaguely interesting thing I saw (that is, enough to make me check some headers) is that glDepthRange takes GLclampd args while SetDepthRange takes doubles, but these should be compatible.

@fluffyfreak
Contributor

Possibly range wrapping? I've seen that on some GPUs but not others to do with colours and alpha.
Still, it might not be relevant, as I don't think the GeoSphere or related code uses it, does it?

@robn
Member

robn commented Jan 27, 2015

Possibly. But hang on, that's SectorView (yeah, didn't read properly). What could the UI be doing to bollocks it up?

@robn
Member

robn commented Jan 27, 2015

Depth testing?

Grr, but I still don't understand what's changed.

@fluffyfreak
Contributor

Exactly, there's nothing that should affect what we're seeing.
Later on I'll try just blitzing all of the rendering state before and after rendering the terrain to see if I can work back from a known good thing and identify it.

I find it rather telling though that it behaves differently for different makes/models of GPU/driver.

@fluffyfreak
Contributor

That software rendering still doesn't work for me. I'll try again this evening.

@impaktor
Member Author

@impaktor do you know how to drive git bisect? Since you can reproduce it you're probably best placed to try it.

Never used it, but I can learn. Is it still needed, since the suspected commits don't compile?

@laarmen
Contributor

laarmen commented Jan 27, 2015

Bad @robn, non-compiling commits are EVIL!

@robn
Member

robn commented Jan 27, 2015

@impaktor Probably not, if they don't compile. Dang.

@laarmen Maybe. Refactoring in contained chunks with meaningful commit messages can complicate that a little. But bisecting across merges is tough anyway.
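
On the question of bisecting across commits that don't build: `git bisect` can skip them. When driven by `git bisect run`, a script that exits with status 125 marks the current commit as untestable, 0 marks it good, and any other status from 1 to 127 marks it bad. Below is a minimal sketch of such a script. The helper name `bisect_step` and the build and check commands passed to it are placeholders, not anything from the Pioneer tree, and because this particular bug is visual, the check step would in practice probably have to be done by eye.

```shell
#!/bin/sh
# Hypothetical helper for `git bisect run`: classify the checked-out commit.
#   $1 = build command (placeholder, e.g. "make -j4")
#   $2 = check command that succeeds only when the bug is absent (placeholder)
# Returns 125 (skip) if the build fails, 0 (good) if the check passes,
# and 1 (bad) otherwise.
bisect_step() {
    build_cmd=$1
    check_cmd=$2
    if ! sh -c "$build_cmd" >/dev/null 2>&1; then
        return 125   # commit does not build: ask git bisect to skip it
    fi
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        return 0     # bug not observed: good commit
    fi
    return 1         # bug (or a crash) observed: bad commit
}

# Possible usage with the commits named in this thread (bad first, then good):
#   git bisect start 8b5c721 81e728f
#   git bisect run sh -c '. ./bisect_step.sh; bisect_step "make -j4" "./check.sh"'
#   git bisect reset
#
# Manual equivalent at each step, when the check has to be visual:
#   make -j4 || git bisect skip
#   ./pioneer        # look at the world view, then run one of:
#   git bisect good  # or: git bisect bad
```

If every commit between the good and bad endpoints gets skipped, bisect can only report the range of candidates rather than a single first bad commit.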

@jmf
Contributor

jmf commented Jan 27, 2015

EDIT:
The bug doesn't occur on either of my computers. I guess the working directory on my laptop was not clean, and that caused the bug to appear.

@impaktor
Member Author

Just like a married couple, I don't have much to add that hasn't already been said, but to project the illusion of a successful life together, I took git bisect out for a ride today on our driveway to impress the neighbours. These are my findings:

8b5c721 compiles, has the bug
818588e linker error
b750e61 linker error
22d3804 linker error
6a0423e linker error
825918d linker error
84ef0d7 segfaults
81e728f works (as previously noted, dear)
008db46 segfaults
44be22a works

EDIT: Oops, wrong again, updated the list. What a surprise. (I don't know what we ever saw in each other to begin with.)

@impaktor
Member Author

Seems like I'm not good enough for anything other than taking out the trash (read: "close issues"). I was wrong in my findings, updated the list accordingly, in the "Edit". I reached the same conclusions as @fluffyfreak.

(I want a divorce.)

@fluffyfreak
Contributor

I'll try to find time to do a make clean build this weekend and see if that changes anything.

@impaktor do you still get the bug with @robn's new #3339 PR?

@impaktor
Member Author

@impaktor do you still get the bug with @robn's new #3339 PR?

Oh, I haven't tried. Will test on Monday, if no one else beats me to it.

@fluffyfreak
Contributor

@impaktor I just grabbed the latest on my Linux box, did a make clean and sudo make -j4, and it now runs fine without any glitches.

Might be worth trying latest and cleaning to see if your problem goes away like @jmf's did.
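
Since stale build output looks like the likely culprit here, a quick way to see what is lying around in a working directory before rebuilding is to count the untracked and ignored entries that git reports. This is only a sketch: `count_stale` is a made-up helper name, and whether leftover object files were really the cause on any given machine is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads `git status --porcelain --ignored` output on
# stdin and prints how many entries are untracked ("??") or ignored ("!!").
# Leftover object files and build directories normally show up as one of these.
count_stale() {
    grep -c -E '^(\?\?|!!) ' || true   # grep -c prints 0 but exits 1 on no match
}

# Possible usage from the top of the checkout:
#   git status --porcelain --ignored | count_stale
#   git clean -xdn            # dry run: list what a full clean would delete
#   make clean && make -j4    # rebuild from scratch
```

`git clean -xdn` only previews; swapping `-n` for `-f` actually deletes the listed files, including any local files that are untracked on purpose, so it is worth reading the dry-run list first.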

@impaktor
Member Author

impaktor commented Feb 1, 2015

Will do, first thing tomorrow.

@impaktor
Member Author

impaktor commented Feb 2, 2015

Works.

@impaktor impaktor closed this as completed Feb 2, 2015
5 participants