New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vertex cache optimisation and GeoSphere render call reduction #1507

Merged
merged 4 commits into from Sep 16, 2012

Conversation

Projects
None yet
3 participants
@fluffyfreak
Contributor

fluffyfreak commented Sep 13, 2012

Purpose

The purpose of this patch is two-fold, first to reduce the number of calls to OpenGL to render parts of an index buffer from 5 to 1 and secondly to make that call closer to being optimal.

Future work

In future it should be possible to optimise LMR objects with the vcacheopt library

Method

There are 16 possible configurations that a terrain patch can be rendered in from having all neighbours at lower resolution through to all neighbours at the same resolution.

Because of this we can precalculate all 16 configurations and create index buffers for them, we then choose which index buffer to use based on a patches edgeFriend pointers and render all of the applicable vertices with a single draw call.

Once each of the index buffers has been created it is then optimised, at lower resolutions this makes very little difference but at the higher resolutions the index ordering is substantially different.

Issues

None, it works fine.

No really, what's wrong with it?

Oh, ok, so the way I decide how to assign which combination of edges to which index buffer was a bit... "evolved", i.e: there was no intelligent design about it.
So whilst it's a perfectly workable system I'm 100% sure that there's a better way of doing it, it's just there on the tip of my mind I just can't see it clearly yet!
Other than that it really is pretty solid I think :)

What else? C'mon...

Cripes ok, so the "vcacheopt" part isn't strictly necessary, you could just build the 16 index buffers and not bother to optimise them. The reason I don't just do that is that I started out trying to optimise the buffers using the vcacheopt system and only then did I realise the draw call reduction optimisation.

...that really is everything though ;)

fluffyfreak added some commits Sep 13, 2012

Implement the single index buffer rendering.
Chooses the relevant index buffer from a selection of 16 possible pre-generated index buffers.
Cleanup defunct index buffers.
Move index buffer selection into it's own const method.
Just fix a case where i was being dumb, had a pointer that was never …
…used, or more importantly initialised and I was deleting it on destruction. Obvious cause of crash.
@johnbartholomew

This comment has been minimized.

Show comment
Hide comment
@johnbartholomew

johnbartholomew Sep 13, 2012

Contributor

I've made some house-keeping changes on top of this, at johnbartholomew/geosphere-vcache.

  • vcacheopt.h moved to contrib/ and noted in AUTHORS.txt
  • Smart pointers instead of SAFE_DELETE_ARRAY macro
  • Got rid of the edgebools thing

Plus a few even more minor things.

Contributor

johnbartholomew commented Sep 13, 2012

I've made some house-keeping changes on top of this, at johnbartholomew/geosphere-vcache.

  • vcacheopt.h moved to contrib/ and noted in AUTHORS.txt
  • Smart pointers instead of SAFE_DELETE_ARRAY macro
  • Got rid of the edgebools thing

Plus a few even more minor things.

@johnbartholomew

This comment has been minimized.

Show comment
Hide comment
@johnbartholomew

johnbartholomew Sep 13, 2012

Contributor

Oh, and general review comments/opinion: Terrain still appears to work. I haven't tried to measure the performance difference. Basic idea is good. Don't really care about the vcacheopt thing, but I don't have any objection to it.

Contributor

johnbartholomew commented Sep 13, 2012

Oh, and general review comments/opinion: Terrain still appears to work. I haven't tried to measure the performance difference. Basic idea is good. Don't really care about the vcacheopt thing, but I don't have any objection to it.

@fluffyfreak

This comment has been minimized.

Show comment
Hide comment
@fluffyfreak

fluffyfreak Sep 13, 2012

Contributor

Nice changes! Glad you were able to see what needed changing about that bizarre edgeBools things I came up with - it was "sort of" logical and it worked as I hoped, I could see that it was just... bonkers but when you've come up with something like that it's hard to come back from the madness ;)

I'm unsure of how much a performance difference this will make in most situations we currently encounter, maybe we'll see a change up at the very high detail settings. Originally I worked on this stuff for the Orbital code I wrote because it generated such a flipping enormous amount of geometry.

Hopefully it'll pay off in the longer run, plus, it's just plain cleaner! :D

Contributor

fluffyfreak commented Sep 13, 2012

Nice changes! Glad you were able to see what needed changing about that bizarre edgeBools things I came up with - it was "sort of" logical and it worked as I hoped, I could see that it was just... bonkers but when you've come up with something like that it's hard to come back from the madness ;)

I'm unsure of how much a performance difference this will make in most situations we currently encounter, maybe we'll see a change up at the very high detail settings. Originally I worked on this stuff for the Orbital code I wrote because it generated such a flipping enormous amount of geometry.

Hopefully it'll pay off in the longer run, plus, it's just plain cleaner! :D

@Ae-2222

This comment has been minimized.

Show comment
Hide comment
@Ae-2222

Ae-2222 Sep 16, 2012

Contributor

Hey, nice work:)

I tested this briefly just now and terrain rendering seems to work fine:)

Some notes about the vcacheopt, after looking through this paper johnbartholomew found:

  • It's a statistical optimisation not an analytical method. It's designed to work reasonably, not optimally, over a range of arbitrary mesh topologies. The connectivity we have is just mainly a grid which is used intensively.
    • The scoring system was optimised for a set of test meshes and not specifically for our case
  • The ideal solution would probably be an analytical method. Given grids are very common there is probably a known proper solution.
    • JohnJ said the ACME for our current ordering is around 1.0 (IIRC). What ACME values might this be generating.. (in the simulated LRU cache and the actual GPU FIFO cache)
    • What's the theoretical limit for the ACME.. Each inner grid vertex is linked to 8 triangles - 2 triangles per grid square, so the ACME should be a bit more than 3/8.. perhaps half?
  • The benefit per vertex depends on the computations in the vertex shader (modelview+normal matrix multiplications+zhack mainly) compared to how many fragments per vertex we average during high load and on the ratio of fragment to vertex shader computational load, I guess. At the moment the integration for scattering and the lighting is done in the fragment shader.
    • An improvement may not be quite as noticeable, right now, as it might be when we move to a better version of scattering eventually that uses the vertex shader more (e.g. the Bruneton that was GPLed recently).

It's a nice point to consider, and would end up being useful in other cases such as huge cities:).

Contributor

Ae-2222 commented Sep 16, 2012

Hey, nice work:)

I tested this briefly just now and terrain rendering seems to work fine:)

Some notes about the vcacheopt, after looking through this paper johnbartholomew found:

  • It's a statistical optimisation not an analytical method. It's designed to work reasonably, not optimally, over a range of arbitrary mesh topologies. The connectivity we have is just mainly a grid which is used intensively.
    • The scoring system was optimised for a set of test meshes and not specifically for our case
  • The ideal solution would probably be an analytical method. Given grids are very common there is probably a known proper solution.
    • JohnJ said the ACME for our current ordering is around 1.0 (IIRC). What ACME values might this be generating.. (in the simulated LRU cache and the actual GPU FIFO cache)
    • What's the theoretical limit for the ACME.. Each inner grid vertex is linked to 8 triangles - 2 triangles per grid square, so the ACME should be a bit more than 3/8.. perhaps half?
  • The benefit per vertex depends on the computations in the vertex shader (modelview+normal matrix multiplications+zhack mainly) compared to how many fragments per vertex we average during high load and on the ratio of fragment to vertex shader computational load, I guess. At the moment the integration for scattering and the lighting is done in the fragment shader.
    • An improvement may not be quite as noticeable, right now, as it might be when we move to a better version of scattering eventually that uses the vertex shader more (e.g. the Bruneton that was GPLed recently).

It's a nice point to consider, and would end up being useful in other cases such as huge cities:).

@johnbartholomew

This comment has been minimized.

Show comment
Hide comment
@johnbartholomew

johnbartholomew Sep 16, 2012

Contributor

The ideal solution would probably be an analytical method. Given grids are very common there is probably a known proper solution.

Optimal Grid Rendering. There might be others, but that one was easy to find. It does want to know the cache size though.

Contributor

johnbartholomew commented Sep 16, 2012

The ideal solution would probably be an analytical method. Given grids are very common there is probably a known proper solution.

Optimal Grid Rendering. There might be others, but that one was easy to find. It does want to know the cache size though.

@fluffyfreak

This comment has been minimized.

Show comment
Hide comment
@fluffyfreak

fluffyfreak Sep 16, 2012

Contributor

Yes I've read that one, but we're not rendering a grid here.

There's an underlying set of gridded vertices but only one case where we're rendering all of it and 15 where we're rendering a variety of different edge triangulations that ignore some of the vertices.

Also, I picked this case because of A) the stated non-grid indexing above, B) as a test case before applying it to the object/model loader.

Contributor

fluffyfreak commented Sep 16, 2012

Yes I've read that one, but we're not rendering a grid here.

There's an underlying set of gridded vertices but only one case where we're rendering all of it and 15 where we're rendering a variety of different edge triangulations that ignore some of the vertices.

Also, I picked this case because of A) the stated non-grid indexing above, B) as a test case before applying it to the object/model loader.

@fluffyfreak

This comment has been minimized.

Show comment
Hide comment
@fluffyfreak

fluffyfreak Sep 16, 2012

Contributor

@robn etc
Would it be easier to integrate if I cherry-pick jpab's changes back into my branch or does it not matter?

Contributor

fluffyfreak commented Sep 16, 2012

@robn etc
Would it be easier to integrate if I cherry-pick jpab's changes back into my branch or does it not matter?

@johnbartholomew

This comment has been minimized.

Show comment
Hide comment
@johnbartholomew

johnbartholomew Sep 16, 2012

Contributor

Would it be easier to integrate if I cherry-pick jpab's changes back into my branch or does it not matter?

Doesn't really matter. I would've merged already but I was waiting for someone to do a little more testing/review, mostly to check I hadn't broken something when I got rid of the edgebools thing. I probably should have been more explicit.

Anyway, I guess we've had enough eyes on this now. I'll merge today.

Contributor

johnbartholomew commented Sep 16, 2012

Would it be easier to integrate if I cherry-pick jpab's changes back into my branch or does it not matter?

Doesn't really matter. I would've merged already but I was waiting for someone to do a little more testing/review, mostly to check I hadn't broken something when I got rid of the edgebools thing. I probably should have been more explicit.

Anyway, I guess we've had enough eyes on this now. I'll merge today.

@fluffyfreak

This comment has been minimized.

Show comment
Hide comment
@fluffyfreak

fluffyfreak Sep 16, 2012

Contributor

Woohoo! :D

Contributor

fluffyfreak commented Sep 16, 2012

Woohoo! :D

@Ae-2222

This comment has been minimized.

Show comment
Hide comment
@Ae-2222

Ae-2222 Sep 16, 2012

Contributor

B) as a test case before applying it to the object/model loader.

The index orders could probably be added to the model cache to speed-up start time..

The connectivity we have is just mainly a grid

It's close to to a grid, in spite of the edge stuff Tomm put in (even if all of the vertices in edge triangles miss the cache there might be so many inner triangles to make the ACME improvement worthwhile, assuming it's not as good now), so an analytical approach with a modification is something to keep in mind when we have heavier loads for the vertex shaders.

Contributor

Ae-2222 commented Sep 16, 2012

B) as a test case before applying it to the object/model loader.

The index orders could probably be added to the model cache to speed-up start time..

The connectivity we have is just mainly a grid

It's close to to a grid, in spite of the edge stuff Tomm put in (even if all of the vertices in edge triangles miss the cache there might be so many inner triangles to make the ACME improvement worthwhile, assuming it's not as good now), so an analytical approach with a modification is something to keep in mind when we have heavier loads for the vertex shaders.

@johnbartholomew johnbartholomew merged commit 442a531 into pioneerspacesim:master Sep 16, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment