what's NVGpathCache? #216

Open
wtholliday opened this Issue Apr 5, 2015 · 19 comments

Just curious if this is already doing some caching of path tessellations? (A lot of my runtime is spent in nvg__tesselateBezier, tessellating the same paths.)

Thanks!

memononen Apr 5, 2015

Owner

It is only used to cache the points per frame.

wtholliday Apr 5, 2015

Sorry, what do you mean by "cache the points per frame"?

memononen Apr 5, 2015

Owner

That is, it is the buffer that holds the tessellated points for rendering.

wtholliday Apr 5, 2015

Ok, cool. Do you have a sense of whether it would be a good idea for me to add something that caches tessellations between frames? I was thinking of hashing the path commands (I'd prefer that to having the client need to keep track of a path object).

memononen Apr 5, 2015

Owner

Probably not a bad idea. I postponed the geometry caching in order to see how the GL3 backend evolved. Also, I did not want to add any more memory consumption.

The smallest amount of data to hash is the command buffer (ctx->ncommands entries), and I would recommend caching the results of nvg__expandFill and nvg__expandStroke. That means also including the stroke width and line join in the hash. A really smart cache would allow affine transformation of the path data too, but that is a bit more involved.
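
As a rough illustration of the hashing idea above, here is a minimal sketch in C. It is not code from NanoVG: `PathKeyInput`, `pathKeyHash` and the field names are hypothetical, and FNV-1a is just one convenient hash picked for the example. It assumes the path commands are available as a flat float array of `ncommands` entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache-key input; names are illustrative, not NanoVG's. */
typedef struct PathKeyInput {
    const float* commands; /* flattened path commands */
    int ncommands;         /* number of floats in commands */
    float strokeWidth;     /* pass 0 for fills */
    int lineJoin;          /* line-join mode used by the stroke */
    int isStroke;          /* 1 = stroke expansion, 0 = fill expansion */
} PathKeyInput;

/* 64-bit FNV-1a over a byte range, continuing from state h. */
static uint64_t fnv1a(uint64_t h, const void* data, size_t len)
{
    const unsigned char* p = (const unsigned char*)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

/* Hash the commands plus the stroke parameters that affect the expanded geometry. */
uint64_t pathKeyHash(const PathKeyInput* in)
{
    assert(in != NULL && in->ncommands >= 0);
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    h = fnv1a(h, in->commands, (size_t)in->ncommands * sizeof(float));
    h = fnv1a(h, &in->strokeWidth, sizeof in->strokeWidth);
    h = fnv1a(h, &in->lineJoin, sizeof in->lineJoin);
    h = fnv1a(h, &in->isStroke, sizeof in->isStroke);
    return h;
}
```

Two different paths can still collide on the same hash, so a real cache would probably also compare the stored commands on a hit before reusing the geometry.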

ytsedan Apr 6, 2015

Hi,

I'm currently working on this too, so you might want to check out my fork. I did not want to create a pull request yet as I still have some problems getting antialiasing right when scaling baked paths - the most important features should work though. See https://github.com/ytsedan/nanovg.

My fork adds the following features:

  1. display lists. Issue #134/#113
  2. font fallbacks. Issue #193
  3. line breaks at hyphen
  4. pointer to next glyph in nvgTextGlyphPositions
  5. stb libs updated

Here is some more detail.

  1. Display lists

Display lists are implemented in the front end. All calls to the back end render functions (renderFill, etc.) are cached if a display list is bound. In my use case, I still want to be able to change the transform and alpha of the baked paths when finally rendering the display list. Imagine animating the position or alpha of a UI button without re-tessellating the button geometry or the button's label text layout. That's why I made a major change to nanovg, which might break some scenarios. Instead of transforming the vertices in the front end on the CPU before passing them to the back end, all vertex transformations are now performed in the vertex shader. That means I had to change the interface for the back end (which will break other back ends, e.g. DX11). Also, it is no longer possible to change the transform while constructing a path, e.g. changing the transform between two calls to nvgLineTo - which is ok for me, and you can still transform your points before passing them to nanovg. I added a define that marks all the code changes related to this feature (see NVG_TRANSFORM_IN_VERTEX_SHADER). I hope I got all the antialiasing handling right (as far as possible without re-tessellating); I had to change the tessellation of butt/square caps for this.

When drawing the display list, the baked paths are still copied to the back end's vertex buffer - I thought for a while about whether it is possible to avoid this additional copy, but could not come up with a simple solution that does not require a lot more interaction between front end and back end (like a different VBO for less dynamic stuff). I also added a little unit test to the demo app (see the CACHE define in the gl2 demo) which renders the demo scene to the cache every 60th frame and, in all other frames, just draws the display list with an animated transform. Note that when rendering the same animation directly without display lists, the text layout begins to 'shake', probably due to pixel alignment/rounding when scaling the scene; so this is another advantage of using the lists.

  2. Font fallbacks

A font fallback can be used to basically mix multiple fonts: e.g. use the system font for rendering text and use a custom icon font for glyphs in a certain Unicode range. The advantage of doing this directly in fontstash is that text layout just works. This really comes in handy when writing tool tips that include icons etc. You can probably use this feature for Chinese symbols too.

  3. Line breaks at hyphen

The text layout did not break lines at hyphens, which is a great feature to have, especially for languages that can have really long words.

  4. Pointer to next glyph

I needed this to call nvgTextGlyphPositions with a fixed-size array of NVGglyphPosition. With the next iterator returned, you can call nvgTextGlyphPositions in chunks of e.g. 32 glyphs (similar to the next-row pointer in nvgTextBreakLines).

Thanks for this great library .. I'll create a pull request once I have tested the new features a little better; but feel free to review my changes.

wtholliday Apr 6, 2015

@ytsedan, cool stuff! I see you pass a scale factor into nvg__tesselateBezier to get the tolerance right.

Tessellation in local space would definitely be better for my path hashing idea (I'd like to avoid having to keep track of display lists in my client code). What sort of performance improvements are you seeing in the demo?

ytsedan Apr 9, 2015

Just copying the baked paths to the backend is more than 10x faster than tessellating the whole demo scene, talking only about the CPU side. But I haven't measured a lot yet; the FPS increase is not equally high.

For hashing you will still have to fill the command buffers in the front end to calculate the hash etc., so that will probably be slower. Another problem with hashing is that you need some simple kind of garbage collection to free paths that are no longer used, but that depends on how you want to use it. For my UI use case the display lists work pretty well.

wtholliday Apr 16, 2015

@ytsedan 10x! Great!

My thought for collecting the paths was just to delete any that aren't used by the time nvgEndFrame is called. That's rather eager, but should work fine for most cases.
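
A minimal sketch of that eager collection scheme, assuming a small fixed-size table keyed by the path hash. `PathCache`, `cacheLookup`, `cacheInsert` and `cacheEndFrame` are hypothetical names for illustration (nothing here exists in NanoVG), and the linear scan only keeps the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_CAPACITY 64

typedef struct CacheEntry {
    uint64_t key;  /* hash of the path commands and stroke parameters */
    void* verts;   /* cached tessellation, owned by the cache */
    int used;      /* set when the entry is touched during the current frame */
    int occupied;
} CacheEntry;

typedef struct PathCache {
    CacheEntry entries[CACHE_CAPACITY];
} PathCache;

/* Return the cached geometry for key and mark it as used this frame, or NULL. */
void* cacheLookup(PathCache* c, uint64_t key)
{
    assert(c != NULL);
    for (int i = 0; i < CACHE_CAPACITY; i++) {
        CacheEntry* e = &c->entries[i];
        if (e->occupied && e->key == key) {
            e->used = 1;
            return e->verts;
        }
    }
    return NULL;
}

/* Store geometry under key; returns 1 on success, 0 if the table is full. */
int cacheInsert(PathCache* c, uint64_t key, void* verts)
{
    assert(c != NULL);
    for (int i = 0; i < CACHE_CAPACITY; i++) {
        CacheEntry* e = &c->entries[i];
        if (!e->occupied) {
            e->key = key;
            e->verts = verts;
            e->used = 1;
            e->occupied = 1;
            return 1;
        }
    }
    return 0;
}

/* Call once per frame (e.g. alongside nvgEndFrame): free every entry that
   was not touched this frame and clear the flags. Returns the eviction count. */
int cacheEndFrame(PathCache* c)
{
    assert(c != NULL);
    int evicted = 0;
    for (int i = 0; i < CACHE_CAPACITY; i++) {
        CacheEntry* e = &c->entries[i];
        if (!e->occupied) continue;
        if (!e->used) {
            free(e->verts);
            e->verts = NULL;
            e->occupied = 0;
            evicted++;
        } else {
            e->used = 0;
        }
    }
    return evicted;
}
```

Because anything skipped for a single frame is dropped, content that toggles visibility every other frame would be re-tessellated each time it reappears; keeping a small frame counter per entry instead of a flag would relax that.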

hugoam Apr 28, 2015

After noticing my UI renderer sometimes spends ~50 to ~90% of the profiler time in nvg__tesselateBezier, I'm really interested in your changes, ytsedan. Caching simply makes more sense in the context of a UI library anyway. Can we imagine your changes ever being merged into vanilla nanovg? I guess that depends on Mikko?
I'm going to give your fork a try and see how it works out.

EDIT: just switched my UI elements to one list per element and I've got about a 5x performance improvement. Congrats on the work!

GValiente Jul 2, 2015

Hi!

I'm working on a cocos-like game engine based on nanovg.

I've tried @ytsedan's fork to improve performance, but there are some issues when rendering with flip transforms, even without using display lists:

if(mFlipX)
{
    if(mFlipY)
    {
        const float otherTransform[6] = { -1, 0, 0, -1, 0, 0 }; // WORKS
        nvgTransformPremultiply(transform, otherTransform);
    }
    else
    {
        const float otherTransform[6] = { -1, 0, 0, 1, 0, 0 }; // NOTHING SHOWS
        nvgTransformPremultiply(transform, otherTransform);
    }
}
else if(mFlipY)
{
    const float otherTransform[6] = { 1, 0, 0, -1, 0, 0 }; // NOTHING SHOWS
    nvgTransformPremultiply(transform, otherTransform);
}

The problem disappears when NVG_TRANSFORM_IN_VERTEX_SHADER is disabled.

Any ideas?

ytsedan Jul 4, 2015

My guess: might be the backface culling. Typically nanovg transforms the vertices prior to generating triangles, which ensures that there are no backfacing polygons. If you use NVG_TRANSFORM_IN_VERTEX_SHADER however, the vertices are transformed later - so triangles may be backfacing. You could try to remove all glEnable(GL_CULL_FACE); in the nanovg source to see if this makes any difference.
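
To see why a single-axis flip could trip backface culling, here is a small sketch that transforms a triangle with a NanoVG-style 2x3 matrix and checks the sign of its area. The helper names are made up for this example; only the {a, b, c, d, e, f} matrix layout follows nanovg's transform convention.

```c
#include <assert.h>

/* Apply a 2x3 transform t = {a, b, c, d, e, f} to the point (x, y). */
void xformPoint(const float* t, float x, float y, float* ox, float* oy)
{
    assert(t && ox && oy);
    *ox = x * t[0] + y * t[2] + t[4];
    *oy = x * t[1] + y * t[3] + t[5];
}

/* Twice the signed area of the triangle p = {x0,y0, x1,y1, x2,y2}.
   The sign encodes the winding order. */
float signedArea2(const float* p)
{
    assert(p);
    return (p[2] - p[0]) * (p[5] - p[1]) - (p[4] - p[0]) * (p[3] - p[1]);
}

/* Signed area of the triangle after applying the transform t. */
float transformedArea2(const float* t, const float* tri)
{
    float q[6];
    for (int i = 0; i < 3; i++)
        xformPoint(t, tri[2 * i], tri[2 * i + 1], &q[2 * i], &q[2 * i + 1]);
    return signedArea2(q);
}
```

For the triangle (0,0), (1,0), (0,1), a flip on one axis reverses the sign of the area, while flipping both axes keeps it. That is consistent with the report above, where only the single-axis flips render nothing.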

GValiente Jul 4, 2015

Removing all glEnable(GL_CULL_FACE); calls fixes the problem with direct rendering, but it appears again when display lists are used.

memononen Jul 5, 2015

Owner

Sorry for not replying earlier to this thread, building a house has taken its toll on my free time :) A couple of quick answers:

  • I think the idea of displaylist + transform on vertex shader is good, I have to look in detail how you handled AA-fringe
  • If the determinant of the top 2x2 matrix is negative, culling should be reversed.
  • I have been thinking of removing the pixel rounding and using the stb_truetype oversampling instead
  • I wonder if the text fallback could be implemented as rich text? Any good APIs for that (html? ;))

ytsedan Jul 7, 2015

No problem. Feel free to review my fork, if you find some time. I changed a lot of stuff that I needed for my project, that might not be of interest for everybody though (like a fast and simple way to render rectangles).

There are still some problems with the caching: e.g. when a display cache contains text and therefore references the glyph atlas, it will become invalid if the atlas needs to grow and is recreated in the endFrame method. I just added a utility function to poll whether the atlas will change, so that I can also recreate the cached paths .. but that's only a workaround. Also, combining a cached scissor with the current scissor is only correct if both are axis aligned (otherwise the clip shape would not be a rectangle, I guess).

HTML? Seriously ;) I actually like the simplicity of defining a unicode range. Though I already thought about supporting soft hyphen in LaTeX style \-

@GValiente I cannot reproduce the problem. I tried applying different transforms to the cached demo scene and it worked.

memononen Jul 7, 2015

Owner

I think a fast way to render a quad / blit an image and a 9-sprite should be part of the API.

html, why not? ;) More seriously, one option would be to add a bit more state and allow text spans, something like this:

nvgBeginText(vg, x,y);
nvgTextColor(vg, ...);
nvgTextSpan(vg, "Red ", NULL);
nvgTextColor(vg, ...);
nvgTextSpan(vg, "Blue", NULL);
nvgEndText(vg);

Ditto for text box. Multiline text will become even more complicated, though. Maybe it would be better to just allow easy co-operation with pango :)

GValiente Sep 13, 2015

Finally! :D

The engine uses display lists for rendering everything except when a node (a bunch of NanoVG primitives):

  • Has a scissor assigned (the node is not rendered with display lists).
  • Is flipped horizontally or vertically (the node is not rendered with display lists).
  • Is showing transparent text or images (the node renders the content more transparent with display lists).

Without these exceptions, the output of the two branches seems to be the same.

@ytsedan I'll try to reproduce the problem when I have time.

ytsedan Sep 20, 2015

Just some quick notes:

  • The scissor that is current when drawing to a display list is cached. When the display list is drawn later, it tries to combine the cached scissor with the one set at that point (which can be a different one). This might not be what you expect here (?) So just reset the scissor when drawing to the display list and set it only when drawing the list. (Note: combining scissors also does not work if your transform contains a rotation; combining rotated scissor rects would result in more complex clipping shapes that are not supported by nanovg.)
  • The flipped-transform problem is - as discussed - related to backface culling (disable culling altogether or reverse it in this case, as Mikko suggested).
  • The alpha "problem" is kind of similar to the scissor issue. The display list will cache the current transparency when it is bound; when drawing the display list it will combine the global alpha that is now set with the cached one. This allows changing the transparency of the display list's content without having to recreate anything. Suggestion: just draw to the display list with global alpha = 1.0 and set the alpha when drawing the list.

Another known issue you should be aware of is that text will be rendered incorrectly if the glyph atlas grows (as pointed out above). A workaround is to call nvgFindOutdatedDisplayListResources before nvgEndFrame and invalidate all display lists that use text.
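
To make the "combine" behaviour for scissor and alpha more concrete, here is a small sketch. It assumes that combining two axis-aligned scissors means intersecting the rectangles and that combining alphas means multiplying them; the fork's actual code may differ, and `Rect`, `intersectRects` and `combineAlpha` are names invented for this example.

```c
#include <assert.h>

typedef struct Rect { float x, y, w, h; } Rect;

static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Intersection of two axis-aligned rectangles; width and height are
   clamped to zero when the rectangles do not overlap. */
Rect intersectRects(Rect a, Rect b)
{
    float minx = maxf(a.x, b.x);
    float miny = maxf(a.y, b.y);
    float maxx = minf(a.x + a.w, b.x + b.w);
    float maxy = minf(a.y + a.h, b.y + b.h);
    Rect r = { minx, miny, maxf(0.0f, maxx - minx), maxf(0.0f, maxy - miny) };
    return r;
}

/* Assumed combination rule: the cached alpha scales the alpha set at draw time. */
float combineAlpha(float cachedAlpha, float currentAlpha)
{
    assert(cachedAlpha >= 0.0f && cachedAlpha <= 1.0f);
    assert(currentAlpha >= 0.0f && currentAlpha <= 1.0f);
    return cachedAlpha * currentAlpha;
}
```

Under these assumptions, recording with global alpha 1.0 and no scissor leaves both values entirely under the control of whoever draws the list, which matches the suggestions above. It also shows why recording at alpha 0.5 and drawing at 0.5 gives 0.25, i.e. content that looks "more transparent".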

GValiente Sep 20, 2015

@ytsedan Thanks for the quick notes :)

Steps to reproduce the flipped transforms problem:

  1. Remove all glEnable(GL_CULL_FACE); calls in nanovg_gl.h

  2. Create a nvgFlipX function in nanovg:

void nvgFlipX(NVGcontext* ctx)
{
    NVGstate* state = nvg__getState(ctx);
    float t[6] = { -1, 0, 0, 1, 0, 0 };
    nvgTransformPremultiply(state->xform, t);

#if NVG_TRANSFORM_IN_VERTEX_SHADER
    nvgTransformInverse(state->invxform, state->xform);
#endif
}

  3. Add to example_gl2.c:

if (cache) //draw display list to screen with custom transform
{
    nvgFlipX(vg);

    float scale = 0.8f;
    // ...

}
