Batch rendering for display objects #73

Merged
merged 6 commits into from Oct 8, 2018

nadako commented Oct 1, 2018

This implements batched rendering for OpenFL display objects. The implementation is a bit dirty: I tried to make it as non-intrusive as possible for now, and since we're on the verge of forking anyway, we can refactor it together with everything else later.

It works like this:

  • When traversing the scene for rendering, in GLBitmap.render/GLShape.render instead of doing a draw call, we batch a Quad object (maintained by the display object) into the BatchRenderer instance.
  • After traversing, we call BatchRenderer.flush, which joins Quads that can be rendered together into groups. Each group can render multiple textures in a single draw call (16 on my macbook and The Beast). It then iterates over these groups and draws them with a special shader that selects the correct texture based on the id passed into the vertex buffer.
  • Sometimes, BatchRenderer.flush is called early to break the current batch. Currently that happens when masks are involved, because they need to change the stencil state and draw using a special mask shader. Also in GLTilemap, because I didn't implement quad batching for those as they will become obsolete in favor of normal display objects.
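The grouping step described above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code from this PR: `SketchQuad`, `SketchGroup` and `groupQuads` are hypothetical names, textures are represented by plain numeric ids, and early flushes (masks, tilemaps) are left out.

```typescript
// Hypothetical shapes; the PR's actual Quad and group classes are not shown in this thread.
interface SketchQuad {
  textureId: number; // stands in for a reference to a GPU texture
}

interface SketchGroup {
  textureIds: number[]; // textures used by this draw call; the array index plays the role of the texture unit
  quadCount: number;
}

// Splits quads (kept in render order) into groups that use at most maxTextures distinct textures.
function groupQuads(quads: SketchQuad[], maxTextures: number): SketchGroup[] {
  const groups: SketchGroup[] = [];
  let current: SketchGroup | null = null;
  for (const quad of quads) {
    // A quad fits if its texture is already in the group or there is a free texture slot.
    const fits: boolean =
      current !== null &&
      (current.textureIds.indexOf(quad.textureId) !== -1 ||
        current.textureIds.length < maxTextures);
    if (current === null || !fits) {
      current = { textureIds: [], quadCount: 0 };
      groups.push(current);
    }
    if (current.textureIds.indexOf(quad.textureId) === -1) {
      current.textureIds.push(quad.textureId);
    }
    current.quadCount++;
  }
  return groups;
}

// Five quads using three textures, with an assumed limit of two textures per draw call.
const groups = groupQuads(
  [1, 2, 1, 3, 2].map((textureId) => ({ textureId })),
  2
);
```

With a limit of two textures, the first three quads share one group and the remaining two start a second one. Each group would then correspond to one draw call.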

There's a limitation: currently, bitmaps from the same texture but with different smoothing modes will be drawn with the smoothing mode set by the first bitmap. Hopefully we won't see this in practice, but technically this behaviour is of course wrong. It can be fixed by breaking the group when smoothing mode changes and/or using WebGL 2 Sampler objects (so no IE/Edge support).
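The first suggested fix, breaking the group when the smoothing mode changes, might look roughly like this. It is a sketch under assumptions: `BoundSmoothing`, `mustBreakGroup` and `countGroups` are hypothetical names, textures are plain numeric ids, and the per-group texture limit is ignored to keep the example short.

```typescript
// Hypothetical record: the smoothing mode each texture was first batched with in the current group.
type BoundSmoothing = { [textureId: number]: boolean };

// True when the quad would sample an already-batched texture with a different smoothing mode.
function mustBreakGroup(bound: BoundSmoothing, textureId: number, smoothing: boolean): boolean {
  return textureId in bound && bound[textureId] !== smoothing;
}

// Counts the groups needed for a sequence of [textureId, smoothing] pairs in render order.
function countGroups(quads: Array<[number, boolean]>): number {
  let groupCount = 0;
  let bound: BoundSmoothing = {};
  for (const [textureId, smoothing] of quads) {
    if (groupCount === 0 || mustBreakGroup(bound, textureId, smoothing)) {
      groupCount++;
      bound = {}; // a new group starts with no textures bound
    }
    bound[textureId] = smoothing;
  }
  return groupCount;
}

// Texture 1 is drawn smoothed twice, then unsmoothed: only the last quad forces a break.
const smoothingGroups = countGroups([[1, true], [2, false], [1, true], [1, false]]);
```

The cost of this approach is an extra draw call whenever the same texture alternates between smoothing modes, which is the trade-off against the Sampler-object route.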

In future it would be interesting to experiment with instanced rendering for this, because we could save some CPU<->GPU bandwidth by supplying data per quad instance instead of per vertex.
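To get a feel for the possible saving, here is a back-of-the-envelope calculation. The vertex and instance layouts below are assumptions made for illustration only; the actual `MultiTextureShader.floatPerVertex` value is not stated in this thread.

```typescript
const VERTICES_PER_QUAD = 4;

// Floats uploaded per quad when every vertex carries its own copy of the data.
function floatsPerQuadPerVertex(floatsPerVertex: number): number {
  return VERTICES_PER_QUAD * floatsPerVertex;
}

// Saving of an instanced layout, as a fraction of the per-vertex upload size.
function instancingSaving(floatsPerVertex: number, floatsPerInstance: number): number {
  const perVertex = floatsPerQuadPerVertex(floatsPerVertex);
  return (perVertex - floatsPerInstance) / perVertex;
}

// Assumed layouts: 6 floats per vertex (x, y, u, v, texture id, alpha) versus
// 12 floats per instance (2x3 transform, UV rectangle, texture id, alpha).
const saving = instancingSaving(6, 12);
```

Under these assumed layouts the upload shrinks from 24 to 12 floats per quad. Note that in WebGL 1 instancing is only available through the ANGLE_instanced_arrays extension.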

@nadako nadako changed the title from Batch rendering for display objects (PREVIEW, DO NOT MERGE!) to Batch rendering for display objects Oct 5, 2018

@@ -38,6 +39,7 @@ class RenderSession {
public var gl(default, set):GLRenderContext;
// public var lockTransform:Bool;
public var renderer:AbstractRenderer;
public var batcher:BatchRenderer;

@nadako

nadako Oct 8, 2018

Ideally, this should be in GLRenderer, not in RenderSession, but it's super awkward to retrieve at the moment, because we'd have to cast renderer to GLRenderer first. In future, I think, we should simply get rid of RenderSession and pass specific renderer instances to __render* methods.

@@ -64,6 +67,10 @@ class GLMaskManager extends AbstractMaskManager {
gl.colorMask (false, false, false, false);
mask.__renderGLMask (renderSession);
// flush batched mask renders, because we're changing state again
renderSession.batcher.flush ();

@nadako

nadako Oct 8, 2018

Currently we don't render masks with the batcher, so this will exit early and do nothing.

@@ -141,6 +151,10 @@ class GLMaskManager extends AbstractMaskManager {
gl.colorMask (false, false, false, false);
mask.__renderGLMask (renderSession);
// flush batched mask renders, because we're changing state again
renderSession.batcher.flush ();

@nadako

nadako Oct 8, 2018

Currently we don't render masks with the batcher, so this will exit early and do nothing.

@@ -215,8 +228,14 @@ class GLRenderer extends AbstractRenderer {
renderSession.allowSmoothing = (stage.quality != LOW);
renderSession.forceSmoothing = #if always_smooth_on_upscale (displayMatrix.a != 1 || displayMatrix.d != 1); #else false; #end
// setup projection matrix for the batcher as it's an uniform value for all the draw calls
renderSession.batcher.projectionMatrix = flipped ? projectionFlipped : projection;

@nadako

nadako Oct 8, 2018

Maybe it makes sense to do the same for "normal" shaders; I'm not sure why we calculate projection on the CPU when we can easily do that in the shader.

@@ -44,6 +44,9 @@ class GLTilemap {
if (tilemap.__tileArray == null || tilemap.__tileArray.length == 0) return;
// break the batch as we don't batch tilemaps for now
renderSession.batcher.flush ();

@nadako

nadako Oct 8, 2018

With batched rendering, we don't need Tilemap at all, so we don't bother supporting batch rendering for it. If/when we fork away, we can just remove Tilemap, as it's not part of the Flash API anyway.

vertexBufferDatas = [];
var i = 1, l = nextPow2(maxQuads);
while (i <= l) {
vertexBufferDatas.push(new Float32Array(i * 4 * MultiTextureShader.floatPerVertex));

@nadako

nadako Oct 8, 2018

I'm not sure if this makes sense actually. I ported that from PIXI code, but I'm now thinking that just having a single Float32Array of maximum size and updating it with bufferSubData would be cleaner and better. Need to test the performance though, so let's leave this for later.
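For reference, the power-of-two pool idea can be sketched like this. It is a simplified TypeScript illustration rather than the PR's code: `createBufferPool` and `pickBuffer` are hypothetical helpers, and the value of 6 floats per vertex is an assumption.

```typescript
const VERTICES_PER_QUAD = 4;

// Smallest power of two that is >= n (for n >= 1).
function nextPow2(n: number): number {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

// One buffer per power-of-two quad count: 1, 2, 4, ... up to nextPow2(maxQuads).
function createBufferPool(maxQuads: number, floatsPerVertex: number): Float32Array[] {
  const pool: Float32Array[] = [];
  const limit = nextPow2(maxQuads);
  for (let quads = 1; quads <= limit; quads *= 2) {
    pool.push(new Float32Array(quads * VERTICES_PER_QUAD * floatsPerVertex));
  }
  return pool;
}

// Picks the smallest pooled buffer that can hold quadCount quads.
function pickBuffer(pool: Float32Array[], quadCount: number): Float32Array {
  let index = 0;
  for (let size = 1; size < quadCount; size *= 2) index++;
  return pool[index];
}

// Assumed values: up to 10 quads, 6 floats per vertex.
const pool = createBufferPool(10, 6);
const bufferForFiveQuads = pickBuffer(pool, 5); // the buffer sized for 8 quads
```

The single-buffer alternative would allocate only the largest array and upload the used prefix, for example via `subarray` and `gl.bufferSubData`. Which variant is faster would have to be measured.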

import lime.graphics.opengl.GLTexture;
class TextureData {

@nadako

nadako Oct 8, 2018

This Texture/TextureData naming is quite poor, gotta think of something.
Basically, TextureData is a wrapper around GPU texture reference to store additional info used in the batcher, and Texture contains TextureData and the UV coordinates.

We use TextureDatas to check texture change in the batcher (so there should be only one TextureData object per texture), and we use Texture for actual texture mapping (because we can have different SubBitmapDatas referencing the same texture atlas).

@nadako

nadako Oct 8, 2018

PS: I changed Texture to QuadTextureData. It's a less confusing name that describes what it actually is: information about the texture when it comes to rendering a quad :)
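The relationship between the two classes can be illustrated as below. This is a TypeScript sketch with assumed field names (`glTexture`, `u0`, `v0`, `u1`, `v1`); only `textureUnitId` appears in the diff, and the real classes may carry more state.

```typescript
// One instance per GPU texture; the batcher can compare these by identity to detect texture changes.
class SketchTextureData {
  // -1 is assumed to mean "no texture unit assigned in the current group".
  textureUnitId: number = -1;
  constructor(public readonly glTexture: object) {}
}

// Per-quad view: which texture to sample and which UV rectangle of it to map onto the quad.
class SketchQuadTextureData {
  constructor(
    public readonly data: SketchTextureData,
    public readonly u0: number,
    public readonly v0: number,
    public readonly u1: number,
    public readonly v1: number
  ) {}
}

// Two sub-bitmaps from the same atlas share one SketchTextureData but have different UVs.
const atlas = new SketchTextureData({});
const leftHalf = new SketchQuadTextureData(atlas, 0, 0, 0.5, 1);
const rightHalf = new SketchQuadTextureData(atlas, 0.5, 0, 1, 1);
const sharesTexture = leftHalf.data === rightHalf.data;
```

Because both halves point at the same underlying object, quads drawn from either of them would not count as a texture change in the batcher.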

}
var snapToPixel = renderSession.roundPixels || __snapToPixel ();
var transform = (cast renderSession.renderer : GLRenderer).getDisplayTransformTempMatrix (__renderTransform, snapToPixel);

@nadako

nadako Oct 8, 2018

Hmm, I wonder if we should pass GLRenderer here and access renderSession from its field instead...

@@ -301,6 +336,7 @@ class Bitmap extends DisplayObject implements IShaderDrawable {
smoothing = false;
__setRenderDirty ();
__batchQuadDirty = true;

@nadako

nadako Oct 8, 2018

Are there any other cases besides changing bitmapData or transform where we need to invalidate the quad? I guess not...
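The dirty-flag pattern in question works roughly like this. The sketch below is illustrative TypeScript: `SketchBitmap` and its members are hypothetical, and position stands in for the full transform.

```typescript
interface SketchQuadGeometry {
  x: number;
  y: number;
  width: number;
  height: number;
}

class SketchBitmap {
  rebuildCount = 0; // exposed only so the caching behaviour can be observed
  private x = 0;
  private y = 0;
  private quadDirty = true;
  private quad: SketchQuadGeometry = { x: 0, y: 0, width: 0, height: 0 };

  constructor(private width: number, private height: number) {}

  // Any change that affects the quad must set the dirty flag.
  setPosition(x: number, y: number): void {
    this.x = x;
    this.y = y;
    this.quadDirty = true;
  }

  // Rebuilds the cached quad only when something invalidated it.
  getQuad(): SketchQuadGeometry {
    if (this.quadDirty) {
      this.quad = { x: this.x, y: this.y, width: this.width, height: this.height };
      this.quadDirty = false;
      this.rebuildCount++;
    }
    return this.quad;
  }
}

const bitmap = new SketchBitmap(64, 32);
bitmap.getQuad(); // first call builds the quad
bitmap.getQuad(); // second call reuses the cached quad
bitmap.setPosition(10, 20);
const movedQuad = bitmap.getQuad(); // rebuilt after invalidation
```

Every setter that influences the quad has to set the flag, which is why a missed invalidation case would show up as stale geometry.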

@@ -73,8 +74,6 @@ class SubBitmapData extends BitmapData {
readable = false;
image = null;
var gl = GL.context;

@nadako

nadako Oct 8, 2018

I don't want to access statics that shouldn't exist in the first place :) so instead we forward the getTexture call to the parent and don't store texture/textureContext at all.

vertexBufferDatas = [];
var i = 1, l = nextPow2(maxQuads);
while (i <= l) {
vertexBufferDatas.push(new Float32Array(i * 4 * MultiTextureShader.floatPerVertex));

@NenadBojkovski

NenadBojkovski Oct 8, 2018

Could you put this 4 into a named constant so it is more readable? It is the number of vertices per quad, right?

setVertex(2);
setVertex(3);
vertexBufferIndex += MultiTextureShader.floatPerVertex * 4;

@NenadBojkovski

NenadBojkovski Oct 8, 2018

Please use the constant mentioned above for the number 4.

}
}
if (nextTexture.textureUnitId == -1)
throw "WAT";

@NenadBojkovski

NenadBojkovski Oct 8, 2018

Please add a verbose error message.

@nadako

nadako Oct 8, 2018

This was actually more for debugging and it should not happen, but just in case, I'll add something more descriptive, good catch!
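A more descriptive check could look something like this. It is a TypeScript sketch only: `requireTextureUnit` and the message wording are hypothetical and not what was eventually committed.

```typescript
// Returns the texture unit id, or throws with enough context to locate the offending quad.
function requireTextureUnit(textureUnitId: number, quadIndex: number): number {
  if (textureUnitId === -1) {
    throw new Error(
      "BatchRenderer: quad " + quadIndex +
        " uses a texture that was not assigned a texture unit in the current group"
    );
  }
  return textureUnitId;
}

// Example: capture the message produced for an unassigned texture.
let errorMessage = "";
try {
  requireTextureUnit(-1, 7);
} catch (caught) {
  errorMessage = caught instanceof Error ? caught.message : String(caught);
}
```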

@nadako nadako merged commit b58c888 into develop Oct 8, 2018

@nadako nadako deleted the batch branch Oct 8, 2018
