Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Impeller] ☂️ Use hardware features to improve performance of advanced blends. #120223

Open
2 of 4 tasks
bdero opened this issue Feb 7, 2023 · 18 comments
Open
2 of 4 tasks
Labels
e: impeller Impeller rendering backend issues and features requests engine flutter/engine repository. See also e: labels. P2 Important issues not at the top of the work list team-engine Owned by Engine team triaged-engine Triaged by Engine team

Comments

@bdero
Copy link
Member

bdero commented Feb 7, 2023

In order to perform advanced blends today, we:

  1. end the pass,
  2. store the resolve texture,
  3. perform the advanced blend via FilterContents, reading the the resolved texture as the backdrop,
  4. create a new pass,
  5. draw the previously resolved texture to the new pass,
  6. draw the blend result to the new pass.

We can avoid this Rube Goldberg machine by taking better advantage of hardware/API features:

  • For OpenGL, advanced blends can be performed using builtin pipeline blend modes. OpenGL extension: KHR_blend_equation_advanced
    image
  • For Vulkan, advanced blends can also be performed using builtin pipeline blend modes. Vulkan extension: VK_EXT_blend_operation_advanced
    image
  • For Vulkan devices that don't support the advanced blend equation extension, we can read from the framebuffer by setting up passes. This is complicated because it requires subcaching VkPipelines with unique subpass variations.
  • For Metal on iOS, it's also possible to read from the framebuffer and perform the blend as part of the shader inline. I'm not sure this can be done with compiler magic in ImpellerC -- we'll probably need a special compilation path for these in the build system. Completed with: [Impeller] read from framebuffer for advanced blends on iOS. engine#39567
@bdero bdero added the e: impeller Impeller rendering backend issues and features requests label Feb 7, 2023
@bdero bdero self-assigned this Feb 7, 2023
@jonahwilliams
Copy link
Member

As a motivating example, this is the advanced blend/pull quote in wonderous timeline

image (10)

@bdero bdero changed the title [Impeller] Improve advanced blends [Impeller] Improve advanced blends by avoiding intermediary pass storage Feb 7, 2023
@jonahwilliams
Copy link
Member

breadcrumb: gpuweb/gpuweb#442

@bdero bdero removed their assignment Feb 7, 2023
@jonahwilliams
Copy link
Member

It looks like spirvcross supports this, but we'd need to specifically use MSL 2.0. For example:

// Copyright 2013 The Flutter Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#version 450
#extension GL_ARB_fragment_shader_interlock : require
layout(pixel_interlock_unordered) in;

layout(binding = 2, std430) coherent buffer Buffer
{
    int foo;
    uint bar;
} _30;

layout(binding = 0, rgba8) uniform writeonly image2D img;
layout(binding = 1, r32ui) uniform uimage2D img2;

void main()
{
    beginInvocationInterlockARB();
    imageStore(img, ivec2(0), vec4(1.0, 0.0, 0.0, 1.0));
    uint _27 = imageAtomicAdd(img2, ivec2(0), 1u);
    _30.foo += 42;
    uint _41 = atomicAnd(_30.bar, 255u);
    endInvocationInterlockARB();
}

And this generates:

#pragma clang diagnostic ignored "-Wmissing-prototypes"
#pragma clang diagnostic ignored "-Wunused-variable"

#include <metal_stdlib>
#include <simd/simd.h>
#include <metal_atomic>

using namespace metal;

// The required alignment of a linear texture of R32Uint format.
constant uint spvLinearTextureAlignmentOverride [[function_constant(65535)]];
constant uint spvLinearTextureAlignment = is_function_constant_defined(spvLinearTextureAlignmentOverride) ? spvLinearTextureAlignmentOverride : 4;
// Returns buffer coords corresponding to 2D texture coords for emulating 2D texture atomics
#define spvImage2DAtomicCoord(tc, tex) (((((tex).get_width() +  spvLinearTextureAlignment / 4 - 1) & ~( spvLinearTextureAlignment / 4 - 1)) * (tc).y) + (tc).x)

struct Buffer
{
    int foo;
    uint bar;
};

fragment void blend_fragment_main(volatile device Buffer& _RESERVED_IDENTIFIER_FIXUP_30 [[buffer(1), raster_order_group(0)]], texture2d<float, access::write> img [[texture(0), raster_order_group(0)]], texture2d<uint> img2 [[texture(1), raster_order_group(0)]], device atomic_uint* img2_atomic [[buffer(0), raster_order_group(0)]])
{
    img.write(float4(1.0, 0.0, 0.0, 1.0), uint2(int2(0)));
    uint _30 = atomic_fetch_add_explicit((device atomic_uint*)&img2_atomic[spvImage2DAtomicCoord(int2(0), img2)], 1u, memory_order_relaxed);
    uint _RESERVED_IDENTIFIER_FIXUP_27 = _30;
    _RESERVED_IDENTIFIER_FIXUP_30.foo += 42;
    uint _45 = atomic_fetch_and_explicit((volatile device atomic_uint*)&_RESERVED_IDENTIFIER_FIXUP_30.bar, 255u, memory_order_relaxed);
    uint _RESERVED_IDENTIFIER_FIXUP_41 = _45;
}


This requires setting MSL 2.0 in the compiler and probably in other places too

@jonahwilliams
Copy link
Member

All of command line options use --std=ios-metal1.2 too so we'd need to add new targets to build these

@bdero
Copy link
Member Author

bdero commented Feb 7, 2023

Note that we need to find a way to read from the actual framebuffer that's being written to. If we can't do that, then there's not a way to get around the need to use multiple render passes for custom blend behavior AFAICT (which is where the requirement to store/load the texture is coming from). Metal for iOS has a special feature that let's you read from the framebuffer (which isn't available on MacOS).

@jonahwilliams
Copy link
Member

I believe that can be done, see KhronosGroup/SPIRV-Cross#634

#extension GL_APPLE_shader_framebuffer_fetch : require
void main()
{
// RGB to grayscale
mediump float lum = dot(gl_LastFragData[0].rgb, vec3(0.30,0.59,0.11));
gl_FragColor = vec4(lum, lum, lum, 1.0);
}

becomes

fragment half4
paint_grayscale(half4 dst_color [[color(0)]])
{
// RGB to grayscale
half lum = dot(dst_color.rgb, half3(0.30h, 0.59h, 0.11h));
return half4(lum, lum, lum, 1.0h);
}

@jonahwilliams
Copy link
Member

Of couse combing the two I get '#extension' : extension not supported: GL_APPLE_shader_framebuffer_fetch. so we might need more tweaks to the build

@jonahwilliams
Copy link
Member

jonahwilliams commented Feb 7, 2023

ahh nvm, more caveats:

Right, for this case, SPIRV-Cross can already implement it.
You'd do something like set_remapped_variable_state for some input or subpassInput variable. Rename it to gl_LastFragData with set_name, then add the extension using require_extension.

Framebuffer fetch does not exist in SPIR-V, so you'll have to use a dummy input which can be repurposed as the LastFragData. A subpassInput is the closest analog to this.

@jonahwilliams
Copy link
Member

So in conclusion, it should be possible but we might need to hack support into impellerc for any rewriting

@danagbemava-nc danagbemava-nc added the engine flutter/engine repository. See also e: labels. label Feb 8, 2023
@chinmaygarde chinmaygarde changed the title [Impeller] Improve advanced blends by avoiding intermediary pass storage [Impeller] Improve advanced blends by avoiding intermediary pass storage. Feb 8, 2023
@jonahwilliams
Copy link
Member

So after more investigation, getting the correct code generated out of a particular SPIRV binary is a lot simpler than finding out how to get GLSL to compile to the correct SPIRV code. I have not found any particular mechanism to make this work after a few hours of investigation (that is par for the course though).

Some combination of decorations/variable names/ extensions might make this work but its not clear what that is.

@jonahwilliams
Copy link
Member

It may be easier to set up a metal source code pipeline that bypasses impellerc, especially since these particular shaders would only ever run on iOS. That leaves some questions about how to setup the bindings, but TBH I could probably write those by hand.

@jonahwilliams
Copy link
Member

I managed to get this compiling. Shaders don't actually work yet, but closer!

@bdero
Copy link
Member Author

bdero commented Feb 13, 2023

TL;DR: I recommend not using FilterContents for this

Up until now, the filter graph hasn't supported allowing filters to access the parent pass directly. flutter/engine#39560 makes it possible to return the Entity that will used to render the filter result to the parent pass, which makes it possible for the framebuffer reading solution to be implemented as a FilterContents -- however, there would be inconsistencies with using the filter graph to implement this path:

  1. FilterContents is intended for filtering: Taking a set of textures and converting them to a third texture (or now, an output render command). Reading from the RenderPass as an input is a major departure from this interface that will be difficult to reconcile. I.e. do we add placeholder inputs?
  2. The BlendFilterContents advanced blend case (which takes two FilterInputs) needs to remain as it is today to continue supporting the actual blend color filter. EntityPass just so happens to use BlendFilterContents to implement the advanced blend fallback case because it's convenient to do so; the needed rendering operation is compatible with the FilterContents protocol because the blend destination is a texture input.

For these reasons, I recommend implementing the framebuffer read solution as its own FramebufferBlendContents (just a regular Contents with SetSourceSnapshot and SetBlendMode).

TL;DR: Changes needed in EntityPass

Since the framebuffer read solution can only work on iOS (maybe it can work on M1 macs as well?), the current/fallback solution needs to continue working, and so separate branches for performing the framebuffer read solution needs to be added in EntityPass (and eventually, similar adjustments will need to be made for the simpler OpenGL/Vulkan extension solution):

  1. EntityPass::AddEntity needs to not increment reads_from_pass_texture_ when on iOS, otherwise the last pass of the layer will end up unnecessarily storing the stencil.
  2. EntityPass::OnRender needs a special path for handling iOS advanced blends using the new FramebufferBlendContents (this is the code that intercepts Entities to handle advanced blends). In particular, the new branch shouldn't call pass_context.EndPass(), which is the special sauce that makes the "slow load" happen in the fallback solution (explanation of how it works below).
How the fallback solution for advanced blends works:

After the current pass is ended with pass_context.EndPass(), the next time we're about to render an entity on the same layer (which, keep in mind, may not happen if the Entity is culled), a new pass gets created and "slow loaded" using a pipeline draw. This was intentionally designed to make slotting in specialized advanced blend solutions relatively easy while continuing to allow all of the surrounding optimizations + fallback machinery to work on any platforms where we need it.

@chinmaygarde chinmaygarde added the P2 Important issues not at the top of the work list label Feb 13, 2023
@jonahwilliams
Copy link
Member

This is now done for iOS. Remaining platforms will need to wait until we get to them

@jonahwilliams
Copy link
Member

xref: flutter/engine#39567

@flutter-triage-bot flutter-triage-bot bot added team-engine Owned by Engine team triaged-engine Triaged by Engine team labels Jul 8, 2023
@chinmaygarde chinmaygarde added this to the Impeller on Android milestone Sep 7, 2023
@bdero bdero changed the title [Impeller] Improve advanced blends by avoiding intermediary pass storage. [Impeller] ☂️ User hardware features to improve performance of advanced blends. Sep 7, 2023
@chinmaygarde chinmaygarde self-assigned this Sep 12, 2023
@chinmaygarde chinmaygarde changed the title [Impeller] ☂️ User hardware features to improve performance of advanced blends. [Impeller] ☂️ Use hardware features to improve performance of advanced blends. Sep 13, 2023
auto-submit bot pushed a commit to flutter/engine that referenced this issue Oct 10, 2023
flutter/flutter#120223 for OpenGLES. Checks for support of the framebuffer fetch extension: https://registry.khronos.org/OpenGL/extensions/EXT/EXT_shader_framebuffer_fetch.txt . This is supported on a Pixel 6 at least, we should double check the distribution of the extension.

![d3c](https://github.com/flutter/engine/assets/8975114/d2392dc8-e1b1-4084-ac5d-c5744c651a39)
harryterkelsen pushed a commit to flutter/engine that referenced this issue Oct 23, 2023
flutter/flutter#120223 for OpenGLES. Checks for support of the framebuffer fetch extension: https://registry.khronos.org/OpenGL/extensions/EXT/EXT_shader_framebuffer_fetch.txt . This is supported on a Pixel 6 at least, we should double check the distribution of the extension.

![d3c](https://github.com/flutter/engine/assets/8975114/d2392dc8-e1b1-4084-ac5d-c5744c651a39)
@jonahwilliams
Copy link
Member

I investigated using the Vulkan and Open advanced blend extensions in flutter/engine#47131

What I found was: The pIxel 4xl claimed to support the extension, but downloading a necessary driver update removed support for it.

When targetin openGL I've not been able to get the shader source required for the feature accepted, as we target GLSL 100 which does not support layout qualifiers.

@chinmaygarde chinmaygarde removed their assignment Dec 1, 2023
auto-submit bot pushed a commit to flutter/engine that referenced this issue Dec 6, 2023
…TACHMENT_ACCESS (#48458)

Support framebuffer fetch on devices that have the extension VK_ARM_RASTERIZATION_ORDER_ATTACHMENT_ACCESS which gives us a fairly easy way to add subpass self dependencies.

Part of flutter/flutter#120223
@chinmaygarde
Copy link
Member

This is complete for Metal and Vulkan.

@chinmaygarde
Copy link
Member

@bdero @jonahwilliams Does VK_EXT_blend_operation_advanced need to be attempted for Vulkan? Can we cross that out in the original list?

@chinmaygarde chinmaygarde removed this from the Impeller on Android milestone Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
e: impeller Impeller rendering backend issues and features requests engine flutter/engine repository. See also e: labels. P2 Important issues not at the top of the work list team-engine Owned by Engine team triaged-engine Triaged by Engine team
Projects
No open projects
Status: 🛠️ Internals
Development

No branches or pull requests

4 participants