New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libobs/graphics, libobs-opengl, libobs-d3d11, docs/sphinx: Introduce effect results #4351
Conversation
I understand there may be a lot going here, and apologize in advance for any heavy-handedness! We really hope to make image template matching easily accessible to users, and I'm hoping to get the ball rolling with this PR. Looking forward to your feedback. |
For future reference, something this substantial should likely have been submitted and discussed as an RFC to avoid unnecessary work on your end. As an additional FYI, we're gearing up for v27 release very soon, so please be patient with us until we have time to do a full review. Thanks! |
I was not aware of the RFCs. I understand that the cumulative sum of the changes is substantial and see that an RFC could be more appropriate. There is a standalone page with the in-depth overview of the changes, which may be easier to read than the expandable section I've pasted into the PR description: Please, do let me know if there is anything I can do to help people review, test and consider these changes. |
@@ -741,7 +752,7 @@ static bool gl_shader_buildstring(struct gl_shader_parser *glsp) | |||
return false; | |||
} | |||
|
|||
dstr_copy(&glsp->gl_string, "#version 330\n\n"); | |||
dstr_copy(&glsp->gl_string, "#version 460\n\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No associated changes with our required openGL version? I presume it takes an OpenGL 4.6 context to compile glsl 4.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No associated changes with our required openGL version? I presume it takes an OpenGL 4.6 context to compile glsl 4.6?
I know that it works fine when testing on Windows and Linux, and I see no warnings about shader version and context mismatch. Looking more into what is appropriate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have exact documentation confirming this precisely, but it seems possible for an older GL context to compile a shader with a newer version: https://stackoverflow.com/questions/35354496/why-does-my-version-330-shader-run-on-older-opengl
It could be more appropriate to force the required GL version to be 4.6. It appears a newer version of the GLAD library would be needed, because the one currently in the build tree seems to only have awareness of 4.3.
I think ideally a 4.6 version will be requested for GLSL programs that need results, and an older 3.3 could be used on everything else. This would be nice because users with older hardware can continue using OBS, and will only run into problems when newer features, such as results, are requested by plugins.
I have tried, in the past, to implement atomic counters using an ARB extension to 3.30 instead of upping the version to 4.6. This, unfortunately, has somehow ended up breaking shaders for the existing effects, and so has created more problems than it has fixed.
I really hope that upping the GL version will not be a deal breaker. It could be good to observe the behavior on older hardware that only supports 3.30, but I do not exactly have access to that. The closest I have is a linux machine that runs mesa drivers and they only support version 4.3. I will have to confirm this to be sure, but what I remember happens is this: When running OBS (with our changes) everything that does not require 4.6 works (even when shaders request version 460). But then when features needed for the atomic counters are actually used in the shader the effect will break, and only that effect. Other effects and the rest of OBS would continue working fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macOS deprecated OpenGL long ago and supports 4.10 max. So without a Metal renderer this would've been a non-starter for us.
…effect results. Introduce results; they are tied to respective params. Result variables are passed in a lot like other params before draw; they are also fetched back as results after draw. Only one type of result is introduced, which is atomic_uint. This type will be used with atomicCounterIncrement() effect statements to allow atomic increments of a counter. Change HLSL shader model from ps_4_0 to ps_5_0 to support UAV variables. Change GLSL shader version from #version 330 to #version 460 to support atomic counters.
I suspect the shader version bumps are problematic. The last time this was discussed, we didn't want to lose GPU support, or increase the maintenance burden of more rendering paths. The way to do this without shader atomics is to render results to a render target and downsample against another render target, ping-ponging until you get down to 1x1. Then copy to a circular buffer of 1x1 staging textures to avoid having to block on the CPU waiting for the GPU to complete, and map for read. It's not going to be as efficient, but it'll be compatible. |
I skimmed the code, and it looks like you're reading data off of the GPU immediately after making the draw call, which is going to stall the graphics thread like I mentioned in the previous comment. It's probably better to do this sort of management external to the graphics abstraction layer rather than introducing a "result" type. |
Actually, the ping-pong version could be faster despite the extra memory traffic and render target switches. Shader atomics are supposed to be slow. In an ideal world, you'd do the parallel reduction with compute shaders and groupshared memory rather than atomics, but we don't support compute shaders either since we only support Shader Model 4. Also, don't allow more CPU access on a resource than you need because that tends to have an adverse affect on GPU performance. Maybe not so important if the buffer is tiny, but it's a good rule to follow in general. |
Allow me to try to defend our implementation of results. Here are some things to consider:
My understanding is that doing it externally would imply dragging a lot of platform-specific stuff outside of the graphics layer. Either that, or it will be slow. We tasked ourselves with image matching that would be fast :) Also, atomic counter is currently the only result type, but the result system can, in theory, also be extended to do useful work with other types of data that can be read back.
I have looked into the progressive upsampling and matching before, and see great potential in it, especially since it will work well for matching an image anywhere in video and not just specific location. It is not, however, clear to me about how it can be faster. As you've mentioned, the shader pipeline in OBS is kept simple. There are a lot of details to consider, but it seems that multiple rounds of draw and reading back large chunks of texture would be slower than just reading back a few counter variables. So the only way that progressive upsampling will be faster is in the event that atomics are really really slow, and I am not sure about how much that is the case. Again, an effort has been made to introduce the results and the atomic counters in ways that do not disrupt the existing project structure too much, and should have no performance effect whatsoever for users that do not invoke the use of results or counters. It is a feature that is available, and we have a proof of concept plugin that shows that results, atomic counters, and a pretty simple matching algorithm can all work in the framework of OBS to do useful work of switching scenes and toggling scene items when template images are matched in video sources. In the context of image matching, a (slightly lazy) argument against progressive upsampling is that is significantly more complicated that using atomic counters. Whether you can do multiple resolution rounds within a single draw or not, it is complicated. This means more work to implement it and more work to maintain it and keep it from breaking. We already have a proof of concept for image matching that can already so useful work, should be plenty fast on a system with a dedicated GPU, and we want to share it! And of course, progressive upsampling could one day be combined with atomic counters, to provide the fastest image matching ever, anywhere in the video :) Our question to the community is whether the concept of results, and atomic counters specifically, can live in the OBS upstream, and thus grant users an easy access to an image matching system we have created. (Even if the plugin is not super polished or super optimized, yet) Yes, I see that upping the shader version will be a big concern. If it will be a dealbraker, I can experiment more with ways of how everyone but users that need counters will continue using older shaders. So, newer shader versions will be requested only by effects that need counters. And this way, users of older hardware will stray away from our plugin and would continue using OBS unaffected. |
This change below will make it so newer versions of shaders are only requested when atomic counters were encountered in the effect code. When they were not, #version 330 for GLSL and ps_4_0 for HLSL will continue to be used, so users of older hardware will not suffer. (Except suffering from envy of not having fast image matching :) I will merge it into this PR, unless somebody objects |
I appreciate the submission, and please don't kill me for this, but this really should have started as an RFC, or at the very least discussion of some sort with us somewhere. I feel like the last time somebody brought up pixel matching was at least two years ago, and even then it wasn't a deep or serious discussion. I want to help everyone, but at the same time we have our own things we have to deal with. This adds a significant amount of code to our graphics backend that we'd have to be responsible for. And make no mistake: when we merge code, we are responsible for that code. We are responsible for making sure it works, we are responsible for making sure there are no bugs, and in this case, we are responsible for adding it to any other graphics API implementations we might add in the future (Metal/Vulkan). There is no doubt that this PR helps you, but when I look at a something like this, I have to ask myself: why would I want to merge it? We're going to have to deal with Metal/Vulkan support in the future, and that means we'd have to add it for those too. This is harsh for me to say, but I can't find a reasonable benefit for merging this. Users aside, this code exists solely for your plugin. This would be code we would be reviewing, testing, maintaining, and adding to other graphics implementations solely for the sake of your plugin. That's not to say that I don't care about you or your plugin, but what I am saying is that it's asking a lot. When you're in the maintainer position and you get these sort of PRs all the time, it's absolutely asking a lot. Compute shaders and SM5 sound more like something I'd be interested in, but even in that case I'd want to measure what sort of compatibility hit it would cause to devices. I think compute/SM5 is worthy of discussion though, because it would have a much wider variety of uses. I understand that a lot of people want to try to make OBS a platform, but I feel like as the maintainer, I have a responsibility to say no to some things in order to keep the project healthy and focused. I understand that you want to implement this for your plugin, and it's possible you might have a lot riding on this pull request, but this is something where you need to work with us beforehand to make sure that we can fit it into our scope and priorities. I understand the desire to be defensive. I know that it's frustrating considering you spent a lot of time writing this. I'm sorry. Please try to see it from our perspective. Besides, even if I didn't close it, introducing something that changes the backend this much would be unlikely to be reviewed within any reasonable timeframe. You have to understand that it's very difficult for us to be motivated to review and test code that only affects a very specific use case like this (arguably niche). When we have so many things to do, it is so incredibly easy to just keep putting lower priority things on the backburner. We have our own priorities and our own goals, and that's why it's so important to get involved with the community beforehand and make sure that we're receptive to it before you start pouring your heart into it. For now, I just have to say no to this particular implementation. I would rather have a compute/SM5 pipeline or something. But even then, that's something that would need discussion, and I'm not sure if I'd be interested in it any time soon. If you can, I'd recommend using the 1x1 downsample (ping-pong) method for the time being, I know it's annoying but it definitely should be something you can do for the time being. |
I completely understand where you are coming from and I apologize for not being proper in approaching developers with something like this. No hard feelings are taken. This PR was made under impression that it was the right thing to do to get folks thinking about the functionality and considering it, and not under any expectation of immediate acceptance. I did not know about RFCs and I can see that it would be more appropriate. I understand the demands of maintaining a complex project, with so many aspects of hardware and software involved. The fork and the plugin were started in response to what many folks have been asking for, for some time. We know that there is a strong demand for this functionality, and several folks who had the opportunity of playing with the plugin are excited. I sincerely hope we find the right path to sharing our work with the community. |
I measured performance of matching 1920x1080 on an RTX 2080 Ti with this atomic accesses and CPU-GPU sync point, and the results are not good. GPU draw: about 10 ms, measured with Intel GPA. For comparison, I tricked the shader to skip over the atomic accesses without stripping the UAVs by adding impossible branches: GPU draw: 0.046 ms I haven't used shader atomics before, so I wasn't sure what to expect, but it makes sense that performance would tank if you essentially mutex a counter, and send 2 million GPU threads at it at the same time. I can almost guarantee that render target ping-ponging will not perform worse than this. It also didn't look like the code uses actual UAV counters, which may perform better than regular UAV buffer storage, but I would be surprised if that would be good enough. |
I want to briefly introduce myself. I'm the catalyst for this plugin. I designed the functionality to meet a need I have, I co-architected it with @HoneyHazard, and I funded him to work on the development. I just want to clarify the purpose of this PR, which perhaps was mis-communciated by @HoneyHazard. This code was developed to meet a streaming need I had, and so I paid-for it's development for myself and others streamers like me. That need is to automatically and reliable "hide" or coverup sensitive data from streams. At the time development started, the target was hiding data from EVE Online and Albion Online, both open-world PVP games that are notoriously stream-ghosted. Our pixel-match fork works well to address this use case, and will continue to do so with no action by any OBS devs. We very much appreciate that OBS is open source, and we could build on it. Thank you! In addition, because I very much believe in open source, our code is also open source. I asked @HoneyHazard to open this Pull Request simply to make OBS Devs aware of the code's existence, should it be something the wider OBS community was interested in now or in the future. And so you are now aware of it, mission accomplished! If you are interested in a future version of OBS where a plugin like this can be built without modifying OBS itself, whether that be via atomics, or ping-pong, or compute shaders, or whatnot, both @HoneyHazard and myself are more than happy to participate in an RFC discussion, and I'm willing to fund that development as well. If not, that's great too, and we'll keep doing what we're doing, and scratching our own itch for our users. Thanks again for OBS! |
We considered using the previous frame's result, but it did not meet our design goal.. Which is to automatically and reliably make sure that sensitive information does not appear in the stream. (aka to minimize the benefit of stream-ghosting) While casual viewers might not notice a single frame of sensitive information, savvy users would still have access to that frame of information and could extract it. (and in fact, they could use pixel-matcher to easily do so in real time) Certainly this could be a toggle to use previous frame (less secure, faster) or current frame (more secure, slower) for matching. I'm interested to know if you have ideas about how to minimize the sync stall while still using the current frame to act on the current frame. The bigger question for us, however, is what the sync stall affects. If it merely delays when OBS receives a frame, that's not a big concern. If it hangs up the game 3d render target, that is a bigger issue. |
You can use the 1x1 texture as an SRV input to a draw that can screen out the information to cover the one or two frames it will take to cover the gap.
To cull the draw completely, issuing an indirect draw and filling out the instance count as 0 or 1 is an option, but you'd usually write to the indirect argument buffer from a compute shader. It might also be possible to use predication to cull the draw, but I don't have experience with that API. |
Were you matching an image that is literally 1920x1080, or what was the size of your image? A typical use case will involve small images, and there will be a lot less blocking when an image mask is used. |
Yes, I was matching the whole image. The time will be shorter at smaller sizes, but I would be surprised if this approach wasn't still magnitudes slower than non-atomic approaches. OBS often needs to live in the leftover scraps of another PC game, so GPU time is at a premium. As for the CPU, the readback stall can take much longer if a significant chunk of work is still processing before the GPU can get around to the pixel match draw, and not just work from OBS. Three slots (2 frame delay) is generally a safe number for a ring buffer; that's what OBS uses for reading back image data for CPU encoding. You can cover up the sensitive frames using the GPU if the GPU result says it crossed the threshold. I'm with NVIDIA regarding CPU-GPU sync points, "Not even one." |
Again, not to assert that we absolutely expect OBS to have this feature, but it has been framed as a feature that users could activate when they need it, and never worry about when they don't. There is a plugin that uses the feature. While things are not super optimized yet, the plugin already works. And performance impact with properly designed, practical examples and gaming hardware is nowhere near catastrophic. People are already requesting extending the functionality of what happens when images are matched. To make a scene switching or scene item toggle decision, or any other custom action in OBS, you have to read information back after a draw to know what happened. You could read back the whole frame, then scan it in CPU, which would be slower and less inefficient, but could be doable without changing OBS. Or, let shaders do the work that is better suited for GPU. Yes, some blocking will occur in the shader due to atomics, but should be nowhere near catastropic for smaller images and is reduced further by masking. (There could be strategies to reduce impact of this blocking, too) And after the draw, you are reading back as little data as possible during the sync point. I also did not want folks to have an impression that this would be something that is dropped on folks and completely abandoned. Given the opportunity I would be glad to help with whatever maintenance woes may arise due to having results. And again, there were no expectations of this being instantly approved. We wanted to get it on your radar. And it would have been fantastic if (with some work, I'm sure) this could be part of OBS, because this would make it much easier for users to find and use this feature. But we still have ways of sharing it with them. |
Hey, just wanted to jump in to avoid any more wheel spinning. The proposal hasn't been rejected outright, we are basically requesting that it be submitted as an RFC so that all the discussion can take place there, and when it's in a state that we are prepared to accept, it can be then submitted as a PR. However, please keep in mind that I am also not guaranteeing that we will accept it. There is enough disagreement on how it should be properly implemented to warrant a proper RFC (which, honestly, you basically already have in the body of this PR). Next steps to keeping this discussion going would be to submit the RFC. |
Another question worth asking here is what GPU you were testing on. Nummber of ms taken will depend entirely on the hardware. "Old" hardware today would be something like a GTX 1060, a 750 Ti or something alike is getting ancient. I also agree with jeske here that its important to differentiate if it stalls other apps that might need to make use of the GPU. I think being able to automatically mask out sensitive data from a live stream is a necessary feature. I see no reason why you wouldnt throw your hats up in the sky for such a generous PR 🙂Ive had a lot of complaints from people about there Discord messages ending up in live streams, currently OBS doesnt offer any functionality to mask out specific applications for that matter. |
As he stated earlier, this was tested on a 2080 Ti.
I don't know how feasible it is to accomplish this using image recognition, practically speaking, and discussion of such a feature would be best suited for somewhere else (such as ideas.obsproject.com). However, at least on Windows 10 2004 and later, processes can mark themselves as hidden from display capture if they want. If you want to be able to hide Discord from capture, I recommend to send that feature request to them. |
Description
atomic_uint
. This type will be used withatomicCounterIncrement(...)
effect statements to allow atomic increments of a counter, which enabled us to build the Pixel Match Switcher plugin.ps_4_0
tops_5_0
shader model to support UAV variables#version 330
to#version 460
to support atomic countersIn-depth Look at the Changes
Click to Expand
Outline of the changes
Introducing the atomic counter results
ps_4_0
for D3D11 orversion 460
for OpenGL)atomic_uint
keyword to represent atomic counters being introduced into the OBS effect system. This keyword is also how the counters are represented in GLSL. Like other global variables of the effect, it must be preceded byuniform
.atomicCounterIncrement(...)
function, also borrowed from GLSL, to perform an atomic increment on a counter.Effects
Effects provide a cross platform wrappers for GLSL and HLSL shader use by OBS. In the OBS effect system you specify both vertex and pixel shader behavior in the same effect file.
Roughly speaking, the effect initialization is as follows:
effect-parser
module, generating data structures for all building blocks of the shaders it needs to generate.shader-parser
module to generate data structures representing the shader functionality.shader-parser
are passed to down to eithergl-shaderparser
ord3d11-shaderprocessor
modules to write actual shaders that will do the effect's work.Additions to the effect language
We introduce two new syntax elements to support use of the atomic counters in the OBS effect language. The two are borrowed from GLSL, which, arguably, has a more user-friendly syntax for implementing atomic counters than HLSL.
atomic_uint
is the variable type that will be used for the atomic counters. Like other global variables of the effect, it must be preceded byuniform
keyword.uniform atomic_uint myCounter;
atomicCounterIncrement(...)
statement increments an atomic counter.atomicCounterIncrement(myCounter);
layout(...)
qualifier in the effect code, but the OpenGL graphics subsystem will have to generate the qualifier in the event it is used.Effect data structures
The data structures built for each effect are used to dynamically generate intermediate shader code, and provide mapping of high-level abstractions for effect variables to lower-level abstractions of shader variables. While modifying the effect data structures, we mirror existing abstractions for effect parameters and their pathways to the Graphics API, as we introduce new abstractions and new pathways for interfacing with program and shader results.
gs_effect [new members]
This struct provides a high level interface for interfacing an effect parameter.
It has an array of parameter data structures called
params
of typegs_effect_param
. So we also add a new memberresults
, which is an array ofgs_effect_result
.gs_effect_result [new struct]
High level interface for working with a result. Members are:
name
result nametype
variable type; currently onlyGS_SHADER_PARAM_ATOMIC_UINT
is supportedcur_val
value that was last retrieved from result.Analogous to
gs_effect_param
.pass_shaderresult [new struct]
Serves as mapping of higher level effect result interface
gs_effect_result
to a lower-level shader result interface handlegs_sresult_t
.Analogous to
pass_shaderparam
.gs_effect_pass [added members]
Has handle pointers for vertex and pixel shaders, and an array for mapping effect params to shader params. These are used later to make passing in of parameter values possible.
So we add
program_results
, which is an array ofpass_shaderresult
, to also do the mapping of effect results to shader results, and make fetching of the results possible.Parsing the effect code
effect-parser
module converts the effect code into data structures and uses them to generate intermediate shader code. Input effect string is tokenized by whitespace, and is then processed, token by token, to build data structures representing individual behavior units of the effect. These are then used to generate intermediate versions of vertex and pixel shaders, so they can be passed down toshader-parser
.Here the goal was to introduce support for results and
atomic_uint
type.ep_param [new members]
This data structure maintains information about individual effect parser parameters as they are being parsed from the effect code. Since some parameters are now also results, we add a new field to be used with those results:
is_result
gets set totrue
for any parameter that is also result. This will later lead togs_effect_result
being instantiated for every param marked with this flag.ep_param_init(...) [modified function]
Assigns fields to
ep_param
. Modified to receive and assign the new fieldis_result
.ep_parse_param(...) [modified function]
Calls
ep_param_init(...)
with values received fromep_parse_other(...)
and makes sure no erroneous symbols follow the param declaration.Our modifications here are limited to propagating
is_result
value toep_param_init(...)
.ep_parse_other(...) [modified function]
This function is for parsing anything in the effect code that is not whitespace,
struct
,technique
, orsampler_state
. The function has variables for reacting toproperty
,const
, andtype
keywords, and activates functions for parsingfunctions
s andparam
s. Our modifications are for special handling of theatomic_uint
results.is_result
variable is added, and is set totrue
wheneveratomic_uint
type token is encountered.ep_parse_param(...)
also passesis_result
variable to the functionWriting intermediate shaders
After the effect code is parsed into data structures, these data structures are used to generate the intermediate shader code (so it can be processed later by the
shader-parser
). During this building of the intermediate shader code for a vertex/pixel shader, a mapping is constructed so that higher level effect params can be linked with corresponding lower-level shader params.The additions here are for constructing analogous mapping between effect results and shader results.
ep_write_param(...) [modified function]
This function receives a pointer to
ep_param
as input and writes the param's declarations in the intermediate shader code. It also appends a name of every used param to the arrayused_params
.We teach the function to also append a param name, that is also a result, to the list of results. It will now receive a pointer
used_results
, which is an array of strings representing names of the results used in the effect. If theep_param
, passed into the function, is marked with theis_result
flag, it is appended toused_results
.ep_write_func_param_deps(...), ep_write_func_func_deps(...), ep_write_func(...), ep_makeshaderstring(...) [modified functions]
These functions call each other and other lower level functions as the intermediate shader is being built from the data structures representing it. They maintain
used_params
- an array of strings representing params that were encountered, to be eventually passed down toep_write_param(...)
, where a name is appended to the array for every used param.They are all modified so that
used_results
, an array of strings representing names of the results used in the effect, can also make its way toep_write_param(...)
function, where it can be updated for every used result.ep_compile_result(...) [new function]
gs_effect_result
so its fields can be initialized.gs_effect_param
of the same name in the effect's array of params, and retains the pointer ings_effect_result
being initialized. Because this is all called (and has to be called) after the params are "compiled", the array of params is stable and a pointer into that array is safe.gs_effect_result
.ep_compilepass_shaderresults(...) [new function]
This function is responsible for building mapping between higher level handles of effect results and lower level handles of shader results, and also invokes mapping between effect params and effect results.
For every result name in the input array of strings
used_results
:gs_effect_result
from theeffect_parser
that corresponds to the result name.ep_compile_result(...)
sogs_effect_result
receives a pointer to correspondinggs_effect_param
.gs_sresult_t
fromgs_shader_t
that correspots to the result name.pass_shaderresult
, is appended to thepass_results
array, which is passed in by pointer.Analogous to
ep_compile_pass_shaderparams(...)
.ep_compile_pass_shader [modified function]
This function invokes generation of shader strings for either vertex or pixel shader. While doing so, it also generates mapping of effect parameters to shader parameters, which is used later to make setting of params possible.
We need to provide a mapping of effect results to shader results, very similar to how it is done for params, so the result retrieval works. Mapping between effect results and corresponding effect params will also be indirectly invoked.
used_results
, which is an array of strings representing names of results used in the shader.used_results
is updated during the calls toep_makeshaderstring(...)
.used_params
is worked with.pass_results
of typepass_shaderresult
. It is pointing to the data memberprogram_results
ofep_pass
pointer, and will be modified in-place to retain the results mapping inep_pass
data structurevertshader_params
andpixelshader_params
members ofep_pass
.ep_compilepass_shaderresults(...)
.used_results
andpass_results
variables are passed in.pass_results
will be updated.ep_compile_pass_shaderparams(...)
is called.Interacting with the effects
In a typical scenario the effect user obtains handles to an effect's parameters, and uses those handles to pass the parameters to the effect, so the values can be passed down to the lower level layers and eventually end up in the actual graphics system and shader machinery.
We have to expand the usage to include the results. Similarly to params, the user will be able to lookup result handles by name. The user will then be able to retrieve result values from the result handles.
The result variables are also connected to param variables of the same name.
atomic_uint
will require a new type of param that is notint
, and also requires some special care by either graphics system when using it as a parameter.gs_effect_get_result_by_name(...) [new function]
Obtains a
gs_eresult_t
pointer by name. This handle can then be used bygs_effect_get_atomic_uint_result(...)
to obtain an effect result.Analogous to
gs_effect_get_param_by_name(...)
.gs_effect_set_atomic_uint(...) [new function]
This allows you pass a value to an unsigned integer atomic counter in the shader like you would set any other param. Nothing new here, except the new parameter type that will be used with the
atomic_uint
variable in the effect code. All lower-level details are in other functions.Analogous to
gs_effect_set_int(...)
.gs_effect_get_atomic_uint_result(...) [new function]
Given a
gs_eresult_t
pointer/handle, retrieves the value of theatomic_uint
result after drawing with the effect has finished.Again, no super low-level stuff here; just some memory copies. Very similar to
gs_effect_set_xyz(...)
functions (where xyz is a data type) except we get instead of set since the results are coming back after the draw, instead of being passed in before the draw.effect_setval_inline(...) [modified function]
This is a wrapper for updating effect parameters with new values. In addition to some error-checking, the functions prevents updating
uniform
data in the shaders when the values given are no different from the previously assigned value. (so, no update should be necessary)We customize the behavior to always force updates anytime the parameter type is
GS_SHADER_PARAM_ATOMIC_UINT
. Our atomic counters are both a param and a result, and are treated a little different from otheruniform
s by the graphics systems. So, we must ensure the value is always passed in to the shaders before every draw, even if we keep sending the same value.effect_pass_free(...) [modified function]
This is a cleanup function for
gs_effect_pass
. We modify it to also cleanup the newly introducedprogram_results
array.effect_free(...) [modified function]
This is a cleanup function for
gs_effect
. We modify it to also cleanup the newly introducedresults
array.Parsing the intermediate shaders
shader-parser
processes the intermediate vertex and pixel shader code generated by theeffect-parser
. Data structures are generated to represent the shader behavior. This allows reformatting the intermediate shader into GLSL or HLSL code, depending on which graphics subsystem is used.Our changes here are for supporting results, and for assigning a unique index for each atomic counter variable, so either graphics system will be ready to allocate resources for its specialized handling of the counters.
shader_parser [new members]
This structure holds an instance of a
cf_parser
which is a c-style parser, and has arrays of data structures for params, structs, samplers, and funcs.We need an incrementing counter to assign unique, increasing index for each atomic counter we encounter in the intermediate shader code. This struct seems to be a fine place to keep the next index to be assigned, so we add
atomic_counter_next_index
integer field.shader_var [new members]
This struct represents a variable in a shader code. We want results to be special kind of variables, so we add
is_result
flag to the fields. We also addatomic_counter_index
so that eachatomic_uint
variable can be uniquely identified in preparation to be handled by either graphics system. Increasing values will be assigned to indices of counters in the order of the counter's declaration in the intermediate shader.shader_parser_init(...) [modified function]
This initializes members of
shader_var
. We modify it to also initializeatomic_counter_next_index
to0
, so the counter enumeration index will begin with0
.shader_var_init_param(...) [modified function]
This function initializes a
shader_var
that was previous allocated, assigning its fields based on function parameters.New function parameters are added to support results and atomic counters; specifically:
is_result
set totrue
when the variable is also a resultatomic_counter_next_index
is an integer that is passed by pointer. Whenever the variable is of typeatomic_uint
this integer is copied to the fieldatomic_counter_index
inshader_var
, and then the next index to be assigned is incremented.sp_parse_param(...) [modified function]
Very similarly to
ep_parse_param(...)
, this callsshader_var_init_param(...)
to initialize ashader_var
and do some error-checking.We modify the call to
shader_var_init_param(...)
so the flagis_result
is passed down to it, and also pass the pointer toatomic_counter_next_index
ofgs_shader
as the other new argument.sp_parse_other(...) [modified function]
Very similar to
ep_parse_other(...)
and is responsible for parsing anything in the intermediate shader code that is not whitespace,struct
orsampler_state
. We add special handling for variables of typeatomic_uint
.is_result
is added, and is set totrue
when a variable ofatomic_uint
type is encountered.is_result
is also passed down tosp_parse_param(...)
.Graphics API
graphics
is "an API-independent graphics subsystem wrapper". Many data types are defined. Among other things, it has function pointers that need be assigned to functions specific to OpenGL vs Direct3D11 operation.Here we integrate some new functionality needed to make results work.
New data types
gs_shader_result [new struct]
This is actually defined by either D3D11 or OpenGL subsystems. It will contain data that either system will need to interact with an actual shader variable associated with the result.
However, the
graphics
code will pass this data around using the [gs_eresult_t
](#gs_eresult_t-new typedef) typedef wrapper.See OpenGL and D3D11 implementations of
gs_shader_result
.gs_eresult_t [new typedef]
This is a typedef for
gs_effect_result
so a pointer handle to an effect result can be used by modules that don't need knowledge of the graphics internals.Analogous to
gs_eparam_t
.gs_sresult_t [new typedef]
This is a typedef for
gs_shader_result
so a pointer handle to a shader result can be used by modules that don't need knowledge of the shader internals.Analogous to
gs_sparam_t
.gs_shader_param_type [extended enum]
This enum for shader param data types is extended to include
GS_SHADER_PARAM_ATOMIC_UINT
.New function signatures
The following new function signatures will be declared so they can be defined by either OpenGL or D3D11 subsystems. This will allow platform-agnostic effect abstractions to interact with the actual shader machinery of either subsystem.
gs_shader_get_result_by_name(...) [new function signature]
Fetches a pointer/handle
gs_sresult_t
by name from a shader program.See OpenGL and D3D11 implementations.
Analogous to
gs_shader_get_param_by_name(...)
.gs_shader_set_atomic_uint(...) [new function signature]
Passes an unsigned integer value to the atomic counter variable in the shader represented by the
gs_sresult_t
pointer/handle.Expands on the existing
gs_shader_set_xyz(...)
function declarations, where xyz is a data type.See OpenGL and D3D11 implementations.
gs_shader_get_result(...) [new function signature]
Copies new result data from the shader into the
dst
, provided ags_sresult_t
pointer/handle.See OpenGL and D3D11 implementations.
Naming is analogous to
gs_shader_set_val(...)
declaration.New and changed functions
download_results(...) [new function]
This uses the array of mapping pairs
pass_shaderresult
to transfer data from shader result handles
gs_sresult_t
(where new data is available after a technique draw is finished) into the associatedgs_effect_result
, which makes the data available to effect users. It callsgs_shader_get_result(...)
.Naming is analogous to
upload_parameters(...)
.gs_technique_end_pass(...) [modified function]
This function is doing some cleanup after a technique draw has finished, and it was a convenient place for us to insert a call to
download_results(...)
.libobs-opengl
This module provides implementation of the Graphics API for the OpenGL graphics system.
Roughly speaking, most of the changes fall into the categories of writing low-level code to implement the atomic counters
with features available in OpenGL, and adding the structure and logic necessary to make the results work.
gl-subsystem
gl-subsystem.h/.cpp
does many GL-specific implementations of the Graphics API, including GL-specific implementations of the data structure for shader and program params.We will modify and add new data structures here to add support for results, with some additions being specific to support
atomic_uint
variables.gs_shader_param [new members in libobs-opengl implementation]
This is the GL-specific implementation for the shader_param, that has some low-level details for interacting with the params. In platform-agnostic sections of the code the pointers to
gs_shader_param
are passed around usinggs_sparam_t
pointer handles.Our concept of a result implies being linked with a param of the same name, and using
atomic_uint
variable as a shader param requires some special handling too. So, we choose this struct to contain low-level data necessary for interacting with the atomic counter params AND results, as well as flags that were similarly added to other structures to support results.New members are:
is_result
whentrue
indicates this param also has an associated resultbuffer_id
is the ID of the OpenGL Buffer Object that will be used to interact with the atomic counter. This will be received by callinginit_atomic_buffer(...)
layout_binding
in the GL subsystem this represents the index into the indexed targetGL_ATOMIC_COUNTER_BUFFER
that represents graphics data for our counters.layout_offset
in the GL subsystem we can have multiple atomic counters share the same layout binding but have different offsets.gs_shader_result [libobs-opengl implementation]
This will be our implementation of the shader result for the GL subsystem.
Members are:
name
: name of the resultparam
: pointer togs_shader_param
, which has fields with low-level details for interacting with atomic counter params/results.cur_value
data array where retrieved result data will be storedgs_shader [new members in libobs-opengl implementation]
This is the GL-specific definition for representing an active shader. One of the members is
params
, which is an array ofgs_shader_param
.We add
results
, which is an array ofgs_shader_result
.program_result [new struct]
This just has a pointer
gs_shader_result
, so a program result can be associated with a shader result.Analogous to
program_param
.gs_program [new members]
This represents an OpenGL shader program. It has
gs_shader
pointers to a vertex and pixel shader, as well as an array ofprogram_param
s.We add
results
, which is an array ofprogram_result
s.gl-shaderparser
After
shader-parser
has parsed the intermediate shader code into data structures,gl-shaderparser
code will generate the final GLSL shader code from these data structures.Our changes here are for generating code that utilizes the atomic counters feature of OpenGL/GLSL.
atomic_uint
type is native to GLSL, but we need to generate thelayout(...)
qualifier block that is required foratomic_uint
variable declarations, which we have omitted from the effect language.Notably,
atomicCounterIncrement(...)
(or decrement) effect statements, that we aim to support, are borrowed from GLSL, and require no special translation when regenerated from intermediate shader code into GLSL.gl_write_var [modified function]
This function writes variable declarations into GLSL, taking into account various qualifiers.
To support
atomic_uint
s in GLSL, the declaration must be preceded by alayout(...)
qualifier block. So, our declarations for atomic counters will need to look like this:(layout binding = 1, offset = 0) uniform atomic_uint myCounter;
We modify the function to insert the layout qualifiers for variable declarations that are of type
atomic_uint
. We will useatomic_counter_index
ofshader_var
as the source for the binding index.gl_shader_buildstring [modified function]
Dispatches lower-level functions for putting together the final GLSL code.
We change
#version 330
preprocessor declaration to#version 460
to support atomic counters.gl-shader
Defines many functions for activating and interacting with a GLSL shader and an OpenGL shader program.
Our changes are for integrating the flow of data for results, and introducing the low-level code for activation and interaction specific to atomic counters (
atomic_uint
).gl_add_result(...) [new function]
Instantiates and stores a new result in the
results
array ofgs_shader
. New instance ofgs_shader_result
gets a copy of the param/result name, so the corresponding param and the result can be linked once the intermediate shader parsing has finished and the array of params is stable.gl_add_param(...) [modified function]
Initializes a
gs_shader_param
given ashader_var
as input. Textures get some special treatment here. Once initialized, new instance ofgs_shader_param
is pushed back to theparams
array ofgs_shader
.We modify the function as follows:
GS_SHADER_PARAM_ATOMIC_UINT
itslayout_binding
gets set to the value ofatomic_counter_index
ofshader_var
, and itslayout_offset
is0
.is_result
field ofshader_var
propagates to the newgs_shader_param
.is_result
is true,gl_add_result(...)
is called to instantiate and store ags_shader_result
corresponding to the param's name.init_atomic_buffer(...) [new function]
Generates a new OpenGL Buffer Object of type
GL_ATOMIC_COUNTER_BUFFER
, which will be used for writing to and reading from an atomic counter variable. Buffer ID is retained to be stored ings_shader_param
.We use
glGenBuffers(...)
,glBindBuffer(...)
,glBufferData(...)
,glBindBuffer(...)
, andglBindBufferBase(...)
to get an atomic counter buffer initialized.gs_shader_set_atomic_uint(...) [libobs-opengl implementation]
Implements
gs_shader_set_atomic_uint(...)
function signature for the OpenGL subsystem.Very similar to most of the other
gs_shader_set_xyz(...)
implementations (where xyz is a data type) that just copy data intocur_value
data array ofgs_shader_param
.gs_shader_get_result_by_name(...) [libobs-opengl implementation]
Implements
gl_shader_get_result_by_name(...)
function signature for the OpenGL subsystem.Analogous to
gs_shader_get_param_by_name(...)
and just searches for the right result withname
field that matches.gs_shader_get_result(...) [libobs-opengl implementation]
Implements
gs_shader_get_result(...)
function signature for the OpenGL subsystem.Just copies data into destination pointer from the
cur_value
data array ofgs_shader_result
.gs_shader_destroy(...) [modified function]
This is a cleanup function for
gs_shader
and we modify it to also cleanup theresults
array ofgs_shader_result
.program_set_param_data(...) [modified function]
This function works on getting param values/data into
uniform
variables of an active GLSL shader. For most supported data types this means callingglUniformXyz(...)
function, with some specialized work needed for textures.The atomic counter variables are also special and we cannot use
glUniformXyz(...)
style functions to set values before draw. When param type isGS_SHADER_PARAM_ATOMIC_UINT
we add another specialization to callglBindBuffer(...)
,glBindBufferBase(...)
andglBufferSubData(...)
functions so the value can make its way toatomic_uint
variable of interest in the shader, before the draw.assign_program_param(...) [modified function]
This function finds the uniform locations of a given
gs_shader_param
by callingglGetUniformLocation(...)
with the param's name, and then pushes the given shader param to theparams
array ofgs_program
.Because the mechanisms used for initializing and using the atomic counters are different from regular
uniform
s, we modify the function to skip finding and assigning a uniform location any time a param is of type isGS_SHADER_PARAM_ATOMIC_UINT
.assign_program_shader_results(...) [new function]
This function works on connecting each constructed
gs_shader_result
with the respectivegs_shader_param
of the same name. It also constructs a newprogram_result
linked with the shader result, and pushes it toresults
array ofgs_program
.Naming is analogous to
assign_program_shader_params(...)
.assign_program_results(...) [new function]
This function calls
assign_program_shader_results(...)
for both vertex and pixel shaders ofgs_program
.Analogous to
assign_program_params(...)
.gs_program_create(...) [modified function]
After GLSL shaders were compiled this function creates a new shader program, attaches the shaders to the program, and links it. After linking it calls
assign_program_params(...)
andassign_program_attribs(...)
, so we also add a call toassign_program_results(...)
.gs_program_destroy(...) [modified function]
This is a cleanup function for
gs_program
and we modify it to also destroy theresults
array containingprogram_result
s.libobs-d3d11
This module provides implementation of the Graphics API for the Direct3D11 graphics system.
Roughly speaking, most of the changes fall into the categories of implementing atomic counters using Direct3D's UAV variables system, and adding the structure and logic necessary to make the results work.
d3d11-subsystem
d3d11-subsystem.h/.cpp
does many D3D11-specific implementations implementations of the Graphics API, including D3D11-specific implementations of the data structure for shader and program params.We will modify and add new data structures here to add support for results, with some additions being specific to implement
atomic_uint
variables using the UAV system.gs_shader_param [new members in libobs-d3d11 implementation]
This is the D3D11-specific implementation for the shader_param, that has some low-level details for interacting with the params. In platform-agnostic sections of the code the pointers to
gs_shader_param
are passed around usinggs_sparam_t
handle.Our concept of a result implies being linked with a param of the same name, and using a UAV variable as a shader param requires some special handling too. So, we choose this struct to contain low-level data necessary for interacting with the atomic counter params AND results, as well as flags that were similarly added to other structures to support results.
New members are:
is_result
whentrue
indicates this param also has an associated resultatomicCounterIndex
in the D3D11 subsystem will represent index into the buffer of UAV memory of unsigned integers where variable resides. The value will be copied fromatomic_counter_index
ofshader_var
.Situational repurposing of an existing member:
pos
member is being used to store the byte address of the variable in the memory chunk used to set const/uniform variables. For UAV counter variables we will reuse the same variable as the byte address into the UAV memory chunks that we send and receive from the shader. Since each counter variable will be 4 bytes, anything that is a counter will have itspos
assigned toatomicCounterIndex * 4
.gs_shader_result [libobs-d3d11 implementation]
This will be our implementation of the shader result for the D3D11 subsystem.
Members are:
name
: name of the resultparam
: pointer togs_shader_param
, which has fields with low-level details for interacting with UAV counter params/results.curValue
data array where retrieved result data will be storedgs_shader [new members in libobs-d3d11 implementation]
This is the D3D11-specific definition for representing an active shader. Among other things, it has
params
member, which is a vector ofgs_shader_param
, which partakes in passing the param values to the shader before the draw. So, we addresults
, which is a vector ofgs_shader_result
, and will partake in fetching the results after the draw.The class also holds const data size and descriptor used for initializing the const/uniform buffer. So, we introduce UAV data size and several descriptor variables used to initialize the UAV buffer and its use. The descriptors are initialized in the
gs_shader::BuildUavBuffer(...)
(called from the vertex and pixel shader constructors) and are reused ings_vertex/pixel_shader::Rebuild(...)
.uavBd
is aD3D11_BUFFER_DESC
structure for describing the UAV buffer. Gets passed down toID3D11Device::CreateBuffer(...)]
to create the UAV buffer in graphics memory.uavTxfrBd
is aD3D11_BUFFER_DESC
structure for describing the UAV transfer buffer. Gets passed down toID3D11Device::CreateBuffer(...)]
to create a transfer buffer for transferring data back and forth between the system memory and the UAV graphics memory.bd
which is aD3D11_BUFFER_DESC
used in initializing the constants buffer that transfers uniforms' data to the graphics memory.uavViewDesc
is aD3D11_UNORDERED_ACCESS_VIEW_DESC
structure for describing the UAV view. Gets passed down toID3D11Device::CreateUnorderedAccessView(...)
to create the UAV view for sending the counter data into the shader.Finally, the class holds pointer to a D3D11 interface for the buffer used in setting const/uniform data. So, we add pointers to the two buffers and a UAV view used in setting and receiving UAV data. These are created in the
gs_shader::BuildUavBuffer(...)
(called from the vertex and pixel shader constructors) and have to be reinitialized in the event ofgs_vertex/pixel_shader::Rebuild(...)
.uavBuffer
is a pointer toID3D11Buffer
representing the UAV buffer in graphics memory. Obtained by callingD3D11Device::CreateBuffer(...)]
withuavBd
as one of parameters.uavTxfrBuffer
is a pointer toID3D11Buffer
for sending and receiving the UAV data. Obtained by callingD3D11Device::CreateBuffer(...)]
withuavTxfrBd
as one of parameters.constants
buffer pointer for setting uniformsuavView
is a pointer toID3D11UnorderedAccessView
and is used to sending data to the UAV region in graphics memory. Obtained by callingID3D11Device::CreateUnorderedAccessView(...)
withuavBuffer
anduavViewDesc
as arguments.device_draw(...) [modified function in libobs-d3d11 implementation]
This function encapsulates much of a typical draw activity, including loading vertex buffer, updating blend, raster, Z-stencil states, view+proj matrix, invoking
UploadParams(...)
on shader parameters to vertex and pixel shaders, and finally drawing primitives. It is called fromgs_draw(...)
.As we are introducing the concept of results that become available after the draw, we add a call to
gs_shader::DownloadResults(...)
on the vertex and pixel shaders, after the primitive draw has finished.d3d11-shaderprocessor
After
shader-parser
has parsed the intermediate shader code into data structures,d3d11-shaderprocessor
code will generate the final HLSL shader code from these data structures.Our changes here are for generating code that parses the added effect syntax for atomic counters, and utilizes the UAV variables of D3D to implement atomic counters. Unfortunately, in HLSL there is no direct analogue to
atomic_uint
type and noatomicCounterIncrement(...)
function, so we translate our intermediate shader code to become other things in HLSL that needs to be generated:RWStructuredBuffer<uint> __uavBuffer : register(u1);
is used to declare a UAV memory chunk in the shader that we will use for storing the atomic counter variables. Such a buffer will be added when UAV counters were encountered in the effect code, and theuavBuffer
handle ofgs_shader
will be connected to this buffer when [gs_shader::BuildUavBuffer(...)](gs_shaderBuildUavBuffer-new-function) or
gs_vertex/pixel_shader::Rebuild(...)` is called.__uavBuffer
by name, but we can index into it like an array of unsigned 32-bit integers. We will use theatomicCounterIndex
member ofgs_shader_param
as an index into the buffer (which was inherited fromatomic_counter_index
ofshader_var
).InterlockedAdd(...)
will be used to replaceatomicCounterIncrement(...)
added to the effect language.atomicCounterIncrement(varName)
will need to be translated intoInterlockedAdd(__uavBuffer[0], 1)
, where0
happened to be the atomic counter index forvarName
, and1
is because "increment" is equivalent to "add 1 and assign";ShaderProcessor::SeekUntil(...) [new function]
This protected utility function is added for iterating through
cf_token
s ofShaderParser
until a token matching a string is found. Used byShaderProcessor::PeekAndSkipAtomicUint(...)
andShaderProcessor::ReplaceAtomicIncrement(...)
functions.ShaderProcessor::SeekWhile(...) [new function]
This protected utility function is added for iterating through
cf_token
s ofShaderParser
for as long as tokens match a string. Used byShaderProcessor::PeekAndSkipAtomicUint(...)
andShaderProcessor::ReplaceAtomicIncrement(...)
functions.ShaderProcessor::PeekAndSkipAtomicUint(...) [new function]
This function as added to eat up all tokens that are part of
uniform atomic_uint myVar;
declarations. As mentioned before, we will be using numeric index into the__uavBuffer
of the shader code instead of counter variable names, so all tokens that are part of declarations foratomic_uint
s will be completely ignored - no output will be produced. Returnstrue
when one such declaration was encountered and swallowed up.ShaderProcessor::ReplaceAtomicIncrement(...) [new function]
This works on translating the
atomicCounterIncrement(...)
statement added to the effect language intoInterlockedAdd(...)
statements of HLSL. In order to translate variable name of intermediate shader code into a numeric index that can be used with__uavBuffer
- theparams
array ofshader_parser
is scanned forshader_var
with the matching name and theatomic_counter_index
of that shader variable is used.ShaderProcessor::BuildString(...) [modified function]
This one obtains tokens from the parser of the intermediate shader code, and replaces keywords of the effect language into things that actually exist in HLSL, so the final HLSL string is constructed. Our additions of the atomic counter syntax are no exception to these needs, and even require some additional handling. Changes are:
-
ShaderProcessor::ReplaceAtomicIncrement(...)
is called wheneveratomicCounterIncrement
keyword is encountered, and will navigate tokens to replace the entire increment statement.ShaderProcessor::PeekAndSkipAtomicUint(...)
, which is called after all other keywords are checked for a need of conversion. If the function returnstrue
- this meansatomic_uint
declaration was encountered, and all tokens of the declaration statement will be be skipped in the HLSL.RWStructuredBuffer<uint> __uavBuffer : register(u1);
into output to declare the UAV memory block for the counters, but only for the shaders that have atomic counter variables. So, instead ofstringstream output
function variable there are now string streamstempOutput
andfinalOutput
.tempOutput
is being written to as tokens are being processed, just likeoutput
was before. During processing of the tokens, we will learn if the UAV block will be needed or not. And if it is needed, in thefinalOutput
we will insertRWStructuredBuffer<uint> __uavBuffer : register(u1);
statement afterstatic const bool obs_glsl_compile = false
but before the rest of the code, which has been constructed intempOutput
. NowfinalOutput
stream has the final HLSL code to be copied intooutputString
, which is the function argument that is passed by reference. Phew.d3d11-shader
Defines much of the functionality for initializing and interacting with an active HLSL shader.
Changes here will be for adding the needed structure and logic for results, initializing all the D3D11 device and context handles needed for interacting with the UAV buffer containing atomic counter variables, and using them.
gs_shader_get_result_by_name(...) [libobs-d3d11 implementation]
Implements
gl_shader_get_result_by_name(...)
function signature for the GL subsystem.Analogous to
gs_shader_get_param_by_name(...)
and just searches for the right result withname
field that matches.gs_shader_set_atomic_uint(...) [libobs-d3d11 implementation]
Implements
gs_shader_set_atomic_uint(...)
function signature for the D3D11 subsystem.Copies data from a
const void* data
pointer intocurValue
vector ofgs_shader_param
, resizing the vector when necessary.gs_shader_get_result(...) [libobs-d3d11 implementation]
Implements
gs_shader_get_result(...)
function signature for the D3D11 subsystem.Just copies data into destination pointer from the
curValue
data array ofgs_shader_result
.gs_shader::BuildUavBuffer(...) [new function]
Iterates through the vector of
gs_shader_param
. For each param that is also a result, usespos
member to determine the largest value of the mapping index, so the required size of the UAV block is known. Once this size,uavSize
, is known, and is not zero, initializes theuavBd
,uavTxfrBd
, anduavViewDesc
descriptors, and uses them to create theuavBuffer
anduavTxfrBuffer
, anduavView
of thegs_shader
, which are used for sending data to and from the UAV graphics memory.Analogous to the behavior of
gs_shader::BuildConstantBuffer(...)
, except there is some additional complexity due to UAV use and support for bi-directional data flow.gs_vertex/pixel_shader::Rebuild(...) [modified functions]
These functions reconstruct a vertex or pixel shader, maintaining the descriptors for const/uniform buffer but reinstantiating the const buffer and making sure all params work in the new instance of the shader.
Similarly to how const data is handled, we maintain the
uavBd
,uavTxfrBd
, anduavViewDesc
descriptors, but we create new instances of theuavBuffer
anduavTxfrBuffer
, anduavView
ofgs_shader
, which are required for sending data to and from the UAV graphics memory.gs_shader::UpdateParam(...) [modified function]
This function adjusts
constData
data array, passed by reference, in response to an inputgs_shader_param
. Thepos
member of each param dictates where in the const data (uniform) memory chunk the param's data will go. When new or modified param data is due to be inserted into the chunk of const data, a pass-by-reference booleanuploadConst
is set totrue
, to flag that the const data will need to be uploaded to refresh the uniform variable(s).We mirror this arrangement as we add support for sending UAV memory chunks to the shaders, triggered by the counter params+results that need it:
uavData
data array that is passed into the function by reference. It will be adjusted in response to the input param when it is of typeGS_SHADER_PARAM_ATOMIC_UINT
. Once again,pos
member of each param is used as a mapping index, except this time it's indexing into the UAV data instead of the const/uniform data.uploadUav
, a boolean passed by reference, that will need to be set totrue
whenever an inputgs_shader_param
is of typeGS_SHADER_PARAM_ATOMIC_UINT
. We always force UAV data refresh anytime a UAV variable is encountered - even if the UAV memory chunk appears unchanged. We want UAV counters to be set to predictable values as we begin drawing with the effect.gs_shader::UploadParams(...) [modified function]
This function initializes a local variable
constData
, which is data vector, and then callsgs_shader::UpdateParam(...)
on every param ofgs_shader
untilconstData
contains all the values to be assigned to the uniforms. If new or modified param data was encountered during the updates - theuploadConst
flag gets set, and the function usesID3D11DeviceContext::Map(...)
to map theconstants
buffer, so the constructed data block inconstData
is copied and ends up in this const/uniform buffer.We mirror this structure as we introduce UAV counter results. We add
uavData
data array anduploadUav
boolean, and we also process these bygs_shader::UpdateParam(...)
. If any of the params were atomic counter results, theuploadUav
flag is set and:ID3D11DeviceContext::OMSetRenderTargetsAndUnorderedAccessViews
to activate our UAV viewID3D11DeviceContext::Map(...)
on theuavTxfrBuffer
to load the transfer buffer.uavTxfrBuffer
to theuavBuffer
in graphics memory.gs_shader::DownloadResults(...) [new function]
After the draw we need to download the UAV data back:
resultsData
to receive the UAV memory chunk.uavBuffer
in the graphics memory to theuavTxfrBuffer
in the system memory.ID3D11DeviceContext::Map(...)
to copy data fromuavTxfrBuffer
to theresultsData
vector.resultsData
vector to individualgs_shader_result
, using thepos
member of each result as a mapping index.gs_pixel_shader::gs_pixel_shader(...) [modified constructor]
This a constructor for a class that represents an active pixel shader, and is derived from
gs_shader
base class. It instantiates and makes calls to d3d11-shaderprocessor, so the previously generated intermediate shader code can be parsed again into useful data structures for interacting with the shader. The final HLSL code is assembled andID3D11Device::CreatePixelShader(...)
is called to create the shader and keep and handle to it insidegs_pixel_shader
.Here, the only modification was changing shader model from
ps_4_0
tops_5_0
in order to support UAV buffer.Motivation and Context
Pixel Match Switcher is being developed to provide users with ability to switch scenes (or toggle sources and filters) when a template image is being matched in a video source. In order for this to be fast we needed to introduce results into OBS effect system.
How Has This Been Tested?
Tested on Windows 10 and Debian 10.
Verified that existing functionality of OBS is not broken, and that results work when using the Pixel Match Switcher plugin.
Types of changes
Checklist: