New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Beginnings of new element pipeline #138
Conversation
This successfully renders the tiger; fills and strokes are supported. Other parts of the imaging model, not yet. Progress toward #119
Translate all piet-gpu shaders into DXIL and MSL; move generated files into the shader/gen directory.
As of this patch, cli works in release mode, but hangs in debug. There are some validation errors about incompatible resouce states.
Fix incorrect workgroup sizes, and change strategy for assigning binding numbers; ultimately we should get correct values for those from shader compilation, but this works for now.
This wires up the text rendering. WIP though, as it tends to get stuck. Not sure if that's text related or just being triggered by it.
This changes gradients and clips to the new encoding. Lightly tested.
An update on the current state. As of the most recent commit, it should basically be at feature parity with the previous version. I think it would be reasonable to merge, but want to think through a few loose ends first. One is the tuning of workgroup size. In the current state, some workgroups are 256 and some are 512. That's going to hurt compatibility with low-spec devices. @eliasnaur mentioned this above. The easiest fix is to change all to 256 max and have them all respect There are two reasons the workgroups are large. One is performance tuning (done on 5700 XT only so far). The other is that the monoid scan stages are currently limited to a height of 2, meaning that the total problem size is limited to workgroup^2. With the current large workgroups, that's not a serious limitation. With smaller ones, it may be. The code for that is not conceptually difficult but is tedious. The prefix_tree test is a model of scaling to arbitrary height. A limit of 3 might be reasonable. This brings us to another related question. Currently I have N_SEQ at 8 for many of the monoid stages. This means that each thread processes 8 items sequentially. That's great for large problems, but, combined with the large number of dispatches (because we aren't doing a single-pass approach) means that you get very low utilization when there are small problems. It's not clear to me whether this is even worth optimizing - it's arguably not a problem in real usage, but might make benchmarks look bad. One approach is to dynamically tune, but that makes the number of shader permutations even more out of control. I might look into whether I can do specialization constants portably to try to mitigate that. There are still a lot of clip improvements to be done, but I'd like to land this and schedule that later. Lastly, this work is triggering some validation errors on DX12, but those were already present. Those should also be captured in a separate issue. |
We'll revert this later, but for now trying to keep tests green.
Make max workgroup size 256 and respect LG_WG_FACTOR. Because the monoid scans only support a height of 2, this will reduce the maximum scene complexity we can render. But it also increases compatibility. Supporting larger scans is a TODO.
Mark shader/gen subdirectories as autogenerated to avoid cluttering diffs.
ff0fba4
to
9601989
Compare
This successfully renders the tiger; fills and strokes are supported.
Other parts of the imaging model, not yet.
Progress toward #119