Experiment with new shader #137
Replace the lookup table with an area calculation.
|
Interesting, this is less code than I imagined it would be.
vec2 a = from.y + (window - from.x) * (to.y - from.y) / (to.x - from.x) + 0.5;
float ymin = min(min(a.x, a.y), 1) - 1e-6;
float ymax = max(a.x, a.y);
float b = min(ymax, 1);
pcwalton
Apr 30, 2019
Collaborator
These should be 1.0 instead of 1 to be standards compliant, I'm pretty sure.
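For readers following along, here is a minimal CPU-side sketch of what "an area calculation instead of a lookup table" can look like. It is not the shader from this PR: the function names are hypothetical, and it assumes a straight edge crossing a unit pixel from height `y0` at the left side to `y1` at the right side. The covered fraction is the mean of `clamp(y, 0, 1)` along the edge, which has a closed form.

```python
def clamp_integral(t):
    """Antiderivative of clamp(t, 0, 1), chosen so that it is 0 for t <= 0."""
    if t <= 0.0:
        return 0.0
    if t <= 1.0:
        return 0.5 * t * t
    return t - 0.5


def coverage_below_edge(y0, y1):
    """Fraction of a unit pixel lying below a straight edge.

    The edge runs from height y0 at x = 0 to height y1 at x = 1
    (hypothetical parameterization, chosen for this illustration only).
    """
    lo, hi = min(y0, y1), max(y0, y1)
    if hi - lo < 1e-9:
        # Flat edge: coverage is just the clamped height.
        return min(max(lo, 0.0), 1.0)
    # Mean of clamp(y, 0, 1) for y uniformly distributed over [lo, hi].
    return (clamp_integral(hi) - clamp_integral(lo)) / (hi - lo)
```

For example, an edge running from `-1` to `1` only enters the pixel halfway across, leaving a triangle of area 0.25 below it, and `coverage_below_edge(-1.0, 1.0)` returns that value.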
|
The Mali offline compiler reports 4.75 ALU cycles + 1 texture access for the original shader vs. 6.25 ALU cycles for this one. So it depends on whether the texture access will complete in 1.5 cycles. It seems doubtful that a texture access, even one that hits in cache, is that fast, so I think this is a win. (It could also potentially be optimized further.)
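The break-even point in that comparison is plain arithmetic, sketched below. The simple additive cost model is an assumption made here for illustration; real GPUs overlap texture latency with ALU work, so treat the result as a rough guide only.

```python
# Cycle counts as quoted from the Mali offline compiler in the comment above.
ALU_CYCLES_LUT = 4.75   # original shader, plus one texture fetch
ALU_CYCLES_AREA = 6.25  # area-calculation shader, no texture fetch


def break_even_texture_cycles(alu_with_lut, alu_without_lut):
    """Texture cost (in cycles) at which both shaders cost the same."""
    return alu_without_lut - alu_with_lut


def area_version_wins(texture_cycles):
    """True if the LUT shader is slower under the additive model (an assumption)."""
    return ALU_CYCLES_LUT + texture_cycles > ALU_CYCLES_AREA
```

With these figures, `break_even_texture_cycles(ALU_CYCLES_LUT, ALU_CYCLES_AREA)` gives 1.5, so any texture fetch costing more than 1.5 cycles favors the area calculation.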
|
The two multiplies by 0.5 can be combined, but I'm not sure that's going to make a real difference. Feel free to have another pass at optimization :) I'm hoping to get to my own implementation of this tomorrow during my GPU-2D retreat. It's going to be very similar, but I'm going to do a few things slightly differently. One difference is that I'm always computing the whole tile, while you're using the rasterizer to do smaller tile parts. To that end, I'm going to cast horizontal (rather than vertical) rays, and do an early-out when the window is zero-height; I figure these branches should be more coherent in the horizontal case.
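A rough sketch of the zero-height early-out described above, under assumptions of my own: rows are one pixel tall, the "window" is the segment's vertical extent clamped to the current row, and the function names are hypothetical. The real implementation may define the window differently.

```python
def row_window(y0, y1, row_top):
    """Clamp a segment's vertical extent to the unit-height row at row_top."""
    lo = min(max(min(y0, y1) - row_top, 0.0), 1.0)
    hi = min(max(max(y0, y1) - row_top, 0.0), 1.0)
    return lo, hi


def rows_touched(y0, y1, rows):
    """Rows of a tile (0 .. rows-1) where the segment has a non-empty window."""
    touched = []
    for r in range(rows):
        lo, hi = row_window(y0, y1, float(r))
        if hi == lo:
            # Zero-height window: the segment adds no area here, so skip.
            continue
        touched.append(r)
    return touched
```

A segment spanning y = 2.25 to y = 4.5 in a 16-row tile only does work in rows 2, 3, and 4; the other rows take the early-out.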
|
I tried using point sprites to compute the whole tile and it was a performance regression. But in compute it might be different of course.
|
This seems slightly worse in performance than what I have now on my Intel MacBook Pro, but removing the area LUT will reduce complexity, so I may go with it anyway.
|
@raphlinus If your whole-tile-at-a-time approach has significantly better performance than using the rasterizer like PF does, it may be possible to do it on GLES3 without compute shaders by rendering the tiles as point sprites and doing the work in the vertex shader. Of course, the problem is then getting the output to the fragment shader for rendering; for 16x16 tiles you have 256 8-bit alpha values of output. You can pack 16 of them into a uvec4 and do it in 16 uvec4 varyings (64 32-bit components). I fear that the overhead of doing all this packing and unpacking will swamp any gains that we might get, though. In any case, we can probably have a whole-tile compute-based rasterizer option in Pathfinder without changing the CPU-side code too much.
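To make the packing cost concrete, here is a small CPU-side sketch of the bit layout. Taking the 16x16 tile size at face value, a tile holds 16 * 16 = 256 8-bit alpha values; a uvec4 is four 32-bit words and so holds 16 of them, which means 16 uvec4 values (64 words) per tile. The byte order within each word and the function names are assumptions for illustration, not anything Pathfinder defines.

```python
TILE_SIZE = 16          # tile edge in pixels, as stated in the comment above
ALPHAS_PER_UVEC4 = 16   # 4 words x 4 bytes each


def pack_uvec4(alphas16):
    """Pack 16 alpha bytes into 4 uint32 words, low byte first (assumed order)."""
    words = []
    for w in range(4):
        word = 0
        for b in range(4):
            word |= (alphas16[4 * w + b] & 0xFF) << (8 * b)
        words.append(word)
    return tuple(words)


def unpack_uvec4(words):
    """Inverse of pack_uvec4: recover the 16 alpha bytes."""
    return [(word >> (8 * b)) & 0xFF for word in words for b in range(4)]


def pack_tile(alphas):
    """Pack a full tile of alpha values into a list of uvec4-like tuples."""
    if len(alphas) != TILE_SIZE * TILE_SIZE:
        raise ValueError("expected one alpha value per tile pixel")
    return [pack_uvec4(alphas[i:i + ALPHAS_PER_UVEC4])
            for i in range(0, len(alphas), ALPHAS_PER_UVEC4)]
```

Each pixel's alpha costs a shift and a mask on the way in and again on the way out, which is the overhead the comment above is worried about.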
|
I wrote about this topic and others in my piet-metal notes document. Among other things, it has an idea to do a direct LUT from the |
|
Closing this for now. |
raphlinus commented Apr 26, 2019
Replace the lookup table with an area calculation.