Experiment with new shader #137

Closed
wants to merge 1 commit into from

raphlinus commented Apr 26, 2019

Replace the lookup table with an area calculation.
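
The PR replaces a precomputed area lookup table with a direct calculation. As an illustration of the general idea only, here is a scalar Python sketch. It is not the PR's shader, and `pixel_area` and `clamp_ramp_antiderivative` are hypothetical names. The sketch computes exactly how much of a unit pixel lies between the pixel's edge y = py and a line segment, restricted to the segment's x-span. No table is needed because the antiderivative of clamp(t, 0, 1) is piecewise polynomial. Winding sign and accumulation across pixels are deliberately left out.

```python
def clamp_ramp_antiderivative(t: float) -> float:
    """F(t) = integral from 0 to t of clamp(s, 0, 1) ds, with F(t) = 0 for t <= 0."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return t - 0.5
    return 0.5 * t * t


def pixel_area(px: float, py: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Area of pixel [px, px+1] x [py, py+1] with py <= y <= line(x), over the segment's x-span.

    Hypothetical helper for illustration; winding sign and accumulation are omitted.
    """
    if x0 > x1:  # orient the segment left to right
        x0, y0, x1, y1 = x1, y1, x0, y0
    # Window: the part of the pixel column that the segment spans horizontally.
    xa, xb = max(px, x0), min(px + 1.0, x1)
    if xb <= xa:  # no horizontal overlap (this also covers vertical segments)
        return 0.0
    slope = (y1 - y0) / (x1 - x0)
    # Offsets of the line from py at the window's left and right edges
    # (compare the crossings held in `a` in the shader).
    fa = y0 + (xa - x0) * slope - py
    fb = y0 + (xb - x0) * slope - py
    width = xb - xa
    if abs(fb - fa) < 1e-9:  # (nearly) horizontal: avoid dividing by ~0
        return min(max(0.5 * (fa + fb), 0.0), 1.0) * width
    F = clamp_ramp_antiderivative
    # Integral of clamp(f(x), 0, 1) dx for linear f, via the antiderivative.
    return (F(fb) - F(fa)) / (fb - fa) * width
```

For example, `pixel_area(0, 0, 0, 0, 1, 2)` is 0.75. The line y = 2x fills the pixel's full height for x >= 0.5 (area 0.5) and contributes a triangle of area 0.25 before that.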

pcwalton (Collaborator) commented Apr 30, 2019

Interesting, this is less code than I imagined it would be.

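```glsl
// a.x, a.y: y values (plus a 0.5 offset) where the edge crosses the left and right edges of the window
vec2 a = from.y + (window - from.x) * (to.y - from.y) / (to.x - from.x) + 0.5;
float ymin = min(min(a.x, a.y), 1) - 1e-6;
float ymax = max(a.x, a.y);
float b = min(ymax, 1);
```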

pcwalton (Collaborator) commented on these lines, Apr 30, 2019

These should be 1.0 instead of 1 to be standards-compliant, I'm pretty sure (GLSL ES doesn't implicitly convert int to float, so min(float, int) has no matching overload).

pcwalton (Collaborator) commented Apr 30, 2019

The Mali offline compiler reports 4.75 ALU cycles plus 1 texture access for the original shader vs. 6.25 ALU cycles for this one. So it depends on whether the texture access completes within 1.5 cycles. It seems doubtful that a texture access is that fast even when it hits the cache, so I think this is a win. (It could also potentially be optimized further.)
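
The break-even point follows from the reported numbers, under the simple additive cost model the comment implies. That model is an assumption: it treats ALU cycles and texture latency as summing, and it ignores latency hiding and overlap between pipelines.

```latex
t_{\mathrm{LUT}} = 4.75 + t_{\mathrm{tex}}, \qquad t_{\mathrm{area}} = 6.25
t_{\mathrm{area}} < t_{\mathrm{LUT}} \iff t_{\mathrm{tex}} > 6.25 - 4.75 = 1.5 \ \text{cycles}
```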

raphlinus (Author) commented Apr 30, 2019

The two multiplies by 0.5 can be combined, but I'm not sure that's going to make a real difference. Feel free to have another pass at optimization :)

I'm hoping to get to my own implementation of this tomorrow during my GPU-2D retreat. It's going to be very similar, but I'll do a few things slightly differently. One difference is that I always compute the whole tile, while you use the rasterizer to do smaller parts of a tile. To that end, I'm going to cast horizontal (rather than vertical) rays and do an early-out when the window has zero height; I figure these branches should be more coherent in the horizontal case.
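
One possible reading of the whole-tile, horizontal-ray plan is sketched below in Python. It is not the piet-metal code, and `rasterize_tile` and its helpers are hypothetical. Each pixel's coverage is the integral of the winding number over the pixel. An edge adds winding to every point to its right on the same scanline, so for each pixel row the sketch clips every edge to the row's y-range (the "window") and skips the edge when that window has zero height. Assumptions: edges form closed polygons given in tile-local pixel coordinates, the polygon lies inside the tile (no backdrop from edges further left), and the fill rule is nonzero with coverage clamped to 1.

```python
def _ramp_antiderivative(t: float) -> float:
    """Integral from 0 to t of clamp(s, 0, 1) ds (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return t - 0.5
    return 0.5 * t * t


def _mean_clamped(fa: float, fb: float) -> float:
    """Mean of clamp(f, 0, 1) while f varies linearly from fa to fb."""
    if abs(fb - fa) < 1e-9:
        return min(max(0.5 * (fa + fb), 0.0), 1.0)
    return (_ramp_antiderivative(fb) - _ramp_antiderivative(fa)) / (fb - fa)


def rasterize_tile(segments, size=16):
    """Coverage of a size x size tile for closed polygon edges (x0, y0, x1, y1)."""
    acc = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for x0, y0, x1, y1 in segments:
            # Window: the y-range where this edge overlaps the pixel row.
            ylo = max(float(row), min(y0, y1))
            yhi = min(row + 1.0, max(y0, y1))
            if yhi <= ylo:  # early-out: zero-height window (also skips horizontal edges)
                continue
            sign = 1.0 if y1 > y0 else -1.0  # winding direction of the edge
            dxdy = (x1 - x0) / (y1 - y0)
            # x where the edge crosses the window's top and bottom.
            xa = x0 + (ylo - y0) * dxdy
            xb = x0 + (yhi - y0) * dxdy
            for col in range(size):
                # For each y in the window, the edge adds winding to the part of this
                # pixel to its right: clamp(col + 1 - x_edge(y), 0, 1) of the width.
                acc[row][col] += sign * (yhi - ylo) * _mean_clamped(col + 1.0 - xa, col + 1.0 - xb)
    return [[min(abs(v), 1.0) for v in r] for r in acc]
```

The sketch loops over every edge for every pixel to stay short. A real tile renderer would bin edges per tile and carry a backdrop winding in from the left.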

pcwalton (Collaborator) commented Apr 30, 2019

I tried using point sprites to compute the whole tile, and it was a performance regression. With compute shaders it might be different, of course.

pcwalton (Collaborator) commented Apr 30, 2019

This seems slightly worse in performance than what I have now on my Intel MacBook Pro, but removing the area LUT will reduce complexity, so I may go with it anyway.

pcwalton (Collaborator) commented May 4, 2019

@raphlinus If your whole-tile-at-a-time approach performs significantly better than using the rasterizer as PF does, it might be possible to do it on GLES3 without compute shaders by rendering the tiles as point sprites and doing the work in the vertex shader. The problem, of course, is then getting the output to the fragment shader for rendering: a 16x16 tile has 256 8-bit alpha values of output. You can pack 16 of them into a uvec4, so that takes 16 varyings. I fear that the overhead of all this packing and unpacking will swamp any gains we might get, though.

In any case, we can probably have a whole-tile compute-based rasterizer option in Pathfinder without changing the CPU-side code too much.
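
To make the packing cost concrete: a 16x16 tile of 8-bit alpha values is 256 bytes, and a uvec4 holds four 32-bit words, or 16 such values. The Python sketch below models the shift-and-mask work on both sides: the vertex shader would pack and the fragment shader would unpack. The helper names are hypothetical, and putting the first value in each word's low byte is an assumption.

```python
def pack_alphas(alphas):
    """Pack 8-bit alpha values into uvec4-like tuples of four 32-bit words (16 values each)."""
    if len(alphas) % 16 != 0:
        raise ValueError("expected a multiple of 16 alpha values")
    packed = []
    for base in range(0, len(alphas), 16):
        words = []
        for w in range(4):
            word = 0
            for k in range(4):
                a = alphas[base + 4 * w + k]
                if not 0 <= a <= 255:
                    raise ValueError("alpha out of 8-bit range")
                word |= a << (8 * k)  # assumed order: first value in the low byte
            words.append(word)
        packed.append(tuple(words))
    return packed


def unpack_alphas(packed):
    """Inverse of pack_alphas: one shift and one mask per alpha value."""
    return [(word >> (8 * k)) & 0xFF for vec in packed for word in vec for k in range(4)]
```

For a 16x16 tile, `pack_alphas` returns 16 tuples, i.e. 16 uvec4 varyings' worth of data. OpenGL ES 3.0 only guarantees 15 varying vectors (`GL_MAX_VARYING_VECTORS`), so even that may not fit on every device.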

raphlinus (Author) commented May 4, 2019

I wrote about this topic and others in my piet-metal notes document. Among other things, it describes an idea for a direct LUT indexed by the results of the `a` calculation (the y values where the edge crosses the left and right edges of the window). That would need careful evaluation of precision, but it looks appealing.
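
A rough first step toward the precision evaluation mentioned above is sketched below in Python. It tabulates the exact mean coverage over a grid of (left, right) crossing heights, looks values up with nearest-neighbour sampling, and measures the worst error against the exact formula. It is not the design from the piet-metal notes. The following are all assumptions: the crossing range [-1, 2] in pixel units from the row's top edge, the table sizes, nearest-neighbour sampling, and the function names. Multiplying the looked-up mean by the window width gives the covered area.

```python
import math
import random


def mean_coverage(ya: float, yb: float) -> float:
    """Exact mean of clamp(y, 0, 1) as y varies linearly from ya to yb across the window."""
    def F(t):  # antiderivative of clamp(t, 0, 1), zero for t <= 0
        return 0.0 if t <= 0.0 else (t - 0.5 if t >= 1.0 else 0.5 * t * t)
    if abs(yb - ya) < 1e-9:
        return min(max(0.5 * (ya + yb), 0.0), 1.0)
    return (F(yb) - F(ya)) / (yb - ya)


def build_lut(n: int, lo: float, hi: float):
    """n x n table of mean_coverage sampled on a uniform grid over [lo, hi]^2."""
    step = (hi - lo) / (n - 1)
    table = [[mean_coverage(lo + i * step, lo + j * step) for j in range(n)] for i in range(n)]
    return table, lo, step


def lut_lookup(lut, ya: float, yb: float) -> float:
    """Nearest-neighbour lookup; inputs outside [lo, hi] are clamped to the table edge."""
    table, lo, step = lut
    n = len(table)
    i = min(max(int(math.floor((ya - lo) / step + 0.5)), 0), n - 1)
    j = min(max(int(math.floor((yb - lo) / step + 0.5)), 0), n - 1)
    return table[i][j]


def max_lut_error(n: int, lo: float = -1.0, hi: float = 2.0, samples: int = 20000) -> float:
    """Largest |LUT - exact| over random crossing pairs inside the table's range."""
    lut = build_lut(n, lo, hi)
    rng = random.Random(0)
    worst = 0.0
    for _ in range(samples):
        ya, yb = rng.uniform(lo, hi), rng.uniform(lo, hi)
        worst = max(worst, abs(lut_lookup(lut, ya, yb) - mean_coverage(ya, yb)))
    return worst
```

When both crossings lie in [0, 1], the mean is simply (ya + yb) / 2. The table therefore only earns its keep where the edge leaves the window through the top or bottom. Because each partial derivative of the mean lies in [0, 1/2], the nearest-neighbour error is bounded by half the grid step.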

pcwalton (Collaborator) commented Mar 27, 2020

Closing this for now.

pcwalton closed this Mar 27, 2020