
Refactor transform handling in flatten & support miter join #410

Merged: 6 commits into main, Nov 21, 2023

Conversation

@armansito (Collaborator, Author) commented Nov 10, 2023:

This PR implements the miter join style and adds additional test scenes for stroking under non-uniform scale and skew transforms. This includes some changes to transform handling in the flatten stage to keep stroking correct under these transforms:

  • Flattening (including for fills, offset curves, caps, and joins) is now performed in a curve's local coordinate space (pre-transform) and the transform is applied at the time a line gets output (i.e. LineSoup time). This may be unnecessary for fills but the shader currently uses the same code for flattening fills and strokes.

  • The line count estimate for subdivision uses local control point coordinates and makes use of the scale factor which is decomposed from the transform matrix.

  • The bounding box computation is now based on the union of output line segments, which get accumulated at LineSoup time.

This PR is based on #408
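
As a rough illustration of the first bullet, emitting a line might look something like the sketch below. This is a sketch only; transform_apply, the line_soup buffer, and the exact LineSoup fields are assumptions, not necessarily the PR's code:

    // Flattening runs on local (pre-transform) control points; the transform
    // is applied only here, when the line is written out.
    fn output_line_with_transform(path_ix: u32, p0: vec2f, p1: vec2f, transform: Transform) {
        let line_ix = atomicAdd(&bump.lines, 1u);
        var l: LineSoup;
        l.path_ix = path_ix;
        l.p0 = transform_apply(transform, p0);
        l.p1 = transform_apply(transform, p1);
        line_soup[line_ix] = l;
    }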

@armansito armansito changed the base branch from gpu-shitty-strokes to main November 10, 2023 22:13
@armansito armansito changed the base branch from main to gpu-shitty-strokes November 10, 2023 22:49
    // See https://www.iquilezles.org/www/articles/ellipses/ellipses.htm
    // This is the correct bounding box, but we're not handling rendering
    // in the isotropic case, so it may mismatch.
    stroke = 0.5 * linewidth * vec2(length(transform.mat.xz), length(transform.mat.yw));
    bbox += vec4(-stroke, stroke);
    stroke = offset * vec2(length(transform.mat.xz), length(transform.mat.yw));
@armansito (Collaborator, Author):
stroke is now unused. I've been thinking about removing this field from the cubic struct or possibly replacing Cubic with CubicPoints + just the path_ix.
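
A rough sketch of that hypothetical refactor (not part of this PR; the fields are speculative):

    // Just the control points; path_ix would travel alongside
    // instead of being a field of the struct.
    struct CubicPoints {
        p0: vec2f,
        p1: vec2f,
        p2: vec2f,
        p3: vec2f,
    }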

@DJMcNab (Collaborator) left a comment:

Does this also need to be reflected in the CPU shaders, or will those continue to use the kurbo-based stroke expansion?

Otherwise, a couple of tiny style questions.

    let c = tan_prev.x * tan_next.y - tan_prev.y * tan_next.x;
    let d = dot(tan_prev, tan_next);
    let hypot = length(vec2f(c, d));
    let miter_limit = unpack2x16float(style_flags & STYLE_MITER_LIMIT_MASK).x;
@DJMcNab (Collaborator):
I'd probably choose to use the indexing form of vector access (i.e. unpack2x16float(...)[0]) here, as this isn't semantically an x coordinate but just the first 16 bits.

But that's not based on any checking of what the style is elsewhere.

@armansito (Collaborator, Author):

We have some uses of the unpack intrinsics in the fine shader that perform a swizzle using the xyzw accessors, though I suppose changing those to use the color (rgba) accessors would be semantically more correct.

Using [0] here sounds good to me.
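
For reference, the agreed-upon form would be:

    let miter_limit = unpack2x16float(style_flags & STYLE_MITER_LIMIT_MASK)[0];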

        p10: vec2f, p11: vec2f,
        transform: Transform
    ) {
        let line_ix = atomicAdd(&bump.lines, 2u);
@DJMcNab (Collaborator):

Is this semantically equivalent to output_line_with_transform called twice, and only written this way to reduce atomic traffic?

@armansito (Collaborator, Author):

That's exactly right. The only reason I added this variant is to reduce the atomic traffic, and I was outputting pairs of lines frequently enough for joins that adding it seemed worthwhile.

@DJMcNab (Collaborator) Nov 13, 2023:

Thanks, that makes sense.

I'd probably add a comment to that effect, or possibly add a function (output_line_into_allocated_space?) which these both call.

I did notice one case where three lines were handled in a row (when handling miter joins). Handling that case all in one bump could make sense (and could additionally avoid creating some variables). But that would also seem to add quite a lot of noise for tiny gains.

@armansito (Collaborator, Author):

I think this is a good idea. I'm not sure yet how significant the impact of the additional atomic here is, but I suspect it might matter. At any rate, I like the idea of adding an output_line_into_allocated_space helper, as I need something to that effect for round caps and joins, where the line count is typically a lot higher.
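
A minimal sketch of that split, assuming the helper writes into pre-reserved slots (names other than output_line_into_allocated_space, as well as the line_soup buffer and LineSoup fields, are assumptions):

    // Write one line into an already-reserved slot; callers do the atomicAdd.
    fn output_line_into_allocated_space(line_ix: u32, path_ix: u32, p0: vec2f, p1: vec2f) {
        var l: LineSoup;
        l.path_ix = path_ix;
        l.p0 = p0;
        l.p1 = p1;
        line_soup[line_ix] = l;
    }

    // The two-line variant then reserves both slots with a single atomicAdd.
    fn output_two_lines_with_transform(
        path_ix: u32,
        p00: vec2f, p01: vec2f,
        p10: vec2f, p11: vec2f,
        transform: Transform
    ) {
        let line_ix = atomicAdd(&bump.lines, 2u);
        output_line_into_allocated_space(
            line_ix, path_ix,
            transform_apply(transform, p00), transform_apply(transform, p01));
        output_line_into_allocated_space(
            line_ix + 1u, path_ix,
            transform_apply(transform, p10), transform_apply(transform, p11));
    }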

@armansito (Collaborator, Author) commented:

> Does this also need to be reflected in the CPU shaders, or will those continue to use the kurbo-based stroke expansion?

Yes, I'm planning to make the changes to the CPU shaders in a follow-up PR over the next couple of weeks. The kurbo-based stroke expansion will only be used in the "CPU stroke mode" (for lack of a better term) and for dashing in the short term.

The CPU shaders themselves are a CPU incarnation of the GPU shaders with a better test environment, so I'm planning to mirror these changes there pretty much 1:1.

Base automatically changed from gpu-shitty-strokes to main November 14, 2023 00:29
- Miter join style is now supported by the GPU stroker
- Slightly tweaked the miter limit test cases to exercise a miter
  limit boundary condition and test inserting the miter join on both
  sides of a stroke within the same path.
* Flattening (including for fills, offset curves, caps, and joins) is
  now performed in a curve's local coordinate space (pre-transform) and
  the transform is applied at the time a line is output. This may be
  unnecessary for fills but it keeps the code mostly uniform.

* The line count estimate for subdivision factors in the scale factor
  which is decomposed from the transform matrix.

* The bounding box computation is now precise and purely based on the
  union of output line segments.
@raphlinus (Contributor) left a comment:

I'm commenting on the transforms only, as I've done a deep dive. I'll dig into the rest a bit later.

In addition to what's inline:

The handling of scale_factor and transform should be different in the fill and stroke case. In the fill case, the scale_factor should be 1 and the cubic should be transformed before going into flatten_cubic. That gets near-optimal flattening even in the presence of non-angle-preserving transform.

In the stroke case, scale_factor should be computed as stated, and the transform applied post-stroking. That gets correct computation of miters, elliptical pen for round, etc. It's also not optimal flattening (more so the more eccentric the transform), but we've decided the extra complexity for optimal flattening in that case is not worth it.

Let me know if any of this is not clear.
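
A sketch of that policy (flatten_cubic's extended signature and the helper names transform_cubic, matrix_max_scale, and identity_transform are hypothetical):

    if is_fill {
        // Fills: transform the control points up front and flatten with
        // scale_factor = 1, giving near-optimal flattening under any
        // affine transform.
        flatten_cubic(transform_cubic(transform, cubic), 1.0, identity_transform());
    } else {
        // Strokes: flatten and offset in local space, then apply the
        // transform to each output line. scale_factor (the matrix's
        // largest singular value) feeds the subdivision estimate.
        let scale_factor = matrix_max_scale(transform.mat);
        flatten_cubic(cubic, scale_factor, transform);
    }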

    @@ -52,15 +52,15 @@ fn approx_parabola_inv_integral(x: f32) -> f32 {
         return x * sqrt(1.0 - B + (B * B + 0.5 * x * x));
     }

    -fn estimate_subdiv(p0: vec2f, p1: vec2f, p2: vec2f, sqrt_tol: f32) -> SubdivResult {
    +fn estimate_subdiv(p0: vec2f, p1: vec2f, p2: vec2f, sqrt_tol: f32, transform_scale: f32) -> SubdivResult {
@raphlinus (Contributor):

I don't think this needs an additional transform_scale argument, and the scaling is subtly incorrect with it added.

The val return has this meaning: an optimal subdivision of the quadratic into val / 2 subdivisions (which may be fractional) will have an error of sqrt_tol^2 (i.e. the desired tolerance). There are two cases. In the non-cusp case, the error scales as the inverse square of val (doubling val causes the error to be one fourth), so the tolerance is actually not needed for the calculation (which is why it's applied in the caller). In the cusp case, this scaling breaks down and the tolerance parameter is needed to compute the correct result.

In both cases, it's better to handle the scaling in the caller; it's not helpful to do it here.
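
Restating that in code terms (a paraphrase; scaled_sqrt_tol as in the suggestion further below):

    // Non-cusp case: err(n) ~ err(1) / n^2, where n = val / 2 subdivisions,
    // so the caller can apply the tolerance itself:
    //   n = ceil(val * 0.5 / sqrt_tol)   // yields error <= sqrt_tol^2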

@armansito (Collaborator, Author) Nov 14, 2023:

> I don't think this needs an additional transform_scale argument, and the scaling is subtly incorrect with it added.
>
> The val return has this meaning: an optimal subdivision of the quadratic into val / 2 subdivisions (which may be fractional) will have an error of sqrt_tol^2 (i.e. the desired tolerance). There are two cases. In the non-cusp case, the error scales as the inverse square of val (doubling val causes the error to be one fourth), so the tolerance is actually not needed for the calculation (which is why it's applied in the caller). In the cusp case, this scaling breaks down and the tolerance parameter is needed to compute the correct result.

Thank you, this is very illuminating as to how val gets computed (as I didn't understand before why the tolerance was only considered when the signs are different).

I copied your comment into the code as it's a very useful read.

    @@ -139,7 +139,9 @@ fn flatten_cubic(cubic: Cubic) {
         let Q_ACCURACY = ACCURACY * 0.1;
         let REM_ACCURACY = ACCURACY - Q_ACCURACY;
         let MAX_HYPOT2 = 432.0 * Q_ACCURACY * Q_ACCURACY;
         var n_quads = max(u32(ceil(pow(err * (1.0 / MAX_HYPOT2), 1.0 / 6.0))), 1u);
    +    let scale = vec2(length(transform.mat.xz), length(transform.mat.yw));
@raphlinus (Contributor):

The correct math for this is 0.5 * (length(vec2(mat.x + mat.w, mat.y - mat.z)) + length(vec2(mat.x - mat.w, mat.y + mat.z))) (using a bit of shorthand). This is not substantially more complicated or inefficient than what you have, so I'd like to see the real thing.

To see the difference, consider the skew matrix [[1 0] [1 1]]. The correct answer is the golden ratio ≈ 1.618, while the formula given yields sqrt(2) ≈ 1.414.

The formula given is accurate for uniform scaling and rotation, and also for nonuniform axis-aligned scaling, so it's not hard to see why the incorrect equation persists.

@armansito (Collaborator, Author):

That makes sense. I had this initially written similarly to the correct version but the existing formula looked right under my test cases (as backed up by your explanation) and this slipped through the cracks. Fixed this now to do the correct thing.
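
For reference, a sketch of the corrected computation, which is the largest singular value of the 2x2 matrix (variable names assumed):

    // sigma_max of [[x z] [y w]] = 0.5 * (|(x+w, y-z)| + |(x-w, y+z)|)
    let m = transform.mat;
    let scale_factor = 0.5 * (length(vec2(m.x + m.w, m.y - m.z))
                            + length(vec2(m.x - m.w, m.y + m.z)));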

    }
    let params = estimate_subdiv(qp0, qp1, qp2, tol);
    // TODO: Estimate an accurate subdivision count for strokes, handling cusps.
    let tol = sqrt(REM_ACCURACY);
@raphlinus (Contributor) Nov 14, 2023:

I believe the correct way to do scaling is:

    let scaled_sqrt_tol = sqrt(REM_ACCURACY / scale_factor);
    ... apply fudge factor here, which will later be reverted...
    let params = estimate_subdiv(qp0, qp1, qp2, scaled_sqrt_tol);
...
    let n = max(u32(ceil(val * (0.5 / scaled_sqrt_tol))), 1u);

@dfrg (Collaborator) left a comment:

Daniel and Raph already looked at this so I just did an additional sanity check and this all looks reasonable to me. Let's land it!

@armansito (Collaborator, Author) commented:

> Daniel and Raph already looked at this so I just did an additional sanity check and this all looks reasonable to me. Let's land it!

Thanks!

@armansito armansito merged commit 4eb0f51 into main Nov 21, 2023
4 checks passed
@armansito armansito deleted the stroke-join-styles branch November 21, 2023 18:35