New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform 3D precision issues #3070

Closed
kvark opened this Issue Sep 17, 2018 · 11 comments

Comments

Projects
None yet
1 participant
@kvark
Member

kvark commented Sep 17, 2018

We have inconsistent plane splitting results (presumably) because of the thresholds/precision.
Original bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1491088

@kvark

This comment has been minimized.

Member

kvark commented Sep 17, 2018

Visiting the linked page and trying to spin around (using the only control on the page), looks like some of the roads disappear under specific angles:

ff-roads-missing

@kvark

This comment has been minimized.

Member

kvark commented Sep 17, 2018

Let's inspect the page and find the missing road ("road road-2"):

ff-road-inspection

Important things to notice is the local transform offset of 490 and the fact it's rendered as a linear gradient.

@kvark

This comment has been minimized.

Member

kvark commented Sep 18, 2018

Let's make a WebRender capture now. In Firefox, just hit "Ctrl+Shift+3". WebRender is going to save the contents of scenes and built frames on disk, in the wr-capture folder of the current directory of Firefox. On Linux, this would be ~/wr-capture, on other systems it may require launching Firefox specifically from either an accessible folder, or with sufficient privileges. There is bugzilla entry 1487462 to make this configurable.

We switch to the console and check that the capture can be loaded:

cd wr/wrench # Wrench is used to load the captures
git checkout `cat ~/wr-capture/wr.txt` # Check out the exact revision used to save the capture
cargo run -- load ~/wr-capture # Load the capture

ff-road-wrench

You can see that the DPI is different (and I replayed it on a different monitor), but otherwise all is good and ready for us to continue the investigation.

@kvark

This comment has been minimized.

Member

kvark commented Sep 18, 2018

Let's open ~/wr-capture/scene-1-0.ron, it's a fairly big text file describing the scene contents.

Looking for 490 offset (that we say earlier in the inspector for the road element), we quickly find the reference frame:

(
    item: PushReferenceFrame((
        reference_frame: (
            transform: Some(Value((1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 490, 0, 1))),
            perspective: None,
            id: Spatial(10, (1, 135)),
        ),
    )),
    clip_and_scroll: (
        scroll_node_id: Spatial(5, (1, 135)),
        clip_node_id: None,
    ),
    info: (
        rect: ((1, 1), (0, 0)),
        clip_rect: ((1, 1), (0, 0)),
        is_backface_visible: true,
        tag: None,
    ),
),// [61]

Now we can find the linear gradient item attached to this Spatial(10, (1, 135)) frame:

(
    item: SetGradientStops([
        (
            offset: 0,
            color: (
                r: 0.7333333492279053,
                g: 0.7333333492279053,
                b: 0.7333333492279053,
                a: 1,
            ),
        ),// [0]
        (
            offset: 0.24843944609165192,
            color: (
                r: 0.7333333492279053,
                g: 0.7333333492279053,
                b: 0.7333333492279053,
                a: 1,
            ),
        ),// [1]
        (
            offset: 0.24968789517879486,
            color: (
                r: 0.20000001788139343,
                g: 0.20000001788139343,
                b: 0.20000001788139343,
                a: 1,
            ),
        ),// [2]
        (
            offset: 0.9987515807151794,
            color: (
                r: 0.20000001788139343,
                g: 0.20000001788139343,
                b: 0.20000001788139343,
                a: 1,
            ),
        ),// [3]
        (
            offset: 1,
            color: (
                r: 0.7333333492279053,
                g: 0.7333333492279053,
                b: 0.7333333492279053,
                a: 1,
            ),
        ),// [4]
    ]),
    clip_and_scroll: (
        scroll_node_id: Spatial(10, (1, 135)),
        clip_node_id: None,
    ),
    info: (
        rect: ((0, 0), (0, 0)),
        clip_rect: ((0, 0), (0, 0)),
        is_backface_visible: true,
        tag: None,
    ),
),// [64]
(
    item: Gradient((
        gradient: (
            start_point: (359, -0.000000000000021316282072803006),
            end_point: (359, 40.04999923706055),
            extend_mode: Clamp,
        ),
        tile_size: (718, 50),
        tile_spacing: (0, 0),
    )),
    clip_and_scroll: (
        scroll_node_id: Spatial(10, (1, 135)),
        clip_node_id: None,
    ),
    info: (
        rect: ((0, 0), (718, 50)),
        clip_rect: ((0, 0), (718, 50)),
        is_backface_visible: true,
        tag: None,
    ),
),// [65]

The gradient stops appear to match what we saw in the Firefox inspector, so that must be our road.

@kvark

This comment has been minimized.

Member

kvark commented Sep 18, 2018

It would be great to understand what happens to this particular item when it gets processed by WebRender internal stages. Fortunately, chasing infrastructure is meant to solve this. All we need is the local rect. Let's open ~/wr-capture/backend.ron and put this line in the framebuffer config:

chase_primitive: LocalRect(((0, 0), (718, 50))), # previously: Nothing

Once this is set, we just need to re-load the capture in wrench: cargo run -- load ~/wr-capture.
Hmm, the new logging suggests we are tracking more than just a single road:

loaded [DocumentId(IdNamespace(1), 0)]
Chasing PrimitiveIndex(25)
Chasing PrimitiveIndex(27)
Chasing PrimitiveIndex(25)
Chasing PrimitiveIndex(27)

This is undesired. Let's change 50 to 51 in both the chase_primitive field and rect of the item:

chase_primitive: LocalRect(((0, 0), (718, 51))),

No need to rebuild anything, just load the capture again. Here is the full console log:

loaded [DocumentId(IdNamespace(1), 0)]
Chasing PrimitiveIndex(27)
	preparing a run of length 1 in pipeline PipelineId(1, 135)
	effective clip chain from ClipNodeRange { first: 0, count: 0 } (applied)
	updating clip task with pic rect TypedRect(718.0×50.0 at (0.0,0.0))
	segment tasks have been created for clipping
	considered visible and ready with local rect TypedRect(718.0×50.0 at (0.0,0.0))
		BrushSegment { local_rect: TypedRect(718.0×50.0 at (0.0,0.0)), clip_task_id: Opaque, may_need_clip_mask: false, edge_flags: LEFT | TOP | RIGHT | BOTTOM, extra_data: [0.0, 0.0, 0.0, 0.0], brush_flags: (empty) }
	task target TypedRect(718×51 at (857,0))
	PrimitiveHeader { local_rect: TypedRect(718.0×51.0 at (0.0,0.0)), local_clip_rect: TypedRect(718.0×50.0 at (0.0,0.0)), task_address: RenderTaskAddress(5), specific_prim_address: GpuCacheAddress { u: 844, v: 705 }, clip_task_address: RenderTaskAddress(32767), transform_id: TransformPaletteId(0) }
	LinearGradient PrimitiveHeaderIndex(6), task relative bounds TypedRect(634.54034×406.4295 at (865.7369,725.69745))
@kvark

This comment has been minimized.

Member

kvark commented Sep 19, 2018

The primitive log looks fine: the gradient goes through all clipping and culling, generates a single segment and ends up somewhere in a batch list, since the primitive header is printed at the end. Still, it doesn't tell us how it participated in the plane splitting, and that's because only pictures participate in splitting, not individual items. We need to know the picture containing our primitive, and we need to chase that instead.

Here comes the change where we track the owning picture as well as allow specifying the primitive index directly for chasing: #3085

The new output starts like this:

Chasing PrimitiveIndex(27) by local rect
	added to picture primitive PrimitiveIndex(26)

Let's now switch to chasing the owning picture, by going back to backend.ron and modifying the chase target:

chase_primitive: Index((26)),

The new output looks much juicier:

Chasing PrimitiveIndex(26) by index
	preparing a run of length 1 in pipeline PipelineId(1, 135)
	effective clip chain from ClipNodeRange { first: 0, count: 0 } (applied)
	updating clip task with pic rect TypedRect(635.4107×407.43988 at (864.8665,725.69745))
	considered visible and ready with local rect TypedRect(718.0×51.0 at (0.0,0.0))
	task target TypedRect(2560×1411 at (0,0))
	PrimitiveHeader { local_rect: TypedRect(718.0×51.0 at (0.0,0.0)), local_clip_rect: TypedRect(2000000000.0×2000000000.0 at (-1000000000.0,-1000000000.0)), task_address: RenderTaskAddress(290), specific_prim_address: GpuCacheAddress { u: 344, v: 174 }, clip_task_address: RenderTaskAddress(32767), transform_id: TransformPaletteId(16777228) }
		split polygon [(1461.235918898072,1133.1373291015623,517.1855088504277), (864.8665309353021,745.9051088154263,-35.83929913708146), (907.953592402319,725.6974487304686,-64.69882242583208), (1500.277200828006,1083.9445004752108,446.93088361729707)]
		split polygon [(1500.277200828006,1083.9445004752108,446.93088361729707), (1461.235985845073,1133.1373291015623,517.1855088504278), (1461.235918898072,1133.1373291015623,517.1855088504277), (1461.235918898072,1133.1373291015623,517.1855088504277)]
		split polygon [(907.9536653210769,725.6974487304688,-64.69882242583193), (1500.277200828006,1083.9445004752108,446.93088361729707), (907.953592402319,725.6974487304686,-64.69882242583208), (907.953592402319,725.6974487304686,-64.69882242583208)]
@kvark

This comment has been minimized.

Member

kvark commented Sep 19, 2018

Clearly, polygons are generated by the plane splitter. Just to be sure, let's confirm that those are actually rendered in the right place, just occluded by others. For this, we need a GPU capturing tool, like RenderDoc.

In RenderDoc, open the "Launch Application" tab and specify the required paths and parameters:
renderdoc-settings

I often use --no-batch for GPU capturing so that I can look at individual draw calls in more detail. Another useful parameter would be --angle to render with D3D11, but this is only available on Windows.

Before we start, let's force-enable the GPU markers so that it's easier to navigate a capture. The code rightfully checks for the "EXT_debug_marker" extension, but the reality is that almost no drivers publicly advertise it, yet most support the entry points for debug markers. We can force it by modifying the following line in renderer.rs:

let ext_debug_marker = true || device.supports_extension("GL_EXT_debug_marker");

Don't forget to build Wrench again with cargo build. Now we can hit the "Launch" button, wait a bit, then trigger a GPU capture by hitting F12 on the keyboard, then exit with Esc. We can find the "SplitComposite" section in the last pass containing most of the scene:

wr-renderdoc-split

RenderDoc doesn't support pixel history on GL, so we are going to poke around the draw calls with manual bi-section. Eventually, we find event 7798 that draws our road. And it looks great with other roads, all of which gets later occluded by the main grass plane:

wr-renderdoc-roads

Note that this is an alpha pass, and all those draw calls aren't writing to depth buffer. They just draw on top of each other, and whatever ends up visible depends only on the ordering produced by the plane splitter. At least we know that the roads are there, just sorted behind the main plane.

@kvark

This comment has been minimized.

Member

kvark commented Sep 20, 2018

Let's dive deeper into the logic of plane-splitting that we confirmed to be faulty. Rust has standard logging with log crate that WebRender and its dependencies use liberally. We can capture the output of the relevant dependencies (plane-split and binary-space-partition) with this shell command:

RUST_LOG=plane_split=trace,binary_space_partition=debug cargo run --features env_logger -- load ~/wr-capture/ >split.log

Notice a few important things:

  1. we enable the env_logger (optional) dependency, which forces the specified log level to be printed to the stderr, which we forward to a file.
  2. we set RUST_LOG to the list of crate/module prefixes and their logging levels. Default level is error. The crate namespaces have underscores instead of dashes.

When reading the log, we may need to fix some output or add more info. We can do this by overriding the plane-split dependency to a local copy in webrender/Cargo.toml:

#plane-split = "0.13"
plane-split = { path = "../../wr-eco/plane-split" }

The same can be done by linking to a local copy of binary-space-partition, but this time from plane-split's Cargo.toml:

#binary-space-partition = "0.1.2"
binary-space-partition = { path = "../binary-space-partition" }

The extra info I needed was - exact anchors of the cut planes, as well as the plane details of the base.

Plane anchor is a user-defined (from the plane-splitter perspective) integer value, which we specify exactly to be the primitive index. This makes it easy for us to see where the picture (hosting our missing road) participated:

DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 	Cutting anchor 26 by 15
TRACE 2018-09-20T02:01:42Z: plane_split::bsp: 		base Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
DEBUG 2018-09-20T02:01:42Z: plane_split::polygon: 		One is completely outside of the other
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 		Coplanar at -0.0002584101965794616
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 	Cutting anchor 26 by 15
TRACE 2018-09-20T02:01:42Z: plane_split::bsp: 		base Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
DEBUG 2018-09-20T02:01:42Z: plane_split::polygon: 		One is completely outside of the other
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 		Coplanar at -0.00031870985560544796
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 	Cutting anchor 26 by 15
TRACE 2018-09-20T02:01:42Z: plane_split::bsp: 		base Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
DEBUG 2018-09-20T02:01:42Z: plane_split::polygon: 		One is completely outside of the other
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 		Coplanar at -0.00022596033340960275

It looks to be colliding with some primitive index 15. Let's chase it to get more info:

Chasing PrimitiveIndex(15) by index
	preparing a run of length 1 in pipeline PipelineId(1, 135)
	effective clip chain from ClipNodeRange { first: 0, count: 0 } (applied)
	updating clip task with pic rect TypedRect(937.2262×617.1816 at (811.9972,578.9969))
	considered visible and ready with local rect TypedRect(720.0×600.0 at (0.0,0.0))
	task target TypedRect(2560×1411 at (0,0))
	PrimitiveHeader { local_rect: TypedRect(720.0×600.0 at (0.0,0.0)), local_clip_rect: TypedRect(2000000000.0×2000000000.0 at (-1000000000.0,-1000000000.0)), task_address: RenderTaskAddress(290), specific_prim_address: GpuCacheAddress { u: 568, v: 140 }, clip_task_address: RenderTaskAddress(32767), transform_id: TransformPaletteId(16777223) }
		split polygon [(1749.2233886718752,771.9133112728194,1.3043500822873049), (1413.3020169887895,1196.178441187616,607.2176771889777), (811.9972346499076,770.0890569503898,-1.3009567029228848), (1219.8029891465399,578.9968953651626,-274.2088131898918)]
		split polygon [(1219.8029891465399,578.9968953651626,-274.2088131898918), (1749.223388671875,771.9132875213796,1.3043161617201118), (1749.2233886718752,771.9133112728194,1.3043500822873049), (1749.2233886718752,771.9133112728194,1.3043500822873049)]

The local_rect is pretty large for the scene, and we can quickly identify it as the "ground" item in the original page inspector. So the fragment pasted is exactly our point of interest in the code.

@kvark

This comment has been minimized.

Member

kvark commented Sep 20, 2018

(note: this part doesn't teach anything new, it just advances the story)

Before we try to debug the problematic case, let's compare the split.log of our missing road with the one that ends up visible. Going back to inspector, grabbing a visible road, seeing offset 275, and then looking for the corresponding reference frame and linear gradient item in scene.ron, exactly like we did for the original road, and... wait. There isn't exactly a reference frame with those offsets, even though there are similar. It appears that the road we are looking for has a rotation transform, which makes the offset in the combined matrix (that we see in the capture) to be different... Let's change tactics - there are only 7 roads there corresponding to exactly 7 linear gradients in the same order as they are specified in DOM. Counting the 4th one yields the local rectangle of ((0, 0), (400, 50)), which we trace right away:

Chasing PrimitiveIndex(31) by local rect
	added to picture primitive PrimitiveIndex(30)

Now let's find the logging piece involving primitive 30 colliding with 15:

DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 	Cutting anchor 30 by 15
TRACE 2018-09-20T02:01:42Z: plane_split::bsp: 		base Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
DEBUG 2018-09-20T02:01:42Z: plane_split::polygon: 		One is completely outside of the other
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 		Coplanar at 0.00004218902233787958
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 	Cutting anchor 30 by 15
TRACE 2018-09-20T02:01:42Z: plane_split::bsp: 		base Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
DEBUG 2018-09-20T02:01:42Z: plane_split::polygon: 		One is completely outside of the other
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 		Coplanar at 0.00003853979183077172
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 	Cutting anchor 30 by 15
TRACE 2018-09-20T02:01:42Z: plane_split::bsp: 		base Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
DEBUG 2018-09-20T02:01:42Z: plane_split::polygon: 		One is completely outside of the other
DEBUG 2018-09-20T02:01:42Z: plane_split::bsp: 		Coplanar at 0.00003969250064983498

The difference with our problematic primitive 26 is pretty clear: this one always ends up with positive distance to the ground plane, hence getting ordered in front of it.

Our extra logging allows to extract the planes in question:

[15] = Plane { normal: (-0.0000000011748241597928949,-0.8191520123440764,0.5735764819731106), offset: 631.5661998078797 }
[26] = Plane { normal: (-0.0000000011748234773238120,-0.8191519873803397,0.5735765176250158), offset: 631.5662336925388 }
[30] = Plane { normal: (-0.0000000018060725319020477,-0.8191520167368874,0.5735764756995269), offset: 631.5661953014859 }

They are all the same within the precision thresholds. What happens then is that the roads aren't really moved slightly closer to the camera, as I'd expect, but instead they are just drawn with some offset/rotation after the ground plane, and they rely on the fact that Preserve3D elements on the same plane are drawn in the order they are provided. In our current implementation, they are considered slightly different, and limited floating point precision makes the ordering fluctuate unstably.

@kvark

This comment has been minimized.

Member

kvark commented Sep 20, 2018

There is a number of things that could be done to alleviate the precision issues in our case.

First of all, the polygons generated from the same transform should really have the same (normal, offset) pair (what constitutes their "plane"). In order to achieve that, we are going to add a new method to Polygon::from_transformed_rect_with_inverse, which has an extra parameter for the inverse transformation, which we solely use for plane construction. It's also relatively straightforward to test: given a transform, we can construct polygons both ways and check if they end up the same or extremely close to each other.

let normal = TypedVector3D::new(inv_transform.m13, inv_transform.m23, inv_transform.m33).normalize();
let offset = -TypedVector3D::new(transform.m41, transform.m42, transform.m43)
    .dot(normal) * transform.m44;

Secondly, we should detect if planes are co-planar before checking if points of one plane are outside of another plane. This, once again, would ensure that the results are more consistent no matter where the local rectangle was on a plane originally:

let (intersection, dist) = if self.plane.normal
    .dot(poly.plane.normal)
    .approx_eq(&T::one())
{
    debug!("\t\tNormals roughly match");
    (Intersection::Coplanar, self.plane.offset - poly.plane.offset)
} else {
    let intersection = self.intersect(&poly);
    let dist = self.plane.signed_distance_sum_to(&poly);
    (intersection, dist)
};

Thirdly, we should consider planes co-planar more aggressively, so that the original order of elements kicks in more cases. This can be done by just raising the threshold for plane distance evaluation:

-            Intersection::Coplanar if dist.approx_eq(&T::zero()) => {
+            Intersection::Coplanar if is_zero(dist) => {..}

All of these changes went into servo/plane-split#21, which made our wrench capture look much nicer:
wrech-fixed

@kvark

This comment has been minimized.

Member

kvark commented Sep 20, 2018

Before filing the WebRender PR to update plane-split and use the new polygon constructor, let's test the changes on an actual Firefox build. To do that, I need to rebase my local WR changes onto the revision that the checked out Firefox has:

[kvark@carbon firefox]$ cat gfx/webrender_bindings/revision.txt 
5b5b4145ecef117acb02fd1f9b72bf02e85c650b

After rebasing, we can copy over gfx/webrender and/or apply a diff patch to the tree. Note that we could have gone with a patch right away, but I find git to be better at rebasing changes than the patch tool, since it has more detailed info on the timeline of changes.

Having a local build also allows us to launch a try push:

git commit
# have "try: -b do -p win64,linux64,macosx64 -u all[linux64-qr,windows10-64-qr,macosx64-qr] -t all[linux64-qr,windows10-64-qr,macosx64-qr]"  there
git push try

Here comes #3093, and it concludes our bug story. In this live journey, we visited the Inspector, WebRender capturing and chasing infrastructure, logging, RenderDoc, and quite a few very technical details about how plane-splitting works. We didn't touch an actual gdb/lldb/shader debugging session, but hopefully other bug stories will cover that ;)

bors-servo added a commit that referenced this issue Sep 20, 2018

Auto merge of #3093 - kvark:plane-split, r=gw3583
Construct split planes with help of the inversed transform

Fixes #3070
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5eb3a20402b513ccfa2d526b14f32ed707821e92

Meat of the fix is within `plane-split` 0.13.1 update, which we use in this PR in order to construct consistently transformed polygons for splitting.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3093)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment