Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Do not land) Move tile decomposition to frame building. #2513

Closed
wants to merge 5 commits into from

Conversation

@nical
Copy link
Collaborator

nical commented Mar 13, 2018

Step 2 of #2370 (Step 1 being #2507).


This change is Reviewable

@nical
Copy link
Collaborator Author

nical commented Mar 13, 2018

The PR is in a pretty awful state, it is currently more of a proof of concept than than something to land. The awfulness is mostly a lot of code duplication coming from that the way prim_store.rs and batch.rs are structured expect a batch item per primitive, whereas what I am doing here is allowing tiled images to generate several batch items (each tile) per primitive (the whole image is now a primitive instead of being broken up into multiple primitives during scene building).

I think that this can be refactored into something acceptable, but it's possible that more involved architecture changes would be preferable to support splitting primitives during frame building (or it's possible that I misunderstood the code that deals with brush segments and that I can rely on something similar when there may be many tiles). If you think so, please let me know sooner rather than later.

Even once this has been made nice and clean, I don't think it would be reasonable to land before we are also culling tiles during frame building (which this PR doesn't do yet), otherwise this would badly regress performance in some cases.

@nical
Copy link
Collaborator Author

nical commented Mar 13, 2018

One reason we might want want to take the time to generalize the notion that a primitive can be split during frame building is that I think it would be good to move gradient decomposition to the frame building phase as well for the same reasons I am moving image decomposition.

@bors-servo
Copy link
Contributor

bors-servo commented Mar 13, 2018

The latest upstream changes (presumably #2507) made this pull request unmergeable. Please resolve the merge conflicts.

@nical nical force-pushed the nical:lazy-img-tiling branch from 40b70e2 to 4b4e3d3 Mar 13, 2018
@nical
Copy link
Collaborator Author

nical commented Mar 13, 2018

The code is in a better shape now, although I still have to cull tiles before anything can land.

After a bit of digging I am erring on the side of not using the brush segmentation infrastructure for image tiles, because:

  • Individual tiles can sample from different cache textures and end up in different batches.
  • There can be many tiles, I am not sure how reasonable it is to request potentially hundreds of them to be contiguously stored in the gpu cache, whereas individual primitives get their own distinct cache requests.
  • Some of the per-primitive data that is shared by segments needs to be per-tile (local_rect), so it would need extra some work in the brush image shader to make that happen.

All in all it looks quite a bit simpler to split image primitives without relying on the segment infrastructure, although I am not saying it is better in the long run.

The downsides I can think of are that with segments we can share the clip rect between tiles in the gpu cache, and overall having a single system for splitting primitives sounds cleaner (although I am not sure how this holds in practice).

Now is a good time to let me know if you can think of a better way than the approach I am going for in this PR.

cc @glennw @kvark @mrobinson

fix
@glennw
Copy link
Member

glennw commented Mar 13, 2018

@nical I haven't looked at this at all yet, but the comment above about prim_store and batch expecting a single batch item per-primitive doesn't sound quite right.

For example, borders produce 8 instances with 2 shaders for each primitive. I'm not sure if that's relevant here or not.

I think using the segment infrastructure here should work, and will probably work out well. I'll try to take a detailed look at this today and offer some more useful comments.

@glennw
Copy link
Member

glennw commented Mar 14, 2018

(We discussed this on IRC and @nical is going to take a look at using the segment infrastructure and see if there's any gotchas with it).

@nical
Copy link
Collaborator Author

nical commented Mar 14, 2018

As I am digging into this I realize that images are actually not converted to brushes yet (in the majority of cases).
This means that we have to first port at least a good portion of the current image primitive features to the image brush shader.
I think that it's reasonable to drop support for tile spacing in the brush shader (and rely on cpu-side decomposition), however I am less confident about repetition. It's kind of common to see a very thin 1 or 2px wide but very tall repeated background image to emulate gradients (from the old days were proper gradients weren't a thing). There is also the (not deprecated) practice of repeating a very small image patterns over a large area. Using decomposition for these would generated thousands of segments on some sites.

Assuming we do support repeating in the brush image shader, we then need to store per segment:

  • the segment rect (4 floats)
  • the source uv rect (4 floats)
  • the stretch size (2 floats)

Bumping the number of vecs per segment from 2 to 3 (for all brush shaders IIUC). Is this going to be an issue?

I'll start by trying to selectively port over image primitives that have the stretch size equal to the primitive size.\

Edit:

Actually it looks like the UV is stored in a separate allocation and the segment data can get away with storing only a 32bit address which can be packed with the stretch size.

@glennw
Copy link
Member

glennw commented Mar 14, 2018

Some additional options to consider:

You could have a separate GPU cache handle stored in the brush::image primitive. You can write per-segment data to this, store the handle in the user data of the instance, and then fetch from that in the image shader (by adding the segment index). This allows you to store an arbitrary amount of extra data per-segment for images, without affecting other brush primitives.

Another option is to do the image repeating on the CPU via extra instances. To handle those bad cases of 1x1 repeats, what we would do is detect if the repeat is less than a threshold (e.g. 128px) and then use request_render_task to build a pre-repeated texture cache entry. This would drastically reduce the number of instances that we then send on the CPU for these cases. I quite like this idea, it might be a bit of extra work to implement, but it's nice that we then don't pay any extra shader cost for the common case, and just one (cached) render task for this edge case.

@glennw
Copy link
Member

glennw commented Mar 14, 2018

We also discussed on IRC that it's feasible to modify the way brush segment generation works to supply a separate user data field per segment. This is the preferred way to pass texture cache handles, since it avoids any complexities with invalidating GPU cache data if the texture cache entry gets evicted / moved.

@bors-servo
Copy link
Contributor

bors-servo commented Mar 15, 2018

The latest upstream changes (presumably #2517) made this pull request unmergeable. Please resolve the merge conflicts.

@glennw
Copy link
Member

glennw commented Apr 3, 2018

@nical Do we still need this one open?

@nical
Copy link
Collaborator Author

nical commented Apr 3, 2018

Closing in favor of #2572.

@nical nical closed this Apr 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.