This allows a render task to be drawn when the space allocated for
it in the surface cache is smaller than the size required to draw
the complete contents of the task.
The primary motivation for this change is to fix some existing bugs.
Specifically, if an off-screen render task intersected the main
screen rect, we would only allocate enough space in the
surface cache texture for the visible portion. Then, the render
task would be drawn, and in some cases could extend outside the
bounds of the allocated rect for the task. This can result in
image corruption - for instance, a long scrolling region which
is drawn to an off-screen target (due to a filter) could draw
outside its region, affecting other tasks in the same surface
cache texture layer.
A second motivation is performance and power saving. In the future,
we may want to render "tiles" that can be cached and provided to
the OS compositor interfaces, to avoid work during scrolling.
Finally, the implementation also contains the infrastructure necessary
to enable us to completely skip texture-cache sub-render tasks when
a frame is re-rendered, which is a performance gain in some cases.
To achieve this:
* Instead of one alpha-batcher per target, there is one per render task.
* Each alpha-batcher tracks the task-relative, screen-space bounding
rect of primitives added to it.
* Once each alpha-batcher is complete, we check if the allocated rect
for this task completely contains the combined bounding rect of the
primitives inside it.
* If it *does* contain it (common case) then we run a batch merging
step where this set of batches is merged with any other alpha batch
tasks that are also self-contained. This merging step can be very
simple, since we know there are no overlap ordering constraints.
* If it *doesn't* contain it, then we will submit this set of batches
separately to the merged batch list, applying a scissor rect set to
the size of the render task allocation.
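
The steps above can be sketched roughly as follows. This is a minimal
illustration, not the code in this patch: the names `Rect`,
`TaskBatcher`, `TargetBatches` and `build_target` are hypothetical,
batches are reduced to plain integer ids, and both rects are assumed
to be expressed in the same coordinate space.

```rust
/// Axis-aligned rect (hypothetical stand-in for the real rect type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect { x0: i32, y0: i32, x1: i32, y1: i32 }

impl Rect {
    /// True if `other` lies entirely inside `self`.
    fn contains_rect(&self, other: &Rect) -> bool {
        other.x0 >= self.x0 && other.y0 >= self.y0
            && other.x1 <= self.x1 && other.y1 <= self.y1
    }

    /// Smallest rect covering both `self` and `other`.
    fn union(&self, other: &Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }
}

/// One batcher per render task (hypothetical, heavily simplified).
struct TaskBatcher {
    allocated: Rect,      // space reserved for the task in the surface cache
    bounds: Option<Rect>, // combined bounding rect of primitives added so far
    batches: Vec<u32>,    // stand-in for the real batch list
}

impl TaskBatcher {
    fn new(allocated: Rect) -> Self {
        TaskBatcher { allocated, bounds: None, batches: Vec::new() }
    }

    /// Record a batch and grow the tracked bounding rect.
    fn push(&mut self, batch_id: u32, prim_rect: Rect) {
        self.bounds = Some(match self.bounds {
            Some(b) => b.union(&prim_rect),
            None => prim_rect,
        });
        self.batches.push(batch_id);
    }

    /// Does the allocation fully contain everything this task draws?
    fn is_self_contained(&self) -> bool {
        self.bounds.map_or(true, |b| self.allocated.contains_rect(&b))
    }
}

/// Result for one target: a merged list plus scissored leftovers.
#[derive(Default)]
struct TargetBatches {
    merged: Vec<u32>,
    scissored: Vec<(Rect, Vec<u32>)>,
}

fn build_target(tasks: Vec<TaskBatcher>) -> TargetBatches {
    let mut out = TargetBatches::default();
    for task in tasks {
        if task.is_self_contained() {
            // Common case: no overlap with other tasks, so merging is
            // a plain append here (real merging would combine
            // compatible batches).
            out.merged.extend(task.batches);
        } else {
            // Task may draw outside its allocation: keep it separate
            // and clip it to the allocated rect.
            out.scissored.push((task.allocated, task.batches));
        }
    }
    out
}
```

In this sketch, a task whose primitives extend past its allocation
ends up in `scissored` along with the rect to clip to, which is where
the extra draw calls come from.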
The batching results should be as good as or better than before, except
in the genuine case where this fixes a bug, in which case there may be
a small number of extra draw calls.
Since the batching of each render task is independent, and also only
references read-only shared state, we can easily run each batch creation
task on a worker thread, if profiling shows any benefit to that.
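
As a rough illustration of why this is straightforward, the sketch
below builds each task's batches on its own scoped thread while
borrowing the shared state immutably. `FrameState`, `RenderTask`,
`build_batches` and `build_all` are hypothetical names, and spawning
one thread per task is a simplification; a real implementation would
more likely hand the work to a thread pool.

```rust
use std::thread;

/// Hypothetical read-only state shared by every render task.
struct FrameState { prim_sizes: Vec<u32> }

/// Hypothetical render task: the indices of the primitives it draws.
struct RenderTask { prim_indices: Vec<usize> }

/// Stand-in for per-task batch creation. It only reads `state`.
fn build_batches(task: &RenderTask, state: &FrameState) -> Vec<u32> {
    task.prim_indices.iter().map(|&i| state.prim_sizes[i]).collect()
}

/// Build the batches for all tasks in parallel, preserving task order.
fn build_all(tasks: &[RenderTask], state: &FrameState) -> Vec<Vec<u32>> {
    thread::scope(|s| {
        // Scoped threads may borrow `tasks` and `state` directly,
        // because the scope joins every thread before returning.
        let handles: Vec<_> = tasks
            .iter()
            .map(|task| s.spawn(move || build_batches(task, state)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("batch worker panicked"))
            .collect()
    })
}
```

No locking is needed because each worker only writes to its own
output and only reads the shared state.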