Support partial rendering of off-screen pictures / render tasks.

This allows drawing a render task where the allocated space in the surface cache is smaller than the size required to draw the complete contents of the render task.

The primary motivation for this change is to fix some existing bugs. Specifically, if an off-screen render task intersects the main screen rect, we only allocate enough space in the surface cache texture for the visible portion. The render task is then drawn, and in some cases it can extend outside the bounds of the rect allocated for the task. This can result in image corruption - for instance, a long scrolling region which is drawn to an off-screen target (due to a filter) could draw outside its region, affecting other tasks in the same surface cache texture layer.

A second motivation is performance and power saving. In the future, we may want to render "tiles" that can be cached and provided to the OS compositor interfaces, to avoid work during scrolling.

Finally, the implementation also contains the infrastructure necessary to enable us to completely skip texture-cache sub-render tasks when a frame is re-rendered, which is a performance gain in some cases.

To achieve this:

* Instead of one alpha-batcher per target, there is one per render task.
* Each alpha-batcher tracks the task-relative, screen-space bounding rect of the primitives added to it.
* Once each alpha-batcher is complete, we check whether the allocated rect for this task completely contains the combined bounding rect of the primitives inside it.
* If it *does* contain it (the common case), we run a batch merging step where this set of batches is merged with any other alpha batch tasks that are also self-contained. This merging step can be very simple, since we know there are no overlap ordering constraints.
* If it *doesn't* contain it, we submit this set of batches separately to the merged batch list, applying a scissor rect set to the size of the render task allocation.
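
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this change: `Rect`, `TaskBatcher`, `TargetBatches` and `build_target` are hypothetical names, batches are reduced to plain integer ids, and "merging" is simplified to concatenation, whereas a real merge step would combine compatible batches by key.

```rust
/// Axis-aligned rect in task-relative device pixels (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect { x0: i32, y0: i32, x1: i32, y1: i32 }

impl Rect {
    /// Smallest rect covering both `self` and `other`.
    fn union(&self, other: &Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }

    /// True if `other` lies entirely inside `self`.
    fn contains_rect(&self, other: &Rect) -> bool {
        other.x0 >= self.x0 && other.y0 >= self.y0 &&
        other.x1 <= self.x1 && other.y1 <= self.y1
    }
}

/// One batcher per render task (assumption: batches are just ids here).
struct TaskBatcher {
    allocated: Rect,       // space reserved for the task in the surface cache
    bounds: Option<Rect>,  // combined bounding rect of primitives added so far
    batches: Vec<u32>,
}

impl TaskBatcher {
    fn new(allocated: Rect) -> Self {
        TaskBatcher { allocated, bounds: None, batches: Vec::new() }
    }

    /// Record a primitive's batch and grow the tracked bounding rect.
    fn add_primitive(&mut self, batch_id: u32, prim_bounds: Rect) {
        self.bounds = Some(match self.bounds {
            Some(b) => b.union(&prim_bounds),
            None => prim_bounds,
        });
        self.batches.push(batch_id);
    }

    /// Common case: everything drawn fits inside the allocation.
    fn is_self_contained(&self) -> bool {
        self.bounds.map_or(true, |b| self.allocated.contains_rect(&b))
    }
}

/// Result for one target: merged batches plus scissored batch sets.
#[derive(Default)]
struct TargetBatches {
    merged: Vec<u32>,
    scissored: Vec<(Rect, Vec<u32>)>,
}

fn build_target(batchers: Vec<TaskBatcher>) -> TargetBatches {
    let mut out = TargetBatches::default();
    for b in batchers {
        if b.is_self_contained() {
            // No overlap ordering constraints, so merging can be simple.
            out.merged.extend(b.batches);
        } else {
            // Keep separate and clip drawing to the task's allocation.
            out.scissored.push((b.allocated, b.batches));
        }
    }
    out
}
```

In this sketch, a task whose primitives spill outside its allocation costs one extra scissored submission, while every self-contained task still ends up in the shared merged list.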

The batching results should be as good as or better than before, except in the cases where this change genuinely fixes a bug, where there may be a small number of extra draw calls.

Since the batching of each render task is independent, and only references read-only shared state, we can easily run each batch creation task on a worker thread, if profiling shows any benefit to that.
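
As a rough illustration of why independent per-task batching parallelizes easily, the sketch below builds each task's batches on its own scoped thread while borrowing shared state immutably. All names (`SharedState`, `TaskInput`, `build_batches`, `build_all`) are hypothetical, the "batching" work is a stand-in, and spawning one thread per task is only for brevity; a real implementation would more likely use a worker pool.

```rust
use std::thread;

/// Read-only state shared by all batchers (hypothetical).
struct SharedState { device_pixel_scale: f32 }

/// Per-task input (hypothetical): primitive sizes stand in for real primitives.
struct TaskInput { prim_sizes: Vec<f32> }

/// Stand-in for per-task batch creation; only reads `shared`.
fn build_batches(task: &TaskInput, shared: &SharedState) -> Vec<f32> {
    task.prim_sizes
        .iter()
        .map(|s| s * shared.device_pixel_scale)
        .collect()
}

/// Build every task's batches on its own scoped thread.
/// Results are returned in the same order as `tasks`.
fn build_all(tasks: &[TaskInput], shared: &SharedState) -> Vec<Vec<f32>> {
    thread::scope(|scope| {
        // Spawn all workers first so they can run concurrently.
        let handles: Vec<_> = tasks
            .iter()
            .map(|task| scope.spawn(move || build_batches(task, shared)))
            .collect();
        // Then join them in order.
        handles
            .into_iter()
            .map(|h| h.join().expect("batch worker panicked"))
            .collect()
    })
}
```

Because each worker only holds shared references, no locking is needed in this sketch; the borrow checker enforces that the shared state is not mutated while the workers run.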