The fewer places that rely on a full world rect, the easier it
is to do clustered culling, where we accept / reject primitives
in groups.

The main use for primitive world rects is overlap calculations
during batching. Switch batching to use the surface-relative rect
instead. This is safe because batches merged from different
surfaces never overlap in the allocated render target.
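
As a rough illustration of the overlap test this refers to, here is
a minimal sketch. The types and the function name (Rect, Batch,
find_batch_index) are hypothetical simplifications made up for this
example, not the actual code in the patch. The only point is that
the rects being compared are all in the space of one surface, so no
world-space rect is required.

```rust
// Hypothetical, simplified types for illustration only.
// A rect in the local space of a single surface.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl Rect {
    // True if the two rects share any interior area.
    fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1
            && other.x0 < self.x1
            && self.y0 < other.y1
            && other.y0 < self.y1
    }
}

// A batch of primitives that share a key, plus the surface-relative
// rects of the items already added to it.
struct Batch {
    key: u32,
    item_rects: Vec<Rect>,
}

// Walk the batch list from most recent to oldest. A primitive may
// join an earlier batch with the same key only if it does not
// overlap anything in the incompatible batches drawn after it.
fn find_batch_index(batches: &[Batch], key: u32, rect: &Rect) -> Option<usize> {
    for (i, batch) in batches.iter().enumerate().rev() {
        if batch.key == key {
            return Some(i);
        }
        if batch.item_rects.iter().any(|r| r.intersects(rect)) {
            // Merging further back would change the draw order.
            return None;
        }
    }
    None
}
```

When this returns None, the caller would start a new batch for the
primitive.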

Other changes:
 * Refactor the initial picture traversal to use a state object
   that maintains an internal stack of picture / surface info.
   This is easier to reason about, and will be helpful once we
   start using this to pass information about caching state.
 * Use world rect rather than clipped prim world rect for bounds
   during plane-splitting. This landed previously but was
   backed out due to an unrelated bug in that patch.
 * Change get_raster_rects to not calculate the transform, since
   most uses of this method don't require it.
 * Change conservative tiling calculations to use the world rect
   for bounds instead of clipped prim rect. All we're trying to
   do here is reject tiles outside the viewport, so this simplifies
   the code and removes the need for the primitive world rect in
   one more location.
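
The tile rejection described in the last item can be sketched
roughly as follows. This is an illustrative simplification: the
names (WorldRect, visible_tiles) are hypothetical, and it assumes an
axis-aligned grid of square tiles laid out in world space, which may
not match the real tiling code.

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct WorldRect {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl WorldRect {
    // True if the two rects share any interior area.
    fn intersects(&self, other: &WorldRect) -> bool {
        self.x0 < other.x1
            && other.x0 < self.x1
            && self.y0 < other.y1
            && other.y0 < self.y1
    }
}

// Return the (x, y) indices of the tiles that intersect the given
// bounds. Tiles are square, `tile_size` world units on each side,
// with the grid starting at `origin`.
fn visible_tiles(
    origin: (f32, f32),
    tile_size: f32,
    tiles_x: u32,
    tiles_y: u32,
    bounds: &WorldRect,
) -> Vec<(u32, u32)> {
    let mut visible = Vec::new();
    for y in 0..tiles_y {
        for x in 0..tiles_x {
            let tile = WorldRect {
                x0: origin.0 + x as f32 * tile_size,
                y0: origin.1 + y as f32 * tile_size,
                x1: origin.0 + (x + 1) as f32 * tile_size,
                y1: origin.1 + (y + 1) as f32 * tile_size,
            };
            // Conservative test: keep anything touching the bounds.
            if tile.intersects(bounds) {
                visible.push((x, y));
            }
        }
    }
    visible
}
```

Because the test only needs to reject tiles that lie outside the
viewport, a single world rect is enough as the bounds and no
per-primitive clipped rect is involved.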