2D engine documentation

This document describes 2D graphics cores such as the GC320. Do not confuse with the VG core (such as GC355) which is a beefed up 2D core with a completely different interface.

Important: be sure to set the PIPE to 2D before using the 2D engine. Otherwise, the device will hang on the first rendering command and nothing will seem to happen at all.

As the state footprint is pretty small, it is recommended to program all relevant 2D engine state for an operation at once, directly before the command (and before flushing), instead of relying on a context being maintained as with 3D rendering (although that remains possible).

Using the 2D and 3D engines simultaneously within a program can be tricky. Some SoCs, such as the Marvell Armada 510, have 2D and 3D in the same core, whereas others, such as the Freescale i.MX6, have multiple cores. The first case is easy: flush the caches and switch between the PIPEs, though there is some overhead involved. In the latter case, however, the cores run independently and synchronization has to go through the CPU. This necessitates either a stall or a complex queuing mechanism that waits for signals on both cores.

2D commands

The 2D engine supports the following top-level commands:

  • Clear
  • Line
  • Bit blit
  • Stretch blit
  • Multi source blit

2D commands are executed by setting the opcode in register DE.DEST_CONFIG.COMMAND (and other state as necessary) and then queuing DRAW_2D commands in the command stream. These commands can be supplied with up to 256 rectangles which will be drawn using the same state and origin settings.
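Since one DRAW_2D command takes at most 256 rectangles, a longer rectangle list has to be split over several commands. A minimal sketch of that batching on the CPU side; the function name and the (left, top, right, bottom) tuple layout are hypothetical, and only the limit of 256 comes from the text above:

```python
MAX_RECTS_PER_DRAW = 256  # per-command limit stated in the text above


def batch_rectangles(rects, limit=MAX_RECTS_PER_DRAW):
    """Split a rectangle list into chunks that each fit one DRAW_2D command."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [rects[i:i + limit] for i in range(0, len(rects), limit)]


# 600 hypothetical (left, top, right, bottom) rectangles
rects = [(x, 0, x + 1, 1) for x in range(600)]
batches = batch_rectangles(rects)
```

Each batch would then be emitted as its own DRAW_2D command with the same state and origin settings.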

Filter blits are also available as 2D commands, but I was unable to get them to do anything (I don't think the blob does either). Use the video rasterizer for these as described below.

Video rasterizer

The video rasterizer, part of the 2D engine, does hardware scaling using an arbitrary 9-tap separable filter with 5-bit subpixel precision.

It supports the following top-level commands:

  • Horizontal filter blit
  • Vertical filter blit
  • One-pass filter blit

Video rasterizer commands are submitted in a different way from normal 2D commands, as they are triggered by a write to a register instead of through the command stream. The video rasterizer only processes one rectangle per invocation.

Input: Y/U/V planar or Y/U/V interleaved images, or RGBA images

Output: RGBA formats (output to planar is also possible on some chips)

Source and destination formats

Format        Source    Destination     Notes
-----------------------------------------------
A1R5G5B5        +            +
A4R4G4B4        +            +
X1R5G5B5        +            +
X4R4G4B4        +            +
R5G6B5          +            +
A8R8G8B8        +            +
X8R8G8B8        +            +
A8              +            +          8-bit alpha only
MONOCHROME      +            -          1-bit monochrome
INDEX8          +            -          8-bit indexed
YUY2            +            ?          YUV 4:2:2 interleaved per four bytes, Y0 Cb Y1 Cr
UYVY            +            ?          YUV 4:2:2 interleaved per four bytes, Cb Y0 Cr Y1
YV12            +            ?          YUV 4:2:0 8-bit Y plane and 8 bit 2x2 subsampled V and U planes
NV12            +            ?          YUV 4:2:0 8-bit Y plane and interleaved U/V plane with 2x2 subsampling
NV16            +            ?          YUV 4:2:2 8-bit Y plane and interleaved U/V plane with 2x1 subsampling
RG16            +            ?          Bayer interleaved: RGRG..(odd line), GBGB..(even line), 8-bit samples

Additional formats can be supported by using RGBA or UV swizzles.

PE10/PE20

There are two versions of the Pixel Engine (PE) for the 2D pipe, PE10 and PE20. These can be distinguished by the feature bit PE20.

GPUs with feature bit PE20 differ in several features from GPUs without the bit (considered PE10). The registers used for the same feature can also differ; PE20 registers are usually a superset of the PE10 equivalent. Make sure to use the right registers for the PE type, or it will not work.

Commands

Clear

Fills an area with a solid color.

Even though ROP is not used for clears, set it to 0xcc when clearing to avoid loading the pattern or destination.

For PE20 the clear color is specified in register DE_CLEAR_PIXEL_VALUE32, and is always specified in A8R8G8B8 format.

For PE10 the clear color is specified in registers DE_CLEAR_PIXEL_VALUE_LOW and DE_CLEAR_PIXEL_VALUE_HIGH as an 8-byte pattern, so it has to be provided in the format of the target surface. An additional register CLEAR_BYTE_MASK contains a bit mask that specifies which bytes within each 8-byte unit to clear.
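To illustrate the PE10 case, the sketch below replicates one pixel in the target surface format across 8 bytes and splits the result into two 32-bit words. It assumes little-endian pixel storage and that the low register covers bytes 0 to 3 of the pattern; neither is confirmed here, and the helper names are hypothetical.

```python
import struct


def pack_r5g6b5(r, g, b):
    """Pack 8-bit-per-channel RGB into a 16-bit R5G6B5 pixel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def clear_pattern_words(pixel, bytes_per_pixel):
    """Replicate one pixel over 8 bytes and return (low, high) 32-bit words.

    Assumption: little-endian storage, low word = bytes 0..3 of the pattern.
    """
    if bytes_per_pixel not in (1, 2, 4):
        raise ValueError("unsupported pixel size")
    raw = pixel.to_bytes(bytes_per_pixel, "little") * (8 // bytes_per_pixel)
    return struct.unpack("<II", raw)


# Solid red on an R5G6B5 target: the 16-bit pixel 0xF800 repeated four times
low, high = clear_pattern_words(pack_r5g6b5(255, 0, 0), 2)
```

With PE20 none of this is needed, as the color is always given as A8R8G8B8.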

Lines

Draws lines using the Bresenham algorithm.

The first pixel (x1,y1) is drawn; the last pixel (x2,y2) is not.
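The end point rule can be modelled in software with a textbook integer Bresenham loop that stops before emitting (x2,y2). This is only a reference sketch; the hardware's exact pixel choice in tie cases has not been verified against it.

```python
def line_pixels(x1, y1, x2, y2):
    """Bresenham line from (x1, y1) to (x2, y2), with the end point excluded."""
    dx, dy = abs(x2 - x1), -abs(y2 - y1)
    sx = 1 if x1 < x2 else -1
    sy = 1 if y1 < y2 else -1
    err = dx + dy
    x, y = x1, y1
    pixels = []
    while (x, y) != (x2, y2):  # stop before the last pixel
        pixels.append((x, y))
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x += sx
        if e2 <= dx:  # step along y
            err += dx
            y += sy
    return pixels
```

A consequence of this rule is that connected line segments can be chained without drawing the shared vertex twice.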

Monochrome blits

Mono expansion can be used for primitive font rendering or black and white patterns such as checkerboards.

When blitting from LOCATION_STREAM make sure that there are enough bytes available in the stream. Source size is ignored in the case of monochrome blits.

Mono expansion uses registers SRC_COLOR_FG and SRC_COLOR_BG to determine the colors to use for 0 and 1 pixels respectively.

Restrictions:

  • With source LOCATION_STREAM, only one rectangle can be drawn at a time. There is no such restriction for LOCATION_MEMORY.

Raster operations

The raster operation (ROP) is programmed as a foreground and a background code. Even though ROP is not used in CLEAR, HOR_FILTER_BLT, VER_FILTER_BLT and alpha-enabled BIT_BLTs, the ROP code still has to be programmed, because the engine uses it to decide whether source, destination and pattern are involved in the current operation, and the correct decision is essential for the engine to complete the operation as expected.

ROP builds a lookup table for a logical operation with 2, 3 or 4 inputs (depending on ROP type). For a ROP3, for example, the ROP code is 2^3 = 8 bits, one output bit per input combination.

These are the input bits for the ROPs, per ROP type:

ROP2_PATTERN [untested]

  • bit 0: destination
  • bit 1: pattern

ROP2_SOURCE [untested]

  • bit 0: destination
  • bit 1: source

ROP3 (uses ROP_FG only)

  • bit 0: destination
  • bit 1: source
  • bit 2: pattern

ROP4 (uses ROP_FG and ROP_BG)

  • bit 0: destination
  • bit 1: source
  • bit 2: pattern
  • bit "3": foreground/background (ROP_FG / ROP_BG)

ROP3/4 examples:

10101010  0xaa   destination
01010101  0x55   !destination
11001100  0xcc   source
00110011  0x33   !source
11110000  0xf0   pattern
00001111  0x0f   !pattern
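The codes in this table follow directly from the bit assignment for ROP3: the inputs form a 3-bit index (destination in bit 0, source in bit 1, pattern in bit 2), and the code holds the output bit for each index. A small software sketch, with hypothetical helper names, that derives a code from a boolean function and applies it bitwise:

```python
def rop3_code(func):
    """Build the 8-bit ROP3 code for a boolean function func(dst, src, pat)."""
    code = 0
    for index in range(8):
        dst = index & 1          # bit 0: destination
        src = (index >> 1) & 1   # bit 1: source
        pat = (index >> 2) & 1   # bit 2: pattern
        if func(dst, src, pat) & 1:
            code |= 1 << index
    return code


def apply_rop3(code, dst, src, pat, width=32):
    """Evaluate a ROP3 code on width-bit pixel values, one bit at a time."""
    result = 0
    for bit in range(width):
        index = (((dst >> bit) & 1)
                 | (((src >> bit) & 1) << 1)
                 | (((pat >> bit) & 1) << 2))
        result |= ((code >> index) & 1) << bit
    return result


copy_source = rop3_code(lambda d, s, p: s)      # 0xcc, as in the table
xor_source = rop3_code(lambda d, s, p: d ^ s)   # destination XOR source
```

For a ROP4, the same lookup would be done twice, once with ROP_FG and once with ROP_BG, with the fourth input selecting between the two results.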

Patterns

A repeated 8×8 pattern can be used with the 2D engine operations LINE and BIT_BLT. This pattern can be combined with the color using ROP.

Alpha blending

  • The blend equation is always akin to OpenGL's GL_FUNC_ADD: source and destination (each multiplied by its blend factor) are added.

  • Alpha values can come from the source/destination per pixel or a global value defined in the state.
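As an illustration of the additive equation, the sketch below blends one pixel with source-over factors (alpha for the source, 1 - alpha for the destination), taking alpha either per pixel or from a global value. The choice of factors, the rounding and the function names are assumptions for the example; the blend factor modes the hardware actually offers are not described here.

```python
def blend_channel(src, dst, src_factor, dst_factor):
    """Additive blend of one 8-bit channel; factors are in the range 0.0..1.0."""
    value = src * src_factor + dst * dst_factor
    return max(0, min(255, int(value + 0.5)))


def source_over(src_rgba, dst_rgba, global_alpha=None):
    """Blend with per-pixel source alpha, or with a global alpha if given."""
    alpha = (src_rgba[3] if global_alpha is None else global_alpha) / 255.0
    return tuple(
        blend_channel(s, d, alpha, 1.0 - alpha)
        for s, d in zip(src_rgba, dst_rgba)
    )


opaque = source_over((255, 0, 0, 255), (0, 0, 255, 255))
hidden = source_over((255, 0, 0, 255), (0, 0, 255, 255), global_alpha=0)
```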

Rotation and mirroring

  • There are two ways to do source and destination rotation: through register ROT_ANGLE and through register SOURCE_ROTATION_CONFIG / DEST_ROTATION_CONFIG. The former is more flexible and can rotate (0, 90, 180, 270) as well as flip in X and Y. However it is not supported on every GPU (which ones?).

  • There are also two ways to do mirroring: through register ROT_ANGLE and through register CONFIG (enable "mirror blit"). Both methods seem roughly equivalent, but hardware support may be different. Mirroring through ROT_ANGLE seems to be supported with the NEW_2D capability ("mirror blit extension"). This is only available in new hardware (gc880, gc2000).

Miscellaneous notes

Stretch blit

Stretch blit destination format must support alpha blending.

YUV color spaces

The 2D engine supports two YUV color spaces [1]: BT.601 and BT.709. Conversion from YUV to RGB is done in the following way for each of the color spaces:

16 ≤ Y ≤ 235
16 ≤ U ≤ 240
16 ≤ V ≤ 240
A = Y - 16
B = U - 128
C = V - 128

BT.601:

R = clip((298*A + 410*C + 128) >> 8)
G = clip((298*A - 101*B - 209*C + 128) >> 8)
B = clip((298*A + 519*B + 128) >> 8)

BT.709:

R = clip((298*A + 461*C + 128) >> 8)
G = clip((298*A - 55*B - 137*C + 128) >> 8)
B = clip((298*A + 543*B + 128) >> 8)
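The formulas above translate directly into integer code. The following is a software reference sketch, not a description of the hardware data path; it assumes every channel, including BT.601 green, uses the same + 128 rounding term and >> 8 shift, and that clip clamps to 0..255. The blue result is named blue to avoid clashing with the intermediate B = U - 128.

```python
# (V to R, U to G, V to G, U to B) multipliers from the formulas above
COEFFS = {
    "BT.601": (410, 101, 209, 519),
    "BT.709": (461, 55, 137, 543),
}


def clip(value):
    """Clamp to the 8-bit range 0..255."""
    return max(0, min(255, value))


def yuv_to_rgb(y, u, v, space="BT.601"):
    """Convert one limited-range YUV sample to 8-bit RGB."""
    rv, gu, gv, bu = COEFFS[space]
    a, b, c = y - 16, u - 128, v - 128
    # Python's >> floors negative values; clip() then clamps them to 0.
    red = clip((298 * a + rv * c + 128) >> 8)
    green = clip((298 * a - gu * b - gv * c + 128) >> 8)
    blue = clip((298 * a + bu * b + 128) >> 8)
    return red, green, blue
```

For example, yuv_to_rgb(235, 128, 128) yields white and yuv_to_rgb(16, 128, 128) yields black in both color spaces.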

References

[1] Vivante GC320 reference manual (used to be hosted on the Vivante site, but was taken off line)