# Basic Rasterization

__framebuffers, dispatching primitives, instances__ 

A distinctive aspect of a rasterizer is how output is handled. In a compute shader, the generated image is written at arbitrary positions chosen by the shader. In a rasterizer, the color targets and the depth-stencil buffer share the same dimensions and are aligned pixel by pixel, and an implicit stage consumes the color, depth, and stencil values produced for each fragment, operates on them, and updates the respective buffers. The attached color images are written from the fragment shader through `out` locations, while the depth-stencil operation can be defined statically in the pipeline or dynamically through the command buffer manager.
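
To make the implicit output stage concrete, here is a minimal pure-Python sketch of a depth-tested write. It is a conceptual model, not vulky code: the function `write_fragment` and the list-based buffers are illustrative names, and the "less than" comparison is an assumption (the comparison is configurable in a real pipeline).

```python
# Conceptual model of the implicit output stage of a rasterizer.
# Not vulky code: `write_fragment` and the list-based buffers are illustrative.

WIDTH, HEIGHT = 4, 4
CLEAR_COLOR = (0.0, 0.0, 0.4, 1.0)

# Color and depth buffers share the same dimensions, one entry per pixel.
color_buffer = [[CLEAR_COLOR for _ in range(WIDTH)] for _ in range(HEIGHT)]
depth_buffer = [[1.0 for _ in range(WIDTH)] for _ in range(HEIGHT)]


def write_fragment(x, y, depth, color):
    """Apply an assumed 'less than' depth test; update both buffers on success."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False


# Three fragments land on the same pixel; the nearest one (smallest depth)
# ends up in the color buffer.
far_written = write_fragment(1, 2, 0.8, (1.0, 0.0, 0.0, 1.0))
near_written = write_fragment(1, 2, 0.3, (0.0, 1.0, 0.0, 1.0))
rejected = write_fragment(1, 2, 0.5, (0.0, 0.0, 1.0, 1.0))
```

The fragment shader only produces the color; the comparison and the buffer updates happen without any explicit store in shader code, which is the contrast with the compute case.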

In this tutorial we will render several quads to show how graphics pipelines are set up and how instancing works.

In [None]:
import vulky as vk
import torch

vk.create_device(debug=True)

Next, some constants to parameterize the example:

In [None]:
SCREEN_WIDTH = 512
SCREEN_HEIGHT = 512
NUMBER_OF_SPRITES = 30

Let's create the images for the render target and the depth buffer. They will have the same dimensions. vulky automatically chooses the depth-stencil format with the best precision.

In [None]:
render_target = vk.render_target(
    image_format=vk.Format.VEC4,
    width=SCREEN_WIDTH,
    height=SCREEN_HEIGHT
)

depth_buffer = vk.depth_stencil(
    width=SCREEN_WIDTH,
    height=SCREEN_HEIGHT
)

We will define a buffer with the properties for the sprites, one for each instance.

In [None]:
sprite_properties = vk.structured_buffer(
    count=NUMBER_OF_SPRITES,
    element_description=dict(
        offset=vk.vec3,
        size=float,
        color=vk.vec4,
    )
)

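Now let's populate it randomly. Each element represents a box on the screen: the x and y components of the offset place the box, the z component is remapped from [-1, 1] to [0, 1] and used as depth, and the color fills the box.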

In [None]:
with sprite_properties.map(mode='in', clear=True) as b:
    b.offset = torch.rand(NUMBER_OF_SPRITES, 3)*2 - 1.0
    b.offset.z *= 0.5
    b.offset.z += 0.5
    b.size = torch.randn(NUMBER_OF_SPRITES, 1)*0.05
    b.color = vk.vec4(0.5, 0.5, 0.5, 0.5) + torch.randn(NUMBER_OF_SPRITES, 4)*0.1
    # b.color is a vec4 field: b.color[0] is not the first color but the first component of every color
    b.color[0] = 1.0  # set the red component of all 30 colors to 1.0
    # casting to torch.Tensor restores regular tensor indexing, so this is also valid:
    b.color.as_subclass(torch.Tensor)[:NUMBER_OF_SPRITES//2, 3] = 1.0
    # this sets the alpha component of the first 15 colors to 1.0

Mapped buffers are accessed through their fields. Assigning to a field copies a tensor or a constant into that field for every element of the buffer. Notice that when the field is a vector or a matrix, indexing selects components rather than elements, as `b.color[0]` did above. Next, let's define the vertex and fragment shader code.

In [None]:
vertex_shader_code = """
#version 450
#extension GL_EXT_scalar_block_layout: enable

layout(location = 0) out vec4 out_color;

struct SpriteInfo
{
    vec3 offset;
    float size;
    vec4 color;
};

layout(scalar, set=0, binding=0) buffer SpriteInfos {
    SpriteInfo[] data;
} infos;

vec2[] quad = {
    vec2(-1.0, -1.0), 
    vec2(1.0, -1.0),
    vec2(-1.0, 1.0),
    vec2(-1.0, 1.0),
    vec2(1.0, -1.0),
    vec2(1.0, 1.0)
};

void main()
{
    SpriteInfo info = infos.data[gl_InstanceIndex];
    vec4 P = vec4(vec3(quad[gl_VertexIndex], 0)*info.size + info.offset, 1.0);
    gl_Position = P;
    out_color = info.color;
}
"""
fragment_shader_code = """
#version 450
layout(location = 0) in vec4 in_color;
layout(location = 0) out vec4 out_color;
void main()
{
    out_color = in_color;
}
"""
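
The `SpriteInfo` struct in the vertex shader is read with the `scalar` block layout, where each member is aligned to its scalar component (4 bytes for a `float`). The sketch below uses Python's `struct` module to work out the resulting byte offsets and stride. It assumes the buffer created with `vk.structured_buffer` matches this layout, which is not stated explicitly here; the sample values are hypothetical.

```python
import struct

# Scalar block layout: members are aligned to 4 bytes, so fields are
# tightly packed. "<" selects little-endian standard sizes with no padding.
SPRITE_FORMAT = "<3f f 4f"   # vec3 offset, float size, vec4 color

stride = struct.calcsize(SPRITE_FORMAT)

field_offsets = {
    "offset": 0,
    "size": struct.calcsize("<3f"),
    "color": struct.calcsize("<3f f"),
}

# Pack one hypothetical sprite the way the shader would read it.
record = struct.pack(SPRITE_FORMAT, 0.25, -0.5, 0.5, 0.05, 1.0, 0.5, 0.5, 1.0)

print(stride, field_offsets, len(record))
# prints: 32 {'offset': 0, 'size': 12, 'color': 16} 32
```

Each sprite occupies 32 bytes, so `infos.data[gl_InstanceIndex]` reads the record starting at byte `32 * gl_InstanceIndex`.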

Notice that the vertex shader relies only on the ```gl_VertexIndex``` and ```gl_InstanceIndex``` builtins; it declares no vertex attributes. Now, let's define the pipeline object, which ties together the shaders, the resource bindings, and the framebuffer targets.

In [None]:
pipeline = vk.pipeline_graphics()
pipeline.attach(0, render_target=vk.Format.VEC4)
pipeline.attach(1, depth_buffer=vk.Format.DEPTH_STENCIL)
pipeline.layout(set=0, binding=0, sprite_properties=vk.DescriptorType.STORAGE_BUFFER)
with pipeline.shader_stages(vk.ShaderStage.VERTEX):
    pipeline.load_shader_from_source(vertex_shader_code)
with pipeline.shader_stages(vk.ShaderStage.FRAGMENT):
    pipeline.load_shader_from_source(fragment_shader_code)
pipeline.close()

Once the pipeline is closed, its layout is fixed and derived objects can be created: framebuffers and descriptor set collections. The framebuffer object defines the images that are bound to the pipeline before execution. In Vulkan, a render pass (which involves a framebuffer) can define subpasses to optimize for dependencies between targets. vulky simplifies this to a single subpass.

In [None]:
framebuffer = pipeline.create_framebuffer(
    width=SCREEN_WIDTH,
    height=SCREEN_HEIGHT,
    render_target=render_target,
    depth_buffer=depth_buffer
)

The pipeline is also used to create the descriptor set, which in this example binds the buffer with all sprite properties.

In [None]:
global_bindings = pipeline.create_descriptor_set_collection(set=0, count=1)
global_bindings[0].update(sprite_properties=sprite_properties)

Next, we will populate a command buffer. As an example, we will record the commands with a manager and freeze it before submitting, which is how vulky allows the same command buffer to be re-submitted. For the single submission that follows, this is equivalent to using the manager within a context; the frozen buffer is re-submitted later for the animation.

In [None]:
man = vk.graphics_manager()

The render target image starts in the render-target layout, and we declare that with `use`. This call has no effect in the command buffer; it only starts tracking the state of the resource for the purpose of barriers and layout transitions.

In [None]:
man.use(render_target, vk.ImageUsage.RENDER_TARGET)

Now we can clear the render target with dark blue. To do so, we transition it to a general layout (use ANY), clear the color, and then transition it back to the layout that is optimal for rendering (use RENDER_TARGET). Depth-stencil images don't need an explicit transition because it is done internally.

In [None]:
man.image_barrier(render_target, vk.ImageUsage.ANY)
man.clear_color(render_target, (0.0, 0.0, 0.4, 1.0))
man.image_barrier(render_target, vk.ImageUsage.RENDER_TARGET)

man.clear_depth_stencil(depth_buffer, 1.0, 0)
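
The difference between declaring a usage and recording a barrier can be pictured with a toy state tracker. This is only a mental model under stated assumptions: `LayoutTracker` and its methods are hypothetical names and do not describe vulky's internals.

```python
# Toy model of per-resource state tracking. Not vulky's implementation:
# `LayoutTracker` and its methods are hypothetical names.

class LayoutTracker:
    def __init__(self):
        self.current = {}      # resource name -> usage currently assumed
        self.commands = []     # commands that would reach the command buffer

    def use(self, resource, usage):
        """Declare the starting usage; records no command."""
        self.current[resource] = usage

    def image_barrier(self, resource, usage):
        """Record a transition from the tracked usage to the new one."""
        previous = self.current[resource]
        self.commands.append(("barrier", resource, previous, usage))
        self.current[resource] = usage

    def clear_color(self, resource, color):
        """Record a clear; the tracked usage is left unchanged."""
        self.commands.append(("clear", resource, color))


tracker = LayoutTracker()
tracker.use("render_target", "RENDER_TARGET")
tracker.image_barrier("render_target", "ANY")
tracker.clear_color("render_target", (0.0, 0.0, 0.4, 1.0))
tracker.image_barrier("render_target", "RENDER_TARGET")

for command in tracker.commands:
    print(command)
```

Only three commands are recorded. The `use` call contributes none, but without it the first barrier would not know which layout it is transitioning from.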

Then, let's set the pipeline, the framebuffer, and the global descriptor set.

In [None]:
man.set_pipeline(pipeline)
man.set_framebuffer(framebuffer)
man.bind(global_bindings[0])

At this point the graphics pipeline and the implicit render pass are set up to draw primitives. By default, the pipeline assumes a triangle topology, which can be changed. The next instruction dispatches 30 instances of 6 vertices (2 triangles each).

In [None]:
man.dispatch_primitives(vertices=6, instances=NUMBER_OF_SPRITES)
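
Conceptually, the draw above runs the vertex shader once for every (instance, vertex) pair and assembles each run of three vertices into a triangle. The following pure-Python sketch mirrors that process for the shader used in this tutorial. The functions `vertex_shader` and `draw` and the two sample sprites are illustrative assumptions, not vulky API.

```python
# Conceptual model of an instanced draw. Not vulky code: `vertex_shader`
# and `draw` are illustrative stand-ins for what the GPU does.

QUAD = [
    (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0),   # first triangle
    (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0),     # second triangle
]


def vertex_shader(vertex_index, instance_index, sprites):
    """Mirror of the GLSL main(): scale the quad corner, then translate it."""
    offset, size = sprites[instance_index]
    qx, qy = QUAD[vertex_index]
    return (qx * size + offset[0], qy * size + offset[1], offset[2], 1.0)


def draw(vertices, instances, sprites):
    """Run the vertex shader for every (instance, vertex) pair and group
    each run of three consecutive vertices into one triangle."""
    triangles = []
    for instance_index in range(instances):
        positions = [vertex_shader(v, instance_index, sprites)
                     for v in range(vertices)]
        for first in range(0, vertices, 3):
            triangles.append(positions[first:first + 3])
    return triangles


# Two hypothetical sprites: (offset xyz, size).
sprites = [((0.0, 0.0, 0.5), 0.1), ((0.5, -0.5, 0.2), 0.05)]
triangles = draw(vertices=6, instances=2, sprites=sprites)
print(len(triangles))   # 2 instances * 2 triangles each
```

With 30 instances the same loop yields 60 triangles, all generated from the six hard-coded corners and the per-instance record selected by the instance index.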

After populating the command buffer, we use freeze to finish recording and prepare it for submission.

In [None]:
man.freeze()

Now we ask vulky to submit the command buffer. The difference between ```close``` and ```freeze``` is that, when closed, the inner command buffer is released automatically so it can be reused for recording, whereas ```freeze``` keeps it available to be re-submitted if needed.

In [None]:
vk.submit(man)

Next we copy the render target to a staging buffer. This time the render target must first be transitioned to the general layout (use ANY).

In [None]:
staging = vk.tensor(render_target.height, render_target.width, 4) 
with vk.graphics_manager() as b:
    b.use(render_target, vk.ImageUsage.RENDER_TARGET)
    b.image_barrier(render_target, vk.ImageUsage.ANY)
render_target.save(staging)

In [None]:
import matplotlib.pyplot as plt
plt.imshow(staging.cpu())
plt.gca().axis('off')
plt.tight_layout(pad=0.0)
plt.show()

Let's prepare a simple animation for this example.

In [None]:
video_data = vk.tensor(100, SCREEN_HEIGHT, SCREEN_WIDTH, 3)  # 100 frames
initial_offset = torch.rand(NUMBER_OF_SPRITES, 3)*2 - 1.0
initial_offset[..., 2] *= 0.5
initial_offset[..., 2] += 0.5

for i in range(len(video_data)):
    alpha = i / len(video_data)
    # update the buffer
    # we map with 'inout' because only one field changes; the others must be preserved
    with sprite_properties.map(mode='inout') as b:  
        b.offset = initial_offset
        b.offset.y += torch.abs(torch.sin(((initial_offset[...,0] * 100)%1.0)*10 + alpha*30))*0.1
        b.offset.x += torch.cos(((initial_offset[...,1]*40)%1.0)*10 + alpha*4)*0.3
    # re-submit the frozen commands to the GPU
    with vk.graphics_manager() as b:  # transition from general to render target before rendering
        b.use(render_target, vk.ImageUsage.ANY)
        b.image_barrier(render_target, vk.ImageUsage.RENDER_TARGET)
    vk.submit(man)  # by default, waits until execution finishes
    with vk.graphics_manager() as b:  # transition from render target to general before saving
        b.use(render_target, vk.ImageUsage.RENDER_TARGET)
        b.image_barrier(render_target, vk.ImageUsage.ANY)
    render_target.save(staging)
    video_data[i] = staging[...,:3]  # copy current frame to video (only RGB)
    
vk.save_video(video_data, 'teaser3.webp', 10)
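
The motion applied in the loop above can be checked for a single sprite with a scalar version of the same formulas. This is a sketch using only the `math` module; `animated_offset` and the sample initial offset are hypothetical names and values chosen for illustration.

```python
import math


def animated_offset(x0, y0, alpha):
    """Scalar version of the per-frame update for one sprite."""
    # The phase of each motion is derived from the other initial coordinate.
    bounce_phase = ((x0 * 100) % 1.0) * 10 + alpha * 30
    sway_phase = ((y0 * 40) % 1.0) * 10 + alpha * 4
    y = y0 + abs(math.sin(bounce_phase)) * 0.1
    x = x0 + math.cos(sway_phase) * 0.3
    return x, y


frames = 100
x0, y0 = 0.25, -0.5   # hypothetical initial offset of one sprite
path = [animated_offset(x0, y0, i / frames) for i in range(frames)]

max_rise = max(y - y0 for _, y in path)
max_sway = max(abs(x - x0) for x, _ in path)
print(f"vertical bounce <= {max_rise:.3f}, horizontal sway <= {max_sway:.3f}")
```

Because of the absolute value, the vertical term only ever adds between 0 and 0.1 to the initial y, which produces the bouncing look, while the horizontal term sways within 0.3 of the initial x.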

In [None]:
import moviepy.editor
moviepy.editor.ipython_display("teaser3.webp", filetype='image')