Minimal Compute Shader Example #4834

Closed
clayjohn opened this issue Apr 12, 2021 · 35 comments
@clayjohn
Member

clayjohn commented Apr 12, 2021

Your Godot version:
4.0 dev

Issue description:
Before the 4.0 alpha releases we will need to put together some tutorials on using the new RenderingDevice API. For starters, we should write a compute shader tutorial that explains the code below:

func _ready():
	
	# We will be using our own RenderingDevice to handle the compute commands
	var rd = RenderingServer.create_local_rendering_device()
	
	# Create shader and pipeline
	var shader_file = load("res://compute_example.glsl")
	var shader_bytecode = shader_file.get_bytecode()
	var shader = rd.shader_create(shader_bytecode)
	var pipeline = rd.compute_pipeline_create(shader)
	
	# Data for compute shaders has to come as an array of bytes
	var pba = PackedByteArray()
	pba.resize(64)
	for i in range(16):
		pba.encode_float(i*4, 2.0)
	
	# Create storage buffer
	# (Passing initial data is optional; a buffer can also be created with just a size)
	var storage_buffer = rd.storage_buffer_create(64, pba)
	
	# Create uniform set using the storage buffer
	var u = RDUniform.new()
	u.uniform_type = RenderingDevice.UNIFORM_TYPE_STORAGE_BUFFER
	u.binding = 0
	u.add_id(storage_buffer)
	var uniform_set = rd.uniform_set_create([u], shader, 0)
	
	# Start compute list to start recording our compute commands
	var compute_list = rd.compute_list_begin()
	# Bind the pipeline, this tells the GPU what shader to use
	rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
	# Binds the uniform set with the data we want to give our shader
	rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
	# Dispatch 2x1x1 (XxYxZ) work groups of 8x1x1 invocations each, covering all 16 floats
	rd.compute_list_dispatch(compute_list, 2, 1, 1)
	#rd.compute_list_add_barrier(compute_list)
	# Tell the GPU we are done with this compute task
	rd.compute_list_end()
	# Force the GPU to start our commands
	rd.submit()
	# Force the CPU to wait for the GPU to finish with the recorded commands
	rd.sync()
	
	# Now we can grab our data from the storage buffer
	var byte_data = rd.buffer_get_data(storage_buffer)
	for i in range(16):
		print(byte_data.decode_float(i*4))
	

And compute_example.glsl (the file loaded above):

#[compute]

#version 450

layout(local_size_x = 8, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0, std430) buffer ColorBuffer {
	float data[];
} color_buffer;

void main() {
	color_buffer.data[gl_GlobalInvocationID.x] = gl_GlobalInvocationID.x;
}
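
For reference: a 2x1x1 dispatch of 8x1x1 work groups gives 16 invocations, so the loop at the end should print the values 0 through 15, overwriting the 2.0s written from GDScript. One detail the snippet skips is cleanup; a minimal sketch, using the variables from the example:

# RIDs created through a RenderingDevice are not reference counted,
# so free them once the results have been read back.
# (Dependents first: the uniform set and pipeline reference the others.)
rd.free_rid(uniform_set)
rd.free_rid(pipeline)
rd.free_rid(storage_buffer)
rd.free_rid(shader)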
@aroidzap

aroidzap commented Sep 2, 2021

For newer Godot builds you have to change:

var shader_bytecode = shader_file.get_bytecode()
var shader = rd.shader_create(shader_bytecode)

to:

var shader_spirv = shader_file.get_spirv()
var shader = rd.shader_create_from_spirv(shader_spirv)

@underdoeg

underdoeg commented Feb 10, 2022

It would be nice to also have an example of how to modify a texture with a compute shader. I tried something, but wasn't able to use a texture created with the RenderingDevice in a Control (something like this):

rd = RenderingServer.create_local_rendering_device()
var fmt = RDTextureFormat.new()
fmt.width = 64
fmt.height = 64
fmt.format = RenderingDevice.DATA_FORMAT_R8G8B8A8_UINT
fmt.usage_bits = RenderingDevice.TEXTURE_USAGE_CAN_UPDATE_BIT
var view = RDTextureView.new()
tex = rd.texture_create(fmt, view, [img.get_image().get_data()])
# this will not work because tex is not part of a lookup table in renderer storage
RenderingServer.canvas_item_add_texture_rect(get_canvas_item(),  get_rect(), tex);
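
(A stopgap that does work is a CPU round-trip. A sketch against the current 4.x API, assuming the rd/tex variables above, an 8-bit RGBA format created with TEXTURE_USAGE_CAN_COPY_FROM_BIT so it can be read back, and a hypothetical TextureRect node to display it:)

# Read the texture back to the CPU and rebuild a displayable ImageTexture
var data = rd.texture_get_data(tex, 0) # layer 0
var img = Image.create_from_data(64, 64, false, Image.FORMAT_RGBA8, data)
$TextureRect.texture = ImageTexture.create_from_image(img)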

@clayjohn
Member Author

Here is a gist from another person who adapted the above sample to a more recent version of the engine. https://gist.github.com/winston-yallow/aab20fa437bfa3dd80bfb0ed6605d28e

@mbrlabs

mbrlabs commented Apr 12, 2022

Here are some simple real-world applications of Compute Shaders that could be added to the godot-demo-projects repository:

  • Calculate a Mandelbrot set in a compute shader using n GPU threads and export it as a PNG. Also show it as a sprite in Godot
  • Various image filters: color inversion, grayscale, Gaussian blur, edge detection, etc.
  • Mesh generation. Could be simple like meshing 1000 cubes in a 10x10x10 grid or more complex like meshing a voxel volume.

@Calinou
Member

Calinou commented Apr 12, 2022

Various image filters: color inversion, grayscale, Gaussian blur, edge detection, etc.

Isn't this better done with standard fragment shaders like in the 2D Sprite Shaders demo?

@mbrlabs

mbrlabs commented Apr 12, 2022

Yeah, probably. I just listed them because they are simple and approachable problems that demonstrate the aspects of compute shaders almost everyone needs when developing their own:

  • input of lots of data
  • output of lots of data
  • distribution of data/work packets across GPU cores
  • using the output

@winston-yallow
Contributor

Isn't this better done with standard fragment shaders like in the 2D Sprite Shaders demo?

I think there is value in showing how to do it in compute shaders. Not necessarily for the image filters listed, but in a more general sense. Manipulating textures this way has quite a few advantages (for example only regenerating them on demand) and can be used for stuff like generating flow maps for fluid sims.

@ajunca

ajunca commented Jun 13, 2022

I'm trying to achieve part of what was discussed above, sharing a texture between the compute pipeline and a viewport, but without success. The gist with the updated code was very helpful for getting a basic compute shader up and running, but I'm really stuck on sharing a texture. Any news/example/anything on that?

I tried two things:

1. Create the texture from within the local RenderingDevice. Binding it to an image2D in the compute shader works. Binding the texture to the viewport's shader material doesn't raise any warning or error, but I can't get the actual image data inside the shader.
2. Create the texture outside the local RenderingDevice (for example, as a normal ImageTexture). Binding it to the viewport works fine, but binding it to the compute shader complains that the texture is not valid.

It seems that resources can only be shared from the local RenderingDevice to the RenderingServer, but not vice versa? If so, approach 1 is somehow the way. But I may be totally wrong altogether (my insight here is very shallow).

@clayjohn
Member Author

Right now it isn't possible to share resources between RenderingDevices. It is something we plan on adding in the future, but we haven't had time to figure it out quite yet.

@ajunca

ajunca commented Jun 13, 2022

Thanks, I understand. I'm guessing it also isn't possible to copy data between RenderingDevices entirely on the GPU, correct? Without doing GPU->CPU->GPU?

@clayjohn
Member Author

Thanks, I understand. I'm guessing it also isn't possible to copy data between RenderingDevices entirely on the GPU, correct? Without doing GPU->CPU->GPU?

Not yet, no.

@EpEpDragon

I'm trying to render a bunch of similar meshes over an area. I figured out the code above, but I'm having trouble seeing what I need to add to draw meshes (from predefined triangles). Do I need to define a render pipeline, or does it fit into the compute pipeline? I saw some draw-related functions that seem to belong to a render pipeline, but I'm unsure of what I need to call, and when.

@winston-yallow
Contributor

The compute stuff is not made for rendering. I don't think there even is a way to share data from a compute shader to the main rendering device in Godot. You would need to pass it back to the CPU and then send it to the GPU again for rendering.

If you are just looking to render many meshes that share the same geometry, then MultiMeshes may be better suited for this task than a compute shader.

@EpEpDragon

Using MultiMesh was my first thought, but individual instances don't get frustum culled. And since the objects are very dense and cover large areas (grass blades), having per-instance culling would help a lot with performance. So the solution seems to be to calculate the culled transform buffer and instance count in a compute shader and then let the renderer draw from it.

I know how to do this in Unity using the function "DrawMeshInstancedIndirect", which reads from a GPU buffer. If there were a way to make MultiMesh get its data from a GPU buffer that would be perfect, but that doesn't seem to be the case.

@Calinou
Member

Calinou commented Jul 5, 2022

Using MultiMesh was my first thought, but individual instances don't get frustum culled. And since the objects are very dense and cover large areas (grass blades), having per-instance culling would help a lot with performance.

You can use several MultiMeshInstances to reduce the number of draw calls while still benefiting from frustum culling and occlusion culling. This also makes distance-based culling more viable (as discarding pixels in a shader isn't nearly as fast as actually culling the instance).

That said, Godot 4 automatically uses Vulkan instancing when possible, so you may find that using individual MeshInstance nodes is fast enough already.
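
A sketch of that chunking idea, assuming Godot 4.x class names; build_chunks, CHUNK_SIZE, and the transforms argument are illustrative, not an existing API:

# Split instances into spatial chunks; each chunk is a single draw call
# but can still be frustum-culled independently of the others.
const CHUNK_SIZE = 16.0

func build_chunks(transforms: Array, mesh: Mesh) -> void:
	var buckets = {}
	for t in transforms:
		var key = Vector2i(floori(t.origin.x / CHUNK_SIZE), floori(t.origin.z / CHUNK_SIZE))
		if not buckets.has(key):
			buckets[key] = []
		buckets[key].append(t)
	for key in buckets:
		var mm = MultiMesh.new()
		mm.transform_format = MultiMesh.TRANSFORM_3D
		mm.mesh = mesh
		mm.instance_count = buckets[key].size()
		for i in range(buckets[key].size()):
			mm.set_instance_transform(i, buckets[key][i])
		var mmi = MultiMeshInstance3D.new()
		mmi.multimesh = mm
		add_child(mmi)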

@EpEpDragon

I have thought about using chunks of MultiMeshInstances, but if it is at all possible to use the rendering API to directly instance meshes from a GPU buffer, I would very much like to implement that. Even if it ends up being unnecessary in a lot of cases, there will still be some where it is useful.

@clayjohn
Member Author

clayjohn commented Jul 6, 2022

I have thought about using chunks of MultiMeshInstances, but if it is at all possible to use the rendering API to directly instance meshes from a GPU buffer, I would very much like to implement that. Even if it ends up being unnecessary in a lot of cases, there will still be some where it is useful.

We have plans to allow sharing GPU resources between the main RenderingDevice and user-created devices (which includes using compute buffers for indirect drawing); however, we have pushed this feature to 4.1 or 4.2, as we have to prioritize other features for 4.0. For now, you are stuck copying data back to the CPU from compute shaders if you want to use the results.
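
(In the meantime, a sketch of that CPU round-trip for the grass case: it assumes a compute shader that writes the culled instance transforms into a storage buffer, where transform_buffer and multimesh are illustrative names and the float layout matches MultiMesh's packed buffer format.)

# After rd.submit() and rd.sync(), read the culled transforms back and
# upload them to the MultiMesh in a single call.
var bytes = rd.buffer_get_data(transform_buffer)
RenderingServer.multimesh_set_buffer(multimesh.get_rid(), bytes.to_float32_array())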

@EpEpDragon

Ok cool, until that time I'll poke around the source and see what I can figure out.

@clayjohn
Member Author

clayjohn commented Jul 6, 2022

Ok cool, until that time I'll poke around the source and see what I can figure out.

Feel free to join us over at chat.godotengine.org if you have questions or want to discuss the source in more detail! We wouldn't oppose extending the API for 4.0 if we had someone interested and motivated to do the work.

@EpEpDragon

Ok, sounds good! I'm very much still a novice when it comes to GPU related stuff and don't know how much time I can spend on this, but I'll definitely share if I get something to work.

@Chaosus Chaosus changed the title Minimal Compute Example Minimal Compute Shader Example Jul 23, 2022
@alfredbaudisch

alfredbaudisch commented Jul 26, 2022

An example like this would also show real-world usage of compute shaders in games: https://youtu.be/PGk0rnyTa1U?t=682 - 1M+ dust particles are spawned with no performance hit and react to the pull force at the world position of the vacuum cleaner (the project from the video is open source and could be ported from Unity to Godot).

@sefalkner

Interesting example, but syncing the texture data with the CPU every frame would likely result in too much lag.
Is async readback possible for the texture example above?

@Calinou
Member

Calinou commented Jul 26, 2022

1M+ dust particles are spawned with no performance hit and react to the pull force at the world position of the vacuum cleaner

This should already be achievable in 4.0.alpha with GPUParticles3D and attractors + collision, without needing custom shaders 🙂
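
A sketch of that setup, assuming current Godot 4 class names (ParticleProcessMaterial was still called ParticlesMaterial in early alphas); the node structure and values are illustrative:

# A vacuum-cleaner-style pull: many particles plus a spherical attractor
# that can be moved to follow the nozzle each frame.
var particles = GPUParticles3D.new()
particles.amount = 100000
var mat = ParticleProcessMaterial.new()
mat.attractor_interaction_enabled = true
particles.process_material = mat
add_child(particles)

var attractor = GPUParticlesAttractorSphere3D.new()
attractor.radius = 2.0
attractor.strength = 10.0 # positive strength pulls particles toward it
add_child(attractor)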

@alfredbaudisch

alfredbaudisch commented Jul 26, 2022

@Calinou but how do you achieve something like the following with the proposed GPUParticles + attractors? https://github.com/SebLague/Super-Chore-Man/blob/4b96e829d2d61fda9133c3dec5cd38256beaf11e/Assets/Scripts/Particles/DustCompute.compute#L13

RWStructuredBuffer<uint> numParticlesConsumed;
uint numParticles;

if (sqrDst < sqrEatDst || dot(offset2,offset2) < sqrEatDst || dot(offset3, offset3) < sqrEatDst) {
	particles[i].sizeMultiplier = 0;
	InterlockedAdd(numParticlesConsumed[0],1);
}

The compute shader keeps a counter (numParticlesConsumed) to track the cleaning progress, which the GameObject reads back to update the progress text in the UI each frame.

@huisedenanhai

huisedenanhai commented Aug 16, 2022

How about enabling TEXTURE_USAGE_STORAGE_BIT by default (or as an option) for ImageTexture, and letting TextureStorage expose an API for converting an ImageTexture RID to a RenderingDevice texture RID?

Then binding an ImageTexture to the compute pipeline would be as simple as:

var img_tex: ImageTexture

var u = RDUniform.new()
u.uniform_type = RenderingDevice.UNIFORM_TYPE_IMAGE
u.binding = 0
u.add_id(img_tex.get_rd_texture_rid())

I think that covers most use cases that require updating storage images from a compute shader, for example calculating ocean height fields with an FFT compute shader and applying the result in a shader material.

I do not see the point of using a local RenderingDevice; the most intuitive way to insert compute jobs into the render process is to use the main device retrieved via RenderingServer.get_rendering_device(). Then there is no need to share data between devices.
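
For example, a sketch of recording compute work on the main device, assuming the shader, pipeline and uniform set were created on it the same way as in the first example; note that submit() and sync() are only valid on local devices, the main device flushes its work as part of the frame:

var rd = RenderingServer.get_rendering_device()
# ... create shader, pipeline and uniform_set on this device as before ...
var compute_list = rd.compute_list_begin()
rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
rd.compute_list_dispatch(compute_list, 2, 1, 1)
rd.compute_list_end()
# No rd.submit()/rd.sync() here; those are only for local devices.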

@huisedenanhai

huisedenanhai commented Aug 17, 2022

How about enabling TEXTURE_USAGE_STORAGE_BIT by default (or as an option) for ImageTexture, and letting TextureStorage expose an API for converting an ImageTexture RID to a RenderingDevice texture RID?

I just implemented these APIs in my fork huisedenanhai/godot@32a05a5

For testing, I created a simple compute shader that initializes an image with UV coordinates.

#[compute]

#version 450

layout(local_size_x = 32, local_size_y = 32) in;

layout(set = 0, binding = 0, rgba32f) writeonly uniform image2D target_image;

void main() {
  ivec2 index = ivec2(gl_GlobalInvocationID.xy);
  ivec2 size = imageSize(target_image);
  if (index.x >= size.x || index.y >= size.y) {
    return;
  }

  vec2 uv = (vec2(index) + 0.5) / vec2(size);
  imageStore(target_image, index, vec4(uv, 0, 1.0));
}

The ImageTexture should be created with the proper usage bits:

var usage = RenderingDevice.TEXTURE_USAGE_STORAGE_BIT\
	| RenderingDevice.TEXTURE_USAGE_CAN_UPDATE_BIT\
	| RenderingDevice.TEXTURE_USAGE_SAMPLING_BIT

var tex = ImageTexture.create_from_image_with_usage(image, usage)

Then dispatch the compute shader to initialize the ImageTexture, and attach the exact same texture to a shader material as albedo. There is no need to copy data back and forth between GPU and CPU.

[Screenshot: the compute-initialized UV texture applied as albedo]

I can make a PR if anyone likes this.

@winston-yallow
Contributor

winston-yallow commented Aug 17, 2022

@huisedenanhai that's really cool, but I think it's a bit off-topic for a Godot documentation issue. The best way to get this into Godot would probably be to open a detailed proposal (and mention there that you already implemented this!)
https://github.com/godotengine/godot-proposals

@clayjohn
Member Author

@huisedenanhai Reduz and I discussed your commit on RocketChat. The TL;DR is that your API looks pretty good, but we think we will go in a slightly different direction. It is too early to decide now, though, as we are in the middle of reorganizing the rendering code to make this sort of thing easier. If you want to discuss further, feel free to open a proposal and/or join us on RocketChat.

reduz
10:26 AM
clayjohn I think we should kind of do the opposite, have a RenderingDevice API that lets you initialize a texture from an existing RD RID

clayjohn
10:27 AM
reduz So what would that look like? The user would create an ImageTexture, grab the RD RID from that, then re-initialize the texture with a different set of usage bits?


reduz
10:28 AM
I don't think that's possible. IMO ImageTexture is good as is, I would have a different type of texture you create from RD
like TextureRD or something like that, ImageTexture has a purpose already so I would not repurpose it knowing we have so many texture types
likely TextureRD2D, TextureRDLayered, etc.
you can inherit those and do whatever you want
ImageTexture already imposes some conditions, like the transfer to/from requirement. I think there is no point in repurposing it if we can make a truly custom one that you can create as you wish
say you want to use a slice of an array as a Texture2D, for example. You can't do that with ImageTexture.
but you can with TextureRD2D
or actually you may likely not even need this, you can just inherit Texture2D and do anything you want already.
all that you need is an API to register the RenderingDevice RID in the RenderingServer to be used as a texture
and likely an API in RenderingServer to obtain the texture RID

clayjohn
10:37 AM
That makes sense. The user added a texture_get_rd_rid in TextureStorage to do that
But I guess that is opposite
Because the user relied on creating an image first in the RenderingServer (and having the RenderingServer access the RenderingDevice), but you want to create it in the RenderingDevice first, then register it in the RenderingServer


reduz
10:38 AM
yeah RenderingServer will likely use this too for its own function, but you also need RenderingServer::texture_create_from_rd_rid(RID p_rd_rid)
we need to add these kind of APIs to the Mesh API too
in RenderingServer, including an extra array flag to create the array as compatible with compute (storage)
with this you can probably use for the most part compute to do things without getting into the internals of RenderingServer
like do custom render to textures and the like
or to meshes

clayjohn
10:40 AM
Would this be necessary if the resource was created using the RD API directly?

clayjohn
quoting reduz ([10:39 AM](https://chat.godotengine.org/channel/rendering?msg=tWW2j8GnG7M4evEbv)): in RenderingServer, including an extra array flag to create the array as compatible with compute (storage)
Shouldn't we just need to register the resource in Texture Storage?


reduz
10:41 AM
clayjohn I think for ease of use, we should expose the APIs in RenderingServer too
most users who want to do with compute will do just fine using that
like funny compute effects
I think accessing TextureStorage and the likes will be more common if you are doing a custom renderer, or custom post processes
as in, blitting into the actual rendering code

clayjohn
10:42 AM
Well that is what the user has now


reduz
10:43 AM
that makes sense, but for this I would wait until we are done with the rendering reorganization
since we don't really know what and how we will expose
I mean let's do one thing at a time

clayjohn
10:43 AM
Fair enough

@huisedenanhai

huisedenanhai commented Aug 17, 2022

I agree with reduz. What I implemented is not the full solution for bridging RD RIDs with the RenderingServer. I actually tried to implement registering an RD RID with texture storage before exposing RD RIDs from Texture, but it is not very easy at first glance. The relation between RID and RD RID is not a one-to-one mapping: a texture may hold two RD RIDs, the second being the sRGB view when the format supports it.

texture_get_rd_rid is always needed, even if there is some way to wrap an RD RID into a resource RID. The compute shader might want to access ImageTextures loaded from disk, as a mask or whatever.

Sorry for being off topic; I did not notice this is a docs issue. I will head to RocketChat if I have more to discuss.

I will open a proposal after the rendering reorganization is done, if needed.

@mastir

mastir commented Dec 15, 2022

Can someone explain how to run multiple shader passes in this example? Here is the code as I understand the flow:

	var compute_list = rd.compute_list_begin()
	# Create one compute pipeline per pass, selecting the pass
	# with a specialization constant
	for i in range(2, 7):
		var c = RDPipelineSpecializationConstant.new()
		c.constant_id = 0
		c.value = i
		var pipeline = rd.compute_pipeline_create(shader, [c])
		rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
		rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
		rd.compute_list_dispatch(compute_list, 128, 128, 1)
		# Barrier so each pass sees the previous pass's writes
		rd.compute_list_add_barrier(compute_list)

	rd.compute_list_end()

I use a specialization constant to choose the stage; maybe there is a better way to achieve this?

layout (constant_id = 0) const int CURRENT_STEP = 2;

void main(){
	if (CURRENT_STEP == 1){
		InitBuffers();
	}
	if (CURRENT_STEP == 2){
		PrepareUpdate();
	}
	if (CURRENT_STEP == 3){
		UpdateMainBuffer();
	}
        ...
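
(A possible alternative, sketched under the assumption of a push constant block like layout(push_constant) uniform Params { int current_step; } params; in the GLSL: select the stage with a push constant between dispatches, so a single pipeline can be reused instead of compiling one per pass.)

	var compute_list = rd.compute_list_begin()
	rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
	rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
	for step in range(2, 7):
		# Push constant buffers must be padded to a 16-byte multiple
		var pc = PackedByteArray()
		pc.resize(16)
		pc.encode_s32(0, step)
		rd.compute_list_set_push_constant(compute_list, pc, pc.size())
		rd.compute_list_dispatch(compute_list, 128, 128, 1)
		# Make each pass's writes visible to the next
		rd.compute_list_add_barrier(compute_list)
	rd.compute_list_end()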

@Chaosus
Member

Chaosus commented Dec 15, 2022

Fixed by #6159, I guess.

@Chaosus Chaosus closed this as completed Dec 15, 2022
@jtsorlinis

Are there any updates on sharing buffers between rendering devices? Or a tracking task/issue?

I feel like this would really open up what is possible with compute shaders, as the current workflow forces you to go GPU (compute) -> CPU -> GPU (render), which can be quite limiting.

@MyStarrySpace

Have there been any updates on this? I am also trying to experiment with GPU-based instancing using the particles system and writing position data based on image textures, and it would be great to have a way to pass things from GPU to GPU.

@underdoeg

Going from GPU to CPU should already work with buffers: https://docs.godotengine.org/en/stable/tutorials/shaders/compute_shaders.html#retrieving-results

@jtsorlinis

The issue is more GPU -> GPU. I've raised a proposal here: godotengine/godot-proposals#6989
