PREFACE
Admittedly, it's garbage in, garbage out.
I'm not entirely clear on what you're attempting to do with AMD driver internals.
It seems like you want to recreate this driver code for an AMD GPU (7900 XT)?
https://github.com/geohot/cuda_ioctl_sniffer/blob/master/gpu_driver.cc
If not, please explain. Below are two attempts to create the driver: the first with the referenced PM4 header (and an earlier one without it).
Keep in mind that Claude.ai has most likely seen the code and headers from the AMD ROCT-Thunk-Interface repo during training.
In addition, I run `cat *.py` on a few files (PM4Packet / queue / cpp / hpp), so the context looks like this:
https://gist.github.com/johndpope/68b233e6d85f863fbb775aabca5ed65f
We pass this in, and I attach your code (gpu_driver.cc) with the prompt:
"Recreate the NVIDIA driver for the AMD 7900XT GPU, leveraging the ROCT-Thunk-Interface (_setup / gpu_memcpy / gpu_compute, etc.). Leverage the attached classes and use a PM4 queue."
Here is an attempt at recreating the NVIDIA driver code to work with an AMD GPU (7900XT), leveraging the ROCT-Thunk-Interface and the provided PM4 queue classes:
Key changes:
Use ROCT-Thunk-Interface for low-level AMD GPU access instead of NVIDIA-specific interfaces.
Allocate GPU memory using hsaKmtAllocMemory() instead of NVIDIA allocators.
Use the provided PM4Queue class to manage the AMD command queue.
Build the shader dispatch sequence using PM4 packets like PM4SetShaderRegPacket, PM4DispatchDirectPacket, etc.
Wait for shader completion using PM4ReleaseMemoryPacket.
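Underneath, a PM4 queue is a ring buffer of 32-bit command dwords. The host writes packets into that ring and advances a write pointer, and the GPU's command processor advances a read pointer as it consumes them. The C++ below simulates that arrangement as a rough sketch.

`Pm4RingSketch` is a hypothetical class, not the attached PM4Queue. In a real driver, the ring memory, the read/write pointers, and the doorbell would come from the queue created through the thunk (for example `hsaKmtCreateQueue`). Treating both pointers as monotonically increasing dword counters is an assumption. Letting a packet span the wrap point is also an assumption to verify for your queue type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simulated PM4 ring (not the attached PM4Queue class).
// wptr_/rptr_ are monotonically increasing dword counters (assumption);
// the ring slot is pointer & (size - 1), so size must be a power of two.
class Pm4RingSketch {
public:
    explicit Pm4RingSketch(std::size_t size_dwords)
        : ring_(size_dwords, 0u), mask_(size_dwords - 1) {
        assert(size_dwords != 0 && (size_dwords & mask_) == 0);
    }

    // Free space = capacity minus dwords written but not yet consumed by the GPU.
    std::size_t free_dwords() const {
        return ring_.size() - static_cast<std::size_t>(wptr_ - rptr_);
    }

    // Copy one whole packet into the ring, wrapping at the end.
    // Returns false (and writes nothing) if it does not fit yet.
    bool place_packet(const std::vector<uint32_t>& packet) {
        if (packet.size() > free_dwords()) return false;
        for (uint32_t dw : packet) {
            ring_[static_cast<std::size_t>(wptr_ & mask_)] = dw;
            ++wptr_;
        }
        return true;
    }

    // A real driver would publish this value to the queue's write-pointer
    // location and ring the doorbell here; the sketch only returns it.
    uint64_t submit() const { return wptr_; }

    // Stand-in for the command processor consuming dwords.
    void simulate_gpu_consumed(uint64_t dwords) {
        assert(rptr_ + dwords <= wptr_);
        rptr_ += dwords;
    }

    uint32_t slot(std::size_t i) const { return ring_[i]; }

private:
    std::vector<uint32_t> ring_;
    uint64_t mask_;
    uint64_t wptr_ = 0;
    uint64_t rptr_ = 0;
};
```

The space check is what keeps the host from overwriting packets the GPU has not yet read. That is also why a driver may have to wait on the read pointer before placing a large command stream.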
This provides the general structure, but more details would need to be filled in, like proper memory allocation, any kernel mode driver interactions, shader compilation, etc. The low-level register setup would also need to be adapted for the specific AMD GPU. But hopefully this illustrates the high-level approach of porting the NVIDIA code to AMD leveraging the ROCT-Thunk-Interface and PM4 queue.
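For the register-setup part, compute shader state is normally programmed with SET_SH_REG (opcode 0x76) packets. The sketch below makes several assumptions:

- The type-3 header layout used by GFX-family GPUs: type in bits 31:30, count in 29:16, opcode in 15:8, and shader type in bit 1.
- Register byte addresses as they appear in open-source driver headers such as Mesa's: SH register space starting at 0xB000, and COMPUTE_PGM_LO at 0xB830.

Treat all of these as values to verify for gfx11 (7900 XT) before use. `set_sh_reg` and `set_shader_address` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed type-3 header layout: type [31:30], count [29:16], opcode [15:8],
// shader type [1] (1 = compute), predicate [0] left at 0.
constexpr uint32_t pm4_type3_header(uint32_t opcode, uint32_t count, bool compute) {
    return (3u << 30) | ((count & 0x3FFFu) << 16) | ((opcode & 0xFFu) << 8) |
           (compute ? (1u << 1) : 0u);
}

constexpr uint32_t kItSetShReg = 0x76;      // IT_SET_SH_REG (assumed)
constexpr uint32_t kShRegBase = 0xB000;     // start of SH register space, byte address (assumed)
constexpr uint32_t kComputePgmLo = 0xB830;  // COMPUTE_PGM_LO byte address (assumed); PGM_HI follows

// Hypothetical helper: write `values` to consecutive SH registers starting at
// reg_byte_addr. The first body dword is the dword offset from the SH base.
std::vector<uint32_t> set_sh_reg(uint32_t reg_byte_addr, const std::vector<uint32_t>& values) {
    assert(reg_byte_addr >= kShRegBase && !values.empty());
    std::vector<uint32_t> pkt;
    // count = body dwords - 1 = (1 offset dword + N values) - 1 = N
    pkt.push_back(pm4_type3_header(kItSetShReg, static_cast<uint32_t>(values.size()), true));
    pkt.push_back((reg_byte_addr - kShRegBase) >> 2);
    pkt.insert(pkt.end(), values.begin(), values.end());
    return pkt;
}

// Hypothetical helper: COMPUTE_PGM_LO/HI hold the shader code address >> 8,
// so the code object must be 256-byte aligned.
std::vector<uint32_t> set_shader_address(uint64_t code_va) {
    assert((code_va & 0xFFu) == 0);
    const uint64_t shifted = code_va >> 8;
    return set_sh_reg(kComputePgmLo, {static_cast<uint32_t>(shifted),
                                      static_cast<uint32_t>(shifted >> 32)});
}
```

Writing PGM_LO and PGM_HI in one packet relies on the two registers being adjacent. The same helper would cover other consecutive compute registers, such as the user-data registers that carry the kernel-argument pointer.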
I had asked it before passing in the PM4 packets, and it gave me this variation, which didn't have the PM4 packet header.
In this updated version, we use PM4 packets to perform memory copies and kernel dispatches. The main changes are in the gpu_memcpy and gpu_compute functions.
For gpu_memcpy, we create a PM4 packet with the PM4_HDR_IT_OPCODE_COPY opcode to copy data from the source to the destination. We fill in the necessary fields of the packet, such as the source and destination addresses and the size of the data. Then, we enqueue the packet into the HSA queue and wait for it to complete using a signal.
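`PM4_HDR_IT_OPCODE_COPY` is the name the generated code used. On GFX-family hardware, a buffer copy through the command processor is usually expressed with a DMA_DATA packet (opcode 0x50, to the best of my knowledge). Because its byte-count field has a limited width, a large gpu_memcpy has to be split into several packets.

A minimal sketch of that splitting follows, under these assumptions:

- A 26-bit byte-count field, as on GFX9 and later parts. The per-chunk maximum is rounded down to a 32-byte multiple so later chunks stay aligned.
- The generation-specific control dword (engine, source/destination select, sync bits) is passed in by the caller rather than guessed.

`build_memcpy` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed type-3 header layout: type [31:30], count [29:16], opcode [15:8].
constexpr uint32_t pm4_type3_header(uint32_t opcode, uint32_t count) {
    return (3u << 30) | ((count & 0x3FFFu) << 16) | ((opcode & 0xFFu) << 8);
}

constexpr uint32_t kItDmaData = 0x50;                         // IT_DMA_DATA (assumed)
constexpr uint32_t kMaxDmaBytes = ((1u << 26) - 1) & ~31u;    // 26-bit field, 32-byte aligned (assumed)

// Hypothetical helper: one 7-dword DMA_DATA packet per chunk.
// Layout assumed: header, control, src lo/hi, dst lo/hi, byte count.
std::vector<uint32_t> build_memcpy(uint64_t dst, uint64_t src, uint64_t size, uint32_t control) {
    std::vector<uint32_t> cmds;
    while (size > 0) {
        const uint32_t chunk =
            static_cast<uint32_t>(std::min<uint64_t>(size, kMaxDmaBytes));
        assert(chunk != 0);
        cmds.push_back(pm4_type3_header(kItDmaData, 5));  // 6 body dwords -> count 5
        cmds.push_back(control);                          // caller-supplied, generation-specific
        cmds.push_back(static_cast<uint32_t>(src));
        cmds.push_back(static_cast<uint32_t>(src >> 32));
        cmds.push_back(static_cast<uint32_t>(dst));
        cmds.push_back(static_cast<uint32_t>(dst >> 32));
        cmds.push_back(chunk);                            // byte count; other command bits left 0
        src += chunk;
        dst += chunk;
        size -= chunk;
    }
    return cmds;
}
```

Completion would still need a separate fence, for example a RELEASE_MEM that writes a value the host polls. An HSA signal only applies if the work goes through an AQL queue instead.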
Similarly, for gpu_compute, we create a PM4 packet with the PM4_HDR_IT_OPCODE_DISPATCH_DIRECT opcode to dispatch the kernel. We provide the kernel object, argument buffer, and grid dimensions in the packet. We enqueue the packet into the HSA queue and wait for it to complete using a signal.
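Two details matter when building the dispatch:

- Grid units differ. An HSA/AQL dispatch specifies the grid in work-items, while DISPATCH_DIRECT (opcode 0x15, assumed) normally takes the number of workgroups per dimension.
- The workgroup size is programmed separately, through the COMPUTE_NUM_THREAD_X/Y/Z registers.

The sketch below does the round-up conversion and emits the 5-dword packet. It sets only COMPUTE_SHADER_EN (bit 0) in the dispatch initiator. That is an assumption, since production drivers set additional initiator bits. `workgroup_counts` and `dispatch_direct` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kItDispatchDirect = 0x15;    // IT_DISPATCH_DIRECT (assumed)
constexpr uint32_t kComputeShaderEn = 1u << 0;  // DISPATCH_INITIATOR.COMPUTE_SHADER_EN (assumed)

// Assumed type-3 header layout; bit 1 (shader type) set for compute.
constexpr uint32_t pm4_type3_header(uint32_t opcode, uint32_t count) {
    return (3u << 30) | ((count & 0x3FFFu) << 16) | ((opcode & 0xFFu) << 8) | (1u << 1);
}

// Convert an AQL-style grid (work-items) into workgroup counts, rounding up
// so a partial workgroup still covers the tail of the grid.
std::array<uint32_t, 3> workgroup_counts(const std::array<uint32_t, 3>& grid,
                                         const std::array<uint32_t, 3>& wg) {
    std::array<uint32_t, 3> out{};
    for (int i = 0; i < 3; ++i) {
        assert(wg[i] != 0);
        // 64-bit math avoids overflow in grid + wg - 1 for very large grids.
        out[i] = static_cast<uint32_t>((uint64_t{grid[i]} + wg[i] - 1) / wg[i]);
    }
    return out;
}

// Hypothetical helper: header + 3 dimension dwords + dispatch initiator.
std::vector<uint32_t> dispatch_direct(const std::array<uint32_t, 3>& groups) {
    return {pm4_type3_header(kItDispatchDirect, 3), groups[0], groups[1], groups[2],
            kComputeShaderEn};
}
```

Because of the round-up, the kernel must bounds-check its global ID. A grid of 1000 items with 256-item workgroups launches 1024 work-items.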
The rest of the code remains largely the same, with the addition of the PM4_HEADER_BUILD macro to construct the PM4 packet headers.
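The generated code's PM4_HEADER_BUILD macro is not reproduced in this thread. The encoding such a macro has to perform can be sketched as a `constexpr` function with a matching decoder, which helps when sanity-checking a dumped command stream.

The names `pm4_header_build`, `pm4_header_decode`, and `pm4_packet_dwords` are hypothetical. The sketch assumes the GFX type-3 layout: type in bits 31:30, count in 29:16 (body dwords minus one), opcode in 15:8, shader type in bit 1, and predicate in bit 0.

```cpp
#include <cassert>
#include <cstdint>

struct Pm4Type3Fields {
    uint32_t opcode;
    uint32_t body_dwords;  // dwords following the header
    bool compute;
    bool predicate;
};

// Encode a type-3 header. Precondition: body_dwords >= 1.
constexpr uint32_t pm4_header_build(uint32_t opcode, uint32_t body_dwords, bool compute,
                                    bool predicate = false) {
    return (3u << 30) | (((body_dwords - 1) & 0x3FFFu) << 16) | ((opcode & 0xFFu) << 8) |
           (compute ? 2u : 0u) | (predicate ? 1u : 0u);
}

constexpr bool is_type3(uint32_t h) { return (h >> 30) == 3u; }

// Decode the same fields back out of a header dword.
constexpr Pm4Type3Fields pm4_header_decode(uint32_t h) {
    return {(h >> 8) & 0xFFu, ((h >> 16) & 0x3FFFu) + 1, ((h >> 1) & 1u) != 0, (h & 1u) != 0};
}

// Total packet length in dwords (header + body), useful for walking a command stream.
uint32_t pm4_packet_dwords(uint32_t h) {
    assert(is_type3(h));
    return pm4_header_decode(h).body_dwords + 1;
}

// A DISPATCH_DIRECT header (opcode 0x15, 4 body dwords, compute) under these assumptions.
static_assert(pm4_header_build(0x15, 4, true) == 0xC0031502u, "unexpected header encoding");
```

The off-by-one in the count field is the easiest thing to get wrong. Checking it at compile time with `static_assert` catches that before a malformed packet hangs the command processor.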
Please note that this code assumes you have the necessary headers and libraries for the AMD GPU PM4 packets, such as amd_gpu_pm4.h. Make sure to link against the appropriate libraries when compiling the code.
Again, this is a simplified example, and you may need to adapt it based on your specific requirements and the kernel code object you are using.
If you can articulate concisely and clearly what the end game is, I'm happy to have Opus take another crack at it.
If I need to include other code or context, I'm happy to drop that in.