Hi, first of all I want to thank you for all you've done!
I've decided to use ILGPU in my deep learning library project. I hope it is as fast as C++ CUDA. Aside from compilation time, I believe it doesn't have any latency when accessing the GPU, which was the issue I was most afraid of. I ran a couple of tests to confirm this.
I am still unsure whether I should use ILGPU, because without some basic kernels provided by @m4rs-mt I don't know how to improve my code. I want to know how to improve the performance of SGEMM. Can you provide at least a simple matrix multiplication kernel? I need to benchmark against cuBLAS and try to improve the performance.
@faruknane Thank you very much for your feedback. Your project looks quite interesting.
I am still unsure whether I should use ILGPU, because without some basic kernels provided by @m4rs-mt I don't know how to improve my code
I'm afraid I don't fully understand your point here. Have you taken a look at the samples repository or the documentation? There are several basic kernels and use cases that show how to use the library appropriately and get started. If you want to write a matrix multiplication kernel, you can refer to a reference implementation and port it to the ILGPU world based on the kernels in the samples repository. However, I totally agree that in the near future we should add such a simple kernel to the samples repository to make "getting used to the library" easier.