[2GPU] Memcpy2D of matrixXmatrix -- src size (and form) #590
Hey, it seems I found a bug.
If I understand right, to finish Memcpy2D we need at least …
@ggerganov, what do you think about this? There is also something more regarding P2P memory access, but I'm not sure about that yet.
I'm afraid it will be difficult for me to help here, because I don't have a multi-GPU system to test with and I am not very familiar with this code. In general, multi-GPU support in … If you think you've found a bug, please provide a proposed fix and steps to reproduce.
I'm trying to do that. At the moment I can't understand why we can use the result of src0 x src1 as is on a single GPU, but can't on several GPUs. With hipBLAS we can't use Memcpy2D to copy from GPU 1 to GPU 0, because Memcpy2D doesn't support P2P memory access. We have to copy by hand in a loop using MemcpyDtoD, at least for GPUs that don't support P2P access. As for a twin-GPU setup, I'm lucky to have access to a system with that hardware at work.

Upd: I implemented a loop that copies the data from src to dst with the correct pitches, using DtoD copies (a sketch of that kind of loop is shown below). Could somebody explain what we expect dst to contain after the call to ggml_cuda_op_mul_mat?

Upd2:
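For readers following the thread, here is a minimal sketch of the kind of per-row fallback described above, assuming CUDA-style APIs (the same idea applies to HIP). The function name and parameter layout are illustrative, not the actual ggml code; cudaMemcpyPeerAsync is used here because it also works between devices without direct P2P access, in which case the runtime stages the copy through host memory.

```cpp
#include <cuda_runtime.h>

// Illustrative fallback when a single 2D copy cannot cross GPUs:
// copy a width x height region row by row. dst_pitch / src_pitch are the
// row strides in bytes, width is bytes per row, height is the row count,
// i.e. the same meaning as the Memcpy2D parameters.
static cudaError_t memcpy2d_peer_rows(void * dst, int dst_dev, size_t dst_pitch,
                                      const void * src, int src_dev, size_t src_pitch,
                                      size_t width, size_t height, cudaStream_t stream) {
    for (size_t row = 0; row < height; ++row) {
        char       * d = (char       *) dst + row*dst_pitch;
        const char * s = (const char *) src + row*src_pitch;
        // copies across devices even when direct P2P access is unavailable
        cudaError_t err = cudaMemcpyPeerAsync(d, dst_dev, s, src_dev, width, stream);
        if (err != cudaSuccess) {
            return err;
        }
    }
    return cudaSuccess;
}
```

Reading the sizes quoted in this issue, the source pitch and per-row width would correspond to row_diff*sizeof(float) and the height to src1_ncols; a destination pitch of ne0*sizeof(float) would then explain why roughly src1_ncols*ne0*sizeof(float) bytes are needed on the destination side.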
Hello, Mr. @ggerganov, thank you for the awesome project.
I'm trying to understand how this code should work.
What I have clarified so far: when running on two GPUs we need at least src1_ncols*ne0*sizeof(float), whereas WxH is row_diff*sizeof(float)*src1_ncols.
What I'm asking you to hint at or explain, please:
Thank you 💯

For reference, from the Memcpy2D documentation (a worked example of these parameters follows below):
[in] | width | Width of matrix transfer (columns in bytes)
[in] | height | Height of matrix transfer (rows)
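To make those two parameters concrete, here is a hedged sketch of a strided 2D device copy with cudaMemcpy2DAsync. The pitch and size expressions reuse the quantities named in this issue (ne0, row_diff, src1_ncols), but the function and variable names are illustrative assumptions, not the actual ggml_cuda_op_mul_mat call site.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Illustrative only: copy a row_diff x src1_ncols block of floats into a
// destination whose rows are ne0 floats apart. Per the documentation quoted
// above, "width" is in bytes and "height" is a number of rows.
static cudaError_t copy_result_block(float * dst, const float * src,
                                     int64_t ne0, int64_t row_diff,
                                     int64_t src1_ncols, cudaStream_t stream) {
    const size_t dpitch = ne0      * sizeof(float); // byte stride between dst rows
    const size_t spitch = row_diff * sizeof(float); // byte stride between src rows (packed)
    const size_t width  = row_diff * sizeof(float); // bytes copied per row
    const size_t height = src1_ncols;               // number of rows
    // The copy touches at most (height - 1)*dpitch + width bytes of dst,
    // which is why the destination slice is on the order of
    // src1_ncols*ne0*sizeof(float) bytes, while the packed source region is
    // row_diff*sizeof(float)*src1_ncols bytes (the WxH product quoted above).
    return cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height,
                             cudaMemcpyDeviceToDevice, stream);
}
```

As reported earlier in the thread, on HIP this kind of 2D copy apparently cannot cross GPUs without P2P support, which is what motivates the row-by-row fallback sketched above.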