Cuda RPC error when using then() #56244
Comments
Yeah, I think we should:
@lw Sorry, I actually intended that lambda to be a …
In fact, a similar error would occur for any Python type that's not a tensor/list/dict/tuple of tensors (since, as you mentioned, we don't know how to `extractDataPtrs` for those).

Ideally this shouldn't error out, as it works fine with CPU RPC and this particular call doesn't use the GPU at all; it would be surprising if it weren't supported out of the box. With respect to CUDA streams, I don't really expect any changes in behavior, since this is a CPU-only call (in practical use cases it could be used for things like control messages, metrics collection, etc.). Could we modify …
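For illustration, the tensor/list/dict/tuple extraction mentioned above can be sketched as a recursive walk. This is a toy model in Python with a stand-in `FakeTensor` class (a hypothetical name, not a PyTorch type); the real `extractDataPtrs` is C++ code operating on IValues, but the shape of the limitation is the same: anything that isn't a tensor or a container of tensors has no known storage to report.

```python
# Toy sketch (NOT PyTorch's actual extractDataPtrs): recursively collect
# "data pointers" from tensors nested inside lists/dicts/tuples, and fail
# on any other Python type, mirroring the limitation described above.

class FakeTensor:
    """Stand-in for a tensor; real tensors expose data_ptr()."""
    def __init__(self, ptr):
        self._ptr = ptr

    def data_ptr(self):
        return self._ptr


def extract_data_ptrs(value):
    """Return the data pointers of all tensors reachable in `value`."""
    if isinstance(value, FakeTensor):
        return [value.data_ptr()]
    if isinstance(value, (list, tuple)):
        ptrs = []
        for item in value:
            ptrs.extend(extract_data_ptrs(item))
        return ptrs
    if isinstance(value, dict):
        ptrs = []
        for item in value.values():
            ptrs.extend(extract_data_ptrs(item))
        return ptrs
    # Any other Python object: we don't know where its storage lives.
    raise TypeError(f"cannot extract data ptrs from {type(value).__name__}")
```

With this sketch, `extract_data_ptrs([FakeTensor(1), {"a": FakeTensor(2)}])` succeeds, while passing any other Python object (an `int`, a custom class instance) raises, which is the CPU-only-callback case hit in this issue.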
This actually relates to our discussion yesterday on adding devices to …. @lw mentioned that one candidate solution might be letting …
This might make sense if we can pass the right devices everywhere in the RPC/gradient-compression logic, and then Python futures using …. In general, I'm also concerned the pattern in this issue might not work for gradient compression either (i.e. …).
Yeah, that's an idea, but I'm not sure it would fully help here. Passing around a list of devices would help us support callbacks that change devices. However, we would still need to extract data ptrs, because we need those ptrs to record them with the CUDA caching allocator (I've found no way around that). Hence I think we still need to go for that Python-side pickling approach to get it to work. I'm still very worried about the perf hit of this (especially given what we found out here), but I don't know what else we can do?
I think this issue has been fixed in #56516. Could you confirm?
🐛 Bug
The following results in an error on master (add into `rpc/rpc_test.py`):

Stacktrace:
From my understanding, the root cause is as follows. I could be mistaken, though, as I'm not too familiar with CUDA RPC:

1. `extractDataPtrs` defined for `RpcCudaFuture` runs.
2. The future returned by `.then()` is not a `c10::ivalue::Future`, it's an `at::CudaFuture` per https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L469 (`createInstance` is overridden by `CUDAFuture`). Since the future in (2) is of type `RpcCudaFuture`, `createInstance` should dispatch to `at::CudaFuture`.
3. In `.then()`, it seems like `extractDataPtrs` defined in `CUDAFuture` runs, and errors out with the above error. I was confused as to why `extractDataPtrs` defined in `RpcCudaFuture` wasn't running, but I think this is because the future in (3) is `at::CudaFuture` and not `RpcCudaFuture`. `extractDataPtrs` uses `getSubValues`, which throws on non-TorchScript Python objects.

Additional context
Hit this bug while looking into #55757
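The dispatch behavior described in the root-cause analysis above can be modeled with a toy, pure-Python future. This is illustration only, under assumed names: `Future`/`CudaFuture` and `create_instance`/`extract_data_ptrs` mirror, but are not, the C++ `c10::ivalue::Future`, `CUDAFuture`, `createInstance`, and `extractDataPtrs`. The point it shows: because `.then()` creates its child through a virtual factory, the child inherits the CUDA-aware type, so the CUDA-aware hook runs on the callback's return value and rejects anything non-tensor.

```python
# Toy model (NOT PyTorch code) of why the CUDA-aware extract hook fires
# on a .then() callback's result: the child future is created through a
# virtual factory, so it shares the parent's CUDA-aware type.

class Future:
    """Toy base future; the real one is C++ c10::ivalue::Future."""
    def __init__(self):
        self._callbacks = []  # (callback, child future) pairs
        self._value = None

    def create_instance(self):
        # Virtual factory used by then(); subclasses override it.
        return Future()

    def extract_data_ptrs(self, value):
        # Base class: no device bookkeeping needed, accept anything.
        return []

    def then(self, callback):
        child = self.create_instance()  # child type follows the override
        self._callbacks.append((callback, child))
        return child

    def set_result(self, value):
        self._value = value
        self.extract_data_ptrs(value)  # hook runs for this future's type
        for callback, child in self._callbacks:
            child.set_result(callback(self))

    def value(self):
        return self._value


class CudaFuture(Future):
    """Toy CUDA-aware future: its then() children are CUDA-aware too."""
    def create_instance(self):
        return CudaFuture()

    def extract_data_ptrs(self, value):
        # Must locate tensor storage to coordinate with the allocator;
        # a plain Python object has none, so this errors out.
        if not hasattr(value, "data_ptr"):
            raise RuntimeError(
                f"cannot extract data ptrs from {type(value).__name__}")
        return [value.data_ptr()]
```

Completing a `CudaFuture` whose `.then()` callback returns, say, an `int` raises from the child's `extract_data_ptrs` even when the parent's own value is a tensor, while the same chain on the CPU base `Future` completes fine, matching the CPU-RPC-works/CUDA-RPC-errors behavior reported in this issue.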
cc @osalpekar @jiayisuse @lw @beauby @pritamdamania87 @mrshenli @jjlilley @gqchen @rohan-varma @pietern @zhaojuanmao @satgera @aazzolini @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu