It seems the system crashed before the core dump files could be written. Perhaps the cause is that ib_write_bw does not release its GPU resources when something goes wrong (for example, an RNR (receiver-not-ready) error), and neither CUDA nor the kernel releases them either, which leads to the system crash.
I tried to reproduce it with loopback, but it did not reproduce.
I pressed Ctrl+C while traffic was passing and also while the GPU buffer was being allocated.
Can you tell me exactly when you tried to kill the process?
Client (mlx5 NIC):
./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a

Server (mlx5 NIC):
./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0
When I press Ctrl+C to kill the process, the whole system crashes and reports a system deadlock.
It does not happen if we don't use the --use_cuda parameter.
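To answer the question about exactly when the process is killed, the client can be sent SIGINT (the signal Ctrl+C delivers) after a fixed delay instead of by hand, which makes the kill point repeatable. This is a minimal sketch, assuming GNU coreutils `timeout` is installed; `kill_after` is a hypothetical helper name, and the delays you pass are up to you (a short one to land during GPU buffer setup, a longer one to land during traffic).

```shell
#!/bin/sh
# kill_after DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD and sends it SIGINT (what Ctrl+C delivers)
# after DELAY seconds, using GNU coreutils `timeout`.
# --preserve-status returns CMD's own exit status (e.g. 130 if SIGINT
# terminated it) instead of timeout's generic 124.
kill_after() {
    delay=$1
    shift
    timeout --preserve-status --signal=INT "$delay" "$@"
}

# Illustrative use on the client; server_ip_address is a placeholder and
# the delay value is a guess to be tuned for setup vs. traffic:
# kill_after 2 ./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a
```

Running the same command with several delays and noting which ones trigger the crash would narrow down whether it happens during buffer allocation or during traffic.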