ib_write_bw --cuda will lead to system deadlock #118

Closed
antonywei opened this issue Jan 13, 2021 · 4 comments

Comments

@antonywei

Client (mlx5 NIC):
./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0 server_ip_address -a

Server (mlx5 NIC):
./ib_write_bw -d mlx5_0 -i 1 --use_cuda=0

When I press Ctrl+C to kill the process, the whole system crashes and reports a system deadlock.

This does not happen if we omit the --use_cuda parameter.

@drossetti

Can you copy the crash dump here?

@antonywei
Author

It seems the system crashes before it can write the core dump files. Maybe the reason is that ib_write_bw does not release its GPU resources when there is a problem (for example, an RNR error), and CUDA and the kernel do not release these resources either, which leads to the system crash.
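
To illustrate the kind of cleanup this hypothesis is about, here is a minimal sketch of a generic pattern: the SIGINT handler only sets a flag, and the main loop then releases resources in reverse order of acquisition. This is not perftest's actual code. The resource names (`gpu_buffer`, `memory_region`, `queue_pair`) and the `step` callback are hypothetical placeholders, and the sketch says nothing about what the kernel driver does when a process dies without cleaning up.

```python
import os
import signal


def run_with_cleanup(step, max_iterations=1000):
    """Run step(i) in a loop; on SIGINT, stop and release resources in reverse order."""
    state = {"stop": False}

    def on_sigint(signum, frame):
        # Do no teardown here: just record the request and return.
        state["stop"] = True

    previous = signal.signal(signal.SIGINT, on_sigint)
    acquired, released = [], []
    iterations = 0
    try:
        # Hypothetical resource names, acquired in dependency order.
        for name in ("gpu_buffer", "memory_region", "queue_pair"):
            acquired.append(name)
        while iterations < max_iterations and not state["stop"]:
            step(iterations)  # stand-in for posting and polling work requests
            iterations += 1
    finally:
        # Release in reverse order so nothing outlives what it depends on.
        while acquired:
            released.append(acquired.pop())
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
    return released, iterations
```

The point of the flag is that teardown runs in normal program context rather than inside the handler, so it also runs on other exit paths through the `finally` block.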

@sshaulnv
Contributor

I tried to reproduce it with loopback, and it didn't reproduce.
I pressed Ctrl+C both while passing traffic and while allocating the GPU buffer.
Can you tell me at exactly what point you tried to kill the process?
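
One way to make the kill timing reproducible is to script it. The following is a rough sketch, assuming a POSIX system: it starts a command, sends SIGINT after a fixed delay, and reports when the signal was sent and how the process exited. The helper name `interrupt_after` and its parameters are made up for illustration and are not part of perftest. Note that it signals only the one process, whereas Ctrl+C in a terminal signals the whole foreground process group.

```python
import signal
import subprocess
import sys
import time


def interrupt_after(cmd, delay_s, grace_s=10.0):
    """Start cmd, send SIGINT after delay_s seconds, and report what happened."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    try:
        # If the process exits on its own before the delay, no signal is sent.
        proc.wait(timeout=delay_s)
        sent_at = None
    except subprocess.TimeoutExpired:
        sent_at = time.monotonic() - start
        proc.send_signal(signal.SIGINT)  # the signal Ctrl+C generates
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # did not exit after SIGINT; force termination
            proc.wait()
    return {"sigint_sent_at_s": sent_at, "returncode": proc.returncode}


# Example call, using the server command from this issue:
# interrupt_after(["./ib_write_bw", "-d", "mlx5_0", "-i", "1", "--use_cuda=0"], 5.0)
```

Running it with several different delays would show whether the crash depends on when the process is interrupted.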

@HassanKhadour
Contributor

Closing the issue. Please re-open if it reproduces.
