All connections lost after storage upgrade because nvme-stas stopped reconnecting after 60 attempts #261

Closed
igaw opened this issue Oct 21, 2022 · 2 comments
Labels
bug Something isn't working

@igaw
Contributor

igaw commented Oct 21, 2022

When a storage array needs to reboot due to a software upgrade and this takes more than 10 minutes, nvme-stas will drop all connections.

This is a known issue when connecting manually with "nvme connect", and it can be avoided by adding "-l -1" to make the reconnect retries infinite.

However, nvme-stas establishes its connections automatically, so it should make the retries infinite as well.
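
For readers wondering where "60 times" and "10 minutes" come from, here is a rough sketch of the arithmetic. It assumes the usual Linux NVMe-oF defaults of a 600-second controller loss timeout (the "-l" / "--ctrl-loss-tmo" option of "nvme connect") and a 10-second reconnect delay; neither value is stated in this thread, and the function name is made up for illustration.

```python
import math

# Assumed defaults (not stated in this thread): 600 s controller loss
# timeout and 10 s between reconnect attempts.
DEFAULT_CTRL_LOSS_TMO_S = 600
DEFAULT_RECONNECT_DELAY_S = 10


def max_reconnects(ctrl_loss_tmo_s=DEFAULT_CTRL_LOSS_TMO_S,
                   reconnect_delay_s=DEFAULT_RECONNECT_DELAY_S):
    """Return the reconnect-attempt budget, or -1 for unlimited retries.

    Hypothetical helper: a negative timeout (as in "-l -1") is treated
    as "retry forever".
    """
    if reconnect_delay_s <= 0:
        raise ValueError("reconnect delay must be positive")
    if ctrl_loss_tmo_s < 0:
        return -1
    return math.ceil(ctrl_loss_tmo_s / reconnect_delay_s)


print(max_reconnects())                    # 60 attempts, about 10 minutes
print(max_reconnects(ctrl_loss_tmo_s=-1))  # -1, i.e. unlimited
```

Under those assumptions, an array that stays down longer than 600 seconds exhausts the budget, which would match the 10-minute threshold described above.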

@martin-belanger
Collaborator

> When a storage array needs to reboot due to a software upgrade and this takes more than 10 minutes, nvme-stas will drop all connections.

It's not that nvme-stas "drops" the connections, but rather that nvme-stas didn't get the "remove" kernel event indicating that the kernel had dropped the connection.

Testing indicates that the events are generated by the kernel, but nvme-stas is not "seeing" them.

This seems to be happening on systems with lots of NVMe connections (> 150).
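
One way to picture the effect of a missed "remove" event is a reconciliation pass that compares what a daemon believes is connected with what the kernel currently exposes. The sketch below is only an illustration of that idea, not the fix that went into nvme-stas; the function names are hypothetical, and it assumes controllers appear as entries under /sys/class/nvme.

```python
import os


def list_present_controllers(sysfs_dir="/sys/class/nvme"):
    """Return the controller names the kernel currently exposes.

    Assumption: each controller shows up as an entry such as "nvme3"
    in sysfs_dir. A missing directory is treated as "no controllers".
    """
    try:
        return set(os.listdir(sysfs_dir))
    except FileNotFoundError:
        return set()


def find_missed_removals(tracked, present):
    """Return controllers still tracked in user space but gone from the kernel."""
    return set(tracked) - set(present)


# Made-up example: the "remove" event for nvme1 was never seen.
tracked = {"nvme0", "nvme1", "nvme2"}
present = {"nvme0", "nvme2"}
print(sorted(find_missed_removals(tracked, present)))  # ['nvme1']
```

Any controller returned by such a check is one whose removal went unnoticed, so nothing would have triggered a reconnect for it.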

@martin-belanger martin-belanger added this to the 2.0 milestone Oct 31, 2022
@martin-belanger martin-belanger added the bug Something isn't working label Oct 31, 2022
@martin-belanger
Collaborator

Fixed in nvme-stas 2.0 by #269
