To Reproduce
Create a nexus with 1 child being a remote replica served by an nvmf target, i.e.:
$ mayastor-client nexus list -c
NAME PATH SIZE STATE REBUILDS CHILDREN
5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391 nvmf://127.0.0.1:8430/nqn.2019-05.io.openebs:nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391 60000000 online 0 nvmf://127.0.0.1:8420/nqn.2019-05.io.openebs:5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391
Publish the nexus over nvmf, connect to it with the kernel NVMe initiator and send IO e.g. with fio.
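For anyone trying to follow the connect and IO step, here is a rough sketch using nvme-cli and fio. The address, port and NQN are taken from the nexus path in the listing above; the tcp transport, the device path `/dev/nvme0n1`, the fio job parameters and the `run` helper are assumptions for illustration, not the exact commands used in this report. It only prints the commands unless `DRY_RUN=0` is set.

```shell
# Values copied from the nexus path in the listing above.
ADDR="127.0.0.1"
PORT="8430"
NQN="nqn.2019-05.io.openebs:nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391"
# Assumption: the device node the kernel initiator creates; check `nvme list`.
DEV="/dev/nvme0n1"

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Connect the kernel NVMe initiator to the published nexus (needs root).
run nvme connect -t tcp -a "$ADDR" -s "$PORT" -n "$NQN"

# Keep IO flowing long enough to stop the replica's target while it runs.
# Warning: this writes to the raw device and destroys its contents.
run fio --name=repro --filename="$DEV" --ioengine=libaio --direct=1 \
        --rw=randrw --bs=4k --time_based --runtime=600
```

Any workload that keeps IO outstanding on the nexus should do; the job above is just one option.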
Make the nvmf target serving the replica inaccessible, in this case by sending SIGSTOP to that mayastor instance.
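A minimal sketch of the freeze/resume mechanics, using a `sleep` process as a stand-in for the mayastor instance serving the replica (an assumption so that the snippet runs anywhere on Linux; in the real reproduction the PID would be that of the replica's mayastor process). It also shows SIGCONT, which the later steps use to bring the target back.

```shell
# Stand-in for the mayastor instance serving the replica (assumption).
sleep 30 &
target_pid=$!

# Freeze the process: it stays alive and keeps its sockets open, but stops
# servicing requests, so the initiator side sees timeouts rather than resets.
kill -STOP "$target_pid"
sleep 0.2
# Third field of /proc/<pid>/stat is the process state (T = stopped).
state_stopped=$(cut -d' ' -f3 "/proc/$target_pid/stat")
echo "after SIGSTOP: $state_stopped"

# Resume the process, as done later to bring the target back online.
kill -CONT "$target_pid"
sleep 0.2
state_resumed=$(cut -d' ' -f3 "/proc/$target_pid/stat")
echo "after SIGCONT: $state_resumed"

# Clean up the stand-in process.
kill "$target_pid"
wait "$target_pid" 2>/dev/null || true
```

SIGSTOP is used here rather than killing the process because a stopped target leaves its TCP connections open, so the initiator has to rely on its own timeout handling.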
The mayastor instance serving the nexus detects a timeout:
[2020-12-01T17:54:24.942056536+00:00 WARN mayastor::spdk:bdev_nvme.c:1149] Warning: Detected a timeout. ctrlr=0x557ec85b6d70 qpair=(nil) cid=2
and the nexus is reconfigured to fault the child and remove it from the nexus:
[2020-12-01T17:57:50.844321020+00:00 INFO mayastor::bdev::nexus::nexus_bdev:nexus_bdev.rs:454] nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391: Dynamic reconfiguration event: ChildFault started
[2020-12-01T17:57:50.844403241+00:00 INFO mayastor::bdev::nexus::nexus_channel:nexus_channel.rs:102] nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391(thread:"mayastor_nvmf_tcp_pg_core_0"), refreshing IO channels
[2020-12-01T17:57:50.845618145+00:00 INFO mayastor::bdev::nexus::nexus_channel:nexus_channel.rs:244] nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391: Reconfigure completed
[2020-12-01T17:57:50.846884423+00:00 INFO mayastor::bdev::nexus::nexus_bdev:nexus_bdev.rs:468] nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391: Dynamic reconfiguration event: ChildFault completed 0
The child removal is complete:
[2020-12-01T17:57:50.847500441+00:00 ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:344] :nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391: state: Mutex { data: Open } blk_cnt: 112607, blk_size: 512
nvmf://127.0.0.1:8420/nqn.2019-05.io.openebs:5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391: Faulted(IoError), blk_cnt: 122880, blk_size: 512
has no children left...
[2020-12-01T17:57:50.847578102+00:00 INFO mayastor::core::bdev:bdev.rs:168] Received remove event for bdev 127.0.0.1:8420/nqn.2019-05.io.openebs:5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391n1
[2020-12-01T17:57:50.847634549+00:00 INFO mayastor::bdev::nexus::nexus_child:nexus_child.rs:367] Removing child nvmf://127.0.0.1:8420/nqn.2019-05.io.openebs:5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391
[2020-12-01T17:57:50.847766549+00:00 INFO mayastor::bdev::nexus::nexus_child:nexus_child.rs:405] Child nvmf://127.0.0.1:8420/nqn.2019-05.io.openebs:5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391 removed
However, the replica is still listed as a child in the nexus:
$ mayastor-client nexus list -c
NAME PATH SIZE STATE REBUILDS CHILDREN
5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391 nvmf://127.0.0.1:8430/nqn.2019-05.io.openebs:nexus-5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391 60000000 faulted 0 nvmf://127.0.0.1:8420/nqn.2019-05.io.openebs:5b5b04ea-c1e3-11ea-bd82-a7d5cb04b391
but when you try to destroy the nexus mayastor fails an assertion:
Managed to hit a different assertion if the NVMf target comes back online before the initiator detects a "Controller Fatal status" and decides to reset it:
jonathan-teh changed the title from "Assertion failure when destroying a nexus in faulted state" to "Assertion failure when unpublishing or destroying a nexus in faulted state" on Dec 16, 2020
Describe the bug
Mayastor fails an assertion when attempting to destroy a nexus that is in the faulted state:
To Reproduce
Create a nexus with 1 child being a remote replica served by an nvmf target, i.e.:
Publish the nexus over nvmf, connect to it with the kernel NVMe initiator and send IO, e.g. with fio.
Make the nvmf target serving the replica inaccessible, in this case by sending SIGSTOP to that mayastor instance.
The mayastor instance serving the nexus detects a timeout:
and eventually notices that a reset is required:
At this point, send SIGCONT to the mayastor instance serving the replica.
Back at the other mayastor instance, the (only) child is faulted:
Network errors are logged:
and the nexus is reconfigured to fault the child and remove it from the nexus:
The child removal is complete:
However, the replica is still listed as a child in the nexus:
but when you try to destroy the nexus mayastor fails an assertion:
where the assertion is here.
Expected behavior
The nexus should be destroyed.
Screenshots
OS info:
Additional context
Developer debug build.