runner/rbd: misc implicit failover fixes #471

Merged
merged 2 commits into from Aug 30, 2018

mikechristie
Collaborator

@mikechristie mikechristie commented Aug 30, 2018

  1. Update the lock state when we get an RTPG or INQUIRY, so that when a
    path is added back after a failover we re-grab the lock later.

  2. Return BUSY instead of ALUA STATE TRANSITION to work around a bug in the Linux ALUA layer.
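
As a rough illustration of the first item, the sketch below refreshes a cached lock state whenever an INQUIRY or RTPG (MAINTENANCE IN with the REPORT TARGET PORT GROUPS service action) arrives. It is not the code from this PR: the `sketch_*` names, the two-value lock state, and the `backend_owns_lock` flag standing in for a real backend query are all assumptions made for the example. Only the opcode and service action values come from the SCSI command set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SCSI opcodes and service action used to probe a path. */
#define SCSI_INQUIRY         0x12
#define SCSI_MAINTENANCE_IN  0xa3
#define SCSI_MI_REPORT_TPGS  0x0a /* REPORT TARGET PORT GROUPS (RTPG) */

/* Hypothetical lock states for this sketch, not the tcmu-runner enum. */
enum sketch_lock_state {
    SKETCH_LOCK_UNLOCKED,
    SKETCH_LOCK_LOCKED,
};

/* Hypothetical per-device state. */
struct sketch_dev {
    enum sketch_lock_state cached_state; /* what this node believes */
    bool backend_owns_lock;              /* stand-in for asking the backend */
};

/* Return true if the CDB is an INQUIRY or an RTPG. */
static bool sketch_is_path_probe(const uint8_t *cdb)
{
    assert(cdb != NULL);
    if (cdb[0] == SCSI_INQUIRY)
        return true;
    /* The service action lives in the low 5 bits of byte 1. */
    return cdb[0] == SCSI_MAINTENANCE_IN &&
           (cdb[1] & 0x1f) == SCSI_MI_REPORT_TPGS;
}

/*
 * On a path probe, replace the cached lock state with what the backend
 * reports. Returns true if the cache was refreshed.
 */
static bool sketch_handle_cdb(struct sketch_dev *dev, const uint8_t *cdb)
{
    if (!sketch_is_path_probe(cdb))
        return false;
    dev->cached_state = dev->backend_owns_lock ? SKETCH_LOCK_LOCKED
                                               : SKETCH_LOCK_UNLOCKED;
    return true;
}

/* The IO path must take the lock again if the cache says we lost it. */
static bool sketch_needs_lock(const struct sketch_dev *dev)
{
    return dev->cached_state != SKETCH_LOCK_LOCKED;
}
```

In this model, a node that lost the lock during a failover still has a stale `SKETCH_LOCK_LOCKED` in its cache. The first INQUIRY or RTPG sent when the path is added back corrects the cache, so `sketch_needs_lock()` reports that later IO has to re-grab the lock.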

@mikechristie mikechristie changed the title runner/rbd: update lock state on RTPG/INQUIRY runner/rbd: misc implicit failover fixes Aug 30, 2018
alua.c Outdated
@@ -387,8 +387,8 @@ static int alua_set_state(struct tcmu_device *dev, struct alua_grp *group,
* @group_list: list of alua groups
* @enabled_group_id: group id of the local enabled alua group
*
* If the handler is not able to update the remote nodes's state during STPG
* handling we update it now.
* If the handler is not able to update the remote nodes's state during alua

Nit: alua -> ALUA

Thanks. Fixed.

Patches merged.

Mike Christie added 2 commits August 30, 2018 12:38
Update the lock state when we get an RTPG or INQUIRY, so that when a
path is added back after a failover we re-grab the lock.

Signed-off-by: Mike Christie <mchristi@redhat.com>

Linux has not handled ALUA state transitions well since this
patch:

    commit 2b35865e7a290d313c3d156c0c2074b4c4ffaf52
    Author: Hannes Reinecke <hare@suse.de>
    Date:   Fri Feb 19 09:17:13 2016 +0100

        scsi_dh_alua: Recheck state on unit attention

Before this patch, Linux would retry for up to (cmd retries *
cmd timeout) seconds. With the patch, it retries only 5 times. Because
we are not doing disk/SSD IO, the retries can be used up before we even
start to take the lock. The cmd is then failed to the multipath layer,
the path is failed, and the IO is retried on another path. This can
repeat over and over on each path.

We work around this by returning BUSY instead.

Signed-off-by: Mike Christie <mchristi@redhat.com>
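
The retry arithmetic in the second commit message can be modelled with a small sketch. It compares a retry window bounded by time, as in (cmd retries * cmd timeout), with one bounded by a fixed number of retries. The functions and the model are assumptions for illustration only: each attempt is resent `rtt_ms` after the previous one and succeeds once `lock_ms` has passed. This is not kernel or tcmu-runner code, and it makes no claim about how Linux treats BUSY beyond what the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count-bounded retries: the initial attempt plus max_retries retries,
 * sent rtt_ms apart. Returns true if some attempt is sent at or after
 * lock_ms, the (hypothetical) time at which the lock has been taken.
 */
static bool sketch_survives_retry_count(unsigned max_retries,
                                        unsigned rtt_ms, unsigned lock_ms)
{
    unsigned attempt;

    assert(rtt_ms > 0);
    for (attempt = 0; attempt <= max_retries; attempt++) {
        if (attempt * rtt_ms >= lock_ms)
            return true;
    }
    return false;
}

/*
 * Time-bounded retries: attempts keep going out every rtt_ms until
 * window_ms has elapsed. Returns true if some attempt is sent at or
 * after lock_ms.
 */
static bool sketch_survives_time_window(unsigned window_ms,
                                        unsigned rtt_ms, unsigned lock_ms)
{
    unsigned sent_ms;

    assert(rtt_ms > 0);
    for (sent_ms = 0; sent_ms <= window_ms; sent_ms += rtt_ms) {
        if (sent_ms >= lock_ms)
            return true;
    }
    return false;
}
```

With made-up numbers, a 1 ms round trip and a lock that takes 500 ms to acquire, `sketch_survives_retry_count(5, 1, 500)` is false because all retries are spent within a few milliseconds. `sketch_survives_time_window(150000, 1, 500)`, a window of 5 retries times a 30 second timeout, is true. The faster the target answers, the sooner a fixed retry count runs out, which is the situation the commit message describes for a target that does no disk/SSD IO.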
@mikechristie mikechristie merged commit cfc727d into open-iscsi:master Aug 30, 2018