ec: IO failure when shrinking dispersed volume during io running #2948

Merged: 2 commits merged on Nov 13, 2021


mohit84 (Contributor) commented Nov 12, 2021

I/O failures occur when a shrink operation is performed on a
distributed-dispersed volume while I/O is in progress.

RCA: When the layout changes during a rebalance, dht_create_cbk retries
the create operation under lock on a second attempt. It makes that
decision based on the error that posix_create set in xdata during the
first attempt. ec (ec_manager_create) does not pass xdata to the upper
xlator, so dht_create cannot decide to retry the create fop when the
layout has changed, and it returns an EIO error.

Solution: Pass xdata to the upper xlator.
Fixes: #2947
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
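
For illustration, the failure mode in the RCA can be modelled with a small self-contained C sketch. None of this is GlusterFS code: the `toy_*` names, the single-key stand-in for xdata, and the key string are assumptions invented for the example. It only shows why a middle layer that drops xdata on the error path leaves the upper layer with nothing to base a retry decision on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an xdata dictionary: it carries one key only. */
typedef struct {
    const char *key;
} toy_xdata_t;

/* Hypothetical key a lower layer would set when it detects a stale layout. */
#define TOY_LAYOUT_CHANGED "toy.layout-changed"

typedef enum { TOY_RETRY_UNDER_LOCK, TOY_FAIL_EIO } toy_decision_t;

/* Upper layer (the role DHT plays in the RCA): called for a failed create.
 * It retries only if the reply carries the layout-changed hint. */
static toy_decision_t
toy_upper_on_create_error(int op_errno, const toy_xdata_t *xdata)
{
    assert(op_errno != 0); /* this path is only reached on failure */

    if (xdata != NULL && xdata->key != NULL &&
        strcmp(xdata->key, TOY_LAYOUT_CHANGED) == 0) {
        return TOY_RETRY_UNDER_LOCK;
    }
    return TOY_FAIL_EIO;
}

/* Middle layer (the role EC plays in the RCA): reports the error upward.
 * forward_xdata == 0 models dropping xdata, non-zero models passing it on. */
static toy_decision_t
toy_middle_report_error(int op_errno, const toy_xdata_t *lower_xdata,
                        int forward_xdata)
{
    return toy_upper_on_create_error(op_errno,
                                     forward_xdata ? lower_xdata : NULL);
}
```

With the hint dropped, the upper layer in this model can only fail the operation; with the hint forwarded, it can choose to retry.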

mohit84 (Contributor Author) commented Nov 12, 2021

/run regression

@@ -228,11 +228,14 @@ ec_manager_create(ec_fop_data_t *fop, int32_t state)
         case -EC_STATE_DISPATCH:
         case -EC_STATE_PREPARE_ANSWER:
         case -EC_STATE_REPORT:
+            cbk = fop->answer;
+
+            GF_ASSERT(cbk != NULL);

xhernandez (Contributor) commented Nov 12, 2021

That's not guaranteed. cbk may be NULL if there are not enough consistent bricks or the operation has failed early for some other reason.

             GF_ASSERT(fop->error != 0);

             if (fop->cbks.create != NULL) {
                 fop->cbks.create(fop->req_frame, fop, fop->xl, -1, fop->error,
-                                 NULL, NULL, NULL, NULL, NULL, NULL);
+                                 NULL, NULL, NULL, NULL, NULL, cbk->xdata);

Contributor

Suggested change
-                                 NULL, NULL, NULL, NULL, NULL, cbk->xdata);
+                                 NULL, NULL, NULL, NULL, NULL, cbk == NULL ? NULL : cbk->xdata);
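
As a rough, standalone illustration of the guard suggested here (not the actual EC code), the sketch below uses made-up `toy_*` types in place of the real fop, answer, and dictionary structures. It assumes only what the review states: the answer pointer may be NULL on the error path, so xdata has to be read through a NULL check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dictionary, answer, and fop structures. */
typedef struct { int marker; } toy_dict_t;
typedef struct { toy_dict_t *xdata; } toy_cbk_t;
typedef struct {
    int error;         /* non-zero when the operation has failed */
    toy_cbk_t *answer; /* may be NULL if no usable answer was collected */
} toy_fop_t;

/* Returns the xdata to hand to the upper layer on the error path,
 * or NULL when there is no answer to take it from. */
static toy_dict_t *
toy_error_xdata(const toy_fop_t *fop)
{
    const toy_cbk_t *cbk = fop->answer;

    assert(fop->error != 0); /* only meaningful for failed operations */

    return cbk == NULL ? NULL : cbk->xdata;
}
```

Dereferencing `cbk->xdata` unconditionally would be undefined behaviour whenever no answer was recorded, which is the case the review comment points out.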

Contributor Author

done

mohit84 (Contributor Author) commented Nov 12, 2021

/run regression

gluster-ant (Collaborator) commented

1 test(s) failed
./tests/bugs/read-only/bug-1134822-read-only-default-in-graph.t

0 test(s) generated core

5 test(s) needed retry
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/00-geo-rep/georep-basic-dr-rsync-arbiter.t
./tests/basic/volume-snapshot-clone.t
./tests/bugs/changelog/bug-1208470.t
./tests/bugs/read-only/bug-1134822-read-only-default-in-graph.t
https://build.gluster.org/job/gh_centos7-regression/1812/

mohit84 (Contributor Author) commented Nov 12, 2021

/run regression

@mohit84 mohit84 merged commit 8f2471a into gluster:devel Nov 13, 2021
mohit84 added a commit to mohit84/glusterfs that referenced this pull request Nov 15, 2021
…ster#2948)

* ec: IO failure when shrinking dispersed volume during io running

> Fixes: gluster#2947
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> (Cherry picked from commit f81bf52)
> (Reviewed on upstream link gluster#2948)

Fixes: gluster#2947
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Shwetha-Acharya pushed a commit that referenced this pull request Jan 7, 2022
…) (#2951)

* ec: IO failure when shrinking dispersed volume during io running

> Fixes: #2947
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> (Cherry picked from commit f81bf52)
> (Reviewed on upstream link #2948)

Fixes: #2947
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Development

Successfully merging this pull request may close these issues.

IO failure when shrinking distributed dispersed volume while performing IO