This repository has been archived by the owner on Jul 6, 2023. It is now read-only.

Conversation

raghavendra-talur
Member

When the brick to be replaced is a "source" brick in a replicate or disperse volume, we should fail the replace-brick operation. Ideally this should be handled in Gluster, but there is no harm in having an extra check in Heketi.

Signed-off-by: Raghavendra Talur rtalur@redhat.com

@raghavendra-talur
Member Author

@heketi/maintainers This is still a work in progress. I need to add tests, but I would like your review on whether the approach is right.

@centos-ci
Collaborator

Can one of the admins verify this patch?

@obnoxxx
Contributor

obnoxxx commented Apr 6, 2017

@raghavendra-talur The approach looks good and the code is clean enough. But as you said, it's WIP; tests are missing.

@raghavendra-talur
Member Author

@obnoxxx Yes, the WIP tag is because there are no tests yet. I will add a test.

@@ -131,6 +131,29 @@ func (v *VolumeEntry) replaceBrickInVolume(db *bolt.DB, executor executors.Execu
return err
}

// Get self heal status for this brick's volume
healinfo, err := executor.HealInfo(oldBrickNodeEntry.ManageHostName(), v.Info.Name)
Contributor

Getting heal info from only one node may not be sufficient; can't we have a fallback?
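
One shape such a fallback could take: query the node that manages the old brick first, then fall back to other known management hosts. This is a sketch only, assuming the executor interface from this PR; the helper name and the fallback-host list are hypothetical, not code from the diff.

```go
// healInfoWithFallback is a hypothetical helper: it asks the primary host
// first and then each fallback host, returning the first successful result.
// Only executor.HealInfo and v.Info.Name are taken from the actual diff.
func healInfoWithFallback(executor executors.Executor, v *VolumeEntry,
	primaryHost string, fallbackHosts []string) (*executors.HealInfo, error) {

	var lastErr error
	for _, host := range append([]string{primaryHost}, fallbackHosts...) {
		healinfo, err := executor.HealInfo(host, v.Info.Name)
		if err == nil {
			return healinfo, nil
		}
		lastErr = err
	}
	return nil, lastErr
}
```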

for _, brickHealStatus := range healinfo.Bricks.BrickList {
// Gluster has a bug that it does not send Name for bricks that are down.
// Skip such bricks; it is safe because it is not source if it is down
if brickHealStatus.Name == "information not available" {
Contributor

Are there any other statuses available besides this one?


output, err := s.RemoteExecutor.RemoteCommandExecute(host, command, 10)
if err != nil {
return nil, fmt.Errorf("Unable to get heal info of volume name: %v", volume)
Contributor

You can remove "name" from the above log message.

var healInfo CliOutput
err = xml.Unmarshal([]byte(output[0]), &healInfo)
if err != nil {
return nil, fmt.Errorf("Unable to determine heal info of volume name: %v", volume)
Contributor

same here
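
For reference, a self-contained sketch of parsing `gluster volume heal <volname> info --xml` output with encoding/xml. The struct layout and the sample document below are assumptions based on typical gluster heal-info XML; the real CliOutput type is the one defined in this PR's executor code.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Assumed layout of the heal-info XML; only the fields used here are modeled.
type Brick struct {
	Name            string `xml:"name"`
	NumberOfEntries string `xml:"numberOfEntries"`
}

type HealInfo struct {
	Bricks struct {
		BrickList []Brick `xml:"brick"`
	} `xml:"bricks"`
}

type CliOutput struct {
	HealInfo HealInfo `xml:"healInfo"`
}

func main() {
	// Hypothetical sample output; a brick that is down is reported with
	// "information not available" as its name (the Gluster bug noted above).
	sample := `<cliOutput><healInfo><bricks>
	  <brick><name>information not available</name><numberOfEntries>-</numberOfEntries></brick>
	  <brick><name>node1:/bricks/brick1</name><numberOfEntries>0</numberOfEntries></brick>
	</bricks></healInfo></cliOutput>`

	var out CliOutput
	if err := xml.Unmarshal([]byte(sample), &out); err != nil {
		fmt.Println("Unable to determine heal info:", err)
		return
	}
	for _, b := range out.HealInfo.Bricks.BrickList {
		fmt.Printf("brick %q has %s entries to heal\n", b.Name, b.NumberOfEntries)
	}
}
```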

@lpabon
Contributor

lpabon commented Apr 25, 2017

@raghavendra-talur Sounds like this is a state of the volume. If the volume is self-healing, then I'm assuming that no other state change can occur until it is either canceled or completed, right?

@obnoxxx
Contributor

obnoxxx commented May 17, 2017

@lpabon yes, it's a state of the volume, and I guess no other state change should occur, except for possibly remove? Not sure about expansion.

@lpabon added this to the Release 5 milestone on Jun 13, 2017
@lpabon added the bug label on Jun 20, 2017
@lpabon
Contributor

lpabon commented Jun 20, 2017

Closes #32

@lpabon
Contributor

lpabon commented Jun 29, 2017

We also need some tests. Any status, @raghavendra-talur?

@raghavendra-talur force-pushed the disallow-remove-device-during-self-heal branch from c3c9d8a to 53a8fdd on July 20, 2017 00:37
The conditions to be considered are:
1. If the brick to be replaced is the source for some files, don't proceed.
2. We kill the brick to be replaced; if that would make the replica set
   lose quorum, don't proceed.

This has a side effect that remove-device won't work on volumes with
replica count 2. We are choosing data safety over migration features.

Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
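
A minimal sketch of those two conditions in Go. The heal-info field names follow the diff above; the package name, function name, and the quorum arithmetic are illustrative assumptions, not the PR's actual code.

```go
package glusterfs // assumed package; all identifiers below are illustrative

import "fmt"

// brickHealStatus mirrors what heal info is expected to report per brick;
// NumberOfEntries is kept as a string, as in gluster's XML output.
type brickHealStatus struct {
	Name            string
	NumberOfEntries string
}

// canReplaceBrick applies the two conditions from the commit message.
func canReplaceBrick(bricks []brickHealStatus, oldBrickHostPath string,
	replicaCount, bricksDownInSet int) error {

	// Condition 1: the brick to be replaced must not be a heal source.
	for _, b := range bricks {
		// Gluster omits the name for bricks that are down; a down brick
		// cannot be a heal source, so it is safe to skip it.
		if b.Name == "information not available" {
			continue
		}
		if b.Name == oldBrickHostPath && b.NumberOfEntries != "0" {
			return fmt.Errorf("brick %v is a heal source, cannot replace it now", b.Name)
		}
	}

	// Condition 2: taking this brick down must not cost the set its quorum.
	// For replica 2 this always fails, which is the side effect noted above.
	if replicaCount-bricksDownInSet-1 <= replicaCount/2 {
		return fmt.Errorf("replacing the brick would make the replica set lose quorum")
	}
	return nil
}
```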
@raghavendra-talur force-pushed the disallow-remove-device-during-self-heal branch from 53a8fdd to e3d7f21 on July 20, 2017 00:41
@raghavendra-talur
Member Author

@lpabon Added tests, ready for review.

While adding tests, I found a bug in Gluster that affects Heketi. Filed a bug in Gluster (https://bugzilla.redhat.com/show_bug.cgi?id=1473026) and accounted for it in Heketi.

@obnoxxx @MohamedAshiqrh @jarrpa @humblec @ramkrsna Please have a look.

@raghavendra-talur
Member Author

@heketi/dev ping

@lpabon
Contributor

lpabon commented Jul 26, 2017

@raghavendra-talur Sweet! What else is needed? What do you suggest we do with this PR?

@raghavendra-talur
Member Author

@lpabon nothing else pending here. Just review and merge.

@@ -26,6 +26,7 @@ type Executor interface {
VolumeExpand(host string, volume *VolumeRequest) (*Volume, error)
VolumeReplaceBrick(host string, volume string, oldBrick *BrickInfo, newBrick *BrickInfo) error
VolumeInfo(host string, volume string) (*Volume, error)
HealInfo(host string, volume string) (*HealInfo, error)
Contributor

Do not return GlusterFS-specific XML data back to Heketi. The executor should interpret the information and return the result, instead of letting the caller figure it out.

Member Author

@lpabon We need to match the db data with the gluster data, so I should either send db data to the executor or get executor data to the app_* part. I don't see any way to reconcile the data without doing so.
Any suggestions?

Contributor

@raghavendra-talur: I think @lpabon means not to return the raw XML data, but to have the executor extract all the info we need into an internal structure (defined by heketi/your needs). It does not mean losing any data.

Contributor

Correct. The executor's job is to abstract away the fine details of the call. This is letting the caller deal with raw XML data. Instead, let the executor look at the XML data and return a simple struct of what the call is supposed to provide.

Contributor

@lpabon I was trying to have a look at this today, and I don't understand your request. It looks like there's already a function that extracts the relevant data into a structure? https://github.com/heketi/heketi/pull/718/files#diff-2320cf68f8e9328f659fe584a9a33620

Contributor

@jarrpa I checked it again, and since this is information about the system, it could be that at some point we will look at other values in the structure.

I withdraw this request. It is OK as it is.
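
For context, a sketch of the abstraction that was being asked for here (and ultimately dropped): the executor digests the parsed XML into a small internal type so callers never see gluster's XML format. All names are illustrative, not from the PR, and the CliOutput/Brick layout is the one assumed in the parsing sketch earlier in this thread.

```go
package executors // assumed package; reuses the CliOutput/Brick sketch above

import "strconv"

// BrickHealStatus is a digested, gluster-agnostic view of one brick.
type BrickHealStatus struct {
	Name           string // host:/brick/path, or "" if gluster reported none
	PendingEntries int    // entries still to be healed from this brick
}

// HealSummary is what the executor would return instead of the raw
// XML-derived structure.
type HealSummary struct {
	Bricks []BrickHealStatus
}

// toHealSummary converts the parsed XML into the internal type, keeping all
// XML knowledge inside the executor.
func toHealSummary(cli *CliOutput) *HealSummary {
	s := &HealSummary{}
	for _, b := range cli.HealInfo.Bricks.BrickList {
		name := b.Name
		if name == "information not available" {
			name = "" // gluster could not report a name for a down brick
		}
		n, err := strconv.Atoi(b.NumberOfEntries)
		if err != nil {
			n = 0 // "-" or empty for bricks that are down
		}
		s.Bricks = append(s.Bricks, BrickHealStatus{Name: name, PendingEntries: n})
	}
	return s
}
```

Under this design the Executor interface entry would return the digested type (something like `HealStatus(host, volume string) (*HealSummary, error)`), and only the sshexec implementation would deal with XML.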

@lpabon
Contributor

lpabon commented Aug 15, 2017

@raghavendra-talur please resolve the conflict.

@heketi/dev If the team is OK with this change, LGTM.

@lpabon
Contributor

lpabon commented Aug 15, 2017

@raghavendra-talur @obnoxxx I would like to see this pass the CI functional tests after the conflict is resolved. Also, @raghavendra-talur, could you create a new issue for Release 6 to add a functional test for this change? Thanks.

@obnoxxx
Contributor

obnoxxx commented Aug 15, 2017

@lpabon since @raghavendra-talur is not available currently, I rebased the branch and pushed a new PR as #834. PTAL there...

@obnoxxx
Contributor

obnoxxx commented Aug 15, 2017

@lpabon, the straight rebase in PR #834 passed the Travis tests and the CI functional tests. Hence, I'm going to merge per your LGTM above.

@obnoxxx
Contributor

obnoxxx commented Aug 15, 2017

@lpabon also created issue #835 for creating a functional test.

@obnoxxx
Contributor

obnoxxx commented Aug 15, 2017

It's done via merging #834 and creating #835.

@obnoxxx closed this on Aug 15, 2017
@lpabon
Contributor

lpabon commented Aug 16, 2017

👍
