osddaemon: run vgchange -ay during init #3771

Closed
galexrt wants to merge 1 commit

Conversation

galexrt
Member

@galexrt galexrt commented Sep 4, 2019

Description of your changes:

Running `vgchange -ay` activates all VGs on the host, which removes the
need to have the lvm* services running on the host. Without it, on a host
where those services are not running, all OSDs on that host become
unavailable and the OSD Pods crashloop because they can't find their VG
and/or LV.

Signed-off-by: Alexander Trost <galexrt@googlemail.com>

Which issue is resolved by this Pull Request:
Resolves #3640

Checklist:

  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md
  • Add the flag for skipping the CI if this PR does not require a build. See here for more details.

[test ceph min]

@galexrt galexrt added ceph main ceph tag ceph-osd labels Sep 4, 2019
}

// Activate all VGs on the server
context.Executor.ExecuteCommand(false, "", "/sbin/vgchange", "-ay")
Member

I think we should consider using just vgchange and letting exec.Cmd search the PATH for where vgchange actually lives so we can support environments where this might be a different path.

Member Author

I'll change it to search PATH. Note, though, that this command is executed inside the Rook Ceph container image used by the OSD init container.

Member

@BlaineEXE BlaineEXE Sep 6, 2019

That is [probably] not necessary. If ExecuteCommand uses exec.Cmd, then Go will automatically search PATH. https://godoc.org/os/exec#LookPath
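
As a minimal sketch of the PATH lookup mentioned above, the standard library behaves as follows. This assumes `ExecuteCommand` ends up calling `exec.Command`, which the thread does not confirm. The helper names `resolvedPath` and `isOnPath` are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
)

// resolvedPath returns the path that exec.Command would try to run for name.
// A name containing a path separator is used as-is; a bare name is looked up
// in PATH and is left unchanged if the lookup fails.
func resolvedPath(name string) string {
	return exec.Command(name).Path
}

// isOnPath reports whether a bare command name resolves via exec.LookPath.
func isOnPath(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

func main() {
	// Explicit path: no PATH search happens.
	fmt.Println(resolvedPath("/sbin/vgchange"))
	// Bare name: resolved through PATH when the binary is installed.
	fmt.Println(resolvedPath("vgchange"))
	fmt.Println(isOnPath("vgchange"))
}
```

With a bare name the result depends on the PATH of the process, which here would be the PATH inside the container image rather than on the host.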

@@ -55,10 +55,11 @@ func StartOSD(context *clusterd.Context, osdType, osdID, osdUUID string, pvcBack
}

context.Executor.ExecuteCommand(false, "", "/sbin/vgchange", "-an")

context.Executor.ExecuteCommand(false, "", "/sbin/vgchange", "-ay")
Member

#3755 is working on a change here for the specific vg for the osd being started. Let's keep this line as is, and wait for #3755 to change it.

Contributor

The vgname related change is part of this PR - #3779

}

// Activate all VGs on the server
Member

Let's add this in an else statement, related to the change coming in #3755

@travisn
Member

travisn commented Sep 24, 2019

@galexrt Have related changes already covered this fix, or is something else still needed after the rebase?

Member

@leseb leseb left a comment

Is this change still relevant?

@galexrt
Member Author

galexrt commented Oct 12, 2019

Yes, this PR is still relevant as the VGs are only activated when the OSDs are backed by PVCs, see

if pvcBackedOSD {
	volumeGroupName, err = getVolumeGroupName(lvPath)
	if err != nil {
		return fmt.Errorf("error fetching volume group name for OSD %s. %+v", osdID, err)
	}
	go handleTerminate(context, lvPath, volumeGroupName)
	if err := context.Executor.ExecuteCommand(false, "", "/sbin/vgchange", "-an", volumeGroupName); err != nil {
		return fmt.Errorf("failed to deactivate volume group for lv %+v. Error: %+v", lvPath, err)
	}
	if err := context.Executor.ExecuteCommand(false, "", "/sbin/vgchange", "-ay", volumeGroupName); err != nil {
		return fmt.Errorf("failed to activate volume group for lv %+v. Error: %+v", lvPath, err)
	}
}
.

I have updated the PR to always activate the VGs. Are there any objections to that?
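
The choice under discussion (activate only the OSD's own VG when its name is known, otherwise activate every VG) could be sketched roughly as below. This is an illustration and not the PR's code: `vgchangeArgs` is a hypothetical helper that only builds the argument list and leaves execution to the caller.

```go
package main

import "fmt"

// vgchangeArgs builds the arguments for an activating vgchange call.
// With a known volume group name only that VG is activated; with an
// empty name no VG is named, so vgchange processes all VGs.
func vgchangeArgs(volumeGroupName string) []string {
	args := []string{"-ay"}
	if volumeGroupName != "" {
		args = append(args, volumeGroupName)
	}
	return args
}

func main() {
	fmt.Println(vgchangeArgs("ceph-vg0")) // PVC-backed case: one VG
	fmt.Println(vgchangeArgs(""))         // non-PVC case: all VGs
}
```

Keeping the decision in one small function makes the "all VGs" fallback explicit instead of depending on an unset variable.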

pkg/daemon/ceph/osd/daemon.go (outdated, resolved)
pkg/daemon/ceph/osd/daemon.go (outdated, resolved)
@galexrt
Member Author

galexrt commented Oct 18, 2019

@leseb @travisn @BlaineEXE PTAL

	return fmt.Errorf("failed to activate volume group for lv %+v. Error: %+v", lvPath, err)
}
if err := context.Executor.ExecuteCommand(false, "", "vgchange", "-ay", volumeGroupName); err != nil {
	return fmt.Errorf("failed to activate volume group for lv %+v. %+v", lvPath, err)
Member

Isn't lvPath a string? If so please use %q instead. Thanks.

Member Author

Fixed it now.
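
To illustrate the `%q` suggestion: for a string, `%q` adds quotes, so an empty or unexpected path stays visible in the error text. The snippet below is a small sketch. `describeLV` and the example path are made up for this example and are not part of the PR.

```go
package main

import "fmt"

// describeLV formats an LV path with %q so that an empty value
// still shows up in the message as "".
func describeLV(lvPath string) string {
	return fmt.Sprintf("failed to activate volume group for lv %q", lvPath)
}

func main() {
	fmt.Println(describeLV("/dev/ceph-vg0/osd-block"))
	fmt.Println(describeLV(""))
	// With %+v an empty string leaves nothing behind in the message.
	fmt.Printf("failed to activate volume group for lv %+v\n", "")
}
```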

if err := context.Executor.ExecuteCommand(false, "", "vgchange", "-ay", volumeGroupName); err != nil {
	return fmt.Errorf("failed to activate volume group for lv %+v. Error: %+v", lvPath, err)
}
if err := context.Executor.ExecuteCommand(false, "", "vgchange", "-ay", volumeGroupName); err != nil {
Member

The volumeGroupName won't be set if the OSD is not backed by a PVC. Could you take another look, per our discussion?

@galexrt
Member Author

galexrt commented Oct 18, 2019

The volumeGroupName (LvPath) is not available for non-PVC backed OSDs.

So the options are either to enable all VGs, or to find a reliable way to get the LVPath for non-PVC-backed OSDs as well.

If anyone knows a good way to get the LVPath for existing OSDs, please let me know. I'll ask in the huddle on Tuesday.

@mykaul
Contributor

mykaul commented Oct 19, 2019

I'm not sure it's a good idea to blindly enable all VGs on a host. If anything, we should have a filter of some sort. There might be VGs we don't own.
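
One possible shape for such a filter is sketched below. It assumes, and this is only an assumption, that the VGs Rook should touch can be recognised by a name prefix such as `ceph-`. Nothing in this thread confirms that convention. `parseVGNames` and `ownedVGs` are hypothetical helpers, and the input is expected to look like the output of `vgs --noheadings -o vg_name`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVGNames splits command output with one VG name per line
// (for example from `vgs --noheadings -o vg_name`) into trimmed names.
func parseVGNames(output string) []string {
	var names []string
	for _, line := range strings.Split(output, "\n") {
		if name := strings.TrimSpace(line); name != "" {
			names = append(names, name)
		}
	}
	return names
}

// ownedVGs keeps only the names that start with prefix.
// The prefix is an assumed ownership convention, not a verified one.
func ownedVGs(names []string, prefix string) []string {
	var owned []string
	for _, name := range names {
		if strings.HasPrefix(name, prefix) {
			owned = append(owned, name)
		}
	}
	return owned
}

func main() {
	out := "  ceph-1111\n  rootvg\n  ceph-2222\n"
	fmt.Println(ownedVGs(parseVGNames(out), "ceph-"))
}
```

Each remaining name could then be passed to `vgchange -ay` individually, so VGs that belong to something else on the host are left alone.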

@leseb
Member

leseb commented Dec 12, 2019

ceph-volume (c-v) should incorporate the necessary de-activate call soon, so we should wait for that instead of running this in Rook. @galexrt can we close this? Thanks.

@leseb
Member

leseb commented Feb 12, 2020

Closing this due to inactivity. Feel free to re-open. Thanks. Also, we are moving away from LVM and the current implementation is stable so we probably can skip this.

@leseb leseb closed this Feb 12, 2020
@andrewrynhard

@leseb Any idea on the timeline for moving away from LVM? I'd like to deploy rook but this is the one thing holding me up.

@leseb
Member

leseb commented Feb 12, 2020

> @leseb Any idea on the timeline for moving away from LVM? I'd like to deploy rook but this is the one thing holding me up.

@andrewrynhard If you're using PVCs, it'll be in 1.3. If not, it's more difficult because a lot of things need to be implemented in raw mode.
There is #4768, which brings minimal raw support when not running on PVCs. However, it means you will only be able to bootstrap an OSD on a full disk or a partition. Basically, the Bluestore block/wal/db will all be on that device, so there is no dedicated device, no more than one OSD per device, and no encryption. That might work for most people, though.

Labels
ceph main ceph tag ceph-osd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: All OSDs (in special node) failed after restart a node!
7 participants