
Configure OSDs with bluestore #88

Merged: travisn merged 3 commits into rook:master from bluestore on Oct 19, 2016

Conversation

travisn (author) commented on Oct 18, 2016:

Castled can now be configured with devices to be provisioned for OSDs with bluestore, or with OSDs backed by filestore in a local directory.

  • When configuring devices for bluestore, castled must run privileged.
  • When using a directory for filestore, no elevation is necessary, though if you want to use the default config directory (/var/lib/castle) you will need to create it and chown it before invoking castled.
  • If neither devices (--storage-devices) nor directories (--storage-dirs) are specified, the current directory is assumed to be a filestore directory.

BlueStore example:
Three devices are formatted for bluestore configuration of the OSDs. The config files are generated in /var/lib/castle by default.
sudo ./castled --discovery-url=$discovery_token --private-ipv4=${COREOS_PRIVATE_IPV4} --storage-devices=sdb,sdc,sdd --force-format=true

FileStore example:
The current directory is used for all filestore journal and data. The --config-dir option is specified to use a different directory for config files.
castled --discovery-url=$discovery_token --private-ipv4=${COREOS_PRIVATE_IPV4} --config-dir=/home/core

@@ -1 +1 @@
-Subproject commit dc578817ca6d169e2225f1ec9428942668f46a8c
+Subproject commit 763750b0f06f6e27b976853bfe314aa555757e61
Member: why was this updated?

travisn (author): I was just looking at this; it wasn't intended...


mon0
mon1
mon2
Member: why are these needed?

travisn (author): This was my last cleanup item. Some mon config is written in the current dir instead of the desired dir.

@@ -16,11 +16,11 @@ var (
)
Member: you need to rebase this to what is in master. castlectl is no longer used.

travisn (author): yes, done

-rootCmd.Flags().StringVar(&cfg.devices, "devices", "", "comma separated list of devices to use")
+rootCmd.Flags().StringVar(&cfg.devices, "storage-devices", "", "comma separated list of devices to use for storage")
+rootCmd.Flags().StringVar(&cfg.directories, "storage-dirs", "", "comma separated list of directories to use for storage")
+rootCmd.Flags().StringVar(&cfg.configDir, "config-dir", "/var/lib/castle", "directory for storing configuration")
Member: I think we need a different name for this. If it were truly a config dir it would have gone in /etc. I think it's closer to a data-dir, but we would need to explain the overlap/confusion with storage-*.

Member: I think data-dir is the right flag. Our data-dir is made up of a number of mapped volumes (through storage-devices and storage-dirs), but as a whole it's still our data-dir.

travisn (author): My first inclination is to always use /var/lib/castle and not expose it. It's really just a small directory where we generate configuration for running ceph.

@@ -89,7 +89,7 @@ func (c *cephLeader) handleOrchestratorEvents() {
// Perform a full refresh of the cluster to ensure the monitors are running with quorum
err := c.configureCephMons(e.Context())
if err != nil {
-log.Printf("failed to configure ceph mons. %v", err)
+log.Printf("FAILED TO CONFIGURE CEPH MONS. %v", err)
Member: why the all caps?

travisn (author): Just to make the error more obvious among the verbose logs. Log levels would really help.

if err != nil {
return fmt.Errorf("command %s failed: %+v", cmd, err)
}
func partitionBluestoreDevice(device, serial string, executor exec.Executor) error {
Member: does this happen unconditionally?

travisn (author): No, only if they pass --force-format=true or if we don't find a file system. The method preceding this one implements those checks.
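
To make that flow concrete, here is a minimal Go sketch of the check travisn describes. The names formatDevice, partitionBluestoreDevice, and sys.GetDeviceFilesystem come from the diff; the body itself is an assumption, not the PR's exact code.

func formatDevice(device, serial string, forceFormat bool, executor exec.Executor) error {
	// detect any existing filesystem on the device
	devFS, err := sys.GetDeviceFilesystem(device, executor)
	if err != nil {
		return fmt.Errorf("failed to detect filesystem on %s: %+v", device, err)
	}
	if devFS != "" && !forceFormat {
		// a filesystem already exists and the caller did not ask to wipe it
		log.Printf("skipping device %s with filesystem %s (pass --force-format=true to wipe)", device, devFS)
		return nil
	}
	// safe to (re)partition: the disk is blank or a wipe was explicitly requested
	return partitionBluestoreDevice(device, serial, executor)
}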

func (p *ProcManager) Run(sudo bool, daemon string, args ...string) error {
sudoMsg := ""
if sudo {
sudoMsg = "sudo "
Member: why is sudo needed?

travisn (author): I removed all usages of it but missed this one. I'll remove it too.

bassam (Member) commented on Oct 18, 2016:

This is a big PR; it would have been good to break it down into a few commits that build on each other.

bassam (Member) commented on Oct 18, 2016:

Which external tools do we still depend on in castled?


@@ -124,16 +189,17 @@ func createOSDBootstrapKeyring(conn client.Connection, clusterName string) error
// get-or-create-key for client.bootstrap-osd
bootstrapOSDKey, err := client.AuthGetOrCreateKey(conn, "client.bootstrap-osd", []string{"mon", "allow profile bootstrap-osd"})
if err != nil {
-return err
+return fmt.Errorf("failed to get or create osd auth key %s. %+v", bootstrapOSDKeyringPath, err)
}

log.Printf("succeeded bootstrap OSD get/create key, bootstrapOSDKey: %s", bootstrapOSDKey)

// write the bootstrap-osd keyring to disk
bootstrapOSDKeyringDir := filepath.Dir(bootstrapOSDKeyringPath)
Member: do we need a bootstrap key per OSD, or can we use a bootstrap key per node, or not at all?

travisn (author): I know we talked about having a bootstrap key per node. I'll open a separate ticket for this.

@@ -144,7 +210,7 @@ func createOSDBootstrapKeyring(conn client.Connection, clusterName string) error
}

// format the given device for usage by an OSD
-func formatOSD(device string, forceFormat bool, executor exec.Executor) error {
+func formatDevice(device, serial string, forceFormat bool, executor exec.Executor) error {
// format the current volume
devFS, err := sys.GetDeviceFilesystem(device, executor)
Member: what does this return when a disk has been partitioned for bluestore? I think we need to figure out whether this is a device that we ourselves have partitioned in the past too, right?

travisn (author): Agreed, checking for our partitioning needs to be tightened up. I'll look at that.

}
args := []string{}
gb := 1024
args = append(args, getSGDiskArgs(walPartition, 1, gb-1, fmt.Sprintf("osd-%s-wal", serial))...)
Member: Can you please add a good long comment about the partition scheme? It's hard to parse it out of the calls to sgdisk.

travisn (author): Definitely, will add a comment.
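
For reference, here is a hypothetical version of that comment, reconstructed from the getSGDiskArgs calls below and the 1GB wal/db sizes mentioned later in the review; the partition numbers and the block-partition label are assumptions.

// Partition layout (offsets and lengths in MB):
//   partition 1, "osd-<serial>-wal":  offset 1, length 1023 (bluestore write-ahead log)
//   partition 2, "osd-<serial>-db":   offset 1024, length 1024 (bluestore metadata database)
//   partition 3 (hypothetical label "osd-<serial>-block"): remainder of the disk (raw object data)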

args := []string{}
gb := 1024
args = append(args, getSGDiskArgs(walPartition, 1, gb-1, fmt.Sprintf("osd-%s-wal", serial))...)
args = append(args, getSGDiskArgs(databasePartition, gb, gb, fmt.Sprintf("osd-%s-db", serial))...)
Member: Was ignition's sgdisk package not useful for our use cases?

travisn (author): There were a few reasons I didn't pull it in:

  • it would have required pulling in their internal logging package
  • it has transactional semantics that aren't really transactional
  • it didn't expose the ability to have the last partition take all the remaining space
  • it was very simple code

bassam (Member) commented on Oct 18, 2016:

It would be great if you could move all calls to exec to a system package, so that platform dependencies are centralized and can be tightly controlled.

travisn (author) left a review comment:

Still tracking down the mon folder issue...

travisn (author) commented on Oct 18, 2016:

These are the only external commands I see server-side that we call out to:

  • sgdisk
  • lsblk
  • bash
  • df

bash and df are only used in one place, by the osd, to see if a device has a filesystem. Here is the command: df --output=source,fstype | grep '^/dev/%s ' | awk '{print $2}'. I didn't look for another way to do this yet.

The client tool also calls modprobe, mount, etc., but they should not be triggered server-side.

bassam previously approved these changes Oct 18, 2016

bassam (Member) commented on Oct 18, 2016:

Regarding bash and df, we should be able to use syscall.Statfs.
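
A rough sketch of that direction: syscall.Statfs wants a mounted path rather than a raw device, so one practical replacement for the df pipeline is to scan /proc/mounts for the device and read its fstype field directly. This is a hypothetical helper (imports: bufio, os, strings), not code from this PR.

// getDeviceFilesystem returns the filesystem type of a mounted device,
// replacing the df | grep | awk pipeline. An empty result means no
// mounted filesystem was found for the device.
func getDeviceFilesystem(device string) (string, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// each line: <source> <mountpoint> <fstype> <options> <dump> <pass>
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 3 && fields[0] == "/dev/"+device {
			return fields[2], nil
		}
	}
	return "", scanner.Err()
}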

bassam dismissed their stale review on Oct 18, 2016 05:36: still reviewing

bassam previously requested changes Oct 18, 2016
// Partitions a device for use by a bluestore osd.
// If there are any partitions or formatting already on the device, it will be wiped.
// The result will be three partitions:
// 1. WAL: Write ahead log
Member: What are the sizes of the partitions? Also, what about the case where we want to colocate the WAL/meta of multiple OSDs; some of these partitions will be empty/unused, right?

Member: Does it make sense to reserve more space on each disk for future work? You don't want to be messing with partitions frequently. For a comparison, take a look at https://github.com/coreos/scripts/blob/master/build_library/disk_layout.json, which is how CoreOS lays out partitions.

Member: We also need to store some information on this disk that tells us it's one of ours, and possibly a version number of the partitioning scheme used; we might need to migrate it.

args := []string{}
// This is a simple scheme to create the wal and db each of size 1GB,
// with the remainder of the disk being allocated for the raw data.
// TODO: Look at the disk size to determine smarter sizes for the wal and db.
Member: Yes, that's too small; we need to use more informed sizes.

// The id is the partition number.
// The offset and length are in MB. Under the covers this is translated to sectors.
// If the length is -1, all remaining space will be assigned to the partition.
func getPartitionArgs(id, mbOffset, mbLength int, label string) []string {
const sectorsPerMB = 2048
Member: is this fixed?
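
For illustration, a minimal sketch of how the helper might complete, assuming 512-byte logical sectors (hence the fixed 2048 sectors per MB) and sgdisk's --new and --change-name flags; the body is an assumption (imports: fmt, strconv), not the PR's exact code.

func getPartitionArgs(id, mbOffset, mbLength int, label string) []string {
	const sectorsPerMB = 2048 // 1MB in 512-byte sectors
	var endSector string
	if mbLength == -1 {
		// sgdisk treats an end value of 0 as "use the largest available"
		endSector = "0"
	} else {
		endSector = strconv.Itoa((mbOffset+mbLength)*sectorsPerMB - 1)
	}
	return []string{
		fmt.Sprintf("--new=%d:%d:%s", id, mbOffset*sectorsPerMB, endSector),
		fmt.Sprintf("--change-name=%d:%s", id, label),
	}
}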

bassam dismissed their stale review on Oct 18, 2016 23:20: continuing review

@@ -1,8 +1,10 @@
VAGRANTFILE_API_VERSION = "2"

$channel = "alpha"
-$nodes = 3
+$nodes = 1
Member: why is the default "demo" scenario a single node now? Is that important?

travisn (author): I'll change it back to 3.

Member: The idea of this demo was to start one machine and then show growth by starting the other machines. This is also why autostart is false for nodes 2 and 3.

@@ -11,10 +13,12 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
set_coreos_box(config, $channel)

(0..($nodes - 1)).each do |i|
-config.vm.define name="castle%02d" % i, primary: (i == 0), autostart: (i == 0) do |node|
+config.vm.define name="castle%02d" % i, primary: false, autostart: true do |node|
Member: why change this? I like vagrant up bringing up the first node, and vagrant ssh going to the first node.

travisn (author): I'll change it back to primary: (i == 0), but leave autostart: true. The 2nd and 3rd machines weren't starting before.

$new_discovery_url="https://discovery.etcd.io/new?size=1"
$node_disks = ENV['CASTLE_NODE_DISKS'] != nil ? ENV['CASTLE_NODE_DISKS'].to_i : 3
Member: I don't think you need the env variables; just set it to a constant.

travisn (author): will do

4. Start the Ubuntu VM
5. Install VirtualBox 5.0
6. sudo apt-get install nfs-common nfs-kernel-server
7. vagrant up
Member: These instructions are for an internal audience; I think they need to be for a public audience.

travisn (author): OK, I'll remove it for now until it stabilizes with bluestore.

# Add persistent storage volumes
def attach_volumes(node, num_volumes, volume_size)

#if $provider == :virtualbox
Member: please remove this line.

travisn (author): will do

travisn merged commit c68f860 into rook:master on Oct 19, 2016
travisn deleted the bluestore branch on October 19, 2016 21:08