Configure OSDs with bluestore #88
Conversation
@@ -1 +1 @@
-Subproject commit dc578817ca6d169e2225f1ec9428942668f46a8c
+Subproject commit 763750b0f06f6e27b976853bfe314aa555757e61
why was this updated?
i was just looking at this, it wasn't intended...
mon0
mon1
mon2
why are these needed?
this was my last cleanup item. Some mon config is written in the current dir instead of the desired dir.
@@ -16,11 +16,11 @@ var (
)
you need to rebase this to what is in master. castlectl is no longer used.
yes, done
rootCmd.Flags().StringVar(&cfg.devices, "devices", "", "comma separated list of devices to use")
rootCmd.Flags().StringVar(&cfg.devices, "storage-devices", "", "comma separated list of devices to use for storage")
rootCmd.Flags().StringVar(&cfg.directories, "storage-dirs", "", "comma separated list of directories to use for storage")
rootCmd.Flags().StringVar(&cfg.configDir, "config-dir", "/var/lib/castle", "directory for storing configuration")
I think we need a different name for this. If it were truly a config dir, it would have gone in /etc. I think it's closer to a data-dir, but we would need to explain the overlap/confusion with storage-*.
I think data-dir is the right flag. Our data-dir is made up of a number of mapped volumes (through storage-devices and storage-dirs), but as a whole it's still our data-dir.
my first inclination is to always use /var/lib/castle and not expose it. It's really just a small directory where we generate configuration for running ceph.
@@ -89,7 +89,7 @@ func (c *cephLeader) handleOrchestratorEvents() {
	// Perform a full refresh of the cluster to ensure the monitors are running with quorum
	err := c.configureCephMons(e.Context())
	if err != nil {
-		log.Printf("failed to configure ceph mons. %v", err)
+		log.Printf("FAILED TO CONFIGURE CEPH MONS. %v", err)
why the all caps?
just to make the error more obvious among the verbose logs. Log levels would really help.
	if err != nil {
		return fmt.Errorf("command %s failed: %+v", cmd, err)
	}

func partitionBluestoreDevice(device, serial string, executor exec.Executor) error {
does this happen unconditionally?
no, only if they pass --force-format=true or if we don't find a file system. The method before this one implements those checks
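The guard described here can be sketched as a small decision helper. This is a hypothetical illustration, not code from the PR: `shouldFormat` is an invented name, and the assumption is that the filesystem check returns an empty string when nothing is found on the device.

```go
package main

import "fmt"

// shouldFormat mirrors the guard described above: a device is only
// (re)partitioned when --force-format is set or no filesystem was
// detected on it. existingFS is "" when no filesystem was found.
func shouldFormat(existingFS string, forceFormat bool) bool {
	return forceFormat || existingFS == ""
}

func main() {
	fmt.Println(shouldFormat("", false))     // no fs found: safe to format
	fmt.Println(shouldFormat("ext4", false)) // existing data: skip
	fmt.Println(shouldFormat("ext4", true))  // --force-format=true: wipe anyway
}
```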
func (p *ProcManager) Run(sudo bool, daemon string, args ...string) error {
	sudoMsg := ""
	if sudo {
		sudoMsg = "sudo "
why is sudo needed?
i removed all usages of it, but missed this one. I'll remove it too
This is a big PR; it would have been good to break it down into a few commits that build on each other.
which external tools do we still depend on in castled?
@@ -124,16 +189,17 @@ func createOSDBootstrapKeyring(conn client.Connection, clusterName string) error
	// get-or-create-key for client.bootstrap-osd
	bootstrapOSDKey, err := client.AuthGetOrCreateKey(conn, "client.bootstrap-osd", []string{"mon", "allow profile bootstrap-osd"})
	if err != nil {
-		return err
+		return fmt.Errorf("failed to get or create osd auth key %s. %+v", bootstrapOSDKeyringPath, err)
	}

	log.Printf("succeeded bootstrap OSD get/create key, bootstrapOSDKey: %s", bootstrapOSDKey)

	// write the bootstrap-osd keyring to disk
	bootstrapOSDKeyringDir := filepath.Dir(bootstrapOSDKeyringPath)
do we need a bootstrap key per OSD or can we use a bootstrap key per node, or not at all?
i know we talked about having a bootstrap key per node. I'll open a separate ticket for this
@@ -144,7 +210,7 @@ func createOSDBootstrapKeyring(conn client.Connection, clusterName string) error
	}

	// format the given device for usage by an OSD
-func formatOSD(device string, forceFormat bool, executor exec.Executor) error {
+func formatDevice(device, serial string, forceFormat bool, executor exec.Executor) error {
	// format the current volume
	devFS, err := sys.GetDeviceFilesystem(device, executor)
what does this return when a disk has been partitioned for bluestore? I think we need to figure out if this is a device that we ourselves have partitioned in the past too, right?
Agreed, checking for our partitioning needs to be tightened up. I'll look at that
	}
	args := []string{}
	gb := 1024
	args = append(args, getSGDiskArgs(walPartition, 1, gb-1, fmt.Sprintf("osd-%s-wal", serial))...)
Can you please add a good long comment about the partition scheme? It's hard to parse it out of the calls to sgdisk.
definitely will add a comment
	args := []string{}
	gb := 1024
	args = append(args, getSGDiskArgs(walPartition, 1, gb-1, fmt.Sprintf("osd-%s-wal", serial))...)
	args = append(args, getSGDiskArgs(databasePartition, gb, gb, fmt.Sprintf("osd-%s-db", serial))...)
Was ignition's sgdisk package not useful for our use cases?
There were a few reasons I didn't pull it in:
- it would have required pulling in their internal logging package
- it has transactional semantics that aren't really transactional
- it didn't expose the ability to have the last partition take all the remaining space
- it was very simple code
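For illustration, here is a hedged sketch of how per-partition sgdisk arguments might be assembled, including the "last partition takes all remaining space" case mentioned above. The function name and layout are assumptions for this example; the one piece taken from sgdisk's documented behavior is that a start or end sector of 0 means "use the default (largest available)".

```go
package main

import "fmt"

// sgdiskArgs builds the sgdisk flags for a single partition.
// Offsets and lengths are in MB and translated to 512-byte sectors
// (2048 sectors per MB). A length of -1 means "take all remaining
// space", expressed to sgdisk as an end sector of 0 (its default,
// which sgdisk resolves to the largest available end sector).
func sgdiskArgs(id, mbOffset, mbLength int, label string) []string {
	const sectorsPerMB = 2048
	start := mbOffset * sectorsPerMB
	end := 0 // 0 lets sgdisk pick the largest available end sector
	if mbLength != -1 {
		end = start + mbLength*sectorsPerMB - 1
	}
	return []string{
		fmt.Sprintf("--new=%d:%d:%d", id, start, end),
		fmt.Sprintf("--change-name=%d:%s", id, label),
	}
}

func main() {
	// A 1023MB WAL starting at the 1MB boundary, then a data partition
	// taking whatever is left of the disk.
	fmt.Println(sgdiskArgs(1, 1, 1023, "osd-123-wal"))
	fmt.Println(sgdiskArgs(3, 2048, -1, "osd-123-data"))
}
```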
Would be great if you can move all calls to exec to a system package so that platform dependencies are centralized and can be tightly controlled.
Still tracking down the mon folder issue...
These are the only external commands I see server-side that we call out to:
bash and df are only used in one place by the osd to see if a device has a filesystem. The client tool also calls modprobe, mount, etc., but they should not be triggered server-side.
regarding bash and df, we should be able to use
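One way to drop the bash dependency would be to invoke a tool such as lsblk directly via os/exec (no shell) and parse its FSTYPE column in Go. This is a hypothetical sketch, not what the PR does (the PR uses bash and df); the parsing is factored into a pure helper so the exec call stays trivial.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFSType returns the first non-empty filesystem type in the
// output of `lsblk -n -o FSTYPE <device>`, or "" when no filesystem
// was found. The raw output would come from something like:
//   out, err := exec.Command("lsblk", "-n", "-o", "FSTYPE", "/dev/sdb").Output()
// invoked directly, with no bash in between.
func parseFSType(lsblkOutput string) string {
	for _, line := range strings.Split(lsblkOutput, "\n") {
		if fs := strings.TrimSpace(line); fs != "" {
			return fs
		}
	}
	return ""
}

func main() {
	// Sample FSTYPE output for a disk whose first partition is ext4;
	// the leading blank line is the disk device itself.
	sample := "\next4\n"
	if fs := parseFSType(sample); fs != "" {
		fmt.Println("found filesystem:", fs)
	} else {
		fmt.Println("no filesystem found")
	}
}
```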
// Partitions a device for use by a bluestore osd.
// If there are any partitions or formatting already on the device, it will be wiped.
// The result will be three partitions:
// 1. WAL: Write ahead log
what are the sizes of the partitions? Also, what about the case when we want to colocate the WAL/meta of multiple OSDs? Some of these partitions will be empty/unused, right?
does it make sense to reserve more space on each disk for future work? You don't want to be messing with partitions frequently. For a comparison, take a look at https://github.com/coreos/scripts/blob/master/build_library/disk_layout.json. This is how CoreOS lays out partitions.
We also need to store some information on this disk that tells us it's one of ours, and possibly a version number of the partitioning scheme used. We might need to migrate it.
	args := []string{}
	// This is a simple scheme to create the wal and db each of size 1GB,
	// with the remainder of the disk being allocated for the raw data.
	// TODO: Look at the disk size to determine smarter sizes for the wal and db.
yes, that's too small. We need to use more informed sizes.
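As a rough illustration of disk-size-aware sizing, a heuristic like the following could scale the WAL and DB with the disk while clamping to sane bounds. The function name, percentages, and bounds here are invented for the example, not values the project settled on.

```go
package main

import "fmt"

// bluestorePartitionSizesMB picks WAL and DB partition sizes for a
// bluestore OSD as a fraction of the disk, clamped to fixed bounds.
// All values are illustrative assumptions, not project decisions.
func bluestorePartitionSizesMB(diskSizeMB int) (walMB, dbMB int) {
	// WAL: 1% of the disk, clamped to [576MB, 2GB]
	walMB = diskSizeMB / 100
	if walMB < 576 {
		walMB = 576
	} else if walMB > 2048 {
		walMB = 2048
	}
	// DB: 2% of the disk, clamped to [1GB, 30GB]
	dbMB = diskSizeMB * 2 / 100
	if dbMB < 1024 {
		dbMB = 1024
	} else if dbMB > 30720 {
		dbMB = 30720
	}
	return walMB, dbMB
}

func main() {
	wal, db := bluestorePartitionSizesMB(1024 * 1024) // 1TB disk
	fmt.Printf("1TB disk: wal=%dMB db=%dMB\n", wal, db)
}
```

The remainder of the disk, after the WAL and DB partitions, would still go to the raw data partition as in the current scheme.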
// The id is the partition number.
// The offset and length are in MB. Under the covers this is translated to sectors.
// If the length is -1, all remaining space will be assigned to the partition.
func getPartitionArgs(id, mbOffset, mbLength int, label string) []string {
	const sectorsPerMB = 2048
is this fixed?
@@ -1,8 +1,10 @@
VAGRANTFILE_API_VERSION = "2"

$channel = "alpha"
-$nodes = 3
+$nodes = 1
why is the default "demo" scenario a single node now? is that important?
i'll change it back to 3
the idea of this demo was to start one machine, and then show growth by starting the other machines. this is also why autostart is false for node 2 and 3
@@ -11,10 +13,12 @@ Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
	set_coreos_box(config, $channel)

	(0..($nodes - 1)).each do |i|
-		config.vm.define name="castle%02d" % i, primary: (i == 0), autostart: (i == 0) do |node|
+		config.vm.define name="castle%02d" % i, primary: false, autostart: true do |node|
why change this? I like vagrant up bringing up the first node. and vagrant ssh going to the first node.
i'll change it back to primary: (i == 0), but leave autostart: true. The 2nd and 3rd machines weren't starting before
$new_discovery_url="https://discovery.etcd.io/new?size=1"
$node_disks = ENV['CASTLE_NODE_DISKS'] != nil ? ENV['CASTLE_NODE_DISKS'].to_i : 3
I don't think you need the env variables. Just set it to a constant.
will do
4. Start the Ubuntu VM
5. Install VirtualBox 5.0
6. sudo apt-get install nfs-common nfs-kernel-server
7. vagrant up
these instructions are for an internal audience. I think they need to be for a public audience.
ok, i'll remove it for now until it stabilizes with bluestore
# Add persistent storage volumes
def attach_volumes(node, num_volumes, volume_size)

#if $provider == :virtualbox
please remove this line.
will do
Castled can now be configured with devices to be provisioned for OSDs with bluestore, or OSDs with filestore in a local directory.
If you use the default config directory (/var/lib/castle), you will need to create and chown it before invoking castled. If neither devices (--storage-devices) nor directories (--storage-dirs) are specified, the current directory is assumed to be a filestore directory.

BlueStore example:
Three devices are formatted for bluestore configuration of the OSDs. The config files are generated in /var/lib/castle by default.
sudo ./castled --discovery-url=$discovery_token --private-ipv4=${COREOS_PRIVATE_IPV4} --storage-devices=sdb,sdc,sdd --force-format=true

FileStore example:
The current directory is used for all filestore journal and data. The --config-dir option is specified to use a different directory for config files.
castled --discovery-url=$discovery_token --private-ipv4=${COREOS_PRIVATE_IPV4} --config-dir=/home/core